WO2018118986A1 - Multi-source modeling for network predictions - Google Patents

Multi-source modeling for network predictions

Info

Publication number
WO2018118986A1
WO2018118986A1 PCT/US2017/067414 US2017067414W WO2018118986A1 WO 2018118986 A1 WO2018118986 A1 WO 2018118986A1 US 2017067414 W US2017067414 W US 2017067414W WO 2018118986 A1 WO2018118986 A1 WO 2018118986A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
items
published
model
category
Prior art date
Application number
PCT/US2017/067414
Other languages
French (fr)
Inventor
Austin Avery Booker
Estefan Miquel ORTIZ
Nakul Jeirath
Original Assignee
Estia, Inc.
Priority date
Filing date
Publication date
Application filed by Estia, Inc.
Publication of WO2018118986A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Definitions

  • Implementations of the present disclosure are generally directed to using time series modeling for generating predictions regarding network(s), such as social network(s). More particularly, implementations of the present disclosure are directed to determining a predictive model using time series data regarding network(s) and/or other sources of data, and using the model to predict future trends regarding publications or other aspects of the network(s).
  • implementations of innovative aspects of the subject matter described in this specification can be embodied in methods that include operations of: receiving a first set of data points each indicating a number of items published on a network during a respective time period, wherein the items are associated with a category; determining a model based on the first set of data points and further based on data from one or more data sources, wherein the model describes a time series of the first set of data points; employing the model to determine a predicted number of items, associated with the category, that are published on the network during at least one subsequent time period; and based on the predicted number of items, subsequently publishing within the network information that is associated with the category and that targets a set of users to receive the subsequently published information within the network.
  • the data from the one or more data sources includes one or more of financial data, weather data, environmental data, event data, news data, or demographic data; the data from the one or more data sources includes dynamic data; the data is received from at least two data sources; determining the model includes correlating the data received from the at least two data sources; the operations further include receiving a second set of data points each indicating the number of items, associated with the category, that are published on the network during the at least one subsequent time period; the operations further include determining an updated version of the model based on the first set of data points and the second set of data points, and further based on an updated version of the data from the one or more data sources; the operations further include receiving a second set of data points each indicating the number of items, associated with the category, that are published on the network during the at least one subsequent time period; the operations further include determining an updated version of the model based on at least one difference between the second set of data points and the predicted
  • Other implementations of any of the above aspects include corresponding systems, apparatus, and computer programs that are configured to perform the actions of the methods, encoded on computer storage devices.
  • the present disclosure also provides a computer-readable storage medium coupled to one or more processors and having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations in accordance with implementations of the methods provided herein.
  • the present disclosure further provides a system for implementing the methods provided herein.
  • the system includes one or more processors, and a computer-readable storage medium coupled to the one or more processors having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations in accordance with implementations of the methods provided herein.
  • Implementations of the present disclosure provide one or more of the following technical advantages and improvements over traditional systems.
  • a particular category e.g., brands, topics, etc.
  • implementations provide influence predictions that enable the generation of a plan (e.g., marketing strategy, etc.) that accurately targets users in a network for dissemination of information regarding the category.
  • Such targeting may be more accurate and/or more effective at disseminating the information compared to traditional techniques which may employ an ad hoc, untargeted, unfocused, and/or otherwise more general approach to disseminating information.
  • implementations make more efficient use of processing capacity, storage space, active memory, networking capabilities, and/or other computing resources compared to traditional systems. Moreover, by employing other sources of data to develop a predictive model, implementations provide a model that may more accurately predict influence and/or change of influence within a network.
  • implementations may predict future values (e.g., assuming no strategy changes) and may recommend courses of action to help customers, such as report consumers and/or other entities, achieve their desired goals. For example, incoming data may be monitored and provided as input to the models to predict future trends. Recommendations may be provided to consumers, where such recommendations include actions that may be taken by consumers to modify the predicted future trends and bring them into line with the goals of the consumers.
  • implementations may detect that the received new data deviates (e.g., beyond a predetermined threshold deviation) from the anticipated values, indicating some anomalous behavior in the system and/or warranting further investigation. In such instances, implementations may notify the consumer(s) of the anomalous behavior so they may take appropriate action depending on the nature of the deviation. For example, higher than usual buzz about a company may be driven by some external event, and the company may be notified to take action to capitalize on the higher than usual buzz.
  • the development of predictive models based on different (e.g., social network or other) data sources provides the ability to detect and predict communities not otherwise discoverable using a single independent data source.
  • multi-source analysis described herein may enable implementations to identify connections between social network users who share fitness as a common interest with other users who post pictures of running routes on fitness websites.
  • implementations in accordance with the present disclosure can include any combination of the aspects and features described herein. That is, implementations in accordance with the present disclosure are not limited to the combinations of aspects and features specifically described herein, but also include any other appropriate combinations of the aspects and features provided.
  • FIG. 1 depicts an example system for predicting published items in a network, according to implementations of the present disclosure.
  • FIG. 2 depicts example time series data used to update a predictive model, according to implementations of the present disclosure.
  • FIG. 3 depicts a flow diagram of an example process for predicting published items in a network based on a model of time series data, according to implementations of the present disclosure.
  • FIG. 4 depicts a flow diagram of an example process for updating a predictive model based on time series data, according to implementations of the present disclosure.
  • FIG. 5 depicts a schematic of an example of a report that includes predicted published item data, according to implementations of the present disclosure.
  • FIG. 6 depicts an example computing system, according to implementations of the present disclosure.
  • Implementations of the present disclosure are directed to systems, devices, methods, and computer-readable media for analyzing time series data describing a number and/or frequency of published items in a network (e.g., a social network), the published items related to a category such as a product, service, brand, company, or other type of category. Based on the analysis of the time series data, implementations may determine a predictive model that may be employed to predict the future number and/or frequency of published items associated with the category.
  • a platform collects and analyzes a number (e.g., millions, billions) of published data items generated by a (e.g., large) population of users on one or more networks such as social networks.
  • the platform may track the network communication and re-communication of the published items among the large population of users connected over the network(s) and analyze the items to determine any suitable number of categories or subcategories associated with the items.
  • the platform may organize the data into a time series which includes any suitable number of data points, each data point indicating a number of items that are associated with a category and that were published during a respective time period on the network(s).
  • the platform may analyze the time series data to develop a model that characterizes the time series.
  • the model may be used to predict subsequent numbers of published items associated with the category. Predictions may be provided (e.g., as reports) to entities such as advertisers, marketers, product managers, and so forth.
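  • For purposes of illustration only, the organization of published items into a time series may be sketched as follows. The record layout (a publication timestamp paired with a set of category labels) and the names build_time_series, items, and series are hypothetical and are not taken from the disclosure.

```python
from collections import Counter
from datetime import datetime, timedelta


def build_time_series(items, category, start, period, num_periods):
    """Count items tagged with `category` in each consecutive time period.

    `items` is an iterable of (published_at, categories) pairs. This record
    layout is a hypothetical stand-in for whatever the platform collects.
    """
    counts = Counter()
    for published_at, categories in items:
        if category not in categories:
            continue
        # Integer index of the period that the item falls into.
        index = (published_at - start) // period
        if 0 <= index < num_periods:
            counts[index] += 1
    # One data point per period, including periods with zero items.
    return [counts[i] for i in range(num_periods)]


items = [
    (datetime(2017, 1, 1, 9), {"boots", "fashion"}),
    (datetime(2017, 1, 1, 17), {"boots"}),
    (datetime(2017, 1, 2, 8), {"cars"}),
    (datetime(2017, 1, 3, 12), {"boots"}),
]
series = build_time_series(
    items, "boots", datetime(2017, 1, 1), timedelta(days=1), 3
)
print(series)  # [2, 0, 1]
```

  • Each element of the resulting list corresponds to one data point of the time series, i.e., the number of items associated with the category that were published during the respective period.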
  • the model of influence for a particular category may be further based on additional sources of data.
  • additional data sources may include, but are not limited to, one or more of the following:
  • Weather data describing the past, current, and/or future weather conditions at one or more locations, such as temperature, air pressure, wind speed and/or direction, precipitation, severe weather events, a change or trend in any of these weather characteristics, and so forth;
  • Event data describing past, current, and/or future scheduled events such as festivals, concerts, rallies, speeches, fairs, holiday events, political events (e.g., elections), film premieres, television premieres, music album launches, sporting events, and so forth;
  • News data describing past and/or current (e.g., unscheduled) news events such as crimes, unscheduled demonstrations, disasters (natural or otherwise), traffic conditions, celebrity activities or life events (e.g., marriage, death, etc.), outcomes of sporting events, and so forth; and/or
  • Demographic data (e.g., census data) describing past, current, and/or future demographic characteristics of population(s) in particular locations or areas such as the number and/or percentage of people in various age ranges, gender categories, employment/occupation areas, education status (e.g., student, non-student, college graduate, etc.), income information, language groups, and so forth.
  • Other data sources may also include, but are not limited to, one or more of the following: e-commerce (e.g., online shopping) data, and/or other commercial data, related to prices or inventory of items over time; data from ride sharing services; data from bike sharing services; and/or data from social fitness tracking, such as common running routes, number of interactions for a given workout, shared diet information and recipes, and so forth.
  • the other data sources may include dynamic data that may be changeable over a particular time scale (e.g., per second, minute, hour, day, week, etc.).
  • the other data source(s) may provide a real time feed and/or stream of data that is used, e.g., by a modeling engine, to adapt the model in real time with respect to the changing data.
  • the other data sources may include static data that is unchanging or that may change more slowly over a longer time scale compared to dynamic data (e.g., changing from year to year instead of hourly).
  • the other data source(s) may include publicly available data sources, such as public financial data source(s), weather service(s), event data calendar(s), news feed(s), and so forth.
  • multiple data sources may be employed to develop the predictive model, in addition to the time series data measuring influence in network(s).
  • the social media influence data (e.g., in time series) may be employed in conjunction with financials from a given company, global market data, and/or other types of data to perform a multi-source integration and develop a model for category-based (e.g., brand) analytics and/or prediction.
  • a model may be developed that enables prediction of network influence (e.g., buzz) and/or sales of a particular brand of boots, based on correlation between multiple data types such as weather data, event data, demographic data, and so forth.
  • the model may predict an increase (e.g., spike) in buzz and/or sales for a brand of boots, where the increase is correlated with an event such as a music festival in combination with high rainfall in the area of the festival, also in combination with demographic data showing a high concentration of people of an age likely to attend the festival.
  • the model may predict an increase in buzz and/or sales that lags the combination of data that is correlated with the increase.
  • the increase in buzz and/or sales of boots may begin to appear in the network influence data and/or sales numbers at least two days following the occurrence of an outdoor music festival that correlated with heavy rainfall.
  • Correlating may include identifying a relationship between changes in different types of data. Such correlation may be a positive correlation in which one type of data exhibits an increase during a time period when another type of data exhibits an increase. Correlation may be a negative correlation in which one type of data exhibits an increase during a time period when another type of data exhibits a decrease.
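  • As a non-limiting sketch of such correlating, a Pearson correlation coefficient may be computed between one data type and a time-shifted copy of another, so that a lagged relationship (e.g., buzz that follows rainfall by two periods) can be detected. The Pearson coefficient is one possible choice and is an assumption here, as are the function names and the sample values, which are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: positive when both rise together, negative when opposed."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)


def lagged_correlation(driver, response, lag):
    """Correlate driver[t] with response[t + lag]."""
    if lag > 0:
        driver, response = driver[:-lag], response[lag:]
    return pearson(driver, response)


def best_lag(driver, response, max_lag):
    """Return the lag (0..max_lag) with the strongest absolute correlation."""
    return max(
        range(max_lag + 1),
        key=lambda lag: abs(lagged_correlation(driver, response, lag)),
    )


# Invented sample data: posts about boots rise two periods after rainfall.
rainfall = [0, 0, 5, 9, 1, 0, 0, 0, 0, 0]
boot_posts = [10, 10, 10, 10, 20, 28, 12, 10, 10, 10]
print(best_lag(rainfall, boot_posts, max_lag=4))  # 2
```

  • A coefficient near +1 at a given lag suggests a positive correlation at that lag, and a coefficient near -1 suggests a negative correlation.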
  • the model may make predictions regarding particular locales (e.g., cities, counties, countries, regions, etc.) and/or predictions with a global scope. For example, implementations may enable predictions to be made on a global scale, by examining global market data and analyzing similarities and/or disparities between different areas. For example, a trend predicted or observed for the United States may be extrapolated to predict a similar trend to occur in the United Kingdom based on similar geographic and/or demographic characteristics, and/or other factors.
  • the model may enable predictions to be made regarding the level of influence of a category (e.g., brand) in one or more (e.g., social) network(s).
  • the model may also enable predictions to be made regarding the level of sales of a particular category.
  • the prediction(s) may be provided in report(s) to various report consumer(s).
  • the prediction(s) may enable a marketing team to plan a future marketing strategy to market boots targeting a region where an outdoor festival is scheduled during a rainy time of year, and the marketing campaign may be scheduled during a period of time prior to the festival and/or expected rain.
  • the campaign may be targeted to users within a demographic group (e.g., age group) that is likely to attend, or that is otherwise correlated with, the particular festival.
  • the report(s) may enable report consumer(s) to determine a type of campaign, a channel to use for the campaign (e.g., social network(s) or other channels), the demographic to target, the time frame of the campaign, the duration of the campaign, and so forth.
  • the report(s) generated based on the model may include predictions regarding future influence and/or sales.
  • the report(s) may also include current information to provide report consumer(s) with insight into the current state of influence and/or sales regarding a particular category.
  • a category may be a brand of product or service, a type of product or service, or some other topic of information. Categories may be arranged into a multi-level hierarchy with different levels of specificity. For example, a category of the brand Porsche™ may be a subcategory (e.g., more specific subcategory) of the category sports cars, which itself may be a subcategory of cars, which may be a subcategory of vehicles, which may be a subcategory of consumer goods, and so forth.
  • a particular category may have any number of parent categories (e.g., less specific categories) as well as any number of child categories (e.g., more specific categories).
  • a child category may be a more specific category than a parent category.
  • a category may also be related to other categories through other types of relationships.
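  • One possible in-memory representation of such a hierarchy, offered only as a sketch, is a graph in which each category holds references to its parent and child categories. The Category class and its methods are hypothetical names introduced for illustration.

```python
class Category:
    """A node in a category hierarchy with any number of parents and children."""

    def __init__(self, name):
        self.name = name
        self.parents = []   # less specific categories
        self.children = []  # more specific categories

    def add_child(self, child):
        """Link `child` as a more specific category and return it for chaining."""
        self.children.append(child)
        child.parents.append(self)
        return child

    def ancestors(self):
        """Return the names of all less specific categories, nearest first."""
        seen, order, frontier = set(), [], list(self.parents)
        while frontier:
            node = frontier.pop(0)
            if node.name in seen:
                continue
            seen.add(node.name)
            order.append(node.name)
            frontier.extend(node.parents)
        return order


goods = Category("consumer goods")
vehicles = goods.add_child(Category("vehicles"))
cars = vehicles.add_child(Category("cars"))
sports_cars = cars.add_child(Category("sports cars"))
porsche = sports_cars.add_child(Category("Porsche"))
print(porsche.ancestors())
# ['sports cars', 'cars', 'vehicles', 'consumer goods']
```

  • Under this representation, an item associated with a child category could also be counted toward each ancestor category, although whether to do so is a design choice that the sketch leaves open.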
  • a time prediction model may be used to predict the future propagation, buzz, and/or influence that a particular category may exhibit on a network.
  • the model may also be employed to predict particular user(s) who are likely to be influencers in the future, based on the propagation of their published items (e.g., through reposts, retweets, comments, etc.) on the network.
  • implementations provide a model that may be used to predict the future growth and/or success of a brand of product or other category.
  • the model may also be used to predict the degradation or decay in the amount of influence exhibited by a particular category in the network, e.g., as the number of published items associated with the category declines over time.
  • the predictions regarding the growth, decay, and/or lack of change in the number (or frequency) of published items over time may be used to make recommendations for how an organization may expend marketing and/or advertising efforts. For example, if the model is used to predict that the influence (e.g., popularity or buzz) around a product is likely to fall off in the future, an organization may use this prediction to target marketing efforts toward increasing the influence of the product, raising awareness of the product, evangelizing the sale or use of the product, maintaining the influence of the product, and so forth.
  • Once a trend is identified using the model, the time series data may be further analyzed to determine the audience of users who published items regarding the category and/or viewed published items regarding the category.
  • the data may also be analyzed to determine influencers, tastemakers, thought leaders, and/or other users who may be influential with regard to a particular category, because their published items tend to be republished (e.g., retweeted, reposted, etc.) by other users.
  • the data may also be analyzed to determine a geographic span of the influence of a category. For example, the analysis may indicate whether a product has particular buzz in a more specific region (e.g., in a particular city) or in a broader region (e.g., across one or more countries).
  • the model may be developed based on a time series for published items that originate with users in a particular location.
  • a location may be a geographic location of any suitable specificity, such as a street, city block, neighborhood, borough, district, city, county, province, state, prefecture, region, nation, and so forth. Accordingly, the model may be used to predict the future level of influence that may be exhibited by a category in the particular location. Models may be used to compare the changes and/or trends in influence exhibited by a brand in different locations, such as the trend in influence for a particular brand of shoe in the United States compared to the trend in influence of the brand in Japan.
  • a static approach may be employed for the predictive model.
  • the model may be developed based on a time series of historic published item counts over time.
  • the model may be used to predict, with a determined level of confidence indicated by the model, a next set of published item counts over a subsequent period of time (e.g., the next five to six weeks).
  • the model may not be updated based on incoming data that is received after the predictions are made.
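  • A minimal sketch of the static approach follows, assuming (purely for illustration) that the model is a straight-line trend fitted by ordinary least squares and that the confidence band is a rough multiple of the residual spread. The disclosure does not specify this model form; the function names, the z value, and the sample counts are hypothetical.

```python
from statistics import mean, stdev


def fit_linear_trend(counts):
    """Fit counts[t] ~ intercept + slope * t by ordinary least squares."""
    ts = range(len(counts))
    t_mean, y_mean = mean(ts), mean(counts)
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, counts)) / sum(
        (t - t_mean) ** 2 for t in ts
    )
    intercept = y_mean - slope * t_mean
    residuals = [y - (intercept + slope * t) for t, y in zip(ts, counts)]
    return slope, intercept, stdev(residuals)


def forecast(counts, horizon, z=1.96):
    """Return (point, low, high) for each of the next `horizon` periods.

    The band is a rough approximation (z times the residual spread), not a
    rigorous prediction interval. The fitted model is never updated.
    """
    slope, intercept, spread = fit_linear_trend(counts)
    n = len(counts)
    result = []
    for t in range(n, n + horizon):
        point = intercept + slope * t
        result.append((point, point - z * spread, point + z * spread))
    return result


# Invented weekly counts of published items for one category.
weekly_counts = [120, 132, 128, 141, 150, 149, 163, 170]
for point, low, high in forecast(weekly_counts, horizon=5):
    print(f"{point:.1f} (between {low:.1f} and {high:.1f})")
```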
  • an adaptive filter approach may be employed to develop and update the predictive model.
  • the model may be initially developed based on a time series of data, and the model may be dynamically updated as new data is received.
  • the model may be used to predict influence across any suitable time period in the future (e.g., five or six weeks from the time of the data used to develop the model), and the model itself may change over time as it is updated based on newly received time series data.
  • Implementations may update the model with any suitable frequency (e.g., nightly, weekly, etc.).
  • the model may be dynamically updated in response to receiving new time series data regarding the number and/or frequency of published items in one or more networks.
  • regression may be employed to update the model based on newly received time series data. Predictions may be made based on a previous model, and the predictions may be compared to newly received time series data indicating the number of published items on a network. The predicted number of published items for various time periods may be compared to the actual, measured number of published items for those time periods. Based on the actual measured counts, and/or based on the difference between predicted and actual counts, the predictive model may be updated, e.g., the coefficients of the model may be updated.
  • spike events, outliers, and/or other anomalous data may be detected for further investigation and/or filtered out of the time series data that is used to generate the model.
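  • By way of example only, anomalous data points may be flagged by comparing each prediction error against a robust estimate of the typical error. The use of the median absolute deviation, the threshold of 3.0, and the function name find_anomalies are assumptions made for this sketch.

```python
from statistics import median


def find_anomalies(actual, predicted, threshold=3.0):
    """Return indices where the prediction error is unusually large.

    An error counts as anomalous when it lies more than `threshold` median
    absolute deviations (MAD) away from the median error.
    """
    errors = [a - p for a, p in zip(actual, predicted)]
    center = median(errors)
    mad = median(abs(e - center) for e in errors)
    scale = mad or 1.0  # avoid division by zero when errors are identical
    return [
        i for i, e in enumerate(errors) if abs(e - center) / scale > threshold
    ]


predicted = [100, 100, 100, 100, 100, 100, 100]
actual = [98, 101, 100, 103, 99, 180, 102]
print(find_anomalies(actual, predicted))  # [5]
```

  • The flagged indices may then be reported for further investigation and/or excluded from the time series data before the model is generated.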
  • noise filtering may be performed on the time series to eliminate at least some of the random fluctuations between time periods that may be present, and that may cloud the underlying signal of interest. For example, such fluctuations may be high frequency noise, and any suitable low pass filtering technique may be applied.
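  • One simple low pass filtering technique, shown here as an illustrative assumption rather than as the technique of the disclosure, is exponential smoothing, in which each output blends the newest data point with the previous smoothed value.

```python
def exponential_smoothing(series, alpha=0.3):
    """Low pass filter a series; a smaller `alpha` smooths more aggressively."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


print(exponential_smoothing([10, 20, 10], alpha=0.5))  # [10, 15.0, 12.5]
```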
  • the time series data may be filtered to remove seasonal fluctuations. For example, retail brands may generate more buzz during holiday shopping seasons than during the rest of the year. As another example, local brands may have more active users during normal waking hours at that locality and outside of working hours. The time series data may be filtered or otherwise modified to account for such expected variations, prior to generating the model(s).
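  • As one hypothetical way to account for such expected variations, the average value observed at each phase of a known seasonal period (e.g., each day of the week) may be subtracted from the data points at that phase. The function name and the assumption of a fixed, known period are illustrative only.

```python
def remove_seasonality(series, period):
    """Subtract the per-phase average and restore the overall level."""
    overall = sum(series) / len(series)
    phase_means = []
    for phase in range(period):
        values = series[phase::period]  # every data point at this phase
        phase_means.append(sum(values) / len(values))
    return [
        value - phase_means[i % period] + overall
        for i, value in enumerate(series)
    ]


# A series that alternates between quiet and busy periods becomes flat.
print(remove_seasonality([10, 20, 10, 20], period=2))
# [15.0, 15.0, 15.0, 15.0]
```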
  • the predictive model may be employed to make predictions regarding future numbers of published items to be published in a network.
  • the prediction may be specific to a particular location of the users publishing the items in a network and/or specific to a particular category for the topic of the published items.
  • the prediction(s) may be provided in reports to various entities such as marketers, advertisers, retailers, product manufacturers or sellers, and so forth, who may use the prediction(s) to determine a marketing strategy to be implemented on the network(s). For example, in some instances a particular type of event (e.g., a concert in a particular location) is followed by a pattern of buzz on a network (e.g., a spike in discussion of the particular acts that performed at the concert).
  • the pattern may increase dramatically after the event, and fall off gradually as time passes and interest wanes. Based on such a predicted pattern, a marketer or other influencer may attempt to extend the buzz by implementing a particular targeted marketing campaign. Such a campaign may lead to a longer tail for the influence of a particular topic, leading to enhanced interest, more sales, and so forth.
  • Implementations may search and mine data from a network, such as published item data, in real time as the item(s) are published and/or become available.
  • real-time data extraction and analysis modules e.g., such as the data collection module(s), time series generation module(s), and/or modeling engine described below
  • adaptive filters may be employed to perform such analysis, given their suitability for noise cancellation, target data identification, and/or other aspects.
  • Implementations employ adaptive filters in conjunction with the feeding back of the model error to improve the predictive model, as described further below.
  • Adaptive filters may be applied to model the time series data that is gathered from user and concept information published on a network.
  • the adaptive filters may be applied to achieve various objectives.
  • a first objective may be to model the time lagged behavior of the network time series data. This type of modeling may serve as an explanatory model to understand the extent to which the previous data influences a current social network system.
  • a second objective may be to detect anomalous events that correspond to an increase in published items and/or discussions regarding a particular category, user, brand, concept, and/or other topic.
  • a third objective may be to make predictions based on a particular time series.
  • Adaptive filters have the capability to make such a prediction, wait for the actual occurrence of data, and update the predictive model coefficients, e.g., in real time, based on the error between the predicted value and the actual value.
  • Implementations may use various techniques for determining and/or updating the model, including but not limited to a Least Mean Squares (LMS) algorithm, a Normalized Least Mean Squares (NLMS) algorithm, a Recursive Least Squares (RLS) algorithm, and so forth. Implementations may also employ any suitable type of non-linear adaptive filter, including but not limited to kernel LMS, kernel RLS, and/or others.
  • the error in the prediction made by a previous version of the model may be fed back into itself to update the model, with the goal of reducing the occurrence of the error in the future.
  • the predictive model may be revised in real time based on differences between predicted values (e.g., published item counts) and the corresponding actual measured values.
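  • The predict, compare, and update cycle described above may be sketched with a Normalized Least Mean Squares filter, as follows. This is a minimal illustration under stated assumptions: the filter order, the step size, the function names, and the sample series are all hypothetical, and the disclosure does not prescribe these particular choices.

```python
def nlms_predict_and_update(weights, history, actual, step=0.5, eps=1e-8):
    """Perform one NLMS step.

    Predict the next count from recent counts, compare the prediction with
    the actual count, and feed the error back into the coefficients.
    """
    prediction = sum(w * x for w, x in zip(weights, history))
    error = actual - prediction
    power = sum(x * x for x in history) + eps  # normalizes the step size
    new_weights = [
        w + step * error * x / power for w, x in zip(weights, history)
    ]
    return prediction, error, new_weights


def run_nlms(series, order=3, step=0.5):
    """Run the filter over a series, updating as each actual value arrives."""
    weights = [0.0] * order
    errors = []
    for t in range(order, len(series)):
        history = series[t - order:t][::-1]  # most recent count first
        _, error, weights = nlms_predict_and_update(
            weights, history, series[t], step
        )
        errors.append(error)
    return weights, errors


# Invented example: a steady series of 100 published items per period.
weights, errors = run_nlms([100.0] * 40)
print(abs(errors[0]), abs(errors[-1]))  # the error shrinks as the filter adapts
```

  • Because the coefficients are adjusted after every observation, the same loop may run continuously as new time series data is received.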
  • multiple models may be employed to make predictions based on the same time series data, and the different models may be trained or otherwise developed using different techniques. Accordingly, the different models may output somewhat different results, even based on the same or similar input data. For example, three different neural network-based models may be developed to make predictions regarding the time series data for published items in network(s). In some implementations, the output predictions of the multiple models may be averaged, differently weighted, and/or otherwise combined to determine an overall result. The weighting of the various models relative to one another may be adjusted based on regression techniques to refine the overall combined model based on new data.
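  • A weighted combination of model outputs may be sketched as shown below. For simplicity, the sketch adjusts the weights with an inverse-error heuristic, which is a stand-in chosen for brevity and is not the regression-based adjustment described above. All names and values are hypothetical.

```python
def combine_predictions(predictions, weights):
    """Return the weighted average of the individual model predictions."""
    total = sum(weights)
    return sum(p * w for p, w in zip(predictions, weights)) / total


def reweight_by_error(predictions, actual, smoothing=1.0):
    """Give more weight to models whose predictions were closer to `actual`.

    Inverse-error heuristic (an assumption of this sketch). `smoothing`
    keeps a perfectly accurate model from receiving all of the weight.
    """
    scores = [1.0 / (abs(actual - p) + smoothing) for p in predictions]
    total = sum(scores)
    return [s / total for s in scores]


model_outputs = [90.0, 110.0, 100.0]  # invented outputs of three models
print(combine_predictions(model_outputs, [1.0, 1.0, 1.0]))  # 100.0

# After the actual count (100) arrives, the third model gains weight.
new_weights = reweight_by_error(model_outputs, actual=100.0)
```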
  • FIG. 1 depicts an example system for predicting published items in a network, according to implementations of the present disclosure.
  • the environment may include one or more networks 102.
  • the network may include any number of nodes 104 that are able to communicate with one another through the network 102.
  • a node 104 may be a user of the network 102.
  • a network 102 may include any type of network in which user(s) may publish item(s) to be viewed by other user(s).
  • the published item(s) may be republished by the user(s) on the network, and/or published to other network(s).
  • a network 102 may be a social network in which users communicate with other users via published items.
  • a network 102 may include users who have registered with the network 102, such that the users have accounts, profiles, or other forms of presence in the network 102. Examples of a network 102 may include Facebook™, Twitter™, Instagram™, Pinterest™, Weibo™, WeChat™, or others.
  • a network 102 may be public, such that any user may be allowed to publish, view, and republish items.
  • a network 102 may be, to some extent, private, such that a subset of the general public is allowed to publish, view, and republish items.
  • a user may publish item(s) 106 that may be viewable and/or republishable by other user(s) in the same network 102 and/or other network(s).
  • a network 102 may employ any suitable format or arrangement of data for published items 106, and published items 106 may be communicated within the network 102 using any suitable communication protocol.
  • a published item may include one or more types of data, including but not limited to text data, graphics, images, videos, audio data, and so forth.
  • the publishing user may be associated with a set of followers, e.g., other user(s) in the network 102.
  • a follower of a publishing user may include a user who has indicated a desire to view published item(s) 106 of the publishing user 104.
  • a follower may edit their user profile or account information to follow the publishing user, and subsequently the follower may receive notifications indicating when the publishing user publishes an item 106.
  • a follower may be variously described in different social networks as a follower, a friend, a contact, a link, a fan, and so forth.
  • the followers of the publishing user 104 may also republish the original published item(s) 106 of the publishing user.
  • Republication may include, but is not limited to, sharing, reposting, retweeting, or commenting on the published item 106, such that the published item 106 may then be viewed by other users.
  • Republication may include republication of the published item 106 in its entirety, or republication of any portion of the published item 106 (e.g., as an excerpt).
  • a follower of the publishing user may republish an item 106 such that the item 106 is viewable by other users who are followers of the republishing user.
  • Any number of those followers may then republish the item 106 to be viewable by other users, who may themselves republish the item 106, and so on to any number of republication levels.
  • a published item 106 may propagate through a network 102.
  • Each set of republications by one or more republishing users may be described as a ripple of the published item 106 as it propagates within the network 102.
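  • The ripple structure described above can be illustrated with a minimal sketch. It assumes the follower relationships are available as a simple dictionary and that the set of republishing users is known; the function name `ripple_levels` and the sample user names are hypothetical and do not come from the disclosure.

```python
from collections import deque


def ripple_levels(followers, origin, republishers):
    """Assign each republishing user a ripple level via breadth-first search.

    followers: dict mapping a user to the set of users who follow them.
    origin: the user who first published the item (level 0).
    republishers: set of users who republished the item.
    """
    levels = {origin: 0}
    queue = deque([origin])
    while queue:
        user = queue.popleft()
        for follower in followers.get(user, ()):
            # Only followers who actually republished carry the item further.
            if follower in republishers and follower not in levels:
                levels[follower] = levels[user] + 1
                queue.append(follower)
    return levels


# Hypothetical follower graph: "ann" publishes, some followers republish.
followers = {
    "ann": {"bob", "cat"},
    "bob": {"dan"},
    "cat": {"dan", "eve"},
    "dan": {"fay"},
}
levels = ripple_levels(followers, "ann", {"bob", "dan", "fay"})
```

  • In this sketch each level corresponds to one ripple: users at level 1 republished the original item, users at level 2 republished a republication, and so on.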
  • Although examples herein may describe users viewing an item that is published in a network 102, implementations are not limited to item(s) 106 that are visually presented to users.
  • An item 106 may also be presented, at least in part, as audio data, haptic data (e.g., vibrations or other movements of a computing device), or via other modes of presentation.
  • the environment may include one or more analysis computing devices 110, which may include any suitable number and type of computing device.
  • the analysis computing device(s) 110 may be described as a platform for measuring influence in the network(s) 102, e.g., in the form of time series data, and for making predictions regarding influence in the network(s) 102.
  • the analysis computing device(s) 110 may execute any suitable number of software module(s), which may be described as an engine for making predictions.
  • the analysis computing device(s) 110 may execute one or more data collection module(s) 108 which collect information regarding one or more network(s) 102.
  • the data collection module(s) 108 may retrieve and store one or more published item(s) 106 published on the network(s) 102.
  • the data collection module(s) 108 may also retrieve metadata describing the published item(s) 106, including but not limited to a timestamp (e.g., date and/or time) of publication, the publishing user, a subject line, title, or summary of the item 106 as published, a category of the item 106, and/or other metadata such as tags, hashtags, and so forth.
  • the data collection module(s) 108 may also retrieve and store other information available in the network(s) 102, such as demographic information regarding the users of the network(s) 102, including the user(s) who publish item(s) 106.
  • Demographic information may include various user characteristics, including but not limited to one or more of the following: user location (e.g., to any degree of specificity), age, gender, ethnic identification, spoken language(s), profession, hobbies, interests, income level, purchase history, group affiliation(s), education level, or other characteristics.
  • the published item(s) 106 and/or other data regarding the network(s) 102 may be accessed and analyzed by one or more time series generation modules 112 executing on the analysis computing device(s) 110.
  • the time series generation module(s) 112 may analyze the published items 106 and generate time series data 114.
  • the time series data 114 may include a series of data points or data elements which each include a date and/or time indicator and a number of published items associated with that date and/or time indicator.
  • a time series data point may indicate a number of items 106 published during a particular time period, such as over the course of a particular hour or day.
  • the time series data may be specific to a particular category, such as a particular product.
  • the time series data may track the level of influence exhibited by a particular product or other topic within a network over time.
  • the time series data may be specific to a particular location, such as a location of the users who publish items regarding a particular product. Accordingly, the time series data may track the level of influence exhibited by a particular product within the particular location, within a network over time. Time series data may describe a number of published items generally, or the number of published items for a particular category, location, and/or other user characteristic.
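  • As a minimal sketch of how such time series data might be assembled, the following counts published items per calendar day, optionally restricted to a category and/or a location. The item record layout (`timestamp`, `categories`, `location`) and the function name `daily_counts` are assumptions made for illustration; the disclosure does not prescribe a data format.

```python
from collections import Counter
from datetime import datetime


def daily_counts(items, category=None, location=None):
    """Return sorted (date, count) pairs of items published per day.

    Items are skipped when they fall outside the requested category
    or location; passing None disables that filter.
    """
    counts = Counter()
    for item in items:
        if category is not None and category not in item["categories"]:
            continue
        if location is not None and item["location"] != location:
            continue
        counts[item["timestamp"].date()] += 1
    return sorted(counts.items())


# Hypothetical published items with publication metadata.
items = [
    {"timestamp": datetime(2017, 1, 1, 9), "categories": {"rings"}, "location": "Austin"},
    {"timestamp": datetime(2017, 1, 1, 17), "categories": {"rings", "jewelry"}, "location": "Austin"},
    {"timestamp": datetime(2017, 1, 2, 8), "categories": {"jewelry"}, "location": "Austin"},
    {"timestamp": datetime(2017, 1, 2, 12), "categories": {"rings"}, "location": "Dallas"},
]

all_items_series = daily_counts(items)
rings_in_austin = daily_counts(items, category="rings", location="Austin")
```

  • The same grouping could use hourly or weekly buckets by replacing the `.date()` key with a different truncation of the timestamp.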
  • the time series data 114 may be received by a modeling engine 116 executing on the analysis device(s) 110.
  • the modeling engine 116 may generate a predictive model 118 based on the time series data 114.
  • the modeling engine 116 may also access data from one or more other data sources 124 such as those described above.
  • the data from the other data source(s) 124 may be employed (e.g., correlated with the time series data 114) to generate and/or modify the predictive model 118.
  • the other data source(s) 124 may be present on the analysis device(s) 110, as shown in the example of FIG. 1.
  • one or more other data sources 124 may be external to the analysis device(s) 110, and the data from the other data source(s) 124 may be provided to the modeling engine 116 over one or more networks.
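  • One simple way to check whether data from another source tracks the item counts is a Pearson correlation coefficient, sketched below. This is only an illustration of "correlating" two series; the disclosure does not state which correlation measure is used, and the temperature values are made-up sample data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical daily item counts and a weather series for the same days.
item_counts = [10, 12, 14, 16]
temperature = [20.0, 21.0, 22.0, 23.0]
r = pearson(item_counts, temperature)
```

  • A coefficient near 1 or -1 would suggest that the external series is worth including as a model input, while a value near 0 would suggest little linear relationship.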
  • the predictive model 118 may be employed to predict the number of published items that may be published, e.g., for a particular category, location, and/or other demographic characteristic, during one or more time periods in the future. Such predictions may be included in report(s) 120 that are generated on the analysis device(s) 110 and provided to one or more report consumers 122.
  • the report consumer(s) 122 may include such entities as marketers, advertisers, brand managers, and so forth.
  • the report(s) 120 may be used by such entities to make decisions regarding brands, products, services, campaigns, and so forth.
  • Employing the predictions made based on time series data may enable marketers, advertisers, or others to create targeted, category-specific campaigns that are more effective at spreading information than traditional campaigns which may indiscriminately broadcast information within a network 102, leading to potential higher return on investment for marketing or advertising expenditures.
  • the model 118 may be generated and/or refined through the use of one or more machine learning (ML) techniques. Implementations may employ any suitable ML techniques, such as supervised and/or unsupervised techniques.
  • the model 118 may be trained or otherwise developed using time series data 114 that is measured within a network 102. The model 118 may be used to make predictions regarding future numbers of items published in the network 102, and those predictions may be compared to the actual numbers of published items over a corresponding time period. The result of the comparison may be employed to further train and/or update the model 118 to more accurately reflect the actual results. For example, if an initial set of predictions varies from the actual number of published items, the actual number, and/or the difference between the predictions and the actual number, may be employed as training data to further refine the model 118.
  • the training techniques used to develop and/or update the model(s) may be dependent on which particular signal is being investigated.
  • Developing, updating, and/or using the model(s) may include one or more of the following operations: 1) Collecting the data and formatting and/or conditioning the data for analysis; 2) Plotting and identifying signal characteristics; 3) Proposing a (e.g., broad level) model and attempting to fit the data to it; 4) Using the model along with original data to find residuals (e.g., errors of the model); 5) Analyzing the residuals to determine if the model is to be refined, and if so proposing a refined model. If the model is not to be refined, the current model may be used until a determination is made that refinement is needed.
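  • Operations 3) through 5) above can be sketched as follows, assuming the proposed broad-level model is a straight-line trend fitted by least squares. The linear form, the refinement threshold of 2.0, and the function names are illustrative assumptions rather than the method of the disclosure.

```python
def fit_line(ys):
    """Least-squares fit of y = intercept + slope * t for t = 0, 1, 2, ..."""
    n = len(ys)
    ts = range(n)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    slope = (
        sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
        / sum((t - mean_t) ** 2 for t in ts)
    )
    return mean_y - slope * mean_t, slope


def residuals(ys, intercept, slope):
    """Errors of the fitted model: observed value minus fitted value."""
    return [y - (intercept + slope * t) for t, y in enumerate(ys)]


# Hypothetical item counts for five consecutive time periods.
counts = [10, 13, 14, 17, 18]
intercept, slope = fit_line(counts)
errors = residuals(counts, intercept, slope)

# Assumed rule: refine the model if any residual exceeds a chosen threshold.
needs_refinement = max(abs(e) for e in errors) > 2.0
```

  • If the residuals showed structure, such as a weekly cycle, a refined model with a seasonal term could be proposed and the loop repeated.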
  • FIG. 2 depicts example time series data 114 used to generate and/or update a predictive model 118, according to implementations of the present disclosure.
  • a first set of time series data 114(1) may be employed to determine a first version of a model 118(1).
  • the model 118(1) may reflect the patterns, trends, and/or other characteristics of the time series data 114(1).
  • a time series may include any suitable number of ordered pairs, in which each ordered pair includes a date and/or time indicator and an associated value.
  • An ordered pair may associate a particular date/time range (e.g., from a first date/time to a second date/time) with a value that is the number of items published on a network during that range.
  • a time series may include a series of data points each indicating a number of published items for a particular category during a particular time period (e.g., a day, an hour, a 12-hour period, a week, etc.).
  • the first version of the model 118(1) may be used to generate one or more predictions 202.
  • a prediction 202 may be a prediction that a particular number of items (e.g., for a particular category, location, and/or other characteristic(s)), or range of number of items, will be published during a future time period.
  • the prediction(s) 202 may include predictions regarding any suitable number of future time periods.
  • the model 118(1) may be used to predict that 100 items will be published on network ABC during a future time period that is subsequent to the time period(s) of the time series data 114(1) used to generate and/or update the model 118(1).
  • actual measured item count data may be collected for those time periods for which the predictions 202 were made.
  • the actual data is included in time series data 114(2).
  • the time series data 114(2) may be used, along with the prediction(s) 202, to update the model 118(1) and determine an updated model 118(2).
  • the updated model 118(2) may then be used to make subsequent prediction(s) 202.
  • the error in the prediction(s) 202 (e.g., as indicated by the actual data) may be used to refine the model 118(2) and reduce the incidence of such error in future prediction(s).
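  • The predict, measure, and update cycle of FIG. 2 can be illustrated with simple exponential smoothing, in which each observed prediction error nudges the model state. The class `SmoothedCountModel` and the smoothing factor `alpha` are hypothetical stand-ins; the disclosure does not specify this particular technique.

```python
class SmoothedCountModel:
    """Simple exponential smoothing over item counts (illustrative only)."""

    def __init__(self, initial_counts, alpha=0.5):
        self.alpha = alpha
        self.level = float(initial_counts[0])
        for count in initial_counts[1:]:
            self.update(count)

    def predict(self):
        """Predicted item count for the next time period."""
        return self.level

    def update(self, actual):
        """Fold the prediction error for one period back into the model."""
        error = actual - self.predict()
        self.level += self.alpha * error
        return error


# First version of the model, built from an initial series of counts.
model = SmoothedCountModel([100, 100, 100])
prediction = model.predict()

# Actual count arrives for the predicted period; the error refines the model.
error = model.update(120)
updated_prediction = model.predict()
```

  • Here the first prediction is 100 items, the actual count is 120, and the updated model moves its next prediction halfway toward the observed value because `alpha` is 0.5.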
  • data from the other data source(s) 124 may be used to generate the model 118(1). Such data may be used in addition to the time series data 114(1) to generate the model 118(1).
  • the data from the other data source(s) 124 may also be used to update and/or refine the model 118(1) to generate the model 118(2).
  • a newer version of dynamic and/or frequently changing data from the other data source(s) 124 such as weather data, news data, real time financial information, and so forth, may be used to modify the model 118 in real time with respect to receiving the updated data from the other data source(s) 124.
  • FIG. 3 depicts a flow diagram of an example process for predicting published items 106 in a network 102 based on a model 118 of time series data, according to implementations of the present disclosure. Operations of the process may be performed by one or more of the data collection module(s) 108, the time series generation module(s) 112, the modeling engine 116, the model 118, and/or other software module(s) executing on the analysis device(s) 110 or elsewhere.
  • Network data may be received (302) indicating the item(s) 106 published on a network 102 during one or more time periods.
  • Published item(s) 106 may include, but are not limited to, tweets, social network posts, comments on posts or other published items, retweets, reposts, articles, and so forth.
  • At least one category may be determined (304) that is associated with the item(s) 106.
  • the item 106 may be analyzed and term(s) present in the item 106 may be compared to a list of terms corresponding to a category, for each of one or more categories.
  • a term may include any amount of data.
  • a term may be a single word or sequence of characters.
  • a term may also include multiple words, such as a phrase or multi-word term.
  • the data in an item 106 may be preprocessed to determine the terms that are present in the item 106.
  • the item 106 may be parsed based on separator characters such as white space (e.g., spaces, new lines, carriage returns), punctuation characters, or other separators.
  • the item 106 may be processed using speech-to-text (STT) conversion method(s) to generate text data based on audio input data, prior to calculating the similarity.
  • a determination may be made of a degree of similarity between the terms in an item and the list of terms corresponding to a category. If a calculated similarity meets or exceeds a predetermined threshold level of similarity, the item 106 may be associated with the category.
  • This process may be repeated for a particular item 106 with respect to any number of categories, and the process may be repeated for any number of items 106.
  • a particular item 106 may be associated with any number of categories. For example, an item 106 may be associated with a category of "restaurants" as well as a more specific category of "Japanese restaurants” and/or “teriyaki restaurants.”
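  • A minimal sketch of the parsing and threshold comparison described above follows. It assumes single-word terms, an exact-match overlap score (the fraction of a category's terms found in the item), and a threshold of 0.5; all names, term lists, and the threshold are hypothetical, and multi-word terms would need additional handling.

```python
import re


def tokenize(text):
    """Split text on whitespace and punctuation into lowercase terms."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def category_similarity(item_terms, category_terms):
    """Fraction of the category's terms that appear in the item."""
    return len(item_terms & category_terms) / len(category_terms)


def assign_categories(text, category_lists, threshold=0.5):
    """Return every category whose similarity meets or exceeds the threshold."""
    terms = tokenize(text)
    return sorted(
        name
        for name, cat_terms in category_lists.items()
        if category_similarity(terms, cat_terms) >= threshold
    )


# Hypothetical curated term lists, one per category.
category_lists = {
    "restaurants": {"restaurant", "dinner", "menu"},
    "Japanese restaurants": {"sushi", "teriyaki", "ramen", "restaurant"},
    "jewelry": {"ring", "necklace"},
}
assigned = assign_categories(
    "Great teriyaki dinner at the new restaurant, loved the menu!",
    category_lists,
)
```

  • The sample item is associated with both the general and the more specific restaurant category, and with no jewelry category, which mirrors the multi-category association described above.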
  • the analysis may determine one or more categories and/or keywords for the item 106.
  • the analysis may compare words or multi-word terms in the item 106 to a list of terms that are known to relate to a category, such as a library of terms that have been manually curated for each category.
  • the use of the word “Tiffany” in the published item 106 may lead to a determination that the item 106 is in the categories “jewelry” and “Tiffany brand jewelry.” Further use of the words “engagement” and “ring” in the published item 106 may indicate other categories of "ring” and "engagement ring.”
  • the platform determines a probability that the published item 106 corresponds to a category based on a correspondence (e.g., a statistical similarity measure) between terms in the published item 106 and terms known to correspond to the category.
  • the analysis identifies an exact match between terms in the item 106 and terms in the category-specific list to determine similarity.
  • the analysis may employ semantic analysis based on natural language processing (NLP) or other methods to calculate a similarity based on a semantic closeness between the terms of the item 106 and the category-specific list of terms.
  • an emoji-based closeness measure may be employed. For example, the use of emojis in published items, their frequency of use, their order of use, and/or other considerations may be employed to determine categorization of content (e.g., demographics-based, sentiment-based, and so forth).
  • a published item 106 may be designated as being within a category if the calculated similarity exceeds a threshold value.
  • a particular published item 106 may be associated with a probability matrix indicating the probabilities that the item 106 corresponds to various categories.
  • Time series data may be determined (306).
  • the time series data may indicate, for each of a plurality of time periods, a number of items that were published on a network during the respective time period.
  • the time series data may be for a particular category of published items.
  • the time series data may indicate, for each of the plurality of time periods, a number of items that are associated with a particular category and that were published on a network during the respective time period.
  • the time series data may be for a location.
  • the time series data may indicate, for each of the plurality of time periods, a number of items that were published by users in a particular location (e.g., city, state, county, country, etc.) and that were published on a network during the respective time period.
  • the time series data may also be associated with other demographic characteristics of the users, such as gender, age range, and so forth.
  • the time series data may indicate, for each of the plurality of time periods, a number of items that were published by users with a particular demographic characteristic and that were published on a network during the respective time period.
  • a model 118 may be determined (308) based on the time series data determined at 306. In instances where the time series data is particular to a category, location, and/or other demographic characteristic, the determined model 118 may also be particular to the corresponding category, location, and/or other demographic characteristic. For example, the model 118 may be used to predict the number of future published items in the particular category, by users in the particular location, and/or by users exhibiting the particular demographic characteristic. As described above, the model 118 may be determined using suitable ML techniques. In some implementations, data from one or more other data sources 124 may be used to determine the model 118.
  • the model 118 may be stored and employed (310) to predict the number of items to be published on the network during at least one subsequent time period, e.g., subsequent to the time periods of the time series data used to determine and/or update the model 118.
  • the model 118 may also be used to predict sales of a particular category, such as a particular product brand.
  • the predictions made using the model 118 may be incorporated into report(s) 120, which may be provided (312) to one or more report consumers 122 as described above.
  • FIG. 4 depicts a flow diagram of an example process for updating a predictive model 118 based on time series data, according to implementations of the present disclosure. Operations of the process may be performed by one or more of the data collection module(s) 108, the time series generation module(s) 112, the modeling engine 116, the model 118, and/or other software module(s) executing on the analysis device(s) 110 or elsewhere.
  • Network data may be received (402) indicating items 106 published on a network 102.
  • a first set of time series data may be determined (404) indicating the number of items published during various time periods in a first time span.
  • a first version of a model may be determined (406) based on the first time series data and/or data from the other data source(s) 124.
  • the first model may be employed (408) to predict the number of items to be published on the network during at least one subsequent time period (e.g., subsequent to the first time span). Operations of 402, 404, 406, and 408 may proceed as described above.
  • updated network data may be received (410) indicating items published on the network during a second time span that includes the at least one subsequent time period for which predictions were made.
  • a second set of time series data may be determined (412) indicating the number of items published during various time periods in the second time span.
  • the first model may be updated (414) to generate a second (e.g., updated, revised) model.
  • the updating may be based on comparing the predicted numbers of items to the second time series data (e.g., the actual numbers of items published in the network) as described above.
  • the second model may be generated based at least partly on (e.g., recent) data from the other data source(s) 124.
  • the second version of the model may be stored and used to make subsequent predictions that may be more accurate than the predictions made using the first version of the model.
  • the updated version of the model may be determined based on both the first set of time series data and the second set of time series data. For example, the entire set of determined time series data may be used to regenerate the second version of the model as if the model were being initially determined, e.g., without regard to the previous version of the model 118.
  • the updated version of the model may be determined based on the difference(s) (e.g., a delta) between the second set of time series data and the predicted number(s) of items made using the first version of the model, e.g., in a regression paradigm as described above.
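  • The two update approaches can be contrasted with a deliberately simple toy model that predicts the historical mean count. Both the toy model and the way the delta is applied (adding the mean prediction error as a correction) are assumptions chosen to keep the example short; they are not the technique of the disclosure.

```python
def mean_model(counts):
    """Toy model: predict the historical mean for every future period."""
    return sum(counts) / len(counts)


# Hypothetical first and second sets of time series data (item counts).
first_series = [100, 110, 90, 100]
second_series = [120, 130]

# First version of the model and its predictions for the second time span.
first_model = mean_model(first_series)
predictions = [first_model] * len(second_series)

# Approach A: regenerate the model from the entire set of time series data,
# without regard to the previous version.
refit_model = mean_model(first_series + second_series)

# Approach B: adjust the first version using the deltas between the actual
# counts and the predictions.
deltas = [actual - pred for actual, pred in zip(second_series, predictions)]
delta_model = first_model + sum(deltas) / len(deltas)
```

  • In this example the full regeneration weighs all six periods equally, while the delta-based correction moves the model entirely to the level of the most recent span, so the two approaches can yield noticeably different second versions.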
  • time series data is determined based on the number of published items during various time periods. Implementations also support the use of other time series data developed using other measurements. For example, time series data may be determined based on the number of follower counts over time, number of likes and/or shares of items over time, and so forth. Implementations may apply the analysis described herein to generate predictions based on these other types of time series data, and/or any other suitable type of time series data.
  • FIG. 5 depicts a schematic of an example of a report 120 that includes predicted published item data, according to implementations of the present disclosure.
  • the predictions 202 are made regarding the number of published items that may be published during subsequent time periods, and that are associated with a particular category (e.g., "Epiffany Wedding Rings"), and that are published by users in a particular location (e.g., the Shanghai region of China).
  • the example category is a subcategory within a hierarchy of categories, e.g., a particular brand of wedding rings, which is a subcategory of rings, which is a subcategory of jewelry.
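  • Such a hierarchy of categories could be represented as a child-to-parent mapping, as in the following sketch. The mapping `PARENT` and the function `category_path` are hypothetical; only the example category names come from the description above.

```python
# Hypothetical child -> parent mapping for the example hierarchy.
PARENT = {
    "Epiffany Wedding Rings": "rings",
    "rings": "jewelry",
}


def category_path(category, parent=PARENT):
    """Return the category followed by each of its ancestors, most specific first."""
    path = [category]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


path = category_path("Epiffany Wedding Rings")
```

  • With such a path available, an item counted toward a subcategory could also be counted toward each ancestor category when building broader time series.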
  • Although the time series analysis descriptions herein may describe the use of time series data based on published item counts over time, implementations are not limited to this particular type of time series data. In some implementations, multiple sets of time series data, of differing types, may be used to develop models and make predictions. Such time series data may also include data not directly related to social network data. For example, time series data such as current news events, customer marketing campaigns, weather data, and so forth may be input to a model to predict a level of buzz for a brand or product.
  • implementations are not limited to using the example categorization methodology described herein for grouping similar text and/or audio data. Some implementations may also employ emoji-based similarity determination as part of the analysis, as described above. In some implementations, image and/or video data analysis may also be used for categorization and/or other aspects.
  • FIG. 6 depicts an example computing system, according to implementations of the present disclosure.
  • the system 600 may be used for any of the operations described with respect to the various implementations discussed herein.
  • the system 600 may be included, at least in part, in the analysis computing device(s) 110 described herein, or in computing device(s) operated by one or more of the user(s) 104 or the report consumer(s) 122.
  • the system 600 may include one or more processors 610, a memory 620, one or more storage devices 630, and one or more input/output (I/O) devices 650 controllable via one or more I/O interfaces 640.
  • the various components 610, 620, 630, 640, or 650 may be interconnected via at least one system bus 660, which may enable the transfer of data between the various modules and components of the system 600.
  • the processor(s) 610 may be configured to process instructions for execution within the system 600.
  • the processor(s) 610 may include single-threaded processor(s), multi-threaded processor(s), or both.
  • the processor(s) 610 may be configured to process instructions stored in the memory 620 or on the storage device(s) 630.
  • the processor(s) 610 may execute instructions for the various software module(s) described herein.
  • the processor(s) 610 may include hardware-based processor(s) each including one or more cores.
  • the processor(s) 610 may include general purpose processor(s), special purpose processor(s), or both.
  • the memory 620 may store information within the system 600. In some implementations, the memory 620 includes one or more computer-readable media.
  • the memory 620 may include any number of volatile memory units, any number of non-volatile memory units, or both volatile and non-volatile memory units.
  • the memory 620 may include read-only memory, random access memory, or both. In some examples, the memory 620 may be employed as active or physical memory by one or more executing software modules.
  • the storage device(s) 630 may be configured to provide (e.g., persistent) mass storage for the system 600.
  • the storage device(s) 630 may include one or more computer-readable media.
  • the storage device(s) 630 may include a floppy disk device, a hard disk device, an optical disk device, or a tape device.
  • the storage device(s) 630 may include read-only memory, random access memory, or both.
  • the storage device(s) 630 may include one or more of an internal hard drive, an external hard drive, or a removable drive.
  • One or both of the memory 620 or the storage device(s) 630 may include one or more computer-readable storage media (CRSM).
  • the CRSM may include one or more of an electronic storage medium, a magnetic storage medium, an optical storage medium, a magneto-optical storage medium, a quantum storage medium, a mechanical computer storage medium, and so forth.
  • the CRSM may provide storage of computer-readable instructions describing data structures, processes, applications, programs, other modules, or other data for the operation of the system 600.
  • the CRSM may include a data store that provides storage of computer-readable instructions or other information in a non-transitory format.
  • the CRSM may be incorporated into the system 600 or may be external with respect to the system 600.
  • the CRSM may include read-only memory, random access memory, or both.
  • One or more CRSM suitable for tangibly embodying computer program instructions and data may include any type of non-volatile memory, including but not limited to: semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
  • the processor(s) 610 and the memory 620 may be supplemented by, or incorporated into, one or more application-specific integrated circuits (ASICs).
  • the system 600 may include one or more I/O devices 650.
  • the I/O device(s) 650 may include one or more input devices such as a keyboard, a mouse, a pen, a game controller, a touch input device, an audio input device (e.g., a microphone), a gestural input device, a haptic input device, an image or video capture device (e.g., a camera), or other devices.
  • the I/O device(s) 650 may also include one or more output devices such as a display, LED(s), an audio output device (e.g., a speaker), a printer, a haptic output device, and so forth.
  • the I/O device(s) 650 may be physically incorporated in one or more computing devices of the system 600, or may be external with respect to one or more computing devices of the system 600.
  • the system 600 may include one or more I/O interfaces 640 to enable components or modules of the system 600 to control, interface with, or otherwise communicate with the I/O device(s) 650.
  • the I/O interface(s) 640 may enable information to be transferred in or out of the system 600, or between components of the system 600, through serial communication, parallel communication, or other types of communication.
  • the I/O interface(s) 640 may comply with a version of the RS-232 standard for serial ports, or with a version of the IEEE 1284 standard for parallel ports.
  • the I/O interface(s) 640 may be configured to provide a connection over Universal Serial Bus (USB) or Ethernet.
  • the I/O interface(s) 640 may be configured to provide a serial connection that is compliant with a version of the IEEE 1394 standard.
  • the I/O interface(s) 640 may also include one or more network interfaces that enable communications between computing devices in the system 600, or between the system 600 and other network-connected computing systems.
  • the network interface(s) may include one or more network interface controllers (NICs) or other types of transceiver devices configured to send and receive communications over one or more communication networks using any network protocol.
  • Computing devices of the system 600 may communicate with one another, or with other computing devices, using one or more communication networks.
  • Such communication networks may include public networks such as the internet, private networks such as an institutional or personal intranet, or any combination of private and public networks.
  • the communication networks may include any type of wired or wireless network, including but not limited to local area networks (LANs), wide area networks (WANs), wireless WANs (WWANs), wireless LANs (WLANs), mobile communications networks (e.g., 3G, 4G, Edge, etc.), and so forth.
  • the communications between computing devices may be encrypted or otherwise secured.
  • communications may employ one or more public or private cryptographic keys, ciphers, digital certificates, or other credentials supported by a security protocol, such as any version of the Secure Sockets Layer (SSL) or
  • the system 600 may include any number of computing devices of any type.
  • the computing device(s) may include, but are not limited to: a personal computer, a smartphone, a tablet computer, a wearable computer, an implanted computer, a mobile gaming device, an electronic book reader, an automotive computer, a desktop computer, a laptop computer, a notebook computer, a game console, a home entertainment device, a network computer, a server computer, a mainframe computer, a distributed computing device (e.g., a cloud computing device), a microcomputer, a system on a chip (SoC), a system in a package (SiP), and so forth.
  • a computing device may include one or more of a virtual computing environment, a hypervisor, an emulation, or a virtual machine executing on one or more physical computing devices.
  • two or more computing devices may include a cluster, cloud, farm, or other grouping of multiple devices that coordinate operations to provide load balancing, failover support, parallel processing capabilities, shared storage resources, shared networking capabilities, or other aspects.
  • Implementations and all of the functional operations described in this specification may be realized in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations may be realized as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, data processing apparatus.
  • the computer readable medium may be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them.
  • the term "computing system” encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.
  • the apparatus may include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
  • a propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus.
  • a computer program (also known as a program, software, software application, script, or code) may be written in any appropriate form of programming language, including compiled or interpreted languages, and it may be deployed in any appropriate form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • a computer program does not necessarily correspond to a file in a file system.
  • a program may be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code).
  • a computer program may be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
  • the processes and logic flows described in this specification may be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output.
  • the processes and logic flows may also be performed by, and apparatus may also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
  • processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any appropriate kind of digital computer.
  • a processor may receive instructions and data from a read only memory or a random access memory or both.
  • Elements of a computer can include a processor for performing instructions and one or more memory devices for storing instructions and data.
  • a computer may also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks.
  • a computer need not have such devices.
  • a computer may be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio player, a Global Positioning System (GPS) receiver, to name just a few.
  • Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks.
  • the processor and the memory may be supplemented by, or incorporated in, special purpose logic circuitry.
  • implementations may be realized on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user may provide input to the computer.
  • Other kinds of devices may be used to provide for interaction with a user as well; for example, feedback provided to the user may be any appropriate form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any appropriate form, including acoustic, speech, or tactile input.
  • Implementations may be realized in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a web browser through which a user may interact with an implementation, or any appropriate combination of one or more such back end, middleware, or front end components.
  • the components of the system may be interconnected by any appropriate form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.
  • the computing system may include clients and servers.
  • a client and server are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Abstract

Techniques are described for analyzing time series data associated with one or more networks, such as social networks, and based on the analysis determining one or more predictive models. The time series data describe a number and/or frequency of published items, associated with a category, on the network(s), and the predictive model(s) may be employed to predict the future number and/or frequency of published items associated with the category. The time series data may also describe a number and/or frequency of followers of a user in the network(s), republications of published items, and/or other types of data, and model(s) may be developed that predict the future changes (e.g., trends) of such metrics in the network(s). The model(s) for a particular category may be further based on additional sources of data, including but not limited to financial data, weather data, environmental data, event data, news data, demographic data, and so forth.

Description

MULTI-SOURCE MODELING FOR NETWORK PREDICTIONS
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present disclosure is related to, and claims priority to, U.S. Provisional Patent Application Serial No. 62/436,776, titled "Multi-Source Modeling For Network Predictions," which was filed on December 20, 2016, the entirety of which is incorporated by reference into the present disclosure.
BACKGROUND
[0002] As the amount of information published on networks has increased, organizations have developed various channels that attempt to use information published online to promote brands or other topics. Traditional marketing or advertising techniques have employed a generally unfocused approach in which information is indiscriminately targeted at a large population of individuals. Given their unfocused nature, such efforts may fail to effectively promote a topic (e.g., brand) or reach new audiences, leading to a diminished return on investment in marketing or advertising campaigns.
SUMMARY
[0003] Implementations of the present disclosure are generally directed to using time series modeling for generating predictions regarding network(s), such as social network(s). More particularly, implementations of the present disclosure are directed to determining a predictive model using time series data regarding network(s) and/or other sources of data, and using the model to predict future trends regarding publications or other aspects of the network(s).
[0004] In general, implementations of innovative aspects of the subject matter described in this specification can be embodied in methods that include operations of: receiving a first set of data points each indicating a number of items published on a network during a respective time period, wherein the items are associated with a category; determining a model based on the first set of data points and further based on data from one or more data sources, wherein the model describes a time series of the first set of data points; employing the model to determine a predicted number of items, associated with the category, that are published on the network during at least one subsequent time period; and based on the predicted number of items, subsequently publishing within the network information that is associated with the category and that targets a set of users to receive the subsequently published information within the network.
[0005] These and other implementations can each optionally include one or more of the following innovative aspects: the data from the one or more data sources includes one or more of financial data, weather data, environmental data, event data, news data, or demographic data; the data from the one or more data sources includes dynamic data; the data is received from at least two data sources; determining the model includes correlating the data received from the at least two data sources; the operations further include receiving a second set of data points each indicating the number of items, associated with the category, that are published on the network during the at least one subsequent time period; the operations further include determining an updated version of the model based on the first set of data points and the second set of data points, and further based on an updated version of the data from the one or more data sources; the operations further include receiving a second set of data points each indicating the number of items, associated with the category, that are published on the network during the at least one subsequent time period; the operations further include determining an updated version of the model based on at least one difference between the second set of data points and the predicted number of items, and further based on an updated version of the data from the one or more data sources; the model is determined at least in part through a machine learning algorithm; the network is a social network; the items are published as one or more of a tweet, a post, a share, or a comment; the operations further include determining the category of each of the items, including calculating a similarity between terms in a respective item and a list of terms corresponding to the category, and associating the respective item with the category based on the similarity exceeding a threshold level of similarity; the category is included in a hierarchy of categories with different degrees of specificity; the operations further include determining a location of users who published the items on the network; determining a model is based on the first set of data points for items published by the users in the location; and/or the operations further include transmitting, over one or more networks, at least one report that includes the predicted number of items, associated with the category, that are published on the network during at least one subsequent time period.
[0006] Other implementations of any of the above aspects include corresponding systems, apparatus, and computer programs that are configured to perform the actions of the methods, encoded on computer storage devices. The present disclosure also provides a computer-readable storage medium coupled to one or more processors and having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations in accordance with implementations of the methods provided herein. The present disclosure further provides a system for implementing the methods provided herein. The system includes one or more processors, and a computer-readable storage medium coupled to the one or more processors having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations in accordance with implementations of the methods provided herein.
[0007] Implementations of the present disclosure provide one or more of the following technical advantages and improvements over traditional systems. By employing time series data to predict the influence, and/or change in influence, of a particular category (e.g., brands, topics, etc.) in a network, implementations provide influence predictions that enable the generation of a plan (e.g., marketing strategy, etc.) that accurately targets users in a network for dissemination of information regarding the category. Such targeting may be more accurate and/or more effective at disseminating the information compared to traditional techniques which may employ an ad hoc, untargeted, unfocused, and/or otherwise more general approach to disseminating information. Accordingly, implementations make more efficient use of processing capacity, storage space, active memory, networking capabilities, and/or other computing resources compared to traditional systems. Moreover, by employing other sources of data to develop a predictive model, implementations provide a model that may more accurately predict influence and/or change of influence within a network.
[0008] Moreover, once the time series models have been built, such as the models representing the underlying processes and/or systems, implementations may predict future values (e.g., assuming no strategy changes) and may recommend courses of action to help customers, such as report consumers and/or other entities, achieve their desired goals. For example, incoming data may be monitored and provided as input to the models to predict future trends. Recommendations may be provided to consumers, where such recommendations include actions that may be taken by consumers to modify the predicted future trends and bring them into line with the goals of the consumers.
[0009] In some implementations, once the models are sufficiently accurate in modeling steady-state behavior, implementations may detect on the receipt of new data that the new data deviates (e.g., beyond a predetermined threshold deviation) from the anticipated values, indicating some anomalous behavior in the system and/or warranting further investigation. In such instances, implementations may notify the consumer(s) of the anomalous behavior so they may take appropriate action depending on the nature of the deviation. For example, higher than usual buzz about a company may be driven by some external event, and the company may be notified to take action to capitalize on the higher than usual buzz.
[0010] Additionally, the development of predictive models based on different (e.g., social network or other) data sources provides the ability to detect and predict communities not otherwise discoverable using a single independent data source. For example, multi-source analysis described herein may enable implementations to identify connections between social network users who share fitness as a common interest with other users who post pictures of running routes on fitness websites.
[0011] It is appreciated that implementations in accordance with the present disclosure can include any combination of the aspects and features described herein. That is, implementations in accordance with the present disclosure are not limited to the combinations of aspects and features specifically described herein, but also include any other appropriate combinations of the aspects and features provided.
[0012] The details of one or more implementations of the present disclosure are set forth in the accompanying drawings and the description below. Other features and advantages of the present disclosure will be apparent from the description and drawings, and from the claims.
BRIEF DESCRIPTION OF DRAWINGS
[0013] FIG. 1 depicts an example system for predicting published items in a network, according to implementations of the present disclosure.
[0014] FIG. 2 depicts example time series data used to update a predictive model, according to implementations of the present disclosure.
[0015] FIG. 3 depicts a flow diagram of an example process for predicting published items in a network based on a model of time series data, according to implementations of the present disclosure.
[0016] FIG. 4 depicts a flow diagram of an example process for updating a predictive model based on time series data, according to implementations of the present disclosure.
[0017] FIG. 5 depicts a schematic of an example of a report that includes predicted published item data, according to implementations of the present disclosure.
[0018] FIG. 6 depicts an example computing system, according to implementations of the present disclosure.
DETAILED DESCRIPTION
[0019] Implementations of the present disclosure are directed to systems, devices, methods, and computer-readable media for analyzing time series data describing a number and/or frequency of published items in a network (e.g., a social network), the published items related to a category such as a product, service, brand, company, or other type of category. Based on the analysis of the time series data, implementations may determine a predictive model that may be employed to predict the future number and/or frequency of published items associated with the category.
[0020] In some implementations, a platform collects and analyzes a number (e.g., millions, billions) of published data items generated by a (e.g., large) population of users on one or more networks such as social networks. The platform may track the network communication and re-communication of the published items among the large population of users connected over the network(s) and analyze the items to determine any suitable number of categories or subcategories associated with the items. The platform may organize the data into a time series which includes any suitable number of data points, each data point indicating a number of items that are associated with a category and that were published during a respective time period on the network(s). For example, during a first time slice there may be 50 instances of published items related to a particular category, during a second time slice there may be 80 instances of published items related to the category, and so forth. The platform may analyze the time series data to develop a model that characterizes the time series. The model may be used to predict subsequent numbers of published items associated with the category. Predictions may be provided (e.g., as reports) to entities such as advertisers, marketers, product managers, and so forth.
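As a non-limiting illustration, the organization of published items into a time series of per-slice counts might be sketched as follows. The function name `count_items_per_slice`, its parameters, and the sample timestamps are hypothetical and are not part of the disclosure; the sketch assumes that each item has already been associated with the category and carries a publication timestamp.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_items_per_slice(timestamps, start, slice_length, num_slices):
    """Return a list of counts, one per time slice, for the given timestamps.

    Hypothetical helper: `timestamps` are publication times of items that
    were already associated with a single category.
    """
    counts = Counter()
    for ts in timestamps:
        # Integer division of two timedeltas yields the slice index.
        index = (ts - start) // slice_length
        if 0 <= index < num_slices:
            counts[index] += 1
    return [counts[i] for i in range(num_slices)]


# Illustrative data only: six items published over three one-day slices.
start = datetime(2017, 1, 1)
day = timedelta(days=1)
timestamps = [start + timedelta(hours=h) for h in (1, 5, 23, 25, 30, 49)]
series = count_items_per_slice(timestamps, start, day, 3)  # [3, 2, 1]
```

Each element of the resulting list corresponds to one data point of the time series, i.e., the number of items published during the respective time period.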
[0021] In some implementations, the model of influence for a particular category may be further based on additional sources of data. Such other data sources may include, but are not limited to, one or more of the following:
[0022] Financial data regarding one or more businesses, such as past, current, and/or future earnings, revenues, expenses, stock prices, and so forth;
[0023] Weather data describing the past, current, and/or future weather conditions at one or more locations, such as temperature, air pressure, wind speed and/or direction, precipitation, severe weather events, a change or trend in any of these weather characteristics, and so forth;
[0024] Other environmental data describing the past, current, and/or future environment conditions and/or events, such as seismic activity, wildfires, disease outbreaks, military actions or other types of violence, and so forth;
[0025] Event data describing past, current, and/or future scheduled events, such as festivals, concerts, rallies, speeches, fairs, holiday events, political events (e.g., elections), film premieres, television premieres, music album launches, sporting events, and so forth;
[0026] News data describing past and/or current (e.g., unscheduled) news events, such as crimes, unscheduled demonstrations, disasters (natural or otherwise), traffic conditions, celebrity activities or life events (e.g., marriage, death, etc.), outcomes of sporting events, and so forth; and/or
[0027] Demographic data (e.g., census data) describing the past, current, and/or future (e.g., predicted) demographic characteristics of population(s) in particular locations or areas, such as the number and/or percentage of people in various age ranges, gender categories, employment/occupation areas, education status (e.g., student, non-student, college graduate, etc.), income information, language groups, and so forth.
[0028] Other data sources may also include, but are not limited to, one or more of the following: e-commerce (e.g., online shopping) data, and/or other commercial data, related to prices or inventory of items over time; data from ride sharing services; data from bike sharing services; and/or data from social fitness tracking, such as common running routes, number of interactions for a given workout, shared diet information and recipes, and so forth.
[0029] Other suitable types of data may also be employed to develop and/or refine the predictive model. In some instances, the other data sources may include dynamic data that may be changeable over a particular time scale (e.g., per second, minute, hour, day, week, etc.). In such instances, the other data source(s) may provide a real time feed and/or stream of data that is used, e.g., by a modeling engine, to adapt the model in real time with respect to the changing data. In some instances, the other data sources may include static data that is unchanging or that may change more slowly over a longer time scale compared to dynamic data (e.g., changing from year to year instead of hourly). The other data source(s) may include publicly available data sources, such as public financial data source(s), weather service(s), event data calendar(s), news feed(s), and so forth.
[0030] In some implementations, multiple data sources may be employed to develop the predictive model, in addition to the time series data measuring influence in network(s). The social media influence data (e.g., in time series) may be employed in conjunction with financials from a given company, global market data, and/or other types of data to perform a multi-source integration and develop a model for category-based (e.g., brand) analytics and/or prediction.
[0031] For example, a model may be developed that enables prediction of network influence (e.g., buzz) and/or sales of a particular brand of boots, based on correlation between multiple data types such as weather data, event data, demographic data, and so forth. As a particular example, the model may predict an increase (e.g., spike) in buzz and/or sales for a brand of boots, where the increase is correlated with an event such as a music festival in combination with high rainfall in the area of the festival, also in combination with demographic data showing a high concentration of people of an age likely to attend the festival. In some instances, the model may predict an increase in buzz and/or sales that lags the combination of data that is correlated with the increase. Following the particular example above, the increase in buzz and/or sales of boots may begin to appear in the network influence data and/or sales numbers at least two days following the occurrence of an outdoor music festival that correlated with heavy rainfall. Correlating may include identifying a relationship between changes in different types of data. Such correlation may be a positive correlation in which one type of data exhibits an increase during a time period when another type of data exhibits an increase. Correlation may be a negative correlation in which one type of data exhibits an increase during a time period when another type of data exhibits a decrease.
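As a non-limiting illustration of correlating two data types with a lag, the following sketch computes a Pearson correlation coefficient between a driver series (e.g., daily rainfall) and a response series (e.g., daily buzz) shifted by a candidate lag. The disclosure does not prescribe a particular correlation measure; the choice of Pearson correlation, the function names `pearson` and `best_lag`, and the sample values are assumptions made for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_lag(driver, response, max_lag):
    """Return (lag, r) for the lag at which response[t + lag] tracks driver[t] most strongly."""
    best = (0, pearson(driver, response))
    for lag in range(1, max_lag + 1):
        r = pearson(driver[:-lag], response[lag:])
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


# Illustrative data only: buzz rises two days after each day of heavy rainfall.
rainfall = [0, 0, 10, 0, 0, 8, 0, 0, 0, 0]
buzz = [5, 5, 5, 5, 25, 5, 5, 21, 5, 5]
lag, r = best_lag(rainfall, buzz, max_lag=3)
```

A positive coefficient corresponds to the positive correlation described above, a negative coefficient to the negative correlation, and the selected lag indicates how many time periods the response trails the driver.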
[0032] The model may make predictions regarding particular locales (e.g., cities, counties, countries, regions, etc.) and/or predictions with a global scope. For example, implementations may enable predictions to be made on a global scale, by examining global market data and analyzing similarities and/or disparities between different areas. For example, a trend predicted or observed for the United States may be extrapolated to predict a similar trend to occur in the United Kingdom based on similar geographic and/or demographic characteristics, and/or other factors.
[0033] The model may enable predictions to be made regarding the level of influence of a category (e.g., brand) in one or more (e.g., social) network(s). The model may also enable predictions to be made regarding the level of sales of a particular category. The prediction(s) may be provided in report(s) to various report consumer(s). For example, the prediction(s) may enable a marketing team to plan a future marketing strategy to market boots targeting a region where an outdoor festival is scheduled during a rainy time of year, and the marketing campaign may be scheduled during a period of time prior to the festival and/or expected rain. The campaign may be targeted to users within a demographic group (e.g., age group) that is likely to attend, or that is otherwise correlated with, the particular festival. The report(s) may enable report consumer(s) to determine a type of campaign, a channel to use for the campaign (e.g., social network(s) or other channels), the demographic to target, the time frame of the campaign, the duration of the campaign, and so forth.
[0034] The report(s) generated based on the model may include predictions regarding future influence and/or sales. The report(s) may also include current information to provide report consumer(s) with insight into the current state of influence and/or sales regarding a particular category. In some instances, the report(s) may include historical information to provide consumer(s) with information regarding past trends and/or levels of influence and/or sales. Implementations provide a model that generates reports to provide a view into the influence and/or sales for a particular category on a global scale and/or local scale, the reports generated based on correlations between different types of data across various geographies, factoring in cultural, demographic, location-based differences, language differences, and so forth.
[0035] A category may be a brand of product or service, a type of product or service, or some other topic of information. Categories may be arranged into a multi-level hierarchy with different levels of specificity. For example, a category of the brand Porsche™ may be a subcategory (e.g., more specific subcategory) of the category sports cars, which itself may be a subcategory of cars, which may be a subcategory of vehicles, which may be a subcategory of consumer goods, and so forth. A particular category may have any number of parent categories (e.g., less specific categories) as well as any number of child categories (e.g., more specific categories). A child category may be a more specific category than a parent category. A category may also be related to other categories through other types of relationships.
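As a non-limiting illustration, a category hierarchy of this kind might be represented as a mapping from each category to its parent categories. The mapping `PARENTS` and the function `ancestors` are hypothetical names; the category names are taken from the example above. Counting a published item toward each ancestor category is an assumption made for illustration and is not required by the disclosure.

```python
# Each category maps to its (less specific) parent categories.
PARENTS = {
    "Porsche": ["sports cars"],
    "sports cars": ["cars"],
    "cars": ["vehicles"],
    "vehicles": ["consumer goods"],
    "consumer goods": [],
}


def ancestors(category, parents=PARENTS):
    """Return every less specific category reachable from `category`, nearest first."""
    seen = []
    queue = list(parents.get(category, []))
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(parents.get(current, []))
    return seen


# An item associated with "Porsche" could also be counted toward these categories.
broader = ancestors("Porsche")
```

Because each category maps to a list, the same structure accommodates a category with more than one parent category.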
[0036] As described herein, a time prediction model may be used to predict the future propagation, buzz, and/or influence that a particular category may exhibit on a network. The model may also be employed to predict particular user(s) who are likely to be influencers in the future, based on the propagation of their published items (e.g., through reposts, retweets, comments, etc.) on the network. Accordingly, implementations provide a model that may be used to predict the future growth and/or success of a brand of product or other category. The model may also be used to predict the degradation or decay in the amount of influence exhibited by a particular category in the network, e.g., as the number of published items associated with the category declines over time. The predictions regarding the growth, decay, and/or lack of change in the number (or frequency) of published items over time may be used to make recommendations for how an organization may expend marketing and/or advertising efforts. For example, if the model is used to predict that the influence (e.g., popularity or buzz) around a product is likely to fall off in the future, an organization may use this prediction to target marketing efforts toward increasing the influence of the product, raising awareness of the product, evangelizing the sale or use of the product, maintaining the influence of the product, and so forth.
[0037] Once a trend is identified using the model, the time series data may be further analyzed to determine the audience of users who published items regarding the category and/or viewed published items regarding the category. The data may also be analyzed to determine influencers, tastemakers, thought leaders, and/or other users who may be influential with regard to a particular category, because their published items tend to be republished (e.g., retweeted, reposted, etc.) by other users. The data may also be analyzed to determine a geographic span of the influence of a category, for example, whether a product has particular buzz in a more specific region (e.g., in a particular city) or in a broader region (e.g., across one or more countries). In some instances, the model may be developed based on a time series for published items that originate with users in a particular location. A location may be a geographic location of any suitable specificity, such as a street, city block, neighborhood, borough, district, city, county, province, state, prefecture, region, nation, and so forth. Accordingly, the model may be used to predict the future level of influence that may be exhibited by a category in the particular location. Models may be used to compare the changes and/or trends in influence exhibited by a brand in different locations, such as the trend in influence for a particular brand of shoe in the United States compared to the trend in influence of the brand in Japan.
[0038] In some implementations, a static approach may be employed for the predictive model. For example, the model may be developed based on a time series of historic published item counts over time. The model may be used to predict, with a determined level of confidence indicated by the model, a next set of published item counts over a subsequent period of time (e.g., the next five to six weeks). Using a static approach, the model may not be updated based on incoming data that is received after the predictions are made.
[0039] In some implementations, an adaptive filter approach may be employed to develop and update the predictive model. In this approach, the model may be initially developed based on a time series of data, and the model may be dynamically updated as new data is received. Accordingly, the model may be used to predict influence across any suitable time period in the future (e.g., five or six weeks from the time of the data used to develop the model), and the model itself may change over time as it is updated based on newly received time series data. Implementations may update the model with any suitable frequency (e.g., nightly, weekly, etc.). In some instances, the model may be dynamically updated in response to receiving new time series data regarding the number and/or frequency of published items in one or more networks.
[0040] In some implementations, regression may be employed to update the model based on newly received time series data. Predictions may be made based on a previous model, and the predictions may be compared to newly received time series data indicating the number of published items on a network. The predicted number of published items for various time periods may be compared to the actual, measured number of published items for those time periods. Based on the actual measured counts, and/or based on the difference between predicted and actual counts, the predictive model may be updated, e.g., the coefficients of the model may be updated.
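As a non-limiting illustration of updating model coefficients based on the difference between predicted and actual counts, the following sketch applies one step of a normalized least-mean-squares (NLMS) adaptive filter to a linear model that predicts the next count from the most recent counts. The disclosure does not name a specific update rule; the choice of NLMS, the function name `nlms_update`, the step size, and the sample counts are assumptions made for illustration only.

```python
def nlms_update(coeffs, recent, actual, step=0.5, eps=1e-9):
    """Perform one normalized-LMS update of a linear predictive model.

    coeffs: current model coefficients (hypothetical linear model)
    recent: the most recent len(coeffs) observed counts, oldest first
    actual: the newly measured count for the period that was predicted
    Returns (predicted, updated_coeffs).
    """
    predicted = sum(c * x for c, x in zip(coeffs, recent))
    error = actual - predicted  # difference between actual and predicted counts
    energy = sum(x * x for x in recent) + eps  # normalization term
    updated = [c + step * error * x / energy for c, x in zip(coeffs, recent)]
    return predicted, updated


# Illustrative data only: update the model each time a new count arrives.
counts = [50, 80, 90, 85, 100, 120, 110]
coeffs = [0.0, 0.0]
for t in range(2, len(counts)):
    predicted, coeffs = nlms_update(coeffs, counts[t - 2:t], counts[t])
```

With a step size of 0.5, each update moves the prediction for the same inputs approximately halfway toward the measured count, so the model adapts gradually rather than overreacting to a single new data point.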
[0041] In some implementations, spike events, outliers, and/or other anomalous data may be detected for further investigation and/or filtered out of the time series data that is used to generate the model. In some implementations, noise filtering may be performed on the time series to eliminate at least some of the random fluctuations between time periods that may be present, and that may cloud the underlying signal of interest. For example, such fluctuations may be high frequency noise, and any suitable low pass filtering technique may be applied. Moreover, in some implementations the time series data may be filtered to remove seasonal fluctuations. For example, retail brands may generate more buzz during holiday shopping seasons than during the rest of the year. As another example, local brands may have more active users during normal waking hours at that locality and outside of working hours. The time series data may be filtered or otherwise modified to account for such expected variations, prior to generating the model(s).
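As a non-limiting illustration of the filtering described above, the following Python sketch applies a trailing moving average as one simple low-pass filter and subtracts a per-position seasonal mean as one simple seasonal adjustment. The function names, the window length, and the seven-period cycle are assumptions chosen for illustration; the disclosure does not prescribe a particular filter. The seasonal adjustment assumes the series spans at least one full cycle.

```python
from statistics import mean

def moving_average(counts, window=3):
    """Trailing moving average: a simple low-pass filter over item counts."""
    smoothed = []
    for i in range(len(counts)):
        start = max(0, i - window + 1)  # shorter window at the start of the series
        smoothed.append(mean(counts[start:i + 1]))
    return smoothed

def remove_seasonality(counts, period=7):
    """Replace each position's seasonal mean (e.g., each weekday) with the overall mean."""
    seasonal = [mean(counts[p::period]) for p in range(period)]
    overall = mean(counts)
    return [c - seasonal[i % period] + overall for i, c in enumerate(counts)]

# Hypothetical daily published item counts.
daily_counts = [10, 12, 11, 30, 12, 10, 11, 13]
smoothed = moving_average(daily_counts, window=3)
```

A longer window suppresses more of the high frequency fluctuation at the cost of a slower response to genuine changes in the underlying signal.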
[0042] The predictive model may be employed to make predictions regarding future numbers of published items to be published in a network. In some instances, the prediction may be specific to a particular location of the users publishing the items in a network and/or specific to a particular category for the topic of the published items. The prediction(s) may be provided in reports to various entities such as marketers, advertisers, retailers, product manufacturers or sellers, and so forth, who may use the prediction(s) to determine a marketing strategy to be implemented on the network(s). For example, in some instances a particular type of event (e.g., a concert in a particular location) is followed by a pattern of buzz on a network (e.g., a spike in discussion of the particular acts that performed at the concert). The pattern may increase dramatically after the event, and fall off gradually as time passes and interest wanes. Based on such a predicted pattern, a marketer or other influencer may attempt to extend the buzz by implementing a particular targeted marketing campaign. Such a campaign may lead to a longer tail for the influence of a particular topic, leading to enhanced interest, more sales, and so forth.
[0043] Implementations may search and mine data from a network, such as published item data, in real time as the item(s) are published and/or become available. In some implementations, real-time data extraction and analysis modules (e.g., the data collection module(s), time series generation module(s), and/or modeling engine described below) may respond to newly available network data by analyzing the data and updating the predictive model as needed. In some implementations, adaptive filters may be employed to perform such analysis, given their suitability for noise cancellation, target data identification, and/or other aspects.
[0044] Implementations employ adaptive filters in conjunction with the feeding back of the model error to improve the predictive model, as described further below. Adaptive filters may be applied to model the time series data that is gathered from user and concept information published on a network. In some instances, the adaptive filters may be applied to achieve various objectives. A first objective may be to model the time lagged behavior of the network time series data. This type of modeling may serve as an explanatory model to understand the extent to which the previous data influences a current social network system. A second objective may be to detect anomalous events that correspond to an increase in published items and/or discussions regarding a particular category, user, brand, concept, and/or other topic. A third objective may be to make predictions based on a particular time series. Adaptive filters have the capability to make such a prediction, wait for the actual occurrence of data, and update the predictive model coefficients, e.g., in real time, based on the error between the predicted value and the actual value.
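As a non-limiting illustration of the second objective above, the following Python sketch flags a time period as anomalous when its count exceeds a crude prediction by several standard deviations. Here the mean of a trailing window stands in for the prediction of an adaptive filter; the function name, window length, and threshold are assumptions for illustration only.

```python
from statistics import mean, stdev

def find_spikes(counts, window=5, z=3.0):
    """Return indices whose count exceeds the trailing mean by more than z deviations."""
    spikes = []
    for t in range(window, len(counts)):
        history = counts[t - window:t]
        mu, sigma = mean(history), stdev(history)
        # Skip perfectly flat histories, for which the z-score is undefined.
        if sigma > 0 and (counts[t] - mu) / sigma > z:
            spikes.append(t)
    return spikes

# Hypothetical counts with one burst of published items at index 5.
spike_indices = find_spikes([10, 11, 9, 10, 12, 50, 11])
```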
[0045] Implementations may use various techniques for determining and/or updating the model, including but not limited to a Least Mean Squares (LMS) algorithm, a Normalized Least Mean Squares (NLMS) algorithm, a Recursive Least Squares (RLS) algorithm, and so forth. Implementations may also employ any suitable type of non-linear adaptive filter, including but not limited to kernel LMS, kernel RLS, and/or others. In general, the error in the prediction made by a previous version of the model may be fed back into the model to update it, with the goal of reducing the occurrence of the error in the future. The predictive model may be revised in real time based on differences between predicted values (e.g., published item counts) and the corresponding actual measured values.
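As a non-limiting illustration of the error feedback described above, the following Python sketch runs a normalized LMS filter over a series of counts. At each step it predicts the next count from the most recent counts, measures the error against the actual count, and adjusts the coefficients in proportion to that error. The function name, filter order, and step size are assumptions for illustration; they are not values taken from the disclosure.

```python
def nlms_one_step(counts, order=3, mu=0.5, eps=1e-8):
    """Run a normalized LMS filter; return (predictions, errors, final weights)."""
    weights = [0.0] * order
    predictions, errors = [], []
    for t in range(order, len(counts)):
        window = counts[t - order:t]  # the `order` most recent counts
        predicted = sum(w * x for w, x in zip(weights, window))
        error = counts[t] - predicted  # actual minus predicted
        energy = sum(x * x for x in window) + eps  # eps avoids division by zero
        # Feed the error back: step each weight along its input, scaled by input energy.
        weights = [w + mu * error * x / energy for w, x in zip(weights, window)]
        predictions.append(predicted)
        errors.append(error)
    return predictions, errors, weights

# Hypothetical steady series: the error shrinks as the coefficients adapt.
predictions, errors, weights = nlms_one_step([10.0] * 50)
```

Dividing by the input energy is what distinguishes NLMS from plain LMS; it keeps the effective step size stable when the counts vary widely in magnitude.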
[0046] In some implementations, multiple models (e.g., multiple neural networks) may be employed to make predictions based on the same time series data, and the different models may be trained or otherwise developed using different techniques. Accordingly, the different models may output somewhat different results, even based on the same or similar input data. For example, three different neural network-based models may be developed to make predictions regarding the time series data for published items in network(s). In some implementations, the output predictions of the multiple models may be averaged, differently weighted, and/or otherwise combined to determine an overall result. The weighting of the various models relative to one another may be adjusted based on regression techniques to refine the overall combined model based on new data.
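As a non-limiting illustration of combining multiple models, the following Python sketch forms a weighted sum of per-model predictions and then adjusts the weights using the error against the actual count. A normalized gradient step for linear regression is used here as one possible regression technique; the function names and the step size are assumptions for illustration.

```python
def combine(model_predictions, weights):
    """Weighted sum of the per-model predictions for one time period."""
    return sum(w * p for w, p in zip(weights, model_predictions))

def reweight(weights, model_predictions, actual, rate=0.5, eps=1e-8):
    """Move the weights so the combined prediction shifts toward the actual count."""
    error = actual - combine(model_predictions, weights)
    energy = sum(p * p for p in model_predictions) + eps
    return [w + rate * error * p / energy for w, p in zip(weights, model_predictions)]

# Three hypothetical models predict 90, 110, and 100 items; 120 are observed.
model_predictions = [90.0, 110.0, 100.0]
weights = [1 / 3, 1 / 3, 1 / 3]
combined_before = combine(model_predictions, weights)
weights = reweight(weights, model_predictions, actual=120.0)
combined_after = combine(model_predictions, weights)
```

With a rate of 0.5, one update closes half of the gap between the combined prediction and the observed count, and models with larger outputs receive proportionally larger weight adjustments.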
[0047] FIG. 1 depicts an example system for predicting published items in a network, according to implementations of the present disclosure. As shown in the example of FIG. 1, the environment may include one or more networks 102. The network may include any number of nodes 104 that are able to communicate with one another through the network 102. In some instances, a node 104 may be a user of the network 102. A network 102 may include any type of network in which user(s) may publish item(s) to be viewed by other user(s). In some instances, the published item(s) may be republished by the user(s) on the network, and/or published to other network(s). In some instances, a network 102 may be a social network in which users communicate with other users via published items. A network 102 may include users who have registered with the network 102, such that the users have accounts, profiles, or other forms of presence in the network 102. Examples of a network 102 may include Facebook™, Twitter™, Instagram™, Pinterest™, Weibo™, WeChat™, Alibaba™, or others. A network 102 may be public, such that any user may be allowed to publish, view, and republish items. A network 102 may be, to some extent, private, such that a subset of the general public is allowed to publish, view, and republish items.
[0048] A user may publish item(s) 106 that may be viewable and/or republishable by other user(s) in the same network 102 and/or other network(s). A network 102 may employ any suitable format or arrangement of data for published items 106, and published items 106 may be communicated within the network 102 using any suitable communication protocol. A published item may include one or more types of data, including but not limited to text data, graphics, images, videos, audio data, and so forth. The publishing user may be associated with a set of followers, e.g., other user(s) in the network 102. A follower of a publishing user may include a user who has indicated a desire to view published item(s) 106 of the publishing user 104. For example, a follower may edit their user profile or account information to follow the publishing user, and subsequently the follower may receive notifications indicating when the publishing user publishes an item 106. A follower may be variously described in different social networks as a follower, a friend, a contact, a link, a fan, and so forth.
[0049] The followers of the publishing user 104 may also republish the original published item(s) 106 of the publishing user. Republication may include, but is not limited to, sharing, reposting, retweeting, or commenting on the published item 106, such that the published item 106 may then be viewed by other users. Republication may include republication of the published item 106 in its entirety, or republication of any portion of the published item 106 (e.g., as an excerpt). A follower of the publishing user may republish an item 106 such that the item 106 is viewable by other users who are followers of the republishing user. Any number of those followers may then republish the item 106 to be viewable by other users, who may themselves republish the item 106, and so on to any number of republication levels. In this way, a published item 106 may propagate through a network 102. Each set of republications by one or more republishing users may be described as a ripple of the published item 106 as it propagates within the network 102.
[0050] Although examples herein may describe users viewing an item that is published in a network 102, implementations are not limited to item(s) 106 that are visually presented to users. An item 106 may also be presented, at least in part, as audio data, haptic data (e.g., vibrations or other movements of a computing device), or via other modes of presentation.
[0051] As shown in the example of FIG. 1, the environment may include one or more analysis computing devices 110, which may include any suitable number and type of computing device. The analysis computing device(s) 110 may be described as a platform for measuring influence in the network(s) 102, e.g., in the form of time series data, and for making predictions regarding influence in the network(s) 102. The analysis computing device(s) 110 may execute any suitable number of software module(s), which may be described as an engine for making predictions.
[0052] The analysis computing device(s) 110 may execute one or more data collection module(s) 108 which collect information regarding one or more network(s) 102. The data collection module(s) 108 may retrieve and store one or more published item(s) 106 published on the network(s) 102. The data collection module(s) 108 may also retrieve metadata describing the published item(s) 106, including but not limited to a timestamp (e.g., date and/or time) of publication, the publishing user, a subject line, title, or summary of the item 106 as published, a category of the item 106, and/or other metadata such as tags, hashtags, and so forth. The data collection module(s) 108 may also retrieve and store other information available in the network(s) 102, such as demographic information regarding the users of the network(s) 102, such as the user(s) who publish item(s) 106. Demographic information may include various user characteristics, including but not limited to one or more of the following: user location (e.g., to any degree of specificity), age, gender, ethnic identification, spoken language(s), profession, hobbies, interests, income level, purchase history, group affiliation(s), education level, or other characteristics.
[0053] The published item(s) 106 and/or other data regarding the network(s) 102 may be accessed and analyzed by one or more time series generation modules 112 executing on the analysis computing device(s) 110. The time series generation module(s) 112 may analyze the published items 106 and generate time series data 114. The time series data 114 may include a series of data points or data elements which each include a date and/or time indicator and a number of published items associated with that date and/or time indicator. For example, a time series data point may indicate a number of items 106 published during a particular time period, such as over the course of a particular hour or day. In some instances, the time series data may be specific to a particular category, such as a particular product. Accordingly, the time series data may track the level of influence exhibited by a particular product or other topic within a network over time. In some instances, the time series data may be specific to a particular location, such as a location of the users who publish items regarding a particular product. Accordingly, the time series data may track the level of influence exhibited by a particular product within the particular location, within a network over time. Time series data may describe a number of published items generally, or the number of published items for a particular category, location, and/or other user characteristic.
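As a non-limiting illustration of time series generation, the following Python sketch counts published items per calendar day, optionally restricted to a category and/or a location. The item record layout (the "timestamp", "categories", and "location" keys) and the function name are assumptions for illustration; the disclosure does not specify a data format.

```python
from collections import Counter
from datetime import date, datetime

def daily_series(items, category=None, location=None):
    """Return sorted (day, count) pairs for items matching the optional filters."""
    counts = Counter()
    for item in items:
        if category is not None and category not in item["categories"]:
            continue
        if location is not None and item["location"] != location:
            continue
        day = datetime.fromisoformat(item["timestamp"]).date()
        counts[day] += 1
    return sorted(counts.items())

# Hypothetical published item records.
items = [
    {"timestamp": "2017-12-19T08:30:00", "categories": {"jewelry"}, "location": "Shanghai"},
    {"timestamp": "2017-12-19T21:05:00", "categories": {"jewelry", "rings"}, "location": "Austin"},
    {"timestamp": "2017-12-20T10:00:00", "categories": {"jewelry"}, "location": "Shanghai"},
    {"timestamp": "2017-12-20T11:00:00", "categories": {"restaurants"}, "location": "Shanghai"},
]
jewelry_series = daily_series(items, category="jewelry")
```

Days with no matching items are omitted from the result in this sketch; a model that expects evenly spaced time periods would need those days filled in with a count of zero.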
[0054] The time series data 114 may be received by a modeling engine 116 executing on the analysis device(s) 110. The modeling engine 116 may generate a predictive model 118 based on the time series data 114. In some implementations, the modeling engine 116 may also access data from one or more other data sources 124 such as those described above. The data from the other data source(s) 124 may be employed (e.g., correlated with the time series data 114) to generate and/or modify the predictive model 118. The other data source(s) 124 may be present on the analysis device(s) 110, as shown in the example of FIG. 1. In some implementations, one or more other data sources 124 may be external to the analysis device(s) 110, and the data from the other data source(s) 124 may be provided to the modeling engine 116 over one or more networks.
[0055] The predictive model 118 may be employed to predict the number of published items that may be published, e.g., for a particular category, location, and/or other demographic characteristic, during one or more time periods in the future. Such predictions may be included in report(s) 120 that are generated on the analysis device(s) 110 and provided to one or more report consumers 122. The report consumer(s) 122 may include such entities as marketers, advertisers, brand managers, and so forth. The report(s) 120 may be used by such entities to make decisions regarding brands, products, services, campaigns, and so forth. Employing the predictions made based on time series data may enable marketers, advertisers, or others to create targeted, category-specific campaigns that are more effective at spreading information than traditional campaigns which may indiscriminately broadcast information within a network 102, leading to potentially higher return on investment for marketing or advertising expenditures.
[0056] In some implementations, the model 118 may be generated and/or refined through the use of one or more machine learning (ML) techniques. Implementations may employ any suitable ML techniques, such as supervised and/or unsupervised techniques. In some implementations, the model 118 may be trained or otherwise developed using time series data 114 that is measured within a network 102. The model 118 may be used to make predictions regarding future numbers of items published in the network 102, and those predictions may be compared to the actual numbers of published items over a corresponding time period. The result of the comparison may be employed to further train and/or update the model 118 to more accurately reflect the actual results. For example, if an initial set of predictions varies from the actual number of published items, the actual number, and/or the difference between the predictions and the actual number, may be employed as training data to further refine the model 118.
[0057] In some instances, the training techniques used to develop and/or update the model(s) may be dependent on which particular signal is being investigated. Developing, updating, and/or using the model(s) may include one or more of the following operations: 1) Collecting the data and formatting and/or conditioning the data for analysis; 2) Plotting and identifying signal characteristics; 3) Proposing a (e.g., broad level) model and attempting to fit the data to it; 4) Using the model along with original data to find residuals (e.g., errors of the model); 5) Analyzing the residuals to determine if the model is to be refined, and if so proposing a refined model. If the model is not to be refined, the current model may be used until a determination is made that refinement is needed.
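As a non-limiting illustration of operations 3) through 5) above, the following Python sketch proposes a straight-line trend as a broad model, fits it by ordinary least squares, computes the residuals, and checks whether their root mean square exceeds a tolerance. The choice of a linear model, the function names, and the tolerance are assumptions for illustration.

```python
from statistics import mean

def fit_line(values):
    """Ordinary least squares fit of values against their index; returns (slope, intercept)."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sxx
    return slope, y_bar - slope * x_bar

def residuals(values, slope, intercept):
    """Errors of the model: actual value minus fitted value at each index."""
    return [y - (slope * x + intercept) for x, y in enumerate(values)]

def needs_refinement(resid, tolerance):
    """True when the root mean square residual exceeds the tolerance."""
    rms = (sum(r * r for r in resid) / len(resid)) ** 0.5
    return rms > tolerance

# Hypothetical counts that follow a perfect linear trend.
slope, intercept = fit_line([2, 4, 6, 8])
resid = residuals([2, 4, 6, 8], slope, intercept)
```

Residuals that show structure, such as a repeating pattern, would suggest refining the model (for example, by adding a seasonal term) rather than merely loosening the tolerance.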
[0058] FIG. 2 depicts example time series data 114 used to generate and/or update a predictive model 118, according to implementations of the present disclosure. As shown in the example of FIG. 2, a first set of time series data 114(1) may be employed to determine a first version of a model 118(1). The model 118(1) may reflect the patterns, trends, and/or other characteristics of the time series data 114(1). A time series may include any suitable number of ordered pairs, in which each ordered pair includes a date and/or time indicator and an associated value. An ordered pair may associate a particular date/time range (e.g., from a first date/time to a second date/time) with a value that indicates the number of items that are published on a network during that range. For example, a time series may include a series of data points each indicating a number of published items for a particular category during a particular time period (e.g., a day, an hour, a 12-hour period, a week, etc.).
[0059] The first version of the model 118(1) may be used to generate one or more predictions 202. A prediction 202 may be a prediction that a particular number of items (e.g., for a particular category, location, and/or other characteristic(s)), or range of number of items, will be published during a future time period. The prediction(s) 202 may include predictions regarding any suitable number of future time periods. For example, the model 118(1) may be used to predict that 100 items will be published on network ABC during a future time period that is subsequent to the time period(s) of the time series data 114(1) used to generate and/or update the model 118(1).
[0060] In some implementations, actual measured item count data may be collected for those time periods for which the predictions 202 were made. In the example of FIG. 2, the actual data is included in time series data 114(2). The time series data 114(2) may be used, along with the prediction(s) 202, to update the model 118(1) and determine an updated model 118(2). The updated model 118(2) may then be used to make subsequent prediction(s) 202. The error in the prediction(s) 202 (e.g., as indicated by the actual data) may be used to refine the model 118(2) and reduce the incidence of such error in future prediction(s).
[0061] In some implementations, data from the other data source(s) 124 may be used to generate the model 118(1). Such data may be used in addition to the time series data 114(1) to generate the model 118(1). The data from the other data source(s) 124 may also be used to update and/or refine the model 118(1) to generate the model 118(2). For example, a newer version of dynamic and/or frequently changing data from the other data source(s) 124, such as weather data, news data, real time financial information, and so forth, may be used to modify the model 118 in real time with respect to receiving the updated data from the other data source(s) 124.
[0062] FIG. 3 depicts a flow diagram of an example process for predicting published items 106 in a network 102 based on a model 118 of time series data, according to implementations of the present disclosure. Operations of the process may be performed by one or more of the data collection module(s) 108, the time series generation module(s) 112, the modeling engine 116, the model 118, and/or other software module(s) executing on the analysis device(s) 110 or elsewhere.
[0063] Network data may be received (302) indicating the item(s) 106 published on a network 102 during one or more time periods. Published item(s) 106 may include, but are not limited to, tweets, social network posts, comments on posts or other published items, retweets, reposts, articles, and so forth.
[0064] In some implementations, at least one category may be determined (304) that is associated with the item(s) 106. In some implementations, to determine a category for an item 106, the item 106 may be analyzed and term(s) present in the item 106 may be compared to a list of terms corresponding to a category, for each of one or more categories. A term may include any amount of data. For example, a term may be a single word or sequence of characters. A term may also include multiple words, such as a phrase or multi-word term. In some implementations, the data in an item 106 may be preprocessed to determine the terms that are present in the item 106. For example, the item 106 may be parsed based on separator characters such as white space (e.g., spaces, new lines, carriage returns), punctuation characters, or other separators. In some examples, where the item 106 includes audio data, the item 106 may be processed using speech-to-text (STT) conversion method(s) to generate text data based on audio input data, prior to calculating the similarity. A determination may be made of a degree of similarity between the terms in an item and the list of terms corresponding to a category. If a calculated similarity meets or exceeds a predetermined threshold level of similarity, the item 106 may be associated with the category. This process may be repeated for a particular item 106 with respect to any number of categories, and the process may be repeated for any number of items 106. A particular item 106 may be associated with any number of categories. For example, an item 106 may be associated with a category of "restaurants" as well as a more specific category of "Japanese restaurants" and/or "teriyaki restaurants."
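As a non-limiting illustration of the categorization described above, the following Python sketch parses an item into single-word terms, computes the fraction of each category's terms that appear in the item, and associates the item with every category whose score meets a threshold. The category term lists, the particular similarity measure, and the threshold are assumptions for illustration; multi-word terms and audio items are not handled in this sketch.

```python
import re

# Hypothetical category term lists.
CATEGORY_TERMS = {
    "restaurants": {"restaurant", "dinner", "menu", "chef"},
    "jewelry": {"ring", "necklace", "diamond", "engagement"},
}

def extract_terms(text):
    """Split on whitespace and punctuation; return the set of lowercase terms."""
    return {t for t in re.split(r"[^\w]+", text.lower()) if t}

def categorize(text, threshold=0.5):
    """Return the categories whose term list is matched at or above the threshold."""
    item_terms = extract_terms(text)
    return sorted(
        name
        for name, vocab in CATEGORY_TERMS.items()
        if len(item_terms & vocab) / len(vocab) >= threshold
    )

matched = categorize("Great dinner, the chef nailed the menu!")
```

Because each category is scored independently, a single item may be associated with several categories, or with none.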
[0065] For each published item 106, the analysis may determine one or more categories and/or keywords for the item 106. The analysis may compare words or multi-word terms in the item 106 to a list of terms that are known to relate to a category, such as a library of terms that have been manually curated for each category. For example, the use of the word "Tiffany" in the published item 106 may lead to a determination that the item 106 is in the categories "jewelry" and "Tiffany brand jewelry." Further use of the words "engagement" and "ring" in the published item 106 may indicate other categories of "ring" and "engagement ring." In some examples, the platform determines a probability that the published item 106 corresponds to a category based on a correspondence (e.g., a statistical similarity measure) between terms in the published item 106 and terms known to correspond to the category. In some implementations, the analysis identifies an exact match between terms in the item 106 and terms in the category-specific list to determine similarity. In some implementations, the analysis may employ semantic analysis based on natural language processing (NLP) or other methods to calculate a similarity based on a semantic closeness between the terms of the item 106 and the category-specific list of terms. In some implementations, an emoji-based closeness measure may be employed. For example, the use of emojis in published items, their frequency of use, their order of use, and/or other considerations may be employed to determine categorization of content (e.g., demographics-based, sentiment-based, and so forth).
[0066] A published item 106 may be designated as being within a category if the calculated similarity exceeds a threshold value. In some implementations, the threshold value may be determined by applying a machine learning method based on statistical analysis. For example, implementations may determine a weighting of multiple items in a particular category based on keyword terms present in the items, and use the mean average of the weighting as a threshold or as the basis for the threshold (e.g., threshold = 80% of the mean average). In some implementations, a particular published item 106 may be associated with a probability matrix indicating the probabilities that the item 106 corresponds to various categories.
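As a non-limiting illustration of the threshold example above, the following Python sketch derives a threshold as 80% of the mean weighting of items already known to belong to a category, and then tests a new item's similarity against it. The function names and the sample weightings are assumptions for illustration.

```python
from statistics import mean

def category_threshold(known_item_weightings, fraction=0.8):
    """Threshold as a fraction of the mean weighting of items in the category."""
    return fraction * mean(known_item_weightings)

def in_category(similarity, threshold):
    """An item is designated as within the category if its similarity exceeds the threshold."""
    return similarity > threshold

# Hypothetical weightings of three items known to be in the category.
threshold = category_threshold([0.5, 0.75, 1.0])
```

For these sample weightings the mean is 0.75, so the threshold is approximately 0.6; an item scoring 0.7 would be designated as within the category and an item scoring 0.5 would not.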
[0067] Time series data may be determined (306). As described above, the time series data may indicate, for each of a plurality of time periods, a number of items that were published on a network during the respective time period. In some instances, the time series data may be for a particular category of published items. For example, the time series data may indicate, for each of the plurality of time periods, a number of items that are associated with a particular category and that were published on a network during the respective time period. In some instances, the time series data may be for a location. For example, the time series data may indicate, for each of the plurality of time periods, a number of items that were published by users in a particular location (e.g., city, state, county, country, etc.) and that were published on a network during the respective time period. The time series data may also be associated with other demographic characteristics of the users, such as gender, age range, and so forth. For example, the time series data may indicate, for each of the plurality of time periods, a number of items that were published by users with a particular demographic characteristic and that were published on a network during the respective time period.
[0068] A model 118 may be determined (308) based on the time series data determined at 306. In instances where the time series data is particular to a category, location, and/or other demographic characteristic, the determined model 118 may also be particular to the corresponding category, location, and/or other demographic characteristic. For example, the model 118 may be used to predict the number of future published items in the particular category, by users in the particular location, and/or by users exhibiting the particular demographic characteristic. As described above, the model 118 may be determined using suitable ML techniques. In some implementations, data from one or more other data sources 124 may be used to determine the model 118.
[0069] The model 118 may be stored and employed (310) to predict the number of items to be published on the network during at least one subsequent time period, e.g., subsequent to the time periods of the time series data used to determine and/or update the model 118. The model 118 may also be used to predict sales of a particular category, such as a particular product brand.
[0070] The predictions made using the model 118 may be incorporated into report(s) 120, which may be provided (312) to one or more data consumers 122 as described above.
[0071] FIG. 4 depicts a flow diagram of an example process for updating a predictive model 118 based on time series data, according to implementations of the present disclosure. Operations of the process may be performed by one or more of the data collection module(s) 108, the time series generation module(s) 112, the modeling engine 116, the model 118, and/or other software module(s) executing on the analysis device(s) 110 or elsewhere.
[0072] Network data may be received (402) indicating items 106 published on a network 102.
[0073] A first set of time series data may be determined (404) indicating the number of items published during various time periods in a first time span.
[0074] A first version of a model may be determined (406) based on the first time series data and/or data from the other data source(s) 124.
[0075] The first model may be employed (408) to predict the number of items to be published on the network during at least one subsequent time period (e.g., subsequent to the first time span). Operations of 402, 404, 406, and 408 may proceed as described above.
[0076] In some implementations, updated network data may be received (410) indicating items published on the network during a second time span that includes the at least one subsequent time period for which predictions were made.
[0077] A second set of time series data may be determined (412) indicating the number of items published during various time periods in the second time span.
[0078] The first model may be updated (414) to generate a second (e.g., updated, revised) model. The updating may be based on comparing the predicted numbers of items to the second time series data (e.g., the actual numbers of items published in the network) as described above. In some implementations, the second model may be generated based at least partly on (e.g., recent) data from the other data source(s) 124. The second version of the model may be stored and used to make subsequent predictions that may be more accurate than the predictions made using the first version of the model.
[0079] In some implementations, the updated version of the model may be determined based on both the first set of time series data and the second set of time series data. For example, the entire set of determined time series data may be used to regenerate the second version of the model as if the model were being initially determined, e.g., without regard to the previous version of the model 118. Alternatively, the updated version of the model may be determined based on the difference(s) (e.g., a delta) between the second set of time series data and the predicted number(s) of items made using the first version of the model, e.g., in a regression paradigm as described above.
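As a non-limiting illustration of the two update strategies above, the following Python sketch uses a deliberately minimal model consisting of a single coefficient (an expected count per time period). One function regenerates the model from all of the data without regard to the previous version; the other adjusts the previous version using the difference between each actual count and the count the model predicted. The single-coefficient model, the function names, and the rate are assumptions for illustration.

```python
from statistics import mean

def refit(first_series, second_series):
    """Regenerate the model from the entire data set, ignoring the previous model."""
    return mean(first_series + second_series)

def delta_update(level, second_series, rate=0.5):
    """Adjust the previous model by a fraction of each prediction error (the delta)."""
    for actual in second_series:
        predicted = level
        level += rate * (actual - predicted)
    return level

# Hypothetical counts for the first and second time spans.
first_series = [100, 100]
second_series = [120, 120]
first_model = mean(first_series)
refit_model = refit(first_series, second_series)
updated_model = delta_update(first_model, second_series)
```

In this example the refit model weighs all four counts equally and yields 110, whereas the delta update favors the more recent counts and yields 115.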
[0080] In the example of FIG. 4, the time series data is determined based on the number of published items during various time periods. Implementations also support the use of other time series data developed using other measurements. For example, time series data may be determined based on follower counts over time, the number of likes and/or shares of items over time, and so forth. Implementations may apply the analysis described herein to generate predictions based on these other types of time series data, and/or any other suitable type of time series data.
[0081] FIG. 5 depicts a schematic of an example of a report 120 that includes predicted published item data, according to implementations of the present disclosure. In the example of FIG. 5, the predictions 202 are made regarding the number of published items that may be published during subsequent time periods, and that are associated with a particular category (e.g., "Epiffany Wedding Rings"), and that are published by users in a particular location (e.g., the Shanghai region of China). The example category is a subcategory within a hierarchy of categories, e.g., a particular brand of wedding rings, which is a subcategory of rings, which is a subcategory of jewelry.
[0082] Although the time series analysis examples herein may describe the use of time series data based on published item counts over time, implementations are not limited to this particular type of time series data. In some implementations, multiple sets of time series data, of differing types, may be used to develop models and make predictions. Such time series data may also include data not directly related to social network data. For example, time series data such as current news events, customer marketing campaigns, weather data, and so forth may be input to a model to predict a level of buzz for a brand or product.
[0083] Moreover, implementations are not limited to using the example categorization methodology described herein for grouping similar text and/or audio data. Some implementations may also employ emoji-based similarity determination as part of the analysis, as described above. In some implementations, image and/or video data analysis may also be used for categorization and/or other aspects.
[0084] FIG. 6 depicts an example computing system, according to implementations of the present disclosure. The system 600 may be used for any of the operations described with respect to the various implementations discussed herein. For example, the system 600 may be included, at least in part, in the analysis computing device(s) 110 described herein, or in computing device(s) operated by one or more of the user(s) 104, the user(s) 108, or the report consumer(s) 122. The system 600 may include one or more processors 610, a memory 620, one or more storage devices 630, and one or more input/output (I/O) devices 650 controllable via one or more I/O interfaces 640. The various components 610, 620, 630, 640, or 650 may be interconnected via at least one system bus 660, which may enable the transfer of data between the various modules and components of the system 600.
[0085] The processor(s) 610 may be configured to process instructions for execution within the system 600. The processor(s) 610 may include single-threaded processor(s), multi-threaded processor(s), or both. The processor(s) 610 may be configured to process instructions stored in the memory 620 or on the storage device(s) 630. For example, the processor(s) 610 may execute instructions for the various software module(s) described herein. The processor(s) 610 may include hardware-based processor(s) each including one or more cores. The processor(s) 610 may include general purpose processor(s), special purpose processor(s), or both.
[0086] The memory 620 may store information within the system 600. In some implementations, the memory 620 includes one or more computer-readable media. The memory 620 may include any number of volatile memory units, any number of non-volatile memory units, or both volatile and non-volatile memory units. The memory 620 may include read-only memory, random access memory, or both. In some examples, the memory 620 may be employed as active or physical memory by one or more executing software modules.
[0087] The storage device(s) 630 may be configured to provide (e.g., persistent) mass storage for the system 600. In some implementations, the storage device(s) 630 may include one or more computer-readable media. For example, the storage device(s) 630 may include a floppy disk device, a hard disk device, an optical disk device, or a tape device. The storage device(s) 630 may include read-only memory, random access memory, or both. The storage device(s) 630 may include one or more of an internal hard drive, an external hard drive, or a removable drive.
[0088] One or both of the memory 620 or the storage device(s) 630 may include one or more computer-readable storage media (CRSM). The CRSM may include one or more of an electronic storage medium, a magnetic storage medium, an optical storage medium, a magneto-optical storage medium, a quantum storage medium, a mechanical computer storage medium, and so forth. The CRSM may provide storage of computer-readable instructions describing data structures, processes, applications, programs, other modules, or other data for the operation of the system 600. In some implementations, the CRSM may include a data store that provides storage of computer-readable instructions or other information in a non-transitory format. The CRSM may be incorporated into the system 600 or may be external with respect to the system 600. The CRSM may include read-only memory, random access memory, or both. One or more CRSM suitable for tangibly embodying computer program instructions and data may include any type of non-volatile memory, including but not limited to: semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. In some examples, the processor(s) 610 and the memory 620 may be supplemented by, or incorporated into, one or more application-specific integrated circuits (ASICs).
[0089] The system 600 may include one or more I/O devices 650. The I/O device(s) 650 may include one or more input devices such as a keyboard, a mouse, a pen, a game controller, a touch input device, an audio input device (e.g., a microphone), a gestural input device, a haptic input device, an image or video capture device (e.g., a camera), or other devices. In some examples, the I/O device(s) 650 may also include one or more output devices such as a display, LED(s), an audio output device (e.g., a speaker), a printer, a haptic output device, and so forth. The I/O device(s) 650 may be physically incorporated in one or more computing devices of the system 600, or may be external with respect to one or more computing devices of the system 600.
[0090] The system 600 may include one or more I/O interfaces 640 to enable components or modules of the system 600 to control, interface with, or otherwise communicate with the I/O device(s) 650. The I/O interface(s) 640 may enable information to be transferred in or out of the system 600, or between components of the system 600, through serial communication, parallel communication, or other types of communication. For example, the I/O interface(s) 640 may comply with a version of the RS-232 standard for serial ports, or with a version of the IEEE 1284 standard for parallel ports. As another example, the I/O interface(s) 640 may be configured to provide a connection over Universal Serial Bus (USB) or Ethernet. In some examples, the I/O interface(s) 640 may be configured to provide a serial connection that is compliant with a version of the IEEE 1394 standard.
[0091] The I/O interface(s) 640 may also include one or more network interfaces that enable communications between computing devices in the system 600, or between the system 600 and other network-connected computing systems. The network interface(s) may include one or more network interface controllers (NICs) or other types of transceiver devices configured to send and receive communications over one or more communication networks using any network protocol.
[0092] Computing devices of the system 600 may communicate with one another, or with other computing devices, using one or more communication networks. Such communication networks may include public networks such as the internet, private networks such as an institutional or personal intranet, or any combination of private and public networks. The communication networks may include any type of wired or wireless network, including but not limited to local area networks (LANs), wide area networks (WANs), wireless WANs (WWANs), wireless LANs (WLANs), mobile communications networks (e.g., 3G, 4G, Edge, etc.), and so forth. In some implementations, the communications between computing devices may be encrypted or otherwise secured. For example, communications may employ one or more public or private cryptographic keys, ciphers, digital certificates, or other credentials supported by a security protocol, such as any version of the Secure Sockets Layer (SSL) or the Transport Layer Security (TLS) protocol.
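As a brief, hedged sketch of securing such communications with TLS, the following Python example builds a client-side TLS configuration using the standard library ssl module. The function name make_client_tls_context and the choice of TLS 1.2 as the minimum protocol version are assumptions made for illustration and are not requirements of the implementations described herein.

```python
import ssl


def make_client_tls_context() -> ssl.SSLContext:
    """Build a client TLS context that verifies the server's certificate."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Assumption for this sketch: refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

A client could then wrap an already connected socket with context.wrap_socket(sock, server_hostname=host), where sock and host are supplied by the caller, so that data exchanged between computing devices is encrypted in transit.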
[0093] The system 600 may include any number of computing devices of any type. The computing device(s) may include, but are not limited to: a personal computer, a smartphone, a tablet computer, a wearable computer, an implanted computer, a mobile gaming device, an electronic book reader, an automotive computer, a desktop computer, a laptop computer, a notebook computer, a game console, a home entertainment device, a network computer, a server computer, a mainframe computer, a distributed computing device (e.g., a cloud computing device), a microcomputer, a system on a chip (SoC), a system in a package (SiP), and so forth. Although examples herein may describe computing device(s) as physical device(s), implementations are not so limited. In some examples, a computing device may include one or more of a virtual computing environment, a hypervisor, an emulation, or a virtual machine executing on one or more physical computing devices. In some examples, two or more computing devices may include a cluster, cloud, farm, or other grouping of multiple devices that coordinate operations to provide load balancing, failover support, parallel processing capabilities, shared storage resources, shared networking capabilities, or other aspects.
[0094] Implementations and all of the functional operations described in this specification may be realized in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations may be realized as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium may be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "computing system" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus may include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus.
[0095] A computer program (also known as a program, software, software application, script, or code) may be written in any appropriate form of programming language, including compiled or interpreted languages, and it may be deployed in any appropriate form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program may be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program may be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
[0096] The processes and logic flows described in this specification may be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows may also be performed by, and apparatus may also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
[0097] Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any appropriate kind of digital computer. Generally, a processor may receive instructions and data from a read only memory or a random access memory or both. Elements of a computer can include a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer may also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer may be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio player, or a Global Positioning System (GPS) receiver, to name just a few. Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in, special purpose logic circuitry.
[0098] To provide for interaction with a user, implementations may be realized on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user may provide input to the computer. Other kinds of devices may be used to provide for interaction with a user as well; for example, feedback provided to the user may be any appropriate form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any appropriate form, including acoustic, speech, or tactile input.
[0099] Implementations may be realized in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a web browser through which a user may interact with an implementation, or any appropriate combination of one or more such back end, middleware, or front end components. The components of the system may be interconnected by any appropriate form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network ("LAN") and a wide area network ("WAN"), e.g., the Internet.
[00100] The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
[00101] While this specification contains many specifics, these should not be construed as limitations on the scope of the disclosure or of what may be claimed, but rather as descriptions of features specific to particular implementations. Certain features that are described in this specification in the context of separate implementations may also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation may also be implemented in multiple implementations separately or in any suitable sub-combination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination may in some examples be excised from the combination, and the claimed combination may be directed to a sub-combination or variation of a sub-combination.
[00102] Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the implementations described above should not be understood as requiring such separation in all implementations, and it should be understood that the described program components and systems may generally be integrated together in a single software product or packaged into multiple software products.
[00103] A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the disclosure. For example, various forms of the flows shown above may be used, with steps reordered, added, or removed. Accordingly, other implementations are within the scope of the following claims.

Claims

CLAIMS:
1. A computer-implemented method performed by at least one processor, the method comprising:
receiving, by the at least one processor, a first set of data points each indicating a number of items published on a network during a respective time period, wherein the items are associated with a category;
determining, by the at least one processor, a model based on the first set of data points and further based on data from one or more data sources, wherein the model describes a time series of the first set of data points;
employing, by the at least one processor, the model to determine a predicted number of items, associated with the category, that are published on the network during at least one subsequent time period; and
based on the predicted number of items, subsequently publishing within the network, by the at least one processor, information that is associated with the category and that targets a set of users to receive the subsequently published information within the network.
2. The method of claim 1, wherein the data from the one or more data sources includes one or more of financial data, weather data, environmental data, event data, news data, or demographic data.
3. The method of claim 1, wherein the data from the one or more data sources includes dynamic data.
4. The method of claim 1, wherein:
the data is received from at least two data sources; and
determining the model includes correlating the data received from the at least two data sources.
5. The method of claim 1, further comprising:
receiving, by the at least one processor, a second set of data points each indicating the number of items, associated with the category, that are published on the network during the at least one subsequent time period; and
determining, by the at least one processor, an updated version of the model based on the first set of data points and the second set of data points, and further based on an updated version of the data from the one or more data sources.
6. The method of claim 1, further comprising:
receiving, by the at least one processor, a second set of data points each indicating the number of items, associated with the category, that are published on the network during the at least one subsequent time period; and
determining, by the at least one processor, an updated version of the model based on at least one difference between the second set of data points and the predicted number of items, and further based on an updated version of the data from the one or more data sources.
7. The method of claim 1, wherein the model is determined at least in part through a machine learning algorithm.
8. The method of claim 1, wherein:
the network is a social network; and
the items are published as one or more of a tweet, a post, a share, or a comment.
9. The method of claim 1, further comprising determining, by the at least one processor, the category of each of the items, including:
calculating a similarity between terms in a respective item and a list of terms corresponding to the category; and
associating the respective item with the category based on the similarity exceeding a threshold level of similarity.
10. The method of claim 1, wherein the category is included in a hierarchy of categories with different degrees of specificity.
11. The method of claim 1, further comprising:
determining, by the at least one processor, a location of users who published the items on the network;
wherein determining a model is based on the first set of data points for items published by the users in the location.
12. The method of claim 1, further comprising:
transmitting, by the at least one processor, over one or more networks, at least one report that includes the predicted number of items, associated with the category, that are published on the network during at least one subsequent time period.
13. A system, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor, the memory storing instructions which, when executed by the at least one processor, cause the at least one processor to perform operations comprising:
receiving a first set of data points each indicating a number of items published on a network during a respective time period, wherein the items are associated with a category;
determining a model based on the first set of data points and further based on data from one or more data sources, wherein the model describes a time series of the first set of data points;
employing the model to determine a predicted number of items, associated with the category, that are published on the network during at least one subsequent time period; and
based on the predicted number of items, subsequently publishing within the network information that is associated with the category and that targets a set of users to receive the subsequently published information within the network.
14. The system of claim 13, wherein the data from the one or more data sources includes one or more of financial data, weather data, environmental data, event data, news data, or demographic data.
15. The system of claim 13, wherein the data from the one or more data sources includes dynamic data.
16. The system of claim 13, wherein:
the data is received from at least two data sources; and
determining the model includes correlating the data received from the at least two data sources.
17. The system of claim 13, the operations further comprising:
receiving a second set of data points each indicating the number of items, associated with the category, that are published on the network during the at least one subsequent time period; and
determining an updated version of the model based on the first set of data points and the second set of data points, and further based on an updated version of the data from the one or more data sources.
18. The system of claim 13, the operations further comprising:
receiving a second set of data points each indicating the number of items, associated with the category, that are published on the network during the at least one subsequent time period; and
determining an updated version of the model based on at least one difference between the second set of data points and the predicted number of items, and further based on an updated version of the data from the one or more data sources.
19. The system of claim 13, wherein the model is determined at least in part through a machine learning algorithm.
20. One or more computer-readable media storing instructions which, when executed by at least one processor, cause the at least one processor to perform operations comprising:
receiving a first set of data points each indicating a number of items published on a network during a respective time period, wherein the items are associated with a category;
determining a model based on the first set of data points and further based on data from one or more data sources, wherein the model describes a time series of the first set of data points;
employing the model to determine a predicted number of items, associated with the category, that are published on the network during at least one subsequent time period; and
based on the predicted number of items, subsequently publishing within the network information that is associated with the category and that targets a set of users to receive the subsequently published information within the network.
PCT/US2017/067414 2016-12-20 2017-12-19 Multi-source modeling for network predictions WO2018118986A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201662436776P 2016-12-20 2016-12-20
US62/436,776 2016-12-20

Publications (1)

Publication Number Publication Date
WO2018118986A1 (en) 2018-06-28

Family

ID=62627218

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2017/067414 WO2018118986A1 (en) 2016-12-20 2017-12-19 Multi-source modeling for network predictions

Country Status (1)

Country Link
WO (1) WO2018118986A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11107166B2 (en) * 2018-09-25 2021-08-31 Business Objects Software Ltd. Multi-step day sales outstanding forecasting
CN111538935B (en) * 2019-12-26 2023-08-25 北京玖天气象科技有限公司 Fine precipitation fusion method, system, electronic equipment and storage medium based on terrain features and multi-source mode products
CN117817675A (en) * 2024-03-06 2024-04-05 泓浒(苏州)半导体科技有限公司 Prediction method of motion trail of wafer handling mechanical arm based on time sequence
CN117817675B (en) * 2024-03-06 2024-04-30 泓浒(苏州)半导体科技有限公司 Prediction method of motion trail of wafer handling mechanical arm based on time sequence

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150235133A1 (en) * 2012-09-12 2015-08-20 Nec Corporation Data concentration prediction device, data concentration prediction method, and recording medium recording program thereof
US20150381552A1 (en) * 2014-06-30 2015-12-31 Ravi Kiran Holur Vijay Personalized delivery time optimization
US20160086222A1 (en) * 2009-01-21 2016-03-24 Truaxis, Inc. Method and system to remind users of targeted offers in similar categories

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17883704

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 18.10.2019)

122 Ep: pct application non-entry in european phase

Ref document number: 17883704

Country of ref document: EP

Kind code of ref document: A1