WO2016186634A1 - Maximizing information value of web content - Google Patents


Info

Publication number
WO2016186634A1
Authority
WO
WIPO (PCT)
Prior art keywords
web content
content items
characteristic
computing device
vector
Prior art date
Application number
PCT/US2015/031263
Other languages
French (fr)
Inventor
Bernardo Huberman
Sitaram Asur
Sandra SERVIA-RODRIGUEZ
Original Assignee
Hewlett Packard Enterprise Development Lp
Priority date
Filing date
Publication date
Application filed by Hewlett Packard Enterprise Development Lp
Priority to PCT/US2015/031263
Publication of WO2016186634A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/958: Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Definitions

  • FIG. 1 is a block diagram of an example computing device for maximizing information value of web content
  • FIG. 3 is a flowchart of an example method for execution by a computing device for maximizing information value of web content
  • FIG. 4 is a diagram of an example index rankings map that is ordered to maximize information value.
  • FIG. 2 is a block diagram of an example computing device 250 in communication via a network 245 with data computing devices 200A-200N. As illustrated in FIG. 2 and described below, computing device 250 may communicate with data computing devices 200A-200N to maximize information value of web content.
  • each data computing device 200A-200N may include a corresponding web content module 210A-210N, while computing device 250 may include a number of modules 252-268.
  • Each of the modules may include a series of instructions encoded on a machine-readable storage medium and executable by a processor of the respective device 200, 250.
  • each module may include one or more hardware devices including electronic circuitry for implementing the functionality described below.
  • each data computing device 200A-200N may be a server, a notebook, desktop, tablet, workstation, mobile device, or any other device suitable for executing the functionality described below.
  • each data computing device 200A-200N may include a web content module 210A, 210N for providing web content and associated sharing statistics.
  • data computing device 200A can be a web server that is configured to provide web content in a social media network.
  • the web content module 210A is configured to provide computing device 250 with access to the web content and associated sharing statistics, which describe various attributes of the web content.
  • computing device 250 may be any computing device with access to data computing devices 200A-200N over a network 245 that is suitable for executing the functionality described below. As detailed below, computing device 250 may include a series of modules 252-268 for improving information value of web content.
  • Interface module 252 may manage communications with the data computing devices 200A-200N. Specifically, the interface module 252 may initiate connections with the data computing devices 200A-200N and then send web content data to, or receive it from, the data computing devices 200A-200N.
  • Statistics module 256 may collect and process web content and associated sharing statistics from data computing devices 200A-200N. Collecting module 258 of statistics module 256 uses interface 252 to collect the web content and sharing statistics from data computing devices 200A-200N. The data can be collected in real-time and/or based on a schedule. Each data computing device 200A-200N can provide web content of a different source such as a social media service, an online news journal, etc. Further, the collected data can be filtered based on various parameters. For example, the data collected can be associated with news media sources.
  • Characteristics module 260 of statistics module 256 processes the web content and sharing statistics to determine characteristics of the web content. For example, a range of popularity and novelty values can be determined for the web content of each source.
  • Characteristics module 260 can process and aggregate sharing statistics that describe the resharing (e.g., share, retweet, forward, etc.) of web content.
  • the sharing statistics may specify the number of times that each item of web content is reshared, which can vary greatly depending on the author, type of content, etc.
  • the resharing statistics can be used to determine the average number of times content is reshared for a time interval (e.g., every minute, hourly, daily, etc.).
  • trends can be identified in the resharing statistics as time-dependence data. For example, when observing a web content source that prioritizes novelty (e.g., TWITTER®), an item of web content receives the most engagement shortly after it is posted (e.g., in the second and third minutes): after an initial discovery period with little engagement (e.g., of 1 minute), the number of reshares greatly increases and then fits a power law distribution. At some stage, the number of reshares of an item decreases significantly because its novelty is lower. For less popular content, the increase in engagement after the initial discovery period and the following decline in engagement can occur over a larger time scale.
  • TWITTER® is a registered trademark of Twitter, Inc., which is headquartered in San Francisco, California.
  • Temporal comparisons with other platforms can also be observed.
  • a web content source that accounts for novelty and popularity
  • community-managed content sources e.g., REDDIT®
  • the quantity of reshares does not decrease as dramatically with time for web content that is very popular (i.e., highly up voted).
  • REDDIT® is a registered trademark of Reddit Inc., which is headquartered in San Francisco, California.
  • Characteristics module 260 can also process sharing statistics to determine the conditional variance of the number of reshares received after t minutes from the publication of an original item of web content.
  • Conditional variance describes the variance between the number of reshares of an item received after t minutes from publication and other items that received the same number of reshares after t - 1 minutes from publication.
  • Analysis module 262 may analyze data collected by statistics module 256 to improve information value of web content.
  • Normalizing module 264 of analysis module 262 creates characteristics vectors for items of web content based on the sharing statistics. Characteristics vectors can be made for multiple attributes (e.g., novelty, popularity, etc.), where each characteristic vector describes the range of possible values for an item of web content for a particular attribute. Normalizing module 264 can also normalize the characteristics vectors so that web content is evenly distributed throughout the characteristics vectors. Each of the normalized values can be attributed with a reward that can be used to determine the utility of items of web content. For example, a characteristic vector for novelty can have higher rewards for values that indicate an item is more novel.
  • Markov module 266 of analysis module 262 applies a Markov process to web content to determine transition probabilities. It is assumed that the state of each item changes according to the Markov process independently of the state of other items, with one set of transition probabilities if the item is in a top list (e.g., top 10 items of web content) and another if the item is not in the top list. In order to empirically calculate the transition probabilities, the web content posted during a set interval of observation is considered. For example, assuming that all the items are on the top list (i.e., all of them are displayed), the transition probability from state i to state j is defined as:
  • I_i(t) is the set of items in state i at time t, and I_j(t + 1) is the set of items in state j at time t + 1 that transited to this state from state i.
  • Ordering module 268 of analysis module 262 orders items of web content based on the transitional probabilities.
  • the G index i.e., ordering index
  • rankings of the 101 states are calculated using, for example, the Bertsimas-Nino-Mora (BNM) adaptive greedy algorithm.
  • a set of constants is calculated. Assuming that E is finite, for any subset S of E, the S-active policy u_S is defined to be the strategy that recommends items whose state is in S. Considering an item that starts from an initial state X(0), under the action implied by strategy u_S, its total occupancy time in S is given by
  • the items of web content can then be described based on the G index (e.g., ordering, top 10 items, etc.).
  • Example G index rankings are shown in FIG. 4.
  • the items can be displayed according to the ordering to improve the utility of the information displayed.
  • the top 10 items can be displayed and updated as the ordering of the web content is dynamically determined based on real-time sharing statistics.
  • FIG. 3 is a flowchart of an example method 300 for execution by a computing device 100 for maximizing information value of web content. Although execution of method 300 is described below with reference to computing device 100 of FIG. 1, other suitable devices for execution of method 300 may be used, such as computing device 250 of FIG.2.
  • Method 300 may be implemented in the form of executable instructions stored on a machine-readable storage medium, such as storage medium 120, and/or in the form of electronic circuitry.
  • Method 300 may start in block 305 and continue to block 310, where computing device 100 collects sharing statistics for web content from a data computing device. Sharing statistics can be collected from multiple data devices and for any number of users at each of those devices.
  • a characteristic vector is created for each item of web content based on the sharing statistics. For example, a novelty vector and a popularity vector can be created based on the sharing statistics.
  • each post e.g., age, number of reshares, favorites, etc.
  • the properties that define the state of each web content item at each instant t are its novelty (i.e., time since publication) and popularity (i.e., number of reshares of the item).
  • each state can be represented as a 2-vector (n, p), where n, p ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}.
  • the unknown state i.e., state 0
  • Each item initially starts in the unknown state and also ends on the unknown state (i.e., the unknown state serves as both the sink and the source).
  • the novelty and popularity of the web content items posted during an observation period are considered.
  • Limits for the characteristics vectors can be set based on the observed data. For example, if dealing with a social media source that favors novelty, the limits between the different novelty intervals can be as follows:
  • the state of novelty i_n contains the items that were posted between lim_n[i] and lim_n[i + 1] - 1 minutes before the current time of observation.
  • the number of reshares per item is distributed according to a power law distribution, where the majority of the items receive fewer than 100 reshares whereas a very small percentage of items are reshared more than 1000 times.
  • the items are sorted according to the number of times they are reshared and split into equal-sized subsets.
  • the limits between the different intervals that define the state are:
  • the state of popularity j_p contains the items that have been retweeted between lim_p[j] and lim_p[j + 1] - 1 times before the current time of observation.
  • the characteristic vectors are normalized so that each of the possible values applies to an equal sized subset of the web content. Further, the normalized characteristic vector can also be modified to reflect an average number of reshares per minute. Each possible value in a characteristic vector is attributed with a reward that is used to calculate the utility of an item of web content.
  • the reward of each state can be set to
  • r_n and r_p are the normalized average number of reshares per interval.
  • the average number of reshares received between lim_n[i] and lim_n[i + 1] - 1 minutes after publication in the case of novelty and the average number of total reshares received by those items that have received between lim_p[i] and lim_p[i + 1] - 1 reshares in the case of popularity, which results in
  • a Markov process is applied to the normalized characteristics vectors to determine a state for each item of web content.
  • the Markov process can be applied to the normalized characteristic vectors of the web content to dynamically determine transition probabilities of the states for the items of web content.
  • the items of web content are ordered based on the transition probabilities. Because the transition probabilities are dynamically determined, the ordering of the web content can be updated in real-time as the characteristics of the items change. In some cases where user interface real estate is limited, the items of web content displayed can be restricted to, for example, the top 10 items. Further, the refresh rate of items of web content can also be dependent on their ordering (i.e., higher priority items can be refreshed with new values more frequently).
  • Method 300 may then continue to block 335, where method 300 may stop.
  • FIG. 4 is a diagram of an example index rankings map 400 that is ordered to maximize information value.
  • the index rankings map 400 shows rankings for items of web content, which are ordered according to a G index, the value of which is indicated on a node associated with each item.
  • state (6; 2) has the largest G index
  • state (6; 3) the second-largest, and so on.
  • Index rankings map 400 has a popularity axis 404 and a novelty axis 406.
  • the absolute values of the indices are not as important as their relative order, and items should be displayed according to the relative order of the indices of their states.
  • state (6; 2) which has a G index of 1
  • state (5; 2) which has a G index of 4
  • state (6; 5) which has a G index of 5
  • Because the algorithm gives high index values to potentially valuable states, the unknown state 402, which gives no reward, should have a higher display priority than other states with a positive reward. Further, the influence of popularity on the output is higher than the influence of novelty.
  • the foregoing disclosure describes a number of examples for improving information value of web content by a computing device.
  • the examples disclosed herein enable improving information value by using characteristics vectors that are based on sharing statistics to dynamically order items of the web content.
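The index-based ordering described for FIG. 4 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `top_items` and the item labels are hypothetical, and the sketch assumes the number shown on each node is a rank in which 1 marks the state with the largest G index. Only the four states mentioned above are included in the lookup table.

```python
import heapq

# Ranks of the states named in the FIG. 4 discussion, keyed by (novelty, popularity).
# Assumption: the displayed number is a rank, so 1 means highest display priority.
STATE_RANK = {(6, 2): 1, (6, 3): 2, (5, 2): 4, (6, 5): 5}


def top_items(item_states, state_rank, k=10):
    """Return up to k item ids, ordered by the rank of each item's current state."""
    worst = len(state_rank) + 1  # states missing from the table sort last
    return heapq.nsmallest(
        k, item_states, key=lambda item: state_rank.get(item_states[item], worst)
    )


# Hypothetical items and their current states.
items = {"w": (6, 5), "x": (5, 2), "y": (6, 2), "z": (6, 3)}
print(top_items(items, STATE_RANK, k=3))
```

Because only the relative order of the indices matters, the lookup table can hold ranks rather than raw index values; re-running `top_items` whenever the item states change keeps the displayed subset current.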

Abstract

Examples relate to maximizing information value of web content. In some examples, sharing statistics for web content items are collected from a data computing device, where the sharing statistics include time-dependence data and temporal comparisons. A characteristic 2-vector is generated for each of the web content items based on the sharing statistics, where the characteristic 2-vector includes novelty values and popularity values. Next, the characteristic 2-vector of each web content item is normalized, and a Markov process is applied to each web content item to determine a corresponding transition probability based on a normalized, characteristic 2-vector associated with the web content item. At this stage, the web content items are continually ordered based on the transition probabilities, where a subset of the web content items are displayed according to the order of the web content items.

Description

MAXIMIZING INFORMATION VALUE OF WEB CONTENT
BACKGROUND
[0001] The popularity of the Web and social media services has resulted in a constant flood of information that can make it difficult for users to identify and consume the most relevant and useful pieces of content. Given the limited amount of attention that users can afford, providers of content can decide what items to prioritize in order to gain the attention of users and become popular. Examples of techniques for prioritizing web content include ranking (e.g., relevance algorithms used by search engines) and recommendations (i.e., user voting to specify if content is useful or not).
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] The following detailed description references the drawings, wherein:
[0003] FIG. 1 is a block diagram of an example computing device for maximizing information value of web content;
[0004] FIG. 2 is a block diagram of an example computing device in communication with data computing devices for maximizing information value of web content;
[0005] FIG. 3 is a flowchart of an example method for execution by a computing device for maximizing information value of web content; and
[0006] FIG. 4 is a diagram of an example index rankings map that is ordered to maximize information value.
DETAILED DESCRIPTION
[0007] As detailed above, web content can be prioritized using ranking algorithms and/or user-based recommendations. However, ranking and recommendations are limited when prioritizing content in web media because the former uses a keyword query and the latter uses the preferences of the subjects (i.e., users). In online newspapers, magazines and blogs, editors can decide the choice of content and the presentation order. Further, the emergence of news media aggregators has introduced citizen journalism-based ordering. That is, instead of having professional editors determine the important news, people can vote for news that they find interesting, and the votes received by an article play an important role in its ranking with respect to other news on the front page or in the different ordered lists of news.
[0008] Social media services feature a large number of subscribers and serve as aggregators of content such as news, promotional campaigns, media and status updates from users. Given the diversity and magnitude of content that is available, it is important, from the service provider's point of view, to ensure easy access to relevant information to users in order to retain and increase user engagement with the platform. For example, a timeline on a social media site may display content in decreasing order of publication. However, novelty is not the only feature that makes social media content valuable to users, and other features such as popularity can also contribute to give value to the content.
[0009] In this disclosure, a technique for selecting the arrangement of tweets that improves the information value for users is described. Considering the number of, for example, shares as an indicator of the popularity of a web content item and the time since the item was posted as an indicator of its novelty, examples herein use the solution proposed by Huberman and Wu (i.e., a dynamical model characterized by a single novelty factor) to obtain the arrangement of web content that improves the information value for the users. By mapping the problem to that of improved allocation of effort for a number of competing projects, Huberman and Wu formulate the problem as a special case of the bandit problem, which they solve by applying the adaptive greedy algorithm proposed by Bertsimas and Nino-Mora.
[0010] Examples disclosed herein improve information value of web content by using characteristics vectors that are based on sharing statistics to generate index ranking maps for ordering items in the web content. For example, the novelty and popularity of web content items can be used as objective measures of the items’ relevance and utility. In this example, the Huberman-Wu algorithm can be used to automatically select the items that should receive the most attention in the next time interval.
[0011] In some examples, sharing statistics for web content items are collected from a data computing device, where the sharing statistics include time-dependence data and temporal comparisons. A characteristic 2-vector is generated for each of the web content items based on the sharing statistics, where the characteristic 2-vector includes novelty values and popularity values. Next, the characteristic 2-vector of each web content item is normalized, and a Markov process is applied to each web content item to determine a corresponding transition probability based on a normalized, characteristic 2-vector associated with the web content item. At this stage, the web content items are continually ordered based on the transition probabilities, where a subset of the web content items are displayed according to the order of the web content items.
[0012] Referring now to the drawings, FIG. 1 is a block diagram of an example computing device 100 for maximizing information value of web content. Computing device 100 may be any computing device (e.g., server, desktop, notebook, tablet, etc.) with access to web content provided by, for example, data servers such as data computing device 200 of FIG. 2. In FIG. 1, computing device 100 includes a processor 110, an interface 115, and a machine-readable storage medium 120.
[0013] Processor 110 may be one or more central processing units (CPUs), microprocessors, and/or other hardware devices suitable for retrieval and execution of instructions stored in machine-readable storage medium 120. Processor 110 may fetch, decode, and execute instructions 122, 124, 126, 128, 130 to improve information value of web content, as described below. As an alternative or in addition to retrieving and executing instructions, processor 110 may include one or more electronic circuits comprising a number of electronic components for performing the functionality of one or more of instructions 122, 124, 126, 128, 130.
[0014] Interface 115 may include a number of electronic components for communicating with other computing devices. For example, interface 115 may be an Ethernet interface, a Universal Serial Bus (USB) interface, an IEEE 1394 (Firewire) interface, an external Serial Advanced Technology Attachment (eSATA) interface, or any other physical connection interface suitable for communication with the other computing devices. Alternatively, interface 115 may be a wireless interface, such as a wireless local area network (WLAN) interface or a near-field communication (NFC) interface. In operation, as detailed below, interface 115 may be used to send and receive data, such as web content, to and from a corresponding interface of another computing device.
[0015] Machine-readable storage medium 120 may be any electronic, magnetic, optical, or other physical storage device that stores executable instructions. Thus, machine-readable storage medium 120 may be, for example, Random Access Memory (RAM), an Electrically-Erasable Programmable Read-Only Memory (EEPROM), a storage drive, an optical disc, and the like. As described in detail below, machine-readable storage medium 120 may be encoded with executable instructions for improving information value of web content. The machine-readable storage medium 120 may be non-transitory.
[0016] Sharing statistics collecting instructions 122 collect sharing statistics for web content from a data source (e.g., data computing device). Sharing statistics describe attributes of web content such as posts to a social media page, content provided by an online journal, etc. The attributes can include, for example, a timestamp for when the content was created, whether and how often the content was reshared, user voting, how often the content was viewed, etc. Sharing statistics can be collected from multiple sources and for any number of users at each of those sources.
[0017] Characteristic vector generating instructions 124 create one or more characteristic vectors for each item of web content based on the sharing statistics. For example, each characteristic vector can be a constant 2-vector that has a range of characteristic values for two attributes (e.g., novelty and popularity). In this example, the range of values for novelty can describe possible values for the number of times an item is reshared immediately after the item is initially posted, and the range of values for popularity can describe possible values for the number of times an item is reshared over a longer interval of time (in comparison to the interval used for novelty).
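One way to picture the characteristic 2-vector is as a pair of interval indices. The sketch below is illustrative only: it assumes novelty is measured as minutes since publication and popularity as the total number of reshares, with ten intervals for each attribute. The limit values and the names `NOVELTY_LIMITS`, `POPULARITY_LIMITS`, `interval_index` and `characteristic_vector` are hypothetical and do not come from this disclosure.

```python
from bisect import bisect_right

# Hypothetical interval limits (illustrative assumptions, not values from the source).
NOVELTY_LIMITS = [0, 1, 2, 4, 8, 16, 32, 64, 128, 256]        # minutes since posting
POPULARITY_LIMITS = [0, 1, 2, 5, 10, 20, 50, 100, 500, 1000]  # total reshares so far


def interval_index(value, limits):
    """Return the 1-based index i with limits[i-1] <= value (< limits[i], if any)."""
    if value < limits[0]:
        raise ValueError("value lies below the first limit")
    return bisect_right(limits, value)


def characteristic_vector(age_minutes, reshares):
    """Map an item's raw statistics to a (novelty, popularity) pair of indices."""
    return (interval_index(age_minutes, NOVELTY_LIMITS),
            interval_index(reshares, POPULARITY_LIMITS))


print(characteristic_vector(3, 300))
```

Keeping the limits in sorted lists lets a binary search find the interval, so the mapping stays cheap even when it is recomputed for every item at every observation.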
[0018] Characteristic normalizing instructions 126 normalize the characteristic vectors so that each of the possible values applies to an equal sized subset of the web content. For example, the majority of web content is reshared less than 100 times while a very small percentage of web content is reshared over 1000 times. In this example, the characteristic vector for popularity is normalized so that the range of possible values is equally distributed when applied to web content. Further, the normalized characteristic vector can also be modified to reflect an average number of reshares per time interval (e.g., minute, hour, day, etc.). Each possible value in a characteristic vector is attributed with a reward that is used to calculate the utility of an item of web content as described below.
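One way to obtain values that each apply to an equal-sized subset of the web content is rank-based (equal-frequency) binning, sketched below. This is an assumed technique that matches the description; the function name is hypothetical.

```python
def equal_frequency_bins(values, n_bins):
    """Assign each value a bin in 1..n_bins so that the bins hold
    (nearly) equal numbers of items, regardless of how skewed the values are."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values) + 1
    return bins


# Heavily skewed reshare counts still spread evenly over five bins.
reshares = [0, 1, 2, 3, 5, 8, 40, 90, 1500, 20000]
bins = equal_frequency_bins(reshares, 5)
```

Because the assignment is by rank, items with identical counts can fall into adjacent bins; a production version would need a tie-handling rule.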
[0019] Markov process applying instructions 128 apply a Markov process to the normalized characteristic vectors to determine a state for each item of web content. Specifically, the Markov process can be applied to the normalized characteristic vectors of the web content to dynamically determine transition probabilities of the states for the items of web content. A Markov process is a stochastic process that satisfies the memoryless property (i.e., Markov property), which states that future states are dependent on the present state and not on previous events. The states of an item of web content include the range of possible combinations of characteristic values that are possible for the item. For example, the range of an item's state can be from low utility to high utility, where a high utility indicates that the item is highly popular and novel.
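The memoryless property can be illustrated with a small chain: the distribution over states at the next step depends only on the current distribution and the transition matrix. The three states and the probabilities below are invented for illustration and do not come from the disclosure.

```python
def step(distribution, transition):
    """Advance a distribution over states by one step of a Markov chain."""
    n = len(transition)
    return [
        sum(distribution[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]


# Hypothetical 3-state chain: low, medium, and high utility.
P = [
    [0.7, 0.3, 0.0],
    [0.2, 0.5, 0.3],
    [0.0, 0.4, 0.6],
]
start = [1.0, 0.0, 0.0]         # the item starts in the low-utility state
after_one = step(start, P)      # depends only on `start`
after_two = step(after_one, P)  # depends only on `after_one`
```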
[0020] Web content ordering instructions 130 order the items of web content based on the transition probabilities. Because the transition probabilities are dynamically determined, the ordering of the web content can be updated in real-time as the characteristics of the items change. In some cases where user interface real estate is limited, the items of web content displayed can be restricted to, for example, the top 10 items.
[0021] FIG. 2 is a block diagram of an example computing device 250 in communication via a network 245 with data computing devices 200A-200N. As illustrated in FIG. 2 and described below, computing device 250 may communicate with data computing devices 200A-200N to maximize information value of web content.
[0022] As illustrated, each data computing device 200A-200N may include a corresponding web content module 210A-210N, while computing device 250 may include a number of modules 252-268. Each of the modules may include a series of instructions encoded on a machine-readable storage medium and executable by a processor of the respective device 200, 250. In addition or as an alternative, each module may include one or more hardware devices including electronic circuitry for implementing the functionality described below.
[0023] As with computing device 250 of FIG. 2, each data computing device 200A-200N may be a server, a notebook, desktop, tablet, workstation, mobile device, or any other device suitable for executing the functionality described below. As detailed below, each data computing device 200A-200N may include a web content module 210A, 210N for providing web content and associated sharing statistics. For example, data computing device 200A can be a web server that is configured to provide web content in a social media network. In this example, the web content module 210A is configured to provide computing device 250 with access to the web content and associated sharing statistics, which describe various attributes of the web content.
[0024] As with computing device 100 of FIG. 1, computing device 250 may be any computing device with access to data computing devices 200A-200N over a network 245 that is suitable for executing the functionality described below. As detailed below, computing device 250 may include a series of modules 252-268 for improving information value of web content.
[0025] Interface module 252 may manage communications with the data computing devices 200A-200N. Specifically, the interface module 252 may initiate connections with the data computing devices 200A-200N and then send web content data to or receive web content data from the data computing devices 200A-200N.
[0026] Statistics module 256 may collect and process web content and associated sharing statistics from data computing devices 200A-200N. Collecting module 258 of statistics module 256 uses interface module 252 to collect the web content and sharing statistics from data computing devices 200A-200N. The data can be collected in real-time and/or based on a schedule. Each data computing device 200A-200N can provide web content of a different source such as a social media service, an online news journal, etc. Further, the collected data can be filtered based on various parameters. For example, the data collected can be limited to data associated with news media sources.
[0027] Characteristics module 260 of statistics module 256 processes the web content and sharing statistics to determine characteristics of the web content. For example, a range of popularity and novelty values can be determined for the web content of each source.
[0028] Characteristics module 260 can process and aggregate sharing statistics that describe the resharing (e.g., share, retweet, forward, etc.) of web content. Specifically, the sharing statistics may specify the number of times that each item of web content is reshared, which can vary greatly depending on the author, type of content, etc. In this case, the resharing statistics can be used to determine the average number of times content is reshared for a time interval (e.g., every minute, hourly, daily, etc.).
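The averaging step described above can be sketched as follows, assuming each item is represented by the offsets (in minutes after posting) at which it was reshared. The input layout and function name are assumptions.

```python
from collections import Counter


def average_reshares_per_interval(items, interval_minutes, n_intervals):
    """Mean number of reshares per item in each of the first n_intervals intervals.

    items: list of lists; each inner list holds reshare offsets in minutes
    after posting for one item.
    """
    if not items:
        return [0.0] * n_intervals
    totals = Counter()
    for offsets in items:
        for m in offsets:
            k = int(m // interval_minutes)  # index of the interval containing m
            if k < n_intervals:
                totals[k] += 1
    return [totals[k] / len(items) for k in range(n_intervals)]


items = [[0.5, 1.2, 1.8, 2.5], [1.1, 1.9], [0.2, 2.2, 2.9, 5.0]]
averages = average_reshares_per_interval(items, interval_minutes=1, n_intervals=3)
```

Changing interval_minutes to 60 or 1440 gives hourly or daily averages from the same data.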
[0029] Further, trends can be identified in the resharing statistics as time-dependence data. For example, when observing a web content source that prioritizes novelty (e.g., TWITTER®), an item of web content receives the most engagement shortly after the item is posted (e.g., in the second and third minutes): after an initial discovery period with little engagement (e.g., the first minute), the number of reshares greatly increases and then follows a power law distribution. At some stage, the number of reshares of an item decreases significantly because its novelty has declined. For less popular content, the increase in engagement after the initial discovery period and the following decline in engagement can occur over a larger time scale. TWITTER® is a registered trademark of Twitter, Inc., which is headquartered in San Francisco, California.
[0030] Temporal comparisons with other platforms can also be observed. In this case, when making a comparison with a web content source that accounts for both novelty and popularity, such as a community-managed content source (e.g., REDDIT®), the quantity of reshares does not decrease as dramatically with time for web content that is very popular (i.e., highly upvoted). In this case, the more times popular items are displayed, the more likely the items are to obtain users' attention. REDDIT® is a registered trademark of Reddit Inc., which is headquartered in San Francisco, California.
[0031] Characteristics module 260 can also process sharing statistics to determine the conditional variance of the number of reshares received after t minutes from the publication of an original item of web content. Conditional variance describes the variance between the number of reshares of an item received after t minutes from publication and other items that received the same number of reshares after t - 1 minutes from publication.
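A small sketch of this conditional variance is given below: items are grouped by their reshare count after t - 1 minutes, and the variance of the count after t minutes is computed within each group. The use of the population variance and the input layout are assumptions.

```python
from collections import defaultdict
from statistics import pvariance


def conditional_variance(cumulative, t):
    """Variance of reshares after t minutes, conditioned on reshares after t - 1.

    cumulative: list of per-item lists, where cumulative[k][m] is the number
    of reshares item k has received m minutes after publication.
    Returns {count after t - 1 minutes: variance of the count after t minutes}.
    """
    groups = defaultdict(list)
    for series in cumulative:
        groups[series[t - 1]].append(series[t])
    return {count: pvariance(values) for count, values in groups.items()}


cumulative = [
    [0, 2, 4],
    [0, 2, 8],
    [0, 5, 6],
    [0, 5, 6],
]
cv = conditional_variance(cumulative, t=2)
```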
[0032] Analysis module 262 may analyze data collected by statistics module 256 to improve information value of web content. Normalizing module 264 of analysis module 262 creates characteristics vectors for items of web content based on the sharing statistics. Characteristics vectors can be made for multiple attributes (e.g., novelty, popularity, etc.), where each characteristic vector describes the range of possible values for an item of web content for a particular attribute. Normalizing module 264 can also normalize the characteristics vectors so that web content is evenly distributed throughout the characteristics vectors. Each of the normalized values can be attributed with a reward that can be used to determine the utility of items of web content. For example, a characteristic vector for novelty can have higher rewards for values that indicate an item is more novel.
[0033] Markov module 266 of analysis module 262 applies a Markov process to the web content to determine transition probabilities. It is assumed that the state of each item changes according to the Markov process independently of the state of other items, with transition probabilities
Figure imgf000009_0001
if the item is in a top list (e.g., top 10 items of web content) and
Figure imgf000009_0002
if the item is not in the top list. In order to empirically calculate the transition probabilities, the web content posted during a set interval of observation is considered. For example, assuming that all the items are on the top list (i.e., all of them are displayed),
Figure imgf000010_0002
is defined as:
Figure imgf000010_0003
where Ii(t) is the set of items in state i at time t and Ij(t + 1) is the set of items in state j at t + 1 that transitioned to this state from state i. At this stage, the transition probability for an item that is not on the top list is fixed relative to the corresponding top-list probability, which accounts for the fact that displaying an item on the top list accelerates its transition speed by ten times.
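The empirical estimate can be sketched as a simple counting exercise over the sets Ii(t) and Ij(t + 1). The names P1 and P0, and the specific form used for items off the top list (off-diagonal moves scaled by one tenth, with the remaining probability assigned to staying in place), are assumptions made for illustration; the exact expressions in the figures are not reproduced here.

```python
from collections import Counter


def empirical_transitions(states_t, states_t1, n_states):
    """Estimate displayed-item transition probabilities by counting.

    states_t[k] and states_t1[k] are the states of item k at t and t + 1.
    Entry [i][j] is (items that moved from i to j) / (items in state i at t).
    States with no observed items get an all-zero row.
    """
    moves = Counter(zip(states_t, states_t1))
    in_state = Counter(states_t)
    return [
        [moves[(i, j)] / in_state[i] if in_state[i] else 0.0
         for j in range(n_states)]
        for i in range(n_states)
    ]


def slowed_transitions(p1, factor=0.1):
    """Assumed off-list matrix: moves to other states are ten times less likely."""
    p0 = []
    for i, row in enumerate(p1):
        scaled = [factor * p if j != i else 0.0 for j, p in enumerate(row)]
        scaled[i] = 1.0 - sum(scaled)  # remaining mass stays in state i
        p0.append(scaled)
    return p0


states_t = [0, 0, 0, 0, 1, 1]
states_t1 = [0, 1, 1, 1, 1, 2]
P1 = empirical_transitions(states_t, states_t1, 3)
P0 = slowed_transitions(P1)
```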
[0034] Ordering module 268 of analysis module 262 orders items of web content based on the transition probabilities. In an example with two characteristic vectors with 10 states each, the G index (i.e., ordering index) rankings of the 101 states (100 states from the combination of the 2 vectors and an additional state 0 that is the unknown state) are calculated using, for example, the Bertsimas-Nino-Mora (BNM) adaptive greedy algorithm.
[0035] Before using the BNM algorithm, a set of constants
Figure imgf000010_0004
is calculated. Assuming that E is finite, for any subset
Figure imgf000010_0005
S, the S-active policy uS is defined to be the strategy that recommends items whose state is in S. Considering an item that starts from an initial state
Figure imgf000010_0006
under the action implied by strategy
Figure imgf000010_0007
uS, its total occupancy time in S is given by
where
Figure imgf000010_0001
It is provided that
Figure imgf000011_0001
The variables
Figure imgf000011_0004
can be solved from the set of linear equations above. A matrix of constants is defined by means of
Figure imgf000011_0006
Figure imgf000011_0002
as follows:
Figure imgf000011_0005
The constants
Figure imgf000011_0007
are then used in the BNM algorithm as shown below:
Figure imgf000011_0003
The items of web content can then be described based on the G index (e.g., ordering, top 10 items, etc.).
[0036] Example G index rankings are shown in FIG. 4. After the web content is ordered, the items can be displayed according to the ordering to improve the utility of the information displayed. For example, the top 10 items can be displayed and updated as the ordering of the web content is dynamically determined based on real-time sharing statistics.
[0037] FIG. 3 is a flowchart of an example method 300 for execution by a computing device 100 for maximizing information value of web content. Although execution of method 300 is described below with reference to computing device 100 of FIG. 1, other suitable devices for execution of method 300 may be used, such as computing device 250 of FIG. 2. Method 300 may be implemented in the form of executable instructions stored on a machine-readable storage medium, such as storage medium 120, and/or in the form of electronic circuitry.
[0038] Method 300 may start in block 305 and continue to block 310, where computing device 100 collects sharing statistics for web content from a data computing device. Sharing statistics can be collected from multiple data devices and for any number of users at each of those devices. In block 315, a characteristic vector is created for each item of web content based on the sharing statistics. For example, a novelty vector and a popularity vector can be created based on the sharing statistics.
[0039] For a social media source, a certain set of properties for each post (e.g., age, number of reshares, favorites, etc.) can be tracked. The properties that define the state of each web content item at each instant t are its novelty (i.e., time since publication) and popularity (i.e., number of reshares of the item). In order to have a finite set of states E, the possible values of novelty and number of reshares are discretized, resulting in 10 different values for novelty and 10 different values for popularity. At this stage, each state can be represented as a 2-vector (n, p) with n, p ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. In addition to these 100 states, the unknown state (i.e., state 0) is also considered. Each item initially starts in the unknown state and also ends in the unknown state (i.e., the unknown state serves as both the source and the sink).
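The 101-state space can be enumerated with a simple integer encoding, sketched below. The particular mapping from (n, p) to the integers 1 through 100 is an assumption; any one-to-one mapping that reserves 0 for the unknown state would serve.

```python
UNKNOWN_STATE = 0  # source and sink state
LEVELS = 10        # discretized values per characteristic


def encode_state(n, p):
    """Map a (novelty, popularity) pair with n, p in 1..10 to an integer in 1..100."""
    if not (1 <= n <= LEVELS and 1 <= p <= LEVELS):
        raise ValueError("n and p must be between 1 and 10")
    return (n - 1) * LEVELS + p


def decode_state(s):
    """Inverse of encode_state; returns None for the unknown state."""
    if s == UNKNOWN_STATE:
        return None
    return (s - 1) // LEVELS + 1, (s - 1) % LEVELS + 1


all_states = [UNKNOWN_STATE] + [
    encode_state(n, p)
    for n in range(1, LEVELS + 1)
    for p in range(1, LEVELS + 1)
]
```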
[0040] In order to set the reward and the values of the properties that define each state, the novelty and popularity of the web content items posted during an observation period are considered. Limits for the characteristics vectors can be set based on the observed data. For example, if dealing with a social media source that favors novelty, the limits between the different novelty intervals can be as follows:
Figure imgf000013_0002
In this example, the novelty state n = i contains the items that were posted between limn[i] and limn[i + 1] - 1 minutes before the current time of observation.
[0041] With respect to the popularity of the items, it is observed that the number of reshares per item is distributed according to a power law distribution, where the majority of the items receive fewer than 100 reshares whereas a very small percentage of items are reshared more than 1000 times. In order to set the popularity of the states, the items are sorted according to the number of times they are reshared and split into equal-sized subsets. In this example, the limits between the different intervals that define the state are:
Figure imgf000013_0003
So, the popularity state p = j contains the items that have been reshared between limp[j] and limp[j + 1] - 1 times before the current time of observation.
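Given a list of lower limits for each characteristic, the interval that contains an observed value can be found with a binary search. The limit values below are hypothetical stand-ins, since the values shown in the figures are not reproduced in this text, and the helper name is likewise an assumption.

```python
from bisect import bisect_right

# Hypothetical lower limits for the 10 intervals of each characteristic.
LIM_N = [0, 1, 2, 3, 5, 8, 15, 30, 60, 120]      # minutes since publication
LIM_P = [0, 1, 2, 4, 8, 16, 50, 100, 500, 1000]  # number of reshares


def interval_index(value, limits):
    """Return the 1-based index i with limits[i - 1] <= value < limits[i].

    The last interval is open-ended, so any value at or above the final
    limit falls into interval len(limits).
    """
    if value < limits[0]:
        raise ValueError("value is below the first limit")
    return bisect_right(limits, value)


# An item posted 7 minutes ago that has been reshared 120 times.
state = (interval_index(7, LIM_N), interval_index(120, LIM_P))
```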
[0042] In block 320, the characteristic vectors are normalized so that each of the possible values applies to an equal sized subset of the web content. Further, the normalized characteristic vector can also be modified to reflect an average number of reshares per minute. Each possible value in a characteristic vector is attributed with a reward that is used to calculate the utility of an item of web content.
[0043] In this example, the reward of each state can be set to
Figure imgf000013_0001
where rn and rp are the normalized average numbers of reshares per interval: that is, the average number of reshares received between limn[i] and limn[i + 1] - 1 minutes after publication in the case of novelty, and the average number of total reshares received by those items that have received between limp[i] and limp[i + 1] - 1 reshares in the case of popularity, which results in
Figure imgf000014_0001
The reward when p = 1 is not zero but, in order to conserve the reward of the novelty in r(n, 1) for n ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, the average number of reshares in this set is considered to be 1.
[0044] In block 325, a Markov process is applied to the normalized characteristics vectors to determine a state for each item of web content. Specifically, the Markov process can be applied to the normalized characteristic vectors of the web content to dynamically determine transition probabilities of the states for the items of web content. In block 330, the items of web content are ordered based on the transition probabilities. Because the transition probabilities are dynamically determined, the ordering of the web content can be updated in real-time as the characteristics of the items change. In some cases where user interface real estate is limited, the items of web content displayed can be restricted to, for example, the top 10 items. Further, the refresh rate of items of web content can also be dependent on their ordering (i.e., higher priority items can be refreshed with new values more frequently).
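The ordering and display step can be sketched as a sort over precomputed state rankings. The rank table below echoes the example ranks mentioned for FIG. 4 and is otherwise hypothetical, as are the function names and the refresh policy (doubling the refresh interval for every ten positions in the ordering).

```python
def order_items(item_states, state_rank, top_k=10):
    """Order items by the rank of their current state (rank 1 = highest priority).

    item_states: {item_id: state}; state_rank: {state: rank}.
    Returns the full ordering and the top_k items to display.
    """
    ordered = sorted(
        item_states,
        key=lambda item: (state_rank[item_states[item]], item),  # ties by id
    )
    return ordered, ordered[:top_k]


def refresh_interval_seconds(position, base=30):
    """Assumed policy: items lower in the ordering are refreshed less often."""
    return base * 2 ** (position // 10)


state_rank = {(6, 2): 1, (6, 3): 2, (5, 2): 4, (6, 5): 5}
item_states = {"a": (6, 5), "b": (6, 2), "c": (5, 2), "d": (6, 3)}
ordered, top = order_items(item_states, state_rank, top_k=2)
```

Re-running order_items whenever an item changes state keeps the displayed subset current without recomputing the state rankings themselves.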
[0045] Method 300 may then continue to block 335, where method 300 may stop.
[0046] FIG. 4 is a diagram of an example index rankings map 400 that is ordered to maximize information value. The index rankings map 400 shows rankings for items of web content, which are ordered according to a G index, the value of which is indicated on a node associated with each item. As shown in the known states 408 for the web content, state (6; 2) has the largest G index, state (6; 3) the second-largest, and so on. Index rankings map 400 has a popularity axis 404 and a novelty axis 406. The absolute values of the indices are not as important as their relative orders, and items should be displayed according to the relative order of the indices of their states. For example, state (6; 2), which has a G index of 1, is not the most novel but is the most popular, and a display of . On the other hand, state (5; 2), which has a G index of 4, is less popular but more novel than state (6; 5), which has a G index of 5. Also, because the algorithm gives high index values to potentially valuable states, the unknown state 402, which gives no reward, should have a higher display priority than other states with a positive reward. Further, the influence of popularity on the output is higher than the influence of novelty.
[0047] The foregoing disclosure describes a number of examples for improving information value of web content by a computing device. In this manner, the examples disclosed herein enable improving information value by using characteristics vectors that are based on sharing statistics to dynamically order items of the web content.

Claims

CLAIMS We claim:
1. A computing device for maximizing information value of web content, the computing device comprising:
a processor to:
collect sharing statistics for a plurality of web content items from a data computing device, wherein the sharing statistics comprises time-dependence data and temporal comparisons;
generate a characteristic 2-vector for each of the plurality of web content items based on the sharing statistics, wherein the characteristic 2-vector comprises a plurality of novelty values and a plurality of popularity values;
normalize the characteristic 2-vector of each web content item of the plurality of web content items;
apply a Markov process to each web content item of the plurality of web content items to determine a corresponding transition probability of a plurality of transition probabilities based on a normalized, characteristic 2-vector associated with the web content item; and
continually order the plurality of web content items based on the plurality of transition probabilities, wherein a subset of the plurality of web content items are displayed according to the order of the plurality of web content items.
2. The computing device of claim 1, wherein the ordering of the plurality of web content items is performed using a Bertsimas-Nino-Mora adaptive greedy algorithm.
3. The computing device of claim 1, wherein a refresh rate for updating each transitional probability of the plurality of transitional probabilities is determined by the order of a corresponding web content item of the plurality of web content items.
4. The computing device of claim 1, wherein the processor is further to select a state from the corresponding normalized, characteristic 2-vector, wherein the state is associated with a utility reward that is used to determine the corresponding transition probability.
5. The computing device of claim 1, wherein the data computing device provides a social media service, and wherein the temporal comparisons are obtained by comparing the social media service to a community-managed content source.
6. The computing device of claim 1, wherein the plurality of popularity values are determined using a plurality of reshares that satisfy a power law distribution.
7. A method for maximizing information value of web content, the method comprising:
collecting sharing statistics for a plurality of web content items from a data computing device, wherein the sharing statistics comprises time-dependence data and temporal comparisons;
generating a characteristic 2-vector for each of the plurality of web content items based on the sharing statistics, wherein the characteristic 2-vector comprises a plurality of novelty values and a plurality of popularity values;
normalizing the characteristic 2-vector of each web content item of the plurality of web content items;
applying a Markov process to each web content item of the plurality of web content items to determine a corresponding transition probability of a plurality of transition probabilities based on a normalized, characteristic 2-vector associated with the web content item; and
continually using an adaptive greedy algorithm to order the plurality of web content items based on the plurality of transition probabilities, wherein a subset of the plurality of web content items are displayed according to the order of the plurality of web content items.
8. The method of claim 7, wherein a refresh rate for updating each transitional probability of the plurality of transitional probabilities is determined by the order of a corresponding web content item of the plurality of web content items.
9. The method of claim 7, further comprising selecting a state from the corresponding normalized, characteristic 2-vector, wherein the state is associated with a utility reward that is used to determine the corresponding transition probability.
10. The method of claim 7, wherein the data computing device provides a social media service, and wherein the temporal comparisons are obtained by comparing the social media service to a community-managed content source.
11. The method of claim 7, wherein the plurality of popularity values are determined using a plurality of reshares that satisfy a power law distribution.
12. A non-transitory, machine-readable storage medium encoded with instructions executable by a processor for maximizing information value of web content, the machine-readable storage medium comprising instructions to:
collect sharing statistics for a plurality of web content items from a data computing device, wherein the sharing statistics comprises time-dependence data and temporal comparisons;
generate a characteristic 2-vector for each of the plurality of web content items based on the sharing statistics, wherein the characteristic 2-vector comprises a plurality of novelty values and a plurality of popularity values;
normalize the characteristic 2-vector of each web content item of the plurality of web content items;
select a state of a plurality of states from the corresponding normalized, characteristic 2-vector for each web content item of the plurality of web content items, wherein the state is associated with a utility reward of a plurality of utility rewards;
apply a Markov process to each web content item of the plurality of web content items to determine a corresponding transition probability of a plurality of transition probabilities based on the utility reward associated with the web content item; and
continually order the plurality of web content items based on the plurality of transition probabilities, wherein a subset of the plurality of web content items are displayed according to the order of the plurality of web content items.
13. The storage medium of claim 12, wherein the ordering of the plurality of web content items is performed using a Bertsimas-Nino-Mora adaptive greedy algorithm.
14. The storage medium of claim 12, wherein a refresh rate for updating each transitional probability of the plurality of transitional probabilities is determined by the order of a corresponding web content item of the plurality of web content items.
15. The storage medium of claim 12, wherein the plurality of popularity values are determined using a plurality of reshares that satisfy a power law distribution.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2015/031263 WO2016186634A1 (en) 2015-05-15 2015-05-15 Maximizing information value of web content

Publications (1)

Publication Number Publication Date
WO2016186634A1 true WO2016186634A1 (en) 2016-11-24

Family

ID=57318954

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/031263 WO2016186634A1 (en) 2015-05-15 2015-05-15 Maximizing information value of web content

Country Status (1)

Country Link
WO (1) WO2016186634A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070282698A1 (en) * 2006-05-19 2007-12-06 Huberman Bernardo A Determining most valuable ordering of items for presentation
US20110252022A1 (en) * 2010-04-07 2011-10-13 Microsoft Corporation Dynamic generation of relevant items
US20120185329A1 (en) * 2008-07-25 2012-07-19 Anke Audenaert Method and System for Determining Overall Content Values for Content Elements in a Web Network and for Optimizing Internet Traffic Flow Through the Web Network
US8874559B1 (en) * 2012-10-01 2014-10-28 Google Inc. Ranking and ordering items in user-streams

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SANDRA SERVIA-RODRIGUEZ ET AL.: "Deciding what to display: maximizing the information value of social media", ARXIV:1411.3124, 12 November 2014 (2014-11-12), XP055331335, Retrieved from the Internet <URL:http://arxiv.org/abs/1411.3214> *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10997250B2 (en) 2018-09-24 2021-05-04 Salesforce.Com, Inc. Routing of cases using unstructured input and natural language processing
US11755655B2 (en) 2018-09-24 2023-09-12 Salesforce, Inc. Routing of cases using unstructured input and natural language processing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15892740

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15892740

Country of ref document: EP

Kind code of ref document: A1