WO2013064505A1 - Method and system for determining a popularity of online content - Google Patents

Info

Publication number
WO2013064505A1
Authority
WO
WIPO (PCT)
Prior art keywords
function
popularity
content
content object
values
Application number
PCT/EP2012/071503
Other languages
French (fr)
Inventor
Mohamed Ahmed
Stella Spagna
Saverio Niccolini
Original Assignee
Nec Europe Ltd.
Application filed by Nec Europe Ltd.
Publication of WO2013064505A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9574 Browsing optimisation, e.g. caching or content distillation of access to content, e.g. by caching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Definitions

  • the system further comprises a cache for caching the online content of the server wherein the cache is configured to be operable to transmit and/or to remove the new content object of the server from the cache according to a determined popularity.
  • Fig. 5 illustrates the mean and standard deviation of content object popularity over time with a conventional method.
  • Fig. 6 illustrates the prediction error distribution of a conventional linear regression model.
  • Fig. 1 illustrates the number of content requests and the rate of change for a content object.
  • In Fig. 1, the horizontal axis shows time and the vertical axis shows the number of content requests.
  • the first graph 1 shows a cumulative number of content object requests over time.
  • the second curve 2 shows the corresponding rate of change 2.
  • the rate-of-change curve 2 peaks where curve 1 has its steepest ascent.
  • these graphs may be analyzed and modeled by fitting them against known distributions, for example a Poisson or a Pareto distribution.
  • Fig. 2 illustrates the self-similarity of content objects grouped together, as measured by the cluster spread.
  • it shows the performance of an embodiment of a method according to the invention.
  • it shows the self-similarity of content objects within the same cluster over two different time intervals, the first and the last four hours of a real content object data set.
  • a cluster denotes the content objects assigned to the same element of the trend popularity set.
  • the graph shows the absolute cluster spread in each of the intervals. Each interval contains a set of so-called template behaviors defining how content elements behave within the interval.
  • the absolute cluster spread is calculated from the average distance of each content object to the template it is associated with.
  • the results are given in absolute numbers to show the accuracy.
  • the results in Fig. 2 show that the identified clusters of content are tightly grouped, with on average a small standard deviation.
  • the number of actual hits on content objects varies from tens to thousands.
  • the total number of outliers presented in Fig. 2 is fewer than 500 in a data set of over 25,000 content objects.
  • Fig. 3 illustrates the prediction error when rating new content objects in different time intervals.
  • Fig. 3 shows, in absolute numbers, the prediction error when classifying a new content object in each time interval, obtained from a randomized trial with a test set and a training set. Each time interval comprises a set of template behaviors. Fig. 3 therefore illustrates the prediction error per time window when using a feature space derived by an embodiment of the method according to the present invention, and shows the difference between the number of hits a given content object attains and the number of hits at the centroid it is associated with.
  • Fig. 4 illustrates a cache management system according to a first embodiment of the present invention.
  • in Fig. 4, clients C1, C2, C3 are connected to a central cache C located in an internet service provider network ISPN.
  • the cache C is further connected to a server S hosting content objects CO1, ..., COm and to a news server NS hosting new online content Cnew.
  • the server S and the news server NS are located in the internet INET.
  • the cache C is further connected to rating means R, which are connected to a database D for storing a trend popularity set.
  • the cache C analyzes requests from the clients C1, C2, C3 and decides whether to forward a content object request to the server S or to redirect it to itself, so as to serve the cached online content object from the cache C itself. This avoids internetwork traffic between the internet service provider network ISPN and the internet INET.
  • the rating means R are connected to the cache C to analyze the popularity of the online content objects stored, i.e. cached, in the cache C according to a trend popularity set.
  • the rating means R categorize the new online content Cnew through an element of the trend popularity set and further decide, for example, whether the popularity of the new online content object Cnew is above a threshold, in which case Cnew is cached in the cache C to give the clients C1, C2, C3 faster access to it.
  • the present invention provides a concept of "shared attention" as a fundamental feature in the process of estimating the popularity evolution category of a given content, i.e. using not only the time series of requests for the content but also relating it to the overall number of content object requests observed.
  • the present invention uses minimal, easy-to-gather variables to derive the "shared attention" feature. It further provides a derivation of accurate, archetypal content popularity evolution templates based on the "shared attention" feature, and a derivation of transitions between the content popularity evolution classes over time to predict the future popularity of content with uncertain demand. Finally, it optimizes the process by stopping the classification of a content object once it has reached "saturation", e.g. 95% of its estimated total number of content object requests.
  • the present invention further provides a more precise and fine-grained categorization of online content object popularity, which may help to improve online content replication and replacement decisions. This is particularly important when storage space for replicating online content is limited, for example in a distributed caching scenario in which small caches are deployed near users.
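The two evaluation measures described for Figs. 2 and 3 can be illustrated with a minimal sketch. It makes three assumptions. First, "distance" to a template is Euclidean. Second, templates are represented directly as per-interval hit counts, whereas the patent derives them from energy values. Third, a new object is classified by its nearest template over the intervals observed so far. The template names and numbers are hypothetical.

```python
import math

def absolute_cluster_spread(series, assignment, templates):
    """Per template: mean and standard deviation of the (assumed Euclidean)
    distance between each assigned content object and the template (cf. Fig. 2)."""
    dists = {}
    for cid, values in series.items():
        tid = assignment[cid]
        dists.setdefault(tid, []).append(math.dist(values, templates[tid]))
    spread = {}
    for tid, ds in dists.items():
        mean = sum(ds) / len(ds)
        std = math.sqrt(sum((d - mean) ** 2 for d in ds) / len(ds))
        spread[tid] = (mean, std)
    return spread

def classify(observed, templates):
    """Assign an object to the template closest to its first len(observed) values."""
    k = len(observed)
    return min(templates, key=lambda tid: math.dist(observed, templates[tid][:k]))

def prediction_error(actual, template):
    """Per-interval absolute difference in hits between an object and its centroid (cf. Fig. 3)."""
    return [abs(a - t) for a, t in zip(actual, template)]

# Hypothetical templates and observations (hits per interval).
templates = {"early_peak": [50, 20, 5, 1], "slow_burn": [5, 10, 20, 40]}
series = {"a": [48, 22, 5, 1], "b": [6, 9, 21, 38]}
assignment = {cid: classify(s, templates) for cid, s in series.items()}
print(assignment)  # {'a': 'early_peak', 'b': 'slow_burn'}

new = [45, 25, 6, 2]
tid = classify(new[:2], templates)                 # only two intervals observed so far
print(tid, prediction_error(new, templates[tid]))  # early_peak [5, 5, 1, 1]
```

Classifying on the first intervals only mirrors the goal of rating new content early. Its later intervals then serve to measure the error.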

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Library & Information Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a method for determining a popularity of a content object of online content, comprising a plurality of content objects, stored in a content delivery network, comprising the steps of a) Determining relative interest values for at least one content object of the online content with regard to at least one other content object in at least one predetermined time-interval of a time period, b) Determining rate-of-change values of the relative interest values in the at least one time-interval, c) Calculating popularity values for the at least one content object in the at least one time-interval according to an energy function, the energy function being dependent on a first function and a second function, wherein the first function is a function of the determined rate-of-change values according to step b) and the second function is a function of the determined relative interest values according to step a), d) Determining a trend popularity set with elements for content object behaviour based on a time-series of calculated popularity values for the at least one content object, and e) Determining the popularity of a new content object according to one of the elements of the determined trend popularity set. The invention also relates to a corresponding system and a use of the method and/or the system.

Description

METHOD AND SYSTEM FOR DETERMINING
A POPULARITY OF ONLINE CONTENT
The invention relates to a method for determining a popularity of a content object of online content comprising a plurality of content objects, stored in a content delivery network.
The invention also relates to a system for determining a popularity of a content object of online content comprising a plurality of content objects, stored in a content delivery network, comprising at least one server for hosting the online content.
Determining the popularity of online content is particularly applicable in content delivery networks, for example for traffic and/or cache management under limited capacity, such as limited storage for caching online content objects locally.
For example, in the non-patent literature of Kristina Lerman and Tad Hogg, "Using a Model of Social Dynamics to Predict Popularity of News", Proceedings of the 19th International World Wide Web Conference, 2010, available under http://www.isi.edu/~lerman/papers/wfp0788-learman.pdf, a so-called feature space defining the popularity of online content is defined that takes into account the social network of users.
In the non-patent literature of Lee, Jong Gun, Moon, Sue and Salamatian, Kave, "An Approach to Model and Predict the Popularity of Online Contents with Explanatory Factors", Proceedings of the 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Volume 01, 2010, which is available under http://www-rp.lip6.fr~lee/pdf/wi2010_lee.pdf, the authors make use of easily observable features. They combine the lifetime of the content objects, the comments and number of links in the first hours, and the number of views with respect to the availability of the online content to determine the popularity of the online content.
In the non-patent literature "Characterizing and Modelling Popularity Evolution of User-generated Videos", IFIP PERFORMANCE 2011, of Youmna Borghol, Siddhartha Mitra, Sebastien Ardon, Niklas Carlsson, Derek Eager and Anirban Mahanti, which is available under www.nicta.com.au/pub?doc=4406, a behaviour of groups of content objects is analyzed to derive distributions. These distributions model different aspects of the content's popularity evolution. Their feature space is based on the aggregate properties of content objects at different time intervals, and the results can only be used to describe the aggregate statistics of collections of content objects. However, most YouTube videos experience little popularity growth in the first week after their submission.
In the non-patent literature of Reed, Colorado, Elvers, Todd and Srinivasan, Padmini, "What's Trending? Mining Topical Trends in UGC Systems with YouTube as a Case Study", The Eleventh International Workshop on Multimedia Data Mining (MDMKDD 2011), which is available under http://user.cis.fiu.edu/~lzhen001/activities/KDD2011Program/workshops/WKS11/doc/a4_ree, a mechanism is shown to identify trending topics. Terms associated with content objects are analyzed, and a weighted directed graph is created that captures the relationship between these terms. Trending topics and their associated content objects are then identified by analyzing the directed graph.
The non-patent literature "Estimation Methods for Ranking Recent Information" of Miles Efron and Gene Golovchinsky, ACM SIGIR 2011, which is available under http://people.lis.illinois.edu/~mefron/papers/legSIGIR2011.pdf, focuses on attaching temporal aspects to a text query, taking into account the age of a term or query when assessing its temporal properties.
In the non-patent literature of Szabo, Gabor and Huberman, Bernardo A., "Predicting the Popularity of online content", Communications of the ACM, August 2010, which is available under http://www.hpl.hp.com/research/idl/papers/predictions/predictions.pdf, a linear regression method to predict the popularity of online content is shown. The number of hits a given content object receives in a given time interval is used to predict the number of hits the online content is expected to achieve a defined number of steps ahead.
The non-patent literature of Flavio Figueiredo, Fabricio Benevenuto and Jussara Almeida, "The Tube Over Time: Characterizing Popularity Growth of YouTube Videos", Proceedings of the 4th ACM International Conference on Web Search and Data Mining (WSDM'11), February 2011, which is available under http://vod.dcc.ufmg.br/traces/youtime/wsdm339d-figueiredo.pdf, the non-patent literature of Symeon Papadopoulos, Athena Vakali and Ioannis Kompatsiaris, "The Dynamics of Content Popularity in Social Media", IJDWM, 2010, which is available under http://www.irma-international.org/viewtitle/38952/, and the non-patent literature of Xu Cheng, Cameron Dale and Jiangchuan Liu, "Statistics and Social Network of YouTube Videos", In Proc. of IEEE IWQoS, 2008, which is available under http://www.cs.sfu.ca/~jcliu/Papers/YouTube-IWQoS2008.pdf, show further methods for analyzing popularity growth and content popularity in social media, in particular of YouTube videos.
In the non-patent literature of J. Yang and J. Leskovec, "Patterns of Temporal Variation in Online Media", ACM International Conference on Web Search and Data Mining (WSDM), 2011, which is available under http://cs.stanford.edu/people/jure/pubs/memeshapes-wsdm11.pdf, wavelet analysis is applied to the complete per-interval lifetime hits associated with the content. Shape matching is performed to identify the similarity between different content objects. Content object identification is therefore generally based on the content hit graph.
For example, one of the disadvantages of the method shown in the non-patent literature of Figueiredo, Benevenuto and Almeida is that such a fitting method is time-consuming to perform, sensitive to changes and must be input by experts. A further disadvantage is that a large set of observations, i.e. content requests, must be available to reduce the estimation error of fitting a given model to an acceptable level.
It is therefore an objective of the present invention to provide a method and a system for determining a popularity of a content object of online content, which is more accurate.
It is a further objective of the present invention to provide a method and a system for determining a popularity of a content object of online content which require less calculation time and/or memory usage. It is an even further objective of the present invention to provide a method and a system for determining a popularity of a content object of online content which require fewer observations without reducing the accuracy of the determined popularity of the content objects.
In accordance with the invention the aforementioned objectives are accomplished by the method of claim 1 and the system of claim 15 and the use of a method and/or a system according to claim 19.
According to claim 1 the method for determining a popularity of a content object of online content, comprising a plurality of content objects, stored in a content delivery network, is characterized by the steps of a) Determining relative interest values for at least one content object of the online content with regard to at least one other content object in at least one predetermined time-interval of a time period,
b) Determining rate-of-change values of the relative interest values in the at least one time-interval,
c) Calculating popularity values for the at least one content object in the at least one time-interval according to an energy function, the energy function being dependent on a first function and a second function, wherein the first function is a function of the determined rate-of-change values according to step b) and the second function is a function of the determined relative interest values according to step a),
d) Determining a trend popularity set with elements for content object behaviour based on a time-series of calculated popularity values for the at least one content object, and
e) Determining the popularity of a new content object according to one of the elements of the determined trend popularity set.
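Steps a) and b) leave open exactly how the relative interest values are computed. The following is a minimal sketch under two assumptions. The relative interest of an object in an interval is taken to be its share of all requests observed in that interval. The rate of change is taken to be the first difference between consecutive intervals. The content identifiers and counts are hypothetical.

```python
def relative_interest(hits):
    """Step a) sketch (assumed definition): share of all requests in each
    interval that went to each content object.

    hits: dict content_id -> list of request counts per interval (equal lengths).
    """
    n = len(next(iter(hits.values())))
    totals = [sum(series[i] for series in hits.values()) for i in range(n)]
    return {cid: [s / t if t else 0.0 for s, t in zip(series, totals)]
            for cid, series in hits.items()}

def rate_of_change(values):
    """Step b) sketch (assumed definition): first difference between
    consecutive intervals, with 0 for the first interval."""
    return [0.0] + [b - a for a, b in zip(values, values[1:])]

hits = {"CO1": [10, 30, 10], "CO2": [30, 10, 10]}
rel = relative_interest(hits)
print(rel["CO1"])                  # [0.25, 0.75, 0.5]
print(rate_of_change(rel["CO1"]))  # [0.0, 0.5, -0.25]
```

Because the values are shares, a drop in an object's interest can come from other objects gaining requests, which is the relative ("with regard to at least one other content object") aspect of step a).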
According to claim 15, a system for determining a popularity of a content object of online content, a plurality of content objects stored in a content delivery network, preferably performing a method according to one of the claims 1-14, comprising at least one server for hosting the online content, is characterized by First determining means for determining relative interest values for at least one content object of the online content with regard to at least one other content object in at least one predetermined time-interval of a time period, Second determining means for determining rate-of-change values (2) of the relative interest values in the at least one time-interval,
Calculating means for calculating popularity values for the at least one content object in the at least one time-interval according to an energy function, the energy function being dependent on a first function and a second function, wherein the first function is a function of the determined rate-of-change values and the second function is a function of the determined relative interest values,
Reduction means for determining a trend popularity set with elements for content object behaviour based on a time-series of calculated popularity values for the at least one content object, and
Rating means for determining the popularity of a new content object according to one of the elements of the determined trend popularity set.
According to claim 19, the method according to one of claims 1-14 and/or the system according to one of claims 15-18 is used for cache management in network caches.
According to the invention it has first been recognized that the method and the system for determining a popularity of a content object provide accurate determinations for the popularity of content objects without overspecializing the trend popularity set.
According to the invention it has also been recognized for the first time that users do not have to choose between the accuracy of a determination of a popularity of a content object and a specialization of a trend popularity set with respect to certain very limited content objects.
According to the invention it has further been recognized for the first time that the determined trend popularity set is simple and informative, capturing the time-evolution of content objects. According to the invention it has further been recognized that the trend popularity set determined on the basis of energy function values provides a relative measure of content objects with respect to others, the so-called "shared attention" metric. From the determined trend popularity set, a precise categorisation or fitting of the popularity of a new content object is provided.
Further features, advantages and preferred embodiments of the present invention are to be found in the following subclaims of claim 1 respectively of claim 15.
According to a preferred embodiment of the method according to claim 1, the first function is further a function of c1) the maximum of the value one and the maximum of the rate-of-change values for the at least one content object over all time-intervals according to step b). Such a first function captures the movement of the interest paid to a content object. The first function is based on the request values for the content object over the predetermined time-interval of interest.
According to a further preferred embodiment, the second function is dependent on c2) the cumulated relative interest values according to step a) in all time-intervals, and/or c3) the cumulated relative interest values according to step a) in all time-intervals of the time-period and over all content objects of the online content. The second function represents a simple embodiment of the "shared attention" a certain content object receives with respect to other content objects of the online content over the time-interval of interest. This "shared attention" may be based on the number of observed requests for a given content object and the overall set of observed content object requests, again subject to the vantage point and time-interval of interest.
An energy function based on c1), c2) and c3) may be expressed as follows:
$$E_j = f\big(g(d_j(i),\, c_1),\; h(c_2,\, c_3)\big)$$
wherein g is the first function and h is the second function. According to a further preferred embodiment, the first function is a fraction of the values of step b) and the value according to c1), and/or the second function is a fraction of the values according to c2) and c3). This provides an easy and simple way to ensure a fast calculation of the first and second functions. The popularity values for the at least one content object may then be calculated with minimal memory usage. The energy function may then be expressed as follows:
$$E_j = \frac{d_j(i)}{\max\{\max_{i=0,\dots,T} d_j(i),\, 1\}} \cdot \frac{\sum V_j(i)}{\sum V(i)}$$
where
• $E_j$: energy of content j
• $d_j(i)$: rate of change (ROC) of content j in the time interval i
• $\max\{\max_{i} d_j(i), 1\}$: maximum of 1 and the ROC of content j over the intervals $i = 0, \dots, T$
• $V_j(i)$: cumulative number of hits for content j at $T_i$
• $V(i)$: cumulative number of hits for all content at $T_i$
and $T_i$ are the time periods.
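The energy function above can be computed per interval with a short sketch. Because the source formula is partly garbled, two readings are assumptions here: the two factors are multiplied, and both sums run over the intervals observed so far (0 to i). The max{·, 1} in the denominator keeps the first function g from being inflated when every rate-of-change value is small.

```python
def energy_series(d, v_j, v_all):
    """Energy E_j(i) of one content object j for every interval i (sketch).

    d:     rate-of-change values d_j(i) of content j
    v_j:   cumulative hits V_j(i) of content j at T_i
    v_all: cumulative hits V(i) of all content at T_i
    Assumptions: the two factors multiply and both sums run over intervals 0..i.
    """
    c1 = max(max(d), 1.0)                  # max{max_i d_j(i), 1}
    energies = []
    for i in range(len(d)):
        g = d[i] / c1                      # first function: movement of interest
        total = sum(v_all[: i + 1])
        # second function: "shared attention" of content j among all content
        h = sum(v_j[: i + 1]) / total if total else 0.0
        energies.append(g * h)
    return energies

# Hypothetical values for one content object over four intervals.
print(energy_series(d=[0, 2, 4, 1], v_j=[1, 3, 7, 8], v_all=[10, 20, 30, 40]))
```

The resulting per-object time series of energies is what step d) would reduce to a trend popularity set.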
According to a further preferred embodiment, the energy function is calculated with the first function and/or the second function being weighted. This enhances the flexibility of the method by using a weighted combination of the "shared attention", represented by the second function, and the movement of interest in the online content object, represented by the first function. For example, more emphasis may be put on the first function, so that the evolution of the online content object popularity based on when it starts to grow or peak is accentuated.
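A weighted variant can be sketched as follows. It assumes a weighted sum of the per-interval values of g and h. The text says only that the functions are weighted, so both the form of the combination and the default weights are hypothetical.

```python
def weighted_energy(g, h, w_g=0.5, w_h=0.5):
    """Per-interval weighted combination of the first function g (movement of
    interest) and the second function h ("shared attention").

    Assumption: a weighted sum; the source does not specify the combination.
    """
    if len(g) != len(h):
        raise ValueError("g and h must cover the same time intervals")
    return [w_g * a + w_h * b for a, b in zip(g, h)]

g = [0.0, 1.0, 0.5]   # hypothetical first-function values
h = [0.2, 0.2, 0.2]   # hypothetical second-function values
print(weighted_energy(g, h, w_g=0.8, w_h=0.2))  # emphasizes when interest grows or peaks
```

Raising w_g makes the energy track the growth and peak of interest more closely, which is the emphasis the example in the text describes.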
According to a further preferred embodiment, the elements of the trend popularity set are determined by a clustering algorithm, preferably a probabilistic clustering algorithm, with centroids forming the elements of the trend popularity set. The use of a clustering algorithm, preferably a probabilistic one, provides a reliable reduction of the calculated popularity values to a limited number of elements of the trend popularity set without high calculation effort. According to a further preferred embodiment, the clustering algorithm uses a similarity function defining a similarity between at least two content objects. This is preferably a negative squared error function, or a log-likelihood function of a probability function defining the probability of a content object being a potential element for the other content object. This similarity function indicates how well a content object is suited to be the centroid for the other content object. An easy-to-calculate similarity function is, for example, the negative squared error function. If a probability model is available that defines the probability that a content object is a centroid for another content object, a log-likelihood function is an easy-to-calculate and suitable similarity function that limits the amount of calculation. Results with good convergence and accuracy are nonetheless achievable.
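The two similarity choices can be compared in a short sketch. The log-likelihood version assumes an isotropic Gaussian probability model with standard deviation sigma, which the text does not specify. Under that assumption it is an increasing affine function of the negative squared error, so both rank candidate centroids identically.

```python
import math

def neg_squared_error(x, y):
    """s(x, y) = -||x - y||^2; values closer to 0 mean y suits x better as centroid."""
    return -sum((a - b) ** 2 for a, b in zip(x, y))

def gaussian_log_likelihood(x, y, sigma=1.0):
    """log p(x | centroid y) under an assumed isotropic Gaussian with std sigma."""
    n = len(x)
    return (neg_squared_error(x, y) / (2 * sigma ** 2)
            - n * math.log(sigma * math.sqrt(2 * math.pi)))

x = [0.4, 0.9, 0.3]                           # hypothetical energy time series
candidates = [[0.5, 1.0, 0.2], [0.1, 0.2, 0.9], [0.4, 0.8, 0.3]]
print(max(candidates, key=lambda c: neg_squared_error(x, c)))
print(max(candidates, key=lambda c: gaussian_log_likelihood(x, c, sigma=0.5)))
```

The log-likelihood form pays off only when a richer probability model is available, for example per-template variances. With a single fixed sigma it adds nothing over the squared error.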
According to a further preferred embodiment, the clustering algorithm uses a responsibility function defining a suitability between at least two content objects. It reflects the suitability of the potential element being a centroid for the at least one other content object with respect to other potential elements for the at least one other content object. The responsibility function indicates or reflects how suitable a first content object is to serve as centroid for another content object, taking into account other potential centroids for the other content object.
According to a further preferred embodiment the clustering algorithm uses an availability function defining a suitability for at least one of the content objects to choose a potential element as element for the trend popularity set. The availability function indicates or reflects how appropriate it would be for a content object to choose another content object as centroid, i.e. an availability of the other content object to be a centroid. This enables the clustering algorithm to define elements of the trend popularity set with good accuracy and in a reasonable time. Calculation time and memory usage may thus be reduced.
According to a further preferred embodiment the availability function is a function of the responsibility function and/or the responsibility function is a function of a similarity function. Calculation effort may therefore be reduced and the functions can be easily implemented, in particular in the following way: Initially all data points are considered as potential centroids, and then a message exchange process is started between points to converge to the most likely centroids. The steps performed successively are:
1. Initialise the similarity value (negative squared error or log-likelihood).
2. Update the responsibilities given the availabilities.
3. Update the availabilities given the responsibilities.
4. Combine availabilities and responsibilities to monitor the exemplar/centroid decision and terminate the algorithm when these decisions have not changed for N iterations (converged).
5. Extract centroids i.e. points with the highest availability.
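The message-passing procedure in steps 1 to 5 matches affinity propagation (Frey and Dueck, 2007). The sketch below is a plain-Python version under several assumptions: negative squared error as the similarity, the median similarity as the preference, and damping of 0.5. For exemplar extraction it uses the standard affinity propagation criterion a(k,k) + r(k,k) > 0 in place of "highest availability". It is an illustration, not the patent's implementation.

```python
def affinity_propagation(points, damping=0.5, max_iter=200, convergence_iter=15):
    """Affinity propagation sketch following steps 1-5 above.

    points: list of equal-length sequences (e.g. per-object energy time series).
    Returns (exemplar indices, label per point); a label is its exemplar's index.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # 1. Similarity: negative squared error. The preference s(k, k) is the median
    #    off-diagonal similarity (a common default, assumed here).
    S = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[k])) for k in range(n)]
         for i in range(n)]
    off = sorted(S[i][k] for i in range(n) for k in range(n) if i != k)
    m = len(off)
    pref = off[m // 2] if m % 2 else (off[m // 2 - 1] + off[m // 2]) / 2
    for k in range(n):
        S[k][k] = pref
    R = [[0.0] * n for _ in range(n)]   # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]   # availabilities a(i, k)
    last, stable = None, 0
    for _ in range(max_iter):
        # 2. Responsibilities: r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(vals[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # 3. Availabilities from the positive responsibilities candidate k receives.
        for k in range(n):
            pos = [max(0.0, R[j][k]) for j in range(n)]
            total = sum(pos) - pos[k]          # sum over i' != k
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
        # 4. Exemplar decision; stop once unchanged for `convergence_iter` iterations.
        exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
        stable = stable + 1 if exemplars == last else 0
        last = exemplars
        if exemplars and stable >= convergence_iter:
            break
    # 5. Extract centroids and attach every other point to its most similar one.
    if not exemplars:
        raise RuntimeError("no exemplars found; try more iterations or a higher preference")
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels

# Two hypothetical, well-separated groups of 2-interval popularity series.
points = [[0, 0], [0.1, 0], [0, 0.1], [10, 10], [10.1, 10], [10, 10.1]]
exemplars, labels = affinity_propagation(points)
print(exemplars, labels)
```

The number of clusters is not fixed in advance. It follows from the preference value, and raising the preference yields more, finer-grained elements in the trend popularity set.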
According to a further preferred embodiment, in step e) at least one user constraint is included. This further increases flexibility, since user requirements can be taken into account.
According to a further preferred embodiment, a replication rate of the new content object in a future time interval is determined on the basis of the popularity of the new content object determined according to step e). This enables an accurate and easy forecast of the replication rate of a new content object in a future time interval. For example, the expected replication rate may be used to determine how long the new content object is stored in the cache of an internet service provider, so that the provider's users get fast access to the content object. A generic formula for the replication rate including user constraints may be implemented in the following way:
$$r_i(t) = f\left(C, \phi_i(t)\right)$$
where
$r_i(t)$ = the replication rate of content $i$ at time $t$,
$\phi_i(t)$ = the expected popularity of content $i$ at time $t$, as derived from the model,
$C$ = the user constraints.

According to a further preferred embodiment, at least one of the elements of the trend popularity set comprises element lifetime information, preferably in the form of a maximum number of requests. The classification of a content object in subsequent time intervals may then be stopped once it has reached the end of its lifetime. This reduces the number of content objects that must be checked when defining the trend popularity set, because such objects are "end of line".
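The text leaves the function f unspecified. As a purely hypothetical illustration, the sketch below encodes the user constraints as minimum and maximum replica counts plus a per-replica request capacity, converts the expected popularity of a content object (read here as expected requests per interval) into a replica count, and adds the lifetime check just described. The choice of f and all names are assumptions, not taken from the patent.

```python
import math
from dataclasses import dataclass


@dataclass
class UserConstraints:
    """Hypothetical encoding of the user constraints C."""
    min_replicas: int
    max_replicas: int
    requests_per_replica: float  # requests one replica can serve per time interval


def replication_rate(expected_popularity: float, c: UserConstraints) -> int:
    """Hypothetical f(C, phi): replicas needed for phi, clamped to the user's bounds."""
    needed = math.ceil(expected_popularity / c.requests_per_replica)
    return max(c.min_replicas, min(c.max_replicas, needed))


def still_classified(requests_so_far: int, max_requests: int) -> bool:
    """Lifetime check: keep classifying only until the maximum number of requests is reached."""
    return requests_so_far < max_requests
```

With a capacity of 100 requests per replica and bounds of 1 to 4 replicas, an expected popularity of 250 requests yields 3 replicas, while very low or very high expectations are clamped to the user's bounds.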
According to a further preferred embodiment of the system according to claim 16, the system comprises a client, and the first and/or second determining means inspect data traffic from the client requesting a new content object from the server hosting it. The determining means are then able to redirect this traffic to a corresponding cache instead of to the server hosting the content object. Traffic into other networks, which incurs additional costs, is thus avoided.
According to a further preferred embodiment, the system comprises a database used for storing at least the trend popularity set. A database provides a flexible and cost-effective means of storing at least the trend popularity set. Of course, the database may also be used for storing user constraints, etc.
According to a further preferred embodiment, the system further comprises a cache for caching the online content of the server, wherein the cache is configured to transmit and/or remove the new content object of the server from the cache according to a determined popularity. This enables effective cache management for online content. For example, content objects whose popularity has decreased may be removed, freeing memory for caching content objects with higher popularity. This also provides faster delivery of content objects to users via the cache. Furthermore, inter-network traffic, i.e. traffic into other networks, is reduced, saving inter-network costs.
There are several ways to design and further develop the teaching of the present invention in an advantageous way. To this end, reference is made, on the one hand, to the claims dependent on claim 1 and claim 15 and, on the other hand, to the following explanation of preferred embodiments of the invention, illustrated by the drawing. In connection with the explanation of the preferred embodiments with the aid of the drawing, generally preferred embodiments and further developments of the teaching are also explained.
In the drawing
Fig. 1 is illustrating a number of content requests and a rate of change of a content object;
Fig. 2 is illustrating self-similarity of content objects grouped together as measured by a cluster spread;
Fig. 3 is illustrating a prediction error when rating new content objects in different time intervals;
Fig. 4 is illustrating a cache management system according to a first embodiment of the present invention;
Fig. 5 is illustrating a mean and standard deviation of a content object popularity over time with a conventional method; and
Fig. 6 is illustrating a prediction error distribution using a conventional linear regression model.
Fig. 1 is illustrating a number of content requests and a rate of change of a content object.
Fig. 1 shows time on the horizontal axis and the number of content requests on the vertical axis. The first graph 1 shows the cumulative number of requests for a content object over time. The second graph 2 shows the corresponding rate of change. The rate of change (graph 2) is highest where graph 1 rises most steeply. Conventional approaches to estimating the popularity of online content analyze and model such graphs by fitting them against known distributions, for example a Poisson or a Pareto distribution.
Fig. 2 is illustrating self-similarity of content objects grouped together as measured by a cluster spread.
Fig. 2 shows the performance of an embodiment of the method according to the invention: the self-similarity of content objects within the same cluster over two different time intervals, namely the first and the last four hours of a real content object data set. A cluster consists of the content objects assigned to the same element of the trend popularity set. The graph shows the absolute cluster spread in each of the intervals. Each interval contains a set of so-called template behaviors, defining how content objects behave within the interval. The absolute cluster spread is calculated as the average distance of each content object from the template it is associated with. The results are given in absolute numbers to show the accuracy. They show that the identified clusters of content are tightly grouped, with a small standard deviation on average, while the number of actual hits on content objects varies from tens to thousands. The outliers shown in Fig. 2 number fewer than 500 in a data set of over 25000 content objects.
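As a sketch of the spread metric for one time interval, the code below assumes each content object and each template is a vector of popularity values over the time slots of that interval, and that "distance" means Euclidean distance; the text does not name the metric, and the function and argument names are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence


def absolute_cluster_spread(objects: Dict[str, Sequence[float]],
                            assignment: Dict[str, str],
                            templates: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Average distance of the content objects in each cluster to their template."""
    distances: Dict[str, List[float]] = {}
    for obj_id, series in objects.items():
        template_id = assignment[obj_id]
        # Euclidean distance between the object's series and its template (assumption)
        distances.setdefault(template_id, []).append(math.dist(series, templates[template_id]))
    return {template_id: mean(ds) for template_id, ds in distances.items()}
```

A small spread for a template means that the objects assigned to it follow its behavior closely, which is what a tightly grouped cluster in Fig. 2 indicates.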
Fig. 3 is illustrating a prediction error when rating new content objects in different time intervals.
Fig. 3 shows, in absolute numbers, the prediction error associated with classifying a new content object in each time interval, obtained from a randomized trial with a test set and a training set. Each time interval comprises a set of template behaviors. Fig. 3 therefore illustrates the prediction error per time window when using the feature space derived by an embodiment of the method according to the present invention, i.e. the difference between the number of hits a given content object attains and the number of hits of the centroid it is associated with.
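A minimal sketch of this evaluation, under the assumptions that a new content object is assigned to the template whose feature vector is nearest in Euclidean distance and that each template carries a representative hit count; the names are hypothetical.

```python
import math
from typing import Dict, Sequence


def classify(features: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Assign a new content object to the template with the nearest feature vector."""
    return min(templates, key=lambda t: math.dist(features, templates[t]))


def prediction_error(actual_hits: int, template: str, centroid_hits: Dict[str, int]) -> int:
    """Absolute difference between the object's hits and the hits of its centroid."""
    return abs(actual_hits - centroid_hits[template])
```

For example, an object with features [0.85, 0.7] is closer to a "viral" template at [0.9, 0.8] than to a "slow" one at [0.1, 0.1]; if that centroid has 1500 hits and the object actually attains 1320, the error is 180.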
Fig. 4 is illustrating a cache management system according to a first embodiment of the present invention.
Fig. 4 shows clients C1, C2, C3 connected to a central cache C located in an internet service provider network ISPN. The cache C is further connected to a server S hosting content objects CO1, ..., COm and to a news server NS hosting new online content Cnew. The server S and the news server NS are located in the internet INET. The cache C is also connected to rating means R, which are connected to a database D for storing a trend popularity set.
The cache C analyzes requests from the clients C1, C2, C3 and decides whether to forward a content object request to the server S or to redirect it to itself, serving the online content object from the cache C. This avoids inter-network traffic between the internet service provider network ISPN and the internet INET. The rating means R, connected to the cache C, analyze the popularity of the online content objects cached in the cache C according to a trend popularity set. If a client requests new content Cnew, the rating means R categorize this new online content Cnew by assigning it to an element of the trend popularity set and decide, for example, whether the popularity of Cnew is above a threshold; if so, Cnew is cached in the cache C so that it is available more quickly to the clients C1, C2, C3.
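The caching decision of Fig. 4 can be sketched as a small admission and eviction policy. This is a toy model, not the patent's cache: it assumes the rating means R have already produced an expected-popularity score for each requested object, admits an object only if that score reaches a threshold, and, when the cache is full, evicts the least popular cached object, in line with the removal of less popular objects described for the cache embodiment. The class and method names are hypothetical.

```python
from typing import Dict


class PopularityCache:
    """Toy model of cache C with rating means R (hypothetical API)."""

    def __init__(self, capacity: int, threshold: float) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.threshold = threshold
        self.store: Dict[str, float] = {}  # object id -> expected popularity

    def handle_request(self, obj_id: str, expected_popularity: float) -> str:
        """Return "cache" if served locally, "origin" if fetched from the server."""
        if obj_id in self.store:
            return "cache"  # served inside the ISP network, no traffic to INET
        if expected_popularity >= self.threshold:
            if len(self.store) >= self.capacity:
                victim = min(self.store, key=lambda k: self.store[k])
                if self.store[victim] >= expected_popularity:
                    return "origin"  # everything cached is at least as popular
                del self.store[victim]  # free space held by a less popular object
            self.store[obj_id] = expected_popularity  # keep a copy after fetching it
        return "origin"
```

Objects rated below the threshold are always fetched from the origin server, so the limited cache space is reserved for content expected to be requested again.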
Fig. 5 is illustrating a mean and standard deviation of a content object popularity over time with a conventional method.
Fig. 5 shows the large variance in online content popularity, which is heteroscedastic, i.e. grows with time, under conventional popularity prediction models that rank groups of content objects rather than individual content objects. Fig. 5 shows that, because of this large variance, conventional modeling techniques are unsuitable for modeling individual content objects.
Fig. 6 is illustrating a prediction error distribution using a conventional linear regression model.
Fig. 6 shows a conventional linear-regression-based method for predicting the popularity of online content, as described for example by Szabo, G. and Huberman, Bernardo A. The number of hits of a given online content object in a given time interval is used to project the number of hits the content object will achieve at future times. Although this conventional method achieves high accuracy on a few data samples, it is prone to very large errors. Fig. 6 shows that the prediction error decreases as time increases, because the standard deviation across content objects falls as user attention to the content objects saturates; this does not mean that the linear regression performs well.
In summary, the present invention introduces the concept of "shared attention" as a fundamental feature in estimating the popularity evolution category of a given content object, i.e. it uses not only the time series of requests for the content object but also relates it to the overall number of content object requests observed. The present invention provides: the derivation of the "shared attention" feature from a minimal set of easy-to-gather variables; the derivation of accurate, archetypal content popularity evolution templates based on the "shared attention" feature; the derivation of transitions between the content popularity evolution classes over time, to predict the future popularity of content with uncertain demand; and an optimization of the process by stopping the classification of a content object once it has reached "saturation", e.g. 95% of its estimated total number of requests.
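Steps a) to c) of claims 1 to 5 can be illustrated with a short sketch. Several readings here are assumptions, not taken from the text: the relative interest value ("shared attention") is taken as an object's share of all requests in an interval; the rate of change is the difference between consecutive shares; the cumulated values c2 and c3 are summed over the intervals observed so far; and the energy function is a weighted sum of the two functions with hypothetical weights `w1` and `w2`. The resulting per-object time series would then be the input to the clustering of step d).

```python
from typing import Dict, List


def popularity_values(hits: Dict[str, List[int]],
                      w1: float = 0.5, w2: float = 0.5) -> Dict[str, List[float]]:
    """Per-interval popularity ("energy") values for each content object.

    hits maps a content object id to its request counts per time interval;
    all lists must have the same length.
    """
    objects = list(hits)
    n_intervals = len(hits[objects[0]])
    totals = [sum(hits[o][t] for o in objects) for t in range(n_intervals)]
    # a) relative interest: the object's share of all requests in interval t (assumption)
    rel = {o: [hits[o][t] / totals[t] if totals[t] else 0.0 for t in range(n_intervals)]
           for o in objects}
    # b) rate of change between consecutive intervals (the first interval is compared with 0)
    roc = {o: [rel[o][t] - (rel[o][t - 1] if t else 0.0) for t in range(n_intervals)]
           for o in objects}
    energy: Dict[str, List[float]] = {}
    for o in objects:
        c1 = max(max(roc[o]), 1.0)  # c1: max of the rate-of-change values and the value one
        c2 = c3 = 0.0
        values = []
        for t in range(n_intervals):
            c2 += rel[o][t]                        # c2: cumulated interest of this object
            c3 += sum(rel[p][t] for p in objects)  # c3: cumulated interest of all objects
            f1 = roc[o][t] / c1                    # first function (claims 2 and 4)
            f2 = c2 / c3 if c3 else 0.0            # second function (claims 3 and 4)
            values.append(w1 * f1 + w2 * f2)       # weighted energy function (claim 5)
        energy[o] = values
    return energy
```

Because each share is at most one here, the denominator c1 mainly guards against dividing by a very small maximum; with a different definition of relative interest it would also normalize the rate of change.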
The present invention further shows an application of the method and the system to derive the relative expected demand for content objects, and an application to caching and cache management systems. However, it is not limited to caching but also offers potential in content advert placement: knowing which content object will become popular, and when, creates an opportunity for users to proactively choose to advertise next to specific content objects and to discover new markets. For example, the expected popularity of content could be used to buy advertising, by enabling users to explicitly select the online content objects against which their advertisements are placed. If users are able to sell as well as buy advertising speculatively, for example the right to place their advert next to a given content object in three days' time for 24 hours, and to resell options on that purchase, a new kind of market for online advertising is created.
Conventional online advertising markets are based on keyword terms: advertisers bid for the terms they would like their content to be associated with. The present invention overcomes the inefficiency of this approach, which cannot easily reach poorly categorized content, by taking advantage of the millions of poorly categorized content objects that receive sharp, momentary spikes of popularity, for example viral memes. The present invention enables advertising to be placed with such content objects in an ad hoc manner, realizing the real value of these undervalued content objects.
The present invention also provides a more precise, fine-grained categorization of online content popularity, which may help to improve online content replication and replacement decisions. This is particularly important where storage space for replicating online content is limited, for example in a distributed caching scenario in which small caches are deployed near users.
Many modifications and other embodiments of the invention set forth herein will come to the mind of one skilled in the art to which the invention pertains having the benefit of the teachings presented in the foregoing description and the associated drawings. Therefore, it is to be understood that the invention is not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims
1. A method for determining a popularity of a content object of online content, comprising a plurality of content objects, stored in a content delivery network, characterized by the steps of
a) Determining relative interest values for at least one content object of the online content with regard to at least one other content object in at least one predetermined time-interval of a time period,
b) Determining rate-of-change values (2) of the relative interest values in the at least one time-interval,
c) Calculating popularity values for the at least one content object in the at least one time-interval according to an energy function, the energy function being dependent on a first function and a second function, wherein the first function is a function of the determined rate-of-change values according to step b) and wherein the second function is a function of the determined relative interest values according to step a),
d) Determining a trend popularity set with elements for content object behaviour based on a time-series of calculated popularity values for the at least one content object, and
e) Determining the popularity of a new content object according to one of the elements of the determined trend popularity set.
2. The method according to claim 1, characterized in that the first function is further a function of c1) the maximum of the rate-of-change values for the at least one content object over all time-intervals according to step b) and the value one.
3. The method according to one of the claims 1-2, characterized in that the second function is dependent on c2) cumulated relative interest values according to step a) in all time-intervals, and/or c3) cumulated relative interest values according to step a) in all time-intervals of the time-period and over all content objects of the online content.
4. The method according to claim 2 or claim 3, characterized in that the first function and/or the second function is a fraction of the values according to c1) and the values according to step b) and/or of the values according to c2) and c3).
5. The method according to one of the claims 1-4, characterized in that the energy function is calculated dependent on the first function and/or the second function being weighted.
6. The method according to one of the claims 1-5, characterized in that the elements of the trend popularity set are determined by a clustering algorithm, preferably a probabilistic clustering algorithm, with centroids forming the elements of the trend popularity set.
7. The method according to one of the claims 1-6, characterized in that the clustering algorithm uses a similarity function defining a similarity between at least two content objects, preferably in form of a negative squared error function or a log-likelihood-function of a probability function defining a probability of a content object being a potential element for the other content object.
8. The method according to one of the claims 1-7, characterized in that the clustering algorithm uses a responsibility function defining a suitability between at least two content objects reflecting suitability of the potential element being a centroid for the at least one other content object with respect to other potential elements for the at least one other content object.
9. The method according to one of the claims 1-8, characterized in that the clustering algorithm uses an availability function defining a suitability for at least one of the content objects to choose the potential element as element for a trend popularity set.
10. The method according to one of the claims 7-9, characterized in that the availability function is a function of the responsibility function and/or the responsibility function is a function of the similarity function.
11. The method according to one of the claims 7-10, characterized in that the elements of the trend popularity set are defined according to the highest values of the availability function.
12. The method according to one of the claims 1-11, characterized in that in step e) at least one user constraint is included.
13. The method according to one of the claims 1-12, characterized in that a replication rate of the new content object in a future time-interval is determined on the basis of determined popularity of the new content object according to step e).
14. The method according to one of the claims 1-13, characterized in that at least one of the elements of the trend popularity set comprises element lifetime information, preferably in form of a maximum number of requests.
15. A system for determining a popularity of a content object of online content, comprising a plurality of content objects stored in a content delivery network, preferably for performing a method according to one of the claims 1-14, comprising at least one server for hosting the online content, characterized by
First determining means for determining relative interest values for at least one content object of the online content with regard to at least one other content object in at least one predetermined time-interval of a time period,
Second determining means for determining rate-of-change values (2) of the relative interest values in the at least one time-interval,
Calculating means for calculating popularity values for the at least one content object in the at least one time-interval according to an energy function, the energy function being dependent on a first function and a second function, wherein the first function is a function of the determined rate-of-change values and wherein the second function is a function of the determined relative interest values,
Reduction means for determining a trend popularity set with elements for content object behaviour based on a time-series of calculated popularity values for the at least one content object, and
Rating means for determining the popularity of a new content object according to one of the elements of the determined trend popularity set.
16. The system according to claim 15, characterized in that the system comprises a client and that the first and/or second determining means inspect data traffic from the client requesting the new content object to the server hosting the new content object.
17. The system according to one of the claims 15-16, characterized in that the system comprises a database used for storing at least the trend popularity set.
18. The system according to one of the claims 15-17, characterized in that the system further comprises a cache for caching the online content of the server wherein the cache is configured to be operable to transmit to and/or to remove the new online content of the server from the cache according to a determined popularity.
19. Use of a method according to one of the claims 1-14 and/or a system according to one of the claims 15-18 for cache management in network caches.
PCT/EP2012/071503 2011-10-31 2012-10-30 Method and system for determining a popularity of online content WO2013064505A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP11008693 2011-10-31
EP11008693.1 2011-10-31

Publications (1)

Publication Number Publication Date
WO2013064505A1 true WO2013064505A1 (en) 2013-05-10

Family

ID=47263251

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2012/071503 WO2013064505A1 (en) 2011-10-31 2012-10-30 Method and system for determining a popularity of online content

Country Status (1)

Country Link
WO (1) WO2013064505A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002025496A2 (en) * 2000-09-22 2002-03-28 Clearway Acquisition, Inc. Serving dynamic web-pages
US20040260769A1 (en) * 2003-06-18 2004-12-23 Junji Yamamoto Method and apparatus for distributed cache control and network system
US20070118498A1 (en) * 2005-11-22 2007-05-24 Nec Laboratories America, Inc. Methods and systems for utilizing content, dynamic patterns, and/or relational information for data analysis

Non-Patent Citations (8)

Title
J. YANG; J. LESKOVEC: "Patterns of Temporal Variation in Online Media", ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM), 2011
KRISTINA LERMAN; TAD HOGG: "Using a Model of Social Dynamics to Predict Popularity of News", PROCEEDINGS OF THE 19TH INTERNATIONAL WORLD WIDE WEB CONFERENCE, 2010, Retrieved from the Internet <URL:http://www.isi.edu/-Ierman/papers/wfpO788-learman.pdf>
JONG GUN LEE; SUE MOON; KAVE SALAMATIAN: "An Approach to Model and Predict the Popularity of Online Contents with Explanatory Factors", PROCEEDINGS OF THE 2010 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY, vol. 01, 2010, Retrieved from the Internet <URL:http:llwww-rp.lip6.fr~lee/pdf/w12010_lee.pdf>
PROCEEDINGS OF THE 4TH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM'11), February 2011 (2011-02-01)
SYMEON PAPADOPOULOS; ATHENA VAKALI; IOANNIS KOMPATSIARIS: "The Dynamics of Content Popularity in Social Media", IJDWM, 2010, Retrieved from the Internet <URL:http://www.irma-international.org/viewtitle/38952>
GABOR SZABO; BERNARDO A. HUBERMAN: "Predicting the Popularity of Online Content", COMMUNICATIONS OF THE ACM, August 2010 (2010-08-01), Retrieved from the Internet <URL:http://www.hpl.hp.com/research/idl/papers/predictions/predictions.pdf>
XU CHENG; CAMERON DALE; JIANGCHUAN LIU: "Statistics and Social Network of YouTube Videos", PROC. OF IEEE IWQOS, 2008, Retrieved from the Internet <URL:http://www.cs.sfu.ca/-jcliu/PapersNouTube-IWQoS2008.pdf>
YOUMNA BORGHOL; SIDDHARTHA MITRA; SEBASTIEN ARDON; NIKLAS CARLSSON; DEREK EAGER; ANIRBAN MAHANTI: "Characterizing and Modelling Popularity Evolution of User-generated Videos", IFIP PERFORMANCE, 2011

Cited By (11)

Publication number Priority date Publication date Assignee Title
WO2015001494A1 (en) * 2012-02-23 2015-01-08 Ericsson Television Inc. System and method for delivering content in a content delivery network
US9253051B2 (en) 2012-02-23 2016-02-02 Ericsson Ab System and method for delivering content in a content delivery network
US9438487B2 (en) 2012-02-23 2016-09-06 Ericsson Ab Bandwith policy management in a self-corrected content delivery network
US9800683B2 (en) 2012-02-23 2017-10-24 Ericsson Ab Bandwidth policy management in a self-corrected content delivery network
US10572550B2 (en) 2014-07-24 2020-02-25 Yandex Europe Ag Method of and system for crawling a web resource
WO2016075135A1 (en) * 2014-11-10 2016-05-19 Nec Europe Ltd. Method for storing objects in a storage and corresponding system
US9756370B2 (en) 2015-06-01 2017-09-05 At&T Intellectual Property I, L.P. Predicting content popularity
US10412432B2 (en) 2015-06-01 2019-09-10 At&T Intellectual Property I, L.P. Predicting content popularity
US10757457B2 (en) 2015-06-01 2020-08-25 At&T Intellectual Property I, L.P. Predicting content popularity
CN117134997A (en) * 2023-10-26 2023-11-28 中电科大数据研究院有限公司 Edge sensor energy consumption attack detection method, device and storage medium
CN117134997B (en) * 2023-10-26 2024-03-01 中电科大数据研究院有限公司 Edge sensor energy consumption attack detection method, device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12794208

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12794208

Country of ref document: EP

Kind code of ref document: A1