WO2016005664A1 - Method and system for producing a content journal - Google Patents

Method and system for producing a content journal

Info

Publication number
WO2016005664A1
Authority
WO
WIPO (PCT)
Prior art keywords
content
posts
original
publishing
database
Prior art date
Application number
PCT/FI2015/050491
Other languages
French (fr)
Inventor
Heikki PELKKIKANGAS
Original Assignee
Next News Media Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Next News Media Oy
Publication of WO2016005664A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Definitions

  • Search engines such as Google crawl and index this content and make it available through keyword-based search, which makes search engines one of the primary gateways to Internet content.
  • the user is limited to news media and websites the user is already aware of, but typically Internet users are not willing to browse through more than 3-5 websites and therefore the user usually only sees a subset of all the content.
  • News aggregators offer a partial solution in that they show the topics from all major news media sites, but they do not distinguish between interesting and less interesting topics, and they lack most of the non-media websites such as blogs, social networks and video sites like YouTube.
  • Internet content is anything created in the Internet by individual users or corporations such as news media companies.
  • a piece of content is identified by a Uniform Resource Locator (URL), which consists of the domain address of the website followed by a unique identifier that provides access to the individual piece of content provided by the web site in the form of web pages.
  • the content piece is typically created from text, images, sound and video, which are packaged into a webpage using HyperText Markup Language (HTML). HTML is the standard markup language used to create web pages.
  • the type of content can be for example news, blog posts, Youtube videos, Soundcloud (an online audio distribution platform) sounds or social networking system content.
  • Google News is a free news aggregator service on the Internet provided and operated by Google Inc, selecting most up-to-date information from thousands of publications by an automatic aggregation algorithm.
  • Google uses its own software to determine which stories to show from the online news sources it watches. Human editorial input does come into the system, however, in choosing exactly which sources Google News will pick from.
  • Google News provides searching, and the choice of sorting the results by date and time of publishing or grouping them. Users can request e-mail "alerts" on various keyword topics by subscribing to Google News Alerts.
  • Recently Google News has implemented the anchor news method defined in US patent application 2013/0298000 which is used with the top news topic.
  • a social networking service is a platform service to build social networks or social relations among people who share interests and other connections.
  • Most social network services are web-based and provide means for users to interact over the Internet.
  • Social network sites are varied and incorporate new information and communication tools such as mobile connectivity, photo/video sharing and blogging.
  • Social networking sites allow users to share ideas, pictures, social content items (in the form of posts, messages, comments, and updates), activities, events, and interests with people in their network.
  • Social interaction among people in which they create, share or exchange information and ideas in virtual communities and networks is also called social media.
  • Examples of different types of social media, social networks and social networking services are collaborative projects (for example, Wikipedia), microblogs (for example, Twitter), social news networking sites (for example, Digg and Leakernet), content communities (for example, YouTube and DailyMotion), and social networking sites (for example, Facebook and Twitter).
  • Facebook is an online social networking service that allows quick sharing of web pages with friends on Facebook. Users must register before using the site, after which they may communicate with other friends on Facebook. The communication can take place through private or public messages, as well as a chat feature, and users can share content that includes website URLs, images, and video content.
  • Twitter, another social networking service, is an online social networking and microblogging service that enables users to send and read short 140-character text messages, called "tweets". Registered users can read and post tweets, but unregistered users can only read them. Users access Twitter through the website interface, SMS, or mobile device app.
  • a retweet is someone else's tweet that a user chooses to share with all of their followers. It can be a reply to a tweet that includes the original message, or a tweet that includes a link to a news article or blog post that the user finds particularly interesting, and it can link to other tweets.
  • Retweet is used on the Twitter Web site to show tweeting of a content that has been posted by another user. Hashtags are created for adding additional context and metadata to tweets.
  • On Twitter, blogs, and other social media sites, a follower is someone who subscribes to receive a user's updates.
  • a like button, like option, or recommend button is a feature in communication software such as social networking services, Internet forums, news websites and blogs where the user can express that they like, enjoy or support certain content.
  • buttons usually display the quantity of users who liked each content, and may show a full or partial list of them. This is a quantitative alternative to other methods of expressing reaction to content, like writing a reply text. Some websites also include a dislike button, so the user can either vote in favour, against or neutrally.
  • a hashtag is a form of metadata tag. Words in messages on microblogging and social networking services such as Twitter, Facebook, Google+ or Instagram may be tagged by putting "#" before them. Hashtags make it possible to group such messages, since one can search for the hashtag and get the set of messages that contain it.
  • a hashtag is only connected to a specific medium and can therefore not be linked and connected to pictures or messages from different platforms.
  • US patent application 2013/0198204 discloses a method for determining online significance of e.g. news links by using social media.
  • a set of content that is relevant to a topic is identified from various sources. The identification is based on a web crawler for the Internet systematically browsing the World Wide Web for online content publications.
  • a content interface can retrieve content items from designated sources, such as news feeds from a particular website.
  • the solution provides a mechanism to identify topics of interest amongst the general public, especially in order to determine what topics of interest and trends are currently trending in interest or awareness amongst an online population.
  • a plurality of social networking media are processed to find references to identified sets of content.
  • a score is then determined for each of the one or more content items.
  • the score can at least partly be based on the number of instances that the content item is referenced by the communications of the social networking media.
  • the scoring can be varied based on the type of social media.
  • a presentation can be provided that identifies a plurality of content items, as well as the score for each of the plurality of content items.
  • US patent application 2013/0298000 discloses a method for providing socially relevant content in a news domain with a news aggregator by means of comments from users in a social network and news from multiple sources.
  • a news page is created upon request from an individual user.
  • the method includes a step for providing an anchor story page for display of a given news story specified by the user in the request by a URL.
  • the anchor story page includes multiple links to news content items related to the news story and the retrieved social content items.
  • a social metric score is calculated for each of the plurality of content using social media trending information for ranking the contents.
  • a user receives a ranked or sorted list of content that is trending in the social media that may be presented as content.
  • a trending component uses an algorithm to select, rank and sort content based on the metrics.
  • US patent 8,578,274 discloses systems and methods for aggregating web feeds relevant to a geographical locale from multiple sources. Web feeds are filtered for qualifying content and for publication.
  • the object of the invention is to provide a system and method for generating a content journal about a selected subject, which Internet users can follow to stay up to date on the selected subject.
  • the invention is concerned with a method in a public telecommunications network for producing an automatically updated content journal about a selected subject from different sources by creating new content out of original content presented by a service provider in the network.
  • the method comprises pre-defining a timeframe for the content journal, search criteria used to search for posts with a reference to an original content about the selected subject from social networking systems, and threshold rules for selecting an original content to be used as a basis for creating and publishing new content in the journal about the selected subject.
  • Social networking systems are continuously searched for new posts that match the pre-defined search criteria and matching posts are retrieved into a database.
  • the network address of one or more original contents referenced in the matching posts are retrieved and stored in a database with a timestamp.
  • the database is continuously scanned for referenced original contents within the set timeframe on the basis of their timestamp and a value for referenced original contents is calculated on the basis of information in the matching posts.
  • the values are evaluated against the pre-defined threshold rules.
  • New content is created and published out of references to original content that match the predefined threshold rules, out of posts concerned and out of generated metadata.
  • a software program product of the invention executes an algorithm to perform the steps of the method.
  • the algorithm e.g. scans the database by using the value of the original content and the threshold value to make a decision of publishing and creating new content.
  • the system of the invention for producing an automatically updated content journal about a selected subject from different sources comprises a host server with an aggregator engine with means for continuously searching social networking systems for new posts that match pre-defined search criteria, retrieving matching posts into a database, and extracting references to one or more original contents from posts in the database and storing them in the same or other database with a timestamp.
  • a publishing engine in the host server executes an algorithm that continuously scans the database for referenced contents within a set timeframe on the basis of the timestamp, calculates a value for referenced contents on the basis of information in the matching posts, evaluates the value against pre-defined threshold rules, creates and publishes new content out of references to original content that match pre-defined threshold rules, out of posts concerned, and generated metadata, and publishes the new content in the content journal on a user interface.
  • the host server also comprises one or more databases for storing the retrieved matching posts, the references to one or more original contents with a time stamp, and generated metadata, and a user interface for publishing the content journal.
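The scan-and-evaluate step described above can be sketched as follows. This is a minimal illustration, assuming the referenced contents are held in memory as (URL, timestamp) pairs and that the value is simply the number of matching posts. The names `Reference` and `select_for_publishing`, the 24-hour timeframe and the threshold of 2 are hypothetical and are not taken from the application.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Reference:
    """One reference to an original content, extracted from a matching post."""
    url: str
    timestamp: datetime


def select_for_publishing(references, now, timeframe, threshold):
    """Return URLs whose reference count within the timeframe reaches the threshold."""
    cutoff = now - timeframe
    # Count only references whose timestamp falls inside the set timeframe.
    counts = Counter(r.url for r in references if cutoff <= r.timestamp <= now)
    # Keep URLs whose value (here, the plain count) reaches or exceeds the threshold.
    return sorted(url for url, value in counts.items() if value >= threshold)


now = datetime(2015, 7, 1, 12, 0)
refs = [
    Reference("http://example.org/a", now - timedelta(minutes=10)),
    Reference("http://example.org/a", now - timedelta(minutes=50)),
    Reference("http://example.org/a", now - timedelta(hours=30)),  # outside the timeframe
    Reference("http://example.org/b", now - timedelta(minutes=5)),
]
selected = select_for_publishing(refs, now, timedelta(hours=24), threshold=2)
print(selected)
```

Only the first URL is selected here, because the third reference to it is older than the timeframe and the second URL has a single reference.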
  • A journal here generally means a periodical dealing with content, especially matters or news of current interest, in the form of a record of interesting matters kept regularly for users.
  • A post is here meant to cover all kinds of social content items such as messages, comments, tweets and updates.
  • A click-through here means the process of a visitor or user clicking on a link, URL or another reference or network address and going to the original content source, such as a web site, blog etc.
  • a click-through can also be called an ad click or a request.
  • the click rate measures the number of times a source is clicked versus the number of times it is viewed.
  • “Original content” refers to the content that is referenced and discussed in the posts in the social networking systems.
  • New content refers to the content created and published in the invention.
  • the preferable embodiments of the invention have the characteristics of the subclaims.
  • Generated metadata includes information of the author of the post, information of the author of the original content, a screen capture of a part or whole of the original content and/or category information of the new content published.
  • the public telecommunications network is the Internet and the posts have references to original content in the form of links or network addresses, such as the Uniform Resource Locator, URL, and the social networking system is usually Twitter and/or Facebook but other social networks can also be used.
  • the original content usually resides in a web site but can reside in a blog or even in a social network.
  • Each such website, blog or social network is represented by a domain name such as youtube.com or a sub-division of a website such as youtube.com/channel which can be, for example, the Youtube channel of a content producer.
  • the invention provides a service that publishes only new content that is based on relevant news or other relevant content.
  • the relevancy is decided on the basis of threshold rules defined in advance.
  • the relevancy is determined in the invention based on publishing rules using threshold rules for calculating the importance of referenced contents.
  • the threshold rules are primarily based on a threshold value that original contents must reach in order to be used for creating new content to be published. A value is calculated for each original content referenced in the retrieved posts. In addition to the threshold value, there might be other threshold rules, like gender rules etc.
  • the value of the original contents is associated with the related posts from different aspects.
  • the value for each of said original content is defined on the basis of the number of such posts that are related to the original content in social networks (the social networking systems). Other factors might also influence the value, such as the author of the post.
  • An equation is created for obtaining a numerical score value weighing all defined factors.
  • the threshold value is defined as the score to be reached or exceeded so that the content in question would be published.
  • the threshold value is defined individually for the content source of each original content referenced in the posts in the social networking systems.
  • the method and the system of the invention are especially useful as a tool for professionals for efficient news or content presentation, wherein the selection of content to be published is performed by the service based on said publishing rules. Thus, the user himself does not need to make the selection of what is relevant.
  • An essential feature of the invention is that it can give a useful tool for e.g. journalists to produce an automatically updated content journal, wherein given topics are followed up.
  • the topics might include certain themes or types of content, main news and other news, sport, trends, and blogs etc.
  • the invention can also provide a platform service for user groups in order to produce the relevant content by themselves by using the functions of the system and method of the invention.
  • the invention does not present a ranked list of contents. Instead, the service only publishes information about contents that have passed a threshold as calculated by the algorithm used, thus avoiding the need for user effort in news selection.
  • the invention is described in more detail by means of some advantageous embodiments by referring to figures. The invention is not restricted to the details of these embodiments.
  • FIGURES
  • Figure 1 is an architecture view of a system, wherein the invention can be implemented.
  • Figure 2 is a block diagram illustrating a host server for aggregating relevant web feeds.
  • Figure 3 is a flow scheme of an embodiment of the method of the invention.
  • Figure 4 is an example of a user interface of the invention.
  • the invention can be implemented in a system architecture according to figure 1, presented as a block diagram, in which a host server 1 provides a service that automatically produces a content journal from content retrieved from social networking systems 4a - 4b and from web sites 2a - 2d, over a telecommunications network 3.
  • the web sites 2a - 2d can be, but are not limited to, news media sites.
  • Four web sites 2a - 2d are indicated in figure 1, but the number of sources to be used by the host server 1 is not limited in any way and there can be more or fewer of them.
  • the web sites 2a - 2d can be news media sites, blogs, Youtube channels, or any content sites and have content e.g.
  • the content has any applicable known or convenient form, such as multimedia, text, executables, video, images, audio etc. Each piece of content is referenced by a URL.
  • a web site can be a subdivision of a larger web site such as an individual Youtube channel.
  • the service is accessible through a user interface 6 via client devices that can be any device able to establish a connection with another device or server through said network 3.
  • the client devices typically include a display or other output functionality to present data exchanged between entities in the system.
  • a client device can for example be a Personal Computer (PC), a mobile computing device, a laptop computer, a handheld computer, a mobile phone, a smart phone, a Personal Digital Assistant (PDA) etc.
  • one or more client devices can be connected to each other.
  • the client devices can be connected to the network 3 via a dial up connection, a digital subscriber loop, cable modem or other type of connection and can communicate with the host server 1 that provides access to the service provided via the user interface 6, for example via a web browser.
  • the telecommunications network 3, over which the client devices communicate, may be a public network, such as the Internet, a telephonic network, or a private network, like an intranet and/or extranet.
  • the Internet can provide services through any known or convenient protocol, such as the Transmission Control Protocol / Internet Protocol (TCP/IP protocol) and/or the HyperText Transfer Protocol (HTTP) for HyperText Markup Language (HTML) that makes up the World Wide Web (the web).
  • the database 5 can store content data retrieved by the host server 1 from the web sites 2a - 2d and data, such as posts, from one or more social networking systems 4a, 4b, such as Twitter and/or Facebook. In some cases, content data can also be retrieved from the social networking systems 4a, 4b.
  • the data retrieved from the social networking systems 4a, 4b include posts, such as comments, tweets and retweets related to the content and authors of the posts.
  • An algorithm of the host server 1 analyzes the posts and performs selective filtering to ensure that content to be published is temporally relevant. Network addresses of web sites found relevant to be retrieved are filtered out.
  • the system of the invention generates metadata of e.g. the number of times the content has been shared in the social networking system (the share count) and of the authors of the posts.
  • the content data retrieved from the web sites 2a - 2d include a screen capture and possible other content information.
  • the titles of content provided on the retrieved network addresses, the publishing date and optionally category information of the content and/or the metadata about the content on these network addresses found to be relevant can be extracted by means of the network address.
  • the database 5 can be implemented via object-oriented technology and/or via text files, and can be managed by a distributed database management system, an object oriented database management system, a file system, and/or any other convenient or known database management package.
  • the host server 1 can communicate with the client devices 6a, the web sites 2a - 2d and the social networking systems 4a, 4b via the network 3.
  • the social networking systems 4a, 4b can include American services such as Facebook, Google+, YouTube, LinkedIn, Instagram, Pinterest, Tumblr and Twitter, which are widely used worldwide, but also services from other countries.
  • FIG. 2 is a block diagram illustrating a detailed example of a host server 1 for providing the service of producing a content journal.
  • the host server includes a network interface 1a, an aggregator engine 1b, a publishing engine 1c, a tracking engine 1d, a content database 1g, a post database 1e, an author data database 1f and a repository 1h for search terms, publishing rules implemented as threshold rules and display rules.
  • the content database 1g, the post database 1e, the author data database 1f and the repository 1h can be separate databases or be integrated in one single database. Therefore, when this text refers to the database 5, its content can be shared among several databases, such as these four, or there can be a single database 5 serving all of them.
  • the network interface 1a enables the host server 1 to mediate data in the network to the user device 6a (see figure 1) through any known or convenient protocol supported by the communicating entities.
  • the aggregator engine 1b can be implemented as software embodied in a computer readable medium or computer-readable storage medium on a machine, in firmware and/or hardware.
  • the aggregator engine 1b continuously searches and retrieves posts from social networking systems 4a and 4b (see figure 1) that match pre-defined search criteria by using search terms fetched from the search term repository 1h.
  • the aggregator engine 1b retrieves matching posts as text files, e.g. in the Extensible Markup Language format (XML format), and stores them into the post database 1e.
  • the posts reference original contents provided in e.g. web sites, and the aggregator engine 1b extracts identifier information, such as the network address of the original content referred to in the posts.
  • the identifier information identifies the original content source and is also a reference thereto usually including the network address of the content referenced, such as a Uniform Resource Identifier (URI) or a Uniform Resource Locator (URL).
  • URL also known as web address or network address, particularly when used with HTTP
  • URL is a specific character string that constitutes a reference to a resource.
  • An example of a typical URL would be "http://en.example.org/wiki/Main_Page".
  • a URL is technically a type of URI, but in many technical documents and verbal discussions, URL is often used as a synonym for URI.
  • the identifier information such as the URI, usually is a link to a special content or article provided by some of the web sites 2a - 2d.
  • When the aggregator engine 1b extracts references, such as links, network addresses and URLs, to one or more original contents from posts in the post database 1e, it stores them in the content database 1g with a timestamp. The aggregator engine 1b further extracts, from the posts retrieved as text files, the name of the author of the post and stores it in the author database 1f, and the date and/or time of the post and stores this information in the post database 1e. If the author already exists in the author database, a value "+1" is added in an author post count field, in which way the system of the invention keeps track of how many relevant posts a given author has in the post database 1e.
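As a rough sketch of this extraction step, assuming posts arrive as plain text and that the databases can be stood in for by in-memory Python structures: the regular expression, the names `content_db`, `author_post_count` and `ingest_post`, and the choice to start a new author's count at 1 are all illustrative assumptions, not details from the application.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone

# Simplified URL matcher (an assumption; real posts may need a stricter parser).
URL_PATTERN = re.compile(r"https?://\S+")

content_db = []                       # stand-in for the content database: (url, timestamp)
author_post_count = defaultdict(int)  # stand-in for the author database


def ingest_post(author, text, timestamp):
    """Extract referenced URLs from a post and update both stores."""
    # Strip trailing sentence punctuation that the simple pattern swallows.
    urls = [u.rstrip(".,;:!?)") for u in URL_PATTERN.findall(text)]
    for url in urls:
        content_db.append((url, timestamp))
    if urls:
        # Here a post counts as relevant only if it references some content.
        author_post_count[author] += 1
    return urls


ts = datetime(2015, 7, 1, 12, 0, tzinfo=timezone.utc)
found = ingest_post("alice", "Worth reading: http://en.example.org/wiki/Main_Page.", ts)
ingest_post("alice", "Again http://example.org/x and https://example.org/y", ts)
print(found, author_post_count["alice"], len(content_db))
```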
  • the aggregator engine 1b extracts, from the network address of the original content, a title of the original content, a screen capture of a part of or the full original content, the name of the author of the original content, category information, the date and/or time of the original content, and stores this information in the content database 1g.
  • the aggregator engine 1b is connected to the content database 1g that has identifier information of contents referenced in the social networking systems provided by the web sites 2a - 2d.
  • the identifier information can be in the form of a network address, such as the URI or URL.
  • the content database 1g can store both source lists (the web site addresses themselves of (original) content found relevant) and source metadata.
  • Source metadata can include said identifier information and optionally a short description of the type of source.
  • the aggregator engine 1b can also include a normalizing function to normalize data retrieved (i.e. posts, metadata and contents) into particular consistent data structures.
  • An example of a specified data structure for display is described in more detail in connection with figure 4.
  • the tracking engine 1d, implemented as software embodied in a computer-readable medium or storage medium on a machine, firmware or hardware, follows user behavior in the use of the content journal by counting the likes of the individual posts and/or the click-throughs of referenced contents, increasing the value of a post author by one for every added like or click-through.
  • the publishing engine 1c continuously scans the content database 1g for referenced contents within a set timeframe and calculates a value for referenced contents on the basis of information in the matching posts by means of information from the post database 1e. It evaluates the value against pre-defined publishing rules implemented as threshold rules.
  • the publishing engine 1c performs filtering on retrieved posts fulfilling the search criteria to determine whether they reference qualified content for publication and match publishing rules, i.e. reach or exceed a threshold value for publishing.
  • the publishing engine 1c has a filtering function executing an algorithm that calculates a value for each referenced content found.
  • the value of the contents is calculated by taking different factors into consideration. Primarily, the value for each original content referenced is defined on the basis of the number of such posts in social networks (the social networking systems). Other factors might also influence the value, such as the author of the post. An equation is created for obtaining a numerical score value weighting all defined factors. Any of these factors can be weighted in a desired way in defining the value and any of them can even be ignored.
  • each reference in a post gives one point and the value is simply the sum of the references (i.e. the number of posts, wherein this original content is referenced). If the value of an original content exceeds the threshold value, then a decision is made to use this original content to create new content and publish it in the content journal.
  • more factors are taken into consideration in the calculation of a value for each original content. These factors can be weighted in a desired way. For example, the most active authors (having the highest post-count values) and those who have more likes and click-throughs are weighted more.
  • An equation for calculating the value V for each original content could be:
  • V = number of posts + [X * likes] + [Y * click-throughs] + [Z * author post count]
  • In the simpler embodiments, X, Y and/or Z can be set to zero.
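The equation can be written directly as a small function. This is only a sketch: the function name, the parameter names and the sample weights are illustrative assumptions, and the weights X, Y and Z default to zero to mirror the simpler embodiments.

```python
def content_value(posts, likes=0, click_throughs=0, author_post_count=0,
                  x=0.0, y=0.0, z=0.0):
    """V = posts + X * likes + Y * click-throughs + Z * author post count."""
    return posts + x * likes + y * click_throughs + z * author_post_count


# Simplest embodiment: all weights are zero, so the value is the number of posts.
simple = content_value(posts=12, likes=40, click_throughs=100, author_post_count=7)

# Weighted embodiment with hypothetical weights.
weighted = content_value(posts=12, likes=40, click_throughs=100, author_post_count=7,
                         x=0.5, y=0.25, z=2.0)
print(simple, weighted)
```

With the hypothetical weights above, the weighted value is 12 + 20 + 25 + 14 = 71, while the simple value stays at 12.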
  • the algorithm evaluates a referenced content based on its value against a threshold value for publication, the threshold value determining whether the content is qualified for publishing or not.
  • the threshold value is defined as the score to be reached or exceeded so that the content in question would be used for creating new content for publishing.
  • the threshold value is defined individually for the content sources providing the contents being referenced in the posts in the social networks. This means that each original content provided by a content source, such as a URL, is evaluated against a threshold value defined individually for each domain or URL (or subdomain).
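One way to picture a per-source threshold is a lookup keyed by domain or by a sub-division of a site, falling back to a default. The sketch below assumes the most specific matching key wins; the table contents, the default of 10 and the helper names `threshold_for` and `qualifies` are hypothetical.

```python
from urllib.parse import urlparse

DEFAULT_THRESHOLD = 10  # assumed fallback for sources without their own rule
SOURCE_THRESHOLDS = {   # hypothetical per-source threshold values
    "youtube.com": 50,
    "youtube.com/channel": 20,
    "example.org": 5,
}


def threshold_for(url):
    """Return the threshold of the most specific source matching the URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    segments = [s for s in parsed.path.split("/") if s]
    # Try the host plus progressively shorter path prefixes, then the bare host.
    for depth in range(len(segments), -1, -1):
        key = "/".join([host] + segments[:depth])
        if key in SOURCE_THRESHOLDS:
            return SOURCE_THRESHOLDS[key]
    return DEFAULT_THRESHOLD


def qualifies(url, value):
    """True if the value reaches or exceeds the threshold of the URL's source."""
    return value >= threshold_for(url)


print(threshold_for("https://www.youtube.com/channel/abc"))
print(qualifies("http://example.org/story", 5))
```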
  • the publishing engine 1 c creates and publishes new content out of references and URLs to original content that match pre-defined threshold rules, out of posts concerned, and out of generated metadata.
  • the publishing engine can perform this e.g simultaneously with publishing new content.
  • the publishing engine 1 c then publishes the new content in the content journal on a user interface 1 a.
  • the publishing engine 1 b creates the new content by combining information of original content decided to be published and the related posts and other metadata mentioned above.
  • Different publishing rules might be applied. There might e.g. be rules defining in which way and order the posts are published together with the information on the content. A screen capture of qualified content exceeding the defined threshold value might be incorporated in the data structure defining the layout of the content journal.
  • the display rule repository 1h defines an optional group of user IDs and hashtags. Each of these is given its own weight depending on its importance. If these hashtags or user IDs occur in the related posts (posts related to original content that satisfies the publishing rules), those posts are ranked higher and are presented first, in an order determined by this weight. In this way, the system knows to present posts of e.g. the prime minister or another public person first. The same applies to interesting hashtags.
  • the new content created is stored in the content database 1g and presented on the user interface 1a.
  • the host server 1 comprises the databases, which can be integrated with the host server or be one or more external components. Relevant content is stored in the content database 1g. It is, however, not necessary to store the whole of the original content referenced in the posts, since the service can publish only its identifier information (such as a link to the content or a URL) and optionally a screen capture of the front page of such content.
  • the publishing engine 1c publishes the new content based on qualified content to be included in the content journal and to be accessible to the user through the user interface 6.
  • the publishing engine 1c publishes new content in the form of the data structure constructed by the aggregator engine 1b and stored in the content database 1g. For this purpose, the publishing engine 1c communicates with the content database 1g.
  • the publishing engine 1c periodically and continuously retrieves updated new content aggregated by the aggregator engine 1b.
  • the collecting and creating of new updated content can be performed at predetermined intervals, such as every 2 minutes, every 5 minutes, every 10 minutes, etc., as desired and configured.
  • the publishing engine 1c then stores the new contents in, and retrieves them from, the content database 1g and publishes them.
  • the publishing engine 1c publishes the new content via a network interface 1a to be accessed by a user device 6a and presented in the user interface 6 of the user device 6a.
  • All the components of the host server 1 together form a functional unit and may be divided over multiple computers and/or processing units.
  • Figure 3 is a flow scheme of an embodiment of the method of the invention for producing an automatically updated content journal about a selected object.
  • Certain topics can be selected as objects, such as news sites, sport sites, economic sites, political sites, professional sites, certain blogs, or any topic of interest for certain groups of people to follow.
  • the method starts with defining some settings for the method to work in steps 1 - 3, which can be performed in any mutual order.
  • in step 1 of figure 3, a timeframe for the journal is pre-defined, since the aim is to provide a real-time service for staying up to date on some topic or any interesting content to be followed.
  • search criteria are defined, which are used to search for posts about the selected subject from social networking systems.
  • threshold rules for publishing content about the selected object in the journal are pre-defined.
  • the threshold rules include a threshold value defined as the score to be reached or exceeded so that the content in question would be used for creating new content to publish.
  • the threshold value is defined individually for each content source providing the content referenced in relevant posts (posts matching the search criteria) in the social networking systems.
  • the threshold value might consist of a sum of the minimum number of posts found in relation to the content. The defined threshold value has to be exceeded so that a content referenced in the social networking systems would be published in the content journal of the invention.
  • in step 4 of figure 3, the aggregator engine 1b continuously searches for posts within the pre-defined timeframe that match said search criteria and retrieves matching posts into the post database 1e.
  • in step 5 of figure 3, the software in the aggregator engine 1b queries, continuously at predetermined time intervals, the network address (usually the URL) of one or more contents referenced in the retrieved posts and stores them in the content database 1g.
  • in step 6 of figure 3, the post database 1e, wherein the posts are stored, is continuously scanned by the publishing engine 1c at predetermined time intervals for content to be published. For that purpose, an algorithm calculates a value for each original content referenced.
  • It is then continuously determined in step 7 of figure 3 whether any referenced content has a value exceeding the threshold value defined for the content in question. If and when referenced content with a value exceeding the threshold is detected, a decision is made in step 7 to publish at least a part of that content. No actions are taken for content with a value below the threshold, as indicated in step 8.
  • Information (including posts and system-generated metadata) on the (original) content to be used for publishing is then stored in step 9 of figure 3 in the content database 1g, such as at least identifier information (usually the URL), and optionally the title of the (original) content to be used for publishing, information on the author of the post and/or content to be used for publishing and/or category information. Also a screen capture of the original content referenced in the posts and provided by the web sites can be taken and stored in the database 1g to be published as part of the new content of the content journal.
  • in step 10 of figure 3, new content in the form of structured data is created out of posts and referenced original content by combining said metadata, part of the original content and the related posts.
  • the creating is performed by normalizing the data to be published by a normalizing module of the aggregator engine 1b into a particular consistent data structure.
  • the data structure is described in more detail in connection with figures 2 and 4.
  • The new content created is published in step 11 of figure 3 by the publishing engine 1c.
  • in step 11 of figure 3, it is indicated with arrow 12 that the social networking systems are continuously searched for posts that match the search criteria and the content database 1g is continuously scanned to find referenced content to be published on the basis of an exceeded threshold value, meaning that steps 4 - 11 are continuously repeated as long as the service is provided.
  • Figure 4 is an example of a user interface 6 of the invention.
  • the user interface is constructed in accordance with a specified data structure with multiple fields for information.
  • Posts 7a - 7c (such as comments, tweets and the like) are ranked and presented in order on the left side of the interface 6.
  • the posts include fields for the text (reference 12a in post 7a) of the post containing a link (reference 8a in post 7a) to the original content retrieved, such as the URL, for the name (reference 9a in post 7a) of the author of the post, a picture (reference 10a in post 7a) of the author of the post, follower information (reference 11a in post 7a), and the date and/or time of the post (reference 13a in post 7a).
  • Posts 7b - 7c have corresponding information in the same way even if not shown.
  • a screen capture 14 of the referenced original content is shown to the right of the interface. This screen capture also has a link to the original content. Further, there can be a field 15 for the title of the original content and a field 16 for the time of the original content.
  • the user can access published original content and posts found relevant by the service of the invention.
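As a non-limiting illustration of the display rules described above, the weighting of user IDs and hashtags could be sketched as follows in Python. The weight table, the post fields `author` and `text`, and the choice to sum the weights of all matching keys are assumptions made for this example only; the description does not specify them.

```python
import re

# Hypothetical display-rule weights; keys are "@user" IDs or "#hashtags" in lower case.
DISPLAY_WEIGHTS = {"@primeminister": 10, "#election": 5}


def display_weight(post, weights=DISPLAY_WEIGHTS):
    """Sum the weights of the author's user ID and of every hashtag in the post text."""
    hashtags = {tag.lower() for tag in re.findall(r"#\w+", post["text"])}
    keys = hashtags | {"@" + post["author"].lower()}
    return sum(weights.get(key, 0) for key in keys)


def rank_posts(posts, weights=DISPLAY_WEIGHTS):
    """Return posts ordered by descending display weight; ties keep their input order."""
    return sorted(posts, key=lambda post: display_weight(post, weights), reverse=True)
```

In this sketch, posts without any weighted user ID or hashtag receive a weight of zero and therefore follow the weighted posts in their original order.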

Abstract

The invention is concerned with a method and system in a public telecommunications network for producing an automatically updated content journal about a selected subject from different sources by creating new content out of original content presented by a service provider in the network. The method comprises pre-defining a timeframe for the content journal, search criteria used to search for posts with a reference to an original content about the selected subject from social networking systems, and threshold rules for selecting an original content to be used as a basis for creating and publishing new content in the journal about the selected subject. Social networking systems are continuously searched for new posts that match the pre-defined search criteria and matching posts are retrieved into a database. The network address of one or more original contents referenced in the matching posts are retrieved and stored in a database with a timestamp. The database is continuously scanned for referenced original contents within the set timeframe on the basis of their timestamp and a value for referenced original contents is calculated on the basis of information in the matching posts. The values are evaluated against the pre-defined threshold rules. New content is created and published out of references to original content that match the pre-defined threshold rules, out of posts concerned and out of generated metadata. The system of the invention comprises an aggregator engine (1b), a publishing engine (1c) and one or more databases (5) for performing the method. A software program product of the invention executes an algorithm for performing the steps of the method.

Description

METHOD AND SYSTEM FOR PRODUCING A CONTENT JOURNAL
FIELD OF THE INVENTION
Method in a public telecommunications network for producing an automatically updated content journal about a pre-defined subject.
BACKGROUND
Each day millions of content items are published on the Internet, of which a major part is not relevant or interesting to an individual Internet user. Search engines, such as Google, crawl and index this content and make it available through keyword-based search, which makes search engines one of the primary gateways to Internet content.
However, if a user wants to access certain content which cannot successfully be defined by a keyword, the search engines are useless for finding such content. There are, for example, no methods for a user to find out what new and interesting content has been published today about football in his home country.
The user is limited to news media and websites the user is already aware of, but typically Internet users are not willing to browse through more than 3-5 websites and therefore the user usually only sees a subset of all the content. News aggregators offer a partial solution in that they show the topics from all major news media sites, but they do not distinguish interesting and less interesting topics and lack most of the non-media websites such as blogs, social networks and video sites like YouTube.
The advent of social media has opened a new way to access content by social recommendations. For example, by following the posts of football enthusiasts on Twitter, users can see most of the content that is considered important by people, as these people like to post references to all the interesting content pieces they find on the Internet. However, receiving such information requires extensive knowledge of whom to follow, which the majority of common Internet users do not have. Additionally, following a Twitter news feed consumes both time and energy.
Generally, Internet content is anything created on the Internet by individual users or corporations such as news media companies. A piece of content is identified by a Uniform Resource Locator (URL), which consists of the domain address of the website followed by a unique identifier which provides access to the individual piece of content provided by the web site in the form of web pages. The content piece is typically created from text, images, sound and video, which are packaged into a webpage using HyperText Markup Language (HTML). HTML is the standard markup language used to create web pages. Generally the type of content can be for example news, blog posts, YouTube videos, SoundCloud (an online audio distribution platform) sounds or social networking system content.
People, therefore, benefit from methods and guidance to find content of interest from the enormous flow of information available. News is one subtype of content and several solutions for selective news publishing already exist.
Google News is a free news aggregator service on the Internet provided and operated by Google Inc, selecting most up-to-date information from thousands of publications by an automatic aggregation algorithm. As a news aggregator site, Google uses its own software to determine which stories to show from the online news sources it watches. Human editorial input does come into the system, however, in choosing exactly which sources Google News will pick from. Google News provides searching, and the choice of sorting the results by date and time of publishing or grouping them. Users can request e-mail "alerts" on various keyword topics by subscribing to Google News Alerts. Recently Google News has implemented the anchor news method defined in US patent application 2013/0298000 which is used with the top news topic.
A social networking service is a platform service to build social networks or social relations among people who share interests and other connections. Most social network services are web-based and provide means for users to interact over the Internet. Social network sites are varied and they incorporate new information and communication tools such as mobile connectivity, photo/video sharing and blogging. Social networking sites allow users to share ideas, pictures, social content items (in the form of posts, messages, comments, and updates), activities, events, and interests with people in their network. Social interaction among people in which they create, share or exchange information and ideas in virtual communities and networks is also called social media. Examples of different types of social media and/or social networks and/or social networking services are e.g. collaborative projects (for example, Wikipedia), microblogs (for example, Twitter), social news networking sites (for example, Digg and Leakernet), content communities (for example, YouTube and DailyMotion), and social networking sites (for example, Facebook and Twitter).
For example, Facebook is an online social networking service allowing quick sharing of web pages with other friends on Facebook. Users must register before using the site, after which they may communicate with other friends on Facebook. The communication can take place through private or public messages, as well as a chat feature, and users can share content that includes website URLs, images, and video content.
Twitter, another social networking service, is an online social networking and microblogging service that enables users to send and read short 140-character text messages, called "tweets". Registered users can read and post tweets, but unregistered users can only read them. Users access Twitter through the website interface, SMS, or mobile device app.
A retweet is someone else's Tweet that you chose to share with all of your followers and is a reply to a tweet that includes the original message or a tweet that includes a link to a news article or blog post that you find particularly interesting and links to other tweets. Retweet is used on the Twitter Web site to show tweeting of a content that has been posted by another user. Hashtags are created for adding additional context and metadata to tweets. On Twitter, blogs, and other social media sites, a follower is someone who subscribes to receive your updates. A like button, like option, or recommend button is a feature in communication software such as social networking services, Internet forums, news websites and blogs where the user can express that they like, enjoy or support certain content. Internet services that feature like buttons usually display the quantity of users who liked each content, and may show a full or partial list of them. This is a quantitative alternative to other methods of expressing reaction to content, like writing a reply text. Some websites also include a dislike button, so the user can either vote in favour, against or neutrally.
A hashtag is a form of metadata tag. Words in messages on microblogging and social networking services such as Twitter, Facebook, Google+ or Instagram may be tagged by putting "#" before them. Hashtags make it possible to group such messages, since one can search for the hashtag and get the set of messages that contain it. A hashtag is only connected to a specific medium and can therefore not be linked and connected to pictures or messages from different platforms.
Attempts have been made to make use of social networking activity to help people select interesting news in ranking the news according to interest. US patent application 2013/0198204 discloses a method for determining online significance of e.g. news links by using social media. In this solution, a set of content that is relevant to a topic is identified from various sources. The identification is based on a web crawler for the Internet systematically browsing the World Wide Web for online content publications. A content interface can retrieve content items from designated sources, such as news feeds from a particular website. The solution provides a mechanism to identify topics of interest amongst the general public, especially in order to determine what topics of interest and trends are currently trending in interest or awareness amongst an online population. A plurality of social networking media are processed to find references to identified sets of content. A score is then determined for each of the one or more content items. The score can at least partly be based on the number of instances that the content item is referenced by the communications of the social networking media. The scoring can be varied based on the type of social media. A presentation can be provided that identifies a plurality of content items, as well as the score for each of the plurality of content items.
US patent application 2013/0298000 discloses a method for providing socially relevant content in a news domain with a news aggregator by means of comments from users in a social network and news from multiple sources. A news page is created upon request from an individual user. The method includes a step for providing an anchor story page for display of a given news story specified by the user in the request by an URL. The anchor story page includes multiple links to news content items related to the news story and the retrieved social content items.
Techniques to rank content selected from one or more web-based social media metrics are described in US patent application 2012/0131013. A social metric score is calculated for each of the plurality of content using social media trending information for ranking the contents. Upon request, a user receives a ranked or sorted list of content that is trending in the social media that may be presented as content. A trending component uses an algorithm to select, rank and sort content based on the metrics. US patent 8,578,274 discloses systems and methods for aggregating web feeds relevant to a geographical locale from multiple sources. Web feeds are filtered for qualifying content and for publication.
Thus, tools exist to allow users to view content that is trending on a particular website. There are, however, no prior art solutions for selective production of contents by presenting the most relevant content of given topics.
OBJECT OF THE INVENTION
The object of the invention is to provide a system and method for generating a content journal about a selected subject, which Internet users can follow to stay up to date on the selected subject.
SUMMARY OF THE INVENTION
The invention is concerned with a method in a public telecommunications network for producing an automatically updated content journal about a selected subject from different sources by creating new content out of original content presented by a service provider in the network. The method comprises pre-defining a timeframe for the content journal, search criteria used to search for posts with a reference to an original content about the selected subject from social networking systems, and threshold rules for selecting an original content to be used as a basis for creating and publishing new content in the journal about the selected subject. Social networking systems are continuously searched for new posts that match the pre-defined search criteria and matching posts are retrieved into a database. The network address of one or more original contents referenced in the matching posts are retrieved and stored in a database with a timestamp. The database is continuously scanned for referenced original contents within the set timeframe on the basis of their timestamp and a value for referenced original contents is calculated on the basis of information in the matching posts. The values are evaluated against the pre-defined threshold rules. New content is created and published out of references to original content that match the predefined threshold rules, out of posts concerned and out of generated metadata. A software program product of the invention executes an algorithm to perform the steps of the method. The algorithm e.g. scans the database by using the value of the original content and the threshold value to make a decision of publishing and creating new content.
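The scanning step of the method can be illustrated by the following minimal Python sketch for the simplest case, in which the value of an original content is the number of matching posts referencing it. The function name, the `(url, timestamp)` pair format and the use of "reached or exceeded" as the comparison are assumptions made for illustration only.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def select_for_publication(references, now, timeframe, threshold):
    """Pick the URLs whose reference count within the timeframe meets the threshold.

    references: iterable of (url, timestamp) pairs, one pair per matching post.
    """
    cutoff = now - timeframe
    # Count only references whose timestamp falls inside the pre-defined timeframe.
    counts = Counter(url for url, stamp in references if stamp >= cutoff)
    return sorted(url for url, count in counts.items() if count >= threshold)


now = datetime(2015, 7, 8, 12, 0, tzinfo=timezone.utc)
references = [
    ("https://example.com/a", now - timedelta(hours=1)),
    ("https://example.com/a", now - timedelta(hours=2)),
    ("https://example.com/b", now - timedelta(hours=1)),
    ("https://example.com/a", now - timedelta(hours=30)),  # outside a 24-hour timeframe
]
selected = select_for_publication(references, now, timedelta(hours=24), threshold=2)
```

A real service would repeat such a scan at intervals against the database rather than against an in-memory list.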
The system of the invention for producing an automatically updated content journal about a selected subject from different sources comprises a host server with an aggregator engine with means for continuously searching social networking systems for new posts that match pre-defined search criteria, retrieving matching posts into a database, and extracting references to one or more original contents from posts in the database and storing them in the same or other database with a timestamp. A publishing engine in the host server executes an algorithm that continuously scans the database for referenced contents within a set timeframe on the basis of the timestamp, calculates a value for referenced contents on the basis of information in the matching posts, evaluates the value against pre-defined threshold rules, creates and publishes new content out of references to original content that match pre-defined threshold rules, out of posts concerned, and generated metadata, and publishes the new content in the content journal on a user interface. The host server also comprises one or more databases for storing the retrieved matching posts, the references to one or more original contents with a time stamp, and generated metadata, and a user interface for publishing the content journal.
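One conceivable way for an aggregator engine to extract references from a retrieved post and pair them with a timestamp is sketched below in Python. The regular expression, the function name and the trimming of trailing punctuation are assumptions of this example; the description does not prescribe how references are parsed.

```python
import re
from datetime import datetime, timezone

# Simplified URL pattern assumed for this sketch; real posts may need stricter parsing.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_references(post_text, retrieved_at=None):
    """Return (url, timestamp) pairs for every URL found in the text of a post."""
    stamp = retrieved_at or datetime.now(timezone.utc)
    # Strip punctuation that commonly trails a URL at the end of a sentence.
    urls = [url.rstrip(".,;:!?)") for url in URL_PATTERN.findall(post_text)]
    return [(url, stamp) for url in urls]
```

The resulting pairs correspond to the references stored in the database with a timestamp, on which the later timeframe-based scan operates.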
With the term "journal" is here meant generally a periodical dealing with content, especially matters or news of current interest in the form of a record of interesting matters kept regularly for users.
With the term "post" is here meant to cover all kind of social content items such as messages, comments, tweets and updates.
With the term "click-through" is here meant the process of a visitor or user clicking on a link, URL or another reference or network address and going to the original content source, such as a web site, blog etc. A click-through can also be called an ad click or a request. The click rate measures the number of times a source is clicked versus the number of times it is viewed.
"Original content" refers to the content that is referenced and discussed in the posts in the social networking systems.
"New content" refers to the content created and published in the invention. The preferable embodiments of the invention have the characteristics of the subclaims.
Generated metadata includes information of the author of the post, information of the author of the original content, a screen capture of a part or whole of the original content and/or category information of the new content published. The public telecommunications network is the Internet and the posts have references to original content in the form of links or network addresses, such as the Uniform Resource Locator, URL, and the social networking system is usually Twitter and/or Facebook but other social networks can also be used.
The original content usually resides in a web site but can reside in a blog or even in a social network. Each such website, blog or social network is represented by a domain name such as youtube.com or a sub-division of a website such as youtube.com/channel which can be, for example, the Youtube channel of a content producer.
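The mapping from a referenced URL to its content source (a domain name or a sub-division of a website) might be sketched as follows in Python. The set of hosts treated as sub-divided and the use of the first path segment as the sub-division are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical set of hosts whose first path segment identifies a separate content source.
SUBDIVIDED_HOSTS = {"youtube.com"}


def source_key(url):
    """Return the content source for a URL: its domain, or domain/first-path-segment."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in SUBDIVIDED_HOSTS:
        segments = [segment for segment in parts.path.split("/") if segment]
        if segments:
            return f"{host}/{segments[0]}"
    return host
```

Such a key can then be used to look up settings that are defined per content source.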
One of the initial implementations of the system is a journal of most important news today in Finland. The invention provides a service that publishes only new content that is based on relevant news or other relevant content. The relevancy is decided on the basis of threshold rules defined in advance.
The relevancy is determined in the invention based on publishing rules using threshold rules for calculating the importance of referenced contents. The threshold rules are primarily based on a threshold value that original contents must reach in order to be used for creating new content to be published. A value is calculated for each original content referenced in the retrieved posts. In addition to the threshold value, there might be other threshold rules, such as ethical rules.
The value of the original contents is derived from the related posts from different aspects. Primarily, the value for each of said original contents is defined on the basis of the number of posts that are related to the original content in social networks (the social networking systems). Other factors might also influence the value, such as the author of the post. An equation is created for obtaining a numerical score value weighting all defined factors.
The threshold value is defined as the score to be reached or exceeded so that the content in question would be published. The threshold value is defined individually for the content source of each original content referenced in the posts in the social networking systems. The method and the system of the invention are especially useful as a tool for professionals for efficient news or content presentation, wherein the selection of content to be published is performed by the service based on said publishing rules. Thus, the user himself does not need to make the selection of what is relevant. An essential feature of the invention is that it can give a useful tool for e.g. journalists to produce an automatically updated content journal, wherein given topics are followed. The topics might include certain themes or types of content, main news and other news, sport, trends, blogs etc.
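The weighted score can be illustrated with the example equation stated elsewhere in this description, V = number of posts + X * likes + Y * click-throughs + Z * author post count. In the Python sketch below, the `Post` fields and the summing of likes, click-throughs and author post counts over all related posts are assumptions, since the description leaves the aggregation open.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Per-post figures assumed to be available for each matching post."""
    likes: int = 0
    click_throughs: int = 0
    author_post_count: int = 0


def content_value(posts, x=0.0, y=0.0, z=0.0):
    """V = number of posts + X * likes + Y * click-throughs + Z * author post count."""
    likes = sum(post.likes for post in posts)
    clicks = sum(post.click_throughs for post in posts)
    activity = sum(post.author_post_count for post in posts)
    return len(posts) + x * likes + y * clicks + z * activity
```

With X, Y and Z left at zero, the function reduces to the simplest embodiment, in which the value equals the number of posts referencing the original content.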
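A per-source threshold check of this kind could look like the following Python sketch. The source names, the threshold figures and the fallback default for sources without their own threshold are hypothetical values introduced only for this example.

```python
# Hypothetical thresholds per content source (domain or sub-division of a website).
SOURCE_THRESHOLDS = {"example-news.fi": 25.0, "youtube.com/somechannel": 5.0}
# Assumed fallback for sources that have no individually defined threshold.
DEFAULT_THRESHOLD = 10.0


def qualifies(value, source, thresholds=SOURCE_THRESHOLDS, default=DEFAULT_THRESHOLD):
    """Return True when the value reaches or exceeds the threshold of its source."""
    return value >= thresholds.get(source, default)
```

Defining the threshold per source allows, for instance, a widely shared news site to require more references than a small specialist blog before its content is published.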
The invention can also provide a platform service for user groups in order to produce the relevant content by themselves by using the functions of the system and method of the invention.
Contrary to prior art solutions, the invention does not present a ranked list of contents. Instead, the service only publishes information on contents that have passed a threshold as calculated by the algorithm used, thus avoiding the need for user effort in news selection. In the following, the invention is described in more detail by means of some advantageous embodiments by referring to figures. The invention is not restricted to the details of these embodiments.
FIGURES
Figure 1 is an architecture view of a system, wherein the invention can be implemented.
Figure 2 is a block diagram illustrating a host server for aggregating relevant web feeds.
Figure 3 is a flow scheme of an embodiment of the method of the invention.
Figure 4 is an example of a user interface of the invention.
DETAILED DESCRIPTION
The invention can be implemented in a system architecture according to figure 1, presented as a block diagram, in which a host server 1 provides a service that automatically produces a content journal from content retrieved from social networking systems 4a - 4b and from web sites 2a - 2d, over a telecommunications network 3. The web sites 2a - 2d can be, but are not limited to, news media sites. Four web sites 2a - 2d are indicated in figure 1, but the number of sources to be used by the host server 1 is not limited in any way and there can be more or fewer of them. The web sites 2a - 2d can be news media sites, blogs, YouTube channels, or any content sites and have content e.g. in the form of articles, images, movies, music, feeds, data, news, etc. to be provided to connected users through the network 3. The content has any applicable known or convenient form, such as multimedia, text, executables, video, images, audio etc. Each piece of content is referenced by a URL. A web site can be a subdivision of a larger web site such as an individual YouTube channel. The service is accessible through a user interface 6 via client devices that can be any device able to establish a connection with another device or server through said network 3. The client devices, of which only one client device 6a is shown in figure 1, typically include a display or other output functionality to present data exchanged between entities in the system. A client device can for example be a Personal Computer (PC), a mobile computing device, a laptop computer, a handheld computer, a mobile phone, a smart phone, a Personal Digital Assistant (PDA) etc. In some embodiments, one or more client devices can be connected to each other.
The client devices can be connected to the network 3 via a dial-up connection, a digital subscriber loop, cable modem or other type of connection and can communicate with the host server 1, which provides access to the service via the user interface 6, for example via a web browser.
The telecommunications network 3, over which the client devices communicate, may be a public network, such as the Internet, a telephonic network, or a private network, like an intranet and/or extranet. For example, the Internet can provide services through any known or convenient protocol, such as the Transmission Control Protocol / Internet Protocol (TCP/IP) and/or the HyperText Transfer Protocol (HTTP) for the HyperText Markup Language (HTML) documents that make up the World Wide Web (the web). The physical connections of the Internet and the protocols and communication procedures of the Internet and the web are well known to those of skill in the relevant art.
The database 5 can store content data retrieved by the host server 1 from the web sites 2a - 2d and data, such as posts, from one or more social networking systems 4a, 4b, such as Twitter and/or Facebook. In some cases, content data can also be retrieved from the social networking systems 4a, 4b.
The data retrieved from the social networking systems 4a, 4b include posts, such as comments, tweets and retweets related to the content, and the authors of the posts. An algorithm of the host server 1 analyzes the posts and performs selective filtering to ensure that content to be published is temporally relevant. Network addresses of web sites found relevant to be retrieved are filtered out. The system of the invention generates metadata on e.g. the number of times the content has been shared in the social networking system (the share count) and on the authors of the posts. The content data retrieved from the web sites 2a - 2d include a screen capture and possibly other content information. The titles of content provided on the retrieved network addresses, the publishing date and optionally category information of the content and/or the metadata about the content on these network addresses found to be relevant can be extracted by means of the network address. The database 5 can be implemented via object-oriented technology and/or via text files, and can be managed by a distributed database management system, an object-oriented database management system, a file system, and/or any other convenient or known database management package.
The host server 1 can communicate with the client devices 6a, the web sites 2a - 2d and the social networking systems 4a, 4b via the network 3.
The social networking systems 4a, 4b can include American services that are widely used worldwide, such as Facebook, Google+, YouTube, LinkedIn, Instagram, Pinterest, Tumblr and Twitter, but also services from other countries.
Figure 2 is a block diagram illustrating a detailed example of a host server 1 for providing the service of producing a content journal. The host server includes a network interface 1a, an aggregator engine 1b, a publishing engine 1c, a tracking engine 1d, a content database 1g, a post database 1e, an author data database 1f and a repository 1h for search terms, publishing rules implemented as threshold rules, and display rules. The content database 1g, the post database 1e, the author data database 1f and the repository 1h can be separate databases or be integrated into one single database. Therefore, when this text refers to the database 5, its content can be shared among several databases, such as these four, or be held in a single database 5. Thus the term database 5 covers all the databases 1e, 1f, 1g and 1h. Naturally, there can be even more databases among which the content is shared. The network interface 1a enables the host server 1 to mediate data in the network to the user device 6a (see figure 1) through any known or convenient protocol supported by the communicating entities.
The aggregator engine 1b can be implemented as software embodied in a computer-readable medium or computer-readable storage medium on a machine, in firmware and/or hardware. The aggregator engine 1b continuously searches social networking systems 4a and 4b (see figure 1) for posts that match pre-defined search criteria, using search terms fetched from the search term repository 1h, and retrieves them. The aggregator engine 1b retrieves matching posts as text files, e.g. in the Extensible Markup Language (XML) format, and stores them in the post database 1e. The posts reference original contents provided in e.g. web sites, and the aggregator engine 1b extracts identifier information, such as the network address of the original content referred to in the posts. The identifier information identifies the original content source and is also a reference thereto, usually including the network address of the content referenced, such as a Uniform Resource Identifier (URI) or a Uniform Resource Locator (URL). A URL (also known as a web address or network address, particularly when used with HTTP) is a specific character string that constitutes a reference to a resource. In most web browsers, the URL of a web page is displayed at the top, inside an address bar. An example of a typical URL would be "http://en.example.org/wiki/Main_Page". A URL is technically a type of URI, but in many technical documents and verbal discussions, URL is often used as a synonym for URI. In the invention, the identifier information, such as the URI, is usually a link to a specific content or article provided by one of the web sites 2a - 2d.
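The extraction of network addresses from a retrieved post might be sketched as follows. This is only an illustration under stated assumptions: the regular expression, the function name extract_references and the handling of trailing punctuation are hypothetical and are not part of the described system.

```python
import re
from urllib.parse import urlparse

# Assumed pattern: an http/https link runs until the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_references(post_text: str) -> list:
    """Return the network addresses (URLs) referenced in the text of a post."""
    references = []
    for candidate in URL_PATTERN.findall(post_text):
        # Simplification: drop punctuation that often trails a link in running text.
        url = candidate.rstrip(".,;:!?)\"'")
        parsed = urlparse(url)
        # Keep only strings that have both a scheme and a host part.
        if parsed.scheme and parsed.netloc:
            references.append(url)
    return references


post = "Worth reading: http://en.example.org/wiki/Main_Page, via @someone"
print(extract_references(post))
```

Stripping trailing punctuation is a deliberate simplification; a URL that legitimately ends in such a character would be truncated by this sketch.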
When the aggregator engine 1b extracts references, such as links, network addresses and URLs, to one or more original contents from posts in the post database 1e, it stores them in the content database 1g with a timestamp. The aggregator engine 1b further extracts, from the posts retrieved as text files, the name of the author of the post, which it stores in the author database 1f, and the date and/or time of the post, which it stores in the post database 1e. If the author already exists in the author database, a value of "+1" is added to an author post count field; in this way the system of the invention keeps track of how many relevant posts a given author has in the post database 1e.
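The bookkeeping described above can be illustrated with a small in-memory stand-in for the content database and the author database. The class name Store, its fields and the choice to keep the first-seen timestamp of a reference are assumptions made for this sketch only.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Store:
    # Hypothetical stand-in for the content database: URL -> timestamp.
    content: dict = field(default_factory=dict)
    # Hypothetical stand-in for the author post count field.
    author_post_count: Counter = field(default_factory=Counter)

    def record_post(self, author: str, references: list,
                    seen_at: Optional[datetime] = None) -> None:
        """Store the references of one post and add +1 to the author's post count."""
        stamp = seen_at or datetime.now(timezone.utc)
        for url in references:
            # Assumption: the timestamp of the first sighting is kept.
            self.content.setdefault(url, stamp)
        self.author_post_count[author] += 1


store = Store()
store.record_post("alice", ["http://example.org/a"])
store.record_post("alice", ["http://example.org/a"])
print(store.author_post_count["alice"])  # two relevant posts by the same author
```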
Still further, the aggregator engine 1b extracts, from the network address of the original content, a title of the original content, a screen capture of a part of or the full original content, the name of the author of the original content, category information, and the date and/or time of the original content, and stores this information in the content database 1g.
The aggregator engine 1b is connected to the content database 1g that has identifier information of contents referenced in the social networking systems and provided by the web sites 2a - 2d. The identifier information can be in the form of a network address, such as the URI or URL. The content database 1g can store both source lists (the web site addresses themselves of (original) content found relevant) and source metadata. Source metadata can include said identifier information and optionally a short description of the type of source.
The aggregator engine 1b can also include a normalizing function to normalize data retrieved (i.e. posts, metadata and contents) into particular consistent data structures. An example of a specified data structure for display is described in more detail in connection with figure 4.
The tracking engine 1d, implemented as software embodied in a computer-readable medium or storage medium on a machine, firmware or hardware, follows user behavior in the use of the content journal by counting the likes of the individual posts and/or the click-throughs of referenced contents, increasing the value of a post author by one for every added like or click-through.
The publishing engine 1c continuously scans the content database 1g for referenced contents within a set timeframe and calculates a value for referenced contents on the basis of information in the matching posts, using information from the post database 1e. It evaluates the value against pre-defined publishing rules implemented as threshold rules. The publishing engine 1c performs filtering on retrieved posts fulfilling the search criteria to determine whether they reference content qualified for publication and match the publishing rules, i.e. reach or exceed a threshold value for publishing. For this purpose, the publishing engine 1c has a filtering function executing an algorithm that calculates a value for each referenced content found.
The value of the contents is calculated by taking different factors into consideration. Primarily, the value for each original content referenced is defined on the basis of the number of posts referencing it in social networks (the social networking systems). Other factors, such as the author of the post, might also influence the value. An equation is created for obtaining a numerical score value weighing all defined factors. Any of these factors can be weighed in a desired way in defining the value, and any of them can even be ignored.
In one embodiment each reference in a post gives one point and the threshold value is simply the sum of the references (i.e. the number of posts, wherein this original content is referenced). If the value of an original content exceeds the threshold value, then a decision is made to use this original content to create new content and publish it in the content journal.
In another embodiment, more factors are taken into consideration in the calculation of a value for each original content. These factors can be weighed in a desired way. For example, the most active authors (having the highest post-count values) and those who have more likes and click-throughs are weighed more. An equation for calculating the value V for each original content could be:
V = number of posts + [X * likes] + [Y * click-throughs] + [Z * author postcount]
In the simpler embodiments, X, Y and/or Z can be set to zero.
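As an illustration, the equation for V can be written as a small function. The function name, the parameter names and the example weights are chosen for this sketch only; the weights X, Y and Z default to zero, which corresponds to the simpler embodiments.

```python
def content_value(number_of_posts: int, likes: int = 0, click_throughs: int = 0,
                  author_postcount: int = 0,
                  x: float = 0.0, y: float = 0.0, z: float = 0.0) -> float:
    """V = number of posts + X * likes + Y * click-throughs + Z * author postcount."""
    return number_of_posts + x * likes + y * click_throughs + z * author_postcount


# Simplest embodiment: X = Y = Z = 0, so only the number of posts counts.
print(content_value(5))

# Weighted embodiment with arbitrarily chosen example weights.
print(content_value(5, likes=10, click_throughs=4, author_postcount=3,
                    x=0.5, y=0.25, z=1.0))
```

With the example weights above, the second call evaluates 5 + 0.5 * 10 + 0.25 * 4 + 1.0 * 3.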
The algorithm evaluates a referenced content based on its value against a threshold value for publication, the threshold value determining whether the content is qualified for publishing or not.
The threshold value is defined as the score to be reached or exceeded so that the content in question would be used for creating new content for publishing. The threshold value is defined individually for the content sources providing the contents being referenced in the posts in the social networks. This means that each original content provided by a content source, such as a URL, is evaluated against a threshold value defined individually for each domain or URL (or subdomain). For referenced content that matches the pre-defined threshold rules, the publishing engine 1c creates and publishes new content out of the references and URLs to the original content, out of the posts concerned, and out of the generated metadata.
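A per-source threshold lookup might look like the following sketch. The threshold table, the default value and the fallback from a subdomain to its parent domain are assumptions for illustration; the description only states that thresholds are defined individually per domain, URL or subdomain.

```python
from urllib.parse import urlparse

# Hypothetical threshold rules, keyed by domain or subdomain.
SOURCE_THRESHOLDS = {"news.example.org": 25.0, "example.org": 15.0}
DEFAULT_THRESHOLD = 10.0  # assumed fallback for sources without their own rule


def threshold_for(url: str) -> float:
    """Look up the threshold for the source of a URL, most specific host first."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in SOURCE_THRESHOLDS:
            return SOURCE_THRESHOLDS[candidate]
    return DEFAULT_THRESHOLD


def qualifies(url: str, value: float) -> bool:
    """True when the value reaches or exceeds the threshold of the content source."""
    return value >= threshold_for(url)


print(qualifies("http://news.example.org/article", 20.0))
print(qualifies("http://blog.example.org/post", 20.0))
```

In this sketch the first URL is held to the stricter rule for its subdomain, while the second falls back to the rule of the parent domain.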
When new posts are found that reference original content that has already been published, these posts are added to the published content journal. The publishing engine can perform this e.g. simultaneously with publishing new content.
The publishing engine 1c then publishes the new content in the content journal on the user interface 6. The publishing engine 1c creates the new content by combining information on the original content decided to be published with the related posts and the other metadata mentioned above.
Different publishing rules might be applied. There might e.g. be rules on the way and order in which the posts are published together with the information of the content. A screen capture of qualified content exceeding the defined threshold value might be incorporated in the data structure defining the layout of the content journal. The display rule repository 1h defines an optional group of user IDs and hashtags. Each of these has been given its own weight depending on its importance. If these hashtags or user IDs exist in the related posts (posts related to original content that satisfies the publishing rules), the posts are ranked higher and are presented first, in an order determined by this weight. In this way, the system knows to present posts of e.g. the prime minister or another public person first. The same applies to interesting hashtags.
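The weight-based ordering of posts can be sketched as below. The weight table, the Post structure and the rule that a post takes the highest weight among its matching user ID and hashtags are assumptions of this example; the description does not specify how several matches are combined.

```python
from dataclasses import dataclass, field

# Hypothetical display rules: user IDs and hashtags with their weights.
DISPLAY_WEIGHTS = {"@prime_minister": 10, "#budget": 5}


@dataclass
class Post:
    author_id: str
    text: str
    hashtags: list = field(default_factory=list)


def post_weight(post: Post) -> int:
    """Highest weight among the post's author ID and hashtags (0 if none match)."""
    keys = [post.author_id] + post.hashtags
    return max(DISPLAY_WEIGHTS.get(key, 0) for key in keys)


def rank_posts(posts: list) -> list:
    """Order posts by descending weight; equal weights keep their original order."""
    return sorted(posts, key=post_weight, reverse=True)


posts = [
    Post("@alice", "Interesting read"),
    Post("@bob", "On the new figures", ["#budget"]),
    Post("@prime_minister", "Statement"),
]
print([p.author_id for p in rank_posts(posts)])
```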
The new content created is stored in the content database 1g and presented on the user interface 6. For users who have the interface open on their computer, the browser shows a button announcing that "a new content" has been published.
The host server 1 comprises the databases, which can be integrated with the host server or be one or more external components. Relevant content is stored in the content database 1g. It is, however, not necessary to store the whole of the original content referenced in the posts, since the service can publish only its identifier information (such as a link to the content or a URL) and optionally a screen capture of the front page of such content.
The publishing engine 1c publishes the new content based on qualified content to be included in the content journal and to be accessible to the user through the user interface 6. The publishing engine 1c publishes new content in the form of the data structure constructed by the aggregator engine 1b and stored in the content database 1g. For this purpose, the publishing engine 1c communicates with the content database 1g.
As the published new content generally consists of temporally relevant data for publishing in real time or near real time, the publishing engine 1c periodically and continuously retrieves updated new content aggregated by the aggregator engine 1b. The collecting and creating of new updated content can be performed at predetermined intervals, such as every 2 minutes, every 5 minutes, every 10 minutes, etc., as desired and configured. The publishing engine 1c then stores and retrieves the new contents in the content database 1g and publishes them. The publishing engine 1c publishes the new content via a network interface 1a to be accessed by a user device 6a and presented in the user interface 6 of the user device 6a.
All the components of the host server 1 together form a functional unit and may be divided over multiple computers and/or processing units.
Figure 3 is a flow scheme of an embodiment of the method of the invention for producing an automatically updated content journal about a selected object. Certain topics can be selected as objects, such as news sites, sport sites, economic sites, political sites, professional sites, certain blogs, or any topic that is of interest for certain groups of people to follow up.
The method starts with defining some settings for the method to work in steps 1 - 3, which can be performed in any mutual order.
In one of these steps (step 1 in figure 3), a timeframe for the journal is pre-defined, since the aim is to provide a real-time service for staying up to date on some topic or any interesting content to be followed up.
In another step (step 2 in figure 3), search criteria are defined, which are used to search for posts about the selected subject from social networking systems.
In a third step (step 3 in figure 3), threshold rules for publishing content about the selected object in the journal are pre-defined. The threshold rules include a threshold value defined as the score to be reached or exceeded so that the content in question would be used for creating new content to publish. The threshold value is defined individually for each content source providing the content referenced in relevant posts (posts matching the search criteria) in the social networking systems. The threshold value might consist of a sum of the minimum number of posts found in relation to the content. The defined threshold value has to be exceeded so that a content referenced in the social networking systems would be published in the content journal of the invention.
In step 4 of figure 3, the aggregator engine 1b continuously searches for posts within the pre-defined timeframe that match said search criteria and retrieves matching posts into the post database 1e.
In step 5 of figure 3, the software in the aggregator engine 1b continuously, at predetermined time intervals, queries the network address (usually the URL) of one or more contents referenced in the retrieved posts and stores them in the content database 1g. In step 6 of figure 3, the post database 1e, wherein the posts are stored, is continuously scanned by the publishing engine 1c at predetermined time intervals for content to be published. For that purpose, an algorithm calculates a value for each original content referenced.
It is then continuously determined in step 7 of figure 3 whether any referenced content has a value exceeding the threshold value defined for the content in question. If and when referenced content with a value exceeding the threshold is detected, a decision is made in step 7 to publish at least a part of that content. No action is taken for content with a value below the threshold, as indicated in step 8.
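The decision of steps 6 to 8 can be pictured as one scan over the stored references. In this sketch the mapping from URL to a (timestamp, value) pair, the function name and the single shared threshold are simplifying assumptions; the comparison uses "reach or exceed" as in the threshold rules, and a strict comparison would be the alternative reading.

```python
from datetime import datetime, timedelta, timezone


def select_for_publishing(contents: dict, now: datetime,
                          timeframe: timedelta, threshold: float) -> list:
    """Return the URLs inside the timeframe whose value reaches the threshold.

    contents maps URL -> (timestamp, value); this layout is hypothetical.
    """
    cutoff = now - timeframe
    selected = []
    for url, (stamp, value) in contents.items():
        if stamp < cutoff:
            continue  # outside the pre-defined timeframe
        if value >= threshold:
            selected.append(url)  # step 7: decision to publish
        # step 8: no action for content below the threshold
    return selected


now = datetime(2015, 7, 7, 12, 0, tzinfo=timezone.utc)
contents = {
    "http://example.org/a": (now - timedelta(hours=1), 12.0),
    "http://example.org/b": (now - timedelta(hours=30), 50.0),
    "http://example.org/c": (now - timedelta(hours=2), 3.0),
}
print(select_for_publishing(contents, now, timedelta(hours=24), 10.0))
```

A service would repeat such a scan at the configured interval for as long as it runs.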
Information (including posts and system-generated metadata) on the (original) content to be used for publishing is then stored in step 9 of figure 3 in the content database 1g, such as at least identifier information (usually the URL), and optionally the title of the (original) content to be used for publishing, information on the author of the post and/or content to be used for publishing, and/or category information. A screen capture of the original content referenced in the posts and provided by the web sites can also be taken and stored in the database 1g to be published as part of the new content of the content journal.
In step 10 of figure 3, new content in the form of structured data is created out of posts and referenced original content by combining said metadata, part of the original content and the related posts. The creating is performed by a normalizing module of the aggregator engine 1b, which normalizes the data to be published into a particular consistent data structure. The data structure is described in more detail in connection with figures 2 and 4.
The new content created is published in step 11 of figure 3 by the publishing engine 1c. After step 11 of figure 3, arrow 12 indicates that the social networking systems are continuously searched for posts that match the search criteria and the content database 1g is continuously scanned for referenced content to be published on the basis of an exceeded threshold value, meaning that steps 4 - 11 are continuously repeated as long as the service is provided.
Figure 4 is an example of a user interface 6 of the invention.
The user interface is constructed in accordance with a specified data structure with multiple fields for information. Posts 7a - 7c (such as comments, tweets and the like) are ranked and presented in order on the left side of the interface 6.
The posts include fields for the text (reference 12a in post 7a) of the post containing a link (reference 8a in post 7a) to the original content retrieved, such as the URL, for the name (reference 9a in post 7a) of the author of the post, a picture (reference 10a in post 7a) of the author of the post, follower information (reference 11a in post 7a), and the date and/or time of the post (reference 13a in post 7a). Posts 7b - 7c have corresponding information in the same way even if not shown.
A screen capture 14 of the referenced original content is shown to the right of the interface. This screen capture also has a link to the original content. Further, there can be a field 15 for the title of the original content and a field 16 for the time of the original content.
By means of the links 8a - 8d in the posts or in the screen capture of the original content, the user can access published original content and posts found relevant by the service of the invention.
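The fields described for figure 4 suggest what a normalized data structure for one journal entry might contain. The following sketch is an assumption throughout: the class names, field names, input dictionary keys and default values are hypothetical and merely mirror the fields named in the description.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class PostEntry:
    text: str          # text of the post (12a)
    link: str          # link to the original content (8a)
    author_name: str   # name of the author of the post (9a)
    posted_at: str     # date and/or time of the post (13a)
    author_picture: Optional[str] = None  # picture of the author (10a)
    followers: Optional[int] = None       # follower information (11a)


@dataclass
class JournalEntry:
    url: str                       # identifier information of the original content
    title: str                     # title field (15)
    content_time: Optional[str]    # time of the original content (16)
    screen_capture: Optional[str]  # screen capture (14), e.g. a file path
    posts: list = field(default_factory=list)


def normalize(url: str, metadata: dict, raw_posts: list) -> JournalEntry:
    """Combine metadata and related posts into one consistent structure."""
    posts = [
        PostEntry(text=p.get("text", ""), link=url,
                  author_name=p.get("author", "unknown"),
                  posted_at=p.get("time", ""),
                  author_picture=p.get("picture"),
                  followers=p.get("followers"))
        for p in raw_posts
    ]
    return JournalEntry(url=url, title=metadata.get("title", url),
                        content_time=metadata.get("time"),
                        screen_capture=metadata.get("screen_capture"),
                        posts=posts)


entry = normalize("http://example.org/a", {"title": "Example article"},
                  [{"text": "Worth reading", "author": "alice",
                    "time": "2015-07-07T10:00"}])
print(asdict(entry)["title"])
```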

Claims

1. Method in a public telecommunications network for producing an automatically updated content journal about a selected subject from different sources by creating new content out of original content presented by a service provider in the network, comprising the steps of
a) pre-defining
- a timeframe for the content journal,
- search criteria used to search for posts with a reference to an original content in the form of links or network addresses about the selected subject from social networking systems, and
- threshold rules for selecting an original content to be used as a basis for creating and publishing new content in the journal about the selected subject, the threshold rules including a threshold value for each original content to reach or exceed in order to be processed for publishing,
b) continuously searching social networking systems for new posts that match the predefined search criteria and retrieving matching posts into a database,
c) extracting the network address of one or more original contents referenced in the matching posts, generating metadata about the content on these network addresses, and storing them in a database with a timestamp,
d) continuously scanning the database with search criteria for referenced original contents within the set timeframe on the basis of their timestamp and calculating a value for referenced original contents on the basis of information in the matching posts, and evaluating the values against the pre-defined threshold rules,
e) creating and publishing new content out of references to original content that match the pre-defined threshold rules by combining information of original content decided to be published, posts concerned and of the generated metadata in accordance with publishing rules defining the lay-out of the content journal to be published,
f) publishing the new content in the content journal.
2. Method of claim 1, further characterized by extracting the author(s) of the post(s) and storing them in the database.
3. Method of claim 1 or 2, characterized in that the generated metadata includes information of the author of the post, information of the author of the original content, a screen capture of a part or whole of the original content and/or category information of the new content published.
4. Method of claim 1, 2, or 3, characterized in that the value for each of said original content referenced is calculated on the basis of the number of existing posts related to the original content, the number of posts given by a certain person, the likes of the individual posts, and/or the click-throughs of referenced contents.
5. Method of claim 1, 2, 3, or 4, characterized in that the threshold rules include a threshold value defined as a numerical value for each original content to reach or exceed in order to be processed for publishing.
6. Method of claim 5, characterized in that the threshold value is defined individually for each original content in the form of specific parameters for each original content source.
7. Method of claim 5 or 6, characterized in that the scanning of the database is performed by an algorithm that uses the value of the original content and the threshold value to make a decision of publishing and creating new content.
8. Method of claim 1, 2, 3, 4, 5, 6, or 7, characterized in that for creating the new content, retrieved and stored original content information and post information is normalized into a particular consistent data structure.
9. Method of claim 1, 2, 3, 4, 5, 6, 7, or 8, characterized in that the public telecommunications network is the Internet and the posts have references to original content in the form of links or network addresses, such as the Uniform Resource Locator, URL.
10. Method of claim 1, 2, 3, 4, 5, 6, 7, 8, or 9, characterized in that the social networking system is Twitter and/or Facebook.
11. System in a public telecommunications network for producing an automatically updated content journal about a selected subject from different sources, the system comprising
a host server with an aggregator engine with means for
- continuously searching social networking systems for new posts that match predefined search criteria used to search for posts with a reference to an original content in the form of links or network addresses about the selected subject from social networking systems, and retrieving matching posts into a database,
- extracting references to one or more original contents from posts in the database, generating metadata about the content on these network addresses, and storing them in the same or other database with a timestamp,
a publishing engine executing an algorithm
- continuously scanning the database with search criteria for referenced contents within a set timeframe on the basis of the timestamp,
- calculating a value for referenced contents on the basis of information in the matching posts,
- evaluating the value against pre-defined threshold rules, the threshold rules including a threshold value for each original content to reach or exceed in order to be processed for publishing,
- creating and publishing new content by combining information of original content decided to be published, posts concerned and of the generated metadata in accordance with publishing rules defining the lay-out of the content journal to be published, and publishing the new content in the content journal on a user interface,
one or more databases for storing
- the retrieved matching posts,
- the references to one or more original contents with a time stamp, and
- the generated metadata, and
a user interface for publishing the content journal.
12. System of claim 11, further characterized by a tracking engine implemented as software following user behavior on matching posts by counting the likes of the individual posts, and/or the click-throughs of referenced contents for increasing the value of a post author for every added like or click-through.
13. System of claim 11 or 12, characterized in that the aggregator engine further extracts, from the network address of the original content, a title of the original content, a screen capture of a part of or full original content, the name of the author of the original content, category information, the date and/or time of the original content, and stores this information in one or more databases.
14. System of claim 11, 12, or 13, characterized in that the aggregator engine further extracts, from the posts retrieved as text files, the name of the author of the post, and the date and/or time of the post and stores this information in one or more databases.
15. System of claim 11, 12, 13, or 14, characterized in that the information stored in the database is shared among one or more of a content database storing the references to original contents, such as links and network addresses, the screen capture of the original content, category information, the date and/or time of the original content, a post database storing screen captures of posts retrieved, the date and/or time of the post and an author database storing the name of the author of the original content and the authors of the posts.
16. System of claim 11, 12, 13, 14, or 15, characterized in that the aggregator engine (1b) is implemented as software in a computer-readable medium on a machine, hardware or firmware.
17. System of claim 11, 12, 13, 14, 15 or 16, characterized in that the aggregator engine (1b) additionally includes a normalizing module to normalize feeds in a specified data structure.
18. System of claim 11, 12, 13, 14, 15, 16, or 17, characterized in that the publishing engine (1c) implemented as software includes a filtering module executing the algorithm.
19. System of claim 11, 12, 13, 14, 15, 16, 17, or 18, characterized in that the publishing engine (1c) publishes the new content accessible to the user through the user interface (6).
20. A software program product to be used in a system for producing an automatically updated content journal about a selected subject from different sources, the system having a database with posts retrieved from social networking systems that are related to the selected object and reference contents of the selected object, the software program product being run in computer readable media and executing an algorithm
- extracting references in the form of links or network addresses to one or more original contents from posts in the database, generating metadata about the content on these network addresses, and storing them in the same or other database with a timestamp,
- continuously scanning the database with search criteria for referenced contents within a set timeframe on the basis of the timestamp,
- calculating a value for referenced contents on the basis of information in the matching posts,
- evaluating the value against pre-defined threshold rules, the threshold rules including a threshold value for each original content to reach or exceed in order to be processed for publishing,
- creating and publishing new content by combining information of original content decided to be published, posts concerned and of the generated metadata in accordance with publishing rules defining the lay-out of the content journal to be published, and publishing the new content in the content journal on a user interface.
PCT/FI2015/050491 2014-07-11 2015-07-07 Method and system for producing a content journal WO2016005664A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20145670 2014-07-11
FI20145670A FI20145670A (en) 2014-07-11 2014-07-11 METHOD AND SYSTEM FOR PRODUCTION OF CONTENT PUBLICATION

Publications (1)

Publication Number Publication Date
WO2016005664A1 (en) 2016-01-14

Family

ID=53785668

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2015/050491 WO2016005664A1 (en) 2014-07-11 2015-07-07 Method and system for producing a content journal

Country Status (2)

Country Link
FI (1) FI20145670A (en)
WO (1) WO2016005664A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120131013A1 (en) 2010-11-19 2012-05-24 Cbs Interactive Inc. Techniques for ranking content based on social media metrics
US8578274B2 (en) 2008-09-26 2013-11-05 Radius Intelligence. Inc. System and method for aggregating web feeds relevant to a geographical locale from multiple sources
US20130298000A1 (en) 2012-05-02 2013-11-07 Scott ZUCCARINO Socially relevant content in a news domain

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8578274B2 (en) 2008-09-26 2013-11-05 Radius Intelligence. Inc. System and method for aggregating web feeds relevant to a geographical locale from multiple sources
US20120131013A1 (en) 2010-11-19 2012-05-24 Cbs Interactive Inc. Techniques for ranking content based on social media metrics
US20130198204A1 (en) 2010-11-19 2013-08-01 Timothy Peter WILLIAMS System and method determining online significance of content items and topics using social media
US20130298000A1 (en) 2012-05-02 2013-11-07 Scott ZUCCARINO Socially relevant content in a news domain

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019123056A1 (en) * 2017-12-21 2019-06-27 Gucciardi Gaspare System and method for selective processing of web content
US11165738B2 (en) 2017-12-21 2021-11-02 Gaspare GUCCIARDI System and method for selective processing of web content

Also Published As

Publication number Publication date
FI20145670A (en) 2016-01-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 15747834; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 15747834; Country of ref document: EP; Kind code of ref document: A1)