WO2022263802A1

WO2022263802A1 - Systems and methods associated with content curation

Info

Publication number: WO2022263802A1
Application number: PCT/GB2022/051486
Authority: WO
Inventors: Charles Stacy MUIRHEAD; Thomas SCRACE
Original assignee: Cogx Ltd
Priority date: 2021-06-13
Filing date: 2022-06-13
Publication date: 2022-12-22
Also published as: GB202108420D0

Abstract

Systems (1) and methods suitable for delivering relevant curated content to users are described. Content can be curated in dependence on learning goals specified for each user. A knowledge graph (20) is used to store a semantic representation of content and user data. The knowledge graph comprises tags that categorise content and user data. Content from a content repository (12) is processed to generate entities for storage within the knowledge graph (20). A content delivery module delivers relevant content to users based on the relationship between the entities in the knowledge graph (20).

Description

Systems and methods associated with content curation

Field of the invention

The present invention relates to systems and methods associated with automatic content curation, and in particular the discovery and delivery of relevant digital content, particularly content associated with meeting user learning goals, or providing thought-leadership on topics of interest. The present invention also relates to establishing connections between users of a content discovery and delivery system. The present invention further relates to systems and methods for assisting users with their learning goals, including tracking the consumption of content associated with those learning goals.

Background to the invention

Whilst there are many existing applications through which thought-leadership content can be discovered and delivered, few are focused on this type of content. Those that specifically address thought leadership are primarily text-based (rather than rich multimedia content), or contain only 'self-published' audio content (see Tortoise as an example). Many podcast player apps, including market-leaders Apple Podcasts and Spotify, contain millions of podcasts, including much thought-leadership, but discovery is challenged by the sheer volume of other content. In podcast apps, titles and descriptions are the primary mechanism of discovery, with users often choosing to subscribe by podcast series rather than search through large libraries of available content. The majority of podcasts cover a variety of topics and are recorded at length. Podcasts of one hour or longer are very common, and often begin with extended introductions or advertorial content, making this an inefficient model for gathering insights at speed. There are an increasing number of 'live audio' products available, such as Clubhouse, but these offer a very different experience where discovery is temporal, i.e. tied to when the interesting discussions are actually happening, which is a challenge for many busy people. A user 'dropping into' Clubhouse or a similar 'live' audio room-type app in a spare moment may well not discover discussions that are interesting to them. The result is an 'appointment to view' model, in which the user must make themselves available to listen at the right time if they want a good choice of live content.

In summary, the problems with existing digital content formats include:

- Podcasts:

- 'Tyranny of choice' - too much to select from

- Longform content

- Intros, promotions, advertorials, idents

- Limited metadata for discovery

- No previews of the audio

- Some video, most audio - some podcast players do not support the video content.

- Takes too long to navigate through many podcasts to discover the insights within the content

- Live audio rooms (Clubhouse style)

- Limited choices based on what is live right now

- Lack of focus or quality control on audio - Anything goes - may be thought leadership, may be something very different

- 'Appointment to listen' - have to fit listening into personal schedule

- Often a lot of superfluous discussion around the useful insights

- Video platforms, such as YouTube/Vimeo/DailyMotion

- Vast library of content - no focus on Thought Leadership

- No previews

- A lot of advertising on YouTube

- Longform prevalent, some chapterisation is available

- Cannot play audio-only in background without premium account

A more efficient way to discover and consume both audio and video content is thus desirable.

It is against this background that the present invention has been devised.

Summary of the invention

A first aspect of the present invention may provide a digital content delivery system. The system may be configured to deliver digital content that is automatically curated to be relevant to each user of the system. The system may comprise at least one of:

- a database comprising a knowledge graph for storing a semantic representation of content and user data;

- a set of user accounts, each associated with an individual user;

- a data processing module configured to process at least one of:

- content from a content repository;

- user data from the set of user accounts;

to generate entities, and determine their interrelationships, for storage within the knowledge graph, the entities including at least one user entity specific to each user account; and

- a content delivery module for delivering curated content to each user in dependence on the relationship between that content, and the user entity specific to the respective user account.

Accordingly, the system may further comprise a user device, configured to display a curated content feed to which the content delivery module transmits content curated specifically to the user. The user device is preferably a mobile device.

It will be understood that a content repository need not necessarily be a single resource but rather may be in the form of an existing set of content, such as that which can be accessed via a content delivery network. This may include podcast distribution channels and other platforms.

The knowledge graph may comprise tags, for example, for categorising or otherwise semantically representing content, including that of user data.

To facilitate processing, heterogeneous content from the content repository is preferably converted into a homogeneous form for processing by the data processing module. For example, content that contains audio such as speech is preferably converted into a text-based transcript. To this end, the data processing module may comprise a speech-to-text converter for converting speech content into text. Accordingly, it is possible to establish semantic relationships between entities within the knowledge graph that originate from disparate content.

Entities that are stored within the knowledge graph may represent users, whole content items, and/or content item portions.

Preferably, the curated content comprises an audio component, such as a podcast recording or a video, or part thereof.

The data processing module may be configured to perform an auto-tagging process for semantically positioning entities within the knowledge graph. To perform auto-tagging, the data processing module may apply at least one of: machine-learning algorithms, rule-based algorithms, and/or natural language processing algorithms.

For example, the data processing module may be configured to apply a rule-based pipeline to content from a content repository, the rule-based pipeline comprising question pre-processing, and/or question segmentation, to generate from them a set of rules-based tags. The set of rules-based tags may include longest n-gram tags, for example.
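The specification does not define how longest n-gram tags are derived. As a minimal sketch, assuming that the ontology is available as a set of lower-case tag strings and that a greedy longest-match over word n-grams is an acceptable interpretation, the rules-based tagging step might look as follows. The function name `longest_ngram_tags`, the `max_n` parameter, and the sample ontology are all hypothetical.

```python
from typing import List, Set


def longest_ngram_tags(text: str, ontology: Set[str], max_n: int = 4) -> List[str]:
    """Greedy longest-match of word n-grams against ontology tags (assumed rule)."""
    # Crude normalisation: lower-case and strip surrounding punctuation.
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    tags: List[str] = []
    i = 0
    while i < len(words):
        matched = False
        # Try the longest n-gram first so "machine learning" wins over "learning".
        for n in range(min(max_n, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in ontology:
                if candidate not in tags:
                    tags.append(candidate)
                i += n  # skip the words consumed by this match
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Hypothetical ontology fragment, for illustration only.
ontology = {"machine learning", "learning", "knowledge graph", "graph"}
tags = longest_ngram_tags("Machine learning can populate a knowledge graph.", ontology)
print(tags)
```

In this sketch the shorter tags "learning" and "graph" are not emitted because the longer matches consume those words first.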

Similarly, the data processing module may be configured to apply a machine-learning-based pipeline including applying at least one of: pre-trained language models, ranking functions, and fine-tuning utilising vertical data sets or models. Moreover, the machine-learning-based pipeline is configured to determine the most semantically relevant tags to apply to each content item of a content repository.

Naturally, tags from the auto-tagging process derived from one or more algorithms may be combined.

As well as tagging whole items of content from a content repository, the system may also be arranged to tag different parts of content with different tags.

Leading on from this, the data processing module may comprise a snippet generator. The snippet generator may be configured to select a portion of the curated content to be delivered by the content delivery module. Specifically, the snippet generator may be arranged to retrieve content from the content repository and process the content to obtain a set of candidate content portions, each portion having its own entity or semantic representation within the knowledge graph. Accordingly, different parts of the same item of content may be relevant to different users of the system, as established by the relationship between each content portion entity, and user entity within the knowledge graph.

Moreover, the speech-to-text converter may generate a time-registered transcript from an item of content in the form of an audio file. In particular, the temporal location (within the original content item) of each word or sound may be specified alongside that word or sound. Advantageously, this can facilitate extraction of the correct portion of content to be delivered as a snippet.

To this end, the system (and the data processing module in particular) may be configured to perform at least one of:

- speech recognition on the audio component of an item of content from the content repository, to generate a time-registered text version of that content;

- text processing on the text version of that content, the text processing determining word-groups, such as sentences or phrases;

- ranking on each word-group, the ranking including semantic scores for each word-group - ideally based on the overall context of the content;

- selection of the highest-ranked word-groups as snippet candidates;

- storing a representation of each snippet candidate as an entity within the knowledge graph;

- extraction of a portion of the content item that corresponds to a respective snippet candidate;

- audio-processing of each extracted portion to generate a short-form audio file having optimal codec and bitrate for a given content delivery device (typically mobile device);

- hosting each of the short-form audio files via a content delivery network;

- determining a match between entities representing a snippet candidate and entities representing a user; and

- delivering the short-form audio file to a content feed of a matching user as an item of curated content.
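The word-grouping, ranking and selection steps can be illustrated with a short sketch. It assumes that speech recognition has already produced a list of words with start and end times, that word-groups are sentences delimited by terminal punctuation, and that the semantic score is simply the fraction of words found in a set of context terms. That scoring rule is a stand-in chosen for illustration, not the ranking described in the specification, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class TimedWord:
    text: str
    start: float  # seconds from the start of the content item
    end: float


@dataclass
class WordGroup:
    text: str
    start: float
    end: float
    score: float = 0.0


def _make_group(words: List[TimedWord]) -> WordGroup:
    return WordGroup(" ".join(w.text for w in words), words[0].start, words[-1].end)


def group_sentences(words: List[TimedWord]) -> List[WordGroup]:
    """Split a time-registered transcript into sentence-level word-groups."""
    groups: List[WordGroup] = []
    current: List[TimedWord] = []
    for word in words:
        current.append(word)
        if word.text.endswith((".", "?", "!")):
            groups.append(_make_group(current))
            current = []
    if current:  # trailing words without terminal punctuation
        groups.append(_make_group(current))
    return groups


def rank(groups: List[WordGroup], context_terms: Set[str]) -> List[WordGroup]:
    """Assumed score: share of words in the group that are context terms."""
    for group in groups:
        tokens = [t.strip(".,?!").lower() for t in group.text.split()]
        group.score = sum(t in context_terms for t in tokens) / len(tokens)
    return sorted(groups, key=lambda g: g.score, reverse=True)


def snippet_spans(words: List[TimedWord], context_terms: Set[str],
                  top_k: int = 1) -> List[Tuple[float, float]]:
    """Return (start, end) times of the highest-ranked word-groups."""
    ranked = rank(group_sentences(words), context_terms)
    return [(g.start, g.end) for g in ranked[:top_k]]


transcript = [
    TimedWord("Welcome", 0.0, 0.5), TimedWord("back.", 0.5, 1.0),
    TimedWord("Knowledge", 1.0, 1.6), TimedWord("graphs", 1.6, 2.1),
    TimedWord("link", 2.1, 2.4), TimedWord("entities.", 2.4, 3.0),
]
spans = snippet_spans(transcript, {"knowledge", "graphs", "entities"})
print(spans)
```

The returned time spans are what a downstream audio-processing step would use to cut the short-form file from the original content item.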

The curated content may comprise at least one recommendation, such as a connection suggestion to another user of the system, or content to which that user is subscribed. Preferably, the system comprises a recommendation engine for providing recommendations. Preferably, the recommendation engine is configured to perform matchmaking between users on the basis of their respective user entities within the knowledge graph.

Preferably, the system comprises a user account management module. This allows users to provide their personal details, preferences, and other attributes. These inform the generation and evolution of each user entity within the knowledge graph.

The system may further comprise an application that is executable on a user device via which users can interact with their user account, and/or receive content from the content delivery module. The application may be a mobile application ("app") that is downloadable from an application hosting platform (e.g. app store, or play store), and executable on a user mobile telecommunication device. When so executed, the app configures the mobile device to display a user interface (UI) to a user, and receive interactions via the UI. Interactions with the UI may allow explicit update of a corresponding user account (e.g. via the user manually entering preferences such as their interests). Interactions with the UI may allow implicit update of a corresponding user account, for example, by logging user engagement with tagged content, messaging of other users, or connecting with other users. These updates can be used to control the content delivered from the content delivery module, and displayed via the user interface, for example within a content feed element of the UI. Accordingly, a user's content feed can be highly-personalised. The functionality of the user account management module may be provided, at least in part, by the app. Additionally, the user account management module allows users or administrators (e.g. enterprise-level learning and development administrators) to set learning goals, for example about a particular topic. The goals that can be set ideally correspond to tags which may be applied by the system to categorise content.

These learning goals can be used to modify the corresponding user entity within the knowledge graph and thus control the curated content received by the user from the content delivery module.

Moreover, the system is configured to log user engagement with content, the engagement being logged in terms of time and category. Advantageously, this allows the system to automatically and naturally track the progress of users in achieving their learning goals. For example, as users watch or listen to content about a particular topic this is stored as part of their user account.

Accordingly, the system can display to the user (or a third party, such as an administrator) a categorised summary of the content they have consumed. The summary may include absolute or relative time-spent on a particular category - e.g. hours vs percentage of a predetermined learning goal.

A second aspect of the present invention may reside in a computer-implemented content curation method comprising at least one of:

- storing a semantic representation of content and user data within a knowledge graph;

- maintaining a set of user accounts, each associated with an individual user;

- processing at least one of content from a content repository, and user data from the set of user accounts so as to generate entities, and determine their interrelationships, for storage within the knowledge graph, the entities including at least one user entity specific to each user account; and

- delivering curated content to each user in dependence on the relationship between that content, and the user entity specific to the respective user account.

A third general aspect of the present invention may reside in a mobile telecommunications device for use in content curation, the mobile telecommunications device comprising at least one of: an electronic touch-sensitive screen; and a wireless telecommunications module operable to download an application; the mobile telecommunications device being arranged to execute the downloaded application to control the mobile telecommunications device to:

- prompt the user, via the screen, to take a set of actions to generate user data;

- transmit, via the wireless telecommunications module, the set of user data to a respective user account for processing;

- receive curated content from a content delivery module, generated in response to the user data within the respective user account; and

- display that curated content on the screen of the mobile device.

Further aspects of the present invention may reside in one or more of:

- Aggregation and curation of content from a variety of sources, be that pre-existing podcast content, or bespoke content that has been manually uploaded.

- Using advanced AI to pre-process both audio and video content to extract a time-registered text transcript with exceptionally high accuracy.

- Natural language processing of the resulting transcription to determine, through "relevance detection" algorithms, the most impactful and insightful elements of the content, along with metadata such as topic tags, and discovery of the personalities/speakers within the content.

- Relevance detection is based around a ML graph model which is constantly improving based on an ever-expanding set of end-user data signals regarding engagement with the content.

- Timestamps from the results of these algorithmic processes are then used to create short-form video and audio snippets of those most relevant moments in the full-length content.

- The snippets are then served to end users in a hyper-personalised feed, using a combination of explicitly declared interests and more implicit data signals, such as "saving for later", recommending, or recommendations from other, connected users.

- The personalisation model is continually building a more accurate understanding of the type of content that the user will find most engaging, based on these implicit and explicit parameters and signals.

- Connection recommendations (connect with other users) are also continually informed by a number of parameters including consumption of, and engagement with, similar content, as well as explicitly stated information such as similar industries and similar job titles.

- Followed topics can be personalised further in the form of learning goals, leading to an alternative content feed focused specifically on those goals. This can be defined by individual users, or in the case of enterprise accounts, can be defined by an account administrator for other users within the company.

It will be understood that features and advantages of different aspects of the present invention may be combined or substituted with one another where context allows. For example, the features of the system described in relation to the first aspect of the present invention may be provided as part of the method of the second aspect, and/or the mobile device described in relation to the third aspect of the present invention.

Furthermore, such features may themselves constitute further aspects of the present invention, either alone or in combination with others.

For example, the features of the database, the knowledge graph, the user accounts, the data processing module, the snippet generator, the user account management module, and the content delivery module may themselves constitute further aspects of the present invention.

Specific description of the preferred embodiments

In order for the invention to be more readily understood, embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings in which Figure 1 shows a schematic view of a system according to a first exemplary embodiment of the present invention, and Figures 2a and 2b together show a flow diagram of some of the steps performed by a data processing module of the system of Figure 1.

Referring to Figure 1, there is shown a schematic view of a system 1 according to a first exemplary embodiment of the present invention. The system 1 is configured to implement a method of delivering curated content, a generalised example of which is described above.

The system 1 comprises a database 2 having a knowledge graph 20 within which a semantic representation of data is stored. The data stored in the knowledge graph includes semantic representations of content and user data, with such data being stored in the form of tags that categorise user data and content in the form of an ontology that supports a hierarchical taxonomy.

The system 1 further comprises a user account management module 3, for storing and managing a set of user accounts 30. Each user account 30 is associated with an individual user of the system 1 that interacts with the system to receive curated digital content that is relevant to them. User accounts comprise associated user data.

An application hosting platform 10 (e.g. app store, or play store) and a content repository 12 are also provided, but in certain embodiments, the application hosting platform 10 and the content repository 12 are not necessarily part of the system 1 itself, and so are shown in dashed outline in Figure 1.

The various components of the system 1 are communicatively linked to one another, and also to components external to the system 1, via a communications network 7. The components of the system 1, and the network 7, may reside, at least in part, on a server. However, this may not necessarily be in the form of a single physical machine, but rather may encompass, for example, a distributed or "cloud" computing service, engine or platform. Accordingly, at least parts of the network 7 involve communication across the Internet.

The system 1 further comprises a data processing module 4, a content delivery module 5, and a user device 6. Whilst only a single user device 6 is shown, it will be understood that the system 1 is likely to include at least hundreds of different user devices, each associated with a respective user, and via which relevant curated content is delivered to a respective user. The user device 6 in most embodiments is envisaged to be in the form of a mobile telecommunication device, such as a smartphone.

The user device 6 is configured to connect to the application hosting platform 10, via network 7, and download from it a mobile application ("app") 11 that is executable on the user device 6. When so executed, the app 11 configures the user device 6 to display a user interface (UI) to a user via a screen 60 of the mobile device, and further receive interactions via the UI. In alternative arrangements, the user device 6 may be another computing device, such as a laptop or desktop machine, with the app 11 being tailored to provide similar functions on such a computing device.

The app 11 interfaces with other components of the system 1 to provide various functionality. Notably, the other components of the system 1 serve a typically "back-end" function of sending data to the app 11. The app 11 then performs a typically "front-end" function of rendering that data for display to a user via the screen/UI 60 of the device 6, and receives inputs from that user to send as queries or other information back to the other components of the system 1.

For example, the app 11 interfaces with the user account management module 3 to allow a user to register an account with the system 1, this being stored within the set of user accounts 30. By default, a user remains logged into their associated user account, and so the user device 6 becomes associated with the user, and the corresponding user account.

The app 11 allows the user device 6 to interface with the user account management module 3 to provide user data, such as login credentials, specify an organisation that the user is associated with, and topics of interest. In certain organisations, the user may search for, and specify their relationship with other members of the organisation. For example, a user may specify who their manager is, and/or who they may manage.

Additionally, a user can use the app 11 to specify a set of learning goals. The app 11 configures the mobile device 6 to provide a user with a learning goal specification interface. This presents a user with learning goal options, receives a user selection of those learning goal options, and transmits those selections to the user account management module 3 to be stored against the user account associated with the user.

This, along with other user data stored against the user account, is used to determine content to be delivered to the user, and moreover allows content to be delivered that is relevant to the learning goals of the user.

The app 11 comprises an engagement logger that logs user engagement with content, such as reading an article, or listening to a podcast. As content is consumed via the app 11, the engagement logger tracks or otherwise estimates a quantity of content that has been consumed, and one or more categories that the content is classified under.

For example, the logger may track the number of minutes a particular podcast has been playing, and also the topics to which the podcast relates. This logged information can therefore be used to automatically track progress towards meeting learning goals. The app 11, interfacing with the user account management module 3, saves this logged engagement against the user account. This can then be displayed so that a user can visualise their progress towards meeting their learning objectives. Progress towards learning goals can be displayed as, for example, time spent consuming content relevant to a particular learning goal. A learning goal may be towards a particular topic, with the goal being quantified as "consume 120 minutes of content about this topic". A progress bar can therefore be displayed showing the number of minutes so far consumed, or otherwise a percentage of overall content consumption focused on a particular goal.
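The progress calculation described above can be sketched as follows. This is a minimal illustration, assuming each logged engagement records a duration in minutes and a list of topic tags, and that each learning goal is a target number of minutes per topic. The names `EngagementEvent`, `minutes_by_topic` and `goal_progress` are hypothetical, as are the sample figures other than the 120-minute goal.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EngagementEvent:
    minutes: float      # time spent consuming one content item
    topics: List[str]   # topic tags the content item is classified under


def minutes_by_topic(events: List[EngagementEvent]) -> Dict[str, float]:
    """Total minutes consumed per topic; an item counts towards each of its topics."""
    totals: Dict[str, float] = defaultdict(float)
    for event in events:
        for topic in event.topics:
            totals[topic] += event.minutes
    return dict(totals)


def goal_progress(events: List[EngagementEvent],
                  goals: Dict[str, float]) -> Dict[str, float]:
    """Percentage of each goal's target minutes reached, capped at 100."""
    totals = minutes_by_topic(events)
    return {
        topic: min(100.0, 100.0 * totals.get(topic, 0.0) / target)
        for topic, target in goals.items()
    }


events = [EngagementEvent(30, ["ai"]), EngagementEvent(45, ["ai", "ethics"])]
progress = goal_progress(events, {"ai": 120, "ethics": 60})
print(progress)
```

Here the second event counts towards both goals, so 75 of the 120 target minutes are logged for "ai" and 45 of 60 for "ethics". Those percentages could drive a progress bar.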

A user's learning goals can be managed and viewed at an organisational or enterprise level. For example, using a management dashboard, managers at multiple levels in an organisation can set topic goals for the workforce. Additionally, employees can make selections using a similar process, which indicate to higher levels of the organisation what their learning preferences are. Managers can then approve a set of learning topics, with, for example, the learning goals being aligned with corporate values or other company initiatives. Managers can be aided in reporting the education of the workforce by comparing suggestions from higher levels of the organisation with those preferences indicated by employees. An employee user can thus receive content relevant to the learning goals that are approved at an organisational level, and this can be tracked, and the progress towards meeting those learning goals displayed to that employee user, and their managers. Specifically, each user has a learning goal dashboard via which progress towards their own learning goals, and those of users managed by them, can be displayed.

Consumption of thought-leadership content can thus be used to enable learning and personal and professional growth.

Content is delivered by the content delivery module 5, which communicates with the app 11 to present consumable digital content via the screen 60 of the user device 6. This is presented in the form of a feed - such as a list of summary items that can be browsed through by a user, and individually selected to obtain the content item to which that summary relates. The feed may be presented by the screen/UI 60 as a scrollable list, for example. The summary items that are presented to a user typically include a title and a description which provide the user with information about what is likely to be within the accompanying content. Additionally, the app 11 is configured to provide an indication to a user about which learning goals a particular item of content is relevant to. Content that counts towards meeting learning goals may be included as summary items in the feed at a predetermined frequency, such as every 5-10 summary items. Additionally, content items may be relevant to multiple learning goals or topics. Thus, the consumption of certain content items can count towards multiple learning goals. Accordingly, the system may display to a user the time required to be spent consuming relevant content in order to meet specified learning goals, and furthermore calculate an optimal way for users to consume content that meets those learning goals more quickly.
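One possible reading of including learning-goal content "at a predetermined frequency" is sketched below: one goal-related summary item is inserted after every fixed number of general items. The function `interleave_feed` and its `every` parameter are hypothetical, and the specification does not state how the frequency is chosen or applied.

```python
from typing import List


def interleave_feed(general: List[str], goal_items: List[str], every: int = 5) -> List[str]:
    """Insert one learning-goal item after every `every` general items (assumed policy)."""
    feed: List[str] = []
    goal_iter = iter(goal_items)
    for index, item in enumerate(general, start=1):
        feed.append(item)
        if index % every == 0:
            goal_item = next(goal_iter, None)
            if goal_item is not None:  # stop inserting once goal items run out
                feed.append(goal_item)
    return feed


feed = interleave_feed(
    ["g1", "g2", "g3", "g4", "g5", "g6"],
    ["goal-a", "goal-b", "goal-c"],
    every=3,
)
print(feed)
```

With `every=3`, six general items leave room for two goal items, so the third goal item is held back for a later page of the feed.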

It should be noted that the app may provide one feed that is specialised towards learning goals, and other feeds directed to other categories, such as entertainment, social, event-based and location-based feeds.

The content originates from the content repository 12. This may include a podcast directory, for example. However, the content within the content repository 12 is first processed to determine its semantic relevance to each user.

The data processing module 4 accesses content from the content repository 12 via the network 7 and processes each available item of content to generate content entities, the content entities being dependent on the semantic information within the content. To achieve this, content that is not in the form of text is converted into text.

The knowledge graph does not necessarily need to store content itself. Rather, a semantic representation of that content is stored instead, together with a reference to the digital location of that content (e.g. via a URL, or similar). Thus, content from a wide range of different sources can be referenced without the knowledge graph 20 or database 2 of the system 1 as a whole suffering from the excessive data usage that would come from raw content data storage. This content can be made accessible to users simply by providing their user devices 6 with an appropriate reference.

The data processing module 4 also processes user data from the user accounts 30 and, in response, generates user entities. These entities are stored in the knowledge graph 20. The data processing module 4 calculates relationships between different entities within the knowledge graph 20, and this is used as the basis for determining personalised recommendations to be served to each user. These are typically displayed in a feed rendered on the user device 6.
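The specification does not say how relationships between entities are calculated. Purely as an illustration, the sketch below represents each entity by a set of tags plus, for content, a reference URL rather than the content itself, and it uses Jaccard overlap of tag sets as an assumed relationship measure. The `Entity` class, the `recommend` function and the example.com URLs are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Entity:
    entity_id: str
    tags: FrozenSet[str]
    url: str = ""  # reference to the content location; empty for user entities


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Size of the tag intersection divided by size of the tag union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user: Entity, contents: List[Entity], limit: int = 10) -> List[Entity]:
    """Rank content entities by tag overlap with the user entity (assumed measure)."""
    scored = [(jaccard(user.tags, c.tags), c) for c in contents]
    scored = [(score, c) for score, c in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:limit]]


user = Entity("user-1", frozenset({"ai", "ethics"}))
contents = [
    Entity("c3", frozenset({"ai"}), "https://example.com/c3"),
    Entity("c2", frozenset({"cooking"}), "https://example.com/c2"),
    Entity("c1", frozenset({"ai", "ethics", "policy"}), "https://example.com/c1"),
]
recs = recommend(user, contents)
print([c.entity_id for c in recs])
```

Only the URL of each recommended entity needs to reach the user device, which is consistent with the knowledge graph storing references rather than raw content.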

The knowledge graph can be progressively updated in response to changes in content - in particular, new content being added to the content repository 12. The data processing module 4 is configured to register a change in content within the content repository. For example, the data processing module 4 may apply web syndication (e.g. RSS feeds) to detect the addition of new content to a particular content channel. In response, the data processing module 4 processes the new content to add a new content entity within the knowledge graph representative of that new content.
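Detection of new content through web syndication can be sketched with the Python standard library. The example assumes an RSS 2.0 document has already been fetched as a string and that each item is identified by its `guid` element, falling back to `link`. The function `new_item_ids` and the sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Set


def new_item_ids(rss_xml: str, seen: Set[str]) -> List[str]:
    """Return identifiers of feed items not seen before, and record them as seen."""
    root = ET.fromstring(rss_xml)
    fresh: List[str] = []
    for item in root.iter("item"):
        ident = item.findtext("guid") or item.findtext("link")
        if ident and ident not in seen:
            seen.add(ident)
            fresh.append(ident)
    return fresh


# Hypothetical feed content, for illustration only.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example channel</title>
<item><guid>ep-1</guid><link>https://example.com/ep1.mp3</link></item>
<item><guid>ep-2</guid><link>https://example.com/ep2.mp3</link></item>
</channel></rss>"""

seen_ids = {"ep-1"}
fresh = new_item_ids(SAMPLE_FEED, seen_ids)
print(fresh)
```

Each identifier returned would then trigger processing of the corresponding content item so that a new content entity can be added to the knowledge graph.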

It should be noted that the content repository 12 may include a diverse set of content items, or content item portions, including:

- news articles, including text, images, audio (podcast) and video content;

- organisational data, including organisation knowledge bases, partners, sellers, buyers, products and services;

- event data, including event session data (e.g. videoed or live stream sessions);

- user-specific content, including user designations (e.g. event attendee, founder, speaker, investor).

Such content may be termed as "signal data" that the data processing module 4 processes to build and update the knowledge graph 20 within the database 2 of the system 1. Part of the processing that is performed by the data processing module 4 is termed as auto-tagging:

Auto-tagging semantically positions entities within the knowledge graph with the aim of establishing accurate relationships between entities, which can then be utilised to facilitate recommendations. In certain embodiments, the invention applies an approach that uses a machine learning algorithm, leveraging language models, combined with rule-based algorithms.

The system 1 utilises an extensive ontology - for example, of 32,000 tags - that provides semantic context to any entity via relationships to those entities. Those relationships can be used to form the basis of providing recommendations. The ontology facilitates continuous organic expansion of semantically defined and connected tags. This ontology of tags is architected in a way that allows for dynamic expansion, in order to include user-requested tags, or new technology terms as they enter common vernacular. In this way the ontology will expand organically into a so-called Universe of Tags. Initially, this may be constructed from English article nodes derived from an online encyclopaedia knowledge graph.

This architecture is unique in terms of combining two approaches: rules-based and machine learning. The main objectives of this architecture are to:

- apply a rule-based algorithm to ensure that all chunks of text that can be mapped to the ontology have been processed;

- apply a machine learning based approach that searches for the most semantically relevant ontology tags to input text.
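The rule-based objective can be illustrated with a minimal sketch of longest n-gram matching, the kind of rules-based tag mentioned in the claims. The function name, the `max_n` window and the toy ontology are assumptions made for illustration only; tokenisation is a plain whitespace split and does not reflect any pre-processing or segmentation the actual pipeline may perform.

```python
def longest_ngram_tags(text, ontology, max_n=4):
    """Greedy longest-match of word n-grams against a set of ontology tags.

    Hypothetical helper: `ontology` is a set of lower-case tag strings.
    """
    tokens = text.lower().split()
    tags = []
    i = 0
    while i < len(tokens):
        # Try the longest window first so "machine learning" beats "machine".
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in ontology:
                tags.append(candidate)
                i += n  # skip past the matched chunk
                break
        else:
            i += 1  # no tag starts at this token
    return tags


# Toy ontology, invented for this example.
ontology = {"machine learning", "machine", "knowledge graph", "podcast"}
print(longest_ngram_tags("A podcast on machine learning and the knowledge graph", ontology))
# -> ['podcast', 'machine learning', 'knowledge graph']
```

Preferring the longest match means a specific tag is chosen over a more general tag contained within it, while every chunk of text that maps to the ontology is still visited once.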

The approach leverages transformers and extensions of pre-trained language models, and uses the unique underlying relationship between the ontology and the initial online encyclopaedia knowledge graph. Pre-trained language models are already trained based on public data sources. The model used in the Auto Tagger, RoBERTa, has been trained on a pre-existing data set. This combination allows the Auto Tagger to dynamically adopt new tags in the system ontology into the model for automatic tagging of entities being inserted into the knowledge graph 20.
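The machine-learning objective, searching for the ontology tags most semantically relevant to an input text, can be sketched as a similarity ranking. In the sketch below a bag-of-words count vector is a deliberately simple stand-in for the embeddings a pre-trained language model would produce; it is not the Auto Tagger's model. The tag descriptions, function names and `top_k` parameter are likewise assumptions for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a language-model embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_tags(text, tag_descriptions, top_k=2):
    """Return up to top_k tags whose descriptions are most similar to the text."""
    query = embed(text)
    scored = [(cosine(query, embed(desc)), tag) for tag, desc in tag_descriptions.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by tag name
    return [tag for score, tag in scored[:top_k] if score > 0]


# Hypothetical tag descriptions, e.g. text taken from an encyclopaedia article per tag.
tag_descriptions = {
    "robotics": "robots and autonomous machines",
    "finance": "banks money and investment markets",
    "genomics": "dna genes and sequencing",
}
print(rank_tags("new investment flows into money markets", tag_descriptions))
# -> ['finance']
```

Because each tag is represented only by its descriptive text, a tag newly added to the ontology can be ranked without retraining, which is one plausible reading of how new tags might be adopted dynamically.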

Figures 2a and 2b together show a flow diagram of the steps performed by the data processing module 4, when performing an auto-tagging process 200. In general, the auto-tagging process 200 comprises passing an input text via a rules-based pipeline 220 and a machine-learning based pipeline 240, each of which outputs a set of tags: rules-based tags and ML-based tags respectively. The rules-based pipeline 220 is shown primarily in Figure 2a, and the ML-based pipeline 240 is shown primarily in Figure 2b.

The input text can be obtained from content. Where the content is not already in text form, it may be processed by the data processing module 4 to derive text representations of the non-text content. In particular, where the content comprises an audio component (e.g. podcasts), the data processing module 4 is configured to convert speech within the audio component into text. The data processing module 4 may comprise and execute a speech to text conversion module for this purpose.

This can effectively generate a transcript of speech recorded in an audio format. Thus, the converted text form of the audio component can be used as the input text of the auto tagging process. In any case, the text can be further processed by the data processing module 4 to derive tags from it to allow categorisation of the audio content items within the knowledge graph.

It should be noted that content could be in a mixed format: for example, content associated with podcasts may include an audio component, descriptive text components (e.g. a title and a description of the podcast), and other information, such as metadata, capable of being parsed as text. The combination of the text derived from an item of content is typically used to derive tags from it, as this provides a richer set of contextual information from which semantic information can be derived.

For example, it is possible to identify or distinguish between individual speakers within a conversation using one or a combination of techniques. Conversations tend to include verbal introductions to speakers, and furthermore those speakers may be identified in the metadata or descriptive text components of a content item. Thus, the data processing module 4 may be configured to process the transcript, metadata and/or other parts of a content item that includes an audio component to identify one or more speakers, or at least distinguish between different speakers. To this end, the data processing module 4 may perform a voice recognition process on the audio component. Furthermore, by combining the voice recognition process, and the speech to text conversion, it is possible to determine what each speaker has respectively said, and when they said it.
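As an illustration of combining the two processes, the following sketch attributes transcribed words to speakers by time overlap. It assumes that the speech to text conversion yields word-level timings and that the voice recognition process yields speaker turns with start and end times; both data layouts, and the function name, are hypothetical.

```python
def attribute_words(words, turns):
    """Merge word timings with speaker turns into per-speaker utterances.

    words: list of (start_s, end_s, text) from speech-to-text (assumed format).
    turns: list of (start_s, end_s, speaker) from voice recognition (assumed format).
    """
    utterances = []
    for start, end, text in words:
        mid = (start + end) / 2
        # A word belongs to the turn that contains its midpoint.
        speaker = next((s for t0, t1, s in turns if t0 <= mid < t1), "unknown")
        if utterances and utterances[-1]["speaker"] == speaker:
            # Same speaker is still talking: extend the current utterance.
            utterances[-1]["text"] += " " + text
            utterances[-1]["end"] = end
        else:
            utterances.append({"speaker": speaker, "start": start, "end": end, "text": text})
    return utterances


words = [(0.0, 0.4, "hello"), (0.5, 0.9, "everyone"), (1.2, 1.6, "thanks")]
turns = [(0.0, 1.0, "A"), (1.0, 2.0, "B")]
for u in attribute_words(words, turns):
    print(u["speaker"], u["start"], u["end"], u["text"])
```

Each resulting utterance records who spoke, what was said, and when, which is the information needed to tag a speaker against the parts of the content they contributed.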

If an identity of a speaker is determined by the data processing module 4, this can be added as a tag within the knowledge graph. Furthermore, if the speaker is a user of the system 1, the tag may be associated with a user entity within the knowledge graph. This has a variety of different advantages, including the ability to identify thought-leaders on particular topics, and moreover determine what content they have contributed to or featured in.

Other information can also be obtained from an audio component of content. For example, the relative pace, tone and/or sentiment of speech can be measured and tagged accordingly. This can be useful in categorising content based on these qualities of speech. Thus, as the knowledge graph is populated, it is possible to provide recommendations to users. In general, this involves signal data sources, primarily from the content repository 12, being passed to a data signal pipeline which performs auto-tagging to populate and update the knowledge graph 20. Thereafter, a recommendation engine operates to deploy intent-based recommendation models that are typically fulfilled by queries over the knowledge graph.

The results of the recommendation engine are described above as being presented to a user via operation of the app 11 on their user device 6. However, such recommendations may also be communicated to the user via communication adaptors, such as Slack, Microsoft Teams, Facebook Workplace, Yammer and Skype.

As mentioned, the knowledge graph primarily stores a semantic representation of content, together with a reference to the location of that content. To minimise the need to store content that would otherwise be duplicated by the database 2 of the system, the reference to the location of content is, in most cases, to the original source of that content.

However, in some variants of the current embodiment of the invention, the system 1 is configured to store modifications to the original content on the database 2.

Moreover, the data processing module 4 comprises a snippet generator which processes content items to extract from them content item portions. Following on from this, different parts of content may be assigned with different tags. The snippet generator is configured to retrieve content from the content repository 12 and process the content to obtain a set of content portions. Each portion has its own entity or semantic representation within the knowledge graph. Accordingly, different parts of the same item of content may be relevant to different users of the system, as established by the relationship between each content portion entity, and user entity within the knowledge graph.

When an item of content has an audio component, the data processing module 4 is configured to generate a time-registered transcript from that content. A temporal location within that content of a spoken word can thus be specified in the transcript alongside a corresponding transcribed word. In this case, the snippet generator can retrieve audio content portions by selecting a segment of the time-registered transcript, determining a temporal range of that segment, then retrieving the portion of the audio file corresponding to that determined temporal range.
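The retrieval step described above can be sketched as follows. The transcript layout (one timed entry per word), the function names, and the representation of audio as a flat list of samples are assumptions for illustration; a practical implementation would operate on an encoded audio file rather than raw samples.

```python
def segment_time_range(transcript, first_word, last_word):
    """Return (start_s, end_s) for an inclusive word-index segment.

    transcript: list of (start_s, end_s, word) entries (assumed format).
    """
    if not 0 <= first_word <= last_word < len(transcript):
        raise ValueError("segment indices out of range")
    return transcript[first_word][0], transcript[last_word][1]


def slice_samples(samples, sample_rate, start_s, end_s):
    """Cut the samples lying within the temporal range out of the audio."""
    return samples[int(start_s * sample_rate):int(end_s * sample_rate)]


transcript = [(0.0, 0.5, "graphs"), (0.5, 1.0, "store"), (1.0, 1.5, "tags"), (1.5, 2.0, "well")]
start_s, end_s = segment_time_range(transcript, 1, 2)  # the words "store tags"
print(start_s, end_s)  # -> 0.5 1.5

samples = list(range(8))  # two seconds of toy audio at 4 samples per second
print(slice_samples(samples, 4, start_s, end_s))  # -> [2, 3, 4, 5]
```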

In general terms, the data processing module 4, when implementing the snippet generator, performs at least one of:

- Automatic speech recognition

- Text processing

- Sentence ranking that produces semantic scores for each sentence based on the overall context of the content (using its transcription)

- Identifying all candidate snippets within the content

- Calculating the semantic scores for candidate snippets using their respective constituent sentence(s)

- Generating the resulting shortform audio file in optimal codec and bitrate for mobile consumption

- Hosting resulting snippets and serving via a content delivery network (CDN)

- Matching candidate snippets to the SUM (semantic user model) for each user using content tags

- Combining snippet scores and matching scores to generate individual users' feeds
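The scoring steps in the list above can be sketched as follows, assuming sentence-level semantic scores have already been produced. The use of a mean over constituent sentences, Jaccard overlap between snippet tags and user tags as the matching score, and an equal weighting between the two scores are all assumptions chosen for illustration; the source does not specify these formulas.

```python
def candidate_snippets(sentence_scores, min_len=1, max_len=2):
    """Enumerate contiguous sentence windows, scoring each by its mean sentence score."""
    candidates = []
    n = len(sentence_scores)
    for start in range(n):
        for length in range(min_len, max_len + 1):
            end = start + length  # exclusive end index
            if end > n:
                break
            window = sentence_scores[start:end]
            candidates.append({"start": start, "end": end, "score": sum(window) / length})
    return candidates


def match_score(snippet_tags, user_tags):
    """Jaccard overlap between a snippet's tags and the tags in a user's model."""
    union = snippet_tags | user_tags
    return len(snippet_tags & user_tags) / len(union) if union else 0.0


def feed_score(snippet_score, matching_score, weight=0.5):
    """Blend snippet quality with user relevance (the weight is an assumed parameter)."""
    return weight * snippet_score + (1 - weight) * matching_score


candidates = candidate_snippets([0.2, 0.8, 0.6])
best = max(candidates, key=lambda c: c["score"])
print(best["start"], best["end"])  # -> 1 2

relevance = match_score({"ai", "ethics"}, {"ai", "robotics"})
print(round(feed_score(best["score"], relevance), 3))
```

Sorting each user's candidates by the blended score would then give an ordering for that user's feed.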

The system 1 can provide recommendations other than content such as podcasts and news articles. In particular, the system 1 may be configured to recommend people, information or actions that are in some way represented as entities within the knowledge graph.

In particular, the system 1 is configured with recommendation models for the purpose of making recommendations based on predetermined intents. For example:

- Matching two members who have similar interests

- Matching investors with founders

- Matching members to products they might be interested in

- Matching members based on engagement with particular content (consuming, following, recommending, sharing, connecting)

For each such intent, the recommendation engine has a model that encapsulates an understanding of how to query the knowledge graph for suitable entities to recommend. Each model may produce results based on a mixture of:

- Rule-based queries

- Data science algorithms (including similarity algorithms, pathfinding algorithms, link-prediction algorithms, community detection algorithms, centrality (importance) algorithms and others)

- Statistical models derived from decision theory

- Manually-enforced overrides
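By way of illustration, a model for the intent "matching two members who have similar interests" might combine a similarity algorithm with a manually-enforced override, as in the sketch below. The in-memory dictionary standing in for the knowledge graph, the choice of Jaccard similarity, and the `blocked` override set are assumptions for this example only.

```python
def similar_members(graph, member, top_k=3, blocked=frozenset()):
    """Rank other members by the overlap between their tags and the given member's tags.

    graph: dict mapping member id -> set of tag ids (toy stand-in for the knowledge graph).
    blocked: set of (member, other) pairs that must never be recommended (manual override).
    """
    own = graph[member]
    scored = []
    for other, tags in graph.items():
        if other == member or (member, other) in blocked:
            continue
        union = own | tags
        score = len(own & tags) / len(union) if union else 0.0  # Jaccard similarity
        if score > 0:
            scored.append((score, other))
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest similarity first, ties by id
    return [other for _, other in scored[:top_k]]


graph = {
    "ana": {"ai", "ethics", "robotics"},
    "ben": {"ai", "ethics"},
    "cy": {"ai", "finance"},
    "di": {"gardening"},
}
print(similar_members(graph, "ana"))  # -> ['ben', 'cy']
print(similar_members(graph, "ana", blocked={("ana", "ben")}))  # -> ['cy']
```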

The highly-connected, knowledge graph-based nature of the system 1, and the recommendation engine that it implements, allows accurate predictions to be delivered without needing to rely on large datasets.

If users choose to connect based on the recommended matches, this allows them to communicate directly with each other on the platform, as well as informing further the recommendation algorithms for recommended content items.

Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the scope of the appended claims.

Claims

1. A digital content delivery system suitable for delivering relevant curated content to users in dependence on learning goals specified for each user, the system comprising: a database comprising a knowledge graph for storing a semantic representation of content and user data, the knowledge graph comprising tags that categorise content and user data within a hierarchical taxonomy; a set of user accounts, each account being associated with an individual user, and comprising associated user data; a user account management module configured to receive inputs via which learning goals can be specified for each user account thereby updating the respective user data for that user account, learning goals being associated with at least one tag within the knowledge graph; a data processing module configured to:

- process content from a content repository and user data from the set of user accounts, and in response generate entities for storage within the knowledge graph, the entities including content entities, and a user entity specific to each user account; and

- determine a relationship between the entities; a content delivery module for delivering content to each user in dependence on the determined relationship between the entities, so that content is delivered that is relevant to the learning goals specified for the respective user account; and a user device to which the content delivery module transmits curated content specific to the user, and via which content is delivered to the user.

2. The system of claim 1, wherein the user account management module comprises a learning goal specification interface configured to:

- display learning goal options that are derived from tags within the knowledge graph;

- receive inputs to select at least one learning goal option for association with a user account; and

- specify learning goals for that user account in dependence on the received inputs.

3. The system of claim 2, wherein the inputs to select at least one learning goal option are received from the user associated with that user account, and at least one additional user, such as a manager.

4. The system of any preceding claim, further comprising an engagement logger configured to log, within a respective user account, user engagement with content delivered by the content delivery module, the user engagement being logged in terms of progress towards meeting a learning goal specified for that user account.

5. The system of any preceding claim, wherein the user account management module further comprises a display interface configured to display the learning goals specified for a user account to at least one of: the user associated with that user account, and another user such as an administrator or manager.

6. The system of any preceding claim, wherein the user account management module is configured to receive inputs from users to provide their personal details, preferences, and other attributes, thereby controlling the placement of each user entity within the knowledge graph.

7. The system of any preceding claim, further comprising an application that is executable on the user device, and via which users can interact with their user account, and receive content from the content delivery module.

8. The system of claim 7, wherein the user device is a mobile telecommunication device, such as a smartphone, and the application is a mobile application that is downloadable from an application hosting platform, and executable on a user mobile telecommunication device.

9. The system of claim 8, wherein the mobile application configures the mobile device to display a user interface (UI) to a user, and receive interactions via the UI, interactions with the UI allowing:

- explicit update of a corresponding user account via the manual entry by the user of preferences; and

- implicit update of a corresponding user account by logging user engagement with tagged content, messaging of other users, or connecting with other users.

10. The system of any preceding claim, wherein the curated content that is delivered to a user comprises at least one recommendation in the form of a connection suggestion to another user of the system, or content to which that user is subscribed.

11. The system of claim 10, further comprising a recommendation engine for providing recommendations, the recommendation engine being configured to:

- perform matchmaking between users on the basis of the relationship between their respective user entities within the knowledge graph; and

- deliver a connection suggestion to at least one of those users.

12. The system of claim 10 or 11, configured to receive a query from a user, and in response generate at least one recommendation.

13. The system of any preceding claim, wherein the data processing module is configured to register a change in content within the content repository thereby to determine new content not yet processed, and then process the new content to generate entities for storage within the knowledge graph associated with that new content.

14. The system of any preceding claim, wherein entities that are stored within the knowledge graph represent users, whole content items, and content item portions.

15. The system of any preceding claim, wherein the data processing module is configured to perform an auto-tagging process for semantically positioning entities, such as those derived from content, within the knowledge graph.

16. The system of claim 15, wherein the auto-tagging process comprises applying a rule-based pipeline to content from a content repository, the rule-based pipeline comprising question pre-processing or segmentation, to generate from the content a set of rules-based tags such as longest n-gram tags.

17. The system of claim 15 or claim 16, wherein the auto-tagging process comprises applying a machine-learning-based pipeline including applying at least one of: pre-trained language models, ranking functions, and fine-tuning.

18. The system of any preceding claim, wherein the content repository comprises content, such as podcasts or video, that have an audio component.

19. The system of claim 18, wherein the data processing module is configured to convert speech within the audio component into text, the text being further processed by the data processing module to derive tags from it to allow categorisation of the audio content items within the knowledge graph.

20. The system of claim 19, wherein the text includes a transcript of the speech within the audio component.

21. The system of claim 20, wherein the transcript is time-registered.

22. The system of any one of claims 19 to 21, wherein the text includes an identification of at least one speaker delivering speech within the audio component.

23. The system of claim 22 when dependent on claim 20 or 21, wherein the identification of at least one speaker is derived from the transcript.

24. The system of claim 22 or 23, wherein the data processing module is configured to perform a voice recognition process on the audio component to determine the identity of the at least one speaker.

25. The system of claim 24, wherein the identity of a speaker is stored as a tag and/or entity within the knowledge graph.

26. The system of claim 24 or 25, wherein the voice recognition process distinguishes at least one speaker from another.

27. The system of claim 26, wherein the voice recognition process determines what each speaker respectively said.

28. The system of claim 26 or 27, wherein the voice recognition process comprises a sentiment analysis step for classifying the sentiment of at least a portion of the text.

29. The system of any one of claims 19 to 28, wherein the data processing module is configured to generate metadata that identifies properties of a transcript generated by the conversion of speech to text, the properties including at least one of: an identification of at least one speaker, a determination of the words of each speaker, and the sentiment of at least a portion of the transcript.

30. The system of any preceding claim, wherein the data processing module further comprises a snippet generator configured to retrieve content from the content repository and process the content to obtain a set of candidate content portions, each portion having its own entity or semantic representation within the knowledge graph.

31. The system of claim 30, when dependent on any one of claims 19 to 29, wherein the snippet generator retrieves audio content portions based on associated portions of text converted from speech within content having an audio component.

32. The system of claim 31, wherein the data processing module generates a time-registered transcript from content having an audio component, a temporal location within that content of a spoken word being specified in the transcript alongside a corresponding transcribed word; wherein the snippet generator retrieves audio content portions by selecting a segment of the time-registered transcript, determining a temporal range of that segment, then retrieving the portion of the audio file corresponding to that determined temporal range.

33. The system of any preceding claim, wherein the data processing module is configured to perform:

- speech recognition on an audio component of an item of content from the content repository, to generate a time-registered text version of that content;

- text processing on the text version of that content, the text processing determining word-groups, such as sentences or phrases;

- ranking on each word-group, the ranking including semantic scores for each word-group;

- selection of the highest-ranked word-groups as snippet candidates; and

- storing a representation of each snippet candidate as an entity within the knowledge graph.

34. The system of claim 33, wherein ranking comprises identifying a subsection within the audio component, such as an advertisement, that is contextually unrelated to a main section, ranking ignoring or negatively scoring word groups belonging to that subsection.

35. The system of claim 33 or 34, wherein the data processing module is configured to:

- receive a request for audio content from a content delivery device, the request specifying technical capabilities including at least one of: processing capabilities of the content delivery device, content playback capabilities of the content delivery device, and the bandwidth between the content delivery device and the data processing module;

- extract a portion of the content item that corresponds to a respective snippet candidate;

- audio-process each extracted portion to generate a short-form audio file having optimal codec and bitrate for the technical capabilities of the requesting content delivery device.

36. The system of claim 35 further comprising transmitting the short-form audio file to the content delivery device.

37. The system of any one of claims 35 and 36, further comprising:

- hosting each of the short-form audio files via a content delivery network;

- determining a match between entities representing a snippet candidate and entities representing a user; and

- delivering the short-form audio file to a content feed of a matching user as an item of curated content.

38. The system of any preceding claim, wherein the content repository comprises at least part of a podcast directory.

39. A computer-implemented content curation method comprising:

- maintaining a set of user accounts, each associated with an individual user;

40. A mobile telecommunications device for use in content curation, the mobile telecommunications device comprising:

- an electronic touch-sensitive screen; and

- a wireless telecommunications module operable to download an application; the mobile telecommunications device being arranged to execute the downloaded application to control the mobile telecommunications device to:

- prompt the user, via the screen, to take a set of actions to generate user data;

- displaying that curated content on the screen of the mobile device.