GB2592884A - System and method for enabling a search platform to users - Google Patents


Info

Publication number
GB2592884A
GB2592884A GB1917703.9A GB201917703A
Authority
GB
United Kingdom
Prior art keywords
user
search
lecture
concept
videos
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1917703.9A
Other versions
GB201917703D0 (en)
Inventor
Mezaris Vasileios
Pournaras Alexandros
Wiese Michael
Galanopoulos Damianos
Saleh Ahmed
Vagliano Iacopo
Blume Till
Fessl Angela
Simic Ilija
Pammer-Schindler Viktoria
Sabol Vedran
Wertner Alfred
Vigo Markel
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ernst and Young GmbH
Original Assignee
Ernst and Young GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ernst and Young GmbH filed Critical Ernst and Young GmbH
Priority to GB1917703.9A priority Critical patent/GB2592884A/en
Publication of GB201917703D0 publication Critical patent/GB201917703D0/en
Priority to PCT/IB2020/061514 priority patent/WO2021111400A1/en
Publication of GB2592884A publication Critical patent/GB2592884A/en
Withdrawn legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9538Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Enabling a search platform to users, comprising: obtaining a search query; integrating data from a social-stream manager and web crawlers; processing the data to identify authors in documents, analyse lecture videos and non-lecture videos to retrieve a temporal structure, generate keyword-based annotations for lecture videos, and generate concept-based annotations for all videos, and retrieve bibliographic metadata that is modelled as Linked Open Data and then transformed into a common data model; filtering search results by document type, author, date or venue; ranking the search results based on relevance; visualizing search results using a graph visualization for discovery and exploration of relationships between documents and properties, an interface for interest-based result exploration, a bar chart displaying aggregated information, and a tag cloud for analysis of keyword frequency; and providing training support comprising a learning-how-to-search widget 504 and a curriculum-reflection widget 506, wherein the learning-how-to-search widget displays user interaction data and provides questions prompting the user to reflect on search behaviour and experiment with functionalities, and wherein the curriculum-reflection widget provides training material for the user comprising guidance through tutorials, video lectures, or reflective learning, and enables the user to complete the curriculum.

Description

Intellectual Property Office Application No. GB1917703.9 RTM Date: 5 May 2020 The following terms are registered trade marks and should be read as such wherever they occur in this document: Elasticsearch Hypios JavaScript Google Intellectual Property Office is an operating name of the Patent Office www.gov.uk/ipo
SYSTEM AND METHOD FOR ENABLING A SEARCH PLATFORM TO USERS
TECHNICAL FIELD
The present disclosure relates generally to a system or a method that enables a search platform to users.
BACKGROUND
Open innovation is a distributed innovation process based on knowledge flows across organizational boundaries which involves various actors, from researchers, to entrepreneurs, to users, to governments, and civil society. Existing Open Innovation Systems (OIS), for example, Innocentive (Innocentive.com) and Hypios (hypios-ci.com) mainly support collaborative idea generation and problem solving. However, the generation of ideas is not the biggest challenge of open innovation.
Research and innovation staff in academia and industry need to effectively obtain an overview of publications, patents, products, funding opportunities, etc., to derive appropriate innovation strategies. For instance, researchers and students need to find, understand, and build on top of a large and steadily increasing number of previous publications and other online educational resources (video lectures, tutorials, etc.). Similarly, financial auditors need to monitor a constantly evolving set of regulations pertinent to their daily work. In the Big Data era, such information is usually available and freely accessible in digital resources (e.g. text and media). However, students and professionals typically lack the time, strategies and tools to efficiently extract useful knowledge from all these resources.
In existing approaches, recommender systems (RS) are used to suggest interesting items, e.g. movies, news, scientific papers. Typically, recommender systems are classified as content-based, collaborative-filtering, knowledge-based, or hybrid. Content-based RS provide suggestions based on the items that a user liked in the past. Collaborative-filtering RS generate recommendations based on the items that similar users liked. Knowledge-based RS infer similarities between user requirements and item features described in a knowledge base. Hybrid RS combine one or more of these techniques. With the evolution of the Web towards the global data space known as the Linked Open Data (LOD) cloud, Linked-Data-based RS have emerged, which suggest items by exploiting knowledge on the LOD cloud. Each of the above recommender systems analyses its resources in a unique way and provides recommendations based on that analysis.
Further, each recommendation suggested by the above recommender systems plays an important role in suggesting relevant training materials to the user. However, no existing system on the market extracts information from all these resources to recommend relevant training materials, as it is cumbersome to extract information from different resources and combine it to provide suggestions on relevant training materials.
Therefore, there arises a need to address the aforementioned technical drawbacks in existing approaches by automatically providing strategies and tools to efficiently extract useful knowledge from all these resources for recommending relevant training materials.
SUMMARY
According to a first aspect, there is provided a method for enabling a search platform to users, the method comprising: obtaining a search query from a user device of a user; integrating data from (i) a social stream manager, (ii) a search-engine based web crawler and (iii) a focused web domain crawler; processing the integrated data to (i) identify author names in documents that are stored in an Elasticsearch index; (ii) analyze lecture videos and non-lecture videos, to (a) retrieve a temporal structure (fragments) of each video, (b) generate keyword-based annotations for each fragment specifically for lecture videos, and (c) generate concept-based annotations for each temporal fragment for the lecture videos and the non-lecture videos; and (iii) retrieve bibliographic metadata that is modelled as Linked Open Data (LOD) and transform the bibliographic metadata into a common data model using a bibliographic metadata injection; filtering search results by at least one of (i) a document type, (ii) an author, (iii) a date or (iv) a venue; ranking the search results based on relevance of the search results to the search query of the user; visualizing the search results using (i) a graph visualization for discovery and exploration of relationships between documents and their properties, (ii) a visual interface called uRank for interest-based result set exploration, (iii) a bar chart displaying aggregated information about the properties of retrieved documents, and (iv) a tag cloud for an analysis of keyword frequency in the retrieved documents; and providing an adaptive training support that comprises a learning-how-to-search widget and a curriculum reflection widget, wherein the learning-how-to-search widget automatically displays user interaction data regarding functionalities used by the user based on activity log data retrieved from WevQuery, and provides questions to the user to (i) reflect on the search behaviour and (ii) experiment with other search functionalities; wherein the curriculum reflection widget (a) provides training material adapted to the user's competence level that comprises at least one of (i) guidance through tutorials, (ii) video lectures, and (iii) reflective learning on the content of a training environment, and (b) enables the user to complete the available curriculum.
Optionally, the method further comprises storing the user interaction data and the user's context using WevQuery, wherein the user interaction data are user interface events that comprise at least one of (i) mouse click events, (ii) mouse movement events, (iii) mouse wheel events, (iv) keyboard events, (v) window events or (vi) screen touch events, wherein the user's context is at least one of (i) search topics or (ii) curriculum of the user.
Optionally, in the method, the social stream manager crawls social media and monitors a plurality of social streams to collect incoming content relevant to a keyword, a social media user or a location of the user, using a corresponding Application Programming Interface (API) of each service.
Optionally, in the method, the search-engine based web crawler crawls web pages that are relevant to topics based on the search query by exploiting web search Application Programming Interfaces (APIs).
Optionally, in the method, the focused web domain crawler crawls user-defined web domains.
Optionally, in the method, the fragments of the lecture videos and the keyword-based annotations for each fragment are generated by (i) obtaining an audio transcript of each lecture video and automatically transforming the audio transcript of each lecture video into a set of meaningful textual cues; (ii) representing every textual cue in a vector space using word embeddings; (iii) detecting time boundaries on a lecture video using generated vector space representations of textual cues; these boundaries define the set of temporal video fragments; and (iv) selecting a set of keywords for annotating every fragment; these keywords are the N most frequent textual cues for the given fragment.
Optionally, in the method, the concept-based annotations for each fragment of the lecture videos are generated by (i) selecting a closed set of pre-specified visual concepts that is suitable for a task; (ii) generating, for each concept, a Concept Language Model (CLM) that includes a set of M keywords that are relevant to this specific concept; each Concept Language Model (CLM) is generated by automatically issuing a web query, transforming top-K retrieved articles in the Bag-of-Words (BoW) representation, and selecting the top-M most frequent keywords of the BoW representation; (iii) defining as Transcript Language Model (TLM) the set of N keywords used for annotating the fragment; (iv) identifying, for every CLM, the semantic relatedness value for each possible pair of keywords, where one keyword belongs in the TLM and the other keyword belongs in the CLM; (v) transforming the set of semantic relatedness values (for one CLM and one TLM) into a single score that denotes the semantic relation of the concept represented by the CLM with the lecture video fragment represented by the TLM; and (vi) annotating the lecture's fragment with visual concepts by selecting, from the closed set of pre-specified visual concepts, a set of X concepts with the highest semantic relation score for this particular fragment.
Optionally, in the method, the concept-based annotations for the non-lecture videos are generated by (i) de-composing the non-lecture videos into elementary temporal shots by: (a) representing visual content of each video frame by extracting a color histogram and a set of local descriptors, (b) assessing visual similarity between successive frames using the features in (a) and comparing it against a pre-specified threshold to detect candidate shot transitions, and (c) re-evaluating candidate shot transitions by applying a flash detector and a pair of dissolve and wipe detectors to filter out false detections; (ii) annotating each shot with visual concepts that are obtained from a pre-specified concept pool, by (a) using a number of deep learning based concept detectors, or (b) using a number of discriminant analysis based concept detectors, or (c) using a number of concept detectors combining deep learning and discriminant analysis.
Optionally, in the method, the non-lecture videos that are semantically or thematically closer to the lecture videos are identified using the generated concept-based annotations and semantic word embeddings that match these annotations for the non-lecture and lecture videos.
Embodiments of the present disclosure substantially eliminate or at least partially address the aforementioned drawbacks in existing approaches for adaptive training support and automatic learning guidance to users.
Additional aspects, advantages, features and objects of the present disclosure are made apparent from the drawings and the detailed description of the illustrative embodiments construed in conjunction with the appended claims that follow.
It will be appreciated that features of the present disclosure are susceptible to being combined in various combinations without departing from the scope of the present disclosure as defined by the appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS
The summary above, as well as the following detailed description of illustrative embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the present disclosure, exemplary constructions of the disclosure are shown in the drawings. However, the present disclosure is not limited to specific methods and instrumentalities disclosed herein. Moreover, those in the art will understand that the drawings are not to scale. Wherever possible, like elements have been indicated by identical numbers.
Embodiments of the present disclosure will now be described, by way of example only, with reference to the following diagrams wherein:
FIG. 1 is a schematic illustration of a system in accordance with an embodiment of the present disclosure;
FIG. 2 is a functional block diagram of crawlers in accordance with an embodiment of the present disclosure;
FIG. 3 is a functional block diagram of a focused web domain crawler in accordance with an embodiment of the present disclosure;
FIG. 4 is a functional block diagram of WevQuery in accordance with an embodiment of the present disclosure;
FIGS. 5A-5B are exemplary views of graphical user interfaces of enabling adaptive training support to a user in accordance with an embodiment of the present disclosure;
FIG. 6 is an exemplary view of a graphical user interface of generating queries using a designer device in accordance with an embodiment of the present disclosure;
FIG. 7A is an exemplary view of a graphical user interface for displaying a new event dialogue to a designer in accordance with an embodiment of the present disclosure;
FIG. 7B is an exemplary view of a graphical user interface for displaying a new temporal constraint dialogue to a designer in accordance with an embodiment of the present disclosure;
FIG. 8 is a flowchart illustrating steps of a method of retrieving data sources from a Linked Open Data (LOD) cloud using a bibliographic metadata injection module in accordance with an embodiment of the present disclosure;
FIG. 9 is a flowchart illustrating steps of a method of identifying author names that are mentioned in documents that are stored in an Elasticsearch index in accordance with an embodiment of the present disclosure;
FIG. 10A illustrates an exemplary view of a graph visualization of search results in accordance with an embodiment of the present disclosure;
FIG. 10B illustrates an exemplary view of a ring menu of a node of FIG. 10A in accordance with an embodiment of the present disclosure;
FIG. 10C illustrates an exemplary view of a visualization of an aggregated subgraph of FIG. 10A in accordance with an embodiment of the present disclosure;
FIG. 11A illustrates an exemplary view of a graphical user interface for displaying documents that are ranked according to the selected keywords in accordance with an embodiment of the present disclosure;
FIG. 11B illustrates an exemplary view of a graphical user interface for displaying the initial ranking provided by a search engine with a listing of extracted keywords in accordance with an embodiment of the present disclosure;
FIG. 11C illustrates an exemplary view of a graphical user interface for displaying changes in ranking based on a user's interest in certain topics in accordance with an embodiment of the present disclosure;
FIG. 11D illustrates an exemplary view of a graphical user interface for displaying an effect of reducing the weight of a keyword in accordance with an embodiment of the present disclosure;
FIG. 11E illustrates an exemplary view of a graphical user interface for a preview of a document's contents, displaying selected keywords, in accordance with an embodiment of the present disclosure;
FIG. 12 illustrates an exemplary view of a graphical user interface for displaying document metadata in the retrieved results as bar charts in accordance with an embodiment of the present disclosure;
FIG. 13 illustrates an exemplary view of a graphical user interface for displaying prominent keywords in retrieved search results in a tag cloud in accordance with an embodiment of the present disclosure;
FIG. 14 illustrates an exemplary view of a graphical user interface for a preview of a recommended document's contents in accordance with an embodiment of the present disclosure; and
FIG. 15 is a functional block diagram of a recommender system which suggests personalized training material from a training environment in a working environment in accordance with an embodiment of the present disclosure.
In the accompanying drawings, an underlined number is employed to represent an item over which the underlined number is positioned or an item to which the underlined number is adjacent. A non-underlined number relates to an item identified by a line linking the non-underlined number to the item. When a number is non-underlined and accompanied by an associated arrow, the non-underlined number is used to identify a general item at which the arrow is pointing.
DETAILED DESCRIPTION OF EMBODIMENTS
The following detailed description illustrates embodiments of the present disclosure and ways in which they can be implemented. Although some modes of carrying out the present disclosure have been disclosed, those skilled in the art would recognize that other embodiments for carrying out or practicing the present disclosure are also possible.
According to a first aspect, provided is a method for enabling a search platform to users, the method comprising: obtaining a search query from a user device of a user; integrating data from (i) a social stream manager, (ii) a search-engine based web crawler and (iii) a focused web domain crawler; processing the integrated data to (i) identify author names in documents that are stored in an Elasticsearch index; (ii) analyze lecture videos and non-lecture videos, to (a) retrieve a temporal structure (fragments) of each video, (b) generate keyword-based annotations for each fragment specifically for lecture videos, and (c) generate concept-based annotations for each temporal fragment for the lecture videos and the non-lecture videos; and (iii) retrieve bibliographic metadata that is modelled as Linked Open Data (LOD) and transform the bibliographic metadata into a common data model using a bibliographic metadata injection; filtering search results by at least one of (i) a document type, (ii) an author, (iii) a date or (iv) a venue; ranking the search results based on relevance of the search results to the search query of the user; visualizing the search results using (i) a graph visualization for discovery and exploration of relationships between documents and their properties, (ii) a visual interface called uRank for interest-based result set exploration, (iii) a bar chart displaying aggregated information about the properties of retrieved documents, and (iv) a tag cloud for an analysis of keyword frequency in the retrieved documents; and providing an adaptive training support that comprises a learning-how-to-search widget and a curriculum reflection widget, wherein the learning-how-to-search widget automatically displays user interaction data regarding functionalities used by the user based on activity log data retrieved from WevQuery, and provides questions to the user to (i) reflect on the search behaviour and (ii) experiment with other search functionalities, wherein the curriculum reflection widget (a) provides training material adapted to the user's competence level that comprises at least one of (i) guidance through tutorials, (ii) video lectures, and (iii) reflective learning on the content of a training environment, and (b) enables the user to complete the available curriculum.
The present method, when in operation, thus automatically recommends training material from the training environment of the user using the curriculum reflection widget.
In embodiments of the present disclosure, a search query is obtained from a user device of the user. The user device may comprise a personal computer, a smartphone, a tablet, a laptop or an electronic notebook. In an embodiment, the present method enables the user to enable or disable user interaction tracking at any time using profile settings. In an embodiment, a plurality of interaction events is obtained from a web application when interaction data is obtained from the user. The present method may track interaction data such as website ID, user ID, IP address, URL, timestamp, load timestamp, time offset, platform or browser.
The present method includes WevQuery, a tool to log user interactions that includes a visual query language and a programming interface for querying interaction log data. This allows for an evaluation of interactive web systems through iterative hypothesis generation and testing. WevQuery enables designers to graphically define queries or hypotheses, without database knowledge or programming skills, to retrieve information about behaviours exhibited on the web application. The queries are represented as sequences of events that are defined using interactive drag-and-drop functionalities on the web application. The queries may include a number of user interface events, including scroll change events or mouse interactions.
In an embodiment, WevQuery includes an interactive web interface. In another embodiment, WevQuery includes a back-end that translates the graphically designed hypotheses into queries that are run in a database and generates a comprehensive report about the formulated hypothesis. WevQuery stores user interaction data and the user's context in a corresponding user profile. In an embodiment, the user interaction data are user interface events that include at least one of (i) mouse click events, (ii) mouse movement events, (iii) mouse wheel events, (iv) keyboard events, (v) window events or (vi) screen touch events. In an embodiment, the user's context is at least one of (i) search topics or (ii) a curriculum of the user.
The present method integrates data from (i) a social stream manager, (ii) a search-engine based web crawler and (iii) a focused web domain crawler. The social stream manager (SSM) is software that operates on top of the stream manager. The stream manager crawls social media, e.g. Twitter and YouTube, and monitors a plurality of social streams to collect incoming content relevant to a keyword, a social media user or a location of the user, using a corresponding Application Programming Interface (API) of each service.
The content consists of the Items, MediaItems and Webpages, which are collected by the stream manager. In an embodiment, the Items are social media posts such as tweets. In an embodiment, MediaItems are the video links extracted from Items with their respective metadata. In an embodiment, Webpages are the webpage links extracted from Items with their respective metadata. The social stream manager (SSM) transforms the data format of the collections according to the method's common data model. Additionally, it processes video links by using the parts of the method that generate fragments and concept-based annotations for videos. In an embodiment, a mechanism that monitors Horizon 2020 funding topics is also part of the Social Stream Manager (SSM) and is scheduled to run once every day. The mechanism collects the data about the funding topics from a JavaScript Object Notation (JSON) file. In an embodiment, the JSON file is updated every time a new funding topic is inserted. The mechanism first checks the status of a call. If the call is closed, it skips the topic as there is no need to index an expired funding topic. The next step is to retrieve the full Hypertext Markup Language (HTML) page of the topic and transform its metadata into the common data model format. The crawler checks whether there are possible duplicates of the produced document already indexed before indexing it. In an embodiment, the duplicate check takes into account only the URL, looking for similar URLs in indexed documents.
The search-engine based crawler (SEC) crawls the webpages that are relevant to topics based on the user's interest by exploiting web search Application Programming Interfaces (APIs), e.g. Google custom search API, to retrieve the search results. Each webpage is fetched and converted to a common data model. In an embodiment, topics are scheduled to be re-searched periodically.
In an embodiment, the search-engine based crawler extracts and manages embedded videos included in the crawled webpages. The search-engine based crawler may send a request, in the form of a video URL, to the parts of the method that generate fragments and concept-based annotations for videos, and retrieves the generated concept annotation results.
The focused web domain crawler (FDC) crawls user-defined web domains to extract URLs from the webpages. The focused web domain crawler (FDC) includes an application scheduler, a general spider, an engine scheduler and a downloader. The application scheduler periodically checks the database for newly stored domains. The application scheduler launches the general spider to crawl a new domain when the new domain is stored in the database. The general spider crawls the URLs and sends the URLs to the engine scheduler. The engine scheduler controls data flow between the application scheduler, the general spider and the downloader, and triggers events when actions occur in the focused web domain crawler (FDC). The engine scheduler sends the URLs to the downloader. The downloader fetches the webpage and sends the fetched webpage to the general spider through the engine scheduler. The general spider extracts common data model attributes from the HTML and sends them to a pipeline. In an embodiment, the pipeline feeds the common data model attributes as a document to the method's data repository. The general spider extracts the URL links from the webpage and sends them to the engine scheduler, which may schedule their download. In an embodiment, the general spider continuously extracts URLs until all URLs of the webpage are crawled. In an embodiment, the focused web domain crawler (FDC) further includes an internal duplicate URL filter that avoids re-crawling the same webpages. The focused web domain crawler (FDC) may include an allow pattern and a block pattern that are defined by the user. The allow pattern may force the focused web domain crawler (FDC) to fetch only URLs that match the allow pattern. The block pattern may prevent the focused web domain crawler (FDC) from fetching URLs that match the block pattern while allowing any other URLs in the domain. The FDC extracts and manages embedded videos included in the crawled webpages. The FDC may send a request, in the form of a video URL, to the parts of the method that generate fragments and concept-based annotations for videos, and retrieve the generated concept annotation results.
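The scheduler/spider/downloader/pipeline architecture just described closely mirrors common crawling frameworks. A minimal sketch using Scrapy is given below for illustration only; the domain, the allow/block patterns and the extracted fields are hypothetical placeholders, not the patented implementation.

```python
# Illustrative sketch of the focused-domain crawl loop described above.
import scrapy

class GeneralSpider(scrapy.Spider):
    name = "general_spider"

    def __init__(self, domain, allow=(), block=(), **kwargs):
        super().__init__(**kwargs)
        self.allowed_domains = [domain]
        self.start_urls = [f"https://{domain}/"]
        self.allow, self.block = allow, block

    def parse(self, response):
        # Extract common-data-model attributes from the fetched HTML and
        # hand them to the item pipeline (which would index the document).
        yield {
            "url": response.url,
            "title": response.css("title::text").get(),
            "text": " ".join(response.css("p::text").getall()),
        }
        # Follow in-domain links, honouring the allow/block patterns;
        # Scrapy's scheduler de-duplicates URLs, mirroring the internal
        # duplicate URL filter described above.
        for href in response.css("a::attr(href)").getall():
            url = response.urljoin(href)
            if any(b in url for b in self.block):
                continue
            if not self.allow or any(a in url for a in self.allow):
                yield scrapy.Request(url, callback=self.parse)
```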
The present method processes the integrated data to (i) identify author names in documents that are stored in an Elasticsearch index; (ii) analyze videos, which can be both lecture and non-lecture videos, to (a) retrieve a temporal structure (fragments) of each video, (b) specifically for lecture videos, generate keyword-based annotations for each fragment, and (c) for both lecture and non-lecture videos, generate concept-based annotations for each temporal fragment; and (iii) retrieve bibliographic metadata that is modelled as Linked Open Data (LOD) and transform the bibliographic metadata into a common data model using a bibliographic metadata injection.
The present method identifies the author names by (i) extracting features or fields from the Elasticsearch index, (ii) applying normalization rules on the extracted features, (iii) creating or updating a look-up table in the form of a first SQLite file which maps directly each mention ID to corresponding features retrieved from the Elasticsearch index, (iv) disambiguating the author names from the first SQLite file, (v) inserting an identifier (mention ID) into the corresponding metadata of each author in the Elasticsearch index, (vi) adding author IDs to a feature database and creating a second SQLite file which maps mention IDs of newly identified names to the corresponding author IDs, (vii) inserting the author IDs into Elasticsearch metadata records of a plurality of author mentions that include a newly identified name, (viii) assigning an author and a set of documents using each mention ID in the Elasticsearch index, (ix) precomputing a mapping from mention IDs to document IDs and (x) removing obvious mistakes that occur when identifying the author names.
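As an illustration of steps (i)-(iii) of this pipeline, the sketch below pulls author mentions from an Elasticsearch index and builds the mention-ID look-up table as an SQLite file. The index name, field names and the normalisation rule are assumptions for the example; the real rules would be considerably richer.

```python
# Illustrative sketch: mention-ID -> features look-up table in SQLite.
import sqlite3
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
db = sqlite3.connect("mention_features.sqlite")
db.execute("""CREATE TABLE IF NOT EXISTS mentions
              (mention_id TEXT PRIMARY KEY, name TEXT, doc_id TEXT, venue TEXT)""")

def normalise(name: str) -> str:
    # Simple stand-in for the normalization rules of step (ii).
    return " ".join(name.lower().replace(".", " ").split())

resp = es.search(index="documents", query={"match_all": {}}, size=1000)
for hit in resp["hits"]["hits"]:
    doc = hit["_source"]
    for i, author in enumerate(doc.get("authors", [])):
        # One mention ID per (document, author position) pair.
        mention_id = f"{hit['_id']}#{i}"
        db.execute("INSERT OR REPLACE INTO mentions VALUES (?, ?, ?, ?)",
                   (mention_id, normalise(author), hit["_id"], doc.get("venue")))
db.commit()
```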
The present method analyzes the videos to perform temporal segmentation and visual concept detection of the videos that are hosted on social media.
The fragments of the lecture videos and the keyword-based annotations for each fragment are generated by (i) obtaining an audio transcript of each lecture video and automatically transforming the audio transcript of each lecture video into a set of meaningful textual cues; (ii) representing every textual cue in a vector space using word embeddings; (iii) detecting time boundaries on the lecture video using the generated vector space representations of the textual cues; these boundaries define the set of temporal video fragments; and (iv) selecting a set of keywords for annotating every fragment; these keywords are the N most frequent textual cues for the given fragment.
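A minimal sketch of steps (ii)-(iv): consecutive cue embeddings are compared, a fragment boundary is placed where their similarity drops below a threshold, and each fragment is annotated with its N most frequent cues. The embedding function, the threshold value and the boundary rule are illustrative assumptions; the description does not fix a particular boundary-detection scheme.

```python
# Sketch: fragment a transcript and pick top-N keywords per fragment.
from collections import Counter
import numpy as np

def fragment_and_annotate(cues, embed, threshold=0.5, n_keywords=5):
    """cues: ordered list of (timestamp, token); embed: token -> np.ndarray."""
    vectors = [embed(tok) for _, tok in cues]
    boundaries = [0]
    for i in range(1, len(vectors)):
        a, b = vectors[i - 1], vectors[i]
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        if cos < threshold:          # topic shift -> new temporal fragment
            boundaries.append(i)
    boundaries.append(len(cues))

    fragments = []
    for start, end in zip(boundaries, boundaries[1:]):
        tokens = [tok for _, tok in cues[start:end]]
        # Keywords are the N most frequent textual cues of the fragment.
        keywords = [w for w, _ in Counter(tokens).most_common(n_keywords)]
        fragments.append({"start": cues[start][0],
                          "end": cues[end - 1][0],
                          "keywords": keywords})
    return fragments
```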
The concept-based annotations for each fragment of the lecture videos are generated by (i) selecting a closed set of pre-specified visual concepts that is suitable for the task; (ii) generating for each concept a Concept Language Model (CLM) that includes a set of M keywords that are relevant to this specific concept; each Concept Language Model (CLM) is generated by automatically issuing a web query, transforming the top-K retrieved articles in the Bag-of-Words (BoW) representation, and selecting the top-M most frequent keywords of the BoW representation; (iii) defining as Transcript Language Model (TLM) the set of N keywords used for annotating the fragment; (iv) identifying, for every CLM, the semantic relatedness value for each possible pair of keywords, where one keyword belongs in the TLM and the other keyword belongs in the CLM; (v) transforming the set of semantic relatedness values (for one CLM and one TLM) into a single score that denotes the semantic relation of the concept represented by the CLM with the lecture video fragment represented by the TLM; and (vi) annotating the lecture's fragment with visual concepts by selecting, from the closed set of pre-specified visual concepts, a set of X concepts with the highest semantic relation score for this particular fragment.
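The scoring in steps (iv)-(vi) can be illustrated as follows. Pairwise semantic relatedness is approximated here with cosine similarity of word embeddings and the pair scores are averaged into a single value; the exact relatedness measure and aggregation are left open by the description, so both are assumptions of this sketch.

```python
# Sketch: score each concept (CLM) against a fragment (TLM), keep top X.
import numpy as np

def relatedness(clm_keywords, tlm_keywords, embed):
    # Step (iv)-(v): pairwise keyword relatedness, collapsed to one score.
    scores = []
    for c in clm_keywords:
        for t in tlm_keywords:
            a, b = embed(c), embed(t)
            scores.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return float(np.mean(scores)) if scores else 0.0

def annotate_fragment(concept_clms, tlm_keywords, embed, top_x=3):
    # concept_clms: {concept_name: [M keywords]}; returns the X best concepts.
    ranked = sorted(concept_clms,
                    key=lambda c: relatedness(concept_clms[c], tlm_keywords, embed),
                    reverse=True)
    return ranked[:top_x]
```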
The concept-based annotations for the non-lecture videos are generated by (i) de-composing the non-lecture videos into elementary temporal shots by: (a) representing the visual content of each video frame by extracting a color histogram and a set of local descriptors, (b) assessing the visual similarity between successive frames using the features in (a) and comparing it against a pre-specified threshold to detect candidate shot transitions, and (c) re-evaluating candidate shot transitions by applying a flash detector and a pair of dissolve and wipe detectors to filter out false detections; and (ii) annotating each shot with visual concepts that are obtained from a pre-specified concept pool by (a) using a number of deep learning based concept detectors, or (b) using a number of discriminant analysis based concept detectors, or (c) using a number of concept detectors combining deep learning and discriminant analysis.
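Steps (i)(a)-(b) can be sketched with OpenCV as below: colour histograms of successive frames are compared and a candidate transition is flagged when similarity falls under a pre-specified threshold. The local descriptors and the flash/dissolve/wipe re-evaluation of step (c) are omitted, and the histogram configuration and threshold are illustrative assumptions.

```python
# Sketch: candidate shot-boundary detection via colour-histogram similarity.
import cv2

def candidate_shot_boundaries(video_path, threshold=0.7):
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # 8x8x8-bin colour histogram as the per-frame visual feature.
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            sim = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
            if sim < threshold:      # low similarity -> candidate transition
                boundaries.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return boundaries
```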
In an embodiment, the non-lecture videos that are semantically or thematically closer to the lecture videos are identified using the generated concept-based annotations and semantic word embeddings that match these annotations for the non-lecture and lecture videos.
The present method filters search results by at least one of (i) a document type, (ii) an author, (iii) a date or (iv) a venue. In an embodiment, Elasticsearch is implemented in the search engine to provide scalable, real-time, multimodal and faceted search and to manage the plurality of document types, enabling the user to filter the search results based on at least one of the document type, the author, the date or the venue.
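As an illustration, the faceted filtering described here corresponds to a standard Elasticsearch bool query combining a full-text match with term and range filters; the index and field names below are hypothetical.

```python
# Sketch: faceted filtering of a full-text query in Elasticsearch.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
results = es.search(index="documents", query={
    "bool": {
        "must": [{"match": {"fulltext": "open innovation"}}],
        "filter": [
            {"term": {"document_type": "publication"}},   # (i) document type
            {"term": {"author": "Doe, J."}},              # (ii) author
            {"range": {"date": {"gte": "2015-01-01"}}},   # (iii) date
            {"term": {"venue": "ECIR"}},                  # (iv) venue
        ],
    }
})
```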
In an embodiment, the present method combines data sets with existing data from (i) a Linked Open Data cloud and (ii) a global space of structured and interlinked data. In an embodiment, the present method includes a video dataset that includes a plurality of metadata records of educational videos with transcripts. The educational videos are obtained from at least one of (i) scholars or (ii) scientists at events such as conferences, summer schools, workshops, and science promotional events. In an embodiment, the present method includes an economics dataset that includes a plurality of metadata records of economic scientific publications. The present method may include a text dataset that includes a plurality of metadata records and open access full texts. The Elasticsearch includes an Elasticsearch cluster that allows scalability and increases the availability of the index. The Elasticsearch cluster includes an Elasticsearch node that stores the data and the Elasticsearch index that stores the datasets. The Elasticsearch index includes one or more index shards, index replicas, filters and analysers. Each index shard may store a plurality of documents. The index replicas act as the primary index if any fail-over occurs. The filters (i) add synonyms of concepts using external files, (ii) filter English stop words, e.g. "the", and (iii) convert a plural form of a word to its singular form, e.g. "Tax offices" to "Tax office", using algorithmic stemmers. The analysers manage an index text. In an embodiment, the analysers include the filters. The analysers may be a concept analyser and a spreading activation analyser to preprocess thesauruses and indexed documents.
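The sketch below shows Elasticsearch index settings consistent with this description: shard and replica counts for scalability and fail-over, a synonym filter fed from an external file, an English stop-word filter, and a stemmer that folds plurals into singulars. All names, counts and file paths are illustrative assumptions.

```python
# Sketch: index settings matching the shards/replicas/filters/analysers above.
settings = {
    "settings": {
        "number_of_shards": 3,        # scalability
        "number_of_replicas": 1,      # availability on fail-over
        "analysis": {
            "filter": {
                # (i) synonyms of concepts loaded from an external file
                "concept_synonyms": {"type": "synonym",
                                     "synonyms_path": "synonyms.txt"},
                # (ii) English stop-word removal, e.g. "the"
                "english_stop": {"type": "stop", "stopwords": "_english_"},
                # (iii) plural -> singular, e.g. "Tax offices" -> "Tax office"
                "plural_stemmer": {"type": "stemmer",
                                   "language": "minimal_english"},
            },
            "analyzer": {
                "concept_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "concept_synonyms",
                               "english_stop", "plural_stemmer"],
                }
            },
        },
    }
}
# es.indices.create(index="documents", body=settings)
```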
The Elasticsearch enables the user to retrieve a number of relevant documents based on the user query. The Elasticsearch ranks the documents based on their relevance to the user's query, relying on the titles of the search results, using a novel ranking method, Hierarchical Concept Frequency-Inverse Document Frequency (HCF-IDF). The HCF-IDF method may obtain results comparable to state-of-the-art techniques based on full text. A dedicated user interface enables users to view a history of related documents, e.g. laws and regulations of a specific topic, to enable the user to track the evolution of these documents over time and refer to a specific version.
The present method ranks the search results based on relevance of the search results to the search query of the user. The present method visualizes the search results, which comprise a plurality of entities and a plurality of relations corresponding to the search query of the user.
The present method includes a Graph Visualization Framework (GVF) for visualizing the data. The graph visualization framework provides interactive analysis of large, complex networks including various entities and relationships which arise from co-occurrences. The graph visualization framework focuses on visual representations of the metadata and novel graph aggregation metaphors conveying relevant properties of nodes and relations in sub-graphs.
The graph visualization framework includes powerful interaction models for explorative navigation, filtering and visual querying of graph data. In an embodiment, graphs include multiple types of nodes connected by different types of links. The graph visualization framework includes a ring-menu that allows faster exploration of the graph by displaying nodes connected over multiple hops from an original node. The graphs may grow large and complex when many nodes are shown, leading to information overload. The graph visualization framework enables the user to focus on desired information by summarizing the rest of the graph in a way that allows the user to identify and explore other potentially relevant graph areas.
The present method may visualize the data using the uRank visual interface that enables the user to explore a document collection and refine information needs in terms of topical keywords. The uRank visual interface receives a set of textual documents, i.e. titles and abstracts, from the search engine. The uRank visual interface includes a keyword extraction module that analyses the titles and abstracts and returns (i) a list of weighted representative terms for each document, and (ii) a set of keywords that describe the whole collection. In an embodiment, a user interface (UI) displays a list of documents with the extracted collection keywords. The user may explore the documents and keywords. The user may identify possible key topics or relations between the documents and the keywords. When the user identifies interesting terms, the user may select them individually by clicking on the respective keywords. In an embodiment, the document list is re-ranked based on relevance to the selected keywords and augmented with stacked bars visualising document scores related to each keyword. The user may select a single document to access more detailed information on it.
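A simplified sketch of this re-ranking step: each document carries the weighted representative terms produced by the keyword extraction module, and the list is re-ordered by the weighted sum over the user-selected keywords. The data shapes and weights are assumptions for the example.

```python
# Sketch: uRank-style re-ranking by user-selected, user-weighted keywords.
def rerank(documents, selected_keywords):
    """documents: [{'id': ..., 'terms': {term: weight}}];
    selected_keywords: {keyword: user_weight}."""
    def score(doc):
        # Weighted sum of matching term weights; also the per-keyword
        # contributions that the stacked bars would visualise.
        return sum(doc["terms"].get(kw, 0.0) * w
                   for kw, w in selected_keywords.items())
    return sorted(documents, key=score, reverse=True)

docs = [{"id": "d1", "terms": {"innovation": 0.8, "search": 0.2}},
        {"id": "d2", "terms": {"search": 0.9}}]
print(rerank(docs, {"search": 1.0, "innovation": 0.5}))  # d1 before d2
```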
The present method may visualize the data as a bar chart and represent statistical distributions of metadata and topics of various document properties. The user may select a property of the retrieved results, for which the user wants to see the distribution of occurrences. The bar chart may display most frequently occurring properties in the retrieved results.
The labels on the y-axis of the bar chart summarise the content of the retrieved results. Additionally, the user may click on one of the bars, which opens a dialog with a ranked list of documents including this property. Clicking on a document in the list may open the source.
In an embodiment, a tag cloud visualizes the keywords and displays the most relevant keywords of the retrieved search results. The tag cloud relies on uRank's extraction of keywords from the retrieved content to generate a richer variety of keywords for the current result set. In an embodiment, the size of the tags is based on their relevance (number of occurrences). The tag cloud orders the tags in descending order based on their relevance and sorts the tags alphabetically or by occurrence. Additionally, the user may click on one of the tags, which opens a dialog with a ranked list of documents containing this keyword. In an embodiment, clicking on a document in the list may open the source.
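The sizing and ordering rules just described can be sketched as follows; the font-size range and linear scaling are assumptions.

```python
# Sketch: tag sizes scale with occurrence count; sort by count or alphabet.
def tag_cloud(keyword_counts, min_size=10, max_size=32, by_count=True):
    lo, hi = min(keyword_counts.values()), max(keyword_counts.values())
    span = (hi - lo) or 1
    tags = [{"tag": k,
             "size": min_size + (c - lo) * (max_size - min_size) / span}
            for k, c in keyword_counts.items()]
    key = (lambda t: -keyword_counts[t["tag"]]) if by_count else (lambda t: t["tag"])
    return sorted(tags, key=key)
```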
The present method implements an adaptive training support (ATS) comprising a learning-how-to-search widget and a curriculum reflection widget. The learning-how-to-search widget is generated by analyzing the users' behaviour that is stored in the database to track the user's search behaviour.
In an embodiment, the present method includes an adaptive training support (ATS) engine that communicates with a client-server architecture that utilizes well established representational state transfer (REST) API calls. The ATS engine may use data captured in user models from user interaction tracking.
The ATS engine may use an ATS database to store and retrieve relevant ATS information. In an embodiment, the relevant information for the ATS may be computed and may be sent in JSON format to a client. The client may use the available data to render the ATS.
The adaptive training support engine includes a data gathering tool, an analysis tool and a database. The utilized data gathering tool is called WevQuery. It captures user interaction data, which the ATS engine then converts into user models that can be further analysed. The analysis tool analyses the collected data to detect triggers based on interaction patterns worth being reflected on. The analysis tool converts the data into a JSON-encoded data string for the communication with the user. The database stores all analyses performed by the analysis tool. The database further stores the information created by the user, e.g. a reply to a prompt.
In an embodiment, communication between the user and the server may be established with REST API calls. In an embodiment, the data may be sent in a JSON-encoded format.
The ATS engine continuously triggers an update of feature use for all users. Each update involves (i) feature detection from an interaction log, (ii) extraction and storage of features in the database, (iii) an update of a timestamp used as the start timestamp for feature detection, and (iv) scheduling of the next update. The ATS engine maintains the start timestamp for feature detection. At the end of an update, this timestamp is shifted forward to the timestamp of the last detected feature. This ensures that only the events in the interaction log starting from the one after the last detected feature are processed during an update. The ATS engine also maintains a parameter which defines the time interval between consecutive updates of the feature use on the present system. After finishing an update, the ATS engine schedules the next update with the time interval of this parameter.
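The update cycle (i)-(iv) can be sketched as below, with the detection and storage functions stubbed out; the interval value, data shapes and use of a timer thread are assumptions of the example.

```python
# Sketch: ATS update cycle with a forward-shifting start timestamp.
import threading

class ATSEngine:
    def __init__(self, detect, store, interval_seconds=300):
        self.detect, self.store = detect, store
        self.interval = interval_seconds     # time between updates
        self.last_ts = 0                     # start timestamp for detection

    def update(self):
        features = self.detect(since=self.last_ts)        # (i) detection
        if features:
            self.store(features)                          # (ii) storage
            # (iii) shift the start timestamp to the last detected feature,
            # so the next run only sees events after it.
            self.last_ts = max(f["ts"] for f in features)
        # (iv) schedule the next update after the configured interval
        threading.Timer(self.interval, self.update).start()
```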
The learning-how-to-search widget automatically analyzes user interaction data and the user's context that is stored in WevQuery to compute the user's search behaviour regarding functionalities used, and provides questions to the user to (i) reflect on the search behaviour and (ii) experiment with other search functionalities.
The learning-how-to-search widget provides a bar chart representing feature use, e.g. how often the graph visualisation is exploited, and a related reflective question that motivates users to think about their search behaviour or to try out a new or other feature available on the platform. The learning-how-to-search widget selects prompts to the user based on the previously stored user interactions.
The learning-how-to-search widget obtains information about feature use as a measure of interaction. The learning-how-to-search widget exploits this information to decide which prompt is presented next to the user. When the user starts using the present system, the widget has no information about the user's interactions. In an embodiment, the reflection guidance in the learning-how-to-search widget at this stage (the start-up stage) does not present any prompt. The reflection guidance starts once the user has used the search features for a certain amount of time. The learning-how-to-search widget shows the prompt and a text field where the user can enter an answer. When the user submits the answer, the prompt and answer field disappear. The reflection guidance does not show the next prompt immediately. Rather, the reflection guidance waits for a certain amount of time until the next prompt is selected and shown to the user. In an embodiment, an episode count is used to decide when to show the next prompt. In an embodiment, the default is one episode of not showing any prompt. This way, annoying the user with too many prompts is avoided. The reflection guidance tracks feature use and the answers given to the presented prompts. Based on this information, the reflection guidance decides whether it keeps presenting prompts from the current category or moves on to the next one. After the start-up stage, three more stages in the reflection guidance follow. In each stage, the reflection guidance uses one or more categories to select prompts. The categories between stages differ and are adjusted to the interactions (i.e. experience) that the user has at that stage. The prompt categories include a first stage, a second stage and a third stage. The first stage includes prompts that ask about which features are less used or not used, and also about the benefit and/or satisfaction of a specific feature. The second stage includes prompts that ask about the features mostly used in the system, prompts that ask about the reasons why these features are mostly used or less used, and reflection. The third stage includes prompts that ask about the most beneficial or satisfactory features, skill or performance increase, and behaviour changes.
In an embodiment, the prompts in the first stage are easy to answer and keep the user's motivation high to further interact with the learning-how-to-search widget. The switch from the first stage into the second stage happens when the prompts for each of the categories in the first stage have been answered and feature use exceeds a certain threshold. In the second stage, the prompts are aligned to the interaction (i.e. experience) that the user has collected so far. The effort of answering the prompts is higher than in the first stage. The switch to the third stage depends, as before, on answering the prompts and feature use. In the third stage, the most challenging questions in terms of reflection are presented to the user. These questions are not only about feature use, but also address whether the user has observed a change in his search behaviour influenced by the learning-how-to-search widget.
The present method includes the curriculum reflection widget that automatically presents the training material adapted to the user's competence level that comprises at least one of (a) guidance through tutorials, (b) video lectures, and (c) reflective learning with a reflective question about the content of the training environment. The curriculum reflection widget enables the user to complete the available curriculum.
In an embodiment, the curriculum reflection widget provides a learning prompt associated with the next not completed learning unit. In an embodiment, a next learning prompt may be displayed to the user if the user completes a learning unit and gives an answer to a reflective question each time. If a user completes multiple learning units in the learning environment, the reflective question displayed in the curriculum widget may relate to the last learned unit.
As a micro learning unit is very short, user interaction tracking cannot be applied there to infer the engagement level of the user with the curriculum reflection widget. Therefore, a learning unit counts as completed if the user clicks on the "Next" button of the learning unit.
If the user has completed the learning unit, a reflective prompt is displayed to the user. In an embodiment, reflective questions include a set of general questions, that bring the user to reflect about a last learning unit. In an embodiment, each reflective question includes a placeholder, which is set dynamically based on a topic of the last learned unit.
If the user answers a question, the answer may be stored in a database table as a record which includes the user's id, a question id, the matching learning unit and the time when it was answered by the user. In an embodiment, the record of the active learning unit may be updated in the database to indicate that it was answered.
If the user goes through multiple learning units in the curriculum reflection widget, the reflective question may be directed to the last learned unit.
In an embodiment, the curriculum reflection widget includes a progress bar at the bottom of the curriculum reflection widget that shows the user's progress for the current submodule of the curriculum reflection widget. In an embodiment, the progress is calculated by obtaining the number of lessons for this submodule which the user has completed and dividing it by the total number of learning units in this submodule.
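For concreteness, the stated calculation is simply the following; expressing it as a percentage is an assumption of the example.

```python
# Sketch: progress = completed lessons / total learning units in submodule.
def submodule_progress(completed_lessons: int, total_units: int) -> float:
    if total_units == 0:
        return 0.0
    return 100.0 * completed_lessons / total_units
```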
According to another embodiment, the social stream manager crawls social media and monitors a plurality of social streams to collect incoming content relevant to a keyword, a social media user or a location of the user, using a corresponding Application Programming Interface (API) of each service.
According to yet another embodiment, the search-engine based web crawler crawls web pages that are relevant to topics based on the search query by exploiting web search Application Programming Interfaces (APIs).
According to yet another embodiment, the focused web domain crawler crawls user-defined web domains.
According to yet another embodiment, the fragments of the lecture videos and the keyword-based annotations for each fragment are generated by (i) obtaining an audio transcript of each lecture video and automatically transforming the audio transcript of each lecture video into a set of meaningful textual cues; (ii) representing every textual cue in a vector space using word embeddings; (iii) detecting time boundaries on a lecture video using generated vector space representations of textual cues; these boundaries define the set of temporal video fragments; and (iv) selecting a set of keywords for annotating every fragment; these keywords are the N most frequent textual cues for the given fragment.
According to yet another embodiment, the concept-based annotations for the non-lecture videos are generated by (i) de-composing the non-lecture videos into elementary temporal shots by: (a) representing visual content of each video frame by extracting a color histogram and a set of local descriptors, (b) assessing visual similarity between successive frames using the features in (a) and comparing it against a pre-specified threshold to detect candidate shot transitions, and (c) re-evaluating candidate shot transitions by applying a flash detector and a pair of dissolve and wipe detectors to filter out false detections; and (ii) annotating each shot with visual concepts that are obtained from a pre-specified concept pool, by (a) using a number of deep learning based concept detectors, or (b) using a number of discriminant analysis based concept detectors, or (c) using a number of concept detectors combining deep learning and discriminant analysis.
According to yet another embodiment, the non-lecture videos that are semantically or thematically closer to the lecture-videos are identified using the generated concept-based annotations and semantic word embeddings that match these annotations for the non-lecture and lecture videos.
The advantages of the present method are thus identical to those disclosed above in connection with the present process, and the embodiments listed above in connection with the present process apply mutatis mutandis to the present method.
DETAILED DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic illustration of a system in accordance with an embodiment of the present disclosure. The system comprises a user interaction data obtaining module 102, a data acquisition module 104, a data processing module 106, an Elasticsearch index 108, a user logging module 110, a social community functionality obtaining module 112, a search engine 114, a data visualization module 116, WevQuery 118 and an adaptive training support providing module 120. The functions of these parts are as has been described above.
FIG. 2 is a functional block diagram of crawlers in accordance with an embodiment of the present disclosure. The functional block diagram includes a social stream manager 202, a search-engine based web crawler 204, a focused web domain crawler 206, a social media 208, a webpage 210, pre-specified web pages 212, fragmentation and concept-based annotation for videos 214, a database 216, a crawlers input user interface (UI) 218 and a data integration service (DIS) 220. The functions of these parts are as has been described above.
FIG. 3 is a functional block diagram of a focused web domain crawler in accordance with an embodiment of the present disclosure. The functional block diagram includes a database 302, an application scheduler 304, a general spider 306, an engine scheduler 308, a downloader 310, an internet 312, pipelines 314 and an Elasticsearch index 316. The functions of these parts are as has been described above.
FIG. 4 is a functional block diagram of WevQuery 406 in accordance with an embodiment of the present disclosure. The functional block diagram includes a user device 402, a database 404, WevQuery 406, a web server 408, and a designer device 410. The functions of these parts are as has been described above.
FIGS. 5A-5B are exemplary views of graphical user interfaces of enabling adaptive training support to a user in accordance with an embodiment of the present disclosure. In an embodiment, a graphical user interface 502 depicts a learning-how-to-search widget, a graphical user interface 504 depicts learning prompts of a curriculum reflection widget, and a graphical user interface 506 depicts a sunburst visualization of a user's progress with regard to a curriculum. In an embodiment, the visualization of the user's progress in the curriculum is divided into three modules. In another embodiment, the visualization of the user's progress in the curriculum comprises a fourth module that is available for auditors. Each module is represented as a section in the inner circle of the visualization and each module is additionally divided into three sub-modules (i.e. the outer circle). Every time a user completes a new learning unit, the percentage in the respective section in the visualization is updated accordingly.
Furthermore, progress in each sub-module is encoded by color. If a user has not completed any learning units in a sub-module (0%), the respective section is marked red. Making progress in a sub-module may turn the section yellow (50%) and, finally, completing a sub-module may turn the section green (100%). Moreover, the sections in the sunburst diagram are ordered to mirror the structure of the curriculum. Starting from the top, the sub-modules are completed clockwise, gradually turning the visualization green.
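As an illustration, the following is a minimal sketch of the color encoding described above, assuming completion is reported as a percentage; the exact cut-off points between red, yellow and green are an assumption.

```python
# Illustrative sketch only: mapping a sub-module's completion
# percentage to the red/yellow/green encoding described above.
def progress_color(completion: float) -> str:
    """Return a display color for a sub-module, given its
    completion percentage (0.0 - 100.0)."""
    if completion <= 0.0:
        return "red"      # no learning units completed (0%)
    if completion < 100.0:
        return "yellow"   # partial progress (around 50%)
    return "green"        # sub-module completed (100%)
```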
Changes in the curriculum may be handled by the visualization itself, without a need for any further development. It is responsive in that the curriculum reflection widget uses Scalable Vector Graphics (SVG) with a view box for the visualization. The SVG always fills the available width, while the view box guarantees that the sunburst's proportions stay intact.
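A minimal sketch of this responsive behaviour follows, assuming the sunburst markup itself is rendered separately; the width and view-box values are illustrative assumptions.

```python
# Illustrative sketch only: wrapping pre-rendered sunburst markup in a
# responsive SVG root. width="100%" fills the available width, while
# the viewBox keeps the sunburst's proportions intact.
def sunburst_svg(inner_markup: str, size: int = 400) -> str:
    """Wrap pre-rendered sunburst markup in a responsive SVG root."""
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="100%" viewBox="0 0 {size} {size}">'
        f'{inner_markup}</svg>'
    )
```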
FIG. 6 is an exemplary view of a graphical user interface 600 for generating queries using a designer device in accordance with an embodiment of the present disclosure. The graphical user interface 600 is dynamically generated from an XML schema that defines the grammar of the queries. In an embodiment, the possible values of other fields are retrieved from the XML schema. The possible values may be attributes of node elements that are used to specify an event. In an embodiment, designers may reuse and share the queries at any time if the queries are stored as XML files. In an embodiment, a query in an XML file is transformed into a MapReduce query. The MapReduce query splits the data into subsets, and the reduce function processes each subset independently. The MapReduce query runs against an interaction data server that stores user interface events in a database. The graphical user interface 600 enables designers to generate queries using the designer device. The graphical user interface 600 enables the designers to drag and drop event elements and automatically displays the available values for the attributes of the events. The queries are composed of a sequence of ordered events, and each event in the sequence may match one or more types of interaction events. For example, a particular event in the sequence may be set to match at least one of a mousedown event (e.g. clicking the mouse) or a mousewheel event (e.g. turning the scroll wheel of the mouse). As another example, a first event from the Event Palette may be defined to match both a mousedown event and a mouseup event. The graphical user interface 600 includes a plurality of modules, including the Event Palette widget to define sequences and components to establish temporal relationships between events. The Event Palette widget displays a user event that may be selected when defining a query. An event template creation dialogue is displayed to the designer when the designer presses the plus sign in the "Event Example" box. The graphical user interface 600 depicts that the events generated in the Event Palette are dragged and dropped into the Event Sequence Pattern Design Area, using a move icon at the top-right of the element that represents an event. The position of an event in the list determines its order in the sequence, which is conveyed by a number located next to the move icon. In an embodiment, the resulting query includes a sequence of events that WevQuery uses to look for patterns matching the sequence. The designer may discard an event by clicking a bin icon. The graphical user interface 600 depicts a file menu that enables the designers to operate on the designed query. Once the query is defined, the designers may click Run Query and visualize the results on a screen of the designer device, or receive the results through email.
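As an illustration of how such a query may match an ordered event sequence in an interaction log, the following is a simplified in-memory sketch; the event-log format and function names are assumptions, and the actual MapReduce execution against the interaction data server is not reproduced.

```python
# Illustrative sketch only: matching an ordered event-sequence pattern
# against a user-interaction log. Each pattern step lists the event
# types that may match it, mirroring the "one or more types" rule above.
from typing import Dict, List, Sequence

def matches_pattern(log: Sequence[Dict], pattern: List[List[str]]) -> bool:
    """Return True if the ordered event pattern occurs in the log."""
    step = 0
    for event in log:
        if event.get("event") in pattern[step]:
            step += 1
            if step == len(pattern):
                return True
    return False

# Example: a step matching either a mousedown or a mousewheel event,
# followed by a step matching a mouseup event.
log = [{"event": "mousemove"}, {"event": "mousedown"}, {"event": "mouseup"}]
print(matches_pattern(log, [["mousedown", "mousewheel"], ["mouseup"]]))  # True
```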
FIG. 7A is an exemplary view of a graphical user interface 702 for displaying a new event dialogue to a designer in accordance with an embodiment of the present disclosure. The graphical user interface 702 depicts the event template creation dialogue that is displayed to the designer when the designer presses the plus sign in the "Event Example" box. For each event, an event type and the context of the event are stored. The designers may select a context name such as node ID, node type, node class, node text content, node text value and URL using a name icon. The designers may add events such as scroll, mouse down and mouse over using an events icon.
FIG. 7B is an exemplary view of a graphical user interface 704 for displaying a new temporal constraint dialogue to a designer in accordance with an embodiment of the present disclosure. The graphical user interface 704 depicts the addition of temporal constraints, which enables the designer to set time intervals between matched events. A query may ignore the time elapsed between events if the time intervals are not specified. The graphical user interface 704 enables the designers to establish the temporal constraints when the designer clicks on an "add a new temporal constraint" icon. A temporal constraint consists of a relation, the affected events, a duration and a unit. The relation determines whether the temporal distance between the selected events must be within or above an indicated threshold. The events field enables the designer to select the events that are affected by the temporal constraint. In an embodiment, the dialogue temporarily disappears to enable the designers to select the events in the Event Sequence Pattern Design Area when at least one button is pressed. The duration and unit determine the temporal distance and the unit of time based on the selection of the designers. In an embodiment, once the temporal constraints are defined, the length of a bar that conveys the scope of the temporal constraints may be dragged and modified.
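A minimal sketch of checking such a temporal constraint between two matched events follows; the millisecond timestamps and the "within"/"above" relation names are assumptions for illustration.

```python
# Illustrative sketch only: evaluating a temporal constraint between
# two matched events, as in the dialogue described above.
def satisfies_constraint(t_first: int, t_second: int,
                         relation: str, duration_ms: int) -> bool:
    """Return True if the temporal distance between two matched
    events (timestamps in milliseconds) satisfies the constraint."""
    distance = t_second - t_first
    if relation == "within":
        return distance <= duration_ms
    if relation == "above":
        return distance > duration_ms
    raise ValueError(f"unknown relation: {relation}")
```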
FIG. 8 is a flowchart illustrating steps of a method of retrieving data sources from a Linked Open Data (LOD) cloud 800 using a bibliographic metadata injection module 808 in accordance with an embodiment of the present disclosure. At a step 802 of the method of retrieving, a query is sent to the Linked Open Data (LOD) cloud 800 from a search engine. In an embodiment, the search engine requires a query as an input. In an embodiment, an exemplary query is represented using a bibliographic ontology and DCMI metadata terms. The Linked Open Data (LOD) cloud 800 sends a list that includes the data sources identified within the Linked Open Data (LOD) cloud 800 to the bibliographic metadata injection module 808. At a step 804 of the method of retrieving, the data sources are harvested, by means of which the contained information is extracted. At a step 806 of the method of retrieving, the contained information is parsed and converted as specified in a mapping file. In an embodiment, the mapping file ensures that the bibliographic metadata injection module 808 parses the desired information. The data sources may include non-bibliographic metadata. The bibliographic metadata injection module 808 retrieves bibliographic metadata modelled as Linked Open Data (LOD) and transforms the retrieved bibliographic metadata into a common data model.
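As an illustration of steps 802-806, the following sketch issues a query expressed with the Bibliographic Ontology and DCMI metadata terms against a hypothetical LOD endpoint and converts the result as a mapping file would specify; the endpoint URL, query shape and inlined mapping are assumptions, not part of the disclosure.

```python
# Illustrative sketch only: harvesting bibliographic metadata from a
# hypothetical LOD endpoint and mapping it to a common data model.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://example.org/sparql"  # hypothetical LOD data source

QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX bibo:    <http://purl.org/ontology/bibo/>
SELECT ?doc ?title ?date WHERE {
    ?doc a bibo:Document ;
         dcterms:title ?title ;
         dcterms:date  ?date .
} LIMIT 100
"""

def harvest():
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(QUERY)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    # Parse and convert as a mapping file would specify (here inlined):
    # LOD variable name -> field name in the common data model.
    mapping = {"title": "title", "date": "publication_date"}
    records = []
    for row in results["results"]["bindings"]:
        records.append({target: row[source]["value"]
                        for source, target in mapping.items()
                        if source in row})
    return records
```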
FIG. 9 is a flowchart illustrating steps of a method of identifying author names that are mentioned in documents that are stored in an Elasticsearch index 900 in accordance with an embodiment of the present disclosure. At a step 902 of the method of identifying, an identifier (mention ID) is inserted into the corresponding metadata of each author in the Elasticsearch index 900. At a step 904 of the method of identifying, features or fields are extracted from the Elasticsearch index 900. At a step 906 of the method of identifying, normalization rules are applied to the extracted features. At a step 908 of the method of identifying, a look-up table is created or updated in the form of a first SQLite file, which directly maps each mention ID to the corresponding features retrieved from the Elasticsearch index 900. At a step 910 of the method of identifying, the author names from the first SQLite file are disambiguated. At a step 912 of the method of identifying, author IDs are added to a feature database and a second SQLite file is created, which maps the mention IDs of newly identified names to the corresponding author IDs. At a step 914 of the method of identifying, the author IDs are inserted into the Elasticsearch metadata records of a plurality of author mentions that include a newly identified name. At a step 916 of the method of identifying, an author and a set of documents are assigned using each mention ID in the Elasticsearch index 900. At a step 918 of the method of identifying, a mapping of mention IDs to document IDs is precomputed. At a step 920 of the method of identifying, obvious mistakes that occurred when identifying the author names are removed.
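A minimal sketch of steps 902-908 follows, building the first SQLite look-up table from mention IDs and normalized features; the table layout and the single normalization rule shown are assumptions, and the disambiguation of step 910 is not reproduced.

```python
# Illustrative sketch only: creating/updating the first SQLite file
# that maps each mention ID to its normalized features (steps 902-908).
import sqlite3

def build_lookup(mentions):
    """`mentions` is an iterable of (mention_id, raw_name) pairs
    extracted from the Elasticsearch index."""
    con = sqlite3.connect("mention_features.sqlite")
    con.execute("""CREATE TABLE IF NOT EXISTS features (
                       mention_id TEXT PRIMARY KEY,
                       normalized_name TEXT)""")
    for mention_id, raw_name in mentions:
        # One assumed normalization rule: lower-case, collapse spaces.
        normalized = " ".join(raw_name.lower().split())
        con.execute("INSERT OR REPLACE INTO features VALUES (?, ?)",
                    (mention_id, normalized))
    con.commit()
    con.close()
```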
FIG. 10A illustrates an exemplary view of a graphical visualization 1002 of search results in accordance with an embodiment of the present disclosure. The Graph Visualization Framework supports interactive analysis of large, complex networks consisting of various entities and relationships which arise from co-occurrences. The Graph Visualization Framework focuses on visual representations of metadata and novel graph aggregation metaphors conveying relevant properties of nodes and relations in sub-graphs. The graphical visualization 1002 includes multiple types of nodes connected by different types of links. The Graph Visualization Framework includes a ring menu that allows faster exploration of the graph by displaying nodes connected over multiple hops from an original node. In an embodiment, the graphical visualization 1002 grows large and complex when many nodes are shown, leading to information overload. The Graph Visualization Framework enables users to focus on the desired information by summarizing the rest of the graph in a way that allows them to identify and explore other potentially relevant graph areas. The present method may enable the user to identify valuable information in the search results by representing entities such as documents, authors, and locations through nodes. In an embodiment, the edges are at least one of (i) a solid edge representing a direct connection, and (ii) a dashed edge representing relevance. In an embodiment, the relevance of a dashed edge is represented by the opacity of the edge. In another embodiment, the darker the edge, the more relevant the nodes are to each other.
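As an illustration of the opacity encoding, the following is a minimal sketch mapping a relevance score to an edge opacity; the score range and the minimum opacity floor are assumptions.

```python
# Illustrative sketch only: encoding edge relevance as opacity
# (a darker edge means the connected nodes are more relevant).
def edge_opacity(relevance: float, floor: float = 0.15) -> float:
    """Map a relevance score in [0, 1] to an opacity in [floor, 1]."""
    relevance = max(0.0, min(1.0, relevance))
    return floor + (1.0 - floor) * relevance
```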
FIG. 10B illustrates an exemplary view of a ring menu of a node of FIG. 10A in accordance with an embodiment of the present disclosure. The exemplary view depicts the ring menu of the node, which is a context menu for revealing additional nodes going out from the node for which the ring menu was opened. A user may initiate an exploration of the graph beginning from a selected node. The user may left-click on the node, which triggers an expansion of all directly connected nodes. The user may right-click the node, which opens the ring menu. This allows the user to identify related nodes depending on their properties (such as their type or other metadata) and their distance from the original node. The user may explore the rest of the graph by clicking on a sector, which triggers the expansion of the visible portion of the graph by showing the nodes and the relations which surround the current node.
FIG. 10C illustrates an exemplary view of a visualization of an aggregated subgraph of FIG. 10A in accordance with an embodiment of the present disclosure. The exemplary view displays an aggregated subgraph that shows documents 1002, authors 1004, years 1006, concepts 1008 and keywords 1010. The arcs in the centre of the visualization show the number of connections from one node type to another.
FIG. 11A illustrates an exemplary view of a graphical user interface 1100 for displaying documents that are related to a keyword in accordance with an embodiment of the present disclosure. The graphical user interface 1100 depicts the documents that are related to the search term "geography", with the ranking updated to match the keywords "geographicity", "regionalization" and "country". The graphical user interface 1100 includes a tag box 1102A, a query box 1102B, a document list 1102C and a ranking view 1102D. The tag box 1102A represents a keyword-based summary of the search results, the query box 1102B includes the keywords selected by a user, and the document list 1102C and the ranking view 1102D represent a list of document titles augmented with stacked bars indicating relevance scores. The document list 1102C displays titles with ranking information and the ranking view 1102D displays stacked bar charts depicting the relevance scores of a document. In an embodiment, the list and the ranking visualization are updated as the user manipulates keyword tags in the query box.
FIG. 11B illustrates an exemplary view of a graphical user interface 1104 for displaying the initial ranking provided by a search engine in accordance with an embodiment of the present disclosure. The graphical user interface 1104 depicts the initial ranking provided by the search engine when the term "geography" is entered as a query by a user. In an embodiment, the search engine provides the ranking based on the relevance score computed by the search engine.
FIG. 11C illustrates an exemplary view of a graphical user interface for displaying changes in ranking based on the user's interest in certain topics in accordance with an embodiment of the present disclosure. The graphical user interface includes a shift column that displays how many positions the rank of a document has changed based on the last selected keyword. In an embodiment, the ranking is fine-tuned by adjusting a slider below the selected keyword. The slider reduces or increases the importance of a keyword for the ranking. In an embodiment, the keyword's importance decreases if the user moves the slider to the left. In an embodiment, the keyword's importance increases if the user moves the slider to the right.
FIG. 11D illustrates an exemplary view of a graphical user interface for displaying the effect of reducing the weight of a keyword in accordance with an embodiment of the present disclosure. The graphical user interface depicts how the results are re-ranked after the weight of "geographicity" is reduced.
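A minimal sketch of such keyword-weighted re-ranking follows, assuming each document carries per-keyword relevance scores; the weighted-sum scoring is an illustrative simplification, not uRank's actual formula.

```python
# Illustrative sketch only: re-ranking documents as keyword weights
# are adjusted with the sliders described above.
from typing import Dict, List, Tuple

def rerank(docs: List[Dict],
           weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Each doc has a title and per-keyword relevance scores, e.g.
    {"title": "...", "scores": {"geography": 0.8, "country": 0.3}}."""
    ranked = [
        (doc["title"],
         sum(weights.get(kw, 0.0) * s for kw, s in doc["scores"].items()))
        for doc in docs
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Reducing the weight of "geographicity" demotes documents that relied
# on it for their score, mirroring the effect depicted in FIG. 11D.
```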
FIG. 11E illustrates an exemplary view of a graphical user interface for a preview of a document's contents, displaying selected keywords, in accordance with an embodiment of the present disclosure. The graphical user interface depicts a selected keyword, e.g. "geographic", underlined.
FIG. 12 illustrates an exemplary view of a graphical user interface for displaying document metadata in the retrieved results as bar charts in accordance with an embodiment of the present disclosure. The graphical user interface displays the most frequently occurring properties in the retrieved results when searching for the term "geography". The graphical user interface provides an option to click on one of the bars to open a listing showing the documents related to the property.
FIG. 13 illustrates an exemplary view of a graphical user interface for displaying prominent keywords in retrieved search results in a tag cloud in accordance with an embodiment of the present disclosure. The graphical user interface depicts the most prominent keywords in the retrieved search results for the term "data". The graphical user interface provides an option to click on one of the keywords to open a listing showing the documents related to the keyword. The displayed keywords can be sorted by frequency and alphabetically. Additionally, the displayed keywords can be filtered by frequency, by text and by the publishing year of the document in which they occur.
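As an illustration, the following is a minimal sketch of computing, sorting and frequency-filtering tag-cloud keywords; the input format (one keyword list per retrieved document) is an assumption.

```python
# Illustrative sketch only: aggregating keyword frequencies for a tag
# cloud, with the frequency/alphabetical sorting and frequency
# filtering described above.
from collections import Counter

def tag_cloud(documents, min_frequency=2):
    """`documents` is an iterable of keyword lists, one per result."""
    counts = Counter(kw for doc in documents for kw in doc)
    by_frequency = counts.most_common()    # sorted by frequency
    alphabetical = sorted(counts.items())  # sorted alphabetically
    filtered = [(kw, n) for kw, n in by_frequency if n >= min_frequency]
    return by_frequency, alphabetical, filtered
```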
FIG. 14 illustrates an exemplary view of a graphical user interface 1400 for a preview of a recommended document's contents in accordance with an embodiment of the present disclosure. The graphical user interface 1400 depicts the most relevant documents with respect to a user profile. In an embodiment, the graphical user interface 1400 displays three documents and enables a user to move among pages with the "previous page" and "next page" buttons. For each document, an icon representing the document type (for example, an article, a book or a video), the title, the authors and the publication year are shown. When the user clicks on the "more" button, additional information, such as an abstract or a description, is displayed to the user in the graphical user interface 1400.
FIG. 15 is a functional block diagram of a recommender system which suggests personalized training material from a training environment in a working environment in accordance with an embodiment of the present disclosure. The recommender system comprises a recommender service 1502, an Elasticsearch index 1504 and WevQuery 1506. The functions of these components have been described above. The recommender service 1502 retrieves a user's search history through WevQuery 1506 and generates a corresponding user profile based on the frequency and recency of the terms searched by the user. The recommender service 1502 then retrieves documents that match the corresponding user profile in the Elasticsearch index 1504 using the Hierarchical Concept Frequency-Inverse Document Frequency (HCF-IDF) ranking method previously described.
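A minimal sketch of building such a profile from the search history using term frequency with recency decay, and scoring documents against it, follows; this is a plain frequency-based simplification, and the concept hierarchy and inverse-document-frequency components of HCF-IDF are not reproduced here.

```python
# Illustrative sketch only: a recency-weighted term profile built from
# the user's search history, and a simple document scoring against it.
import math
import time
from collections import defaultdict

def build_profile(search_history, half_life_days=30.0):
    """`search_history` is an iterable of (term, unix_timestamp) pairs
    retrieved from the interaction log."""
    now = time.time()
    profile = defaultdict(float)
    for term, ts in search_history:
        age_days = (now - ts) / 86400.0
        # Exponential decay: recent searches weigh more than older ones.
        profile[term] += math.exp(-math.log(2) * age_days / half_life_days)
    return profile

def score(document_terms, profile):
    """Score a document by the profile weight of the terms it contains."""
    return sum(profile.get(t, 0.0) for t in document_terms)
```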
Modifications to embodiments of the present disclosure described in the foregoing are possible without departing from the scope of the present disclosure as defined by the accompanying claims. Expressions such as "including", "comprising", "incorporating", "have", "is" used to describe and claim the present disclosure are intended to be construed in a non-exclusive manner, namely allowing for items, components or elements not explicitly described also to be present. Reference to the singular is also to be construed to relate to the plural.

Claims (9)

1. A method for enabling a search platform to users, the method comprising: obtaining a search query from a user device of a user; integrating data from (i) a social stream manager, (ii) a search-engine based web crawler and (iii) a focused web domain crawler; processing the integrated data to (i) identify author names in documents that are stored in an Elasticsearch index; (ii) analyze lecture videos and non-lecture videos, to (a) retrieve a temporal structure (fragments) of each video, (b) generate keyword-based annotations for each fragment specifically for lecture videos, and (c) generate concept-based annotations for each temporal fragment for the lecture videos and the non-lecture videos; and (iii) retrieve bibliographic metadata that is modelled as Linked Open Data (LOD) and transform the bibliographic metadata into a common data model using a bibliographic metadata injection; filtering search results by at least one of (i) a document type, (ii) an author, (iii) a date or (iv) a venue; ranking the search results based on the relevance of the search results to the search query of the user; visualizing the search results using visualizations that comprise (i) a graph visualization for discovery and exploration of relationships between documents and their properties, (ii) a visual interface called uRank for interest-based result set exploration, (iii) a bar chart displaying aggregated information about the properties of retrieved documents, and (iv) a tag cloud for an analysis of keyword frequency in the retrieved documents; and providing an adaptive training support that comprises a learning-how-to-search widget and a curriculum reflection widget, wherein the learning-how-to-search widget automatically displays user interaction data regarding the functionalities used by the user based on activity log data retrieved from WevQuery, and provides questions to the user to (i) reflect on the search behaviour and (ii) experiment with other search functionalities, and wherein the curriculum reflection widget (a) provides training material adapted to the user's competence level that comprises at least one of (i) guidance through tutorials, (ii) video lectures, and (iii) reflective learning on the content of a training environment, and (b) enables the user to complete the available curriculum.
2. The method as claimed in claim 1, further comprising storing the user interaction data and the user's context using WevQuery, wherein the user interaction data are user interface events that comprise at least one of (i) mouse click events, (ii) mouse movement events, (iii) mouse wheel events, (iv) keyboard events, (v) window events or (vi) screen touch events, wherein the user's context is at least one of (i) search topics or (ii) the curriculum of the user.
3. The method as claimed in claim 1, wherein the social stream manager crawls social media and monitors a plurality of social streams to collect incoming content relevant to a keyword, a social media user or a location of the user, using a corresponding Application Programming Interface (API) of each service.
4. The method as claimed in claim 1, wherein the search-engine based web crawler crawls web pages that are relevant to topics based on the search query by exploiting web search Application Programming Interfaces (APIs).
5. The method as claimed in claim 1, wherein the focused web domain crawler crawls user-defined web domains.
6. The method as claimed in claim 1, wherein the fragments of the lecture videos and the keyword-based annotations for each fragment are generated by (i) obtaining an audio transcript of each lecture video and automatically transforming the audio transcript of each lecture video into a set of meaningful textual cues; (ii) representing every textual cue in a vector space; (iii) detecting time boundaries on a lecture video using the generated vector space representations of the textual cues, wherein these boundaries define the set of temporal video fragments; and (iv) selecting a set of keywords for annotating every fragment, wherein these keywords are the N most frequent textual cues for the given fragment.
7. The method as claimed in claim 1, wherein the concept-based annotations for each fragment of the lecture videos are generated by (i) selecting a closed set of pre-specified visual concepts that is suitable for a task; (ii) generating for each concept a Concept Language Model (CLM) that includes a set of M keywords that are relevant to this specific concept, wherein each Concept Language Model (CLM) is generated by automatically issuing a web query, transforming the top-K retrieved articles into the Bag-of-Words (BoW) representation, and selecting the top-M most frequent keywords of the BoW representation; (iii) defining as the Transcript Language Model (TLM) the set of N keywords used for annotating the fragment; (iv) identifying, for every CLM, the semantic relatedness value for each possible pair of keywords, where one keyword belongs to the TLM and the other keyword belongs to the CLM; (v) transforming the set of semantic relatedness values (for one CLM and one TLM) into a single score that denotes the semantic relation of the concept represented by the CLM with the lecture video fragment represented by the TLM; and (vi) annotating the lecture's fragment with visual concepts by selecting, from the closed set of pre-specified visual concepts, a set of X concepts with the highest semantic relation score for this particular fragment.
8. The method as claimed in claim 1, wherein the concept-based annotations for the non-lecture videos are generated by (i) de-composing the non-lecture videos into elementary temporal shots by: (a) representing the visual content of each video frame by extracting a color histogram and a set of local descriptors, (b) assessing the visual similarity between successive frames using the features in (a) and comparing it against a pre-specified threshold to detect candidate shot transitions, and (c) re-evaluating candidate shot transitions by applying a flash detector and a pair of dissolve and wipe detectors to filter out false detections; (ii) annotating each shot with visual concepts that are obtained from a pre-specified concept pool, by (a) using a number of deep learning based concept detectors, or (b) using a number of discriminant analysis based concept detectors, or (c) using a number of concept detectors combining deep learning and discriminant analysis.
9. The method as claimed in claim 1, wherein the non-lecture videos that are semantically or thematically closer to the lecture videos are identified using the generated concept-based annotations and semantic word embeddings that match these annotations for the non-lecture and lecture videos.