WO2022005272A1 - System and method for hot topics aggregation using relationship graph - Google Patents

Info

Publication number
WO2022005272A1
Authority
WO
WIPO (PCT)
Prior art keywords
topic
named entity
named
entities
phrases
Prior art date
Application number
PCT/MY2020/050158
Other languages
English (en)
Inventor
Weiying KOK
Ma. Stella Tabora DOMINGO
Yasaman EFTEKHARYPOUR
Abdul Aziz LATIP
Original Assignee
Mimos Berhad
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mimos Berhad
Publication of WO2022005272A1

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/901: Indexing; Data structures therefor; Storage structures
    • G06F16/9024: Graphs; Linked lists

Definitions

  • The present invention relates to a system and method for hot topics aggregation using a relationship graph.
  • The present invention provides a system and method that breaks data into sentences, phrases or words to extract named entities from the content of a document, and further constructs a graph of the semantic relationships between the named entities to form an entity relation topic for hot topic extraction.
  • A hot topic generally refers to a subject that is widely discussed in many sources in the same period of time and whose popularity varies over time.
  • The hotness of a topic can be ranked based on the frequency of publication and on the number and/or frequency of visits, reposts, discussions or searches on social media and search engines.
  • Hot topic detection is an important knowledge discovery task on social media streams: it determines the topics that are being discussed the most on a social media platform.
  • This service is usually used by news aggregators or social media platforms to collect, aggregate and categorize multiple media sources and present them on a single page, which saves users the time of visiting multiple sites for new updates.
  • The aggregated hot news or social media topics serve not only to keep up with the most recent trends but also as a source of analysis and of further understanding of the interests and needs of the public or of users across all domains, such as government agencies, e-commerce businesses and online health communities.
  • One example of a system and method for extracting topics is disclosed in China Patent Application Publication No. CN 104915446 A (hereinafter referred to as CN 446 A Publication) entitled “Automatic Extracting Method and System of Event Evolving Relationship Based on News”, dated 29 June 2015, Applicant: Univ South China Tech.
  • The CN 446 A Publication describes a system and method for automatically extracting news-based evolutionary relations, including news information pre-processing, news lead extraction, news event time extraction, event extraction, event keyword extraction, and event evolution relationship analysis.
  • The CN 446 A Publication is limited to extracting nouns or noun phrases as characteristics of the word review of an event and does not extract the named entities for the entire reviewed data.
  • The US 686 B2 Patent discloses that real-time topic analysis for social listening is performed to discover and understand trending topics in varying degrees of granularity. Further, the US 686 B2 Patent utilizes a lightweight natural language processing (NLP) method to extract topics from the data and ranks the topics by an ATF-IDF algorithm for handling dynamically changing content. The US 686 B2 Patent also discloses classifying trending topics into clusters, which provides insight for decision making and business intelligence.
  • A further example of a hot topic extraction method is disclosed in China Patent Application Publication No. CN 102982157 A (hereinafter referred to as CN 157 A Publication) entitled “Device and method used for mining microblog hot topics” having a filing date of 3 December 2012, Applicant: Beijing Qihoo Technology Co; Qizhi Software Beijing Co Ltd.
  • The CN 157 A Publication relates to a device and method used for mining microblog hot topics from an open interface.
  • The CN 157 A Publication discloses that the much-talked-about microblog topics are mined using hot words obtained from the microblog contents gathered by the device.
  • The CN 157 A Publication teaches that the gathered hot topics are weighted according to the number of microblogs and the microblog parameters of the corresponding microblogs to obtain the temperature (hotness) value of popular keyword sets.
  • The hot topics are sorted and displayed according to the user's request or activity.
  • The existing clustering methods need to be improved to handle real-time streaming news topics more efficiently without the need to predetermine the number of clusters. Finding the relationships between topics by discovering their semantic relationships is important for understanding the cause or causes of events, as well as the relevant background and insights on the development of events, through the climax, until the end of the entire topic. Further, the strength of the relationship between a topic and its related topics should be considered in determining the hotness of a topic.
  • One aspect of the invention provides a system (100) for hot topics aggregation using a relationship graph.
  • The system (100), having a plurality of databases (108, 110, 112, 114) for storing processed and unprocessed documents, is characterized by at least one data processing unit (102) having means for receiving real-time streaming data from a plurality of remote data sources; cleansing, analysing and processing the received data in text-based form; breaking down the processed data into sentences, phrases or words to extract named entities and discover dependency between the named entities and sentences to derive meanings from documents containing natural language by applying natural language processing; at least one data relation unit (104) configured to the at least one data processing unit (102) for extracting and constructing semantic relationship between documents based on processed sentences, phrases or words; and at least one topic ranking unit (106) configured to the at least one data relation unit (104) for extracting topic clusters from the relationship graph, determining topic ranking and constructing hot topics abstract from ranked topic clusters to form a specific topic and a general topic.
  • the at least one data relation unit (104) further comprises at least one key phrase extraction module for selecting a set of phrases describing extracted entities as named entity topic and its value; and named entities relation graph clusters for building a relationship graph between documents to identify semantic similarity based on extracted named entities and key phrases.
  • a further aspect of the invention provides that the plurality of databases (108, 110, 112, 114) are preferably a content database (108), a named entities database (110), a graph database (112) and a hot topic database (114).
  • Another aspect of the invention provides a method (200) for hot topics aggregation using a relationship graph.
  • The method (200) comprises the steps of processing real-time streaming data received from a plurality of remote data sources (202); extracting and constructing semantic relationship between documents based on processed sentences, phrases or words (204); and extracting topic clusters from the relationship graph, determining topic ranking and constructing hot topics abstract from ranked topic clusters to form a specific topic and a general topic (206).
  • A further aspect of the invention provides that the step of processing real-time streaming data received from a plurality of remote data sources (202) further comprises steps of retrieving title and content of each document from a content database (302); removing all unwanted contents including symbols, emoticons, images and uniform resource locators (304); tokenizing contents of all documents into sentences (306); marking title and sentences from the contents of all documents by part of speech tagging (308); extracting named entities from the marked title and sentences from contents of all documents of step 308 (310); checking the named entities extracted in step 310 and matching named entities with abbreviations in the named entities database (312); harmonizing abbreviations of the extracted named entities back to their original names (314); determining occurrence of named entities in the contents of all documents (316); ranking a list of named entities found, in descending order, based on the occurrence of named entities determined in step 316 and position in title and content of documents, and saving the ranked list of named entities in the content database (318); and performing cross sentence dependency parser for each named entity between all sentences related to the named entity (320).
  • step of extracting and constructing semantic relationship between documents based on processed sentences, phrases or words (204) further comprises steps of (400) extracting key phrase by selecting a set of phrases that best describe extracted entities as named entity topic and its value (402); and building a relationship graph between documents from named entities relation graph clusters to identify the semantic similarity based on extracted named entities and key phrases (404).
  • the step of extracting key phrase by selecting a set of phrases that best describe extracted entities as named entity topic and its value further comprises steps of (500) retrieving ranked extracted named entities from step 318 (502); compiling sentences from cross sentence dependency parser from step 320 with related named entities (504); tokenizing sentences into words (506); removing at least one stop word to create phrases (508); stemming created phrases (510); converting phrases into vectors (512); constructing similarity matrix between the vectors (514); scoring of phrases by using graph based ranking algorithm (516); ranking of phrases by descending order of rank score (518); selecting a top ranked phrase as a key phrase (520); and using the key phrase as named entity topic value and storing the key phrase into the named entity database (522).
  • A further aspect of the invention provides that the step of building a relationship graph between documents from named entities relation graph clusters to identify the semantic similarity based on extracted named entities and key phrases (404) further comprises steps of (600) determining if the named entity, NE, exists in the relationship graph (602); creating a new named entity, NE, node (604) if the named entity does not exist in the relationship graph (602), and proceeding to step 606 if the named entity exists in the relationship graph (602); determining if the named entity, NE, node has a named entity topic, NeT, value (606); adding a new named entity topic, NeT, to the named entity, NE, node (608) if the named entity topic does not exist in the named entity node (606), and proceeding to step 610 if the named entity topic exists in the graph; determining if the named entity topic, NeT, value exists (610); adding the named entity topic, NeT, value to the named entity topic (612) if the named entity topic value does not exist in the named entity topic (610), and proceeding to step 614; adding the document ID, docID, to the named entity topic (614); and adding the document published time, DocPT, to the named entity topic and updating the graph database (616).
  • the step of extracting topic clusters from the relationship graph, determining topic ranking and constructing hot topics abstract from ranked topic clusters to form a specific topic and a general topic (206) further comprises steps of (700) setting a timeframe in day(s) (702); obtaining the named entity, NE nodes that contain named entity topic, NeT value with DocPT within the timeframe (704); constructing a topic title using the named entity, NE and named entity topic value (706); determining the named entity topic, NeT weight (708); ranking of topics by descending order of the named entity topic, NeT weight (710); and listing top N topics as hot topics for the timeframe (712).
  • FIG. 1.0 illustrates a general architecture of the system of the present invention.
  • FIG. 1.0a illustrates the sub modules within the data relation unit of the system of the present invention.
  • FIG. 2.0 is a flowchart illustrating a general methodology of the present invention.
  • FIG. 3.0 is a flowchart illustrating steps of processing real-time streaming data received from a plurality of remote data sources of the present invention.
  • FIG. 4.0 is a flowchart illustrating steps of extracting and constructing semantic relationship between documents based on processed sentences, phrases or words of the present invention.
  • FIG. 5.0 is a flowchart illustrating steps of extracting key phrase by selecting a set of phrases that best describe extracted entities as named entity topic and its value of the present invention.
  • FIG. 6.0 is a flowchart illustrating steps of building a relationship graph between documents from named entities relation graph clusters to identify the semantic similarity based on extracted named entities and key phrases of the present invention.
  • FIG. 7.0 is a flowchart illustrating steps of extracting topic clusters from the relationship graph, determining topic ranking and constructing hot topics abstract from ranked topic clusters to form a specific topic and a general topic of the present invention.
  • FIG. 1.0 illustrates a general architecture of the system of the present invention, and FIG. 1.0a illustrates the sub-modules within the data relation unit of the system.
  • The system (100) of the present invention for hot topics aggregation using a relationship graph, having a plurality of databases (108, 110, 112, 114) for storing processed and unprocessed documents, is characterized by three modules.
  • The modules are at least one data processing unit (102), at least one data relation unit (104) and at least one topic ranking unit (106).
  • The at least one data processing unit (102) has means for receiving real-time streaming data from a plurality of remote data sources; cleansing, analysing and processing the received data in text-based form; and breaking down the processed data into sentences, phrases or words to extract named entities and discover dependency between the named entities and sentences to derive meanings from documents containing natural language by applying natural language processing.
  • The at least one data relation unit (104), configured to the at least one data processing unit (102), has means for extracting and constructing semantic relationship between documents based on processed sentences, phrases or words. The at least one topic ranking unit (106), configured to the at least one data relation unit (104), has means for extracting topic clusters from the relationship graph, determining topic ranking and constructing hot topics abstract from ranked topic clusters to form a specific topic and a general topic.
  • The at least one data relation unit (104) further comprises at least one key phrase extraction module (104a) and named entities relation graph clusters (104b).
  • The key phrase extraction module (104a) selects a set of phrases describing extracted entities as named entity topic and its value, and the named entities relation graph clusters (104b) build a relationship graph between documents to identify semantic similarity based on extracted named entities and key phrases.
  • These modules are coupled to the plurality of databases (108, 110, 112, 114), which are preferably a content database (108), a named entities database (110), a graph database (112) and a hot topic database (114).
  • The at least one data processing unit (102) is coupled to the content database (108) and the named entities database (110), the at least one data relation unit (104) is coupled to the named entities database (110) and the graph database (112), and the at least one topic ranking unit (106) is coupled to the hot topic database (114).
  • FIG. 2.0 illustrates the general methodology of the present invention for hot topics aggregation using a relationship graph.
  • the general methodology comprising steps of processing real-time streaming data received from a plurality of remote data sources (202); extracting and constructing semantic relationship between documents based on processed sentences, phrases or words (204); and extracting topic clusters from the relationship graph, determining topic ranking and constructing hot topics abstract from ranked topic clusters to form a specific topic and a general topic (206).
  • FIG. 3.0 illustrates the steps for processing real-time streaming data received from a plurality of remote data sources in the at least one data processing unit (102).
  • The processing of real-time streaming data received from a plurality of remote data sources is initiated by first retrieving the title and content of each document from the content database (302). Thereafter, all unwanted contents, including symbols, emoticons, images and uniform resource locators, are removed from the content of the document (304). Upon removing all unwanted contents, the contents of all documents are tokenized into sentences (306), followed by marking the title and sentences from the contents of all documents with a part-of-speech (POS) tag (308).
  • A POS tag is a special label assigned to each tokenized word in a text corpus to indicate the part of speech for purposes of corpus searches and text analysis.
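Steps 304 and 306 can be illustrated with a minimal sketch. The regular expressions and the function names `clean_text` and `split_sentences` are assumptions made for illustration and are not taken from the patent; POS tagging (308) and named entity extraction (310) are left out because they would normally rely on a trained NLP model.

```python
import re

# Assumed patterns: the patent does not specify how unwanted content is detected.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Keep word characters, whitespace and basic punctuation; drop other symbols and emoticons.
UNWANTED_RE = re.compile(r"[^\w\s.,;:!?'\"()-]")
# Split after sentence-ending punctuation followed by whitespace.
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def clean_text(raw: str) -> str:
    """Remove URLs, symbols and emoticons (step 304), then normalise spaces."""
    text = URL_RE.sub(" ", raw)
    text = UNWANTED_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> list[str]:
    """Tokenize cleaned content into sentences (step 306)."""
    return [s for s in SENTENCE_END_RE.split(text) if s]
```

A simple punctuation-based splitter such as this one will mis-split abbreviations like "Dr."; a production system would more likely use a trained sentence tokenizer.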
  • Named entities are subsequently extracted from the marked title and sentences of step 308 (310). The named entities extracted in step 310 are checked and matched with abbreviations in the named entities database (312).
  • Upon matching the abbreviations in the named entities database, the abbreviations of the extracted named entities are harmonized back to their original names (314). The occurrence of named entities in the contents of all documents is thereafter determined (316), and the named entities found are ranked in descending order based on that occurrence and on their position in the title and content of the documents, with the ranked list saved in the content database (318).
  • Finally, a cross sentence dependency parser is performed for each named entity between all sentences related to the named entity (320).
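Steps 312 to 318 amount to a dictionary lookup followed by a frequency ranking, which the sketch below illustrates. The abbreviation dictionary, the function names and the `title_bonus` parameter (one possible way to reward an entity's position in the title) are assumptions; the patent does not state how position is weighted.

```python
from collections import Counter


def harmonize(entities, abbreviations):
    """Map each abbreviation back to its original name (steps 312 and 314)."""
    return [abbreviations.get(entity, entity) for entity in entities]


def rank_entities(title_entities, content_entities, title_bonus=2):
    """Count occurrences (316) and rank in descending order (318).

    title_bonus is an assumed weighting for entities that appear in the title.
    """
    counts = Counter(content_entities)
    for entity in title_entities:
        counts[entity] += title_bonus
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

For example, with `{"WHO": "World Health Organization"}` as the abbreviation dictionary, mentions of "WHO" and "World Health Organization" are counted as the same entity.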
  • The cross sentence dependency parser searches across all sentences for mentions of the entity, including a following sentence in which the entity is not named but is usually replaced by a pronoun.
  • The cross sentence dependency parser is used to link or gather the separate sentences that refer to or mention the entity. An example of an application of the cross sentence dependency parser is provided herewith.
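The idea of gathering sentences that mention an entity directly or through a pronoun can be sketched with a deliberately crude heuristic. This is not the patent's parser: the pronoun list and the rule "a sentence containing a pronoun inherits the entity of the previous linked sentence" are assumptions standing in for real dependency parsing or coreference resolution.

```python
import re

# Assumed, incomplete pronoun list used only for illustration.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them", "his", "its", "their"}


def gather_sentences(entity, sentences):
    """Collect sentences that name the entity or follow on from it via a pronoun."""
    linked = []
    previous_linked = False
    for sentence in sentences:
        lowered = sentence.lower()
        words = set(re.findall(r"[a-z']+", lowered))
        mentions = entity.lower() in lowered
        # A pronoun only counts if the preceding sentence was already linked.
        carries_over = previous_linked and bool(words & PRONOUNS)
        previous_linked = mentions or carries_over
        if previous_linked:
            linked.append(sentence)
    return linked
```

Given "Mimos Berhad filed a patent." followed by "It describes a topic graph.", both sentences would be gathered for the entity "Mimos Berhad", while an unrelated third sentence without a pronoun would not.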
  • FIG. 4.0 illustrates the steps for extracting and constructing semantic relationship between documents based on processed sentences, phrases or words (204) in the at least one data relation unit (104), FIG. 5.0 illustrates the steps of extracting key phrase by selecting a set of phrases that best describe extracted entities as named entity topic and its value, and FIG. 6.0 illustrates the steps of building a relationship graph between documents from named entities relation graph clusters to identify the semantic similarity based on extracted named entities and key phrases.
  • As illustrated in FIG. 4.0, in extracting and constructing semantic relationship between documents based on processed sentences, phrases or words, a key phrase is first extracted by selecting a set of phrases that best describe extracted entities as named entity topic and its value (402); thereafter, a relationship graph between documents is built from named entities relation graph clusters to identify the semantic similarity based on extracted named entities and key phrases (404).
  • As illustrated in FIG. 5.0, extracting the key phrase by selecting a set of phrases that best describe extracted entities as named entity topic and its value (402) further comprises steps (500) of first retrieving the ranked extracted named entities from step 318 (502), followed by compiling sentences from the cross sentence dependency parser of step 320 with the related named entities (504).
  • The sentences are then tokenized into words (506), and at least one stop word is removed to create phrases (508).
  • Stop words are words that are filtered out before or after the processing of natural language. Generally, stop words are commonly used words that a search engine has been instructed to ignore.
  • The created phrases are stemmed (510) and converted into vectors (512).
  • A similarity matrix is constructed between the vectors (514).
  • The phrases are scored using a graph-based ranking algorithm (516) and ranked in descending order of rank score (518). Further, the top-ranked phrase is selected as the key phrase (520), which is used as the named entity topic (NeT) value. Finally, the selected key phrase is stored in the named entity database (522), which is also known as the named entity (NE) dictionary.
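Steps 506 to 520 resemble a TextRank-style pipeline, and the sketch below shows one possible reading of them. The patent does not name the stemmer, the vector representation, the similarity measure or the graph-based ranking algorithm, so the suffix-stripping `stem`, the bag-of-words vectors, cosine similarity and the PageRank-style iteration are all assumptions, as is the small stop word list.

```python
import math
import re
from collections import Counter

# Assumed stop word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "is", "was", "to", "by", "for"}


def stem(word):
    """Crude suffix stripping; an assumed stand-in for a real stemmer (510)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def candidate_phrases(sentence):
    """Tokenize into words (506) and split at stop words to form phrases (508)."""
    phrases, current = [], []
    for word in re.findall(r"[a-z]+", sentence.lower()):
        if word in STOP_WORDS:
            if current:
                phrases.append(tuple(current))
            current = []
        else:
            current.append(stem(word))
    if current:
        phrases.append(tuple(current))
    return phrases


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def key_phrase(sentences, damping=0.85, iterations=50):
    """Return the top-ranked phrase (520), or None if no phrase is found."""
    phrases = sorted({p for s in sentences for p in candidate_phrases(s)})
    if not phrases:
        return None
    n = len(phrases)
    vectors = [Counter(p) for p in phrases]                      # step 512
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0    # step 514
            for j in range(n)] for i in range(n)]
    out = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):                                  # step 516
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / out[j] * scores[j] for j in range(n) if out[j])
            for i in range(n)
        ]
    ranked = sorted(zip(scores, phrases), reverse=True)          # step 518
    return " ".join(ranked[0][1])
```

The returned phrase is in stemmed form (for example "flood warn"); mapping it back to a readable surface form is left out of the sketch.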
  • As illustrated in FIG. 6.0, building a relationship graph between documents from named entities relation graph clusters to identify the semantic similarity based on extracted named entities and key phrases further comprises steps (600) of first determining if the named entity (NE) exists in the relationship graph (602). A new NE node is created (604) if the named entity does not exist in the relationship graph (602); the method proceeds to step 606 if the named entity exists in the relationship graph. If the named entity exists in the relationship graph, it is further determined if the NE node has a named entity topic (NeT) value (606).
  • A new NeT is added to the NE node (608) if the named entity topic does not exist in the named entity node (606); the method proceeds to step 610 if the named entity topic exists in the graph. If the named entity topic exists in the graph, it is further determined if the NeT value exists (610). The NeT value is added to the named entity topic (612) if the named entity topic value does not exist in the named entity topic, and the method proceeds to step 614. Thereafter, the document ID (docID) is added to the named entity topic (614), and the document published time (DocPT) is added to the named entity topic and updated in the graph database (616).
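The check-then-create flow of steps 602 to 616 is an "upsert" into a nested structure. The sketch below models the graph as nested dictionaries, which is an assumption: the patent stores the graph in a graph database (112) and does not describe a concrete schema. The key names `"NeT"` and `"docs"` and the function name `add_to_graph` are hypothetical.

```python
from datetime import datetime


def add_to_graph(graph, entity, topic_value, doc_id, published):
    """Insert or update one (NE, NeT value, document) record in the graph."""
    # 602-604: create the NE node only if it does not exist yet.
    node = graph.setdefault(entity, {})
    # 606-608: add the NeT container to the NE node if it is missing.
    topic = node.setdefault("NeT", {})
    # 610-612: add the NeT value (the key phrase) if it is missing.
    value = topic.setdefault(topic_value, {"docs": []})
    # 614-616: attach the document ID and its published time.
    value["docs"].append((doc_id, published))
    return graph


graph = {}
add_to_graph(graph, "Kelantan", "flood warn", "doc-1", datetime(2020, 11, 16))
add_to_graph(graph, "Kelantan", "flood warn", "doc-2", datetime(2020, 11, 17))
add_to_graph(graph, "Kelantan", "relief fund", "doc-3", datetime(2020, 11, 17))
```

Because every level uses `setdefault`, repeated calls for the same entity and key phrase only append new documents and never duplicate nodes.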
  • FIG. 7.0 illustrates the steps of extracting topic clusters from the relationship graph, determining topic ranking and constructing hot topics abstract from ranked topic clusters to form a specific topic and a general topic of the present invention.
  • A timeframe in day(s) is first set (702).
  • The named entity (NE) nodes that contain a named entity topic (NeT) value with a DocPT within the timeframe are obtained (704), and a topic title is constructed using the named entity and the named entity topic value (706).
  • The named entity topic (NeT) weight is determined (708).
  • The topics are ranked in descending order of the named entity topic weight (710), and the top N topics are listed as hot topics for the timeframe (712).
  • The hotness of a topic is determined by its named entity topic (NeT) weight, which is determined using the formula shown below:
  • [Formula not reproduced in the source text. The variables listed for it are: document count; time range; current date or time; document published date or time; set timeframe in day(s) or hours; and total up-to-date documents for a NeT.]
  • The present invention in summary provides for the extraction of hot topics and of the semantic relationships between entity topics by breaking data into sentences, phrases or words to extract all named entities from the content and by applying cross sentence dependency related to each named entity. Sentences related to a named entity are compiled from the cross sentence dependency to extract the keywords or phrases, and a graph of the semantic relationships between the named entities is further constructed to form an entity relation topic for hot topics extraction. Topic clusters are extracted from the relationship graph, and topic ranking is determined based on the similarity matrix between named entity topic values, with the inclusion of an up-to-date weightage calculation, to construct the hot topics abstract from the ranked topic clusters.
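The ranking steps 702 to 712 can be sketched as follows. The patent's own NeT weight formula is not available in this text, so the weight used here, a sum over in-window documents of a linear recency factor `1 - age / timeframe`, is purely an assumed placeholder that combines document count with recency. The nested-dictionary graph layout and the names `hot_topics`, `"NeT"` and `"docs"` are likewise hypothetical.

```python
from datetime import datetime, timedelta


def hot_topics(graph, now, timeframe_days, top_n):
    """Rank topics whose documents fall within the timeframe (702-712)."""
    window = timedelta(days=timeframe_days)                      # step 702
    weighted = []
    for entity, node in graph.items():
        for value, data in node.get("NeT", {}).items():
            ages = [now - published for _, published in data["docs"]]
            recent = [age for age in ages if timedelta(0) <= age <= window]  # step 704
            if not recent:
                continue
            title = f"{entity}: {value}"                         # step 706
            # Step 708 with an ASSUMED weight, not the patent's formula:
            # each recent document contributes more the newer it is.
            weight = sum(1 - age / window for age in recent)
            weighted.append((title, weight))
    weighted.sort(key=lambda item: -item[1])                     # step 710
    return weighted[:top_n]                                      # step 712
```

With a three-day timeframe, a topic with documents published today and yesterday would score 1 + 2/3 under this assumed weight, while a topic whose only document is a week old would be excluded.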

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a system and method for hot topics aggregation using a relationship graph. The present invention is characterized by at least one data processing unit (102) for processing real-time streaming data received from a plurality of remote data sources (202); at least one data relation unit (104) for extracting and constructing semantic relationship between documents based on processed sentences, phrases or words (204); and at least one topic ranking unit (106) for extracting topic clusters from the relationship graph, determining topic ranking and constructing hot topics abstract from ranked topic clusters to form a specific topic and a general topic. In particular, the present invention constructs a graph of semantic relationship between each named entity to form an entity relation topic for hot topic extraction.
PCT/MY2020/050158 2020-06-30 2020-11-17 System and method for hot topics aggregation using relationship graph WO2022005272A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
MYPI2020003413 2020-06-30

Publications (1)

Publication Number Publication Date
WO2022005272A1 2022-01-06

Family

ID=79317622

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/MY2020/050158 System and method for hot topics aggregation using relationship graph 2020-06-30 2020-11-17

Country Status (1)

Country Link
WO (1) WO2022005272A1 (fr)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030033333A1 (en) * 2001-05-11 2003-02-13 Fujitsu Limited Hot topic extraction apparatus and method, storage medium therefor
US20100082331A1 (en) * 2008-09-30 2010-04-01 Xerox Corporation Semantically-driven extraction of relations between named entities
US20130085745A1 (en) * 2011-10-04 2013-04-04 Salesforce.Com, Inc. Semantic-based approach for identifying topics in a corpus of text-based items
US20150120717A1 (en) * 2013-10-25 2015-04-30 Marketwire L.P. Systems and methods for determining influencers in a social data network and ranking data objects based on influencers
KR20180053325A (ko) * 2015-09-18 2018-05-21 Facebook, Inc. Detection of key topics in online social networks

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
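CN115795175A (zh) * 2023-02-15 2023-03-14 铭台(北京)科技有限公司 Multi-dimensional hot spot extraction method based on data analysis
CN115795175B (zh) * 2023-02-15 2023-04-25 铭台(北京)科技有限公司 Multi-dimensional hot spot extraction method based on data analysis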

Similar Documents

Publication Publication Date Title
US11475319B2 (en) Extracting facts from unstructured information
CN103390051B (zh) Topic discovery and tracking method based on microblog data
US10002189B2 (en) Method and apparatus for searching using an active ontology
US6965900B2 (en) Method and apparatus for electronically extracting application specific multidimensional information from documents selected from a set of documents electronically extracted from a library of electronically searchable documents
Segev et al. Context-based matching and ranking of web services for composition
JP4241934B2 (ja) Text processing and retrieval system and method
US20030115188A1 (en) Method and apparatus for electronically extracting application specific multidimensional information from a library of searchable documents and for providing the application specific information to a user application
WO2014047727A1 (fr) Method and system for monitoring social media and analyzing text to automate the classification of user messages using a faceted relevance assessment model
US20110179026A1 (en) Related Concept Selection Using Semantic and Contextual Relationships
WO2010014082A1 (fr) Method and apparatus for relating datasets by using semantic vectors and keyword analyses
CN102609433A (zh) Method and system for query recommendation based on user logs
CN102119383A (zh) Information acquisition and aggregation method and subsystem for facilitating ontology and language model generation within a content retrieval service system
CN112989208B (zh) Information recommendation method and apparatus, electronic device and storage medium
Al-Taani et al. An extractive graph-based Arabic text summarization approach
CN105378730A (zh) Social media analysis and output
Kunneman et al. Open-domain extraction of future events from Twitter
CN110555154B (zh) Topic-oriented information retrieval method
CN102117285B (zh) Retrieval method based on semantic indexing
Musaev et al. Fast text classification using randomized explicit semantic analysis
WO2022005272A1 (fr) System and method for hot topics aggregation using relationship graph
WO2017058584A1 (fr) Extracting facts from unstructured information
Fu et al. Mining newsworthy events in the traffic accident domain from Chinese microblog
JP2008102790A (ja) Search system
CN113934910A (zh) Automatically optimized and updated topic library construction method, and real-time hot event updating method
Chen et al. A personalised query suggestion agent based on query-concept bipartite graphs and Concept Relation Trees

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20943658

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20943658

Country of ref document: EP

Kind code of ref document: A1