WO2023282854A2 - Systems and methods for ranking influence of news and for identifying positive news - Google Patents


Info

Publication number
WO2023282854A2
Authority
WO
WIPO (PCT)
Prior art keywords
news
influence
asset
assets
query
Application number
PCT/SG2022/050480
Other languages
English (en)
Other versions
WO2023282854A3 (fr)
Inventor
Fuli Feng
Moxin LI
Zhenghao Ritchie NG
Cheng Luo
Tat Seng Chua
Original Assignee
National University Of Singapore
Application filed by National University Of Singapore
Publication of WO2023282854A2
Publication of WO2023282854A3


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0282 Rating or review of business operators or products
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06 Asset management; Financial planning or analysis

Definitions

  • Present systems and methods are developed based on news ranking, such as financial news ranking, where it is critical to combine the perspectives of both query and document.
  • a system for ranking influence of news on an asset from a group of assets, comprising: an influence quantification module (IQM) for evaluating an influence of an event based on the asset, by identifying and analysing one or more first news corresponding to the event, to determine a first influence score for each first news; an influence allocation module (IAM) for evaluating an influence of one or more second news across all assets, by identifying connections between assets based on each second news, to determine a second influence score for each second news; and an influence mixer for combining the first influence score and second influence score to determine, for each first news and each second news, a ranking of the respective news against all other news in influencing an asset.
  • IQM influence quantification module
  • IAM influence allocation module
  • a system for identifying positive news comprising: a database query engine for querying a database to identify influential analyst reports, influence being determined based on a number of accesses of the respective report; review each influential analyst report to extract mentions of events, based on a set of interpretation rules; a matching module to, for each mention, compare the mention with a plurality of news to match the mention with news describing the respective event; and one or more processors configured to: select, based on a reference frequency of each matched news, a first predetermined number of matched news that are most frequently referenced; and select, from the first predetermined number of matched news, a second predetermined number of matched news based on a spectrum of events discussed in the respective news.
  • Also disclosed is a method for ranking influence of news on an asset from a group of assets, comprising: evaluating, using an influence quantification module (IQM), an influence of an event based on the asset, by identifying and analysing one or more first news corresponding to the event, to determine a first influence score for each first news; evaluating, using an influence allocation module (IAM), an influence of one or more second news across all assets, by identifying connections between assets based on each second news, to determine a second influence score for each second news; and combining, using an influence mixer, the first influence score and second influence score to determine, for each first news and each second news, a ranking of the respective news against all other news in influencing an asset.
  • the problem of financial event ranking is formulated as a learning-to-rank task, and a Hybrid News Ranking (HNR) framework is proposed to tackle it.
  • HNR Hybrid News Ranking
  • the target asset is viewed as the query used to retrieve candidate news published within a time lag up to the query date.
  • methods described herein build a cost-friendly system to label positive news and construct three datasets for financial news ranking.
  • Figure 6 is a block diagram showing an exemplary computer device for performing the methods described herein;
  • Figure 8 shows the performance of the compared methods with respect to Mean Average Precision (MAP).
  • Figure 10 shows the returned top-5 news of HNR_CNN and the search engine.
  • a Hybrid News Ranking framework that, from the asset perspective, evaluates the influence of news articles by comparing their contents and, from the event perspective, assesses the influence over all query assets.
  • the framework resolves the dilemma between the essential requirement of sufficient labels for training the framework and the unaffordable cost of hiring domain experts for labelling the news.
  • Labelled data is indispensable for the training of HNR, which largely relies on deep neural networks to encode the query and news.
  • it is extremely resource-consuming to label the influence of financial events due to the large number of candidates and the reliance on experienced but expensive financial experts.
  • proposed herein is a cost-friendly system for news labelling that leverages the knowledge within published financial analyst reports. This enables construction of three financial event ranking datasets.
  • the proposed labelling system identifies positive news for each asset from the corresponding, periodically published analyst reports that are written by domain experts.
  • some embodiments of the system perform: mention extraction which extracts events mentioned in the analyst report and mention-news matching which matches the extracted event with news reporting the event.
  • Systems and methods disclosed herein can also scrutinize influence from the perspectives of both asset and news.
  • the system comprises 1) an influence quantification module to evaluate the influence of financial events from the asset perspective by comparing their contents; 2) an influence allocation module to assess the influence of a news item across assets from the query perspective, aligning the influence scores of a news item on homogeneous queries; and 3) an influence mixer to combine the two perspectives.
  • step 104: evaluating an influence of one or more second news across all assets; step 106: combining the first influence score and second influence score to determine, for each first news and each second news, a ranking of the respective news against all other news in influencing an asset.
  • Figure 2(a) shows the candlestick progression of Tesla share price 200 over time. The candlesticks and bars represent the daily price movement and trading volume, respectively.
  • Curve 202 is the Dow Jones Industrial Average (DJI) index, which reflects the trend of US stock market.
  • Figure 1(a) (referring to Figure 1, image (a)) reflects the events announced at each time point and the influence those events have on the share price movement.
  • DJI Dow Jones Industrial Average
  • the scoring function should identify the key patterns of news content that distinguish influential news from common news in a query-specific manner. To do this, the present training process learns from labelled historical queries.
  • a supervised learning paradigm is used to optimize the parameters of the scoring function, which is formulated as:
  • Figure 3 illustrates the HNR framework for running the method 100, where queries 300 (e.g., $Q_t^1$ and $Q_t^2$) are homogeneous, and the query numbered 302 is heterogeneous with respect to queries 300 (e.g., $Q_t^1$ and $Q_t^3$ are heterogeneous).
  • Step 304 of Figure 3 aligns with step 102 of Figure 1 and is performed by the IQM.
  • the IQM itself is illustrated in Figure 4.
  • the main consideration for devising the influence quantification module 400 is assessing the influence of a news (i.e. news item or first news item) on a given query according to its contents.
  • the IQM assesses the influence from a news perspective.
  • the query may comprise a query description and the news may comprise content. This analysis thus comprises mining connections between the query description and the news content, which is consistent with the goal of document retrieval.
  • the influence quantification module is effectively a document retrieval model where the input is a concatenation of the query 402 and the candidate news 404, $[Q_t, D_t^i]$, and the output is the influence prediction 406, i.e., the probability that $D_t^i$ is an influential event of $Q_t$.
  • the probability can be formulated as $\hat{y}_t^i = f_q([Q_t, D_t^i] \mid \Theta_q)$, where $\Theta_q$ denotes the model parameters to be learned.
  • $f_q(\cdot)$ was devised based on a deep Transformer such as BERT, XLNet or RoBERTa, which is pre-trained over a large-scale corpus in a self-supervised manner to encode the co-occurrence of words.
  • the CNN consists of a stack layer, column convolution layer, row convolution layer, and FC layer, although other architectures may be applicable depending on the application.
  • the column convolution layer of the CNN consists of 1D vertical filters to learn the rules for combining the two predictions.
  • the row convolution layer consists of 1D horizontal filters to recognize the inter-query patterns, which is formulated as:
  • the FC layer makes the final prediction by combining the recognized signals, which is formulated as: where $\mathbf{H} \in \mathbb{R}^{(M) \times |Q|}$ and $\mathbf{b}$ are parameters to be learned, and $\mathrm{flatten}(\cdot)$ flattens a matrix into a vector.
  • in Figure 3, image (c), the process of the CNN influence mixer is depicted using a simple example, where the output scores 310 are input into a column convolution layer 312, which outputs to a row convolution layer 314, whose output passes through the FC layer 316 to produce the final, mixed scores 318.
  • image (a) relates to extracting event mentions from the analyst reports, and image (b) involves matching mentions with candidate news.
  • the process 500 corresponding to Figure 5 is shown in image (c) and involves:
  • each influential analyst report is then reviewed to extract mentions of events, based on a set of interpretation rules. An influential report may be, e.g., one with a number of accesses exceeding a threshold, or one ranked (e.g., using the IQM) within the top N news, where N is a predetermined number or proportion of all reports.
  • Hongyuan Futures Co., Ltd. was selected as the source to collect historical analyst reports since the reports are published in plain text instead of PDF files.
  • three commodity markets were chosen: metal, agriculture, and chemical, according to the popularity of reports, i.e., the number of reads.
  • the analyst report is typically published on each trading day and is targeted at a group of commodities, e.g., base metal (copper, aluminium, etc.) and ferrous (screw thread steel and iron ore).
  • candidate news is collected in Chinese from the largest portal websites in China for both the financial news and commodity news.
  • News posted within 48 hours before trading day t were selected as the candidate news of query Q t .
  • a 48-hour window is used because markets react quickly to events, so "old" news no longer influences the markets.
  • the size of the candidate set is around 2,400 for each query. In other words, given a query, the task is to select the top-K influential news from thousands of candidates.
  • Step 504 therefore involves extracting, from the mentions, one or more assets or asset classes associated with the events.
  • on average, 4.89, 2.51, and 2.55 positive events were extracted per query in the metal, agriculture, and chemical datasets, respectively.
  • Mention-news matching is then performed.
  • the event mention is matched with its corresponding news, i.e., recognizing the news that reports the same event as the mention within the candidate set.
  • matching is performed in two steps: 1) automatic matching, which evaluates the similarity between the mention and each candidate news; and 2) manual checking where crowd workers check the top-3 most similar news to identify the positive one.
  • Manual checking may ultimately be avoided in some embodiments, in view of large data sets or after considerable training. Note that checking on whether two pieces of text describe the same event can be done without domain expertise, resolving the reliance on domain experts.
  • the public API provided by one of the largest search engines for Chinese news can be leveraged.
  • positive news can be identified for more than 99.9% of the extracted event mentions; the remaining cases are discarded.
  • the news can be confirmed to cover a wide spectrum of events affecting the supply and demand of the commodities, such as geopolitical events, government policies, company announcements, and strikes, indicating how challenging these datasets are.
  • FIG. 6 is a block diagram showing an exemplary computer device 600, in which embodiments of the proposed method may be practiced.
  • the computer device 600 may be a mobile computer device such as a smart phone, a wearable device, a palm-top computer, a multimedia Internet-enabled cellular telephone, an on-board computing system or any other computing system, a mobile device such as an iPhone™ manufactured by Apple™, Inc., or one manufactured by LG™, HTC™ and Samsung™, for example, or other device.
  • the mobile computer device 600 includes the following components in electronic communication via a bus 606: (a) a display 602;
  • (b) non-volatile (non-transitory) memory 604;
  • the present system 600 further includes database query module 624, IAM 626, IQM 628, Mixer 630 and matching module 632. All modules may be stored in memory 604.
  • the database queried by the database query module 624 may also be stored in memory 604, but may instead be located remotely (e.g., in the server of a news provider or news aggregator). Components of the system may communicate with each other over bus 606, or over network 622 via antenna 630 or transceivers 612, in a similar way to communicating with remote servers and databases.
  • the display 602 generally operates to provide a presentation of content to a user, and may be realized by any of a variety of displays (e.g., CRT, LCD, HDMI, micro-projector and OLED displays).
  • non-volatile data storage 604 functions to store (e.g., persistently store) data and executable code.
  • the system architecture may be implemented in memory 604, or by instructions stored in memory 604.
  • the non-volatile memory 604 includes bootloader code, modem software, operating system code, file system code, and code to facilitate the implementation of components, well known to those of ordinary skill in the art, which are not depicted nor described for simplicity.
  • the non-volatile memory 604 is realized by flash memory (e.g., NAND or ONENAND memory), but it is certainly contemplated that other memory types may be utilized as well. Although it may be possible to execute the code from the non-volatile memory 604, the executable code in the non-volatile memory 604 is typically loaded into RAM 608 and executed by one or more of the N processing components 610.
  • the N processing components 610 in connection with RAM 608 generally operate to execute the instructions stored in non-volatile memory 604.
  • the N processing components 610 may include a video processor, modem processor, DSP, graphics processing unit (GPU), and other processing components.
  • the system 600 of Figure 6 may be connected to any appliance 618, such as a system housing the server containing the database queried by database query module 624.
  • Non-transitory computer-readable medium 604 includes both computer storage medium and communication medium including any medium that facilitates transfer of a computer program from one place to another.
  • a storage medium may be any available medium that can be accessed by a computer.
  • each dataset was chronologically split into training, validation, and testing sets with a ratio of 7:1:2. That is, the most recent 20 percent of queries are treated as testing cases.
  • Mean Average Precision (MAP), Mean Reciprocal Rank (MRR1 and MRR3)
  • Recall (Rec3, Rec5, and Rec10)
  • ColBERT: a Transformer-based ranking method that encodes the query and the document separately with the Transformer and scores each query-document pair by the similarity (inner product) of their representations.
  • RetQE: an extension of Ret equipped with query expansion. For a query $Q_t$, it extracts candidate terms from the positive news of $Q_{t-1}$ and adds the top-ranked terms to the input query according to their cosine similarity with the query. The expanded query is then fed into Ret during both model training and testing.
  • ClaM: fine-tunes the pre-trained Transformer with an additional layer to classify the documents into different types of queries. Documents are ranked by the probability of the class corresponding to the type of the input query. It is the same as the influence allocation module in the proposed HNR.
  • the compared methods are implemented with PyTorch 1.4.0 based on HuggingFace's Transformers.
  • For the pre-trained Transformer, we use the checkpoint of Chinese RoBERTa with Whole Word Masking (named chinese-roberta-wwm-ext).
  • the maximum input length was set to 256, and model parameters were updated with AdamW [28].
  • the gradient accumulation steps were set to 2, gradient clipping to 2.0, the number of warmup steps to 100, the total training steps to 5,000, and the weight of the regularization term (i.e., $\alpha$) to 0.
  • the learning rate and batch size are selected according to the validation performance with respect to Rec10.
  • the coefficient of RUBi i.e., l
  • Table 2: Ranking performance of document classification and document retrieval models on the three datasets.
  • The rationality of learning to rank (RQ1) was then considered. To validate the rationality of formulating financial event ranking as a learning-to-rank task, the document classification and retrieval methods were first tested. Table 2 summarizes the ranking performance of the compared methods on the three datasets: metal, agriculture, and chemical. Note that Upper_Bound represents the performance obtained when the ground truth of the test queries is known, which can be seen as the performance of domain experts. From the table, we have the following observations:
  • the retrieval models achieve performance comparable to the classification models. Across the six metrics, each type of model achieves the best performance in some cases. For instance, ClaM achieves the best Rec5 on the metal market, while ColBERT achieves the best Rec5 on the agriculture market. These results reflect that both types have their pros and cons, i.e., neither the asset perspective nor the event perspective alone is sufficient to solve financial event ranking. As such, it is essential to build a hybrid solution that combines the two perspectives.
  • BM25 achieves the worst performance because it only considers the occurrence of query terms. This result highlights the importance of understanding the news content: keyword-based filtering is not applicable to financial news.
  • Ret performs better than RetQE in most cases, which means that the benefit of query expansion is limited in the financial event ranking problem. This may result from the temporal fluctuation of financial events, i.e., positive events at different time-steps are not closely connected.
  • ClaM performs better on the metal and agriculture markets, while ClaS achieves better performance on the chemical market. Recall that ClaM has separate classification parameters for each type of query while ClaS shares all model parameters across queries.
  • HNR_CNN 0.512 0.459 0.545 0.400 0.627 0.834
  • Table 3: Performance of HNR on the three datasets. The best and worst performances on each dataset with respect to each metric are highlighted in bold and underlined, respectively.
  • 1) HNR_CNN, which applies the CNN mixer (Equation 7); 2) HNR_RUBi, which applies the RUBi mixer (Equation 6); 3) HNR without the influence allocation module (i.e., Ret); and 4) HNR without the influence quantification module (i.e., ClaM).
  • Table 3 shows the performance of the four HNR versions on the three datasets. From the table, the following can be observed:
  • HNR_CNN performs better than HNR_RUBi in most cases, which shows the advantages of the CNN mixer module. That is to say, the query and news perspectives can be combined more accurately by considering the local-region patterns (i.e., the inter-module and inter-query connections).
  • the hybrid models (i.e., HNR_RUBi and HNR_CNN) achieve performance gains over the single models (i.e., Ret and ClaM).
  • the hybrid models typically perform the best and seldom perform the worst among the four versions, which indicates that the hybrid models successfully leverage the pros of both the quantification and the allocation modules. Therefore, these results validate the effectiveness and the rationality of the hybrid learning-to-rank framework in solving financial event ranking.
  • HNR_CNN performs slightly better with respect to Rec5 and Rec10, but worse with respect to the remaining metrics, especially Rec3.
  • the result means that the influence mixer has to sacrifice the ranking at the head (e.g., top three) to recall more positive news on the chemical dataset.
  • the allocation module (i.e., ClaM)
  • HNR_RUBi and HNR_CNN show clear performance gain over Ret, which is attributed to the ability of HNR to consider query relations.
  • On the heterogeneous queries, compared to Ret, HNR performs better on one query but worse on the other. This means that the benefit of HNR mainly comes from accounting for the homogeneous query relations.
  • Figure 8 shows performance of HNR and the quantification module on homogeneous queries and heterogeneous queries.
  • the combination strategies learned by the CNN mixer were then investigated by observing the changes from the ranking of Ret to the ranking of HNR_CNN.
  • the inversion number was used to quantitatively analyze the position changes of a type of news.
  • the selected type of news was labelled 1 and the remaining news 0 in the rankings from Ret and HNR_CNN; the inversion number of each ranking was counted, and the increase rate of the inversion number from Ret to HNR_CNN was calculated.
  • a positive increase rate means that HNR_CNN moves the selected type of news toward the top positions.
  • Figure 8 illustrates the average increase rate over all testing queries in the three datasets. From the figure, it is observable that the increase rate of the first type (i.e., the news with the maximum score from the allocation module) is positive, while the increase rate of the second type (i.e., the news with the maximum score from the quantification module) is negative.
  • the result means that the CNN mixer favors news with the maximum score from the allocation module, which is a reasonable strategy to combine the two modules.
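The candidate-set construction described above (news posted within 48 hours before trading day t form the candidates of query Q_t) amounts to a time-window filter. This is a minimal sketch, not the authors' code. The `(headline, published_at)` record layout, the name `candidate_news`, and the choice of `trading_day` as the exact cut-off timestamp are illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical record layout: (headline, publication timestamp).
NewsItem = Tuple[str, datetime]


def candidate_news(news: List[NewsItem], trading_day: datetime,
                   lag: timedelta = timedelta(hours=48)) -> List[NewsItem]:
    """Keep news published in the window [trading_day - lag, trading_day).

    Assumption: `trading_day` is the cut-off timestamp for query Q_t. Items
    published at or after it are excluded, as are items older than `lag`.
    """
    window_start = trading_day - lag
    return [item for item in news if window_start <= item[1] < trading_day]


t = datetime(2021, 7, 9, 9, 0)
news = [
    ("older than 48 hours", datetime(2021, 7, 6, 12, 0)),
    ("inside the window", datetime(2021, 7, 8, 20, 0)),
    ("after the cut-off", datetime(2021, 7, 9, 10, 0)),
]
print([headline for headline, _ in candidate_news(news, t)])  # ['inside the window']
```

The half-open window is one reasonable reading of "within 48 hours before trading day t". A production pipeline would also need a rule for timezone handling, which the text does not specify.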
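The automatic-matching step of mention-news matching scores every candidate news item against an extracted event mention and passes the three most similar items to crowd workers. The text says a public search-engine API supplies the similarity. The sketch below deliberately swaps in cosine similarity over character bigrams, which needs no word segmentation for Chinese text. The function names and the bigram representation are hypothetical, and the example headlines are invented and given in English for readability.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def char_bigrams(text: str) -> Counter:
    """Character bigrams of the lower-cased text with whitespace removed."""
    compact = "".join(text.lower().split())
    return Counter(compact[i:i + 2] for i in range(len(compact) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bigram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_matches(mention: str, candidates: Dict[str, str], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k candidate news most similar to an event mention, best first."""
    mention_grams = char_bigrams(mention)
    scored = [(news_id, cosine(mention_grams, char_bigrams(text)))
              for news_id, text in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


candidates = {
    "n1": "Workers strike at Chilean copper mine",
    "n2": "Fed raises interest rates",
    "n3": "Copper mine strike in Chile enters day three",
    "n4": "Soybean harvest delayed",
}
# These top-3 items would be handed to crowd workers for the manual check.
for news_id, score in top_matches("copper mine strike in Chile", candidates):
    print(news_id, round(score, 3))
```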
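The 7:1:2 chronological split described above can be written as slices over date-sorted queries. A random split, by contrast, would let the model train on queries that are later than some test queries. This sketch makes three assumptions: the queries are already sorted oldest first, integer flooring sets the split sizes (so any remainder goes to the test split), and the names are illustrative.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(queries: Sequence[T], ratios: Tuple[int, int, int] = (7, 1, 2)
                        ) -> Tuple[List[T], List[T], List[T]]:
    """Split date-sorted queries (oldest first) into train/validation/test lists.

    Slicing in time order keeps every test query later than all training
    queries; rounding leftovers fall into the test split.
    """
    total = sum(ratios)
    n_train = len(queries) * ratios[0] // total
    n_valid = len(queries) * ratios[1] // total
    train = list(queries[:n_train])
    valid = list(queries[n_train:n_train + n_valid])
    test = list(queries[n_train + n_valid:])
    return train, valid, test


# Hypothetical query dates: 100 trading days, already in chronological order.
days = [f"day-{i:03d}" for i in range(100)]
train, valid, test = chronological_split(days)
print(len(train), len(valid), len(test), test[0])  # 70 10 20 day-080
```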
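The CNN influence mixer described above (a stack layer, a column convolution with 1D vertical filters, a row convolution with 1D horizontal filters, and an FC layer) can be sketched in plain Python to make the data flow concrete. The exact equations are not reproduced in this text, so the following are assumptions rather than the patent's formulation:

- The stack layer arranges one news item's IQM and IAM scores into a 2 x |Q| matrix (rows are modules, columns are queries).
- ReLU follows both convolutions.
- The row convolution uses zero padding to keep the width.
- The FC layer emits one sigmoid score per query.

All weights below are hand-set placeholders. In HNR they would be learned.

```python
import math
from typing import List

Matrix = List[List[float]]


def relu(value: float) -> float:
    return value if value > 0.0 else 0.0


def stack_layer(iqm_scores: List[float], iam_scores: List[float]) -> Matrix:
    """Stack one news item's scores from both modules into a 2 x |Q| matrix."""
    if len(iqm_scores) != len(iam_scores):
        raise ValueError("both modules must score the same set of queries")
    return [list(iqm_scores), list(iam_scores)]


def column_conv(x: Matrix, filters: List[List[float]]) -> Matrix:
    """Apply M vertical 2x1 filters; each mixes the two module scores of one query."""
    width = len(x[0])
    return [[relu(f[0] * x[0][j] + f[1] * x[1][j]) for j in range(width)]
            for f in filters]


def row_conv(x: Matrix, kernel: List[float]) -> Matrix:
    """Slide a 1 x k horizontal kernel along the query axis of every row.

    Zero padding keeps the width, so output j mixes query j with its neighbours.
    """
    half = len(kernel) // 2
    out = []
    for row in x:
        width = len(row)
        out.append([
            relu(sum(w * row[j + k - half]
                     for k, w in enumerate(kernel)
                     if 0 <= j + k - half < width))
            for j in range(width)
        ])
    return out


def fc_layer(x: Matrix, weights: Matrix, bias: List[float]) -> List[float]:
    """Flatten the M x |Q| feature map and emit one mixed probability per query."""
    flat = [v for row in x for v in row]  # flatten(.)
    return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(w_row, flat)) + b)))
            for w_row, b in zip(weights, bias)]


# Toy run: one news item scored for three queries by both modules.
iqm = [0.9, 0.2, 0.4]
iam = [0.6, 0.1, 0.8]
features = column_conv(stack_layer(iqm, iam), filters=[[1.0, 0.0], [0.5, 0.5]])
features = row_conv(features, kernel=[0.25, 0.5, 0.25])
n_queries, n_inputs = len(iqm), len(features) * len(iqm)
mixed = fc_layer(features, weights=[[0.1] * n_inputs for _ in range(n_queries)],
                 bias=[0.0] * n_queries)
print([round(p, 3) for p in mixed])
```

A trained implementation would more likely express these layers as `Conv2d` kernels of shape (2, 1) and (1, k) in a deep-learning framework. The loops here only spell out what each kernel touches.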
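The six metrics named above are MAP, MRR1, MRR3, Rec3, Rec5 and Rec10. The text does not spell out their formulas, so this sketch assumes the standard ranking definitions:

- average precision per query;
- reciprocal rank of the first positive within the top k (0 if none);
- the fraction of a query's positives recovered in the top k.

Each metric is averaged over the test queries. The function names are illustrative.

```python
from typing import Dict, List, Sequence, Set, Tuple


def average_precision(ranking: Sequence[str], positives: Set[str]) -> float:
    """Mean of precision@rank taken at the rank of every positive news item."""
    hits, precision_sum = 0, 0.0
    for rank, news_id in enumerate(ranking, start=1):
        if news_id in positives:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(positives) if positives else 0.0


def mrr_at_k(ranking: Sequence[str], positives: Set[str], k: int) -> float:
    """Reciprocal rank of the first positive within the top k, else 0."""
    for rank, news_id in enumerate(ranking[:k], start=1):
        if news_id in positives:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranking: Sequence[str], positives: Set[str], k: int) -> float:
    """Fraction of the positives that appear in the top k."""
    if not positives:
        return 0.0
    return len(set(ranking[:k]) & positives) / len(positives)


def evaluate(queries: List[Tuple[Sequence[str], Set[str]]]) -> Dict[str, float]:
    """Average each metric over (ranking, positives) pairs, one pair per test query."""
    n = len(queries)
    return {
        "MAP": sum(average_precision(r, p) for r, p in queries) / n,
        "MRR1": sum(mrr_at_k(r, p, 1) for r, p in queries) / n,
        "MRR3": sum(mrr_at_k(r, p, 3) for r, p in queries) / n,
        "Rec3": sum(recall_at_k(r, p, 3) for r, p in queries) / n,
        "Rec5": sum(recall_at_k(r, p, 5) for r, p in queries) / n,
        "Rec10": sum(recall_at_k(r, p, 10) for r, p in queries) / n,
    }


print(evaluate([(["a", "b", "c", "d", "e"], {"b", "d"})]))
# {'MAP': 0.5, 'MRR1': 0.0, 'MRR3': 0.5, 'Rec3': 0.5, 'Rec5': 1.0, 'Rec10': 1.0}
```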
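The inversion-number analysis above can be reproduced on a toy pair of rankings. A "selected type" item (here, the news item with the maximum allocation-module score) is labelled 1 and every other item 0. An inversion is a 1 ranked above a 0, so more inversions mean the selected news sits nearer the top. The text does not give the formula for the increase rate, so this sketch assumes the relative change (inv_CNN - inv_Ret) / inv_Ret. Queries whose Ret ranking has zero inversions are treated as undefined. The scores are invented.

```python
import math
from typing import Sequence, Set


def inversion_number(labels: Sequence[int]) -> int:
    """Count pairs i < j with labels[i] > labels[j], i.e. a 1 ranked above a 0."""
    ones_seen, inversions = 0, 0
    for label in labels:
        if label == 1:
            ones_seen += 1
        else:
            inversions += ones_seen
    return inversions


def increase_rate(ranking_ret: Sequence[str], ranking_cnn: Sequence[str],
                  selected: Set[str]) -> float:
    """Relative change in inversion number from the Ret ranking to the HNR_CNN ranking."""
    inv_ret = inversion_number([1 if n in selected else 0 for n in ranking_ret])
    inv_cnn = inversion_number([1 if n in selected else 0 for n in ranking_cnn])
    if inv_ret == 0:
        return math.nan  # undefined for this query under the assumed formula
    return (inv_cnn - inv_ret) / inv_ret


# Hypothetical allocation-module scores; the argmax item is the "selected type".
allocation_scores = {"a": 0.2, "b": 0.1, "c": 0.9, "d": 0.3}
selected = {max(allocation_scores, key=allocation_scores.get)}  # {"c"}
print(increase_rate(["a", "b", "c", "d"], ["c", "a", "b", "d"], selected))  # 2.0
```

A positive value, as here, corresponds to the mixer moving the selected news toward the top. Averaging over test queries would skip the NaN cases.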

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Technology Law (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Human Resources & Organizations (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a system for ranking the influence of news on an asset from a group of assets. The system comprises an influence quantification module (IQM) for evaluating the influence of one or more events based on the asset, by identifying and analysing first news corresponding to the event, so as to determine a first influence score for the one or more first news. An influence allocation module (IAM) evaluates an influence of second news across all assets, by identifying connections between assets based on each second news, so as to determine a second influence score for each second news. An influence mixer then combines the first influence score and the second influence score to determine, for each first news and each second news, a ranking of the respective news against all other news in influencing an asset.
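At the level of the abstract, the pipeline reduces to scoring candidate news twice and ranking the combined score. The patent's own mixers are the CNN and RUBi variants mentioned above. The sketch below deliberately swaps in a much simpler linear interpolation with a hypothetical weight `alpha`, only to show how per-news scores from the two modules become a single ranking. Treating a score missing from one module as 0.0 is also an assumption, and the scores are invented.

```python
from typing import Dict, List, Tuple


def mix_and_rank(first_scores: Dict[str, float], second_scores: Dict[str, float],
                 alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Rank news by alpha * first_score + (1 - alpha) * second_score (hypothetical mixer)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    news_ids = first_scores.keys() | second_scores.keys()
    mixed = {
        news_id: alpha * first_scores.get(news_id, 0.0)
        + (1.0 - alpha) * second_scores.get(news_id, 0.0)
        for news_id in news_ids
    }
    # Highest mixed score first; ties broken by id so the order is deterministic.
    return sorted(mixed.items(), key=lambda item: (-item[1], item[0]))


iqm_scores = {"n1": 0.9, "n2": 0.4, "n3": 0.1}  # first influence scores (IQM)
iam_scores = {"n1": 0.2, "n2": 0.8, "n4": 0.6}  # second influence scores (IAM)
print([news_id for news_id, _ in mix_and_rank(iqm_scores, iam_scores)])
# ['n2', 'n1', 'n4', 'n3']
```

A fixed linear weight ignores the inter-query patterns that the CNN mixer is designed to capture. It serves here only as the simplest baseline for illustrating the combine-then-rank step.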
PCT/SG2022/050480 2021-07-09 2022-07-08 Systems and methods for ranking influence of news and for identifying positive news WO2023282854A2 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SG10202107560U 2021-07-09
SG10202107560U 2021-07-09

Publications (2)

Publication Number Publication Date
WO2023282854A2 WO2023282854A2 (fr) 2023-01-12
WO2023282854A3 WO2023282854A3 (fr) 2023-04-06

Family

ID=84802142

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2022/050480 WO2023282854A2 (fr) 2021-07-09 2022-07-08 Systems and methods for ranking influence of news and for identifying positive news

Country Status (1)

Country Link
WO (1) WO2023282854A2 (fr)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11222052B2 (en) * 2011-02-22 2022-01-11 Refinitiv Us Organization Llc Machine learning-based relationship association and related discovery and
WO2014186639A2 (fr) * 2013-05-15 2014-11-20 Kensho Llc Systems and methods for data mining and modeling
KR101708444B1 (ko) * 2015-11-16 2017-02-22 주식회사 위버플 Method for evaluating the relevance between keywords and asset prices, and apparatus therefor
US20180053255A1 (en) * 2016-08-19 2018-02-22 Noonum System and Method for end to end investment and portfolio management using machine driven analysis of the market against qualifying factors
US20190303395A1 (en) * 2018-03-30 2019-10-03 State Street Corporation Techniques to determine portfolio relevant articles
KR20220001065A (ko) * 2020-06-29 2022-01-05 한국투자증권 주식회사 Method and apparatus for evaluating the relevance between content and assets

Also Published As

Publication number Publication date
WO2023282854A3 (fr) 2023-04-06


Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE