CA2917153A1 - Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus - Google Patents

Publication number
CA2917153A1
Authority
CA
Canada
Prior art keywords
computer
text
corpus
relation
discourse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA2917153A
Other languages
English (en)
Other versions
CA2917153C (fr)
Inventor
Blake HOWALD
Andrew NYSTROM
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Thomson Reuters Enterprise Centre GmbH
Original Assignee
Thomson Reuters Global Resources ULC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-07-03
Filing date
2014-07-03
Publication date
2015-01-08
Application filed by Thomson Reuters Global Resources ULC
Publication of CA2917153A1
Application granted
Publication of CA2917153C
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The present invention relates to a method and system for predicting implicit rhetorical relations between two spans of text, for example in a large annotated corpus such as the Penn Discourse Treebank ("PDTB"), the Rhetorical Structure Theory corpus, and the Discourse Graph Bank, and in particular for determining a rhetorical relation in the absence of an explicit discourse marker. Surface-level features may be used to capture pragmatic information encoded in the absent marker. In one approach, a simplified feature set determined from raw text and semantic functions alone is used to improve performance for all relations. By using surface-level features to predict implicit rhetorical relations for the large annotated corpus, the invention approaches a theoretical maximum performance, suggesting that more data will not necessarily improve performance based on these and similar features.
CA2917153A 2013-07-03 2014-07-03 Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus Active CA2917153C (fr)
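
The abstract describes predicting an implicit rhetorical relation between two text spans from surface-level features when no explicit discourse marker is present. The following is a minimal sketch of that general idea, not the patented method: the specific features (first and last word of each span, cross-span word pairs), the Naive Bayes classifier, the function and class names, and the four toy training examples are all assumptions made for illustration. The two labels are borrowed from the PDTB top-level sense classes.

```python
import math
from collections import Counter, defaultdict


def surface_features(arg1, arg2):
    """Surface-level features from the raw text of two non-empty spans.

    Hypothetical feature set: first/last word of each span plus word pairs.
    """
    t1, t2 = arg1.lower().split(), arg2.lower().split()
    feats = [
        f"a1_first={t1[0]}", f"a1_last={t1[-1]}",
        f"a2_first={t2[0]}", f"a2_last={t2[-1]}",
    ]
    # Cross-span word pairs, one feature per distinct (word1, word2) pair.
    feats += [f"pair={w1}|{w2}" for w1 in sorted(set(t1)) for w2 in sorted(set(t2))]
    return feats


class NaiveBayesRelationClassifier:
    """Multinomial Naive Bayes with add-one smoothing (an assumed stand-in model)."""

    def __init__(self):
        self.label_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        for arg1, arg2, label in examples:
            self.label_counts[label] += 1
            for feat in surface_features(arg1, arg2):
                self.feat_counts[label][feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, arg1, arg2):
        feats = [f for f in surface_features(arg1, arg2) if f in self.vocab]
        total = sum(self.label_counts.values())

        def log_score(label):
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            score = math.log(self.label_counts[label] / total)
            for feat in feats:
                score += math.log((self.feat_counts[label][feat] + 1) / denom)
            return score

        return max(self.label_counts, key=log_score)


# Toy, hand-written span pairs (not taken from any annotated corpus).
TRAIN = [
    ("the rain was heavy", "the match was cancelled", "Contingency"),
    ("the road was icy", "the bus was late", "Contingency"),
    ("sales rose in europe", "sales fell in asia", "Comparison"),
    ("profits rose this year", "revenue fell this year", "Comparison"),
]

model = NaiveBayesRelationClassifier().fit(TRAIN)
print(model.predict("the snow was heavy", "the train was cancelled"))
```

A real system would train on the annotated implicit relations of a corpus such as the PDTB and would likely use richer features; the sketch only shows how raw-text cues from both spans can stand in for the missing discourse marker.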

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201361842635P 2013-07-03 2013-07-03
US61/842,635 2013-07-03
PCT/US2014/045432 WO2015003143A2 (fr) 2013-07-03 2014-07-03 Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus

Publications (2)

Publication Number Publication Date
CA2917153A1 (fr) 2015-01-08
CA2917153C (fr) 2022-05-17

Family

ID=52144292

Family Applications (1)

Application Number Title Priority Date Filing Date
CA2917153A Active CA2917153C (fr) 2013-07-03 2014-07-03 Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus

Country Status (3)

Country Link
AU (1) AU2014285073B9 (fr)
CA (1) CA2917153C (fr)
WO (1) WO2015003143A2 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
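CN111209366A (zh) * 2019-10-10 2020-05-29 Tianjin University Implicit discourse relation recognition method based on a TransS-driven mutual excitation neural network
CN112257460A (zh) * 2020-09-25 2021-01-22 Kunming University of Science and Technology Pivot-based Chinese-Vietnamese joint training neural machine translation method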

Families Citing this family (8)

Publication number Priority date Publication date Assignee Title
JP7187545B2 (ja) * 2017-09-28 2022-12-12 Oracle International Corporation Determining cross-document rhetorical relationships based on parsing and identification of named entities
US11809825B2 (en) 2017-09-28 2023-11-07 Oracle International Corporation Management of a focused information sharing dialogue based on discourse trees
EP3791292A1 (fr) 2018-05-09 2021-03-17 Oracle International Corporation Constructing imaginary discourse trees to improve answering convergent questions
US11580298B2 (en) 2019-11-14 2023-02-14 Oracle International Corporation Detecting hypocrisy in text
CN113407713B (zh) * 2020-10-22 2024-04-05 Tencent Technology (Shenzhen) Co., Ltd. Corpus mining method and apparatus based on active learning, and electronic device
CN113535973B (zh) * 2021-06-07 2023-06-23 Institute of Software, Chinese Academy of Sciences Event relation extraction and discourse relation analysis method and device based on knowledge mapping
CN113377915B (zh) * 2021-06-22 2022-07-19 Xiamen University Dialogue discourse parsing method
CN113553830B (zh) * 2021-08-11 2023-01-03 Guilin University of Electronic Technology Graph-based method for analyzing sentence discourse coherence in English text

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US5659766A (en) * 1994-09-16 1997-08-19 Xerox Corporation Method and apparatus for inferring the topical content of a document based upon its lexical content without supervision
AU2001261506A1 (en) * 2000-05-11 2001-11-20 University Of Southern California Discourse parsing and summarization
US7062561B1 (en) * 2000-05-23 2006-06-13 Richard Reisman Method and apparatus for utilizing the social usage learned from multi-user feedback to improve resource identity signifier mapping
US7127208B2 (en) * 2002-01-23 2006-10-24 Educational Testing Service Automated annotation
US7305336B2 (en) * 2002-08-30 2007-12-04 Fuji Xerox Co., Ltd. System and method for summarization combining natural language generation with structural analysis

Cited By (4)

Publication number Priority date Publication date Assignee Title
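CN111209366A (zh) * 2019-10-10 2020-05-29 Tianjin University Implicit discourse relation recognition method based on a TransS-driven mutual excitation neural network
CN111209366B (zh) * 2019-10-10 2023-04-21 Tianjin University Implicit discourse relation recognition method based on a TransS-driven mutual excitation neural network
CN112257460A (zh) * 2020-09-25 2021-01-22 Kunming University of Science and Technology Pivot-based Chinese-Vietnamese joint training neural machine translation method
CN112257460B (zh) * 2020-09-25 2022-06-21 Kunming University of Science and Technology Pivot-based Chinese-Vietnamese joint training neural machine translation method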

Also Published As

Publication number Publication date
CA2917153C (fr) 2022-05-17
WO2015003143A3 (fr) 2015-05-14
WO2015003143A2 (fr) 2015-01-08
AU2014285073B2 (en) 2016-11-03
AU2014285073B9 (en) 2017-04-06
AU2014285073A1 (en) 2016-02-04

Similar Documents

Publication Publication Date Title
US9355372B2 (en) Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus
CA2917153C (fr) Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus
US9317498B2 (en) Systems and methods for generating summaries of documents
Yi et al. Sentiment mining in WebFountain
Sarawagi Information extraction
Chen et al. Towards robust unsupervised personal name disambiguation
Chali et al. Query-focused multi-document summarization: Automatic data annotations and supervised learning approaches
Khan et al. EnSWF: effective features extraction and selection in conjunction with ensemble learning methods for document sentiment classification
Zhang et al. Enhancing keyphrase extraction from academic articles with their reference information
Laddha et al. Aspect opinion expression and rating prediction via LDA–CRF hybrid
Fagan et al. An introduction to textual econometrics
Zheng et al. A review on authorship attribution in text mining
Sharma et al. Diverse feature set based Keyphrase extraction and indexing techniques
Rajman et al. From text to knowledge: Document processing and visualization: A text mining approach
You et al. Joint learning-based heterogeneous graph attention network for timeline summarization
Zhou et al. Semantic Smoothing of Document Models for Agglomerative Clustering.
Mason An n-gram based approach to the automatic classification of web pages by genre
Tahmasebi Models and algorithms for automatic detection of language evolution: towards finding and interpreting of content in long-term archives
Sizov Extraction-based automatic summarization: Theoretical and empirical investigation of summarization techniques
Dalton Entity-based enrichment for information extraction and retrieval
Brand et al. N-gram representations for comment filtering
Ceylan Investigating the extractive summarization of literary novels
Machova et al. Selecting the Most Probable Author of Asocial Posting in Online Media
Uddin et al. Short text classification using semantically enriched topic model
Gupta et al. Machine learning-based authorship attribution using token n-grams and other time tested features

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20190627