CA2917153C - Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus

Info

Publication number
CA2917153C
Authority
CA
Canada
Prior art keywords
computer
text
corpus
relation
discourse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CA2917153A
Other languages
English (en)
Other versions
CA2917153A1 (fr)
Inventor
Blake HOWALD
Andrew NYSTROM
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Thomson Reuters Enterprise Centre GmbH
Original Assignee
Thomson Reuters Enterprise Centre GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-07-03
Filing date
2014-07-03
Publication date
2022-05-17
Application filed by Thomson Reuters Enterprise Centre GmbH
Publication of CA2917153A1
Application granted
Publication of CA2917153C
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/31 Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The present invention relates to a method and system for predicting implicit rhetorical relations between two spans of text, for example in a large annotated corpus such as the Penn Discourse Treebank ("PDTB"), the Rhetorical Structure Theory corpus, and the Discourse Graph Bank, and in particular for determining a rhetorical relation in the absence of an explicit discourse marker. Surface-level features may be used to capture pragmatic information encoded in the absent marker. In one approach, a simplified feature set derived only from raw text and semantic functions is used to improve performance for all relations. By using surface-level features to predict implicit rhetorical relations for the large annotated corpus, the invention approaches a theoretical maximum performance, suggesting that more data will not necessarily improve performance based on these and similar features.
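The idea of predicting a relation between two text spans from surface-level features alone can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the feature set (first and last token of each span plus cross-span word pairs), the Naive Bayes classifier, the names `surface_features` and `NaiveBayes`, and the four invented training pairs. The labels "Comparison" and "Contingency" are two of the top-level sense classes used in the PDTB.

```python
from collections import Counter, defaultdict
import math
import re


def tokenize(text):
    """Lowercase raw text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def surface_features(arg1, arg2):
    """Illustrative surface features computed from raw text only."""
    t1, t2 = tokenize(arg1), tokenize(arg2)
    feats = []
    if t1:
        feats += [f"a1_first={t1[0]}", f"a1_last={t1[-1]}"]
    if t2:
        feats += [f"a2_first={t2[0]}", f"a2_last={t2[-1]}"]
    # Cross-span word pairs: one feature per (word in span 1, word in span 2).
    feats += [f"pair={w1}|{w2}" for w1 in sorted(set(t1)) for w2 in sorted(set(t2))]
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (hypothetical stand-in model)."""

    def __init__(self):
        self.label_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        for feats, label in examples:
            self.label_counts[label] += 1
            for f in feats:
                self.feat_counts[label][f] += 1
                self.vocab.add(f)

    def predict(self, feats):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):
            n = sum(self.feat_counts[label].values())
            # Log prior plus smoothed log likelihood of each observed feature.
            score = math.log(self.label_counts[label] / total)
            for f in feats:
                count = self.feat_counts[label][f]
                score += math.log((count + 1) / (n + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy pairs with no explicit discourse marker between the spans.
TRAIN = [
    ("The roads were icy.", "Several cars slid off the highway.", "Contingency"),
    ("The company missed its forecast.", "Its shares fell sharply.", "Contingency"),
    ("Sales rose in Europe.", "Sales fell in Asia.", "Comparison"),
    ("The first plan was cheap.", "The second plan was expensive.", "Comparison"),
]

model = NaiveBayes()
model.fit([(surface_features(a, b), label) for a, b, label in TRAIN])

if __name__ == "__main__":
    print(model.predict(surface_features("Profits rose in spring.",
                                         "Profits fell in autumn.")))
```

Word pairs such as `rose|fell` carry the contrast that an explicit marker like "but" would otherwise signal, which is why they are a common choice of surface feature for this task. A real system would be trained on an annotated corpus rather than on a handful of hand-written pairs.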
CA2917153A 2013-07-03 2014-07-03 Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus Active CA2917153C (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201361842635P 2013-07-03 2013-07-03
US61/842,635 2013-07-03
PCT/US2014/045432 WO2015003143A2 (fr) 2013-07-03 2014-07-03 Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus

Publications (2)

Publication Number Publication Date
CA2917153A1 (fr) 2015-01-08
CA2917153C (fr) 2022-05-17

Family

ID=52144292

Family Applications (1)

Application Number Status Publication Priority Date Filing Date Title
CA2917153A Active CA2917153C (fr) 2013-07-03 2014-07-03 Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus

Country Status (3)

Country Link
AU (1) AU2014285073B9 (fr)
CA (1) CA2917153C (fr)
WO (1) WO2015003143A2 (fr)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11809825B2 (en) 2017-09-28 2023-11-07 Oracle International Corporation Management of a focused information sharing dialogue based on discourse trees
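CN117114001A (zh) * 2017-09-28 2023-11-24 Oracle International Corporation Determining cross-document rhetorical interrelationships based on parsing and recognition of named entities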
US11328016B2 (en) 2018-05-09 2022-05-10 Oracle International Corporation Constructing imaginary discourse trees to improve answering convergent questions
CN111209366B (zh) * 2019-10-10 2023-04-21 Tianjin University Implicit discourse relation recognition method based on a TransS-driven mutual excitation neural network
US11580298B2 (en) 2019-11-14 2023-02-14 Oracle International Corporation Detecting hypocrisy in text
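CN112257460B (zh) * 2020-09-25 2022-06-21 Kunming University of Science and Technology Pivot-based Chinese-Vietnamese joint training neural machine translation method
CN113407713B (zh) * 2020-10-22 2024-04-05 Tencent Technology (Shenzhen) Co., Ltd. Active learning-based corpus mining method, apparatus, and electronic device
CN113535973B (zh) * 2021-06-07 2023-06-23 Institute of Software, Chinese Academy of Sciences Knowledge mapping-based event relation extraction and discourse relation analysis method and apparatus
CN113377915B (zh) * 2021-06-22 2022-07-19 Xiamen University Dialogue discourse parsing method
CN113553830B (zh) * 2021-08-11 2023-01-03 Guilin University of Electronic Technology Graph-based method for analyzing sentence discourse coherence in English text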

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5659766A (en) * 1994-09-16 1997-08-19 Xerox Corporation Method and apparatus for inferring the topical content of a document based upon its lexical content without supervision
WO2001086489A2 (fr) * 2000-05-11 2001-11-15 University Of Southern California Discourse parsing and summarization
US7062561B1 (en) * 2000-05-23 2006-06-13 Richard Reisman Method and apparatus for utilizing the social usage learned from multi-user feedback to improve resource identity signifier mapping
US7127208B2 (en) * 2002-01-23 2006-10-24 Educational Testing Service Automated annotation
US7305336B2 (en) * 2002-08-30 2007-12-04 Fuji Xerox Co., Ltd. System and method for summarization combining natural language generation with structural analysis

Also Published As

Publication number Publication date
CA2917153A1 (fr) 2015-01-08
WO2015003143A3 (fr) 2015-05-14
AU2014285073B2 (en) 2016-11-03
AU2014285073B9 (en) 2017-04-06
WO2015003143A2 (fr) 2015-01-08
AU2014285073A1 (en) 2016-02-04

Similar Documents

Publication Publication Date Title
US9355372B2 (en) Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus
CA2917153C (fr) Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus
Yi et al. Sentiment mining in WebFountain
Sarawagi Information extraction
Zubrinic et al. The automatic creation of concept maps from documents written using morphologically rich languages
Chali et al. Query-focused multi-document summarization: Automatic data annotations and supervised learning approaches
Devi et al. A hybrid document features extraction with clustering based classification framework on large document sets
Zhang et al. Enhancing keyphrase extraction from academic articles with their reference information
Souza et al. A comparative study of abstractive and extractive summarization techniques to label subgroups on patent dataset
Laddha et al. Aspect opinion expression and rating prediction via LDA–CRF hybrid
Anoop et al. A topic modeling guided approach for semantic knowledge discovery in e-commerce
Ehrler et al. Data-poor categorization and passage retrieval for gene ontology annotation in Swiss-Prot
Fagan et al. An introduction to textual econometrics
Meuschke Analyzing non-textual content elements to detect academic plagiarism
Klochikhin et al. Text analysis
Sharma et al. Diverse feature set based Keyphrase extraction and indexing techniques
Rajman et al. From text to knowledge: Document processing and visualization: A text mining approach
Tahmasebi Models and algorithms for automatic detection of language evolution: towards finding and interpreting of content in long-term archives
Sizov Extraction-based automatic summarization: Theoretical and empirical investigation of summarization techniques
Xu et al. Exploiting paper contents and citation links to identify and characterise specialisations
Ceylan Investigating the extractive summarization of literary novels
Brand et al. N-gram representations for comment filtering
Gupta et al. Machine learning-based authorship attribution using token n-grams and other time tested features
Hou Semantic Enhancement for Text Representation
Uddin et al. Short text classification using semantically enriched topic model

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20190627