CA2917153C - Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus

Info

Publication number
CA2917153C
Authority
CA
Canada
Prior art keywords
computer
text
corpus
relation
discourse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CA2917153A
Other languages
English (en)
Other versions
CA2917153A1 (fr)
Inventor
Blake HOWALD
Andrew NYSTROM
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Thomson Reuters Enterprise Centre GmbH
Original Assignee
Thomson Reuters Enterprise Centre GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thomson Reuters Enterprise Centre GmbH
Publication of CA2917153A1
Application granted
Publication of CA2917153C
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/31 Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The present invention relates to a method and system for predicting implicit rhetorical relations between two spans of text, for example in a large annotated corpus such as the Penn Discourse Treebank ("PDTB"), the Rhetorical Structure Theory corpus, and the Discourse Graph Bank, and in particular for determining a rhetorical relation in the absence of an explicit discourse marker. Surface-level features may be used to capture the pragmatic information encoded in the absent marker. In one approach, a simplified feature determined only from raw text and semantic functions is used to improve performance for all relations. By using surface-level features to predict implicit rhetorical relations for the large annotated corpus, the invention approaches a theoretical maximum performance, suggesting that more data will not necessarily improve performance based on these and similar features.
CA2917153A 2013-07-03 2014-07-03 Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus Active CA2917153C (fr)
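
The idea in the abstract, predicting a rhetorical relation between two text spans from surface-level features when no explicit discourse marker is present, can be illustrated with a minimal sketch. Everything below is an assumption for illustration and is not taken from the patent: the feature set (per-span unigrams plus the pair of first words), the multinomial Naive Bayes classifier, the names `surface_features` and `NaiveBayesRelationClassifier`, the relation labels, and the six toy training pairs. It is not the patented method and makes no claim about its performance.

```python
import math
from collections import Counter, defaultdict


def surface_features(arg1, arg2):
    """Hypothetical surface-level features computed from raw text only."""
    t1, t2 = arg1.lower().split(), arg2.lower().split()
    # Unigrams are tagged with the span they came from.
    feats = [f"a1={w}" for w in t1] + [f"a2={w}" for w in t2]
    if t1 and t2:
        # The pair of first words, a cheap cue for how the spans connect.
        feats.append(f"first={t1[0]}|{t2[0]}")
    return feats


class NaiveBayesRelationClassifier:
    """Multinomial Naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, pairs, labels):
        self.label_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for (a1, a2), y in zip(pairs, labels):
            for f in surface_features(a1, a2):
                self.feat_counts[y][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, arg1, arg2):
        feats = surface_features(arg1, arg2)
        total = sum(self.label_counts.values())
        best, best_score = None, -math.inf
        for y, n in self.label_counts.items():
            # Smoothed denominator: feature tokens seen with y plus vocabulary size.
            denom = sum(self.feat_counts[y].values()) + len(self.vocab)
            score = math.log(n / total)
            for f in feats:
                score += math.log((self.feat_counts[y][f] + 1) / denom)
            if score > best_score:
                best, best_score = y, score
        return best


# Invented toy data: span pairs with no explicit connective between them.
TRAIN = [
    (("the rain stopped", "the game resumed"), "Temporal"),
    (("the rain stopped", "the crowd returned"), "Temporal"),
    (("sales fell sharply", "the company cut jobs"), "Contingency"),
    (("profits fell", "the company cut costs"), "Contingency"),
    (("she likes tea", "he prefers coffee"), "Comparison"),
    (("she likes jazz", "he prefers rock"), "Comparison"),
]

model = NaiveBayesRelationClassifier().fit(
    [pair for pair, _ in TRAIN], [label for _, label in TRAIN]
)
print(model.predict("she likes wine", "he prefers beer"))
```

Keeping the features limited to what can be read off the raw text mirrors the abstract's emphasis on a simplified feature set. A real system would train on an annotated corpus such as the PDTB rather than on a handful of invented pairs.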

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201361842635P 2013-07-03 2013-07-03
US61/842,635 2013-07-03
PCT/US2014/045432 WO2015003143A2 (fr) 2013-07-03 2014-07-03 Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus

Publications (2)

Publication Number Publication Date
CA2917153A1 (fr) 2015-01-08
CA2917153C (fr) 2022-05-17

Family

ID=52144292

Family Applications (1)

Application Number Title Priority Date Filing Date
CA2917153A Active CA2917153C (fr) 2013-07-03 2014-07-03 Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus

Country Status (3)

Country Link
AU (1) AU2014285073B9 (fr)
CA (1) CA2917153C (fr)
WO (1) WO2015003143A2 (fr)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11100144B2 (en) 2017-06-15 2021-08-24 Oracle International Corporation Data loss prevention system for cloud security based on document discourse analysis
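CN111149100B (zh) * 2017-09-28 2023-08-29 Oracle International Corporation Determining cross-document rhetorical relationships based on parsing and recognition of named entities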
US11809825B2 (en) 2017-09-28 2023-11-07 Oracle International Corporation Management of a focused information sharing dialogue based on discourse trees
US11328016B2 (en) 2018-05-09 2022-05-10 Oracle International Corporation Constructing imaginary discourse trees to improve answering convergent questions
CN111209366B (zh) * 2019-10-10 2023-04-21 Tianjin University Implicit discourse relation recognition method based on a TransS-driven mutual excitation neural network
US11580298B2 (en) 2019-11-14 2023-02-14 Oracle International Corporation Detecting hypocrisy in text
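CN112257460B (zh) * 2020-09-25 2022-06-21 Kunming University of Science and Technology Pivot-based Chinese-Vietnamese joint training neural machine translation method
CN113407713B (zh) * 2020-10-22 2024-04-05 Tencent Technology (Shenzhen) Co., Ltd. Corpus mining method and apparatus based on active learning, and electronic device
CN113535973B (zh) * 2021-06-07 2023-06-23 Institute of Software, Chinese Academy of Sciences Event relation extraction and discourse relation analysis method and apparatus based on knowledge mapping
CN113377915B (zh) * 2021-06-22 2022-07-19 Xiamen University Dialogue discourse parsing method
CN113553830B (zh) * 2021-08-11 2023-01-03 Guilin University of Electronic Technology Graph-based discourse coherence analysis method for English text sentences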

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5659766A (en) * 1994-09-16 1997-08-19 Xerox Corporation Method and apparatus for inferring the topical content of a document based upon its lexical content without supervision
WO2001086489A2 (fr) * 2000-05-11 2001-11-15 University Of Southern California Discourse parsing and summarization
US7062561B1 (en) * 2000-05-23 2006-06-13 Richard Reisman Method and apparatus for utilizing the social usage learned from multi-user feedback to improve resource identity signifier mapping
US7127208B2 (en) * 2002-01-23 2006-10-24 Educational Testing Service Automated annotation
US7305336B2 (en) * 2002-08-30 2007-12-04 Fuji Xerox Co., Ltd. System and method for summarization combining natural language generation with structural analysis

Also Published As

Publication number Publication date
AU2014285073B2 (en) 2016-11-03
WO2015003143A3 (fr) 2015-05-14
WO2015003143A2 (fr) 2015-01-08
AU2014285073A1 (en) 2016-02-04
CA2917153A1 (fr) 2015-01-08
AU2014285073B9 (en) 2017-04-06

Similar Documents

Publication Publication Date Title
US9355372B2 (en) Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus
CA2917153C (fr) Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus
Yi et al. Sentiment mining in WebFountain
Pons-Porrata et al. Topic discovery based on text mining techniques
Zubrinic et al. The automatic creation of concept maps from documents written using morphologically rich languages
Zhang et al. Enhancing keyphrase extraction from academic articles with their reference information
CN107967290A (zh) Knowledge graph network construction method, system, and medium based on massive scientific research data
Souza et al. A comparative study of abstractive and extractive summarization techniques to label subgroups on patent dataset
Lisena et al. TOMODAPI: A topic modeling API to train, use and compare topic models
Devi et al. A hybrid document features extraction with clustering based classification framework on large document sets
Fagan et al. An introduction to textual econometrics
You et al. Joint learning-based heterogeneous graph attention network for timeline summarization
Laddha et al. Aspect opinion expression and rating prediction via LDA–CRF hybrid
Chali et al. Query-focused multi-document summarization: Automatic data annotations and supervised learning approaches
Ehrler et al. Data-poor categorization and passage retrieval for gene ontology annotation in Swiss-Prot
Anoop et al. A topic modeling guided approach for semantic knowledge discovery in e-commerce
Rajman et al. From text to knowledge: Document processing and visualization: A text mining approach
Sharma et al. Diverse feature set based Keyphrase extraction and indexing techniques
Li et al. Computational linguistics literature and citations oriented citation linkage, classification and summarization
Klochikhin et al. Text analysis
Tahmasebi Models and algorithms for automatic detection of language evolution
Sizov Extraction-based automatic summarization: Theoretical and empirical investigation of summarization techniques
Jiang et al. Python‐Based Visual Classification Algorithm for Economic Text Big Data
Brand et al. N-gram representations for comment filtering
Baby et al. REDUNDANCY REDUCTION AND SENTENCE PRIORITISATION OF THE STUDENT LECTURE NOTES USING SOFT COSINE IMPLEMENTED MMR ALGORITHM

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20190627