CA2932310A1 - System and method for automating information abstraction process for documents - Google Patents

Info

Publication number
CA2932310A1
Authority
CA
Canada
Prior art keywords
document
processor
sections
machine learning
extracted information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA2932310A
Other languages
English (en)
Other versions
CA2932310C (fr)
Inventor
Shubhashis Sengupta
Annervaz Karukapadath Mohamedrasheed
Chakravarthy Lakshminarasimhan
Manisha Kapur
Jovin George
Mansi Srivastava
Vaidya Sumanth
Rajeh Ganesh Natrajan
Siddesha Swamy
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Accenture Global Services Ltd
Original Assignee
Accenture Global Services Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US14/836,659 (US9946924B2)
Application filed by Accenture Global Services Ltd
Publication of CA2932310A1
Application granted
Publication of CA2932310C
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G06F18/295 Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/24 Character recognition characterised by the processing or recognition method
    • G06V30/248 Character recognition characterised by the processing or recognition method involving plural approaches, e.g. verification by template match; Resolving confusion among similar patterns, e.g. "O" versus "Q"
    • G06V30/2504 Coarse or fine approaches, e.g. resolution of ambiguities or multiscale approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/26 Techniques for post-processing, e.g. correcting the recognition result
    • G06V30/262 Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
    • G06V30/268 Lexical context
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/416 Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A computer-implemented method, a processing pipeline, and a system create a hierarchical semantic map of a document and of information extracted from it. The method comprises: apportioning the document into major sections by accessing the document, recognizing its hierarchical structure, and dividing it into the major sections using a data profiler and a machine learning module; classifying the major sections and mapping them to key elements in one of multiple levels; searching one of the major sections and identifying sub-sections within it so as to achieve a maximum confidence score indicating that the sub-sections are associated with the key element; extracting information from the identified sub-sections using sequence modelers and linguistic characteristics provided by the data profiler; generating the hierarchical semantic map of the document from the extracted information; and displaying, in a user interface, drop-down selections of the key elements.
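The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not specify the data profiler, the machine learning module, or the sequence modelers, so a numbered-heading regex and a cue-word overlap score stand in for them here. The names `KEY_ELEMENTS`, `SAMPLE`, `split_into_sections`, `confidence`, and `build_semantic_map`, as well as the sample lease text, are all hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical key elements and cue words (the abstract names none).
KEY_ELEMENTS = {
    "term": {"term", "commencement", "expiration"},
    "rent": {"rent", "payment", "monthly"},
}

# Hypothetical input document, used only for illustration.
SAMPLE = """\
1. Term
The commencement date is January 1, 2016.
The term ends at expiration on December 31, 2020.
2. Rent
Tenant shall make a monthly rent payment of $1,000.
Notices go to the landlord.
"""


@dataclass
class Section:
    heading: str
    sub_sections: list = field(default_factory=list)


def split_into_sections(text):
    """Lines like '1. Heading' open a major section; other lines are its sub-sections."""
    sections = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if re.match(r"^\d+\.\s+\S", line):
            sections.append(Section(heading=line))
        elif sections:
            sections[-1].sub_sections.append(line)
    return sections


def confidence(text, key_element):
    """Fraction of the element's cue words found in the text (stand-in for a learned score)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    cues = KEY_ELEMENTS[key_element]
    return len(cues & words) / len(cues)


def build_semantic_map(text):
    """Map each key element to its section and its highest-confidence sub-section."""
    semantic_map = {}
    for section in split_into_sections(text):
        # Classify the major section: pick the key element its heading matches best.
        element = max(KEY_ELEMENTS, key=lambda k: confidence(section.heading, k))
        if confidence(section.heading, element) == 0 or not section.sub_sections:
            continue
        # Search the section for the sub-section with the maximum confidence score.
        best = max(section.sub_sections, key=lambda s: confidence(s, element))
        semantic_map[element] = {
            "section": section.heading,
            "extracted": best,
            "confidence": confidence(best, element),
        }
    return semantic_map
```

Calling `build_semantic_map(SAMPLE)` returns a nested dictionary keyed by key element, which is the kind of structure a user interface could expose as drop-down selections. A real system would replace the cue-word score with trained classifiers and sequence models.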
CA2932310A 2015-06-10 2016-06-06 System and method for automating information abstraction process for documents Active CA2932310C (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
IN2920/CHE/2015 2015-06-10
IN2920CH2015 2015-06-10
US14/836,659 US9946924B2 (en) 2015-06-10 2015-08-26 System and method for automating information abstraction process for documents
US14/836,659 2015-08-26

Publications (2)

Publication Number Publication Date
CA2932310A1 (fr) 2016-12-10
CA2932310C (fr) 2023-07-11

Family

ID=57483052

Family Applications (1)

Application Number Title Priority Date Filing Date
CA2932310A Active CA2932310C (fr) 2015-06-10 2016-06-06 System and method for automating information abstraction process for documents

Country Status (1)

Country Link
CA (1) CA2932310C (fr)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210216763A1 (en) * 2019-03-19 2021-07-15 Capital One Services, Llc Platform for document classification
US11727705B2 (en) * 2019-03-19 2023-08-15 Capital One Services, Llc Platform for document classification
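CN111291071A (zh) * 2020-01-21 2020-06-16 北京字节跳动网络技术有限公司 Data processing method, apparatus, and electronic device
CN111291071B (zh) * 2020-01-21 2023-10-17 北京字节跳动网络技术有限公司 Data processing method, apparatus, and electronic device
CN113780005A (zh) * 2021-09-14 2021-12-10 码客工场工业科技(北京)有限公司 Handle stock identifier resolution method based on a semantic model
CN113780005B (zh) * 2021-09-14 2024-04-16 码客工场工业科技(北京)有限公司 Handle stock identifier resolution method based on a semantic model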

Also Published As

Publication number Publication date
CA2932310C (fr) 2023-07-11

Similar Documents

Publication Publication Date Title
AU2016203856B2 (en) System and method for automating information abstraction process for documents
US20200210647A1 (en) Automated Summarization of Extracted Insight Data
US11514235B2 (en) Information extraction from open-ended schema-less tables
US20210248136A1 (en) Differentiation Of Search Results For Accurate Query Output
US9811765B2 (en) Image captioning with weak supervision
US9923860B2 (en) Annotating content with contextually relevant comments
US10055479B2 (en) Joint approach to feature and document labeling
US20200034764A1 (en) Dynamic Playback Of Synchronized Narrated Analytics Playlists
KR101754473B1 (ko) Method and system for summarizing a document into image-based content and providing it
US20150213361A1 (en) Predicting interesting things and concepts in content
JP6361351B2 (ja) Method, program, and computing system for ranking spoken words
CN109271542A (zh) Cover determination method, apparatus, device, and readable storage medium
US20230206670A1 (en) Semantic representation of text in document
CN112384909A (zh) Method and system for improving text-to-content suggestions using unsupervised learning
US10754904B2 (en) Accuracy determination for media
CN112400165A (zh) Method and system for improving text-to-content suggestions using unsupervised learning
US20220121668A1 (en) Method for recommending document, electronic device and storage medium
CN109271624A (zh) Target word determination method, apparatus, and storage medium
CN110737824B (zh) Content query method and apparatus
CA2932310C (fr) System and method for automating information abstraction process for documents
CN115982376A (zh) Method and apparatus for training a model based on text, multimodal data, and knowledge
EP3104285A1 (fr) System and method for automating an information abstraction process for documents
AU2019290658B2 (en) Systems and methods for identifying and linking events in structured proceedings
Körner et al. Mastering Azure Machine Learning: Perform large-scale end-to-end advanced machine learning in the cloud with Microsoft Azure Machine Learning
KR102215259B1 (ko) Method for analyzing the relationship of words or documents by topic, and apparatus implementing the same

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20210722
