CA2932310C - System and method for automating information abstraction process for documents - Google Patents

System and method for automating information abstraction process for documents

Info

Publication number
CA2932310C
Authority
CA
Canada
Prior art keywords
document
processor
sections
sub
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CA2932310A
Other languages
English (en)
Other versions
CA2932310A1 (fr)
Inventor
Shubhashis Sengupta
Annervaz Karukapadath Mohamedrasheed
Chakravarthy Lakshminarasimhan
Manisha Kapur
Jovin George
Mansi Srivastava
Vaidya Sumanth
Rajeh Ganesh Natrajan
Siddesha Swamy
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Accenture Global Services Ltd
Original Assignee
Accenture Global Services Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US14/836,659 (US9946924B2)
Application filed by Accenture Global Services Ltd
Publication of CA2932310A1
Application granted
Publication of CA2932310C
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G06F18/295 Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/24 Character recognition characterised by the processing or recognition method
    • G06V30/248 Character recognition characterised by the processing or recognition method involving plural approaches, e.g. verification by template match; Resolving confusion among similar patterns, e.g. "O" versus "Q"
    • G06V30/2504 Coarse or fine approaches, e.g. resolution of ambiguities or multiscale approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/26 Techniques for post-processing, e.g. correcting the recognition result
    • G06V30/262 Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
    • G06V30/268 Lexical context
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/416 Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A computer-implemented method, a processing pipeline, and a system create a hierarchical semantic map of a document and of extracted information. The method includes apportioning the document into major sections by accessing the document, recognizing a hierarchical structure of the document, and dividing the document into the major sections using a data profiler and a machine learning module; classifying the major sections and mapping them to key elements in one of multiple levels; searching one of the major sections and identifying sub-sections of that major section so as to achieve a maximum confidence score indicating that the sub-sections are associated with the key element; extracting information from the identified sub-sections using sequence modelers and linguistic characteristics provided by the data profiler; generating the hierarchical semantic map of the document from the extracted information; and displaying, in a user interface, drill-down selections of the key elements.
CA2932310A 2015-06-10 2016-06-06 System and method for automating information abstraction process for documents Active CA2932310C (fr)
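
The steps named in the abstract (divide into major sections, map each section to a key element, pick the sub-section with the highest confidence score, extract information, assemble a hierarchical map) can be illustrated with a minimal Python sketch. This is not the patented implementation: the key elements, cue words, numbered-heading rule, keyword-overlap score, and regex extractor are all hypothetical stand-ins for the data profiler, machine learning classifier, and sequence modelers that the abstract mentions.

```python
import re
from dataclasses import dataclass, field

# Hypothetical key elements and cue words; a real system would learn these.
KEY_ELEMENTS = {
    "rent": {"rent", "payment", "monthly"},
    "term": {"term", "commencement", "expiration"},
}

# Assumed convention: a line such as "1. Rent" starts a major section.
HEADING = re.compile(r"^\d+\.\s+(.+)$")

# Hypothetical sample document used only for illustration.
SAMPLE = """1. Rent
Tenant shall keep the premises clean.
The monthly rent payment is $2,500.
2. Term
The term has a commencement date of 2016-06-06.
Notices must be in writing.
"""


@dataclass
class Section:
    heading: str
    sentences: list = field(default_factory=list)


def split_sections(text):
    """Divide the document into major sections at numbered headings."""
    sections = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            sections.append(Section(match.group(1)))
        elif sections:
            sections[-1].sentences.append(line)
    return sections


def confidence(text, element):
    """Toy score: fraction of the element's cue words present in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    cues = KEY_ELEMENTS[element]
    return len(words & cues) / len(cues)


def classify(section):
    """Map a major section to the key element with the highest score."""
    text = section.heading + " " + " ".join(section.sentences)
    return max(KEY_ELEMENTS, key=lambda e: confidence(text, e))


def best_subsection(section, element):
    """Pick the sentence that maximises the confidence score for the element."""
    return max(section.sentences, key=lambda s: confidence(s, element))


def extract(sentence):
    """Stand-in extractor: pull dollar amounts and ISO dates with a regex."""
    return re.findall(r"\$\d[\d,]*|\d{4}-\d{2}-\d{2}", sentence)


def build_semantic_map(text):
    """Return {key element: {"section", "sub_section", "values"}}."""
    semantic_map = {}
    for section in split_sections(text):
        if not section.sentences:
            continue
        element = classify(section)
        sub = best_subsection(section, element)
        semantic_map[element] = {
            "section": section.heading,
            "sub_section": sub,
            "values": extract(sub),
        }
    return semantic_map
```

Calling build_semantic_map(SAMPLE) yields a two-level map keyed by element, with the chosen sub-section and extracted values nested underneath. A user interface could render those nested entries as drill-down selections.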

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
IN2920/CHE/2015 2015-06-10
IN2920CH2015 2015-06-10
US14/836,659 US9946924B2 (en) 2015-06-10 2015-08-26 System and method for automating information abstraction process for documents
US14/836,659 2015-08-26

Publications (2)

Publication Number Publication Date
CA2932310A1 (fr) 2016-12-10
CA2932310C (fr) 2023-07-11

Family

ID=57483052

Family Applications (1)

Application Number Title Priority Date Filing Date
CA2932310A Active CA2932310C (fr) 2015-06-10 2016-06-06 System and method for automating information abstraction process for documents

Country Status (1)

Country Link
CA (1) CA2932310C (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10402641B1 (en) * 2019-03-19 2019-09-03 Capital One Services, Llc Platform for document classification
CN111291071B (zh) * 2020-01-21 2023-10-17 北京字节跳动网络技术有限公司 Data processing method, apparatus, and electronic device
CN113780005B (zh) * 2021-09-14 2024-04-16 码客工场工业科技(北京)有限公司 Semantic-model-based method for resolving existing Handle identifiers

Also Published As

Publication number Publication date
CA2932310A1 (fr) 2016-12-10

Similar Documents

Publication Publication Date Title
AU2016203856B2 (en) System and method for automating information abstraction process for documents
US11816436B2 (en) Automated summarization of extracted insight data
US20200401593A1 (en) Dynamic Phase Generation And Resource Load Reduction For A Query
US11841854B2 (en) Differentiation of search results for accurate query output
US11282020B2 (en) Dynamic playback of synchronized narrated analytics playlists
US10546005B2 (en) Perspective data analysis and management
US11645314B2 (en) Interactive information retrieval using knowledge graphs
US9923860B2 (en) Annotating content with contextually relevant comments
US10217058B2 (en) Predicting interesting things and concepts in content
US11144582B2 (en) Method and system for parsing and aggregating unstructured data objects
US10956469B2 (en) System and method for metadata correlation using natural language processing
US10754904B2 (en) Accuracy determination for media
US9418058B2 (en) Processing method for social media issue and server device supporting the same
US20220121668A1 (en) Method for recommending document, electronic device and storage medium
US10055478B2 (en) Perspective data analysis and management
CA2932310C (fr) System and method for automating information abstraction process for documents
CN110737824B (zh) Content query method and apparatus
CN109271624A (zh) Target word determination method, apparatus, and storage medium
EP3104285A1 (fr) System and method for automating information abstraction process for documents
AU2019290658B2 (en) Systems and methods for identifying and linking events in structured proceedings
Krueger et al. Prolix-visual prediction analysis for box office success
Kannao et al. A system for semantic segmentation of TV news broadcast videos
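CN114036393B (zh) Data recommendation method, apparatus, electronic device, and computer storage medium
CN117851865A (zh) Customer classification method, apparatus, computer device, and storage medium
CN118551063A (zh) Visual citations for information provided in response to multimodal queries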

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20210722