CA2932310C - System and method for automating information abstraction process for documents - Google Patents
- Publication number
- CA2932310C
- Authority
- CA
- Canada
- Prior art keywords
- document
- processor
- sections
- sub
- classifier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/414—Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
- G06F18/295—Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/24—Character recognition characterised by the processing or recognition method
- G06V30/248—Character recognition characterised by the processing or recognition method involving plural approaches, e.g. verification by template match; Resolving confusion among similar patterns, e.g. "O" versus "Q"
- G06V30/2504—Coarse or fine approaches, e.g. resolution of ambiguities or multiscale approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/26—Techniques for post-processing, e.g. correcting the recognition result
- G06V30/262—Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
- G06V30/268—Lexical context
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/416—Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Geometry (AREA)
- Computer Graphics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A computer-implemented method, a processing pipeline, and a system create a hierarchical semantic map of a document and of information extracted from it. The method comprises: apportioning the document into major sections by accessing the document, recognizing its hierarchical structure, and dividing it into the major sections using a data profiler and a machine learning module; classifying the major sections and mapping them to key elements in one of multiple levels; searching one of the major sections and identifying sub-sections within it so as to achieve a maximum confidence score indicating that the sub-sections are associated with the key element; extracting information from the identified sub-sections using sequence modelers and linguistic characteristics provided by the data profiler; generating the hierarchical semantic map of the document from the extracted information; and displaying, in a user interface, drill-down selections of the key elements.
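The sequence of steps in the abstract can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in rather than the patented method: heading-based splitting replaces the data profiler and machine learning module, cue-word overlap replaces the classifier and its confidence score, and a regular expression replaces the sequence modelers. The names `KEY_ELEMENTS`, `split_major_sections`, `confidence`, `best_subsection`, `extract`, and `semantic_map`, as well as the sample text, are invented for illustration. The user-interface step is not covered.

```python
import re

# Hypothetical key elements and cue words (assumptions, not taken from the patent).
KEY_ELEMENTS = {
    "rent": {"rent", "payment", "monthly"},
    "term": {"term", "commencement", "expiry"},
}

def split_major_sections(text):
    """Split on numbered headings such as '1. RENT' (stand-in for profiler + ML module)."""
    parts = re.split(r"(?m)^\d+\.\s+([A-Z ]+)$", text)
    # With one capture group, re.split yields [preamble, title1, body1, title2, body2, ...].
    return [(parts[i].strip(), parts[i + 1].strip()) for i in range(1, len(parts) - 1, 2)]

def confidence(text, cues):
    """Fraction of cue words present in the text (stand-in for a classifier score)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(words & cues) / len(cues)

def classify(section_text):
    """Map a major section to the key element with the highest score."""
    return max(KEY_ELEMENTS, key=lambda k: confidence(section_text, KEY_ELEMENTS[k]))

def best_subsection(body, cues):
    """Pick the sub-section (here, a sentence) with the maximum confidence score."""
    subsections = [s.strip() for s in re.split(r"(?<=\.)\s+", body) if s.strip()]
    return max(subsections, key=lambda s: confidence(s, cues))

def extract(subsection):
    """Pull dollar amounts and ISO dates with a regex (stand-in for sequence modelers)."""
    return re.findall(r"\$[\d,]+|\d{4}-\d{2}-\d{2}", subsection)

def semantic_map(text):
    """Build {key element: {section, sub_section, values}} for the whole document."""
    result = {}
    for title, body in split_major_sections(text):
        element = classify(title + " " + body)
        sub = best_subsection(body, KEY_ELEMENTS[element])
        result[element] = {"section": title, "sub_section": sub, "values": extract(sub)}
    return result

# Hypothetical sample document, invented for this sketch.
SAMPLE = """1. RENT
The tenant agrees to the following. The monthly rent payment is $2,500.
2. TERM
The term has a commencement date of 2016-06-06. Notices must be in writing.
"""

if __name__ == "__main__":
    for element, entry in semantic_map(SAMPLE).items():
        print(element, "->", entry)
```

The nested dictionary plays the role of the hierarchical semantic map: key element at the top level, then the major section, the selected sub-section, and the extracted values. A real system would presumably replace each stand-in with a trained model.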
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IN2920/CHE/2015 | 2015-06-10 | | |
IN2920CH2015 | 2015-06-10 | | |
US14/836,659 US9946924B2 (en) | 2015-06-10 | 2015-08-26 | System and method for automating information abstraction process for documents |
US14/836,659 | 2015-08-26 | | |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2932310A1 (fr) | 2016-12-10 |
CA2932310C (fr) | 2023-07-11 |
Family
ID=57483052
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA2932310A Active CA2932310C (fr) | 2015-06-10 | 2016-06-06 | System and method for automating information abstraction process for documents |
Country Status (1)
Country | Link |
---|---|
CA (1) | CA2932310C (fr) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10402641B1 (en) * | 2019-03-19 | 2019-09-03 | Capital One Services, Llc | Platform for document classification |
CN111291071B (zh) * | 2020-01-21 | 2023-10-17 | 北京字节跳动网络技术有限公司 | Data processing method, apparatus and electronic device |
CN113780005B (zh) * | 2021-09-14 | 2024-04-16 | 码客工场工业科技(北京)有限公司 | Semantic-model-based method for resolving existing Handle identifiers |
Also Published As
Publication number | Publication date |
---|---|
CA2932310A1 (fr) | 2016-12-10 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
AU2016203856B2 (en) | System and method for automating information abstraction process for documents | |
US11816436B2 (en) | Automated summarization of extracted insight data | |
US20200401593A1 (en) | Dynamic Phase Generation And Resource Load Reduction For A Query | |
US11841854B2 (en) | Differentiation of search results for accurate query output | |
US11282020B2 (en) | Dynamic playback of synchronized narrated analytics playlists | |
US10546005B2 (en) | Perspective data analysis and management | |
US11645314B2 (en) | Interactive information retrieval using knowledge graphs | |
US9923860B2 (en) | Annotating content with contextually relevant comments | |
US10217058B2 (en) | Predicting interesting things and concepts in content | |
US11144582B2 (en) | Method and system for parsing and aggregating unstructured data objects | |
US10956469B2 (en) | System and method for metadata correlation using natural language processing | |
US10754904B2 (en) | Accuracy determination for media | |
US9418058B2 (en) | Processing method for social media issue and server device supporting the same | |
US20220121668A1 (en) | Method for recommending document, electronic device and storage medium | |
US10055478B2 (en) | Perspective data analysis and management | |
CA2932310C (fr) | System and method for automating information abstraction process for documents | |
CN110737824B (zh) | Content query method and apparatus | |
CN109271624A (zh) | Target word determination method, apparatus and storage medium | |
EP3104285A1 (fr) | System and method for automating information abstraction process for documents | |
AU2019290658B2 (en) | Systems and methods for identifying and linking events in structured proceedings | |
Krueger et al. | Prolix-visual prediction analysis for box office success | |
Kannao et al. | A system for semantic segmentation of TV news broadcast videos | |
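CN114036393B (zh) | Data recommendation method, apparatus, electronic device and computer storage medium | |
CN117851865A (zh) | Customer classification method, apparatus, computer device and storage medium | |
CN118551063A (zh) | Visual citations of information provided in response to multimodal queries |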
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | EEER | Examination request | Effective date: 20210722 |