MX2019014440A - Method and system for information extraction from document images using conversational interface and database querying - Google Patents
- Publication number
- MX2019014440A
- Authority
- MX
- Mexico
- Prior art keywords
- sql
- images
- natural language
- database querying
- queries
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/93—Document management systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/412—Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2452—Query translation
- G06F16/24522—Translation of natural language queries to structured queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Multimedia (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Business, Economics & Management (AREA)
- General Business, Economics & Management (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Character Discrimination (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Image Analysis (AREA)
- Character Input (AREA)
Abstract
Various methods use SQL-based data extraction to pull relevant information from images. These are rule-based methods for generating SQL queries from natural language (NL): handling new English sentences requires manual intervention, and the approach is difficult for a non-technical user. A system and method are provided for extracting relevant information from images using a conversational interface and database querying. The system removes noise effects, identifies the document type, and detects various entities for diagrams. In addition, a schema is designed that provides an easy-to-understand abstraction of the entities detected by the deep vision models and of the relationships between them. Relevant information and fields can then be extracted from the document by writing SQL queries on top of the relational tables. A natural language based interface is added so that a non-technical user, who specifies queries in natural language, can extract the information effortlessly.
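The idea of querying detected entities through relational tables can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `word` table, its columns (`text`, `line_id`, `x`), the sample rows, and the helper names `build_demo_db` and `value_right_of` are all hypothetical. It uses Python's standard `sqlite3` module and a self-join to find the word immediately to the right of a label on the same text line.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table of detected words (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE word ("
        "id INTEGER PRIMARY KEY, text TEXT, line_id INTEGER, x INTEGER)"
    )
    # Made-up sample data: each row is one word with its line and x position.
    rows = [
        (1, "Invoice", 1, 10),
        (2, "No:", 1, 60),
        (3, "INV-0042", 1, 110),
        (4, "Date:", 2, 10),
        (5, "2018-11-30", 2, 60),
    ]
    conn.executemany("INSERT INTO word VALUES (?, ?, ?, ?)", rows)
    return conn


def value_right_of(conn, label):
    """Return the nearest word to the right of `label` on the same line."""
    row = conn.execute(
        """
        SELECT v.text
        FROM word AS k
        JOIN word AS v
          ON v.line_id = k.line_id AND v.x > k.x
        WHERE k.text = ?
        ORDER BY v.x
        LIMIT 1
        """,
        (label,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = build_demo_db()
    print(value_right_of(conn, "No:"))
    print(value_right_of(conn, "Date:"))
```

In a system like the one described, a natural language layer would presumably produce queries of this kind on the user's behalf. That translation step is not shown here.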
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IN201821045427 | 2018-11-30 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
MX2019014440A (es) | 2022-04-12 |
Family
ID=65801945
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
MX2019014440A | Method and system for information extraction from document images using conversational interface and database querying. | 2018-11-30 | 2019-11-29 |
Country Status (7)
Country | Link |
---|---|
US (1) | US10936897B2 (es) |
EP (1) | EP3660733B1 (es) |
JP (1) | JP7474587B2 (es) |
CN (1) | CN111259724A (es) |
AU (1) | AU2019264603B2 (es) |
CA (1) | CA3059764A1 (es) |
MX (1) | MX2019014440A (es) |
Families Citing this family (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10614345B1 (en) * | 2019-04-12 | 2020-04-07 | Ernst & Young U.S. Llp | Machine learning based extraction of partition objects from electronic documents |
US20200327351A1 (en) * | 2019-04-15 | 2020-10-15 | General Electric Company | Optical character recognition error correction based on visual and textual contents |
US11030446B2 (en) * | 2019-06-11 | 2021-06-08 | Open Text Sa Ulc | System and method for separation and classification of unstructured documents |
RU2737720C1 (ru) * | 2019-11-20 | 2020-12-02 | Общество с ограниченной ответственностью "Аби Продакшн" | Extracting fields using neural networks without the use of templates |
US20230139831A1 (en) * | 2020-09-30 | 2023-05-04 | DataInfoCom USA, Inc. | Systems and methods for information retrieval and extraction |
CN111275139B (zh) * | 2020-01-21 | 2024-02-23 | 杭州大拿科技股份有限公司 | Handwritten content removal method, handwritten content removal device, and storage medium |
US11263753B2 (en) * | 2020-04-07 | 2022-03-01 | Naver Corporation | Method for training a convolutional neural network for image recognition using image-conditioned masked language modeling |
CN111709339B (zh) * | 2020-06-09 | 2023-09-19 | 北京百度网讯科技有限公司 | Bill image recognition method, apparatus, device, and storage medium |
CN111783710B (zh) * | 2020-07-09 | 2023-10-03 | 上海海事大学 | Information extraction method and system for medical photocopies |
US11495011B2 (en) * | 2020-08-07 | 2022-11-08 | Salesforce, Inc. | Template-based key-value extraction for inferring OCR key values within form images |
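CN112016312B (zh) * | 2020-09-08 | 2023-08-29 | 平安科技(深圳)有限公司 | Data relation extraction method, apparatus, electronic device, and storage medium |
CN112256904A (zh) * | 2020-09-21 | 2021-01-22 | 天津大学 | Image retrieval method based on visual description sentences |
CN112257386B (zh) * | 2020-10-26 | 2023-09-26 | 重庆邮电大学 | Method for generating a scene spatial relationship information layout in text-to-scene conversion |
CN114490702B (zh) * | 2020-10-26 | 2024-08-20 | 陕西省烟草公司渭南市公司 | Dynamic generation method for production data coordinate charts in discrete manufacturing |
CN112270199A (zh) * | 2020-11-03 | 2021-01-26 | 辽宁工程技术大学 | Personalized semantic spatial keyword Top-K query method based on the CGAN method |
CN112182022B (zh) * | 2020-11-04 | 2024-04-16 | 北京安博通科技股份有限公司 | Natural language based data query method, apparatus, and translation model |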
US11663842B2 (en) * | 2020-11-05 | 2023-05-30 | Jpmorgan Chase Bank, N.A. | Method and system for tabular information extraction |
KR102498403B1 (ko) * | 2021-01-29 | 2023-02-09 | 포항공과대학교 산학협력단 | Apparatus and method for collecting training sets for a system that converts natural language to SQL |
US11816913B2 (en) * | 2021-03-02 | 2023-11-14 | Tata Consultancy Services Limited | Methods and systems for extracting information from document images |
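CN112966131B (zh) * | 2021-03-02 | 2022-09-16 | 中华人民共和国成都海关 | Customs data risk control type identification method, customs intelligent risk control deployment method, apparatus, computer device, and storage medium |
CN112633423B (zh) * | 2021-03-10 | 2021-06-22 | 北京易真学思教育科技有限公司 | Training method for a text recognition model, text recognition method, apparatus, and device |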
US11763585B2 (en) | 2021-07-14 | 2023-09-19 | Bank Of America Corporation | Multi-layer neural network and convolutional neural network for context sensitive optical character recognition |
CN113571052B (zh) * | 2021-07-22 | 2024-09-20 | 亿咖通(湖北)技术有限公司 | Noise extraction and instruction recognition method and electronic device |
US11720531B2 (en) * | 2021-08-23 | 2023-08-08 | Sap Se | Automatic creation of database objects |
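CN113792064A (zh) * | 2021-08-30 | 2021-12-14 | 阿里巴巴达摩院(杭州)科技有限公司 | Method and apparatus for implementing multi-turn dialogue, and relational model generation method |
CN113642327A (zh) * | 2021-10-14 | 2021-11-12 | 中国光大银行股份有限公司 | Method and apparatus for constructing a standard knowledge base |
CN114090624B (zh) * | 2021-11-19 | 2024-10-18 | 中国人民银行清算总中心 | Processing method and apparatus for converting natural language into structured query language |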
US11934801B2 (en) * | 2021-12-07 | 2024-03-19 | Microsoft Technology Licensing, Llc | Multi-modal program inference |
CN114580279B (zh) * | 2022-03-02 | 2024-05-31 | 广西大学 | LSTM-based adaptive coding method for low-earth-orbit satellite communication |
US20230367961A1 (en) * | 2022-05-12 | 2023-11-16 | Dell Products L.P. | Automated address data determinations using artificial intelligence techniques |
US20240045863A1 (en) * | 2022-08-08 | 2024-02-08 | Palantir Technologies Inc. | Systems and methods for generating and displaying a data pipeline using a natural language query, and describing a data pipeline using natural language |
Family Cites Families (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0737034A (ja) * | 1993-07-15 | 1995-02-07 | Nec Corp | Optical character reading device |
US7461077B1 (en) | 2001-07-31 | 2008-12-02 | Nicholas Greenwood | Representation of data records |
US6778979B2 (en) | 2001-08-13 | 2004-08-17 | Xerox Corporation | System for automatically generating queries |
US7284191B2 (en) * | 2001-08-13 | 2007-10-16 | Xerox Corporation | Meta-document management system with document identifiers |
US7398201B2 (en) | 2001-08-14 | 2008-07-08 | Evri Inc. | Method and system for enhanced data searching |
US20050144189A1 (en) * | 2002-07-19 | 2005-06-30 | Keay Edwards | Electronic item management and archival system and method of operating the same |
US20050043940A1 (en) | 2003-08-20 | 2005-02-24 | Marvin Elder | Preparing a data source for a natural language query |
US7565139B2 (en) * | 2004-02-20 | 2009-07-21 | Google Inc. | Image-based search engine for mobile phones with camera |
JP4349183B2 (ja) * | 2004-04-01 | 2009-10-21 | 富士ゼロックス株式会社 | Image processing apparatus and image processing method |
US7587412B2 (en) * | 2005-08-23 | 2009-09-08 | Ricoh Company, Ltd. | Mixed media reality brokerage network and methods of use |
AU2006240074B2 (en) * | 2005-04-21 | 2012-07-05 | Theodore G. Paraskevakos | System and method for intelligent currency validation |
EP2087448A1 (en) * | 2006-11-21 | 2009-08-12 | Cameron Telfer Howie | A method of retrieving information from a digital image |
US20110145068A1 (en) * | 2007-09-17 | 2011-06-16 | King Martin T | Associating rendered advertisements with digital content |
US8520979B2 (en) * | 2008-08-19 | 2013-08-27 | Digimarc Corporation | Methods and systems for content processing |
WO2010096193A2 (en) * | 2009-02-18 | 2010-08-26 | Exbiblio B.V. | Identifying a document by performing spectral analysis on the contents of the document |
US9323784B2 (en) * | 2009-12-09 | 2016-04-26 | Google Inc. | Image search using text-based elements within the contents of images |
US9292493B2 (en) * | 2010-01-07 | 2016-03-22 | The Trustees Of The Stevens Institute Of Technology | Systems and methods for automatically detecting deception in human communications expressed in digital form |
WO2011149558A2 (en) * | 2010-05-28 | 2011-12-01 | Abelow Daniel H | Reality alternate |
US8542926B2 (en) * | 2010-11-19 | 2013-09-24 | Microsoft Corporation | Script-agnostic text reflow for document images |
US20130106894A1 (en) * | 2011-10-31 | 2013-05-02 | Elwha LLC, a limited liability company of the State of Delaware | Context-sensitive query enrichment |
US10146795B2 (en) * | 2012-01-12 | 2018-12-04 | Kofax, Inc. | Systems and methods for mobile image capture and processing |
JP6317772B2 (ja) * | 2013-03-15 | 2018-04-25 | トランスレート アブロード,インコーポレイテッド | System and method for displaying foreign-language character sets and their translations in real time on resource-constrained mobile devices |
WO2014144889A2 (en) | 2013-03-15 | 2014-09-18 | Amazon Technologies, Inc. | Scalable analysis platform for semi-structured data |
US9342533B2 (en) * | 2013-07-02 | 2016-05-17 | Open Text S.A. | System and method for feature recognition and document searching based on feature recognition |
US10318804B2 (en) * | 2014-06-30 | 2019-06-11 | First American Financial Corporation | System and method for data extraction and searching |
US10503761B2 (en) * | 2014-07-14 | 2019-12-10 | International Business Machines Corporation | System for searching, recommending, and exploring documents through conceptual associations |
US9710570B2 (en) * | 2014-07-14 | 2017-07-18 | International Business Machines Corporation | Computing the relevance of a document to concepts not specified in the document |
US10162882B2 (en) * | 2014-07-14 | 2018-12-25 | International Business Machines Corporation | Automatically linking text to concepts in a knowledge base |
US10437869B2 (en) * | 2014-07-14 | 2019-10-08 | International Business Machines Corporation | Automatic new concept definition |
WO2018045358A1 (en) * | 2016-09-05 | 2018-03-08 | Google Llc | Generating theme-based videos |
US11080273B2 (en) * | 2017-03-20 | 2021-08-03 | International Business Machines Corporation | Image support for cognitive intelligence queries |
CA3056775A1 (en) * | 2017-03-22 | 2018-09-27 | Drilling Info, Inc. | Extracting data from electronic documents |
US10747761B2 (en) * | 2017-05-18 | 2020-08-18 | Salesforce.Com, Inc. | Neural network based translation of natural language queries to database queries |
US11494887B2 (en) * | 2018-03-09 | 2022-11-08 | Schlumberger Technology Corporation | System for characterizing oilfield tools |
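JP7118697B2 (ja) * | 2018-03-30 | 2022-08-16 | 株式会社Preferred Networks | Gaze point estimation processing device, gaze point estimation model generation device, gaze point estimation processing system, gaze point estimation processing method, program, and gaze point estimation model |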
US10867404B2 (en) * | 2018-08-29 | 2020-12-15 | Toyota Jidosha Kabushiki Kaisha | Distance estimation using machine learning |
-
2019
- 2019-03-13 EP EP19162426.1A patent/EP3660733B1/en active Active
- 2019-03-14 US US16/353,570 patent/US10936897B2/en active Active
- 2019-10-23 CA CA3059764A patent/CA3059764A1/en active Pending
- 2019-11-14 AU AU2019264603A patent/AU2019264603B2/en active Active
- 2019-11-27 CN CN201911182075.5A patent/CN111259724A/zh active Pending
- 2019-11-29 MX MX2019014440A patent/MX2019014440A/es unknown
- 2019-11-29 JP JP2019217153A patent/JP7474587B2/ja active Active
Also Published As
Publication number | Publication date |
---|---|
US10936897B2 (en) | 2021-03-02 |
AU2019264603A1 (en) | 2020-06-18 |
JP2020095713A (ja) | 2020-06-18 |
JP7474587B2 (ja) | 2024-04-25 |
EP3660733B1 (en) | 2023-06-28 |
AU2019264603B2 (en) | 2024-08-22 |
CA3059764A1 (en) | 2020-05-30 |
US20200175304A1 (en) | 2020-06-04 |
CN111259724A (zh) | 2020-06-09 |
EP3660733A1 (en) | 2020-06-03 |
EP3660733C0 (en) | 2023-06-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
MX2019014440A (es) | Method and system for information extraction from document images using conversational interface and database querying. | |
US10521464B2 (en) | Method and system for extracting, verifying and cataloging technical information from unstructured documents | |
CN107545791B (zh) | System and method for automatically generating a classroom teaching knowledge graph from courseware | |
MX2019001576A (es) | Systems and methods for contextual retrieval of electronic records. | |
US10417267B2 (en) | Information processing terminal and method, and information management apparatus and method | |
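CN104090955A (zh) | Method and system for automatic labeling of audio and video tags | |
CN107943786B (zh) | Chinese named entity recognition method and system | |
CN104408078A (zh) | Keyword-based method for constructing a Chinese-English bilingual parallel corpus | |
CN103678684A (zh) | Chinese word segmentation method based on navigational information retrieval | |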
WO2016174682A1 (en) | Method for generating visual representations of data based on controlled natural language queries and system thereof | |
CN110750977B (zh) | Text similarity calculation method and system | |
GB2576654A (en) | Method and apparatus for facilitating creation of simulation model | |
AU2018388932A1 (en) | Method and device using wikipedia link structure to generate chinese language concept vector | |
SG10201811578RA (en) | Predictive query processing for complex system lifecycle management | |
CN104281565A (zh) | Semantic dictionary construction method and apparatus | |
CN105426379A (zh) | Keyword weight calculation method based on word position | |
Prakash et al. | Mining of bilingual Indian Web documents | |
US10896227B2 (en) | Data processing system, data processing method, and data structure | |
Costin-Gabriel et al. | Archaisms and neologisms identification in texts | |
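JP2018116701A (ja) | Seal image processing apparatus, method, and electronic device | |
CN105608136B (zh) | Semantic relatedness calculation method based on Chinese compound sentences | |
KR20170087367A (ko) | Cross-lingual semantic web data quality assessment method | |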
Kirschnick et al. | Freepal: A Large Collection of Deep Lexico-Syntactic Patterns for Relation Extraction. | |
US20170154035A1 (en) | Text processing system, text processing method, and text processing program | |
CN105320717B (zh) | Semi-automatic individual construction method in ontology learning |