MX2019014440A - Method and system for extracting information from document images using a conversational interface and database querying - Google Patents

Method and system for extracting information from document images using a conversational interface and database querying

Info

Publication number
MX2019014440A
Authority
MX
Mexico
Prior art keywords
sql
images
natural language
database querying
queries
Prior art date
Application number
MX2019014440A
Other languages
English (en)
Inventor
Gautam Shroff
Lovekesh VIG
Gunjan Sehgal
Monika Sharma
Arindam Chowdhury
Rohit Rahul
Vishwanath Doreswamy
Ashwin Srinivasan
Original Assignee
Tata Consultancy Services Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-11-30
Filing date
2019-11-29
Publication date
2022-04-12
Application filed by Tata Consultancy Services Ltd
Publication of MX2019014440A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/93 Document management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2452 Query translation
    • G06F16/24522 Translation of natural language queries to structured queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Character Discrimination (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

Several existing methods use SQL-based data extraction to pull relevant information from images. These methods generate SQL queries from natural language (NL) using rules, so handling new English sentences requires manual intervention, and they are difficult for non-technical users. A system and method are provided for extracting relevant information from images using a conversational interface and database querying. The system removes noise effects, identifies the document type, and detects various entities for diagrams. A schema is then designed that gives an easy-to-understand abstraction of the entities detected by the deep vision models and of the relationships between them. Relevant information and fields can then be extracted from the document by writing SQL queries on top of the relationship tables. A natural-language interface is added so that a non-technical user, who specifies the queries in natural language, can extract the information effortlessly.
MX2019014440A 2018-11-30 2019-11-29 Method and system for extracting information from document images using a conversational interface and database querying. MX2019014440A (es)
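
The abstract's central idea, storing detected entities and their relationships in relational tables and then extracting fields with SQL, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the `entity` and `relation` tables, their columns, the `right_of` relationship kind, and the sample invoice rows are all hypothetical, and the rows stand in for the output of a vision or OCR model that is not shown.

```python
import sqlite3

# Hypothetical schema (not from the patent): one table for text entities
# produced by a vision/OCR step, one table for spatial relationships.
SCHEMA = """
CREATE TABLE entity (
    id     INTEGER PRIMARY KEY,
    doc_id TEXT NOT NULL,
    kind   TEXT NOT NULL,   -- e.g. 'word', 'line', 'box'
    text   TEXT NOT NULL,
    x      INTEGER,         -- top-left corner, in pixels
    y      INTEGER
);
CREATE TABLE relation (
    src_id INTEGER REFERENCES entity(id),
    dst_id INTEGER REFERENCES entity(id),
    kind   TEXT NOT NULL    -- e.g. 'right_of', 'below'
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up rows for one document."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO entity VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "doc-1", "line", "Invoice No", 40, 30),
            (2, "doc-1", "line", "INV-0042", 220, 30),
            (3, "doc-1", "line", "Total", 40, 90),
            (4, "doc-1", "line", "118.50", 220, 90),
        ],
    )
    # Each row says: entity src_id lies to the right of entity dst_id.
    conn.executemany(
        "INSERT INTO relation VALUES (?, ?, ?)",
        [(2, 1, "right_of"), (4, 3, "right_of")],
    )
    conn.commit()
    return conn


def field_value(conn: sqlite3.Connection, doc_id: str, label: str):
    """Return the text of the entity recorded as right of the given label."""
    row = conn.execute(
        """
        SELECT v.text
        FROM entity AS k
        JOIN relation AS r ON r.dst_id = k.id AND r.kind = 'right_of'
        JOIN entity AS v ON v.id = r.src_id
        WHERE k.doc_id = ? AND k.text = ?
        """,
        (doc_id, label),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = build_demo_db()
    print(field_value(conn, "doc-1", "Total"))
```

In this sketch a natural-language front end would only need to produce the SQL inside `field_value`; how the patented system performs that translation is not described in the abstract, so it is left out here.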

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
IN201821045427 2018-11-30

Publications (1)

Publication Number Publication Date
MX2019014440A (es) 2022-04-12

Family

ID=65801945

Family Applications (1)

Application Number Title Priority Date Filing Date
MX2019014440A MX2019014440A (es) 2018-11-30 2019-11-29 Method and system for extracting information from document images using a conversational interface and database querying.

Country Status (7)

Country Link
US (1) US10936897B2 (es)
EP (1) EP3660733B1 (es)
JP (1) JP7474587B2 (es)
CN (1) CN111259724A (es)
AU (1) AU2019264603B2 (es)
CA (1) CA3059764A1 (es)
MX (1) MX2019014440A (es)

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10614345B1 (en) * 2019-04-12 2020-04-07 Ernst & Young U.S. Llp Machine learning based extraction of partition objects from electronic documents
US20200327351A1 (en) * 2019-04-15 2020-10-15 General Electric Company Optical character recognition error correction based on visual and textual contents
US11030446B2 (en) * 2019-06-11 2021-06-08 Open Text Sa Ulc System and method for separation and classification of unstructured documents
RU2737720C1 * 2019-11-20 2020-12-02 Общество с ограниченной ответственностью "Аби Продакшн" Field extraction using neural networks without the use of templates
US20230139831A1 (en) * 2020-09-30 2023-05-04 DataInfoCom USA, Inc. Systems and methods for information retrieval and extraction
CN111275139B * 2020-01-21 2024-02-23 杭州大拿科技股份有限公司 Handwritten content removal method, handwritten content removal device, and storage medium
US11263753B2 (en) * 2020-04-07 2022-03-01 Naver Corporation Method for training a convolutional neural network for image recognition using image-conditioned masked language modeling
CN111709339B * 2020-06-09 2023-09-19 北京百度网讯科技有限公司 Bill image recognition method, apparatus, device, and storage medium
CN111783710B * 2020-07-09 2023-10-03 上海海事大学 Information extraction method and system for medical photocopies
US11495011B2 (en) * 2020-08-07 2022-11-08 Salesforce, Inc. Template-based key-value extraction for inferring OCR key values within form images
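CN112016312B * 2020-09-08 2023-08-29 平安科技(深圳)有限公司 Data relationship extraction method, apparatus, electronic device, and storage medium
CN112256904A * 2020-09-21 2021-01-22 天津大学 Image retrieval method based on visual description sentences
CN112257386B * 2020-10-26 2023-09-26 重庆邮电大学 Method for generating scene spatial relationship information layout in text-to-scene conversion
CN114490702B * 2020-10-26 2024-08-20 陕西省烟草公司渭南市公司 Dynamic generation method for coordinate graphs of production data in discrete manufacturing
CN112270199A * 2020-11-03 2021-01-26 辽宁工程技术大学 Personalized semantic spatial keyword Top-K query method based on the CGAN method
CN112182022B * 2020-11-04 2024-04-16 北京安博通科技股份有限公司 Natural-language-based data query method, apparatus, and translation model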
US11663842B2 (en) * 2020-11-05 2023-05-30 Jpmorgan Chase Bank, N.A. Method and system for tabular information extraction
KR102498403B1 * 2021-01-29 2023-02-09 포항공과대학교 산학협력단 Apparatus and method for collecting training sets for a system that converts natural language to SQL
US11816913B2 (en) * 2021-03-02 2023-11-14 Tata Consultancy Services Limited Methods and systems for extracting information from document images
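CN112966131B * 2021-03-02 2022-09-16 中华人民共和国成都海关 Customs data risk-control type identification method, intelligent customs risk deployment and control method, apparatus, computer device, and storage medium
CN112633423B * 2021-03-10 2021-06-22 北京易真学思教育科技有限公司 Training method for a text recognition model, text recognition method, apparatus, and device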
US11763585B2 (en) 2021-07-14 2023-09-19 Bank Of America Corporation Multi-layer neural network and convolutional neural network for context sensitive optical character recognition
CN113571052B * 2021-07-22 2024-09-20 亿咖通(湖北)技术有限公司 Noise extraction and instruction recognition method and electronic device
US11720531B2 (en) * 2021-08-23 2023-08-08 Sap Se Automatic creation of database objects
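CN113792064A * 2021-08-30 2021-12-14 阿里巴巴达摩院(杭州)科技有限公司 Method and apparatus for implementing multi-turn dialogue, and relationship model generation method
CN113642327A * 2021-10-14 2021-11-12 中国光大银行股份有限公司 Method and apparatus for constructing a standard knowledge base
CN114090624B * 2021-11-19 2024-10-18 中国人民银行清算总中心 Processing method and apparatus for converting natural language into structured query language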
US11934801B2 (en) * 2021-12-07 2024-03-19 Microsoft Technology Licensing, Llc Multi-modal program inference
CN114580279B * 2022-03-02 2024-05-31 广西大学 LSTM-based adaptive coding method for low-Earth-orbit satellite communication
US20230367961A1 (en) * 2022-05-12 2023-11-16 Dell Products L.P. Automated address data determinations using artificial intelligence techniques
US20240045863A1 (en) * 2022-08-08 2024-02-08 Palantir Technologies Inc. Systems and methods for generating and displaying a data pipeline using a natural language query, and describing a data pipeline using natural language

Family Cites Families (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0737034A * 1993-07-15 1995-02-07 Nec Corp Optical character reading device
US7461077B1 (en) 2001-07-31 2008-12-02 Nicholas Greenwood Representation of data records
US6778979B2 (en) 2001-08-13 2004-08-17 Xerox Corporation System for automatically generating queries
US7284191B2 (en) * 2001-08-13 2007-10-16 Xerox Corporation Meta-document management system with document identifiers
US7398201B2 (en) 2001-08-14 2008-07-08 Evri Inc. Method and system for enhanced data searching
US20050144189A1 (en) * 2002-07-19 2005-06-30 Keay Edwards Electronic item management and archival system and method of operating the same
US20050043940A1 (en) 2003-08-20 2005-02-24 Marvin Elder Preparing a data source for a natural language query
US7565139B2 (en) * 2004-02-20 2009-07-21 Google Inc. Image-based search engine for mobile phones with camera
JP4349183B2 * 2004-04-01 2009-10-21 富士ゼロックス株式会社 Image processing apparatus and image processing method
US7587412B2 (en) * 2005-08-23 2009-09-08 Ricoh Company, Ltd. Mixed media reality brokerage network and methods of use
AU2006240074B2 (en) * 2005-04-21 2012-07-05 Theodore G. Paraskevakos System and method for intelligent currency validation
EP2087448A1 (en) * 2006-11-21 2009-08-12 Cameron Telfer Howie A method of retrieving information from a digital image
US20110145068A1 (en) * 2007-09-17 2011-06-16 King Martin T Associating rendered advertisements with digital content
US8520979B2 (en) * 2008-08-19 2013-08-27 Digimarc Corporation Methods and systems for content processing
WO2010096193A2 (en) * 2009-02-18 2010-08-26 Exbiblio B.V. Identifying a document by performing spectral analysis on the contents of the document
US9323784B2 (en) * 2009-12-09 2016-04-26 Google Inc. Image search using text-based elements within the contents of images
US9292493B2 (en) * 2010-01-07 2016-03-22 The Trustees Of The Stevens Institute Of Technology Systems and methods for automatically detecting deception in human communications expressed in digital form
WO2011149558A2 (en) * 2010-05-28 2011-12-01 Abelow Daniel H Reality alternate
US8542926B2 (en) * 2010-11-19 2013-09-24 Microsoft Corporation Script-agnostic text reflow for document images
US20130106894A1 (en) * 2011-10-31 2013-05-02 Elwha LLC, a limited liability company of the State of Delaware Context-sensitive query enrichment
US10146795B2 (en) * 2012-01-12 2018-12-04 Kofax, Inc. Systems and methods for mobile image capture and processing
JP6317772B2 * 2013-03-15 2018-04-25 トランスレート アブロード,インコーポレイテッド System and method for displaying foreign-language character sets and their translations in real time on resource-constrained mobile devices
WO2014144889A2 (en) 2013-03-15 2014-09-18 Amazon Technologies, Inc. Scalable analysis platform for semi-structured data
US9342533B2 (en) * 2013-07-02 2016-05-17 Open Text S.A. System and method for feature recognition and document searching based on feature recognition
US10318804B2 (en) * 2014-06-30 2019-06-11 First American Financial Corporation System and method for data extraction and searching
US10503761B2 (en) * 2014-07-14 2019-12-10 International Business Machines Corporation System for searching, recommending, and exploring documents through conceptual associations
US9710570B2 (en) * 2014-07-14 2017-07-18 International Business Machines Corporation Computing the relevance of a document to concepts not specified in the document
US10162882B2 * 2014-07-14 2018-12-25 International Business Machines Corporation Automatically linking text to concepts in a knowledge base
US10437869B2 (en) * 2014-07-14 2019-10-08 International Business Machines Corporation Automatic new concept definition
WO2018045358A1 (en) * 2016-09-05 2018-03-08 Google Llc Generating theme-based videos
US11080273B2 (en) * 2017-03-20 2021-08-03 International Business Machines Corporation Image support for cognitive intelligence queries
CA3056775A1 (en) * 2017-03-22 2018-09-27 Drilling Info, Inc. Extracting data from electronic documents
US10747761B2 (en) * 2017-05-18 2020-08-18 Salesforce.Com, Inc. Neural network based translation of natural language queries to database queries
US11494887B2 (en) * 2018-03-09 2022-11-08 Schlumberger Technology Corporation System for characterizing oilfield tools
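JP7118697B2 * 2018-03-30 2022-08-16 株式会社Preferred Networks Gaze point estimation processing device, gaze point estimation model generation device, gaze point estimation processing system, gaze point estimation processing method, program, and gaze point estimation model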
US10867404B2 (en) * 2018-08-29 2020-12-15 Toyota Jidosha Kabushiki Kaisha Distance estimation using machine learning

Also Published As

Publication number Publication date
US10936897B2 (en) 2021-03-02
AU2019264603A1 (en) 2020-06-18
JP2020095713A (ja) 2020-06-18
JP7474587B2 (ja) 2024-04-25
EP3660733B1 (en) 2023-06-28
AU2019264603B2 (en) 2024-08-22
CA3059764A1 (en) 2020-05-30
US20200175304A1 (en) 2020-06-04
CN111259724A (zh) 2020-06-09
EP3660733A1 (en) 2020-06-03
EP3660733C0 (en) 2023-06-28

Similar Documents

Publication Publication Date Title
MX2019014440A (es) Method and system for extracting information from document images using a conversational interface and database querying.
US10521464B2 (en) Method and system for extracting, verifying and cataloging technical information from unstructured documents
CN107545791B (zh) System and method for automatically generating a classroom teaching knowledge graph from courseware
MX2019001576A (es) Systems and methods for contextual retrieval of electronic records.
US10417267B2 (en) Information processing terminal and method, and information management apparatus and method
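CN104090955A (zh) Method and system for automatically annotating audio and video tags
CN107943786B (zh) Chinese named entity recognition method and system
CN104408078A (zh) Keyword-based method for constructing a Chinese-English bilingual parallel corpus
CN103678684A (zh) Chinese word segmentation method based on navigation information retrieval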
WO2016174682A1 (en) Method for generating visual representations of data based on controlled natural language queries and system thereof
CN110750977B (zh) Text similarity calculation method and system
GB2576654A (en) Method and apparatus for facilitating creation of simulation model
AU2018388932A1 (en) Method and device using wikipedia link structure to generate chinese language concept vector
SG10201811578RA (en) Predictive query processing for complex system lifecycle management
CN104281565A (zh) Semantic dictionary construction method and apparatus
CN105426379A (zh) Keyword weight calculation method based on word position
Prakash et al. Mining of bilingual Indian Web documents
US10896227B2 (en) Data processing system, data processing method, and data structure
Costin-Gabriel et al. Archaisms and neologisms identification in texts
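JP2018116701A (ja) Seal image processing apparatus, method, and electronic device
CN105608136B (zh) Semantic relatedness calculation method based on Chinese compound sentences
KR20170087367A (ko) Cross-lingual semantic web data quality evaluation method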
Kirschnick et al. Freepal: A Large Collection of Deep Lexico-Syntactic Patterns for Relation Extraction.
US20170154035A1 (en) Text processing system, text processing method, and text processing program
CN105320717B (zh) Semi-automatic construction method for individuals in ontology learning