EP4085343A4 - Domain-based text extraction - Google Patents

Domain-based text extraction

Info

Publication number
EP4085343A4
Authority
EP
European Patent Office
Prior art keywords
domain based
text extraction
based text
extraction
domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP20910797.8A
Other languages
German (de)
English (en)
Other versions
EP4085343A1 (fr)
Inventor
Madhusudan Singh
Kaushik Halder
Nirmal VANAPALLI VENKATA RAMESH RAYULU
Aritra Ghosh Dastidar
Ajay SHA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
L&T Technology Services Ltd
Original Assignee
L&T Technology Services Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by L&T Technology Services Ltd
Publication of EP4085343A1
Publication of EP4085343A4
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/26 Techniques for post-processing, e.g. correcting the recognition result
    • G06V30/262 Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/416 Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/10 Recognition assisted with metadata
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)
EP20910797.8A 2019-12-30 2020-12-30 Domain-based text extraction Pending EP4085343A4 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN201941054421 2019-12-30
PCT/IB2020/062535 WO2021137166A1 (fr) 2019-12-30 2020-12-30 Domain-based text extraction

Publications (2)

Publication Number Publication Date
EP4085343A1 (fr) 2022-11-09
EP4085343A4 (fr) 2024-01-03

Family

ID=76685920

Family Applications (1)

Application Number Title Priority Date Filing Date
EP20910797.8A Pending EP4085343A4 (fr) 2019-12-30 2020-12-30 Domain-based text extraction

Country Status (5)

Country Link
EP (1) EP4085343A4 (fr)
JP (1) JP2023507881A (fr)
AU (1) AU2020418619A1 (fr)
CA (1) CA3156204A1 (fr)
WO (1) WO2021137166A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912845B (zh) * 2023-06-16 2024-03-19 广东电网有限责任公司佛山供电局 Intelligent content recognition and analysis method and device based on NLP and AI

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10318804B2 (en) * 2014-06-30 2019-06-11 First American Financial Corporation System and method for data extraction and searching

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
No further relevant documents disclosed *

Also Published As

Publication number Publication date
EP4085343A1 (fr) 2022-11-09
WO2021137166A1 (fr) 2021-07-08
AU2020418619A1 (en) 2022-05-26
JP2023507881A (ja) 2023-02-28
CA3156204A1 (fr) 2021-07-08

Similar Documents

Publication Publication Date Title
EP3577570A4 (fr) Information extraction from documents
EP3891691A4 (fr) Trip-configurable content
EP3574189A4 (fr) Unitary flow path structure
EP3539310A4 (fr) Improved PS domain data transfer mechanism
EP3721607A4 (fr) Multiple RSTP domain separation
EP3678511A4 (fr) Storage furniture assembly
EP3638854A4 (fr) Combined pillar
EP3674023A4 (fr) Insert
EP3646765A4 (fr) Toilet paper
EP3646764A4 (fr) Toilet paper
EP3710145A4 (fr) Polycarbonate compositions
EP4085343A4 (fr) Domain-based text extraction
EP3881681A4 (fr) Chitosan-containing beverage
EP3823499A4 (fr) Container extraction assembly
EP3805353A4 (fr) Alcoholic beverage
EP3807076A4 (fr) Liquid extraction unit
EP3733034A4 (fr) Toilet paper
EP3697870A4 (fr) Environmentally friendly demulsifiers
EP3696847A4 (fr) Container opening device
EP3625652A4 (fr) Interleaved character selection interface
EP3584452A4 (fr) Resin component
EP3892371A4 (fr) Functional structure
EP3901575A4 (fr) Level
EP3666086A4 (fr) Beverage
EP3682319A4 (fr) Integrated document editor

Legal Events

Date Code Title Description
STAA Information on the status of an EP patent application or granted EP patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under Article 153(3) EPC to a published international application that has entered the European phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an EP patent application or granted EP patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20211125

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAV Request for validation of the European patent (deleted)
DAX Request for extension of the European patent (deleted)
REG Reference to a national code

Ref country code: DE

Ref legal event code: R079

Free format text: PREVIOUS MAIN CLASS: G06F0016000000

Ipc: G06F0018220000

A4 Supplementary search report drawn up and despatched

Effective date: 20231205

RIC1 Information provided on ipc code assigned before grant

Ipc: G06F 40/216 20200101ALI20231129BHEP

Ipc: G06V 30/416 20220101ALI20231129BHEP

Ipc: G06V 30/262 20220101ALI20231129BHEP

Ipc: G06F 40/279 20200101ALI20231129BHEP

Ipc: G06F 18/2413 20230101ALI20231129BHEP

Ipc: G06F 18/22 20230101AFI20231129BHEP