JP7801892B2 - System for searching natural language documents - Google Patents

System for searching natural language documents

Info

Publication number
JP7801892B2
Authority
JP
Japan
Prior art keywords
graph
natural language
blocks
graphs
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2021545331A
Other languages
English (en)
Japanese (ja)
Other versions
JP2022508737A5
JP2022508737A (ja)
Inventor
Arvela, Sakari
Kallio, Juho
Björkqvist, Sebastian
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Iprally Technologies Oy
Original Assignee
Iprally Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Iprally Technologies Oy
Publication of JP2022508737A
Publication of JP2022508737A5
Application granted
Publication of JP7801892B2
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/418 Document matching, e.g. of document images
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Geometry (AREA)
  • Medical Informatics (AREA)
  • Computer Graphics (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)
  • Devices For Executing Special Programs (AREA)
JP2021545331A 2018-10-13 2019-10-13 System for searching natural language documents Active JP7801892B2 (ja)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
FI20185863 2018-10-13
FI20185863A FI20185863A1 (fi) 2018-10-13 System for searching natural language documents
PCT/FI2019/050731 WO2020074786A1 (en) 2018-10-13 2019-10-13 System for searching natural language documents

Publications (3)

Publication Number Publication Date
JP2022508737A (ja) 2022-01-19
JP2022508737A5 2025-10-28
JP7801892B2 (ja) 2026-01-19

Family

ID=68583451

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2021545331A Active JP7801892B2 (ja) 2018-10-13 2019-10-13 System for searching natural language documents

Country Status (6)

Country Link
US (1) US20210350125A1 (en)
EP (1) EP3864564A1 (en)
JP (1) JP7801892B2 (ja)
CN (1) CN113196277A (zh)
FI (1) FI20185863A1 (fi)
WO (1) WO2020074786A1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7172612B2 (ja) * 2019-01-11 2022-11-16 Fujitsu Ltd. Data augmentation program, data augmentation method, and data augmentation device
US20200372019A1 (en) * 2019-05-21 2020-11-26 Sisense Ltd. System and method for automatic completion of queries using natural language processing and an organizational memory
US12430335B2 (en) 2019-05-21 2025-09-30 Sisense Ltd. System and method for improved cache utilization using an organizational memory to generate a dashboard
KR20210046178A (ko) * 2019-10-18 2021-04-28 Samsung Electronics Co., Ltd. Electronic device and control method thereof
US11403488B2 (en) * 2020-03-19 2022-08-02 Hong Kong Applied Science and Technology Research Institute Company Limited Apparatus and method for recognizing image-based content presented in a structured layout
US11990214B2 (en) * 2020-07-21 2024-05-21 International Business Machines Corporation Handling form data errors arising from natural language processing
US11605187B1 (en) * 2020-08-18 2023-03-14 Corel Corporation Drawing function identification in graphics applications
US12541976B2 (en) * 2021-12-07 2026-02-03 Insight Direct Usa, Inc. Relationship modeling and anomaly detection based on video data
EP4542463A4 (en) 2022-06-15 2025-07-23 Fujitsu Ltd LEARNING PROGRAM, LEARNING METHOD AND INFORMATION PROCESSING DEVICE
US20230419045A1 (en) * 2022-06-24 2023-12-28 International Business Machines Corporation Generating goal-oriented dialogues from documents
US12086557B1 (en) 2023-10-06 2024-09-10 Armada Systems, Inc. Natural language statistical model with alerts
US12067041B1 (en) * 2023-10-06 2024-08-20 Armada Systems, Inc. Time series data to statistical natural language interaction

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
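JP2003223466A (ja) 2002-01-31 2003-08-08 Seiko Epson Corp Patent search device, control method for patent search device, control program, and recording medium
JP2016110256A (ja) 2014-12-03 2016-06-20 Fuji Xerox Co., Ltd. Information processing device and information processing program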

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5963940A (en) * 1995-08-16 1999-10-05 Syracuse University Natural language information retrieval system and method
EP0998714A1 (en) * 1997-07-22 2000-05-10 Microsoft Corporation System for processing textual inputs using natural language processing techniques
CN101685455B (zh) * 2008-09-28 2012-02-01 Huawei Technologies Co., Ltd. Method and system for data retrieval
EP2959405A4 (en) * 2013-02-19 2016-10-12 Google Inc RESEARCH BASED ON TREATMENT OF NATURAL LANGUAGE
US10810193B1 (en) * 2013-03-13 2020-10-20 Google Llc Querying a data graph using natural language queries
US9189742B2 (en) * 2013-11-20 2015-11-17 Justin London Adaptive virtual intelligent agent
US10095689B2 (en) * 2014-12-29 2018-10-09 International Business Machines Corporation Automated ontology building
US20170075877A1 (en) * 2015-09-16 2017-03-16 Marie-Therese LEPELTIER Methods and systems of handling patent claims
US10078634B2 (en) * 2015-12-30 2018-09-18 International Business Machines Corporation Visualizing and exploring natural-language text
US9830315B1 (en) * 2016-07-13 2017-11-28 Xerox Corporation Sequence-based structured prediction for semantic parsing
US10255269B2 (en) * 2016-12-30 2019-04-09 Microsoft Technology Licensing, Llc Graph long short term memory for syntactic relationship discovery
CN107844608B (zh) * 2017-12-06 2021-11-30 Hunan University Sentence similarity comparison method based on word vectors
US10891321B2 (en) * 2018-08-28 2021-01-12 American Chemical Society Systems and methods for performing a computer-implemented prior art search

Also Published As

Publication number Publication date
WO2020074786A1 (en) 2020-04-16
EP3864564A1 (en) 2021-08-18
CN113196277A (zh) 2021-07-30
JP2022508737A (ja) 2022-01-19
FI20185863A1 (fi) 2020-04-14
US20210350125A1 (en) 2021-11-11

Similar Documents

Publication Publication Date Title
JP7801892B2 (ja) System for searching natural language documents
JP7826007B2 (ja) Method of training a natural language search system, search system, and corresponding use
Ristoski et al. Rdf2vec: Rdf graph embeddings for data mining
CN111159223B (zh) Interactive code search method and device based on structured embedding
JP2022508738A (ja) Method for searching patent documents
Ranjan et al. LFNN: Lion fuzzy neural network-based evolutionary model for text classification using context and sense based features
US20230138014A1 (en) System and method for performing a search in a vector space based search engine
US12124802B2 (en) System and method for analyzing similarity of natural language data
CN118245564B (zh) Method and device for constructing a feature comparison library supporting semantic duplicate checking and novelty search
Gelman et al. A language-agnostic model for semantic source code labeling
Dawar et al. Comparing topic modeling and named entity recognition techniques for the semantic indexing of a landscape architecture textbook
Shen et al. Practical text phylogeny for real-world settings
Vahidnia et al. Document Clustering and Labeling for Research Trend Extraction and Evolution Mapping.
Putri et al. Bahasa Indonesia pre-trained word vector generation using word2vec for computer and information technology field
Liu et al. Feature extraction of dialogue text based on big data and machine learning
Peng et al. Z-TCA: Fast algorithm for triadic concept analysis using zero-suppressed decision diagrams
Mohemad et al. Ontological-based information extraction of construction tender documents
Ďuračík et al. Using concepts of text based plagiarism detection in source code plagiarism analysis
Wang et al. A Method for Automatic Code Comment Generation Based on Different Keyword Sequences
Deng et al. Retrieval-Augmented Generation in Finance: A Comparative Evaluation of Vector, Hierarchical, and Graph-Based RAG Models
Goyal et al. Improved PAM Algorithm for Text Clustering in Data Mining
Alahmad Detection of Similar Text Documents Based on Self-Organizing Maps
Silva Sparse distributed representations as word embeddings for language understanding
LEVY et al. code2vec: Learning Distributed Representations of Code
Hévízi et al. Improving recognition accuracy on structured documents by learning structural patterns

Legal Events

Date Code Title Description
A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20221013

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20221013

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20230901

A601 Written request for extension of time

Free format text: JAPANESE INTERMEDIATE CODE: A601

Effective date: 20231130

A601 Written request for extension of time

Free format text: JAPANESE INTERMEDIATE CODE: A601

Effective date: 20240201

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20240229

A02 Decision of refusal

Free format text: JAPANESE INTERMEDIATE CODE: A02

Effective date: 20240312

A601 Written request for extension of time

Free format text: JAPANESE INTERMEDIATE CODE: A601

Effective date: 20250710

A601 Written request for extension of time

Free format text: JAPANESE INTERMEDIATE CODE: A601

Effective date: 20250903

A524 Written submission of copy of amendment under Article 19 PCT

Free format text: JAPANESE INTERMEDIATE CODE: A524

Effective date: 20251010

A601 Written request for extension of time

Free format text: JAPANESE INTERMEDIATE CODE: A601

Effective date: 20260105

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20260106

R150 Certificate of patent or registration of utility model

Ref document number: 7801892

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150