JP7801892B2 - System for searching natural language documents - Google Patents
System for searching natural language documents
Info
- Publication number
- JP7801892B2 (application JP2021545331A)
- Authority
- JP
- Japan
- Prior art keywords
- graph
- natural language
- blocks
- graphs
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/418—Document matching, e.g. of document images
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/414—Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Multimedia (AREA)
- Geometry (AREA)
- Medical Informatics (AREA)
- Computer Graphics (AREA)
- Fuzzy Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
- Devices For Executing Special Programs (AREA)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| FI20185863 | 2018-10-13 | | |
| FI20185863A FI20185863A1 (fi) | 2018-10-13 | 2018-10-13 | System for searching natural language documents |
| PCT/FI2019/050731 WO2020074786A1 (en) | 2018-10-13 | 2019-10-13 | System for searching natural language documents |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| JP2022508737A (ja) | 2022-01-19 |
| JP2022508737A5 | 2025-10-28 |
| JP7801892B2 (ja) | 2026-01-19 |
Family
ID=68583451
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2021545331A Active JP7801892B2 (ja) | 2018-10-13 | 2019-10-13 | System for searching natural language documents |
Country Status (6)
| Country | Link |
|---|---|
| US (1) | US20210350125A1 |
| EP (1) | EP3864564A1 |
| JP (1) | JP7801892B2 |
| CN (1) | CN113196277A |
| FI (1) | FI20185863A1 |
| WO (1) | WO2020074786A1 |
Families Citing this family (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP7172612B2 (ja) * | 2019-01-11 | 2022-11-16 | Fujitsu Limited | Data augmentation program, data augmentation method, and data augmentation device |
| US20200372019A1 (en) * | 2019-05-21 | 2020-11-26 | Sisense Ltd. | System and method for automatic completion of queries using natural language processing and an organizational memory |
| US12430335B2 (en) | 2019-05-21 | 2025-09-30 | Sisense Ltd. | System and method for improved cache utilization using an organizational memory to generate a dashboard |
| KR20210046178A (ko) * | 2019-10-18 | 2021-04-28 | Samsung Electronics Co., Ltd. | Electronic device and control method thereof |
| US11403488B2 (en) * | 2020-03-19 | 2022-08-02 | Hong Kong Applied Science and Technology Research Institute Company Limited | Apparatus and method for recognizing image-based content presented in a structured layout |
| US11990214B2 (en) * | 2020-07-21 | 2024-05-21 | International Business Machines Corporation | Handling form data errors arising from natural language processing |
| US11605187B1 (en) * | 2020-08-18 | 2023-03-14 | Corel Corporation | Drawing function identification in graphics applications |
| US12541976B2 (en) * | 2021-12-07 | 2026-02-03 | Insight Direct Usa, Inc. | Relationship modeling and anomaly detection based on video data |
| EP4542463A4 (en) | 2022-06-15 | 2025-07-23 | Fujitsu Ltd | LEARNING PROGRAM, LEARNING METHOD AND INFORMATION PROCESSING DEVICE |
| US20230419045A1 (en) * | 2022-06-24 | 2023-12-28 | International Business Machines Corporation | Generating goal-oriented dialogues from documents |
| US12086557B1 (en) | 2023-10-06 | 2024-09-10 | Armada Systems, Inc. | Natural language statistical model with alerts |
| US12067041B1 (en) * | 2023-10-06 | 2024-08-20 | Armada Systems, Inc. | Time series data to statistical natural language interaction |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
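| JP2003223466A (ja) | 2002-01-31 | 2003-08-08 | Seiko Epson Corp | Patent search device, control method for patent search device, control program, and recording medium |
| JP2016110256A (ja) | 2014-12-03 | 2016-06-20 | Fuji Xerox Co., Ltd. | Information processing device and information processing program |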
Family Cites Families (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5963940A (en) * | 1995-08-16 | 1999-10-05 | Syracuse University | Natural language information retrieval system and method |
| EP0998714A1 (en) * | 1997-07-22 | 2000-05-10 | Microsoft Corporation | System for processing textual inputs using natural language processing techniques |
| CN101685455B (zh) * | 2008-09-28 | 2012-02-01 | Huawei Technologies Co., Ltd. | Method and system for data retrieval |
| EP2959405A4 (en) * | 2013-02-19 | 2016-10-12 | Google Inc | RESEARCH BASED ON TREATMENT OF NATURAL LANGUAGE |
| US10810193B1 (en) * | 2013-03-13 | 2020-10-20 | Google Llc | Querying a data graph using natural language queries |
| US9189742B2 (en) * | 2013-11-20 | 2015-11-17 | Justin London | Adaptive virtual intelligent agent |
| US10095689B2 (en) * | 2014-12-29 | 2018-10-09 | International Business Machines Corporation | Automated ontology building |
| US20170075877A1 (en) * | 2015-09-16 | 2017-03-16 | Marie-Therese LEPELTIER | Methods and systems of handling patent claims |
| US10078634B2 (en) * | 2015-12-30 | 2018-09-18 | International Business Machines Corporation | Visualizing and exploring natural-language text |
| US9830315B1 (en) * | 2016-07-13 | 2017-11-28 | Xerox Corporation | Sequence-based structured prediction for semantic parsing |
| US10255269B2 (en) * | 2016-12-30 | 2019-04-09 | Microsoft Technology Licensing, Llc | Graph long short term memory for syntactic relationship discovery |
| CN107844608B (zh) * | 2017-12-06 | 2021-11-30 | Hunan University | Sentence similarity comparison method based on word vectors |
| US10891321B2 (en) * | 2018-08-28 | 2021-01-12 | American Chemical Society | Systems and methods for performing a computer-implemented prior art search |
- 2018-10-13: FI, application FI20185863A, publication FI20185863A1 (fi), status unknown
- 2019-10-13: WO, application PCT/FI2019/050731, publication WO2020074786A1 (en), not active (ceased)
- 2019-10-13: EP, application EP19805356.3A, publication EP3864564A1 (en), not active (ceased)
- 2019-10-13: US, application US17/284,796, publication US20210350125A1 (en), active (pending)
- 2019-10-13: CN, application CN201980082810.7A, publication CN113196277A (zh), active (pending)
- 2019-10-13: JP, application JP2021545331A, publication JP7801892B2 (ja), active
Also Published As
| Publication number | Publication date |
|---|---|
| WO2020074786A1 (en) | 2020-04-16 |
| EP3864564A1 (en) | 2021-08-18 |
| CN113196277A (zh) | 2021-07-30 |
| JP2022508737A (ja) | 2022-01-19 |
| FI20185863A1 (fi) | 2020-04-14 |
| US20210350125A1 (en) | 2021-11-11 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
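| JP7801892B2 (ja) | System for searching natural language documents | |
| JP7826007B2 (ja) | Method for training a natural language search system, search system, and corresponding use | |
| Ristoski et al. | Rdf2vec: Rdf graph embeddings for data mining | |
| CN111159223B (zh) | Interactive code search method and device based on structured embedding | |
| JP2022508738A (ja) | Method for searching patent documents | |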
| Ranjan et al. | LFNN: Lion fuzzy neural network-based evolutionary model for text classification using context and sense based features | |
| US20230138014A1 (en) | System and method for performing a search in a vector space based search engine | |
| US12124802B2 (en) | System and method for analyzing similarity of natural language data | |
| CN118245564B (zh) | Method and device for constructing a feature comparison library supporting semantic duplicate checking and novelty search | |
| Gelman et al. | A language-agnostic model for semantic source code labeling | |
| Dawar et al. | Comparing topic modeling and named entity recognition techniques for the semantic indexing of a landscape architecture textbook | |
| Shen et al. | Practical text phylogeny for real-world settings | |
| Vahidnia et al. | Document Clustering and Labeling for Research Trend Extraction and Evolution Mapping. | |
| Putri et al. | Bahasa Indonesia pre-trained word vector generation using word2vec for computer and information technology field | |
| Liu et al. | Feature extraction of dialogue text based on big data and machine learning | |
| Peng et al. | Z-TCA: Fast algorithm for triadic concept analysis using zero-suppressed decision diagrams | |
| Mohemad et al. | Ontological-based information extraction of construction tender documents | |
| Ďuračík et al. | Using concepts of text based plagiarism detection in source code plagiarism analysis | |
| Wang et al. | A Method for Automatic Code Comment Generation Based on Different Keyword Sequences | |
| Deng et al. | Retrieval-Augmented Generation in Finance: A Comparative Evaluation of Vector, Hierarchical, and Graph-Based RAG Models | |
| Goyal et al. | Improved PAM Algorithm for Text Clustering in Data Mining | |
| Alahmad | Detection of Similar Text Documents Based on Self-Organizing Maps | |
| Silva | Sparse distributed representations as word embeddings for language understanding | |
| LEVY et al. | code2vec: Learning Distributed Representations of Code | |
| Hévízi et al. | Improving recognition accuracy on structured documents by learning structural patterns |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 2022-10-13 | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523 |
| 2022-10-13 | A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621 |
| 2023-09-01 | A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131 |
| 2023-11-30 | A601 | Written request for extension of time | Free format text: JAPANESE INTERMEDIATE CODE: A601 |
| 2024-02-01 | A601 | Written request for extension of time | Free format text: JAPANESE INTERMEDIATE CODE: A601 |
| 2024-02-29 | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523 |
| 2024-03-12 | A02 | Decision of refusal | Free format text: JAPANESE INTERMEDIATE CODE: A02 |
| 2025-07-10 | A601 | Written request for extension of time | Free format text: JAPANESE INTERMEDIATE CODE: A601 |
| 2025-09-03 | A601 | Written request for extension of time | Free format text: JAPANESE INTERMEDIATE CODE: A601 |
| 2025-10-10 | A524 | Written submission of copy of amendment under Article 19 PCT | Free format text: JAPANESE INTERMEDIATE CODE: A524 |
| 2026-01-05 | A601 | Written request for extension of time | Free format text: JAPANESE INTERMEDIATE CODE: A601 |
| 2026-01-06 | A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61 |
| | R150 | Certificate of patent or registration of utility model | Ref document number: 7801892; Country of ref document: JP; Free format text: JAPANESE INTERMEDIATE CODE: R150 |