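JP7826007B2 - Method of training a natural language search system, search system and corresponding use - Google Patents
Method of training a natural language search system, search system and corresponding use
Info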
- Publication number
- JP7826007B2 (application JP2021545333A)
- Authority
- JP
- Japan
- Prior art keywords
- graph
- training
- block
- natural language
- machine learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/322—Trees
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Creation or modification of classes or clusters
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/151—Transformation
- G06F40/154—Tree transformation for tree-structured or markup documents, e.g. XSLT, XSL-FO or stylesheets
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services
- G06Q50/184—Intellectual property management
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/11—Patent retrieval
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Business, Economics & Management (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Technology Law (AREA)
- Tourism & Hospitality (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Operations Research (AREA)
- Entrepreneurship & Innovation (AREA)
- Medical Informatics (AREA)
- Economics (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
- Electrically Operated Instructional Devices (AREA)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| FI20185865A FI20185865A1 (fi) | 2018-10-13 | 2018-10-13 | Method of training a natural language search system, search system and corresponding use |
| FI20185865 | 2018-10-13 | | |
| PCT/FI2019/050733 WO2020074788A1 (en) | 2018-10-13 | 2019-10-13 | Method of training a natural language search system, search system and corresponding use |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| JP2022513353A (ja) | 2022-02-07 |
| JP2022513353A5 | 2022-10-21 |
| JP7826007B2 (ja) | 2026-03-09 |
Family
ID=68583453
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2021545333A (Active), JP7826007B2 (ja) | 2018-10-13 | 2019-10-13 | Method of training a natural language search system, search system and corresponding use |
Country Status (6)
| Country | Link |
|---|---|
| US (2) | US12039272B2 |
| EP (1) | EP3864566A1 |
| JP (1) | JP7826007B2 |
| CN (1) | CN113196278B |
| FI (1) | FI20185865A1 |
| WO (1) | WO2020074788A1 |
Families Citing this family (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| FI20185865A1 (fi) * | 2018-10-13 | 2020-04-14 | Iprally Tech Oy | Method of training a natural language search system, search system and corresponding use |
| US12124802B2 (en) * | 2019-05-18 | 2024-10-22 | IPRally Technologies Oy | System and method for analyzing similarity of natural language data |
| US12572592B2 (en) * | 2020-03-05 | 2026-03-10 | International Business Machines Corporation | Automated graph embedding recommendations based on extracted graph features |
| CN111539228B (zh) * | 2020-04-29 | 2023-08-08 | 支付宝(杭州)信息技术有限公司 | Vector model training method and device, and similarity determination method and device |
| US11972225B2 (en) * | 2020-10-01 | 2024-04-30 | Shrey Pathak | Automated patent language generation |
| US12306906B2 (en) * | 2021-11-14 | 2025-05-20 | Microsoft Technology Licensing, Llc | Adaptive token sampling for efficient transformer |
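| CN114443863B (zh) * | 2022-04-07 | 2022-07-26 | 北京网藤科技有限公司 | Machine learning-based attack vector generation method and system for industrial control networks |
| CN116795789B (zh) * | 2023-08-24 | 2024-04-19 | 卓望信息技术(北京)有限公司 | Method and device for automatically generating a patent search report |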
| US12086557B1 (en) | 2023-10-06 | 2024-09-10 | Armada Systems, Inc. | Natural language statistical model with alerts |
| US12067041B1 (en) * | 2023-10-06 | 2024-08-20 | Armada Systems, Inc. | Time series data to statistical natural language interaction |
| ZA202408097B (en) * | 2024-09-21 | 2025-05-28 | Artem Aleksandrovich Kravchenko | Computer device for pre-training, or training, or fine tuning of a clustering model |
| ZA202407998B (en) * | 2024-09-21 | 2025-05-28 | Artem Aleksandrovich Kravchenko | Method for pre-training, or training, or fine tuning of a clustering model |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2007102771A (ja) | 2005-09-30 | 2007-04-19 | Mitsubishi Electric Research Laboratories Inc | Method for selecting a particular model of a class of objects from a set of low-dimensional models of the class |
| JP2011257809A (ja) | 2010-06-04 | 2011-12-22 | Toshiba Corp | Document analysis device |
Family Cites Families (29)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6339767B1 (en) | 1997-06-02 | 2002-01-15 | Aurigin Systems, Inc. | Using hyperbolic trees to visualize data generated by patent-centric and group-oriented data processing |
| US8095581B2 (en) | 1999-02-05 | 2012-01-10 | Gregory A Stobbs | Computer-implemented patent portfolio analysis method and apparatus |
| CN107180264A (zh) * | 2006-07-12 | 2017-09-19 | 柯法克斯公司 | Transductive classification method for documents and data |
| US20080086432A1 (en) * | 2006-07-12 | 2008-04-10 | Schmidtler Mauritius A R | Data classification methods using machine learning techniques |
| CN101196905A (zh) * | 2007-12-05 | 2008-06-11 | 覃征 | Intelligent graphic retrieval method |
| US20100131513A1 (en) * | 2008-10-23 | 2010-05-27 | Lundberg Steven W | Patent mapping |
| US9110971B2 (en) | 2010-02-03 | 2015-08-18 | Thomson Reuters Global Resources | Method and system for ranking intellectual property documents using claim analysis |
| US9176949B2 (en) | 2011-07-06 | 2015-11-03 | Altamira Technologies Corporation | Systems and methods for sentence comparison and sentence-based search |
| US9202176B1 (en) | 2011-08-08 | 2015-12-01 | Gravity.Com, Inc. | Entity analysis system |
| US20130086042A1 (en) * | 2011-10-03 | 2013-04-04 | Steven W. Lundberg | System and method for information disclosure statement management and prior art cross-citation control |
| EP2959405A4 (en) | 2013-02-19 | 2016-10-12 | Google Inc | RESEARCH BASED ON TREATMENT OF NATURAL LANGUAGE |
| US10810193B1 (en) * | 2013-03-13 | 2020-10-20 | Google Llc | Querying a data graph using natural language queries |
| US10162882B2 (en) | 2014-07-14 | 2018-12-25 | International Business Machines Corporation | Automatically linking text to concepts in a knowledge base |
| US10095689B2 (en) | 2014-12-29 | 2018-10-09 | International Business Machines Corporation | Automated ontology building |
| US10073890B1 (en) * | 2015-08-03 | 2018-09-11 | Marca Research & Development International, Llc | Systems and methods for patent reference comparison in a combined semantical-probabilistic algorithm |
| US20170075877A1 (en) | 2015-09-16 | 2017-03-16 | Marie-Therese LEPELTIER | Methods and systems of handling patent claims |
| US10831762B2 (en) * | 2015-11-06 | 2020-11-10 | International Business Machines Corporation | Extracting and denoising concept mentions using distributed representations of concepts |
| CN105260727B (zh) * | 2015-11-12 | 2018-09-21 | 武汉大学 | Semantic restructuring method for academic literature based on image processing and sequence labeling |
| US20180018564A1 (en) | 2016-07-13 | 2018-01-18 | Palantir Technologies Inc. | Artificial intelligence-based prior art document identification system |
| US10255269B2 (en) | 2016-12-30 | 2019-04-09 | Microsoft Technology Licensing, Llc | Graph long short term memory for syntactic relationship discovery |
| CN108334805B (zh) * | 2017-03-08 | 2020-04-03 | 腾讯科技(深圳)有限公司 | Method and device for detecting document reading order |
| US20180300323A1 (en) * | 2017-04-17 | 2018-10-18 | Lee & Hayes, PLLC | Multi-Factor Document Analysis |
| US10817781B2 (en) * | 2017-04-28 | 2020-10-27 | SparkCognition, Inc. | Generation of document classifiers |
| US10776566B2 (en) | 2017-05-24 | 2020-09-15 | Nathan J. DeVries | System and method of document generation |
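| CN107247780A (zh) * | 2017-06-12 | 2017-10-13 | 北京理工大学 | Ontology-based method for measuring the similarity of patent documents |
| CN110019806B (zh) * | 2017-12-25 | 2021-08-06 | 中移动信息技术有限公司 | Document clustering method and device |
| CN108717601B (zh) * | 2018-05-08 | 2022-05-06 | 西安交通大学 | Method for integrating and fusing multiple innovation methods for enterprise problems |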
| US10891321B2 (en) * | 2018-08-28 | 2021-01-12 | American Chemical Society | Systems and methods for performing a computer-implemented prior art search |
| FI20185865A1 (fi) * | 2018-10-13 | 2020-04-14 | Iprally Tech Oy | Method of training a natural language search system, search system and corresponding use |
-
2018
- 2018-10-13 FI FI20185865A patent/FI20185865A1/fi unknown
-
2019
- 2019-10-13 US US17/284,799 patent/US12039272B2/en active Active
- 2019-10-13 EP EP19805358.9A patent/EP3864566A1/en not_active Ceased
- 2019-10-13 WO PCT/FI2019/050733 patent/WO2020074788A1/en not_active Ceased
- 2019-10-13 JP JP2021545333A patent/JP7826007B2/ja active Active
- 2019-10-13 CN CN201980082811.1A patent/CN113196278B/zh active Active
-
2024
- 2024-06-13 US US18/741,847 patent/US20240370649A1/en active Pending
Also Published As
| Publication number | Publication date |
|---|---|
| JP2022513353A (ja) | 2022-02-07 |
| CN113196278A (zh) | 2021-07-30 |
| FI20185865A1 (fi) | 2020-04-14 |
| US20240370649A1 (en) | 2024-11-07 |
| US20210397790A1 (en) | 2021-12-23 |
| CN113196278B (zh) | 2025-09-12 |
| WO2020074788A1 (en) | 2020-04-16 |
| EP3864566A1 (en) | 2021-08-18 |
| US12039272B2 (en) | 2024-07-16 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP7826007B2 (ja) | | Method of training a natural language search system, search system and corresponding use |
| JP7801892B2 (ja) | | System for searching natural language documents |
| Ristoski et al. | | Rdf2vec: Rdf graph embeddings for data mining |
| CN111159223B (zh) | | Interactive code search method and device based on structured embedding |
| JP2022508738A (ja) | | Method for searching patent documents |
| US20230138014A1 (en) | | System and method for performing a search in a vector space based search engine |
| US12124802B2 (en) | | System and method for analyzing similarity of natural language data |
| Béchet et al. | | Discovering linguistic patterns using sequence mining |
| CN118245564B (zh) | | Method and device for constructing a feature comparison library supporting semantic duplicate checking and novelty search |
| CN116756266A (zh) | | Method for generating clothing text summaries based on external knowledge and topic information |
| Gelman et al. | | A language-agnostic model for semantic source code labeling |
| Sun | | A natural language interface for querying graph databases |
| Dawar et al. | | Comparing topic modeling and named entity recognition techniques for the semantic indexing of a landscape architecture textbook |
| Shen et al. | | Practical text phylogeny for real-world settings |
| Vahidnia et al. | | Document Clustering and Labeling for Research Trend Extraction and Evolution Mapping. |
| Liu et al. | | Feature extraction of dialogue text based on big data and machine learning |
| Mohemad et al. | | Ontological-based information extraction of construction tender documents |
| Alahmad | | Detection of Similar Text Documents Based on Self-Organizing Maps |
| Ďuračík et al. | | Using concepts of text based plagiarism detection in source code plagiarism analysis |
| Deng et al. | | Retrieval-Augmented Generation in Finance: A Comparative Evaluation of Vector, Hierarchical, and Graph-Based RAG Models |
| Goyal et al. | | Improved PAM Algorithm for Text Clustering in Data Mining |
| Wang et al. | | A Method for Automatic Code Comment Generation Based on Different Keyword Sequences |
| Mansoor | | Improving document representation using retrofitting |
| CN120822506A (zh) | | Method, device, and computer program product for generating a response regarding a document |
| Wai | | Classification based automatic information extraction system from free text |
Legal Events
| Code | Title | Description |
|---|---|---|
| A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20221013 |
| A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621; Effective date: 20221013 |
| A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131; Effective date: 20230901 |
| A601 | Written request for extension of time | Free format text: JAPANESE INTERMEDIATE CODE: A601; Effective date: 20231130 |
| A601 | Written request for extension of time | Free format text: JAPANESE INTERMEDIATE CODE: A601; Effective date: 20240201 |
| A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20240229 |
| A02 | Decision of refusal | Free format text: JAPANESE INTERMEDIATE CODE: A02; Effective date: 20240312 |
| A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20240712 |
| A911 | Transfer to examiner for re-examination before appeal (zenchi) | Free format text: JAPANESE INTERMEDIATE CODE: A911; Effective date: 20240925 |
| A912 | Re-examination (zenchi) completed and case transferred to appeal board | Free format text: JAPANESE INTERMEDIATE CODE: A912; Effective date: 20241122 |
| A601 | Written request for extension of time | Free format text: JAPANESE INTERMEDIATE CODE: A601; Effective date: 20250710 |
| A601 | Written request for extension of time | Free format text: JAPANESE INTERMEDIATE CODE: A601; Effective date: 20250903 |
| A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20251010 |
| A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20251208 |
| A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61; Effective date: 20260225 |
| R150 | Certificate of patent or registration of utility model | Ref document number: 7826007; Country of ref document: JP; Free format text: JAPANESE INTERMEDIATE CODE: R150 |