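JP7826007B2 - Method of training a natural language search system, search system and corresponding use - Google Patents

Method of training a natural language search system, search system and corresponding use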

Info

Publication number
JP7826007B2
JP7826007B2 JP2021545333A JP2021545333A JP7826007B2 JP 7826007 B2 JP7826007 B2 JP 7826007B2 JP 2021545333 A JP2021545333 A JP 2021545333A JP 2021545333 A JP2021545333 A JP 2021545333A JP 7826007 B2 JP7826007 B2 JP 7826007B2
Authority
JP
Japan
Prior art keywords
graph
training
block
natural language
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2021545333A
Other languages
English (en)
Japanese (ja)
Other versions
JP2022513353A (ja)
JP2022513353A5
Inventor
Arvela, Sakari
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Iprally Technologies Oy
Original Assignee
Iprally Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Iprally Technologies Oy
Publication of JP2022513353A
Publication of JP2022513353A5
Application granted
Publication of JP7826007B2
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/322 Trees
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Creation or modification of classes or clusters
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G06F40/154 Tree transformation for tree-structured or markup documents, e.g. XSLT, XSL-FO or stylesheets
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services
    • G06Q50/184 Intellectual property management
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/11 Patent retrieval
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Business, Economics & Management (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Technology Law (AREA)
  • Tourism & Hospitality (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Operations Research (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Medical Informatics (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)
  • Electrically Operated Instructional Devices (AREA)
JP2021545333A 2018-10-13 2019-10-13 Method of training a natural language search system, search system and corresponding use Active JP7826007B2 (ja)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
FI20185865A FI20185865A1 (fi) 2018-10-13 2018-10-13 Method of training a natural language search system, search system and corresponding use
FI20185865 2018-10-13
PCT/FI2019/050733 WO2020074788A1 (en) 2018-10-13 2019-10-13 Method of training a natural language search system, search system and corresponding use

Publications (3)

Publication Number Publication Date
JP2022513353A JP2022513353A (ja) 2022-02-07
JP2022513353A5 JP2022513353A5 2022-10-21
JP7826007B2 JP7826007B2 (ja) 2026-03-09

Family

ID=68583453

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2021545333A Active JP7826007B2 (ja) 2018-10-13 2019-10-13 Method of training a natural language search system, search system and corresponding use

Country Status (6)

Country Link
US (2) US12039272B2
EP (1) EP3864566A1
JP (1) JP7826007B2
CN (1) CN113196278B
FI (1) FI20185865A1
WO (1) WO2020074788A1

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FI20185865A1 (fi) * 2018-10-13 2020-04-14 Iprally Tech Oy Method of training a natural language search system, search system and corresponding use
US12124802B2 (en) * 2019-05-18 2024-10-22 IPRally Technologies Oy System and method for analyzing similarity of natural language data
US12572592B2 (en) * 2020-03-05 2026-03-10 International Business Machines Corporation Automated graph embedding recommendations based on extracted graph features
CN111539228B (zh) * 2020-04-29 2023-08-08 支付宝(杭州)信息技术有限公司 Vector model training method and device, and similarity determination method and device
US11972225B2 (en) * 2020-10-01 2024-04-30 Shrey Pathak Automated patent language generation
US12306906B2 (en) * 2021-11-14 2025-05-20 Microsoft Technology Licensing, Llc Adaptive token sampling for efficient transformer
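CN114443863B (zh) * 2022-04-07 2022-07-26 北京网藤科技有限公司 Machine-learning-based attack vector generation method and system in an industrial control network
CN116795789B (zh) * 2023-08-24 2024-04-19 卓望信息技术(北京)有限公司 Method and device for automatically generating a patent search report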
US12086557B1 (en) 2023-10-06 2024-09-10 Armada Systems, Inc. Natural language statistical model with alerts
US12067041B1 (en) * 2023-10-06 2024-08-20 Armada Systems, Inc. Time series data to statistical natural language interaction
ZA202408097B (en) * 2024-09-21 2025-05-28 Artem Aleksandrovich Kravchenko Computer device for pre-training, or training, or fine tuning of a clustering model
ZA202407998B (en) * 2024-09-21 2025-05-28 Artem Aleksandrovich Kravchenko Method for pre-training, or training, or fine tuning of a clustering model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007102771A (ja) 2005-09-30 2007-04-19 Mitsubishi Electric Research Laboratories Inc Method for selecting a particular model of a class of objects from a set of low-dimensional models of the class
JP2011257809A (ja) 2010-06-04 2011-12-22 Toshiba Corp Document analysis device

Family Cites Families (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6339767B1 (en) 1997-06-02 2002-01-15 Aurigin Systems, Inc. Using hyperbolic trees to visualize data generated by patent-centric and group-oriented data processing
US8095581B2 (en) 1999-02-05 2012-01-10 Gregory A Stobbs Computer-implemented patent portfolio analysis method and apparatus
CN107180264A (zh) * 2006-07-12 2017-09-19 柯法克斯公司 Transductive classification method for documents and data
US20080086432A1 (en) * 2006-07-12 2008-04-10 Schmidtler Mauritius A R Data classification methods using machine learning techniques
CN101196905A (zh) * 2007-12-05 2008-06-11 覃征 Intelligent graphics retrieval method
US20100131513A1 (en) * 2008-10-23 2010-05-27 Lundberg Steven W Patent mapping
US9110971B2 (en) 2010-02-03 2015-08-18 Thomson Reuters Global Resources Method and system for ranking intellectual property documents using claim analysis
US9176949B2 (en) 2011-07-06 2015-11-03 Altamira Technologies Corporation Systems and methods for sentence comparison and sentence-based search
US9202176B1 (en) 2011-08-08 2015-12-01 Gravity.Com, Inc. Entity analysis system
US20130086042A1 (en) * 2011-10-03 2013-04-04 Steven W. Lundberg System and method for information disclosure statement management and prior art cross-citation control
EP2959405A4 (en) 2013-02-19 2016-10-12 Google Inc RESEARCH BASED ON TREATMENT OF NATURAL LANGUAGE
US10810193B1 (en) * 2013-03-13 2020-10-20 Google Llc Querying a data graph using natural language queries
US10162882B2 (en) 2014-07-14 2018-12-25 International Business Machines Corporation Automatically linking text to concepts in a knowledge base
US10095689B2 (en) 2014-12-29 2018-10-09 International Business Machines Corporation Automated ontology building
US10073890B1 (en) * 2015-08-03 2018-09-11 Marca Research & Development International, Llc Systems and methods for patent reference comparison in a combined semantical-probabilistic algorithm
US20170075877A1 (en) 2015-09-16 2017-03-16 Marie-Therese LEPELTIER Methods and systems of handling patent claims
US10831762B2 (en) * 2015-11-06 2020-11-10 International Business Machines Corporation Extracting and denoising concept mentions using distributed representations of concepts
CN105260727B (zh) * 2015-11-12 2018-09-21 武汉大学 Method for semantic restructuring of academic literature based on image processing and sequence labeling
US20180018564A1 (en) 2016-07-13 2018-01-18 Palantir Technologies Inc. Artificial intelligence-based prior art document identification system
US10255269B2 (en) 2016-12-30 2019-04-09 Microsoft Technology Licensing, Llc Graph long short term memory for syntactic relationship discovery
CN108334805B (zh) * 2017-03-08 2020-04-03 腾讯科技(深圳)有限公司 Method and device for detecting document reading order
US20180300323A1 (en) * 2017-04-17 2018-10-18 Lee & Hayes, PLLC Multi-Factor Document Analysis
US10817781B2 (en) * 2017-04-28 2020-10-27 SparkCognition, Inc. Generation of document classifiers
US10776566B2 (en) 2017-05-24 2020-09-15 Nathan J. DeVries System and method of document generation
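CN107247780A (zh) * 2017-06-12 2017-10-13 北京理工大学 Ontology-based method for measuring patent document similarity
CN110019806B (zh) * 2017-12-25 2021-08-06 中移动信息技术有限公司 Document clustering method and device
CN108717601B (zh) * 2018-05-08 2022-05-06 西安交通大学 Method for integrating and fusing multiple innovation methods for enterprise problems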
US10891321B2 (en) * 2018-08-28 2021-01-12 American Chemical Society Systems and methods for performing a computer-implemented prior art search
FI20185865A1 (fi) * 2018-10-13 2020-04-14 Iprally Tech Oy Method of training a natural language search system, search system and corresponding use

Also Published As

Publication number Publication date
JP2022513353A (ja) 2022-02-07
CN113196278A (zh) 2021-07-30
FI20185865A1 (fi) 2020-04-14
US20240370649A1 (en) 2024-11-07
US20210397790A1 (en) 2021-12-23
CN113196278B (zh) 2025-09-12
WO2020074788A1 (en) 2020-04-16
EP3864566A1 (en) 2021-08-18
US12039272B2 (en) 2024-07-16

Similar Documents

Publication Publication Date Title
JP7826007B2 (ja) Method of training a natural language search system, search system and corresponding use
JP7801892B2 (ja) System for searching natural language documents
Ristoski et al. Rdf2vec: Rdf graph embeddings for data mining
CN111159223B (zh) Interactive code search method and device based on structured embedding
JP2022508738A (ja) Method for searching patent documents
US20230138014A1 (en) System and method for performing a search in a vector space based search engine
US12124802B2 (en) System and method for analyzing similarity of natural language data
Béchet et al. Discovering linguistic patterns using sequence mining
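CN118245564B (zh) Method and device for constructing a feature comparison library supporting semantic duplicate checking and novelty search
CN116756266A (zh) Clothing text summary generation method based on external knowledge and topic information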
Gelman et al. A language-agnostic model for semantic source code labeling
Sun A natural language interface for querying graph databases
Dawar et al. Comparing topic modeling and named entity recognition techniques for the semantic indexing of a landscape architecture textbook
Shen et al. Practical text phylogeny for real-world settings
Vahidnia et al. Document Clustering and Labeling for Research Trend Extraction and Evolution Mapping.
Liu et al. Feature extraction of dialogue text based on big data and machine learning
Mohemad et al. Ontological-based information extraction of construction tender documents
Alahmad Detection of Similar Text Documents Based on Self-Organizing Maps
Ďuračík et al. Using concepts of text based plagiarism detection in source code plagiarism analysis
Deng et al. Retrieval-Augmented Generation in Finance: A Comparative Evaluation of Vector, Hierarchical, and Graph-Based RAG Models
Goyal et al. Improved PAM Algorithm for Text Clustering in Data Mining
Wang et al. A Method for Automatic Code Comment Generation Based on Different Keyword Sequences
Mansoor Improving document representation using retrofitting
CN120822506A (zh) Method, device and computer program product for generating a response about a document
Wai Classification based automatic information extraction system from free text

Legal Events

Date Code Title Description
A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20221013

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20221013

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20230901

A601 Written request for extension of time

Free format text: JAPANESE INTERMEDIATE CODE: A601

Effective date: 20231130

A601 Written request for extension of time

Free format text: JAPANESE INTERMEDIATE CODE: A601

Effective date: 20240201

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20240229

A02 Decision of refusal

Free format text: JAPANESE INTERMEDIATE CODE: A02

Effective date: 20240312

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20240712

A911 Transfer to examiner for re-examination before appeal (zenchi)

Free format text: JAPANESE INTERMEDIATE CODE: A911

Effective date: 20240925

A912 Re-examination (zenchi) completed and case transferred to appeal board

Free format text: JAPANESE INTERMEDIATE CODE: A912

Effective date: 20241122

A601 Written request for extension of time

Free format text: JAPANESE INTERMEDIATE CODE: A601

Effective date: 20250710

A601 Written request for extension of time

Free format text: JAPANESE INTERMEDIATE CODE: A601

Effective date: 20250903

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20251010

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20251208

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20260225

R150 Certificate of patent or registration of utility model

Ref document number: 7826007

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150