CN113196277A - System for searching natural language documents - Google Patents

System for searching natural language documents

Info

Publication number
CN113196277A
Authority
CN
China
Prior art keywords
graph
natural language
graphics
node
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201980082810.7A
Other languages
English (en)
Chinese (zh)
Inventor
S·阿维拉
J·卡利奥
S·比约克维斯特
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Iprelli Technologies Ltd
Original Assignee
Iprelli Technologies Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-10-13
Filing date
2019-10-13
Publication date
2021-07-30
Application filed by Iprelli Technologies Ltd
Publication of CN113196277A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/418 Document matching, e.g. of document images
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Geometry (AREA)
  • Medical Informatics (AREA)
  • Computer Graphics (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)
  • Devices For Executing Special Programs (AREA)
CN201980082810.7A 2018-10-13 2019-10-13 System for searching natural language documents Pending CN113196277A (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
FI20185863 2018-10-13
FI20185863A FI20185863A1 (fi) 2018-10-13 System for searching natural language documents
PCT/FI2019/050731 WO2020074786A1 (en) 2018-10-13 2019-10-13 System for searching natural language documents

Publications (1)

Publication Number Publication Date
CN113196277A (zh) 2021-07-30

Family

ID=68583451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980082810.7A Pending CN113196277A (zh) System for searching natural language documents 2018-10-13 2019-10-13

Country Status (6)

Country Link
US (1) US20210350125A1
EP (1) EP3864564A1
JP (1) JP7801892B2
CN (1) CN113196277A
FI (1) FI20185863A1
WO (1) WO2020074786A1

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7172612B2 (ja) * 2019-01-11 2022-11-16 Fujitsu Ltd Data augmentation program, data augmentation method, and data augmentation device
US20200372019A1 (en) * 2019-05-21 2020-11-26 Sisense Ltd. System and method for automatic completion of queries using natural language processing and an organizational memory
US12430335B2 (en) 2019-05-21 2025-09-30 Sisense Ltd. System and method for improved cache utilization using an organizational memory to generate a dashboard
KR20210046178A (ko) * 2019-10-18 2021-04-28 Samsung Electronics Co., Ltd. Electronic device and control method thereof
US11403488B2 (en) * 2020-03-19 2022-08-02 Hong Kong Applied Science and Technology Research Institute Company Limited Apparatus and method for recognizing image-based content presented in a structured layout
US11990214B2 (en) * 2020-07-21 2024-05-21 International Business Machines Corporation Handling form data errors arising from natural language processing
US11605187B1 (en) * 2020-08-18 2023-03-14 Corel Corporation Drawing function identification in graphics applications
US12541976B2 (en) * 2021-12-07 2026-02-03 Insight Direct Usa, Inc. Relationship modeling and anomaly detection based on video data
EP4542463A4 (en) 2022-06-15 2025-07-23 Fujitsu Ltd LEARNING PROGRAM, LEARNING METHOD AND INFORMATION PROCESSING DEVICE
US20230419045A1 (en) * 2022-06-24 2023-12-28 International Business Machines Corporation Generating goal-oriented dialogues from documents
US12086557B1 (en) 2023-10-06 2024-09-10 Armada Systems, Inc. Natural language statistical model with alerts
US12067041B1 (en) * 2023-10-06 2024-08-20 Armada Systems, Inc. Time series data to statistical natural language interaction

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5963940A (en) * 1995-08-16 1999-10-05 Syracuse University Natural language information retrieval system and method
CN1265209A (zh) * 1997-07-22 2000-08-30 Microsoft Corporation System for processing text input using natural language processing techniques
CN101685455A (zh) * 2008-09-28 2010-03-31 Huawei Technologies Co., Ltd. Method and system for data retrieval
US20150142704A1 (en) * 2013-11-20 2015-05-21 Justin London Adaptive Virtual Intelligent Agent
CN105900081A (zh) * 2013-02-19 2016-08-24 Google Inc. Search based on natural language processing
US9830315B1 (en) * 2016-07-13 2017-11-28 Xerox Corporation Sequence-based structured prediction for semantic parsing
CN107844608A (zh) * 2017-12-06 2018-03-27 Hunan University Sentence similarity comparison method based on word vectors
US20180189269A1 (en) * 2016-12-30 2018-07-05 Microsoft Technology Licensing, Llc Graph long short term memory for syntactic relationship discovery

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003223466A (ja) 2002-01-31 2003-08-08 Seiko Epson Corp Patent search device, control method for a patent search device, control program, and recording medium
US10810193B1 (en) * 2013-03-13 2020-10-20 Google Llc Querying a data graph using natural language queries
JP2016110256A (ja) 2014-12-03 2016-06-20 Fuji Xerox Co., Ltd. Information processing apparatus and information processing program
US10095689B2 (en) * 2014-12-29 2018-10-09 International Business Machines Corporation Automated ontology building
US20170075877A1 (en) * 2015-09-16 2017-03-16 Marie-Therese LEPELTIER Methods and systems of handling patent claims
US10078634B2 (en) * 2015-12-30 2018-09-18 International Business Machines Corporation Visualizing and exploring natural-language text
US10891321B2 (en) * 2018-08-28 2021-01-12 American Chemical Society Systems and methods for performing a computer-implemented prior art search

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KAI SHENG TAI, ET AL: "Improved semantic representations from tree-structured long short-term memory networks", ARXIV, vol. 1, 30 May 2015 (2015-05-30), pages 1556-1566, XP055442054, DOI: 10.3115/v1/P15-1150 *

Also Published As

Publication number Publication date
WO2020074786A1 (en) 2020-04-16
EP3864564A1 (en) 2021-08-18
JP7801892B2 (ja) 2026-01-19
JP2022508737A (ja) 2022-01-19
FI20185863A1 (fi) 2020-04-14
US20210350125A1 (en) 2021-11-11

Similar Documents

Publication Publication Date Title
US20240370649A1 (en) Method of training a natural language search system, search system and corresponding use
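CN113196277A (zh) System for searching natural language documents
CN113168499A (zh) Method for searching patent documents
Al-Hroob et al. The use of artificial neural networks for extracting actions and actors from requirements document
Zubrinic et al. The automatic creation of concept maps from documents written using morphologically rich languages
US20230138014A1 (en) System and method for performing a search in a vector space based search engine
US12124802B2 (en) System and method for analyzing similarity of natural language data
CN114840685A (zh) Method for constructing an emergency plan knowledge graph
CN118551046A (zh) Method for enhancing a document processing workflow based on a large language model
CN119396997B (zh) Method and system for real-time data analysis and visualization in a big data environment
CN116108191B (zh) Deep learning model recommendation method based on a knowledge graph
CN111831624A (zh) Data table creation method and apparatus, computer device, and storage medium
Sun A natural language interface for querying graph databases
CN117251567B (zh) Multi-domain knowledge extraction method
CN117829140A (zh) Automatic comparison method and system for rules and regulations
Frasconi et al. Text categorization for multi-page documents: A hybrid naive Bayes HMM approach
CN113392183A (zh) Method for representing and computing children's category graph knowledge
Smrz et al. Information extraction in semantic wikis
Li et al. Single Document Viewpoint Summarization based on Triangle Identification in Dependency Graph
Jiang et al. Effective use of phrases in language modeling to improve information retrieval
CN119886139A (zh) Multi-level, broad-category named entity recognition method for the CO2 catalysis field
Jakubowski et al. Extending FrameNet to Machine Learning Domain.
CN118838997A (zh) Intelligent question answering method and system for an intelligent recruitment platform
CN120821821A (zh) Input text processing method, apparatus, device, storage medium, and program product
CN119272869A (zh) Knowledge extraction method and apparatus for the cigarette primary processing field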

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination