CN112597277A - Document query method and device, storage medium and electronic equipment - Google Patents

Document query method and device, storage medium and electronic equipment

Info

Publication number
CN112597277A
Authority
CN
China
Prior art keywords
phrase
document
node
phrases
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011570077.4A
Other languages
Chinese (zh)
Inventor
俞宣伊
黄荣
刘俊峰
谭文静
孙丽黎
初娜
熊浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN202011570077.4A
Publication of CN112597277A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/3332Query translation
    • G06F16/3338Query expansion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3347Query execution using vector based model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a document query method, a document query device, a storage medium and electronic equipment. A target phrase input by a user is obtained; similar phrases of the target phrase are obtained, and the similar phrases and the target phrase are determined as phrases to be queried; a keyword node corresponding to a phrase to be queried is searched for in a pre-constructed knowledge graph, and when such a keyword node is found, a document node having a direct connection relation with that keyword node is obtained; and the document corresponding to at least one obtained document node is determined as the query result. The invention requires no full-text query and therefore achieves a higher query speed.

Description

Document query method and device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of document query technologies, and in particular, to a document query method, an apparatus, a storage medium, and an electronic device.
Background
With the spread of electronic office work, the number of documents of all kinds keeps growing. Users often need to query for certain documents.
At present, a document query generally performs a full-text search directly in the documents according to the search term input by the user, and outputs a document as a query result when that document includes the search term.
However, full-text queries are slow.
Disclosure of Invention
The embodiment of the invention aims to provide a document query method, a document query device, a storage medium and electronic equipment, so as to improve the query speed. The specific technical scheme is as follows:
a document query method, comprising:
obtaining a target phrase input by a user;
obtaining similar phrases of the target phrase, and determining the similar phrases and the target phrase as phrases to be queried;
querying a keyword node corresponding to the phrase to be queried in a pre-constructed knowledge graph, and when the keyword node corresponding to the phrase to be queried is found, obtaining a document node which has a direct connection relation with the found keyword node;
and determining the document corresponding to at least one obtained document node as a query result.
Optionally, the obtaining of the similar phrases of the target phrase includes:
obtaining a word vector of the target phrase;
and in a preset word vector dictionary of the field corresponding to the target phrase, obtaining a phrase whose word-vector similarity to the target phrase meets a preset similarity requirement, and determining the phrase that meets the preset similarity requirement as a similar phrase of the target phrase.
Optionally, the pre-constructed knowledge graph is a knowledge graph of a field corresponding to the target phrase, and/or the keyword node is located in a document corresponding to a document node having a direct connection relationship with the keyword node.
Optionally, the querying a keyword node corresponding to the phrase to be queried in a pre-constructed knowledge graph, and when the keyword node corresponding to the phrase to be queried is queried, obtaining a document node having a direct connection relationship with the queried keyword node, includes:
using the phrases to be queried to construct a knowledge graph query statement, and executing the knowledge graph query statement, wherein the knowledge graph query statement is used for:
querying a keyword node corresponding to the phrase to be queried in a pre-constructed knowledge graph, and when the keyword node corresponding to the phrase to be queried is found, obtaining a document node which has a direct connection relation with the found keyword node.
Optionally, the process of constructing the pre-constructed knowledge graph includes:
obtaining a plurality of documents;
performing word segmentation processing on the document to obtain a plurality of phrases;
removing stop words in the plurality of phrases;
extracting keywords from the plurality of phrases from which the stop words are removed through a preset keyword extraction algorithm;
establishing a triple according to the inclusion relation between the plurality of documents and the keyword;
and establishing the keyword nodes, the document nodes and the direct connection relation in a knowledge graph according to the triples.
A document querying device, comprising: a target phrase obtaining unit, a similar phrase obtaining unit, a node inquiring unit and a result determining unit,
the target phrase obtaining unit is used for obtaining a target phrase input by a user;
the similar phrase obtaining unit is used for obtaining similar phrases of the target phrase, and determining the similar phrases and the target phrase as phrases to be queried;
the node query unit is used for querying keyword nodes corresponding to the phrases to be queried in a pre-constructed knowledge graph, and when the keyword nodes corresponding to the phrases to be queried are found, obtaining document nodes which have a direct connection relation with the found keyword nodes;
and the result determining unit is used for determining the document corresponding to at least one obtained document node as a query result.
Optionally, the similar phrase obtaining unit obtains a similar phrase of the target phrase, and is specifically configured to:
obtaining a word vector of the target phrase; and in a preset word vector dictionary of the field corresponding to the target phrase, obtaining a phrase whose word-vector similarity to the target phrase meets a preset similarity requirement, and determining the phrase that meets the preset similarity requirement as a similar phrase of the target phrase.
Optionally, the pre-constructed knowledge graph is a knowledge graph of a field corresponding to the target phrase, and/or the keyword node is located in a document corresponding to a document node having a direct connection relationship with the keyword node.
A storage medium having stored thereon a program which, when executed by a processor, implements any of the above-described document querying methods.
An electronic device comprising at least one processor, at least one memory connected with the processor, and a bus; the processor and the memory communicate with each other through the bus; the processor is used for calling the program instructions in the memory to execute any one of the above document query methods.
The document query method, the document query device, the storage medium and the electronic equipment provided by the embodiments of the invention can obtain a target phrase input by a user; obtain similar phrases of the target phrase, and determine the similar phrases and the target phrase as phrases to be queried; query a keyword node corresponding to the phrase to be queried in a pre-constructed knowledge graph, and when that keyword node is found, obtain a document node which has a direct connection relation with the found keyword node; and determine the document corresponding to at least one obtained document node as a query result. The invention requires no full-text query and therefore achieves a higher query speed. Of course, it is not necessary for any product or method practicing the invention to achieve all of the above advantages at the same time.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a document query method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a knowledge-graph provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a knowledge-graph building process according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a Python-based implementation of the document query method according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a document querying device according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, a document query method provided in an embodiment of the present invention may include:
S100, obtaining a target phrase input by a user.
The target phrase input by the user may be one or more phrases. When there are a plurality of phrases, the user may separate them with a separator (e.g., a space, an enumeration comma, a semicolon, a comma, etc.).
Of course, in other embodiments, the present invention may also automatically recognize the content input by the user and segment it to obtain at least one target phrase. Optionally, the invention can segment the content input by the user through word segmentation technology.
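As an illustration of step S100, the following minimal sketch splits raw user input into target phrases on separator symbols. The function name `split_target_phrases` and the exact separator set are assumptions made for this example; the specification only lists spaces, commas, semicolons and similar symbols as possibilities.

```python
import re

# Separator set assumed for illustration: whitespace, ASCII comma/semicolon,
# and their full-width Chinese counterparts, including the enumeration comma.
SEPARATORS = re.compile(r"[\s,;，；、]+")


def split_target_phrases(user_input: str) -> list[str]:
    """Split raw input into target phrases, dropping empty pieces and duplicates."""
    seen = set()
    phrases = []
    for piece in SEPARATORS.split(user_input.strip()):
        if piece and piece not in seen:
            seen.add(piece)
            phrases.append(piece)  # keep first-seen order
    return phrases
```

Because whitespace is treated as a separator here, a multi-word phrase would be split into single words; an embodiment that relies on word segmentation instead would replace this function.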
S200, obtaining similar phrases of the target phrase, and determining the similar phrases and the target phrase as phrases to be queried.
It can be understood that if the query uses only the target phrase input by the user, the query range is narrow and the documents the user needs may not be effectively covered. By querying with both the target phrase and its similar phrases, the invention can find not only documents that include the target phrase but also documents that include its similar phrases, which effectively improves the coverage and accuracy of the query result.
Optionally, obtaining a similar phrase of the target phrase may specifically include:
obtaining a word vector of a target phrase;
and in a preset word vector dictionary of the field corresponding to the target phrase, obtaining a phrase whose word-vector similarity to the target phrase meets a preset similarity requirement, and determining the phrase that meets the similarity requirement as a similar phrase of the target phrase.
Optionally, the invention may obtain the word vector of the target phrase through a Word2vec model. Word2vec is a model for generating word vectors: a neural network is trained on a corpus so that each word is mapped to a vector.
Optionally, the invention can train the Word2vec model to obtain a Word2vec model corresponding to a certain field. Optionally, the present invention may obtain a plurality of corpora in the field (e.g., from encyclopedias, network data, textbooks, and papers in the field), obtain keywords of the field from the obtained corpora, and train the Word2vec model according to the keywords.
Optionally, after the word vectors are obtained, the present invention may convert them into csv format so that they can be loaded as a dictionary at use time, thereby improving access efficiency.
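Loading csv-formatted word vectors as a dictionary might look like the sketch below. The row layout (`phrase,v1,v2,...`) and the name `load_vector_dictionary` are assumptions for illustration; the specification does not state the csv schema.

```python
import csv
import io


def load_vector_dictionary(csv_file) -> dict[str, list[float]]:
    """Read rows of the assumed form `phrase,v1,v2,...` into {phrase: vector}."""
    vectors = {}
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines
        phrase, *components = row
        vectors[phrase] = [float(x) for x in components]
    return vectors


# In practice csv_file would come from open(path, newline="", encoding="utf-8");
# an in-memory sample keeps this sketch self-contained.
sample = io.StringIO("router,0.9,0.1\nswitch,0.8,0.2\n")
word_vectors = load_vector_dictionary(sample)
```

Once loaded, each lookup is a single dictionary access, which is the access-efficiency benefit the paragraph above refers to.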
The preset word vector dictionary can correspond to a field, which effectively improves its pertinence to that field and in turn the query accuracy. The similarity of word vectors may be cosine similarity. The preset similarity requirement may be that the similarity is higher than a preset threshold, or that the similarity ranks within the top N.
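The similarity requirement can be sketched as follows, using cosine similarity with either a threshold or a top-N cut-off. The toy vectors and the function names are hypothetical; a real dictionary would hold vectors produced by a trained Word2vec model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_phrases(target, vectors, threshold=None, top_n=None):
    """Return dictionary phrases similar to `target`, most similar first."""
    if target not in vectors:
        return []  # no vector known for this phrase
    target_vec = vectors[target]
    scored = [
        (phrase, cosine_similarity(target_vec, vec))
        for phrase, vec in vectors.items()
        if phrase != target
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    if threshold is not None:
        scored = [item for item in scored if item[1] > threshold]
    if top_n is not None:
        scored = scored[:top_n]
    return [phrase for phrase, _ in scored]


def phrases_to_query(target, vectors, **kwargs):
    """Step S200: the target phrase plus its similar phrases."""
    return [target] + similar_phrases(target, vectors, **kwargs)


# Toy two-dimensional vectors, invented for this example.
toy_vectors = {"router": [1.0, 0.0], "switch": [0.9, 0.1], "loan": [0.0, 1.0]}
```

With these toy vectors, "switch" lies close to "router" while "loan" is orthogonal to it, so a threshold of 0.8 keeps only "switch".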
S300, searching keyword nodes corresponding to the phrases to be searched in the pre-constructed knowledge graph, and when the keyword nodes corresponding to the phrases to be searched are searched, obtaining document nodes which have direct connection relation with the searched keyword nodes.
The pre-constructed knowledge graph of the invention can comprise document nodes and keyword nodes. FIG. 2 is a schematic diagram of a knowledge graph according to an embodiment of the present invention. As shown in FIG. 2, document nodes and keyword nodes may have a direct connection relationship, shown by the arrows in FIG. 2, such as the arrow between the node corresponding to the first document and the node corresponding to the first keyword. The meaning of the arrow is: the first keyword is located in the first document. Optionally, one document node may be connected to a plurality of keyword nodes, and one keyword node may also be connected to a plurality of document nodes.
The invention can embody the relation between documents and keywords through the knowledge graph. Meanwhile, the knowledge graph is highly extensible: it can be expanded by adding keyword nodes, document nodes and connection relations.
Optionally, document nodes of the knowledge graph of the present invention may have connection relationships between them, and so may keyword nodes. The connection relationships between document nodes can be of various kinds, such as document similarity relationships, document inclusion relationships, and the like. The connection relationships between keyword nodes can also be of various kinds, such as similar-meaning relationships, opposite-meaning relationships, meaning inclusion relationships, and the like.
The querying of the keyword node corresponding to the phrase to be queried in the pre-constructed knowledge graph may include:
and searching keywords similar to the phrase to be inquired in keywords corresponding to the keyword nodes contained in the pre-constructed knowledge graph, and determining the keyword nodes corresponding to the searched keywords as the keyword nodes corresponding to the phrase to be inquired.
Optionally, the pre-constructed knowledge graph is a knowledge graph of a field corresponding to the target phrase, and/or the keyword node is located in a document corresponding to a document node having a direct connection relationship with the keyword node.
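The lookup described above can be illustrated with a toy in-memory graph. This is only a sketch under stated assumptions: the class `KnowledgeGraph` and its methods are hypothetical, and keyword matching is done by exact string equality, whereas an embodiment may match on similarity.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy bipartite graph: keyword nodes directly connected to document nodes."""

    def __init__(self):
        self._docs_by_keyword = defaultdict(set)
        self._keywords_by_doc = defaultdict(set)

    def add_edge(self, document, keyword):
        """Record that `keyword` is located in `document`."""
        self._docs_by_keyword[keyword].add(document)
        self._keywords_by_doc[document].add(keyword)

    def documents_for(self, phrases):
        """Documents directly connected to any keyword node matching a phrase."""
        result = set()
        for phrase in phrases:
            # .get avoids creating empty entries for phrases with no keyword node
            result |= self._docs_by_keyword.get(phrase, set())
        return sorted(result)


kg = KnowledgeGraph()
kg.add_edge("doc1", "router")
kg.add_edge("doc2", "router")
kg.add_edge("doc2", "switch")
kg.add_edge("doc3", "loan")
```

Each query touches only the keyword index, never the document text, which is why no full-text scan is needed.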
Optionally, step S300 may include:
using the phrases to be queried to construct a knowledge graph query statement and executing the knowledge graph query statement, wherein the knowledge graph query statement is used for:
querying a keyword node corresponding to the phrase to be queried in a pre-constructed knowledge graph, and when the keyword node corresponding to the phrase to be queried is found, obtaining a document node which has a direct connection relation with the found keyword node.
Optionally, the knowledge graph query statement in the present invention may be a SPARQL query statement.
SPARQL is a query language and data access protocol for RDF data that can be used for knowledge graphs; it provides functionality similar to SQL statements and supports a variety of operations based on graph pattern matching.
Through the SPARQL query statement, the method and the system can quickly find the corresponding keyword node, so that the corresponding document node is found.
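Constructing such a query statement from the phrases to be queried could be sketched as below. The namespace `http://example.org/kg#` and the predicates `kg:label` and `kg:contains` are hypothetical; the specification does not disclose the vocabulary used in the graph.

```python
def escape_literal(text: str) -> str:
    """Escape characters that would break a double-quoted SPARQL string literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def build_query(phrases, prefix="http://example.org/kg#"):
    """Build a SELECT query returning documents connected to matching keyword nodes.

    Assumes keyword nodes carry a plain-literal kg:label and that document
    nodes point to keyword nodes via kg:contains (both names are invented).
    """
    values = " ".join(f'"{escape_literal(p)}"' for p in phrases)
    return (
        f"PREFIX kg: <{prefix}>\n"
        "SELECT DISTINCT ?doc WHERE {\n"
        f"  VALUES ?label {{ {values} }}\n"
        "  ?keyword kg:label ?label .\n"
        "  ?doc kg:contains ?keyword .\n"
        "}"
    )
```

Using a single `VALUES` clause lets one statement cover the target phrase and all of its similar phrases, so only one round trip to the graph store is needed.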
S400, determining the document corresponding to at least one obtained document node as a query result.
The invention queries through the target phrase and the similar phrases without full-text retrieval, thereby effectively improving the query speed.
A search word may be only semantically similar to the keywords present in a document without matching them character for character. Calculating term similarity effectively supports such fuzzy queries, and training on a large corpus allows most likely query terms to be covered.
As shown in FIG. 3, an embodiment of the present invention further provides a process for constructing the pre-constructed knowledge graph, which may include:
S001, obtaining a plurality of documents;
S002, performing word segmentation processing on the documents to obtain a plurality of phrases;
S003, removing stop words from the plurality of phrases;
S004, extracting keywords from the plurality of phrases from which the stop words have been removed through a preset keyword extraction algorithm;
The preset keyword extraction algorithm may be: TF-IDF (term frequency-inverse document frequency), the document topic generation model LDA (Latent Dirichlet Allocation), TextRank, etc.
S005, establishing triples according to the inclusion relation between the plurality of documents and the keywords;
S006, establishing the keyword nodes, the document nodes and the direct connection relation in the knowledge graph according to the triples.
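Steps S001 to S005 can be sketched end to end as follows. Several simplifications are assumed: a regular-expression tokenizer stands in for a real word segmenter, the stop-word set is a tiny illustrative list, keywords are scored with one common smoothed TF-IDF variant, and the relation name "contains" is invented for the triples.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "is", "to", "in"}  # illustrative only


def tokenize(text):
    """Stand-in for word segmentation: lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


def extract_keywords(documents, top_k=2):
    """Return the top_k TF-IDF terms per document, given {name: text}."""
    tokens_by_doc = {
        name: [t for t in tokenize(text) if t not in STOP_WORDS]
        for name, text in documents.items()
    }
    doc_freq = Counter()
    for tokens in tokens_by_doc.values():
        doc_freq.update(set(tokens))  # count each term once per document
    n_docs = len(documents)
    keywords = {}
    for name, tokens in tokens_by_doc.items():
        counts = Counter(tokens)
        scores = {
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        }
        # highest score first; ties broken alphabetically for determinism
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords[name] = ranked[:top_k]
    return keywords


def build_triples(keywords):
    """Step S005: one (document, relation, keyword) triple per inclusion."""
    return [(doc, "contains", kw) for doc, kws in keywords.items() for kw in kws]


docs = {
    "doc1": "The router forwards packets and the router selects routes",
    "doc2": "The loan contract is a loan of money",
}
triples = build_triples(extract_keywords(docs))
```

Each resulting triple corresponds to one arrow between a document node and a keyword node, so step S006 amounts to loading the triples into the graph store.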
FIG. 4 is a schematic diagram of an alternative, Python-based implementation of the document query method according to an embodiment of the present invention.
In FIG. 4, the text in both dashed boxes needs to be preprocessed. The jieba tool is used to perform Chinese word segmentation on the corpus text, and jieba can be configured so that certain words are not split during segmentation, for example yielding "routing switch/protocol" instead of "routing/switch/protocol". The Word2vec text preprocessing stage does not need to handle stop words, while the LDA text preprocessing stage needs to remove stop words (high-frequency words without real meaning, such as conjunctions or modal particles) by comparison against a common Chinese stop-word list. In the Python script, the n words whose vectors are most similar to the input word are found through the word vector dictionary, and the keywords among them are then identified by referring to the keyword set table. These keywords are then used to create a SPARQL query statement, which queries the Fuseki database and returns the results.
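The final step, sending the SPARQL statement to Fuseki and reading back document identifiers, might be sketched as below using only the standard library. The endpoint URL `http://localhost:3030/docs/query` (including the dataset name `docs`) and the result variable `doc` are assumptions; the request follows the standard SPARQL protocol form of a POST with a URL-encoded `query` parameter, and the parser expects the standard SPARQL JSON results layout.

```python
import urllib.parse
import urllib.request


def build_fuseki_request(endpoint: str, sparql: str) -> urllib.request.Request:
    """Prepare (but do not send) a SPARQL query as a URL-encoded POST."""
    body = urllib.parse.urlencode({"query": sparql}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,  # supplying data makes urllib issue a POST
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
    )


def documents_from_results(results: dict, var: str = "doc") -> list[str]:
    """Extract the values bound to `var` from a SPARQL JSON results document."""
    return [
        binding[var]["value"]
        for binding in results["results"]["bindings"]
        if var in binding
    ]


# Hypothetical endpoint; nothing is sent until urlopen() is called on the request.
request = build_fuseki_request(
    "http://localhost:3030/docs/query", "SELECT ?doc WHERE { ?doc ?p ?o }"
)
```

To execute the query against a running server, the prepared request would be passed to `urllib.request.urlopen` and the response body decoded with `json.load` before calling `documents_from_results`.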
Corresponding to the document query method, the embodiment of the invention also provides a document query device.
As shown in FIG. 5, a document querying device provided in an embodiment of the present invention may include: a target phrase obtaining unit 100, a similar phrase obtaining unit 200, a node query unit 300 and a result determination unit 400,
a target phrase obtaining unit 100, configured to obtain a target phrase input by a user;
a similar phrase obtaining unit 200, configured to obtain a similar phrase of the target phrase, and determine the similar phrase and the target phrase as a phrase to be queried;
optionally, the similar phrase obtaining unit 200 obtains the similar phrase of the target phrase, and may be specifically configured to:
obtaining a word vector of the target phrase; and in a preset word vector dictionary of the field corresponding to the target phrase, obtaining a phrase whose word-vector similarity to the target phrase meets a preset similarity requirement, and determining the phrase that meets the similarity requirement as a similar phrase of the target phrase.
The node query unit 300 is configured to query a keyword node corresponding to a phrase to be queried in a pre-constructed knowledge graph, and when the keyword node corresponding to the phrase to be queried is queried, obtain a document node having a direct connection relationship with the queried keyword node;
and a result determining unit 400, configured to determine the document corresponding to at least one obtained document node as a query result.
Optionally, the pre-constructed knowledge graph is a knowledge graph of a field corresponding to the target phrase, and/or the keyword node is located in a document corresponding to a document node having a direct connection relationship with the keyword node.
Optionally, the node querying unit 300 may specifically be configured to:
using the phrases to be queried to construct a knowledge graph query statement and executing the knowledge graph query statement, wherein the knowledge graph query statement is used for:
querying a keyword node corresponding to the phrase to be queried in a pre-constructed knowledge graph, and when the keyword node corresponding to the phrase to be queried is found, obtaining a document node which has a direct connection relation with the found keyword node.
Optionally, the document querying device shown in FIG. 5 may further include a graph construction unit for constructing the knowledge graph, which comprises: a first obtaining unit, a word segmentation unit, a word removing unit, an extraction unit, a triple establishing unit and a node establishing unit,
the first obtaining unit is used for obtaining a plurality of documents;
the word segmentation unit is used for performing word segmentation processing on the documents to obtain a plurality of phrases;
the word removing unit is used for removing stop words from the plurality of phrases;
the extraction unit is used for extracting keywords from the plurality of phrases from which the stop words have been removed through a preset keyword extraction algorithm;
the triple establishing unit is used for establishing triples according to the inclusion relation between the plurality of documents and the keywords;
and the node establishing unit is used for establishing the keyword nodes, the document nodes and the direct connection relation in the knowledge graph according to the triples.
The document query device includes a processor and a memory. The target phrase obtaining unit 100, the similar phrase obtaining unit 200, the node query unit 300, the result determining unit 400, and the like are all stored in the memory as program units, and the processor executes the program units stored in the memory to implement the corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided, and documents are queried by adjusting kernel parameters.
An embodiment of the present invention provides a storage medium on which a program is stored, the program implementing the document query method when executed by a processor.
The embodiment of the invention provides a processor, which is used for running a program, wherein the document query method is executed when the program runs.
As shown in FIG. 6, an embodiment of the present invention provides an electronic device 70, where the electronic device 70 includes at least one processor 701, at least one memory 702 connected to the processor 701, and a bus 703; the processor 701 and the memory 702 communicate with each other through the bus 703; the processor 701 is configured to call program instructions in the memory 702 to perform the document query method described above. The device herein may be a server, a PC, a tablet, a mobile phone, etc.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the steps of the document query method described above.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a device includes one or more processors (CPUs), memory, and a bus. The device may also include input/output interfaces, network interfaces, and the like.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip. The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All embodiments in this specification are described in a related manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiment is substantially similar to the method embodiment, it is described briefly; for relevant points, refer to the corresponding description of the method embodiment.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A document query method, comprising:
obtaining a target phrase input by a user;
obtaining similar phrases of the target phrase, and determining the similar phrases and the target phrase as phrases to be queried;
querying a keyword node corresponding to the phrase to be queried in a pre-constructed knowledge graph, and when the keyword node corresponding to the phrase to be queried is queried, obtaining a document node having a direct connection relationship with the queried keyword node;
and determining the document corresponding to at least one obtained document node as a query result.
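As an illustration only, the flow of claim 1 can be sketched in a few lines of Python. The sketch assumes a hypothetical in-memory stand-in for the knowledge graph (a mapping from keyword node to directly connected document nodes) and a hypothetical similar-phrase lookup table; the names `query_documents`, `GRAPH`, and `SIMILAR` are not part of the application.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for the pre-constructed knowledge graph:
# each keyword node maps to the document nodes directly connected to it.
GRAPH: Dict[str, Set[str]] = {
    "loan": {"doc_a", "doc_b"},
    "credit": {"doc_b", "doc_c"},
}

# Hypothetical similar-phrase table (claim 2 describes one way to derive it).
SIMILAR: Dict[str, List[str]] = {"loan": ["credit"]}


def query_documents(
    target_phrase: str,
    graph: Dict[str, Set[str]],
    similar: Dict[str, List[str]],
) -> Set[str]:
    """Return the documents linked to the target phrase or its similar phrases."""
    # The target phrase and its similar phrases form the phrases to be queried.
    phrases_to_query = [target_phrase] + similar.get(target_phrase, [])
    result: Set[str] = set()
    for phrase in phrases_to_query:
        # Only phrases that match a keyword node contribute documents.
        if phrase in graph:
            result |= graph[phrase]
    return result
```

A phrase with no matching keyword node simply contributes nothing, so an unknown target phrase yields an empty result in this sketch.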
2. The method of claim 1, wherein the obtaining similar phrases of the target phrase comprises:
obtaining a word vector of the target phrase;
and obtaining, from a preset word vector dictionary of the domain corresponding to the target phrase, a phrase whose word vector has a similarity to the word vector of the target phrase that meets a preset similarity requirement, and determining the phrase that meets the preset similarity requirement as a similar phrase of the target phrase.
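A minimal sketch of the similar-phrase lookup in claim 2 follows. It assumes cosine similarity as the similarity measure and a fixed threshold as the "preset similarity requirement"; the claim names neither, so both are illustrative choices, as are the function names and the `threshold` default.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_phrases(
    target: str,
    dictionary: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # assumed form of the preset similarity requirement
) -> List[str]:
    """Return dictionary phrases whose vectors are close enough to the target's."""
    if target not in dictionary:
        return []
    target_vec = dictionary[target]
    return [
        phrase
        for phrase, vec in dictionary.items()
        if phrase != target and cosine_similarity(target_vec, vec) >= threshold
    ]
```

In practice the dictionary would hold vectors trained on a domain-specific corpus; the sketch only shows the comparison step.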
3. The method according to claim 1, wherein the pre-constructed knowledge graph is a knowledge graph of a domain corresponding to the target phrase, and/or the keyword node is located in a document corresponding to a document node having a direct connection relationship with the keyword node.
4. The method according to claim 1, wherein the querying a keyword node corresponding to the phrase to be queried in a pre-constructed knowledge graph, and when the keyword node corresponding to the phrase to be queried is queried, obtaining a document node having a direct connection relationship with the queried keyword node, comprises:
using the phrases to be queried to construct a knowledge graph query statement, and executing the knowledge graph query statement, wherein the knowledge graph query statement is used for:
querying a keyword node corresponding to the phrase to be queried in a pre-constructed knowledge graph, and when the keyword node corresponding to the phrase to be queried is queried, obtaining a document node having a direct connection relationship with the queried keyword node.
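The construction of a knowledge graph query statement in claim 4 might look like the sketch below. It assumes the graph is stored in a database that accepts Cypher, and it assumes hypothetical node labels `Document` and `Keyword`, a relationship type `CONTAINS`, and a `name` property; none of these are specified by the claim. The statement is only built here, not executed.

```python
from typing import Dict, List, Tuple


def build_query(phrases: List[str]) -> Tuple[str, Dict[str, List[str]]]:
    """Build a parameterized Cypher statement for the phrases to be queried.

    The single-hop pattern expresses the "direct connection relationship"
    between a keyword node and a document node.
    """
    statement = (
        "MATCH (d:Document)-[:CONTAINS]->(k:Keyword) "
        "WHERE k.name IN $phrases "
        "RETURN DISTINCT d.name AS document"
    )
    # Passing the phrases as a parameter avoids splicing user input
    # into the statement text.
    return statement, {"phrases": list(phrases)}
```

Keeping the phrases in a parameter rather than in the statement text is a design choice of the sketch: the same statement can be reused for every query, and user input cannot alter its structure.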
5. The method of claim 1, wherein the pre-constructed knowledge-graph is constructed by a process comprising:
obtaining a plurality of documents;
performing word segmentation processing on the document to obtain a plurality of phrases;
removing stop words in the plurality of phrases;
extracting keywords from the plurality of phrases from which the stop words are removed through a preset keyword extraction algorithm;
establishing triples according to the inclusion relationship between the plurality of documents and the keywords;
and establishing the keyword nodes, the document nodes and the direct connection relation in a knowledge graph according to the triples.
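The construction steps of claim 5 can be sketched as follows. Whitespace splitting stands in for word segmentation, a small hard-coded stop-word set stands in for a stop-word list, and term frequency stands in for the "preset keyword extraction algorithm"; all three are simplifying assumptions, and every name in the block is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Assumed stop-word list; a real system would load a fuller one.
STOP_WORDS = {"the", "a", "of", "and", "is", "for"}


def extract_keywords(text: str, top_k: int = 2) -> List[str]:
    """Segment, drop stop words, and keep the top_k most frequent phrases."""
    tokens = text.lower().split()  # stand-in for word segmentation
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [word for word, _ in Counter(tokens).most_common(top_k)]


def build_triples(
    documents: Dict[str, str], top_k: int = 2
) -> List[Tuple[str, str, str]]:
    """Create (document, "contains", keyword) triples from inclusion relations."""
    triples = []
    for doc_id, text in documents.items():
        for keyword in extract_keywords(text, top_k):
            triples.append((doc_id, "contains", keyword))
    return triples


def build_graph(triples: List[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Map each keyword node to the document nodes directly connected to it."""
    graph: Dict[str, Set[str]] = {}
    for doc_id, _relation, keyword in triples:
        graph.setdefault(keyword, set()).add(doc_id)
    return graph
```

For example, `build_graph(build_triples({"doc_a": "loan loan loan rate rate policy"}))` links the keyword nodes `loan` and `rate` to `doc_a`.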
6. A document querying device, comprising: a target phrase obtaining unit, a similar phrase obtaining unit, a node inquiring unit and a result determining unit,
the target phrase obtaining unit is used for obtaining a target phrase input by a user;
the similar phrase obtaining unit is used for obtaining similar phrases of the target phrase, and determining the similar phrases and the target phrase as phrases to be queried;
the node query unit is used for querying keyword nodes corresponding to the phrases to be queried in a pre-constructed knowledge graph, and when the keyword nodes corresponding to the phrases to be queried are queried, obtaining document nodes which have direct connection relation with the queried keyword nodes;
and the result determining unit is used for determining the document corresponding to at least one obtained document node as a query result.
7. The document querying device according to claim 6, wherein the similar phrase obtaining unit obtains the similar phrases of the target phrase, and is specifically configured to:
obtaining a word vector of the target phrase; and obtaining, from a preset word vector dictionary of the domain corresponding to the target phrase, a phrase whose word vector has a similarity to the word vector of the target phrase that meets a preset similarity requirement, and determining the phrase that meets the preset similarity requirement as a similar phrase of the target phrase.
8. The apparatus according to claim 6, wherein the pre-constructed knowledge graph is a knowledge graph of a domain corresponding to the target phrase, and/or the keyword node is located in a document corresponding to a document node having a direct connection relationship with the keyword node.
9. A storage medium on which a program is stored, the program implementing the document query method of any one of claims 1 to 5 when executed by a processor.
10. An electronic device comprising at least one processor, and at least one memory and a bus connected with the processor; the processor and the memory communicate with each other through the bus; the processor is configured to invoke program instructions in the memory to perform the document query method of any one of claims 1 to 5.
CN202011570077.4A 2020-12-26 2020-12-26 Document query method and device, storage medium and electronic equipment Pending CN112597277A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011570077.4A CN112597277A (en) 2020-12-26 2020-12-26 Document query method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011570077.4A CN112597277A (en) 2020-12-26 2020-12-26 Document query method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN112597277A true CN112597277A (en) 2021-04-02

Family

ID=75202555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011570077.4A Pending CN112597277A (en) 2020-12-26 2020-12-26 Document query method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112597277A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113448983A (en) * 2021-07-15 2021-09-28 中国银行股份有限公司 Knowledge point processing method, knowledge point processing device, knowledge point processing server, knowledge point processing medium and knowledge point processing product

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012027788A (en) * 2010-07-26 2012-02-09 Fyuutorekku:Kk Document retrieval system, document retrieval method, and program
CN107346325A (en) * 2016-05-04 2017-11-14 中国石油集团长城钻探工程有限公司 Information query method and device
CN108427686A (en) * 2017-02-15 2018-08-21 北京国双科技有限公司 Text data querying method and device
CN108537599A (en) * 2018-04-17 2018-09-14 北京三快在线科技有限公司 Query feedback method, apparatus and storage medium based on keyword polymerization
US20190012405A1 (en) * 2017-07-10 2019-01-10 International Business Machines Corporation Unsupervised generation of knowledge learning graphs
CN109359178A (en) * 2018-09-14 2019-02-19 华南师范大学 A kind of search method, device, storage medium and equipment
CN110119473A (en) * 2019-05-23 2019-08-13 北京金山数字娱乐科技有限公司 A kind of construction method and device of file destination knowledge mapping
CN110147437A (en) * 2019-05-23 2019-08-20 北京金山数字娱乐科技有限公司 A kind of searching method and device of knowledge based map
CN110209827A (en) * 2018-02-07 2019-09-06 腾讯科技(深圳)有限公司 Searching method, device, computer readable storage medium and computer equipment
CN110717042A (en) * 2019-09-24 2020-01-21 北京工商大学 Method for constructing document-keyword heterogeneous network model
CN111209411A (en) * 2020-01-03 2020-05-29 北京明略软件系统有限公司 Document analysis method and device
CN111611356A (en) * 2019-02-25 2020-09-01 北京嘀嘀无限科技发展有限公司 Information searching method and device, electronic equipment and readable storage medium
CN111625621A (en) * 2020-04-27 2020-09-04 中国铁道科学研究院集团有限公司电子计算技术研究所 Document retrieval method and device, electronic equipment and storage medium

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012027788A (en) * 2010-07-26 2012-02-09 Fyuutorekku:Kk Document retrieval system, document retrieval method, and program
CN107346325A (en) * 2016-05-04 2017-11-14 中国石油集团长城钻探工程有限公司 Information query method and device
CN108427686A (en) * 2017-02-15 2018-08-21 北京国双科技有限公司 Text data querying method and device
US20190012405A1 (en) * 2017-07-10 2019-01-10 International Business Machines Corporation Unsupervised generation of knowledge learning graphs
CN110209827A (en) * 2018-02-07 2019-09-06 腾讯科技(深圳)有限公司 Searching method, device, computer readable storage medium and computer equipment
CN108537599A (en) * 2018-04-17 2018-09-14 北京三快在线科技有限公司 Query feedback method, apparatus and storage medium based on keyword polymerization
CN109359178A (en) * 2018-09-14 2019-02-19 华南师范大学 A kind of search method, device, storage medium and equipment
CN111611356A (en) * 2019-02-25 2020-09-01 北京嘀嘀无限科技发展有限公司 Information searching method and device, electronic equipment and readable storage medium
CN110119473A (en) * 2019-05-23 2019-08-13 北京金山数字娱乐科技有限公司 A kind of construction method and device of file destination knowledge mapping
CN110147437A (en) * 2019-05-23 2019-08-20 北京金山数字娱乐科技有限公司 A kind of searching method and device of knowledge based map
CN110717042A (en) * 2019-09-24 2020-01-21 北京工商大学 Method for constructing document-keyword heterogeneous network model
CN111209411A (en) * 2020-01-03 2020-05-29 北京明略软件系统有限公司 Document analysis method and device
CN111625621A (en) * 2020-04-27 2020-09-04 中国铁道科学研究院集团有限公司电子计算技术研究所 Document retrieval method and device, electronic equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113448983A (en) * 2021-07-15 2021-09-28 中国银行股份有限公司 Knowledge point processing method, knowledge point processing device, knowledge point processing server, knowledge point processing medium and knowledge point processing product
CN113448983B (en) * 2021-07-15 2024-01-30 中国银行股份有限公司 Knowledge point processing method, device, server, medium and product

Similar Documents

Publication Publication Date Title
CN109284363B (en) Question answering method and device, electronic equipment and storage medium
CN107168954B (en) Text keyword generation method and device, electronic equipment and readable storage medium
US9619571B2 (en) Method for searching related entities through entity co-occurrence
CN111159363A (en) Knowledge base-based question answer determination method and device
KR101686068B1 (en) Method and system for answer extraction using conceptual graph matching
WO2015143239A1 (en) Providing search recommendation
CN106874441A (en) Intelligent answer method and apparatus
WO2013078307A1 (en) Image searching
US20190012300A1 (en) Rule matching method and device
US11120214B2 (en) Corpus generating method and apparatus, and human-machine interaction processing method and apparatus
CN108875065B (en) Indonesia news webpage recommendation method based on content
CN110019669B (en) Text retrieval method and device
CN110362593B (en) Data query method, device, equipment and storage medium
CN112115232A (en) Data error correction method and device and server
WO2018058118A1 (en) Method, apparatus and client of processing information recommendation
US11699435B2 (en) System and method to interpret natural language requests and handle natural language responses in conversation
CN110532371B (en) Full-text retrieval method and device based on configuration management database and electronic equipment
CN116150306A (en) Training method of question-answering robot, question-answering method and device
CN113609847B (en) Information extraction method, device, electronic equipment and storage medium
CN112597277A (en) Document query method and device, storage medium and electronic equipment
CN113392305A (en) Keyword extraction method and device, electronic equipment and computer storage medium
US20170124090A1 (en) Method of discovering and exploring feature knowledge
CN112231513A (en) Learning video recommendation method, device and system
CN117112595A (en) Information query method and device, electronic equipment and storage medium
CN109684357B (en) Information processing method and device, storage medium and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination