CN111339239B - Knowledge retrieval method and device, storage medium and server - Google Patents

Knowledge retrieval method and device, storage medium and server

Info

Publication number
CN111339239B
Authority
CN
China
Prior art keywords
knowledge
retrieval
user
intention
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910510211.2A
Other languages
Chinese (zh)
Other versions
CN111339239A (en)
Inventor
胡崇海
熊友根
王洪涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Haitong Securities Co ltd
Original Assignee
Haitong Securities Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Haitong Securities Co ltd
Priority to CN201910510211.2A
Publication of CN111339239A
Application granted
Publication of CN111339239B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/322 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing

Abstract

A knowledge retrieval method and device, a storage medium and a server are provided, wherein the knowledge retrieval method comprises the following steps: receiving input information of a user; identifying the retrieval intention of the user according to the input information to obtain retrieval intention points, wherein the retrieval intention points are determined according to knowledge in a knowledge base; recombining and fusing the retrieval intention points of the user to obtain a retrieval intention point combination which accords with the retrieval intention of the user; and retrieving according to the retrieval intention point combination, and outputting a retrieval result. With the technical scheme of the invention, a better retrieval result can be provided in a small corpus scene.

Description

Knowledge retrieval method and device, storage medium and server
Technical Field
The invention relates to the technical field of big data, in particular to a knowledge retrieval method and device, a storage medium and a server.
Background
Text knowledge retrieval is generally divided into two scenarios: large corpus retrieval and small corpus retrieval. Professional fields contain a large amount of text knowledge, and many retrieval scenarios in these fields involve specialized text knowledge but only a small amount of it. At present, knowledge retrieval in such scenarios still relies mainly on traditional retrieval technologies such as distributed search (Elasticsearch, ES) and full-text retrieval (Solr). These tools cannot identify user intention and can only provide keyword-based retrieval, so retrieval quality is limited.
Existing intelligent retrieval systems are mainly applied to large corpus scenes. They adopt neural network algorithms, which must be trained on a large amount of corpora to obtain a high-quality retrieval model, so they cannot be applied to text retrieval under a small corpus (for example, within 1 million knowledge points). Existing retrieval tools aimed at the small corpus scene deliver unsatisfactory retrieval results.
Therefore, knowledge retrieval methods for the small corpus scene need further research.
Disclosure of Invention
The invention solves the technical problem of providing a more ideal retrieval result under a small corpus scene.
To solve the above technical problem, an embodiment of the present invention provides a knowledge retrieval method, including: receiving input information of a user; identifying the retrieval intention of the user according to the input information to obtain a retrieval intention point, wherein the retrieval intention point is determined according to knowledge in a knowledge base; recombining and fusing each retrieval intention point of the user to obtain a retrieval intention point combination which accords with the retrieval intention of the user; and searching according to the searching intention point combination, and outputting a searching result.
Optionally, the recombining and fusing the retrieval intention points of the user includes: and recombining and fusing each retrieval intention point of the user based on graph theory or decision tree algorithm.
Optionally, the identifying the retrieval intention of the user according to the input information includes: performing word segmentation on the input information according to word vectors and term frequency-inverse document frequency (TF-IDF), so as to identify the retrieval intention of the user according to the word segmentation result of the input information.
Optionally, the outputting the retrieval result includes: outputting the retrieval results in descending order of their matching degree with the retrieval intention point combination; or outputting the retrieval results in order of their occurrence time, from newest to oldest.
Optionally, the retrieving according to the retrieval intention point combination includes: and retrieving the retrieval intention point combination based on a knowledge element base, wherein the knowledge element base is constructed by a plurality of knowledge elements, and each knowledge element is obtained by segmenting paragraphs and/or clauses of a knowledge source.
Optionally, the knowledge base includes a plurality of pieces of knowledge, each piece of knowledge is extracted from a knowledge element, and the knowledge and the knowledge element have an association relationship.
Optionally, the knowledge is extracted from the knowledge element by the following steps: carrying out word division on the knowledge elements to obtain a plurality of word blocks; and for the plurality of word blocks, calculating mutual information and left-right information entropy of each word block by using a word window, and cleaning the plurality of word blocks at least according to a calculation result to obtain the knowledge.
Optionally, the cleaning the word blocks according to at least the calculation result to obtain the knowledge includes: sequencing the word blocks according to the sequence of the calculation results from large to small, and taking a preset number of word blocks sequenced at the top as knowledge to be cleaned; and checking and eliminating the knowledge to be cleaned based on the knowledge in the knowledge base to obtain at least one knowledge.
In order to solve the above technical problem, an embodiment of the present invention further provides a knowledge retrieval apparatus, including: the receiving module is suitable for receiving input information of a user; the identification module is suitable for identifying the retrieval intention of the user according to the input information to obtain retrieval intention points, and the retrieval intention points are determined according to knowledge in a knowledge base; the fusion module is suitable for recombining and fusing all the retrieval intention points of the user to obtain a retrieval intention point combination which accords with the retrieval intention of the user; and the retrieval module is suitable for retrieving according to the retrieval intention point combination and outputting a retrieval result.
To solve the above technical problem, an embodiment of the present invention further provides a storage medium having stored thereon computer instructions, where the computer instructions execute the steps of the above method when executed.
In order to solve the above technical problem, an embodiment of the present invention further provides a server, including a memory and a processor, where the memory stores computer instructions executable on the processor, and the processor executes the computer instructions to perform the steps of the above method.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
the embodiment of the invention provides a knowledge retrieval method, which comprises the following steps: receiving input information of a user; identifying the retrieval intention of the user according to the input information to obtain a retrieval intention point, wherein the retrieval intention point is determined according to knowledge in a knowledge base; recombining and fusing each retrieval intention point of the user to obtain a retrieval intention point combination which accords with the retrieval intention of the user; and searching according to the searching intention point combination, and outputting a searching result. After receiving input information of a user, the embodiment of the invention can identify the intention based on the input information to determine the retrieval intention of the user, thereby obtaining a retrieval intention point and a retrieval intention point combination which are more in line with the expected search of the user. In a small corpus scene, a retrieval result which meets the retrieval requirements of users and has high retrieval quality is easier to obtain.
Further, the recombining and fusing the retrieval intention points of the user comprises: and recombining and fusing each retrieval intention point of the user based on graph theory or decision tree algorithm. The embodiment of the invention further provides a graph theory or decision tree-based retrieval intention point combination scheme, which is favorable for obtaining a retrieval result which meets the retrieval requirements of users and has higher retrieval quality and is favorable for improving the retrieval experience of the users.
Further, the retrieving according to the retrieval intention point combination comprises: and retrieving the retrieval intention point combination based on a knowledge element base, wherein the knowledge element base is constructed by a plurality of knowledge elements, and each knowledge element is obtained by segmenting paragraphs and/or clauses of a knowledge source. By the technical scheme provided by the embodiment of the invention, the retrieval intention point combination can be retrieved based on the knowledge element library constructed by the knowledge elements, and the probability of obtaining an accurate retrieval result in a small corpus environment is further improved.
Further, the knowledge base comprises a plurality of pieces of knowledge, the knowledge is extracted from the knowledge element, and the knowledge and the knowledge element have an association relationship. The embodiment of the invention is built on knowledge elements and knowledge that have been fully knowledge-processed: each piece of knowledge has at least part of the characteristics of the knowledge element associated with it, and retrieving based on these characteristics can speed up retrieval and further improve the accuracy of the retrieval result.
Drawings
FIG. 1 is a flow diagram of a knowledge retrieval method of an embodiment of the invention;
FIG. 2 is a schematic diagram of a knowledge retrieval architecture system according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a knowledge retrieval apparatus according to an embodiment of the present invention.
Detailed Description
As those skilled in the art will appreciate, and as described in the Background, existing technical solutions are either suited only to the large corpus scene or produce insufficiently accurate retrieval results in the small corpus scene, which degrades the retrieval experience of the user.
The large corpus retrieval scenario has a large number of corpora available for retrieval; for example, the Baidu search application belongs to the large corpus retrieval scenario. In this scenario, a large number of intelligent algorithms are used and achieve good results.
Existing intelligent retrieval systems usually adopt neural network algorithms, which must be trained on a large amount of corpora to obtain a high-quality retrieval model, so they are mainly applied to large corpus scenes. Because the corpus in a small corpus scene is limited, a neural network cannot be adequately trained, and traditional neural network techniques are difficult to use for intelligent retrieval. Existing intelligent retrieval systems therefore cannot be applied to text retrieval under a small corpus (for example, within 1 million knowledge points).
In text retrieval under a small corpus, conventional retrieval tools such as distributed search (Elasticsearch, ES for short) and full-text retrieval (Solr) are mainly used at present. These tools mainly work by directly establishing labels and building an inverted index over them. Retrieval through a label inverted index considers only the keywords the user supplies and ignores the text characteristics of the knowledge elements; it also cannot perform Natural Language Processing (NLP) such as user intention identification. As a result, the retrieval effect and retrieval quality fall short of what intelligent retrieval requires.
To solve the above technical problem, an embodiment of the present invention provides a knowledge retrieval method, including: receiving input information of a user; identifying the retrieval intention of the user according to the input information to obtain a retrieval intention point, wherein the retrieval intention point is determined according to knowledge in a knowledge base; recombining and fusing each retrieval intention point of the user to obtain a retrieval intention point combination which accords with the retrieval intention of the user; and searching according to the searching intention point combination, and outputting a searching result.
After receiving input information of a user, the embodiment of the invention can perform intention identification based on the input information to determine the retrieval intention of the user, thereby obtaining retrieval intention points and a retrieval intention point combination that better match what the user expects to find. In a small corpus scene, a retrieval result that meets the retrieval requirements of the user and has high retrieval quality is easier to obtain, and the retrieval experience of the user is improved.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
A knowledge domain herein refers to a complete piece of textual knowledge, e.g., a complete set of rules and regulations, etc.
A knowledge element herein refers to a textual knowledge element after being cut, which is a paragraph of text containing specific knowledge, such as a regulation, a term, etc.
The corpus herein refers to various types of text knowledge. A small corpus scene refers to a scene with a small amount of text knowledge, e.g., fewer than 1 million items. A large corpus scene refers to a scene with a large amount of text knowledge, generally more than 100 million texts.
Knowledge herein refers to a professional vocabulary item that is obtained by segmenting a knowledge element and can represent at least part of the characteristics of that knowledge element; the knowledge and the knowledge element have a close association relationship. In general, the associated knowledge element information can be retrieved based on the knowledge.
Information entropy in this context refers to the degree of uncertainty of the information.
Mutual information in this context refers to the degree of interdependency between two variables. Binary mutual information refers to the value of a probability function of two events occurring simultaneously.
Left-right information entropy in this context refers to the entropy of the left boundary and the entropy of the right boundary of the information.
The word vector herein refers to converting a word into a dense vector to determine the degree of similarity between words from the dense vector. For similar words, their corresponding word vectors are also similar.
Term frequency-inverse document frequency (TF-IDF) herein refers to a statistical method used to evaluate the importance of a word to a corpus or to one of the documents in a corpus.
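The definitions above are given in words only. Under the standard information-theoretic conventions, which are an assumption here because the text states no explicit formulas, they can be written as follows. The symbols are illustrative: $p(\cdot)$ denotes probabilities estimated from corpus counts, $A_L(w)$ and $A_R(w)$ are the sets of tokens observed immediately to the left and right of a word block $w$, $N$ is the number of documents, and $\mathrm{df}(t)$ is the number of documents containing term $t$. Mutual information is shown in its pointwise form for two events $x$ and $y$.

```latex
\begin{aligned}
H(X) &= -\sum_{x} p(x)\,\log p(x) \\
\mathrm{PMI}(x, y) &= \log \frac{p(x, y)}{p(x)\,p(y)} \\
H_L(w) &= -\sum_{a \in A_L(w)} p(a \mid w)\,\log p(a \mid w), \qquad
H_R(w) = -\sum_{b \in A_R(w)} p(b \mid w)\,\log p(b \mid w) \\
\operatorname{tfidf}(t, d) &= \operatorname{tf}(t, d)\cdot \log \frac{N}{\mathrm{df}(t)}
\end{aligned}
```

A word block with high mutual information between its parts and high entropy on both boundaries tends to behave as a self-contained term, which is why these quantities are useful for extracting knowledge from knowledge elements.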
Fig. 1 is a flow chart of a knowledge retrieval method according to an embodiment of the present invention. The knowledge retrieval method can be executed by a server for knowledge retrieval by a user.
Specifically, the knowledge retrieval method may include the steps of:
step S101, receiving input information of a user;
step S102, identifying the retrieval intention of the user according to the input information to obtain retrieval intention points, wherein the retrieval intention points are determined according to knowledge in a knowledge base;
step S103, recombining and fusing each retrieval intention point of the user to obtain a retrieval intention point combination which accords with the retrieval intention of the user;
and step S104, retrieving according to the retrieval intention point combination, and outputting a retrieval result.
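The four steps can be read as a simple pipeline. The following is a minimal sketch of that control flow only; the function name `retrieve`, the type aliases, and the three toy stage functions are hypothetical placeholders, not the method of the embodiment.

```python
from typing import Callable, List, Sequence

IntentPoints = List[str]
Combination = Sequence[str]


def retrieve(
    input_text: str,                                    # S101: input information
    identify: Callable[[str], IntentPoints],
    fuse: Callable[[IntentPoints], List[Combination]],
    search: Callable[[List[Combination]], List[str]],
) -> List[str]:
    """Run steps S102 to S104 in order; each stage is supplied by the caller."""
    points = identify(input_text)    # S102: retrieval intention points
    combinations = fuse(points)      # S103: recombination and fusion
    return search(combinations)      # S104: retrieval and output


# Toy stand-ins for the three stages, for illustration only.
def toy_identify(text: str) -> IntentPoints:
    vocabulary = ("baseline management", "department", "responsibility")
    return [term for term in vocabulary if term in text.lower()]


def toy_fuse(points: IntentPoints) -> List[Combination]:
    return [tuple(points)]


def toy_search(combinations: List[Combination]) -> List[str]:
    return [" -> ".join(combo) for combo in combinations]


result = retrieve(
    "What is the responsibility of the baseline management department?",
    toy_identify, toy_fuse, toy_search,
)
```

Keeping the stages as separate callables mirrors the module split described for the retrieval apparatus (receiving, identification, fusion, retrieval).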
More specifically, in step S101, the server may receive input information of the user. The input information refers to a sentence containing the user's retrieval intention. For example, the input information is "What is the responsibility of the baseline management department?".
In step S102, the server may identify the search intention of the user according to the input information, so as to obtain each search intention point. Each retrieval intent point may be determined based on knowledge in a knowledge base.
For example, if the input information is "What is the responsibility of the baseline management department?", the identified retrieval intention points include [department, baseline management, responsibility].
In a specific implementation, the server may perform word segmentation on the input information according to a word vector and the TF-IDF, so as to identify the search intention of the user according to a word segmentation result.
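As a rough illustration of knowledge-base-driven intention identification, the sketch below substitutes a simpler technique for the word-vector step: longest-match lookup of knowledge-base terms in the input, followed by a smoothed inverse-document-frequency weight computed over the knowledge elements. The function names, the smoothing formula, and the English whitespace-delimited input are all assumptions made for the example.

```python
import math
import re
from typing import Dict, List, Set


def find_intent_points(text: str, knowledge: Set[str]) -> List[str]:
    """Return knowledge terms found in text, preferring longer terms,
    in their order of appearance."""
    lowered = text.lower()
    taken = [False] * len(lowered)
    hits = []
    # Longer terms claim their characters first, so that
    # "baseline management" wins over the shorter "management".
    for term in sorted(knowledge, key=len, reverse=True):
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        for m in re.finditer(pattern, lowered):
            if not any(taken[m.start():m.end()]):
                for i in range(m.start(), m.end()):
                    taken[i] = True
                hits.append((m.start(), term))
    return [term for _, term in sorted(hits)]


def idf_weights(points: List[str], elements: List[str]) -> Dict[str, float]:
    """Smoothed IDF of each point over the knowledge elements
    (rarer points receive larger weights)."""
    n = len(elements)
    weights = {}
    for point in points:
        df = sum(1 for e in elements if point.lower() in e.lower())
        weights[point] = math.log((1 + n) / (1 + df)) + 1.0
    return weights


knowledge_base = {"baseline management", "management", "department", "responsibility"}
points = find_intent_points(
    "What is the responsibility of the baseline management department?",
    knowledge_base,
)
weights = idf_weights(points, [
    "The baseline management department is responsible for releases",
    "Each department files a report",
    "Responsibility of the audit office",
])
```

For Chinese input the word-boundary pattern would need to be replaced by a segmenter, since `\b` relies on whitespace and punctuation boundaries.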
In a specific implementation, to improve the accuracy of the word segmentation result of the input information, a knowledge source file (e.g., a knowledge domain) may be divided, for example by paragraphs or chapters, so as to obtain knowledge elements. The term "knowledge element" refers to a unit of knowledge with a complete knowledge representation that cannot be divided further. Its categories include concept knowledge elements, fact knowledge elements, numerical knowledge elements and the like.
The knowledge element may then be word partitioned to obtain a plurality of word blocks. Further, a word window can be used for calculating mutual information and left-right information entropy of each word block, and the plurality of word blocks are cleaned at least according to calculation results to obtain the knowledge.
Specifically, the word blocks may be sorted in descending order of the calculation results, and a preset number of top-ranked word blocks are taken as the knowledge to be cleaned. The knowledge to be cleaned can then be manually checked and filtered to obtain at least one piece of knowledge.
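The scoring and sorting of word blocks can be sketched as follows. This is a simplified illustration, not the embodiment's procedure: it uses a fixed word window of two tokens, scores each block as pointwise mutual information plus the smaller of its left and right boundary entropies, and drops blocks seen fewer than `min_count` times. The combination rule, the threshold, and all names are assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import List, Tuple


def entropy(counts: Counter) -> float:
    """Shannon entropy (natural log) of a count distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log(c / total) for c in counts.values())


def score_word_blocks(tokens: List[str], min_count: int = 2) -> List[Tuple[str, float]]:
    """Score two-token word blocks and return them sorted from high to low."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    left, right = defaultdict(Counter), defaultdict(Counter)
    for i in range(len(tokens) - 1):
        block = (tokens[i], tokens[i + 1])
        if i > 0:
            left[block][tokens[i - 1]] += 1      # token before the block
        if i + 2 < len(tokens):
            right[block][tokens[i + 2]] += 1     # token after the block
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scored = []
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue  # rare blocks get inflated mutual information
        pmi = math.log((count / n_bi) / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))
        boundary = min(entropy(left[(a, b)]), entropy(right[(a, b)]))
        scored.append((f"{a} {b}", pmi + boundary))
    return sorted(scored, key=lambda item: item[1], reverse=True)


tokens = (
    "the baseline management department reviews baseline management rules "
    "and the baseline management department approves baseline management plans"
).split()
ranked = score_word_blocks(tokens)
candidates_to_clean = [block for block, _ in ranked[:1]]  # preset number = 1
```

In this toy text, "baseline management" appears with varied neighbours on both sides and therefore ranks above blocks such as "management department", whose left neighbour never changes.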
Those skilled in the art understand that the knowledge has an association relationship with the knowledge element, and has a close corresponding relationship. The knowledge is obtained by refining the fully-knowledge-processed knowledge elements, and each knowledge has at least part of characteristics of the knowledge element associated with the knowledge element, so that the retrieval based on the knowledge having the association relation with the knowledge element can accelerate the retrieval speed and improve the accuracy of the retrieval result.
Further, the knowledge may be added to a knowledge base. With the increase of knowledge in the knowledge base, the number of the knowledge to be cleaned can be reduced by comparing and removing the newly added vocabulary with the existing knowledge.
In step S103, the server may recombine and fuse the retrieval intention points of the user to obtain a retrieval intention point combination that meets the retrieval intention of the user, so as to reproduce the real retrieval intention of the user to the maximum extent.
In specific implementation, different forms of recombination fusion can be performed on each retrieval intention point of a user, an optimal recombination sequence can be obtained through measurement and calculation, an optimal retrieval intention point combination matching the retrieval intention of the user is obtained, and further, a suboptimal retrieval intention point combination can be obtained.
In a specific implementation, the retrieval intention points of the user can be recombined and fused based on graph theory. Or, the retrieval intention points of the user can be recombined and fused based on a decision tree algorithm.
For example, if the input information is "What is the responsibility of the baseline management department?", then word segmentation based on the knowledge in the knowledge base yields the user's retrieval intention points "baseline management", "department" and "responsibility". After the retrieval intention points are recombined and fused, an optimal matching sequence "baseline management -> department -> responsibility" can be obtained, so that the query intention of the user is reproduced to the maximum extent.
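One way to picture the recombination is as a search over orderings on a small directed graph. The sketch below is an assumption-laden stand-in, not the graph-theoretic or decision-tree procedure of the embodiment: each directed edge (a, b) is weighted by how many knowledge elements mention a before b, every permutation of the points is scored by the sum of its consecutive edge weights, and the best and second-best orderings are returned. All function names are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple


def edge_weights(points: Sequence[str], elements: List[str]) -> Dict[Tuple[str, str], int]:
    """Weight of edge (a, b) = number of elements in which a first occurs before b."""
    weights = {}
    for a in points:
        for b in points:
            if a != b:
                # find() returns -1 when absent, which fails the chained comparison.
                weights[(a, b)] = sum(1 for e in elements if 0 <= e.find(a) < e.find(b))
    return weights


def rank_combinations(points: Sequence[str], elements: List[str], top_n: int = 2):
    """Return the top_n orderings of the points by total edge weight."""
    weights = edge_weights(points, elements)
    scored = [
        (sum(weights[pair] for pair in zip(order, order[1:])), order)
        for order in permutations(points)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [order for _, order in scored[:top_n]]


elements = [
    "the baseline management department has the responsibility to review changes",
    "baseline management department responsibility includes audits",
    "each department has a responsibility",
]
best, second = rank_combinations(
    ["department", "baseline management", "responsibility"], elements
)
```

Enumerating permutations grows factorially with the number of points, so this brute-force form is only reasonable for the handful of points a single query produces.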
In step S104, the server may perform retrieval according to the retrieval intention point combination. The server may retrieve from a previously constructed knowledge element base. The knowledge element base can be constructed from a plurality of knowledge elements, and each knowledge element is obtained by segmenting paragraphs and/or clauses of a knowledge source.
Then, the server may output the retrieval results in descending order of their matching degree with the retrieval intention point combination.
For example, if the input information is "What is the responsibility of the baseline management department?", the obtained retrieval results include the knowledge elements that match the full sequence [baseline management -> department -> responsibility], and also knowledge elements that partially match it, such as [baseline management -> department], [baseline management -> responsibility], and the like.
In a specific implementation, the server performs a comprehensive calculation according to a built-in algorithm, for example along dimensions such as the maximum matching degree within the same matching sequence, the matching degree of different matching sequences, the proximity of different matching sequences to the user intention, and the recency of the results. It then sorts the results by their calculated scores and presents the corresponding retrieval results to the user in that order.
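A minimal sketch of such a ranking, assuming only two of the dimensions mentioned above: the matching degree is taken as the fraction of the combination's points found in order in a knowledge element, and recency breaks ties. The `KnowledgeElement` class, the `published` field, and the scoring rule are hypothetical; the built-in algorithm itself is not described in the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence


@dataclass
class KnowledgeElement:
    text: str
    published: date  # assumed metadata used for new-to-old ordering


def match_degree(combo: Sequence[str], text: str) -> float:
    """Fraction of the combination's points found in the text in the given order
    (greedy left-to-right scan)."""
    position, matched = 0, 0
    for point in combo:
        index = text.find(point, position)
        if index >= 0:
            matched += 1
            position = index + len(point)
    return matched / len(combo) if combo else 0.0


def rank_results(combo: Sequence[str], elements: List[KnowledgeElement]) -> List[KnowledgeElement]:
    """Sort by matching degree (high to low), then by date (new to old)."""
    return sorted(
        elements,
        key=lambda e: (match_degree(combo, e.text), e.published),
        reverse=True,
    )


combo = ("baseline management", "department", "responsibility")
e1 = KnowledgeElement("the baseline management department bears responsibility for releases", date(2018, 1, 1))
e2 = KnowledgeElement("baseline management covers configuration items", date(2019, 6, 1))
e3 = KnowledgeElement("baseline management is a responsibility of every team", date(2017, 3, 1))
e4 = KnowledgeElement("baseline management tools", date(2020, 1, 1))
ordered = rank_results(combo, [e2, e3, e4, e1])
```

Here the full match comes first, the two-point partial match second, and the two single-point matches are ordered newest first.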
The following examples are given by way of illustration.
Fig. 2 is a schematic diagram of a knowledge retrieval architecture system according to an embodiment of the present invention. The knowledge retrieval architecture system 200 can extract and refine the knowledge domain in the text knowledge, and then intelligently identify the user intention and return the retrieval result required by the user.
Referring to fig. 2, the knowledge retrieval architecture system 200 is divided into a knowledge extraction subsystem 201 and a knowledge retrieval subsystem 202.
In the knowledge extraction subsystem 201, the knowledge source 2011 may be finely divided to obtain the knowledge elements 2012. For example, automatic programmatic cutting of different types of knowledge is realized through regular-expression matching; in a specific implementation, cutting may be performed based on paragraphs or clauses, so as to obtain the knowledge elements 2012.
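The regular-expression cutting can be sketched as below, assuming an English-language regulation whose clauses start with "Article N" on a new line, with a fallback to blank-line paragraph breaks. The pattern, the function name, and the sample text are illustrative assumptions; real knowledge sources would need patterns matched to their own layout.

```python
import re
from typing import List

# Zero-width split point in front of each line that starts a clause.
ARTICLE_PATTERN = re.compile(r"(?m)^(?=Article\s+\d+\b)")


def split_into_knowledge_elements(source_text: str) -> List[str]:
    """Cut a knowledge source by clause markers, or by paragraphs if none exist."""
    if ARTICLE_PATTERN.search(source_text):
        parts = ARTICLE_PATTERN.split(source_text)   # needs Python 3.7+
    else:
        parts = re.split(r"\n\s*\n", source_text)
    return [part.strip() for part in parts if part.strip()]


regulation = (
    "Chapter 1 General\n"
    "Article 1 The baseline management department owns releases.\n"
    "Article 2 Each department files a report.\n"
)
clauses = split_into_knowledge_elements(regulation)
paragraphs = split_into_knowledge_elements("First paragraph.\n\nSecond paragraph.")
```

Splitting on a lookahead keeps the "Article N" marker inside each knowledge element, which preserves the clause number for later display.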
Then, for the knowledge element 2012, the knowledge 2013 which is present in the knowledge element 2012 and can accurately represent the content or the feature of the knowledge element 2012 can be extracted, and the knowledge 2013 has an association relationship with the knowledge element 2012. Specifically, in the knowledge extraction subsystem 201, the accuracy of extraction of the knowledge 2013 directly determines the accuracy of NLP processing.
Purely manual extraction is impractical: the workload is too large and it places many demands on the people doing the extraction. The embodiment of the invention can therefore perform semi-automatic extraction with a small amount of manual intervention. The basic idea for extracting the knowledge 2013 is to cut the text step by step using word windows of different lengths, then calculate the mutual information and the left and right information entropies of each cut word with respect to the words before and after it, sort the results in combination with a 3-gram text model, and obtain the automatically extracted knowledge 2013 according to the sorted values.
In a specific implementation, the mutual information and the left and right information entropies of each intercepted word block in the knowledge element 2012 can be evaluated through a word window, and sorting is performed in combination with a 3-gram text model, so that the top-ranked words are obtained as the knowledge 2013.
Further, the knowledge element 2012 may be added to a knowledge element base 2014, and the knowledge 2013 extracted from the knowledge element 2012 may be added to a knowledge base 2015.
Preferably, the top ranked words can be screened manually, so as to obtain the knowledge 2013 with higher accuracy. Those skilled in the art understand that in order to enhance the precision of the extracted knowledge 2013, the accuracy of the knowledge 2013 can be improved by performing secondary rechecking and combing in a manual mode.
As the knowledge 2013 accumulates, when word blocks are intercepted from other knowledge elements 2012 to extract knowledge 2013, the existing knowledge 2013 in the knowledge base 2015 can be used to compare against the intercepted word blocks and eliminate those that have already been processed. This greatly reduces the amount of manual intervention needed for newly added words and accelerates the generation of new knowledge 2013.
In the knowledge retrieval subsystem 202, after receiving the input information 2021 of the user, the retrieval intention point 2022 of the user may be acquired. When the search intention point 2022 is obtained, the user intention may be reproduced by combining the word vector and the TF-IDF technique to obtain the search intention point 2022 searched by the user. Through testing, the word vector and TF-IDF technology can be used for more accurately obtaining the retrieval intention point 2022 of the user.
Further, a decision tree can be established for the obtained search intention points 2022, and intention fusion recombination is performed to calculate a search intention point combination 2023 that meets the user intention, such as an optimal intention combination, a suboptimal intention combination, and the like, within a limited time.
Further, the search may be performed on a search intention point combination 2023 such as an optimal intention combination, a suboptimal intention combination, and the like, so as to obtain a search result 2024 desired by the user, and the search result is returned to the user through a search result presentation 2025.
Further, take as an example the case where the knowledge sources are various regulations. When the retrieval results 2024 are presented to the user, they may be sorted according to sorting requirements, for example by the degree of matching between the knowledge and the retrieval intention point combination, or by the recency of the knowledge. Preferably, the retrieval intention points 2022 (e.g., knowledge) may be highlighted in a different color in the retrieval result presentation 2025, so that the user can read the retrieval results 2024 more efficiently.
Therefore, the embodiment of the invention integrates NLP technology into the small corpus knowledge retrieval scene, and achieves accurate cutting of small-corpus text knowledge and extraction of professional knowledge through techniques such as mutual information and left-right information entropy. The extracted knowledge elements are used as a knowledge base in the small corpus scene and become the knowledge source for accurate retrieval by the user. When a user searches, word vectors and the TF-IDF technique are combined, and the knowledge elements are linked to the user's retrieval intention through the knowledge, so that the retrieval intention combination of the user can be reproduced to the maximum extent and the knowledge that really meets the retrieval requirements of the user can be obtained based on that combination.
The NLP intelligent retrieval framework provided by the embodiment of the invention does not need to train a large amount of text corpora, can obtain a retrieval result better than that of a traditional retrieval engine in a small corpus scene, and is suitable for professional field knowledge retrieval processing in most small corpus environments.
Fig. 3 is a schematic structural diagram of a knowledge retrieval apparatus according to an embodiment of the present invention. The knowledge retrieval device 3 may implement the method solutions shown in fig. 1 and fig. 2, and is executed by a server.
Specifically, the knowledge retrieval apparatus 3 may include: a receiving module 31 adapted to receive input information of a user; the identification module 32 is suitable for identifying the retrieval intention of the user according to the input information to obtain a retrieval intention point, and the retrieval intention point is determined according to knowledge in a knowledge base; the fusion module 33 is adapted to perform recombination fusion on each retrieval intention point of the user to obtain a retrieval intention point combination conforming to the retrieval intention of the user; and the retrieval module 34 is suitable for retrieving according to the retrieval intention point combination and outputting a retrieval result.
In a specific implementation, the fusion module 33 may include: the fusion sub-module 331 is adapted to perform recombination and fusion on each retrieval intention point of the user based on graph theory or decision tree algorithm.
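As one hedged illustration of graph-based recombination, the sketch below links two retrieval intention points whenever they co-occur in the same knowledge element and treats each connected component as one combination. The co-occurrence rule, the function name `fuse_intent_points`, and the sample data are assumptions for illustration; the source only states that graph theory or a decision tree algorithm may be used.

```python
from itertools import combinations

def fuse_intent_points(intent_points, knowledge_elements):
    """Group intent points into combinations via connected components."""
    # Build an undirected graph: an edge means the two points co-occur
    # in at least one knowledge element (assumed association rule).
    adjacency = {point: set() for point in intent_points}
    for a, b in combinations(intent_points, 2):
        if any(a in element and b in element for element in knowledge_elements):
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    groups = []
    for point in intent_points:
        if point in seen:
            continue
        # Depth-first traversal collects one connected component.
        seen.add(point)
        stack = [point]
        group = []
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(sorted(group))
    return groups

points = ["margin", "account", "dividend"]
elements = ["open a margin account online", "dividend payout dates"]
combos = fuse_intent_points(points, elements)
```

Here "margin" and "account" end up in one combination because they share a knowledge element, while "dividend" stays on its own.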
In a specific implementation, the identification module 32 may include: the identifying sub-module 321 is adapted to perform word segmentation on the input information according to the word vector and the word frequency inverse text frequency index, so as to identify the search intention of the user according to the word segmentation result of the input information.
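The TF-IDF part of this step can be sketched as follows, assuming the input has already been tokenized and that the knowledge elements act as the document collection. The smoothed IDF formula, the function name `tfidf_weights`, and the sample tokens are illustrative assumptions; the word-vector component mentioned in the text is not modeled here.

```python
import math
from collections import Counter

def tfidf_weights(query_tokens, documents):
    """Weight each query token by term frequency times smoothed IDF."""
    n_docs = len(documents)
    term_counts = Counter(query_tokens)
    weights = {}
    for term, count in term_counts.items():
        # Document frequency: number of documents containing the term.
        df = sum(1 for doc in documents if term in doc)
        # Smoothed IDF avoids division by zero for unseen terms.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        weights[term] = (count / len(query_tokens)) * idf
    return weights

docs = [
    ["open", "margin", "account"],
    ["open", "cash", "account"],
    ["dividend", "date"],
]
weights = tfidf_weights(["open", "margin", "account"], docs)
top_term = max(weights, key=weights.get)
```

Tokens that are rare across the knowledge elements receive higher weights, which is one plausible way to pick out the terms that carry the retrieval intention.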
In a specific implementation, the retrieving module 34 may include: a first retrieval submodule 341 adapted to output the retrieval results in descending order of their matching degree with the retrieval intention point combination; or, a second retrieving sub-module 342 adapted to output the retrieval results in order of their occurrence time from newest to oldest.
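The two output orders can be sketched as a single sorting helper. The `Result` fields, the `by` parameter, and the sample data are hypothetical names chosen for illustration and do not come from the source.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Result:
    text: str
    match_score: float   # matching degree with the intention point combination
    published: date      # occurrence time of the result

def order_results(results, by="match"):
    """Return results sorted by matching degree or by recency."""
    if by == "match":
        # Highest matching degree first.
        return sorted(results, key=lambda r: r.match_score, reverse=True)
    if by == "time":
        # Newest result first.
        return sorted(results, key=lambda r: r.published, reverse=True)
    raise ValueError("by must be 'match' or 'time'")

results = [
    Result("fee schedule", 0.62, date(2019, 5, 1)),
    Result("margin rules", 0.91, date(2018, 11, 20)),
    Result("account opening", 0.75, date(2019, 6, 3)),
]
by_match = [r.text for r in order_results(results, by="match")]
by_time = [r.text for r in order_results(results, by="time")]
```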
In a specific implementation, the retrieving module 34 may include: the third retrieving sub-module 343 is adapted to retrieve the combination of retrieval intention points based on a knowledge element library, where the knowledge element library is constructed by a plurality of knowledge elements, and each knowledge element is obtained by segmenting paragraphs and/or clauses of a knowledge source.
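A minimal sketch of building knowledge elements by paragraph and clause segmentation is shown below. It assumes paragraphs are separated by blank lines and clauses end at common Chinese or Western punctuation; both rules and the function name `split_knowledge_elements` are assumptions, and a real system would need to handle cases such as decimal points.

```python
import re

# Punctuation treated as clause boundaries (an assumed set).
CLAUSE_PATTERN = re.compile(r"[^。！？；.!?;]+[。！？；.!?;]?")

def split_knowledge_elements(source_text):
    """Split a knowledge source into paragraphs, then into clauses."""
    elements = []
    # Paragraphs are assumed to be separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", source_text):
        for clause in CLAUSE_PATTERN.findall(paragraph):
            clause = clause.strip()
            if clause:
                elements.append(clause)
    return elements

source = "Open an account online. Bring your ID!\n\nFees apply; see the schedule."
elements = split_knowledge_elements(source)
```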
In a specific implementation, the knowledge base comprises a plurality of pieces of knowledge, the knowledge is extracted from the knowledge elements, and the knowledge elements have an association relationship.
In a specific implementation, the knowledge retrieval device 3 may extract the knowledge from the knowledge element by using the following steps: carrying out word division on the knowledge elements to obtain a plurality of word blocks; and for the plurality of word blocks, calculating mutual information and left-right information entropy of each word block by using a word window, and cleaning the plurality of word blocks at least according to a calculation result to obtain the knowledge.
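The scoring step can be illustrated with the sketch below, which slides a fixed-size word window over a token sequence and computes, for each word block, pointwise mutual information plus the entropy of its left and right neighbors. The window size of 2, the natural logarithm, the function names, and the sample sentence are assumptions; the source does not give exact formulas.

```python
import math
from collections import Counter, defaultdict

def entropy(counter):
    """Shannon entropy (natural log) of a neighbor-count distribution."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counter.values())

def score_word_blocks(tokens, window=2):
    """Map each word block to (PMI, left entropy, right entropy)."""
    n = len(tokens)
    unigrams = Counter(tokens)
    blocks = Counter()
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    for i in range(n - window + 1):
        block = tuple(tokens[i:i + window])
        blocks[block] += 1
        if i > 0:
            left[block][tokens[i - 1]] += 1        # token just before the block
        if i + window < n:
            right[block][tokens[i + window]] += 1  # token just after the block
    total_blocks = sum(blocks.values())
    scores = {}
    for block, count in blocks.items():
        p_block = count / total_blocks
        # Probability of the block if its tokens were independent.
        p_independent = math.prod(unigrams[t] / n for t in block)
        pmi = math.log(p_block / p_independent)
        scores[block] = (pmi, entropy(left[block]), entropy(right[block]))
    return scores

tokens = ("margin trading requires margin trading account "
          "and margin trading approval").split()
scores = score_word_blocks(tokens)
pmi, left_h, right_h = scores[("margin", "trading")]
```

A block with high mutual information is internally cohesive, and high left and right entropy indicates that it appears in varied contexts, which together suggest a standalone term.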
In a specific implementation, the knowledge retrieval device 3 may sort the plurality of word blocks in descending order of the calculation results, and take a preset number of top-ranked word blocks as the knowledge to be cleaned; and check and eliminate the knowledge to be cleaned based on the knowledge in the knowledge base to obtain at least one piece of knowledge.
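The ranking and cleaning step might look like the sketch below. It assumes each word block already has a single combined score, and it interprets "checking and eliminating" as dropping candidates that already exist in the knowledge base; both points, along with the name `clean_word_blocks` and the sample scores, are assumptions rather than details given in the source.

```python
def clean_word_blocks(scored_blocks, knowledge_base, top_n):
    """Keep the top_n highest-scoring blocks not already in the knowledge base."""
    # Sort word blocks by score, largest first.
    ranked = sorted(scored_blocks.items(), key=lambda item: item[1], reverse=True)
    # The preset number of top-ranked blocks is the knowledge to be cleaned.
    candidates = [block for block, _ in ranked[:top_n]]
    # Assumed check: eliminate candidates the knowledge base already contains.
    return [block for block in candidates if block not in knowledge_base]

scored = {
    "margin trading": 3.2,
    "the account": 0.4,
    "stock pledge": 2.7,
    "risk disclosure": 2.9,
}
new_knowledge = clean_word_blocks(scored, knowledge_base={"stock pledge"}, top_n=3)
```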
For more details of the working principle and the working mode of the knowledge retrieval device 3, reference may be made to the related description of the technical solutions in fig. 1 and fig. 2, which is not repeated here.
Further, the embodiment of the present invention also discloses a storage medium on which computer instructions are stored, and when the computer instructions are executed, the technical solution of the method described in the embodiments shown in fig. 1 and fig. 2 is executed. Preferably, the storage medium may include a computer-readable storage medium such as a non-volatile memory or a non-transitory memory. The computer-readable storage medium may include ROM, RAM, magnetic or optical disks, and the like.
Further, an embodiment of the present invention further discloses a server, which includes a memory and a processor, where the memory stores computer instructions capable of being executed on the processor, and the processor executes the computer instructions to execute the technical solutions of the methods in the embodiments shown in fig. 1 and fig. 2.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (8)

1. A knowledge retrieval method, characterized in that the method is suitable for text retrieval in a small-corpus scenario with a corpus size below 1 million; the method comprises the following steps:
receiving input information of a user;
identifying the retrieval intention of the user according to the input information to obtain a retrieval intention point, wherein the retrieval intention point is determined according to knowledge in a knowledge base; the knowledge base comprises a plurality of knowledge, the knowledge is obtained by performing word division on knowledge elements in the knowledge element base to obtain a plurality of word blocks, calculating mutual information and left and right information entropies of each word block by using word windows for the plurality of word blocks, and cleaning the plurality of word blocks at least according to calculation results;
recombining and fusing each retrieval intention point of the user to obtain a retrieval intention point combination which accords with the retrieval intention of the user;
retrieving the retrieval intention point combination based on the knowledge element base and outputting a retrieval result; the knowledge element base is constructed from a plurality of knowledge elements; each knowledge element is obtained by segmenting paragraphs and/or clauses of the knowledge source.
2. The knowledge retrieval method of claim 1, wherein the recombining and fusing the retrieval intent points of the user comprises:
and recombining and fusing each retrieval intention point of the user based on graph theory or decision tree algorithm.
3. The knowledge retrieval method of claim 1, wherein the identifying the retrieval intention of the user according to the input information comprises:
and segmenting the input information according to the word vector and the word frequency inverse text frequency index so as to identify the retrieval intention of the user according to the segmentation result of the input information.
4. The knowledge retrieval method of claim 1, wherein the outputting the retrieval result comprises: outputting the retrieval results in descending order of their matching degree with the retrieval intention point combination; or,
outputting the retrieval results in order of their occurrence time from newest to oldest.
5. The knowledge retrieval method of claim 1, wherein the cleaning the plurality of word blocks at least according to the calculation results to obtain the knowledge comprises:
sorting the word blocks in descending order of the calculation results, and taking a preset number of top-ranked word blocks as knowledge to be cleaned;
and checking and eliminating the knowledge to be cleaned based on the knowledge in the knowledge base to obtain at least one piece of knowledge.
6. A knowledge retrieval device, characterized in that the device is suitable for text retrieval in a small-corpus scenario with a corpus size below 1 million; the device comprises:
the receiving module is suitable for receiving input information of a user;
the identification module is suitable for identifying the retrieval intention of the user according to the input information to obtain retrieval intention points, and the retrieval intention points are determined according to knowledge in a knowledge base; the knowledge base comprises a plurality of knowledge, the knowledge is obtained by performing word division on knowledge elements in the knowledge element base to obtain a plurality of word blocks, calculating mutual information and left and right information entropies of each word block by using word windows for the plurality of word blocks, and cleaning the plurality of word blocks at least according to calculation results;
the fusion module is suitable for recombining and fusing all the retrieval intention points of the user to obtain a retrieval intention point combination which accords with the retrieval intention of the user;
the retrieval module is suitable for retrieving the retrieval intention point combination based on the knowledge element base and outputting a retrieval result; the knowledge element base is constructed from a plurality of knowledge elements; each knowledge element is obtained by segmenting paragraphs and/or clauses of the knowledge source.
7. A storage medium having stored thereon computer instructions, characterized in that the computer instructions are operative to perform the steps of the method of any one of claims 1 to 5.
8. A server comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, wherein the processor, when executing the computer instructions, performs the steps of the method of any one of claims 1 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910510211.2A CN111339239B (en) 2019-06-13 2019-06-13 Knowledge retrieval method and device, storage medium and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910510211.2A CN111339239B (en) 2019-06-13 2019-06-13 Knowledge retrieval method and device, storage medium and server

Publications (2)

Publication Number Publication Date
CN111339239A CN111339239A (en) 2020-06-26
CN111339239B CN111339239B (en) 2021-01-05

Family

ID=71183272

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910510211.2A Active CN111339239B (en) 2019-06-13 2019-06-13 Knowledge retrieval method and device, storage medium and server

Country Status (1)

Country Link
CN (1) CN111339239B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567408B (en) * 2010-12-31 2014-06-04 阿里巴巴集团控股有限公司 Method and device for recommending search keyword
CN102096717B (en) * 2011-02-15 2013-01-16 百度在线网络技术(北京)有限公司 Search method and search engine
KR102280439B1 (en) * 2015-10-26 2021-07-21 에스케이텔레콤 주식회사 Apparatus for analyzing intention of query and method thereof
CN108804532B (en) * 2018-05-03 2020-06-26 腾讯科技(深圳)有限公司 Query intention mining method and device and query intention identification method and device
CN109739964A (en) * 2018-12-27 2019-05-10 北京拓尔思信息技术股份有限公司 Knowledge data providing method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111339239A (en) 2020-06-26

Similar Documents

Publication Publication Date Title
CN108804521B (en) Knowledge graph-based question-answering method and agricultural encyclopedia question-answering system
WO2021093755A1 (en) Matching method and apparatus for questions, and reply method and apparatus for questions
KR101508260B1 (en) Summary generation apparatus and method reflecting document feature
CN106776574B (en) User comment text mining method and device
CN108280114B (en) Deep learning-based user literature reading interest analysis method
CN108182175B (en) Text quality index obtaining method and device
CN111611356B (en) Information searching method, device, electronic equipment and readable storage medium
CN112035599B (en) Query method and device based on vertical search, computer equipment and storage medium
CN112632228A (en) Text mining-based auxiliary bid evaluation method and system
CN112597283B (en) Notification text information entity attribute extraction method, computer equipment and storage medium
CN104199965A (en) Semantic information retrieval method
CN109492081B (en) Text information searching and information interaction method, device, equipment and storage medium
CN115270738B (en) Research and report generation method, system and computer storage medium
CN110866102A (en) Search processing method
CN110688593A (en) Social media account identification method and system
CN116304020A (en) Industrial text entity extraction method based on semantic source analysis and span characteristics
CN110955767A (en) Algorithm and device for generating intention candidate set list set in robot dialogue system
CN116628173B (en) Intelligent customer service information generation system and method based on keyword extraction
CN110866086A (en) Article matching system
CN116049376B (en) Method, device and system for retrieving and replying information and creating knowledge
CN111460114A (en) Retrieval method, device, equipment and computer readable storage medium
CN111339239B (en) Knowledge retrieval method and device, storage medium and server
CN111753067A (en) Innovative assessment method, device and equipment for technical background text
CN116304012A (en) Large-scale text clustering method and device
CN111062832A (en) Auxiliary analysis method and device for intelligently providing patent answer and debate opinions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant