CN110727786A - Self-learning knowledge base management method and device, terminal device and storage medium - Google Patents


Info

Publication number
CN110727786A
Authority
CN
China
Legal status
Pending
Application number
CN201910864630.6A
Other languages
Chinese (zh)
Inventor
王春雷
Current Assignee
Wuhan Rusong Technology Co Ltd
Original Assignee
Wuhan Rusong Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Wuhan Rusong Technology Co Ltd
Priority to CN201910864630.6A
Publication of CN110727786A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F16/337 Profile generation, learning or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a self-learning knowledge base management method and device, a terminal device, and a storage medium. The method comprises: receiving a document query instruction and extracting from it the keywords to be queried and the document category corresponding to those keywords; establishing a TF-IDF algorithm, calculating with it the TF-IDF values of the keywords to be queried and of the network keywords, and determining from these values the network documents to be added; and obtaining the file link and the file of each network document to be added, storing them in a knowledge base, generating a unique number for the document, associating the number with the file link and the file, and displaying the number to the user. By learning new knowledge automatically through the TF-IDF algorithm and managing it without manual intervention, the invention reduces labor cost and improves efficiency.

Description

Self-learning knowledge base management method and device, terminal device and storage medium
Technical Field
The invention relates to the field of computers, and in particular to a self-learning knowledge base management method and device, a terminal device, and a storage medium.
Background
With the deepening of informatization, information systems have become key infrastructure for processing an enterprise's core business, and knowledge bases have emerged. The term "knowledge base" has two meanings. The first is the rule set used in designing an expert system, together with the facts and data associated with those rules; such a knowledge base is tied to a specific expert system, so the question of sharing it does not arise. The second is a consultative knowledge base that is shared rather than owned exclusively by any one party.
However, the learning supported by current systems is one-directional: people learn from the knowledge already stored in the knowledge base, but the knowledge base cannot actively learn new knowledge. It can only be updated passively and manually, and each update requires technicians with relevant experience to manage the knowledge base, which consumes labor and communication costs.
The above is provided only to assist in understanding the technical solutions of the present invention and does not constitute an admission that the above is prior art.
Disclosure of Invention
In view of this, the invention provides a self-learning knowledge base management method and device, a terminal device, and a storage medium, aiming to solve the technical problem that the prior art cannot actively learn and manage new knowledge.
The technical solution of the invention is implemented as follows:
In a first aspect, the invention provides a self-learning knowledge base management method, comprising the following steps:
receiving a document query instruction, and extracting from it the keywords to be queried and the document category corresponding to those keywords;
establishing a TF-IDF algorithm; acquiring from the network, according to that document category, the keywords of documents in the related document category (the network keywords); calculating with the TF-IDF algorithm the TF-IDF values of the keywords to be queried and of the network keywords; comparing the two; and determining, from the comparison result, the network documents to be added;
obtaining the file link and the file corresponding to each network document to be added, storing them in a knowledge base, generating a unique number for the network document, associating the unique number with the file link and the file, and displaying the associated unique number to the user.
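The first step only requires that a keyword list and a document category can be pulled out of the query instruction. The patent does not specify the instruction format, so the Python sketch below assumes a hypothetical `category: keyword1, keyword2` string; `QueryInstruction` and `parse_query_instruction` are illustrative names, not part of the disclosed method.

```python
from dataclasses import dataclass


@dataclass
class QueryInstruction:
    keywords: list[str]  # keywords to be queried
    category: str        # document category of those keywords


def parse_query_instruction(raw: str) -> QueryInstruction:
    """Parse a hypothetical 'category: kw1, kw2' instruction (format is an assumption)."""
    category, sep, keyword_part = raw.partition(":")
    if not sep:
        raise ValueError("expected 'category: keyword[, keyword...]'")
    keywords = [kw.strip() for kw in keyword_part.split(",") if kw.strip()]
    return QueryInstruction(keywords=keywords, category=category.strip())


print(parse_query_instruction("databases: index, b-tree"))
# QueryInstruction(keywords=['index', 'b-tree'], category='databases')
```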
On the basis of the above technical solution, preferably, a TF-IDF algorithm is established and a range of valid keyword TF-IDF values is set; the keywords of documents in the related document category are acquired from the network according to the document category of the keywords to be queried; and the TF-IDF value of each network keyword is calculated with the TF-IDF algorithm and compared with the set range. When the TF-IDF value of a network keyword falls within the range, the document corresponding to that network keyword is judged to be genuine and valid, and the TF-IDF value of the keyword to be queried is then calculated with the TF-IDF algorithm; when it falls outside the range, the keywords of documents in the related document category are acquired from the network again.
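The range check with re-acquisition can be sketched as a filter plus a bounded retry loop. This is a minimal illustration under stated assumptions: the patent phrases the retry per keyword, whereas this sketch retries per fetched batch; the `fetch_batch` callable, the (document id, TF-IDF value) pairs, the inclusive bounds, and the `max_attempts` cap are all hypothetical.

```python
from typing import Callable, Iterable

# (document id, TF-IDF value of the document's network keyword)
Candidate = tuple[str, float]


def fetch_valid_candidates(
    fetch_batch: Callable[[int], Iterable[Candidate]],
    low: float,
    high: float,
    max_attempts: int = 3,
) -> list[Candidate]:
    """Keep candidates whose TF-IDF value lies in [low, high].

    If no candidate in a batch qualifies, fetch again ("re-acquire from the
    network"), up to max_attempts times; return an empty list if all fail.
    """
    for attempt in range(max_attempts):
        valid = [c for c in fetch_batch(attempt) if low <= c[1] <= high]
        if valid:
            return valid
    return []


# Stand-in for network retrieval: attempt 0 yields nothing in range, attempt 1 does.
batches = {0: [("a", 0.01), ("b", 9.0)], 1: [("c", 0.5), ("d", 2.0)]}
print(fetch_valid_candidates(lambda i: batches.get(i, []), low=0.1, high=1.0))
# [('c', 0.5)]
```

Capping the retries keeps the loop from running forever when the network never returns an in-range keyword; the patent does not say how that case is handled.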
On the basis of the above technical solution, preferably, a TF-IDF algorithm is established; the keywords of documents in the related document category are acquired from the network according to the document category of the keywords to be queried; and the TF-IDF values of the keywords to be queried and of the network keywords are calculated with the TF-IDF algorithm and compared. When the TF-IDF value of the keyword to be queried is less than or equal to that of a network keyword, the document corresponding to the network keyword is taken as a network document to be added; when it is greater, the keywords of documents in the related document category are acquired from the network again.
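The comparison rule in the preceding paragraph reduces to a single filter. In this Python sketch the function name and the dictionary of candidate scores are illustrative assumptions; an empty result stands for the branch that re-acquires keywords from the network.

```python
def select_documents_to_add(query_score: float, candidates: dict[str, float]) -> list[str]:
    """Keep a network document when the TF-IDF value of the keyword to be queried
    is <= the TF-IDF value of that document's network keyword.

    An empty result corresponds to the branch that re-acquires keywords from the network.
    """
    return [doc_id for doc_id, score in candidates.items() if query_score <= score]


print(select_documents_to_add(0.4, {"d1": 0.3, "d2": 0.4, "d3": 0.9}))  # ['d2', 'd3']
```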
On the basis of the above technical solution, preferably, the TF-IDF algorithm is:
$$P(x) = TF(x) \times \log\frac{N}{N(x)}$$
where P(x) is the TF-IDF value of a keyword (either a keyword to be queried or a network keyword), TF(x) is the term frequency of that keyword, N is the total number of documents in the network, and N(x) is the number of documents containing the keyword.
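A minimal Python sketch of the TF-IDF computation, assuming the common form P(x) = TF(x) × log(N / N(x)) with TF(x) taken as a raw term count and a base-10 logarithm. The original formula figure may use a different normalization or logarithm base, so treat these choices as assumptions.

```python
import math


def tf_idf(term_count: int, total_docs: int, docs_with_term: int) -> float:
    """P(x) = TF(x) * log10(N / N(x)).

    Assumptions: TF(x) is a raw term count and the logarithm is base 10;
    the patent's formula figure may use a different normalization or base.
    """
    if not 0 < docs_with_term <= total_docs:
        raise ValueError("require 0 < N(x) <= N")
    return term_count * math.log10(total_docs / docs_with_term)


print(tf_idf(3, 1000, 10))    # rare keyword: 3 * log10(100) = 6.0
print(tf_idf(3, 1000, 1000))  # keyword in every document: 0.0
```

Rejecting N(x) = 0 avoids a division by zero; some TF-IDF variants instead smooth the denominator to N(x) + 1.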
On the basis of the above technical solution, preferably, a preset language library is established; the file corresponding to the network document to be added is obtained, its content is examined against the preset language library to determine its language category, and the file is stored in the knowledge base corresponding to that language category.
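The patent leaves the preset language library unspecified. As a hypothetical stand-in, the sketch below models it as a mapping from language code to Unicode code-point ranges and picks the language whose ranges cover the most characters; `LANGUAGE_LIBRARY`, `detect_language`, and `store_by_language` are illustrative names, and a real system would more likely use a trained language identifier.

```python
# Hypothetical preset language library: language code -> Unicode code-point ranges.
LANGUAGE_LIBRARY: dict[str, list[tuple[int, int]]] = {
    "zh": [(0x4E00, 0x9FFF)],            # CJK Unified Ideographs
    "en": [(0x41, 0x5A), (0x61, 0x7A)],  # Basic Latin letters A-Z, a-z
}


def detect_language(text: str, library=LANGUAGE_LIBRARY) -> str:
    """Return the language whose ranges cover the most characters, or 'unknown'."""
    counts = {lang: 0 for lang in library}
    for ch in text:
        cp = ord(ch)
        for lang, ranges in library.items():
            if any(lo <= cp <= hi for lo, hi in ranges):
                counts[lang] += 1
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "unknown"


def store_by_language(knowledge_bases: dict[str, list[str]], text: str) -> str:
    """File the text under the knowledge base for its detected language category."""
    lang = detect_language(text)
    knowledge_bases.setdefault(lang, []).append(text)
    return lang


kbs: dict[str, list[str]] = {}
print(store_by_language(kbs, "知识库管理"), store_by_language(kbs, "knowledge base"))
# zh en
```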
On the basis of the above technical solution, preferably, the file link corresponding to the network document to be added is obtained and document tags are extracted from it, the document tags including the source website, the classification, and the attribution of the network document; the network document to be added is then marked with these document tags and stored in the knowledge base.
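How the three tags are read from a link is not disclosed. The sketch below assumes a hypothetical link layout `scheme://<source website>/<classification>/<attribution>/...` and uses the standard-library `urllib.parse.urlparse`; `extract_document_tags` is an illustrative name.

```python
from urllib.parse import urlparse


def extract_document_tags(link: str) -> dict[str, str]:
    """Derive document tags from a file link.

    Assumed (hypothetical) layout: scheme://<source website>/<classification>/<attribution>/...
    Missing path segments yield empty tags.
    """
    parsed = urlparse(link)
    segments = [s for s in parsed.path.split("/") if s]
    return {
        "source_website": parsed.hostname or "",
        "classification": segments[0] if len(segments) > 0 else "",
        "attribution": segments[1] if len(segments) > 1 else "",
    }


print(extract_document_tags("https://docs.example.com/databases/acme/guide.pdf"))
# {'source_website': 'docs.example.com', 'classification': 'databases', 'attribution': 'acme'}
```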
On the basis of the above technical solution, preferably, a click-count threshold is preset; the click count of the network document to be added is obtained and compared with the threshold. When the click count is greater than the threshold, the network document is stored in the knowledge base and preferentially pushed to the user; when it is less than the threshold, the network document is simply stored in the knowledge base.
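The click-count rule stores every document and only changes whether it is flagged for priority pushing. In this sketch `StoredDocument` and `store_with_priority` are hypothetical names; because the text does not say what happens when the click count equals the threshold, the sketch assumes such a document is not flagged.

```python
from dataclasses import dataclass


@dataclass
class StoredDocument:
    doc_id: str
    clicks: int
    priority_push: bool  # True -> pushed to users first


def store_with_priority(
    knowledge_base: list[StoredDocument], doc_id: str, clicks: int, threshold: int
) -> StoredDocument:
    """Store every document; mark it for priority pushing only if clicks > threshold."""
    record = StoredDocument(doc_id, clicks, priority_push=clicks > threshold)
    knowledge_base.append(record)
    return record


kb: list[StoredDocument] = []
print(store_with_priority(kb, "doc-a", 120, 100).priority_push)  # True
print(store_with_priority(kb, "doc-b", 40, 100).priority_push)   # False
```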
Further, the invention provides a self-learning knowledge base management apparatus, comprising:
an extraction module, configured to receive a document query instruction and extract from it the keywords to be queried and the document category corresponding to those keywords;
a calculation module, configured to establish a TF-IDF algorithm, acquire from the network the keywords of documents in the related document category according to the document category of the keywords to be queried, calculate with the TF-IDF algorithm the TF-IDF values of the keywords to be queried and of the network keywords, compare the two, and determine the network documents to be added according to the comparison result;
a management module, configured to obtain the file link and the file corresponding to each network document to be added, store them in the knowledge base, generate a unique number for the network document, associate the unique number with the file link and the file, and display the associated unique number to the user.
In a second aspect, the invention further provides a terminal device, comprising a memory, a processor, and a self-learning knowledge base management program stored in the memory and executable on the processor, the program being configured to implement the steps of the self-learning knowledge base management method described above.
In a third aspect, the invention further provides a storage medium, namely a computer storage medium storing a self-learning knowledge base management program which, when executed by a processor, implements the steps of the self-learning knowledge base management method described above.
Compared with the prior art, the self-learning knowledge base management method of the invention has the following beneficial effects:
(1) Document keywords are analyzed with the TF-IDF algorithm, text documents related to the keywords are searched for on the network, and those documents are then added to the knowledge base. This realizes self-learning that is driven accurately by the keywords and requires no manual operation, saving labor cost and improving working efficiency;
(2) By screening on click count, text documents with high click counts are selected from a large number of text documents and marked, and when a user selects the corresponding keyword, the marked documents are preferentially recommended to the user.
Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a terminal device in the hardware operating environment according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a first embodiment of the self-learning knowledge base management method of the present invention;
Fig. 3 is a functional block diagram of a first embodiment of the self-learning knowledge base management apparatus of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to those embodiments. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, the terminal device may include a processor 1001 such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 enables communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a random access memory (RAM) or a non-volatile memory (NVM) such as disk storage, and may optionally be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the structure shown in Fig. 1 does not limit the terminal device; in actual implementations, the terminal device may include more or fewer components than shown, combine certain components, or arrange its components differently.
As shown in Fig. 1, the memory 1005, as a storage medium, may contain an operating system, a network communication module, a user interface module, and a self-learning knowledge base management program.
In the terminal device shown in Fig. 1, the network interface 1004 is mainly used to establish a communication connection between the terminal device and a server that stores all the data required by the self-learning knowledge base management system, and the user interface 1003 is mainly used for data interaction with the user. The processor 1001 and the memory 1005 may be arranged in the self-learning knowledge base management device, which calls, through the processor 1001, the self-learning knowledge base management program stored in the memory 1005 and executes the self-learning knowledge base management method provided by the invention.
Referring to fig. 2, fig. 2 is a schematic flow chart of a self-learning knowledge base management method according to a first embodiment of the present invention.
In this embodiment, the self-learning knowledge base management method includes the following steps:
S10: Receive a document query instruction, and extract from it the keywords to be queried and the document category corresponding to those keywords.
It should be understood that, after receiving the document query instruction, this embodiment may extract the keywords to be queried and their document category from the instruction, screen out the corresponding document category from the database and display it to the user, and at the same time acquire documents related to that document category from the network as documents to be added and add them to the database.
S20: Establish a TF-IDF algorithm; acquire from the network, according to the document category of the keywords to be queried, the keywords of documents in the related document category (the network keywords); calculate with the TF-IDF algorithm the TF-IDF values of the keywords to be queried and of the network keywords; compare the two; and determine the network documents to be added according to the comparison result.
It should be understood that, in this embodiment, a TF-IDF algorithm is established in advance and a range of TF-IDF values is set; documents related to the document category of the keywords to be queried, together with their keywords, are then obtained from the network. Next, the TF-IDF values of the network keywords are calculated and compared with the preset range to judge whether the corresponding documents are genuine and valid: only when the TF-IDF value of a network keyword falls within the preset range is the corresponding document deemed genuine and valid; otherwise, related documents and their keywords are acquired from the network again.
It should be understood that, once the document corresponding to a network keyword has been determined to be genuine and valid, the TF-IDF value of the keyword to be queried is calculated and compared with the TF-IDF value of that document's keyword to further screen the documents that can be added to the database. Only when the TF-IDF value of the keyword to be queried is less than or equal to that of the network keyword, which is taken to indicate that the network keyword covers the keyword to be queried, is the document corresponding to the network keyword taken as a network document to be added; otherwise, the keywords of documents in the related document category are acquired from the network again.
It should be understood that TF-IDF (term frequency-inverse document frequency) is a weighting technique commonly used in information retrieval and data mining, where TF stands for term frequency and IDF for inverse document frequency. TF-IDF is a statistical method for evaluating how important a word is to one document in a document collection or corpus: the importance of a word increases in proportion to the number of times it appears in the document, but decreases as the frequency with which it appears across the corpus increases.
It should be understood that the TF-IDF algorithm is:
$$P(x) = TF(x) \times \log\frac{N}{N(x)}$$
where P(x) is the TF-IDF value of a keyword (either a keyword to be queried or a network keyword), TF(x) is the term frequency of that keyword, N is the total number of documents in the network, and N(x) is the number of documents containing the keyword.
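As a worked illustration, assume the common form P(x) = TF(x) × log(N / N(x)) with a base-10 logarithm and TF(x) taken as a raw count (both assumptions; the numbers are hypothetical). A keyword that appears 3 times in a document and occurs in 10 of 1,000 network documents scores far higher than one with the same count that occurs in 500 of them:

```latex
P(x) = 3 \times \log_{10}\frac{1000}{10} = 3 \times 2 = 6
\qquad\text{vs.}\qquad
P(x) = 3 \times \log_{10}\frac{1000}{500} = 3 \times \log_{10} 2 \approx 0.90
```

This matches the behavior described for TF-IDF: the weight grows with occurrences in the document and shrinks as the keyword becomes common across the corpus.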
S30: Obtain the file link and the file corresponding to each network document to be added, store them in the knowledge base, generate a unique number for the network document, associate the unique number with the file link and the file, and display the associated unique number to the user.
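Step S30 amounts to storing a link and a file under a fresh identifier and returning that identifier. The patent does not say how the unique number is generated; this sketch assumes a random UUID from the standard-library `uuid` module, and `KnowledgeRecord` and `KnowledgeBase` are hypothetical in-memory stand-ins for the knowledge base.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class KnowledgeRecord:
    number: str     # unique number shown to the user
    link: str       # file link of the network document
    content: bytes  # the downloaded file


@dataclass
class KnowledgeBase:
    records: dict[str, KnowledgeRecord] = field(default_factory=dict)

    def add(self, link: str, content: bytes) -> str:
        """Store link and file under a fresh unique number and return that number."""
        number = uuid.uuid4().hex  # assumption: a random UUID serves as the unique number
        while number in self.records:  # collision is practically impossible; guard anyway
            number = uuid.uuid4().hex
        self.records[number] = KnowledgeRecord(number, link, content)
        return number

    def lookup(self, number: str) -> KnowledgeRecord:
        return self.records[number]


kb = KnowledgeBase()
n = kb.add("https://example.com/guide.pdf", b"%PDF-...")
print(n, kb.lookup(n).link)
```

A database-backed version would typically let the store assign the key (for example an auto-increment column) instead of generating it in application code.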
It should be understood that this embodiment further includes obtaining language packages from the network and building a language library from them; after the file corresponding to the network document to be added is obtained, the system identifies the file's language using the language library and stores the file in the knowledge base of the corresponding language category.
It should be understood that, in this embodiment, the file link corresponding to the network document to be added is also obtained, and the file's tags, namely the source website, classification, and attribution of the network document, are extracted from the link; the file is then marked with these tags and stored in the knowledge base under the corresponding tags.
It should be understood that, in this embodiment, a hot-push label is also assigned according to the click count of the network document to be added: when the click count exceeds the preset click-count threshold, the document is marked for priority pushing, and when a user selects the corresponding keyword, that document is recommended first. This helps users find the documents they want more easily and quickly.
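The priority recommendation described above can be sketched as a keyword filter followed by a stable sort that moves marked documents to the front. The document dictionaries and the `recommend` function are hypothetical illustrations; the patent does not specify how stored documents are matched to the selected keyword.

```python
def recommend(documents: list[dict], keyword: str) -> list[str]:
    """Return ids of stored documents tagged with the keyword, priority-pushed ones first."""
    matches = [d for d in documents if keyword in d["keywords"]]
    # list.sort is stable: within each group, stored order is preserved.
    matches.sort(key=lambda d: not d["priority_push"])
    return [d["id"] for d in matches]


docs = [
    {"id": "a", "keywords": {"index"}, "priority_push": False},
    {"id": "b", "keywords": {"index"}, "priority_push": True},
    {"id": "c", "keywords": {"cache"}, "priority_push": True},
]
print(recommend(docs, "index"))  # ['b', 'a']
```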
The above description is only for illustrative purposes and does not limit the technical solutions of the present application in any way.
As can easily be seen from the above, this embodiment receives a document query instruction and extracts from it the keywords to be queried and their document category; establishes a TF-IDF algorithm, calculates with it the TF-IDF values of the keywords to be queried and of the network keywords, and determines from these values the network documents to be added; and obtains the file link and the file of each network document to be added, stores them in a knowledge base, generates a unique number for the document, associates the number with the file link and the file, and displays the number to the user. By learning and managing new knowledge automatically through the TF-IDF algorithm, this embodiment saves labor cost and improves efficiency.
In addition, an embodiment of the invention further provides a self-learning knowledge base management apparatus. As shown in Fig. 3, the apparatus comprises an extraction module 10, a calculation module 20, and a management module 30.
The extraction module 10 is configured to receive a document query instruction and extract from it the keywords to be queried and the document category corresponding to those keywords;
the calculation module 20 is configured to establish a TF-IDF algorithm, acquire from the network the keywords of documents in the related document category according to the document category of the keywords to be queried, calculate with the TF-IDF algorithm the TF-IDF values of the keywords to be queried and of the network keywords, compare the two, and determine the network documents to be added according to the comparison result;
the management module 30 is configured to obtain the file link and the file corresponding to each network document to be added, store them in the knowledge base, generate a unique number for the network document, associate the unique number with the file link and the file, and display the associated unique number to the user.
In addition, it should be noted that the apparatus embodiments described above are merely illustrative and do not limit the protection scope of the present invention; in practice, a person skilled in the art may select some or all of the modules as needed to achieve the purpose of this embodiment, which is not limited herein.
In addition, for technical details not described in this embodiment, refer to the self-learning knowledge base management method provided in any embodiment of the present invention; they are not repeated here.
In addition, an embodiment of the present invention further provides a storage medium, namely a computer storage medium storing a self-learning knowledge base management program which, when executed by a processor, implements the following operations:
receiving a document query instruction, and extracting from it the keywords to be queried and the document category corresponding to those keywords;
establishing a TF-IDF algorithm; acquiring from the network, according to that document category, the keywords of documents in the related document category (the network keywords); calculating with the TF-IDF algorithm the TF-IDF values of the keywords to be queried and of the network keywords; comparing the two; and determining, from the comparison result, the network documents to be added;
obtaining the file link and the file corresponding to each network document to be added, storing them in a knowledge base, generating a unique number for the network document, associating the unique number with the file link and the file, and displaying the associated unique number to the user.
Further, when executed by the processor, the self-learning knowledge base management program also implements the following operations:
establishing a TF-IDF algorithm and setting a range of valid keyword TF-IDF values; acquiring from the network the keywords of documents in the related document category according to the document category of the keywords to be queried; calculating the TF-IDF value of each network keyword with the TF-IDF algorithm and comparing it with the set range; when the TF-IDF value of a network keyword falls within the range, judging the document corresponding to that network keyword to be genuine and valid and then calculating the TF-IDF value of the keyword to be queried with the TF-IDF algorithm; and when it falls outside the range, acquiring the keywords of documents in the related document category from the network again.
Further, when the self-learning knowledge base management method program is executed by the processor, it also implements the following operations:
establishing a TF-IDF algorithm; acquiring, from the network, keywords of documents in the document categories corresponding to the keywords to be queried; calculating, according to the TF-IDF algorithm, the TF-IDF values of the keywords to be queried and of the network keywords, and comparing them; when the TF-IDF value of a keyword to be queried is less than or equal to that of a network keyword, taking the document corresponding to that network keyword as a network document to be added; and when the TF-IDF value of the keyword to be queried is greater than that of the network keyword, re-acquiring keywords of documents in the relevant document category from the network.
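A sketch of the comparison rule, assuming each network keyword already carries its TF-IDF value and a hypothetical `document_url` pointing at its source document. The `NetworkKeyword` type and function name are illustrative, not taken from the source.

```python
from typing import List, NamedTuple


class NetworkKeyword(NamedTuple):
    keyword: str
    tfidf: float
    document_url: str  # hypothetical: where the keyword's source document lives


def documents_to_add(query_tfidf: float, candidates: List[NetworkKeyword]) -> List[str]:
    """Keep documents whose network-keyword TF-IDF is >= the query keyword's TF-IDF."""
    return [c.document_url for c in candidates if query_tfidf <= c.tfidf]
```

An empty result corresponds to the case where the query keyword's value exceeds every network keyword's value, which in the text triggers re-acquisition from the network.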
Further, when the self-learning knowledge base management method program is executed by the processor, it also implements the following operations:
the TF-IDF algorithm is as follows:
where P(x) denotes the TF-IDF value of a keyword x (a keyword to be queried or a network keyword), TF(x) denotes the term frequency of x, N denotes the total number of documents in the network, and N(x) denotes the number of documents containing x.
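The formula itself is not reproduced in this text. Its variable definitions match the textbook TF-IDF form P(x) = TF(x) · log(N / N(x)), so the sketch below assumes that form, with TF(x) normalized by document length and a natural logarithm. The patent's actual formula may differ (for example, by adding 1 to N(x) for smoothing).

```python
import math
from typing import List


def term_frequency(term: str, doc: List[str]) -> float:
    """TF(x): occurrences of the term divided by the document length (assumed normalization)."""
    return doc.count(term) / len(doc) if doc else 0.0


def tf_idf(term: str, doc: List[str], corpus: List[List[str]]) -> float:
    """P(x) = TF(x) * log(N / N(x)), the textbook form assumed here."""
    n_total = len(corpus)                                # N
    n_with_term = sum(1 for d in corpus if term in d)    # N(x)
    if n_with_term == 0:
        return 0.0  # term absent from every document: IDF undefined, score treated as zero
    return term_frequency(term, doc) * math.log(n_total / n_with_term)


corpus = [
    ["knowledge", "base", "update"],
    ["knowledge", "graph"],
    ["search", "engine"],
    ["base", "station"],
]
print(round(tf_idf("graph", corpus[1], corpus), 4))  # 0.6931
```

A term that appears in every document gets log(1) = 0, so common words cannot outscore distinctive ones.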
Further, when the self-learning knowledge base management method program is executed by the processor, it also implements the following operations:
establishing a preset language library; acquiring the file of the network document to be added; searching the content of that file against the preset language library to determine its language category; and storing the file in the knowledge base corresponding to that language category.
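One way to read the "preset language library" is as a set of high-frequency words per language, with the document assigned to the language whose words it contains most often. The word lists and language codes below are illustrative assumptions, not the patent's library; scripts written without spaces, such as Chinese, would need different tokenization than the `\w+` pattern used here.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical preset language library: a few high-frequency function words per language.
LANGUAGE_LIBRARY: Dict[str, Set[str]] = {
    "en": {"the", "and", "of", "is", "to"},
    "fr": {"le", "la", "et", "les", "des"},
    "de": {"der", "die", "und", "das", "ist"},
}


def detect_language(text: str, library: Dict[str, Set[str]] = LANGUAGE_LIBRARY) -> str:
    """Return the language whose library words occur most often in the text."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {lang: sum(tok in words for tok in tokens) for lang, words in library.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def store_by_language(kb: Dict[str, List[str]], text: str) -> str:
    """Store the document text in the per-language knowledge base it belongs to."""
    lang = detect_language(text)
    kb[lang].append(text)
    return lang


knowledge_bases: Dict[str, List[str]] = defaultdict(list)
```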
Further, when the self-learning knowledge base management method program is executed by the processor, it also implements the following operations:
acquiring the file link of the network document to be added and extracting document tags from it, the document tags comprising the source website, the classification, and the attribution of the network document; labeling the network document to be added with these document tags; and storing it in the knowledge base.
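The text does not say how the tags are encoded in a link. The sketch below assumes a hypothetical layout `https://<source site>/<classification>/<attribution>/<file>` and splits it with the standard-library `urllib.parse.urlparse`; real sites would each need their own parsing rules.

```python
from typing import Dict, List
from urllib.parse import urlparse


def extract_document_tags(file_link: str) -> Dict[str, str]:
    """Derive document tags from a file link under the assumed URL layout."""
    parsed = urlparse(file_link)
    segments = [s for s in parsed.path.split("/") if s]
    return {
        "source_website": parsed.hostname or "",
        # Only treat a segment as a tag if a file name still follows it.
        "classification": segments[0] if len(segments) > 1 else "",
        "attribution": segments[1] if len(segments) > 2 else "",
    }


def tag_and_store(knowledge_base: List[Dict[str, str]], file_link: str) -> Dict[str, str]:
    """Label the document with its tags and append the record to the knowledge base."""
    record = {"link": file_link, **extract_document_tags(file_link)}
    knowledge_base.append(record)
    return record
```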
Further, when the self-learning knowledge base management method program is executed by the processor, it also implements the following operations:
setting a preset click-count threshold; acquiring the click count of the network document to be added and comparing it with the threshold; when the click count is greater than the threshold, storing the network document to be added in the knowledge base and pushing it to the user with priority; and when the click count is less than the threshold, storing the network document to be added in the knowledge base.
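A sketch of the click-count rule. The `ClickRouter` name and list-based storage are hypothetical. The text does not cover a click count exactly equal to the threshold; this sketch treats that case like the below-threshold one, which is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClickRouter:
    threshold: int  # preset click-count threshold (value chosen by the operator)
    knowledge_base: List[str] = field(default_factory=list)
    priority_push: List[str] = field(default_factory=list)

    def add(self, doc_id: str, clicks: int) -> bool:
        """Store every document; also queue it for priority push if clicks exceed the threshold."""
        self.knowledge_base.append(doc_id)
        if clicks > self.threshold:
            self.priority_push.append(doc_id)
            return True
        # Below (or, by assumption, equal to) the threshold: stored only.
        return False
```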
The above description covers only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims (10)

1. A self-learning knowledge base management method, characterized by comprising the following steps:
receiving a document query instruction, and extracting from it the keywords to be queried and the document categories corresponding to those keywords;
establishing a TF-IDF algorithm; acquiring, from the network, keywords of documents in the document categories corresponding to the keywords to be queried; calculating, according to the TF-IDF algorithm, the TF-IDF values of the keywords to be queried and of the network keywords; comparing the two sets of TF-IDF values; and determining, according to the comparison result, which documents corresponding to the network keywords are network documents to be added;
acquiring the file link and the file of each network document to be added, storing them in the knowledge base, generating a unique number for the network document, associating that number with its file link and file, and displaying the associated unique number to the user.
2. The self-learning knowledge base management method according to claim 1, characterized by: establishing a TF-IDF algorithm and setting a range for keyword TF-IDF values; acquiring, from the network, keywords of documents in the document categories corresponding to the keywords to be queried; calculating the TF-IDF values of the network keywords according to the TF-IDF algorithm and comparing them with the set range; when the TF-IDF value of a network keyword falls within the range, judging the corresponding document to be genuine and valid, and then calculating the TF-IDF values of the network keywords according to the TF-IDF algorithm; and when the TF-IDF value of a network keyword falls outside the range, re-acquiring keywords of documents in the relevant document category from the network.
3. The self-learning knowledge base management method according to claim 2, characterized by: establishing a TF-IDF algorithm; acquiring, from the network, keywords of documents in the document categories corresponding to the keywords to be queried; calculating, according to the TF-IDF algorithm, the TF-IDF values of the keywords to be queried and of the network keywords, and comparing them; when the TF-IDF value of a keyword to be queried is less than or equal to that of a network keyword, taking the document corresponding to that network keyword as a network document to be added; and when the TF-IDF value of the keyword to be queried is greater than that of the network keyword, re-acquiring keywords of documents in the relevant document category from the network.
4. The self-learning knowledge base management method according to claim 2 or 3, characterized in that the TF-IDF algorithm is as follows:
[Formula image FDA0002200900440000021; not reproduced in text]
where P(x) denotes the TF-IDF value of a keyword x (a keyword to be queried or a network keyword), TF(x) denotes the term frequency of x, N denotes the total number of documents in the network, and N(x) denotes the number of documents containing x.
5. The self-learning knowledge base management method according to claim 1, characterized by further comprising: establishing a preset language library; acquiring the file of the network document to be added; searching the content of that file against the preset language library to determine its language category; and storing the file in the knowledge base corresponding to that language category.
6. The self-learning knowledge base management method according to claim 1, characterized by further comprising: acquiring the file link of the network document to be added and extracting document tags from it, the document tags comprising the source website, the classification, and the attribution of the network document; labeling the network document to be added with these document tags; and storing it in the knowledge base.
7. The self-learning knowledge base management method according to claim 1, characterized by: setting a preset click-count threshold; acquiring the click count of the network document to be added and comparing it with the threshold; when the click count is greater than the threshold, storing the network document to be added in the knowledge base and pushing it to the user with priority; and when the click count is less than the threshold, storing the network document to be added in the knowledge base.
8. A self-learning knowledge base management apparatus, comprising:
an extraction module, configured to receive a document query instruction and extract from it the keywords to be queried and the document categories corresponding to those keywords;
a computing module, configured to establish a TF-IDF algorithm, acquire from the network keywords of documents in the document categories corresponding to the keywords to be queried, compute according to the TF-IDF algorithm the TF-IDF values of the keywords to be queried and of the network keywords, compare the two sets of TF-IDF values, and determine according to the comparison result which documents corresponding to the network keywords are network documents to be added;
a management module, configured to acquire the file link and the file of each network document to be added, store them in the knowledge base, generate a unique number for the network document, associate that number with its file link and file, and display the associated unique number to the user.
9. A terminal device, characterized by comprising a memory, a processor, and a self-learning knowledge base management method program stored in the memory and executable on the processor, the program being configured to implement the steps of the self-learning knowledge base management method according to any one of claims 1 to 7.
10. A storage medium, characterized in that the storage medium is a computer storage medium storing a self-learning knowledge base management method program which, when executed by a processor, implements the steps of the self-learning knowledge base management method according to any one of claims 1 to 7.
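The apparatus of claim 8 can be sketched as three cooperating components. The query-instruction format is not specified in the text, so the extraction module below assumes a hypothetical JSON instruction with `keywords` and `category` fields; the computing and management modules are injected as callables so that TF-IDF comparison and storage logic can be plugged in. All class and field names are illustrative.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Query:
    keywords: List[str]
    category: str


class ExtractionModule:
    """Parses a document query instruction.

    Assumes a hypothetical JSON instruction such as
    {"keywords": ["knowledge base"], "category": "technical manual"}.
    """

    def extract(self, instruction: str) -> Query:
        data = json.loads(instruction)
        # Drop blank keywords and surrounding whitespace.
        keywords = [k.strip() for k in data.get("keywords", []) if k.strip()]
        return Query(keywords=keywords, category=data.get("category", "").strip())


@dataclass
class KnowledgeBaseApparatus:
    """Wires the three claimed modules together."""
    extraction: ExtractionModule
    computing: Callable[[Query], List[str]]  # returns links of documents to add
    management: Callable[[str], str]         # stores one link, returns its unique number

    def handle(self, instruction: str) -> List[str]:
        query = self.extraction.extract(instruction)
        links = self.computing(query)
        return [self.management(link) for link in links]
```

For example, a stub computing module that maps each keyword to one link and a management module that numbers it would return one unique number per document added.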
CN201910864630.6A 2019-09-12 2019-09-12 Self-learning knowledge base management method and device, terminal device and storage medium Pending CN110727786A

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910864630.6A CN110727786A 2019-09-12 2019-09-12 Self-learning knowledge base management method and device, terminal device and storage medium

Publications (1)

Publication Number Publication Date
CN110727786A 2020-01-24

Family

ID=69218110

Country Status (1)

Country Link
CN (1) CN110727786A

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101876981A * 2009-04-29 2010-11-03 阿里巴巴集团控股有限公司 Method and device for establishing a knowledge base
CN102402619A * 2011-12-23 2012-04-04 广东威创视讯科技股份有限公司 Search method and device
CN104615724A * 2015-02-06 2015-05-13 百度在线网络技术(北京)有限公司 Method for establishing a knowledge base, and knowledge-base-based information search method and device
CN104778268A * 2015-04-23 2015-07-15 江苏省现代企业信息化应用支撑软件工程技术研发中心 Knowledge query method
CN106407208A * 2015-07-29 2017-02-15 清华大学 Method and system for establishing a city management ontology knowledge base
CN108038096A * 2017-11-10 2018-05-15 平安科技(深圳)有限公司 Method for quickly retrieving knowledge base documents, application server, and computer-readable storage medium
CN109087132A * 2018-07-18 2018-12-25 国家电网有限公司 Knowledge-graph-based method and device for pushing user questions
CN109408644A * 2018-09-03 2019-03-01 平安医疗健康管理股份有限公司 Knowledge base update method and apparatus, computer device, and storage medium
CN109947921A * 2019-03-19 2019-06-28 河海大学常州校区 Intelligent question answering system based on natural language processing
CN109948121A * 2017-12-20 2019-06-28 北京京东尚科信息技术有限公司 Article similarity mining method, system, device, and storage medium

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200124