CN110727786A

CN110727786A - Self-learning knowledge base management method and device, terminal device and storage medium

Info

Publication number: CN110727786A
Application number: CN201910864630.6A
Authority: CN
Inventors: 王春雷
Original assignee: Wuhan Rusong Technology Co Ltd
Current assignee: Wuhan Rusong Technology Co Ltd
Priority date: 2019-09-12
Filing date: 2019-09-12
Publication date: 2020-01-24

Abstract

The invention provides a self-learning knowledge base management method and device, a terminal device, and a storage medium. The method comprises: receiving a document query instruction, and extracting from it the keywords to be queried and the document categories corresponding to those keywords; establishing a TF-IDF algorithm, calculating the TF-IDF values of the keywords to be queried and of the network keywords according to the algorithm, and determining, from those values, the documents corresponding to the network keywords that are to be added; and obtaining the file link and the file corresponding to each network document to be added, storing them in a knowledge base, generating a unique number for the network document, associating the unique number with the file link and the file, and displaying the associated unique number to a user. By using the TF-IDF algorithm to learn and manage new knowledge automatically, the invention saves labor cost and improves efficiency.

Description

Self-learning knowledge base management method and device, terminal device and storage medium

Technical Field

The invention relates to the field of computers, in particular to a self-learning knowledge base management method, a self-learning knowledge base management device, terminal equipment and a storage medium.

Background

With the deepening of informatization, information systems have become key infrastructure for handling an enterprise's core business, and knowledge bases have emerged as a result. The term "knowledge base" has two meanings. The first is the rule set used in the design of an expert system, together with the facts and data associated with those rules; such a knowledge base is tied to a specific expert system, so the question of sharing it does not arise. The second is a knowledge base of a consultative nature, which is shared rather than exclusive to a single organization.

However, learning in current systems is one-directional: people can learn from the existing knowledge stored in the knowledge base, but the knowledge base cannot actively learn new knowledge. It can only be updated passively and manually, and each update requires technicians with relevant experience to manage the knowledge base, which consumes labor and communication costs.

The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.

Disclosure of Invention

In view of this, the invention provides a self-learning knowledge base management method, a self-learning knowledge base management device, a terminal device and a storage medium, and aims to solve the technical problem that the prior art cannot actively learn and manage new knowledge.

The technical scheme of the invention is realized as follows:

In one aspect, the invention provides a self-learning knowledge base management method, which comprises the following steps:

receiving a document query instruction, and extracting keywords to be queried and document categories corresponding to the keywords to be queried from the document query instruction;

establishing a TF-IDF algorithm, acquiring from a network, according to the document categories corresponding to the keywords to be queried, the keywords corresponding to documents of the related document categories, calculating the TF-IDF values of the keywords to be queried and the TF-IDF values of the network keywords according to the TF-IDF algorithm, comparing the TF-IDF values of the keywords to be queried with the TF-IDF values of the network keywords, and determining the documents corresponding to the network keywords to be added according to the comparison result;

obtaining a file link and a file corresponding to a network document to be added, storing the file link and the file in a knowledge base, generating a unique number for the network document to be added, associating the unique number with the file link and the file, and displaying the associated unique number to a user.

On the basis of the above technical solution, preferably, a TF-IDF algorithm is established and a range of keyword TF-IDF values is set; keywords corresponding to documents of the related document category are acquired from the network according to the document category corresponding to the keyword to be queried; the TF-IDF value of each network keyword is calculated according to the TF-IDF algorithm and compared with the range of keyword TF-IDF values; when the TF-IDF value of the network keyword falls within the range, the document corresponding to the network keyword is judged to be real and valid, and the TF-IDF value of the network keyword is calculated according to the TF-IDF algorithm; and when the TF-IDF value of the network keyword does not fall within the range, keywords corresponding to documents of the related document category are acquired from the network again.

On the basis of the above technical solution, preferably, a TF-IDF algorithm is established; keywords corresponding to documents of the related document categories are acquired from the network according to the document categories corresponding to the keywords to be queried; the TF-IDF values of the keywords to be queried and the TF-IDF values of the network keywords are calculated according to the TF-IDF algorithm and compared with each other; when the TF-IDF value of the keyword to be queried is less than or equal to the TF-IDF value of the network keyword, the document corresponding to the network keyword is taken as a network document to be added; and when the TF-IDF value of the keyword to be queried is greater than the TF-IDF value of the network keyword, keywords corresponding to documents of the related document category are acquired from the network again.

On the basis of the above technical solution, preferably, the TF-IDF algorithm is:

where P(x) denotes the TF-IDF value of a keyword x (either a keyword to be queried or a network keyword), TF(x) denotes the term frequency of that keyword, N denotes the total number of documents in the network, and N(x) denotes the number of documents containing the keyword.
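The expression itself does not appear in this text. As a hedged sketch only, and as an assumption rather than the patent's confirmed formula, the conventional TF-IDF form that matches the four symbols defined above would read as follows; the base of the logarithm and any smoothing term are not stated in the source.

```latex
\[
P(x) = TF(x) \cdot \log\frac{N}{N(x)}
\]
```

Under this assumed form, a keyword that occurs often in a document (large TF(x)) but in few documents overall (small N(x)) receives a high value.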

On the basis of the technical scheme, preferably, a preset language library is established, a file corresponding to the network document to be added is obtained, the content of the file corresponding to the network document to be added is retrieved according to the preset language library, the language category of the file corresponding to the network document to be added is determined, and the file corresponding to the network document to be added is stored in a knowledge base corresponding to the language category.

On the basis of the above technical solution, preferably, a file link corresponding to the network document to be added is obtained, and a document tag is extracted from the file link, where the document tag includes the source website of the network document, the classification of the network document, and the attribution of the network document; the network document to be added is marked according to the document tag and stored in the knowledge base.

On the basis of the above technical solution, preferably, a click count threshold is preset, the click count of the network document to be added is obtained and compared with the preset click count threshold, and when the click count of the network document to be added is greater than the preset click count threshold, the network document to be added is stored in the knowledge base and is pushed to the user preferentially; and when the click count of the network document to be added is smaller than the preset click count threshold, the network document to be added is stored in the knowledge base.

Still further preferably, the self-learning knowledge base management apparatus includes:

the extraction module is used for receiving a document query instruction and extracting keywords to be queried and document categories corresponding to the keywords to be queried from the document query instruction;

the computing module is used for establishing a TF-IDF algorithm, acquiring keywords corresponding to documents of related document categories from a network according to the document categories corresponding to the keywords to be inquired, computing TF-IDF values of the keywords to be inquired and TF-IDF values of network keywords according to the TF-IDF algorithm, comparing the TF-IDF values of the keywords to be inquired and the TF-IDF values of the network keywords according to the TF-IDF values of the keywords to be inquired and the TF-IDF values of the network keywords, and determining the documents corresponding to the network keywords to be added according to the comparison result;

the management module is used for acquiring the file link and the file corresponding to the network document to be added, storing the file link and the file corresponding to the network document to be added into the knowledge base, generating a unique number for the network document to be added, associating the unique number with the file link and the file corresponding to the network document to be added, and displaying the associated unique number to a user.

In a second aspect, the invention further provides a terminal device, where the terminal device includes: a memory, a processor, and a self-learning knowledge base management method program stored in the memory and executable on the processor, the program being configured to implement the steps of the self-learning knowledge base management method described above.

In a third aspect, the invention further provides a storage medium, where the storage medium is a computer storage medium that stores a self-learning knowledge base management method program, and the program, when executed by a processor, implements the steps of the self-learning knowledge base management method described above.

Compared with the prior art, the self-learning knowledge base management method has the following beneficial effects:

(1) document keywords are analyzed with the TF-IDF algorithm, text documents related to the keywords are retrieved from the network, and those documents are then added to the knowledge base; self-learning is thus carried out accurately on the basis of keywords and without manual operation, which saves labor cost and improves working efficiency;

(2) by screening on click count, text documents with a high click count are selected from a large number of text documents and marked, and when a user selects the corresponding keywords, the marked text documents are recommended to the user preferentially.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic structural diagram of a terminal device in a hardware operating environment according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart of a first embodiment of the self-learning knowledge base management method of the present invention;

FIG. 3 is a functional block diagram of a first embodiment of the self-learning knowledge base management apparatus of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.

As shown in fig. 1, the terminal device may include a processor 1001 such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be a random access memory (RAM) or a non-volatile memory (NVM), such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.

Those skilled in the art will appreciate that the configuration shown in fig. 1 does not constitute a limitation of the terminal device, and that in actual implementations the terminal device may include more or less components than those shown, or some components may be combined, or a different arrangement of components.

As shown in fig. 1, a memory 1005, which is a storage medium, may include therein an operating system, a network communication module, a user interface module, and a self-learning knowledge base management method program.

In the terminal device shown in fig. 1, the network interface 1004 is mainly used for establishing a communication connection between the terminal device and a server that stores all data required by the self-learning knowledge base management system; the user interface 1003 is mainly used for data interaction with a user; the processor 1001 and the memory 1005 may be arranged in the self-learning knowledge base management device, which calls, through the processor 1001, the self-learning knowledge base management method program stored in the memory 1005 and executes the self-learning knowledge base management method provided by the invention.

Referring to fig. 2, fig. 2 is a schematic flow chart of a self-learning knowledge base management method according to a first embodiment of the present invention.

In this embodiment, the self-learning knowledge base management method includes the following steps:

S10: receiving a document query instruction, and extracting keywords to be queried and document categories corresponding to the keywords to be queried from the document query instruction.

It should be understood that, after receiving the document query instruction, this embodiment extracts from it the keywords to be queried and the document categories corresponding to those keywords, screens the corresponding document categories from the database, and displays them to the user. At the same time, documents related to the document categories corresponding to the keywords to be queried are acquired from the network as documents to be added and are added to the database.
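Step S10 can be pictured with a minimal sketch. The source does not specify the format of a document query instruction, so the `category=...; keywords=...` string layout, the `DocumentQuery` type, and the `parse_query_instruction` function below are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DocumentQuery:
    keywords: Tuple[str, ...]  # keywords to be queried
    category: str              # document category corresponding to the keywords


def parse_query_instruction(instruction: str) -> DocumentQuery:
    """Parse an ASSUMED 'category=...; keywords=a, b' instruction string."""
    fields: Dict[str, str] = {}
    for part in instruction.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip().lower()] = value.strip()

    # Normalize keywords: split on commas, trim whitespace, lowercase.
    raw_keywords = fields.get("keywords", "").split(",")
    keywords = tuple(k.strip().lower() for k in raw_keywords if k.strip())

    if not keywords or not fields.get("category"):
        raise ValueError("instruction needs a category and at least one keyword")
    return DocumentQuery(keywords=keywords, category=fields["category"].lower())


query = parse_query_instruction("category=Networking; keywords=TCP, congestion control")
print(query)
```

A real system would more likely receive structured input from a search form or API; the string format here only stands in for whatever the instruction carries.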

S20: establishing a TF-IDF algorithm, acquiring from a network, according to the document categories corresponding to the keywords to be queried, the keywords corresponding to documents of the related document categories, calculating the TF-IDF values of the keywords to be queried and the TF-IDF values of the network keywords according to the TF-IDF algorithm, comparing the TF-IDF values of the keywords to be queried with the TF-IDF values of the network keywords, and determining the documents corresponding to the network keywords to be added according to the comparison result.

It should be understood that, in this embodiment, a TF-IDF algorithm is established in advance and a range of TF-IDF values is set. Documents related to the document category, together with their keywords, are then obtained from the network according to the document category corresponding to the keyword to be queried. The TF-IDF values of the network keywords are calculated and compared with the preset range in order to judge whether the documents corresponding to the network keywords are real and valid. A document is regarded as real and valid only when the TF-IDF value of its network keyword falls within the preset range; otherwise, related documents and their keywords are obtained from the network again.

It should be understood that, after the document corresponding to a network keyword has been determined to be real and valid, the TF-IDF value of the keyword to be queried is calculated and compared with the TF-IDF value of the keyword of that document, so as to further screen the documents that can be added to the database. Only when the TF-IDF value of the keyword to be queried is less than or equal to the TF-IDF value of the network keyword, which indicates that the network keyword covers the keyword to be queried, is the document corresponding to the network keyword taken as a network document to be added; otherwise, keywords corresponding to documents of the related document category are obtained from the network again.
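The two screening stages described above (the range check, then the comparison with the query keyword) can be sketched as a single filter. The `Candidate` type, the `select_documents` function, the example URLs, and the numeric values are hypothetical. The source says rejected candidates trigger a fresh acquisition from the network; this sketch simply skips them.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Candidate(NamedTuple):
    url: str       # link of the network document
    keyword: str   # network keyword attached to the document
    tfidf: float   # TF-IDF value already computed for that keyword


def select_documents(
    query_tfidf: float,
    candidates: Iterable[Candidate],
    valid_range: Tuple[float, float],
) -> List[Candidate]:
    """Keep candidates that pass the range check and the comparison."""
    low, high = valid_range
    selected = []
    for candidate in candidates:
        # Stage 1: outside the preset range means not "real and valid".
        if not (low <= candidate.tfidf <= high):
            continue
        # Stage 2: keep only if query value <= network keyword value.
        if query_tfidf <= candidate.tfidf:
            selected.append(candidate)
    return selected


candidates = [
    Candidate("https://example.com/a", "tcp", 0.40),  # passes both stages
    Candidate("https://example.com/b", "tcp", 0.05),  # below the range
    Candidate("https://example.com/c", "tcp", 5.00),  # above the range
    Candidate("https://example.com/d", "tcp", 0.20),  # valid, but below query
]
to_add = select_documents(0.30, candidates, valid_range=(0.1, 1.0))
print([c.url for c in to_add])
```

Treating the range as inclusive at both ends is a design choice of the sketch; the source only says the value must "meet" the range.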

It should be understood that TF-IDF (Term Frequency-Inverse Document Frequency) is a weighting technique commonly used in information retrieval and data mining, where TF denotes term frequency and IDF denotes inverse document frequency. TF-IDF is a statistical method for evaluating how important a word is to one document in a document set or corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases with the frequency at which it appears across the corpus.
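The behaviour just described can be checked with a small, self-contained sketch. It uses one common textbook variant (term frequency normalized by document length, natural logarithm, no smoothing); these choices are assumptions, since the source does not state which variant it uses, and the toy corpus is invented for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence


def tf(term: str, document_tokens: Sequence[str]) -> float:
    """Term frequency: occurrences of the term divided by document length."""
    if not document_tokens:
        return 0.0
    return Counter(document_tokens)[term] / len(document_tokens)


def idf(term: str, corpus: Sequence[Sequence[str]]) -> float:
    """Inverse document frequency: log(N / N(x)); 0.0 if the term is unseen."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0
    return math.log(len(corpus) / containing)


def tf_idf(term: str, document_tokens: Sequence[str],
           corpus: Sequence[Sequence[str]]) -> float:
    return tf(term, document_tokens) * idf(term, corpus)


corpus: List[List[str]] = [
    ["knowledge", "base", "management"],
    ["knowledge", "graph", "query"],
    ["network", "document", "query"],
    ["network", "keyword", "keyword"],
]

# "keyword" is frequent in its document and rare in the corpus.
score_rare = tf_idf("keyword", corpus[3], corpus)
# "knowledge" appears once in its document and in half of the corpus.
score_common = tf_idf("knowledge", corpus[0], corpus)
print(score_rare > score_common)
```

The rarer, more concentrated term scores higher, which is the property the screening steps rely on.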

It should be understood that the TF-IDF algorithm is:

S30: obtaining a file link and a file corresponding to a network document to be added, storing the file link and the file in a knowledge base, generating a unique number for the network document to be added, associating the unique number with the file link and the file, and displaying the associated unique number to a user.
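A minimal sketch of step S30 follows, assuming an in-memory store and a UUID as the unique number. The source does not say how the number is generated or where documents are persisted, so the `KnowledgeBase` class, its `add_document` method, and the example link are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class KnowledgeBase:
    # Maps unique number -> {"link": ..., "file": ...}
    records: Dict[str, dict] = field(default_factory=dict)

    def add_document(self, link: str, content: bytes) -> str:
        """Store link and file together and return the new unique number."""
        number = uuid.uuid4().hex  # ASSUMPTION: a random UUID serves as the number
        self.records[number] = {"link": link, "file": content}
        return number


kb = KnowledgeBase()
number = kb.add_document("https://example.com/docs/tcp.pdf", b"example file bytes")
# "Displaying the associated unique number to a user" is reduced to a print here.
print(number)
```

Keying the store by the unique number keeps the association between number, link, and file in one place, so a later lookup by number returns both.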

It should be understood that this embodiment further comprises obtaining a language package from the network and establishing a language library from it. After the file corresponding to the network document to be added is obtained, the system identifies the language of the file according to the language library and stores the file in the knowledge base of the corresponding language category.
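One way to picture language identification against a preset language library is a word-overlap count, sketched below. The source does not describe what the language library contains, so the tiny word lists, the `detect_language` and `store_by_language` functions, and the `"unknown"` fallback are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Set

# ASSUMED language library: a few very common words per language.
LANGUAGE_LIBRARY: Dict[str, Set[str]] = {
    "en": {"the", "and", "of", "is", "to"},
    "de": {"der", "und", "die", "ist", "das"},
    "fr": {"le", "et", "les", "est", "des"},
}


def detect_language(text: str,
                    library: Dict[str, Set[str]] = LANGUAGE_LIBRARY,
                    default: str = "unknown") -> str:
    """Return the language whose word list overlaps the text the most."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    scores = {lang: sum(1 for t in tokens if t in words)
              for lang, words in library.items()}
    if not scores:
        return default
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


def store_by_language(knowledge_bases: Dict[str, List[str]], text: str) -> str:
    """Append the text to the knowledge base of its detected language."""
    lang = detect_language(text)
    knowledge_bases.setdefault(lang, []).append(text)
    return lang


bases: Dict[str, List[str]] = {}
store_by_language(bases, "The knowledge base is the core of the system")
store_by_language(bases, "Der Hund und die Katze")
print(sorted(bases))
```

Word-list overlap is crude and fails on short or mixed-language files; it is used here only because it maps directly onto the idea of checking file content against a library.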

It should be understood that, in this embodiment, the file link corresponding to the network document to be added is also obtained, and tags corresponding to the file are extracted from the file link, including the source website, the classification, and the attribution of the network document. The file is then marked according to these tags and stored in the knowledge base under the corresponding tags.
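Tag extraction from a file link might look like the sketch below. The source names the three tags but not how they are encoded in a link, so reading the classification and attribution from the first two path segments is purely an assumption, as are the `extract_tags` function and the example URL.

```python
from typing import Dict
from urllib.parse import urlparse


def extract_tags(link: str) -> Dict[str, str]:
    """Derive the three document tags from a link (ASSUMED URL layout)."""
    parsed = urlparse(link)
    segments = [s for s in parsed.path.split("/") if s]
    return {
        # Source website: the host part of the link.
        "source_site": parsed.netloc,
        # ASSUMPTION: first path segment is the classification,
        # provided a file name still follows it.
        "classification": segments[0] if len(segments) > 1 else "",
        # ASSUMPTION: second path segment is the attribution.
        "attribution": segments[1] if len(segments) > 2 else "",
    }


tags = extract_tags("https://docs.example.com/networking/alice/tcp-basics.html")
print(tags)
```

Links that do not follow this layout yield empty strings for the missing tags instead of raising, so the document can still be stored.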

It should be understood that, in this embodiment, the network document to be added is also marked for hot pushing according to its click count: when the click count of the network document to be added is greater than a preset click count threshold, the document is given a priority-push mark, and when a user selects the corresponding keyword, the document is recommended preferentially. This helps the user find the desired document more quickly.
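The click-based marking can be sketched as a flag set at storage time and used when ordering recommendations. The function names, the dictionary layout, and the sample numbers are hypothetical. The source does not say what happens when the click count equals the threshold; this sketch stores such a document without the priority mark.

```python
from typing import Dict, List


def store_with_priority(knowledge_base: Dict[str, dict], doc_id: str,
                        clicks: int, threshold: int) -> dict:
    """Store the document; mark it for priority push if clicks > threshold."""
    entry = {"clicks": clicks, "priority_push": clicks > threshold}
    knowledge_base[doc_id] = entry
    return entry


def recommend(knowledge_base: Dict[str, dict]) -> List[str]:
    """Order documents: priority-marked first, then by descending clicks."""
    return sorted(
        knowledge_base,
        key=lambda d: (not knowledge_base[d]["priority_push"],
                       -knowledge_base[d]["clicks"]),
    )


kb: Dict[str, dict] = {}
store_with_priority(kb, "a", clicks=1500, threshold=1000)
store_with_priority(kb, "b", clicks=20, threshold=1000)
store_with_priority(kb, "c", clicks=1000, threshold=1000)
order = recommend(kb)
print(order)
```

Every document is stored regardless of its click count, matching the text; only the ordering of recommendations changes.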

The above description is only for illustrative purposes and does not limit the technical solutions of the present application in any way.

As can readily be seen from the above description, this embodiment receives a document query instruction and extracts from it the keywords to be queried and the document categories corresponding to those keywords; establishes a TF-IDF algorithm, calculates the TF-IDF values of the keywords to be queried and of the network keywords according to the algorithm, and determines, from those values, the documents corresponding to the network keywords that are to be added; and obtains the file link and the file corresponding to each network document to be added, stores them in a knowledge base, generates a unique number for the network document, associates the unique number with the file link and the file, and displays the associated unique number to a user. By using the TF-IDF algorithm to learn and manage new knowledge automatically, this embodiment saves labor cost and improves efficiency.

In addition, an embodiment of the invention also provides a self-learning knowledge base management apparatus. As shown in fig. 3, the self-learning knowledge base management apparatus includes: an extraction module 10, a calculation module 20, and a management module 30.

The extracting module 10 is configured to receive a document query instruction, and extract a keyword to be queried and a document category corresponding to the keyword to be queried from the document query instruction;

the calculation module 20 is configured to establish a TF-IDF algorithm, obtain keywords corresponding to documents of related document categories from a network according to document categories corresponding to keywords to be queried, calculate TF-IDF values of the keywords to be queried and TF-IDF values of network keywords according to the TF-IDF algorithm, compare the TF-IDF values of the keywords to be queried and the TF-IDF values of the network keywords according to the TF-IDF values of the keywords to be queried and the TF-IDF values of the network keywords, and determine documents corresponding to the network keywords to be added according to comparison results;

the management module 30 is configured to obtain a file link and a file corresponding to the network document to be added, store the file link and the file corresponding to the network document to be added in the knowledge base, generate a unique number for the network document to be added, associate the unique number with the file link and the file corresponding to the network document to be added, and display the associated unique number to the user.

In addition, it should be noted that the above-described embodiments of the apparatus are merely illustrative, and do not limit the scope of the present invention, and in practical applications, a person skilled in the art may select some or all of the modules to implement the purpose of the embodiments according to actual needs, and the present invention is not limited herein.

In addition, the technical details that are not elaborated in this embodiment may refer to the self-learning knowledge base management method provided in any embodiment of the present invention, and are not described herein again.

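In addition, an embodiment of the present invention further provides a storage medium, where the storage medium is a computer storage medium, and a self-learning knowledge base management method program is stored on the computer storage medium, where the self-learning knowledge base management method program, when executed by a processor, implements the following operations:

receiving a document query instruction, and extracting keywords to be queried and document categories corresponding to the keywords to be queried from the document query instruction;

establishing a TF-IDF algorithm, acquiring from a network, according to the document categories corresponding to the keywords to be queried, the keywords corresponding to documents of the related document categories, calculating the TF-IDF values of the keywords to be queried and the TF-IDF values of the network keywords according to the TF-IDF algorithm, comparing the TF-IDF values of the keywords to be queried with the TF-IDF values of the network keywords, and determining the documents corresponding to the network keywords to be added according to the comparison result;

obtaining a file link and a file corresponding to a network document to be added, storing the file link and the file in a knowledge base, generating a unique number for the network document to be added, associating the unique number with the file link and the file, and displaying the associated unique number to a user.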

Further, the self-learning knowledge base management method program when executed by the processor further realizes the following operations:

establishing a TF-IDF algorithm, setting a range of TF-IDF values of keywords, acquiring keywords corresponding to documents of related document types from a network according to document types corresponding to the keywords to be inquired, calculating the TF-IDF values of network keywords according to the TF-IDF algorithm, comparing the TF-IDF values of the network keywords with the range of the TF-IDF values of the keywords, judging the documents corresponding to the network keywords to be real and effective when the TF-IDF values of the network keywords meet the range of the TF-IDF values of the keywords, and calculating the TF-IDF values of the network keywords according to the TF-IDF algorithm; and when the TF-IDF value of the network keyword does not meet the range of the TF-IDF value of the keyword, acquiring the keyword corresponding to the document of the relevant document category from the network again.

establishing a TF-IDF algorithm, acquiring keywords corresponding to documents of related document categories from a network according to the document categories corresponding to the keywords to be inquired, calculating TF-IDF values of the keywords to be inquired and TF-IDF values of network keywords according to the TF-IDF algorithm, comparing the TF-IDF values of the keywords to be inquired with the TF-IDF values of the network keywords according to the TF-IDF values of the keywords to be inquired and the TF-IDF values of the network keywords, and taking the documents corresponding to the network keywords as network documents to be added when the TF-IDF values of the keywords to be inquired are less than or equal to the TF-IDF values of the network keywords; and when the TF-IDF value of the keyword to be queried is greater than the TF-IDF value of the network keyword, re-acquiring the keyword corresponding to the document of the relevant document category from the network.

the TF-IDF algorithm is as follows:

establishing a preset language library, acquiring a file corresponding to a network document to be added, retrieving the content of the file corresponding to the network document to be added according to the preset language library, determining the language category of the file corresponding to the network document to be added, and storing the file corresponding to the network document to be added into a knowledge base corresponding to the language category.

acquiring a file link corresponding to a network document to be added, and extracting a document tag from the file link, wherein the document tag includes the source website of the network document, the classification of the network document, and the attribution of the network document; and marking the network document to be added according to the document tag and storing it in the knowledge base.

presetting a click count threshold, acquiring the click count of the network document to be added, comparing the click count of the network document to be added with the preset click count threshold, and when the click count of the network document to be added is greater than the preset click count threshold, storing the network document to be added in the knowledge base and pushing it to the user preferentially; and when the click count of the network document to be added is smaller than the preset click count threshold, storing the network document to be added in the knowledge base.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

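1. A self-learning knowledge base management method, characterized by comprising the following steps:

receiving a document query instruction, and extracting keywords to be queried and document categories corresponding to the keywords to be queried from the document query instruction;

establishing a TF-IDF algorithm, acquiring from a network, according to the document categories corresponding to the keywords to be queried, the keywords corresponding to documents of the related document categories, calculating the TF-IDF values of the keywords to be queried and the TF-IDF values of the network keywords according to the TF-IDF algorithm, comparing the TF-IDF values of the keywords to be queried with the TF-IDF values of the network keywords, and determining the documents corresponding to the network keywords to be added according to the comparison result;

obtaining a file link and a file corresponding to a network document to be added, storing the file link and the file in a knowledge base, generating a unique number for the network document to be added, associating the unique number with the file link and the file, and displaying the associated unique number to a user.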

2. The self-learning knowledge base management method of claim 1, wherein: establishing a TF-IDF algorithm, setting a range of TF-IDF values of keywords, acquiring keywords corresponding to documents of related document types from a network according to document types corresponding to the keywords to be inquired, calculating the TF-IDF values of network keywords according to the TF-IDF algorithm, comparing the TF-IDF values of the network keywords with the range of the TF-IDF values of the keywords, judging that the documents corresponding to the network keywords are real and effective when the TF-IDF values of the network keywords meet the range of the TF-IDF values of the keywords, and calculating the TF-IDF values of the network keywords according to the TF-IDF algorithm; and when the TF-IDF value of the network keyword does not meet the range of the TF-IDF value of the keyword, acquiring the keyword corresponding to the document of the relevant document category from the network again.

3. The self-learning knowledge base management method of claim 2, wherein: establishing a TF-IDF algorithm, acquiring keywords corresponding to documents of related document categories from a network according to the document categories corresponding to the keywords to be inquired, calculating TF-IDF values of the keywords to be inquired and TF-IDF values of network keywords according to the TF-IDF algorithm, comparing the TF-IDF values of the keywords to be inquired with the TF-IDF values of the network keywords according to the TF-IDF values of the keywords to be inquired and the TF-IDF values of the network keywords, and taking the documents corresponding to the network keywords as the network documents to be added when the TF-IDF values of the keywords to be inquired are less than or equal to the TF-IDF values of the network keywords; and when the TF-IDF value of the keyword to be queried is greater than the TF-IDF value of the network keyword, re-acquiring the keyword corresponding to the document of the relevant document category from the network.

4. A self-learning knowledge base management method according to claim 2 or 3, characterized by: the TF-IDF algorithm is as follows:

5. The self-learning knowledge base management method of claim 1, wherein the method further comprises: establishing a preset language library, acquiring a file corresponding to the network document to be added, retrieving the content of the file according to the preset language library, determining the language category of the file, and storing the file in a knowledge base corresponding to the language category.

6. The self-learning knowledge base management method of claim 1, wherein the method further comprises: obtaining a file link corresponding to the network document to be added, and extracting a document tag from the file link, wherein the document tag includes the source website of the network document, the classification of the network document, and the attribution of the network document; and marking the network document to be added according to the document tag and storing it in the knowledge base.

7. The self-learning knowledge base management method of claim 1, wherein: a click count threshold is preset, the click count of the network document to be added is acquired and compared with the preset click count threshold, and when the click count of the network document to be added is greater than the preset click count threshold, the network document to be added is stored in the knowledge base and pushed to the user preferentially; and when the click count of the network document to be added is smaller than the preset click count threshold, the network document to be added is stored in the knowledge base.

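8. A self-learning knowledge base management apparatus, comprising:

an extraction module, used for receiving a document query instruction and extracting keywords to be queried and document categories corresponding to the keywords to be queried from the document query instruction;

a calculation module, used for establishing a TF-IDF algorithm, acquiring from a network, according to the document categories corresponding to the keywords to be queried, the keywords corresponding to documents of the related document categories, calculating the TF-IDF values of the keywords to be queried and the TF-IDF values of the network keywords according to the TF-IDF algorithm, comparing the TF-IDF values of the keywords to be queried with the TF-IDF values of the network keywords, and determining the documents corresponding to the network keywords to be added according to the comparison result;

a management module, used for obtaining a file link and a file corresponding to a network document to be added, storing the file link and the file in a knowledge base, generating a unique number for the network document to be added, associating the unique number with the file link and the file, and displaying the associated unique number to a user.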

9. A terminal device, characterized in that the terminal device comprises: memory, a processor and a self-learning knowledge base management method program stored on the memory and executable on the processor, the self-learning knowledge base management method program being configured to implement the steps of the self-learning knowledge base management method according to any of the claims 1 to 7.

10. A storage medium, characterized in that the storage medium is a computer storage medium having stored thereon a self-learning knowledge base management method program, which when executed by a processor implements the steps of the self-learning knowledge base management method according to any one of claims 1 to 7.