CN112199376A

CN112199376A - Standard knowledge base management method and system based on cluster analysis

Info

Publication number: CN112199376A
Application number: CN202011224456.8A
Authority: CN
Inventors: 金震; 王兆君; 李明; 吴长征; 张先峰
Original assignee: Beijing SunwayWorld Science and Technology Co Ltd
Current assignee: Beijing SunwayWorld Science and Technology Co Ltd
Priority date: 2020-11-05
Filing date: 2020-11-05
Publication date: 2021-01-08
Anticipated expiration: 2040-11-05
Also published as: CN112199376B

Abstract

The invention discloses a standard knowledge base management method and system based on cluster analysis. The method comprises: structuring target standard information in a target knowledge base; establishing a communication connection between the target knowledge base and a target terminal of a user; obtaining, from the target terminal, a target clustering algorithm selected by the user from a fixed number of preset clustering algorithms; performing online cluster analysis on the structured target standard information in the target knowledge base by using the target clustering algorithm to obtain a clustering result; and updating the clustering result into the target knowledge base. The working efficiency or learning efficiency of the user is greatly improved, manual intervention is reduced, intelligent automatic clustering is realized, and the correctness and stability of the degree of association between items of standard information are ensured. The user can accurately and quickly acquire the required learning data or working data, which improves the user experience.

Description

Standard knowledge base management method and system based on cluster analysis

Technical Field

The invention relates to the technical field of knowledge base management, in particular to a standard knowledge base management method and system based on cluster analysis.

Background

At present, with the development of big data technology, more and more companies create their own knowledge bases for storing various index data of the company. The data in the knowledge base can then be called at any time during a meeting, and employees of the company can learn from the knowledge in the knowledge base. Most knowledge base systems record standard information in a variety of formats. Such a system is a centralized storage system that allows standard documents to be shared; it builds a tree-structured directory that can be expanded over multiple levels; it records the version of a standard whenever the standard is changed, so that historical versions can be queried; and it controls access to the standards through authority control. The working process of an existing knowledge base system is to input standard information into the knowledge base, manually cluster the associated standard information, display the standard information through the tree-structured directory, and then provide it to the user for learning or data retrieval. This approach has the following defects: 1. the standard information is not structured before being input, so it is difficult to mine the value of the standard information with mathematical algorithms, which reduces the working efficiency or learning efficiency of the user; 2. determining the degree of association of the standard information manually consumes a great deal of manpower, and the accuracy of the manually determined result is not high, so the user cannot effectively obtain the required standard information and the user experience is degraded.

Disclosure of Invention

In view of the above problems, the invention provides a standard knowledge base management method and system based on cluster analysis, which address two problems: first, the standard information is not structured before being input, so it is difficult to mine its value with mathematical algorithms, which reduces the working efficiency or learning efficiency of the user; second, determining the degree of association of the standard information manually consumes a great deal of manpower and yields results of low accuracy, so the user cannot effectively obtain the required standard information and the user experience is degraded.

A standard knowledge base management method based on cluster analysis comprises the following steps:

structuring target standard information in a target knowledge base;

establishing a communication connection between the target knowledge base and a target terminal of a user, and acquiring, from the target terminal, a target clustering algorithm selected by the user from a fixed number of preset clustering algorithms;

carrying out online clustering analysis on the structured target standard information in the target knowledge base by using the target clustering algorithm to obtain a clustering result;

and updating the clustering result to the target knowledge base.

Preferably, before structuring the standard information in the knowledge base, the method further comprises: constructing the target knowledge base, wherein the steps comprise:

a knowledge point identification model and a knowledge point judgment model are constructed in advance;

acquiring a plurality of preset knowledge points, inputting the preset knowledge points into the knowledge point identification model for training, and acquiring a trained knowledge point identification model;

acquiring a plurality of knowledge point judgment attributes in the trained knowledge point identification model;

training the knowledge point judgment model by using the plurality of knowledge point judgment attributes to obtain a trained knowledge point judgment model;

and constructing the target knowledge base by using the trained knowledge point identification model and the trained knowledge point judgment model.

Preferably, before structuring the standard information in the knowledge base, the method further comprises: acquiring a plurality of first standard information, checking the plurality of first standard information to obtain the target standard information, wherein the steps comprise:

checking the integrity of the plurality of first standard information;

if the integrity of some second standard information among the plurality of first standard information is unqualified, completing the missing information in that second standard information and, after the completion is finished, reclassifying it into the plurality of first standard information;

determining whether the first standard information which appears repeatedly exists in the plurality of pieces of first standard information, if so, deleting the first standard information which appears repeatedly to obtain a first target number of pieces of third standard information;

inputting the first target number of third standard information into the knowledge base, counting successfully stored fourth standard information, and marking unsuccessfully stored fifth standard information;

and deleting the fifth standard information, and confirming fourth standard information with a second target number as the target standard information.
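The checking steps above can be sketched as a small screening function. This is only an illustration under stated assumptions: the record fields, the `is_complete` placeholder check, and the `store` callable are hypothetical and are not defined in the specification; incomplete records are returned separately so that they can be completed and resubmitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass(frozen=True)
class StandardRecord:
    """Hypothetical record shape; the specification does not define fields."""
    number: str
    title: str
    body: str


def is_complete(record: StandardRecord) -> bool:
    # Placeholder integrity check (assumption): every field is non-empty.
    return all(v.strip() for v in (record.number, record.title, record.body))


def deduplicate(records: List[StandardRecord]) -> List[StandardRecord]:
    # Keep the first occurrence of each record, preserving input order.
    seen: Set[StandardRecord] = set()
    unique: List[StandardRecord] = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


def screen_records(
    records: List[StandardRecord],
    store: Callable[[StandardRecord], bool],
) -> Tuple[List[StandardRecord], List[StandardRecord]]:
    """Return (target records, records that still need completion).

    `store` is a hypothetical callable that returns True when a record
    was saved to the knowledge base successfully.
    """
    needs_completion = [r for r in records if not is_complete(r)]
    complete = [r for r in records if is_complete(r)]
    third = deduplicate(complete)
    # Records that fail to store are dropped; the rest form the target set.
    target = [r for r in third if store(r)]
    return target, needs_completion
```

In this sketch the completion step itself is left to the caller, since the specification does not say how missing information is filled in.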

Preferably, the target standard information in the target knowledge base is structured, including:

determining the text types of the target standard information with the second target quantity;

acquiring a target segmentation configuration file and a target extraction configuration file corresponding to each target standard information according to the text type of each target standard information;

determining a segmentation rule of each target segmentation configuration file;

acquiring a segmentation segment corresponding to the target segmentation rule from a target standard file corresponding to each target segmentation configuration file according to the segmentation rule of each target segmentation configuration file;

determining an extraction rule of each target extraction configuration file;

matching each extraction rule with the corresponding segmentation segment, and acquiring the structured information of each target standard information from the segmentation segment of each target standard information according to the matching result;

and generating a structured text of each target standard information according to the structured information of each target standard information.
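The segmentation and extraction steps above might look as follows. This is a minimal sketch, assuming that the configuration files reduce to regular expressions keyed by text type; the specification does not give the configuration format, and the rule tables, field names, and `structure_text` function are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical segmentation rules, keyed by text type (assumption).
SEGMENT_RULES: Dict[str, str] = {
    "plain": r"\n\s*\n",  # split on blank lines
}

# Hypothetical extraction rules: field name -> pattern with one capture group.
EXTRACT_RULES: Dict[str, Dict[str, str]] = {
    "plain": {
        "number": r"^Standard No\.:\s*(.+)$",
        "title": r"^Title:\s*(.+)$",
        "scope": r"^Scope:\s*(.+)$",
    },
}


def structure_text(text: str, text_type: str) -> Dict[str, str]:
    """Segment `text`, then match each extraction rule against the segments."""
    segments: List[str] = [
        s.strip() for s in re.split(SEGMENT_RULES[text_type], text) if s.strip()
    ]
    structured: Dict[str, str] = {}
    for field, pattern in EXTRACT_RULES[text_type].items():
        for segment in segments:
            match = re.search(pattern, segment, flags=re.MULTILINE)
            if match:
                structured[field] = match.group(1).strip()
                break  # the first matching segment wins
    return structured
```

Keeping the rules in per-type tables mirrors the idea of selecting a configuration by text type, so a new text type only needs new table entries.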

Preferably, establishing the communication connection between the target knowledge base and the target terminal of the user and acquiring, from the target terminal, the target clustering algorithm selected by the user from a fixed number of preset clustering algorithms includes:

verifying the location security and the communication connection security of the target terminal;

when the target terminal passes the verification, sending a fixed number of preset clustering algorithms to the target terminal;

receiving a target clustering algorithm fed back by the target terminal, confirming whether the target clustering algorithm is one of a fixed number of preset clustering algorithms, if so, binding the target clustering algorithm with the target knowledge base, otherwise, confirming that the target clustering algorithm is a user-defined algorithm, storing the target clustering algorithm into the fixed number of preset clustering algorithms, and simultaneously binding the target clustering algorithm with the target knowledge base.
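The selection and binding logic above can be illustrated with a small registry. This is a sketch under assumptions: the `AlgorithmRegistry` class, its method names, and the representation of an algorithm as a Python callable are all hypothetical, and terminal verification is reduced to a boolean flag.

```python
from typing import Callable, Dict, List, Optional, Sequence

# A clustering algorithm maps feature vectors to one cluster label per vector.
ClusterFn = Callable[[Sequence[Sequence[float]]], List[int]]


class AlgorithmRegistry:
    """Holds the preset algorithms and the one bound to the knowledge base."""

    def __init__(self, presets: Dict[str, ClusterFn]) -> None:
        self._algorithms: Dict[str, ClusterFn] = dict(presets)
        self.bound_name: Optional[str] = None

    def offer(self, terminal_verified: bool) -> List[str]:
        # Presets are only sent once the terminal has passed verification.
        if not terminal_verified:
            raise PermissionError("terminal failed verification")
        return sorted(self._algorithms)

    def bind(self, name: str, fn: Optional[ClusterFn] = None) -> ClusterFn:
        if name not in self._algorithms:
            if fn is None:
                raise ValueError(f"unknown algorithm {name!r}, no implementation given")
            # Treated as user-defined: store it alongside the presets.
            self._algorithms[name] = fn
        self.bound_name = name
        return self._algorithms[name]
```

A user-defined algorithm is stored before binding, so it is offered as a preset on later connections, as the step above describes.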

Preferably, the fixed number of preset clustering algorithms includes: a hierarchical clustering algorithm, a divisive clustering algorithm, a fuzzy clustering algorithm, a k-means algorithm, a k-medoids algorithm and a cosine similarity algorithm.
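Of the listed algorithms, the cosine similarity measure is the simplest to illustrate. The sketch below shows the standard definition for two feature vectors; how the specification derives feature vectors from standard information is not stated, so the inputs here are assumed to be plain numeric vectors.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen in this sketch for all-zero vectors
    return dot / (norm_u * norm_v)
```

Identical directions give a value of 1 and orthogonal vectors give 0, so the value can be compared directly against a similarity threshold.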

Preferably, the performing online clustering analysis on the structured target standard information in the target knowledge base by using the target clustering algorithm to obtain a clustering result includes:

preprocessing each target standard information in the second target number of target standard information, and removing useless data in each target standard information;

constructing an array of structures, extracting first target standard information and second target standard information from the second target number of target standard information, and storing them in the array of structures, wherein the first target standard information is any one of the second target number of target standard information, and the second target standard information is any one of the second target number of target standard information other than the first target standard information;

setting numbers for the first target standard information and the second target standard information, and marking the association state of the first target standard information and the second target standard information;

sorting and clustering the first target standard information and the second target standard information by using a preset position-coordinate quicksort method and the target clustering algorithm to construct a spatial index tree;

determining a first association degree between the first target standard information and the second target standard information according to the spatial index tree;

extracting a plurality of first keywords in the first target standard information and a plurality of second keywords in the second target standard information;

determining a second degree of association between the first target standard information and the second target standard information according to the plurality of first keywords and the plurality of second keywords;

and determining a current association coefficient between the first target standard information and the second target standard information from the first association degree and the second association degree together, and comparing the current association coefficient with a preset association coefficient: when the current association coefficient is greater than or equal to the preset association coefficient, the clustering result is that the first target standard information and the second target standard information are associated; when the current association coefficient is less than the preset association coefficient, the clustering result is that they are not associated.
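The final comparison step can be sketched as follows. The specification does not say how the second association degree is computed from keywords or how the two degrees are combined, so the Jaccard overlap and the weighted average used here, along with all function names and the default weight, are assumptions for illustration only. The threshold rule itself follows the text.

```python
from typing import Iterable


def keyword_overlap(first: Iterable[str], second: Iterable[str]) -> float:
    """Jaccard overlap of two keyword sets (assumed second-degree measure)."""
    a, b = set(first), set(second)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def association_coefficient(
    first_degree: float, second_degree: float, weight: float = 0.5
) -> float:
    # Assumed combination rule: a convex weighted average of both degrees.
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * first_degree + (1.0 - weight) * second_degree


def is_associated(current: float, preset: float) -> bool:
    # As stated in the text: associated when current >= preset.
    return current >= preset
```

For example, a first degree of 0.9 and keyword sets sharing two of four distinct keywords give a coefficient of about 0.7 under equal weights.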

Preferably, the updating the clustering result to the target knowledge base includes:

clustering target standard information in the target knowledge base according to the clustering result;

and after finishing clustering, displaying the target standard information in the target knowledge base according to the category.
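One way to turn pairwise clustering results into display categories is sketched below. It assumes that association is treated as transitive, so that linked items fall into the same category; the specification does not state this, and the `group_by_association` function is hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def group_by_association(
    ids: Iterable[Hashable],
    associated_pairs: Iterable[Tuple[Hashable, Hashable]],
) -> List[List[Hashable]]:
    """Merge pairwise associations into categories with a union-find."""
    parent: Dict[Hashable, Hashable] = {i: i for i in ids}

    def find(x: Hashable) -> Hashable:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in associated_pairs:
        parent[find(a)] = find(b)

    groups: Dict[Hashable, List[Hashable]] = {}
    for i in parent:
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Each returned list can then be shown as one category when the knowledge base is displayed.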

Preferably, the checking the integrity of the plurality of first standard information includes:

carrying out continuous characteristic decomposition processing on each piece of first standard information;

calculating the key information ratio of each piece of first standard information:

wherein S_i denotes the key information ratio of the i-th first standard information; m_i denotes the embedding dimension of the key information in the i-th first standard information; j_i denotes the time window function of the key information features in the i-th first standard information; x(t) denotes the time series obtained by continuous feature decomposition of the plurality of first standard information; Δt denotes the time interval of the continuous feature processing; log₂ denotes the base-2 logarithm; a_i denotes the fuzzy feature distribution quantity of the key information in the i-th first standard information; b denotes a preset key information utilization rate scheduling index; c_i denotes the feature distribution set in the i-th first standard information; N denotes the number of pieces of first standard information; and p_i denotes the decomposition feature set of the i-th first standard information;

calculating the integrity coefficient of each piece of first standard information according to the key information ratio of each piece of first standard information:

wherein Q_i denotes the integrity coefficient of the i-th first standard information; τ_i denotes the self-first-closing extension coefficient of the key information in the i-th first standard information, with a value in [0.3, 0.8]; L_i1 denotes the text length of the key information in the i-th first standard information; L_i2 denotes the text length of the i-th first standard information; K_i denotes the useless interference factor in the i-th first standard information, with a value in [0.05, 0.2]; and e is the natural constant, taken as 2.72;

and determining the integrity of each piece of first standard information by using a preset integrity coefficient conversion method according to the integrity coefficient of each piece of first standard information.

A standard knowledge base management system based on cluster analysis, the system comprising:

the structuring module is used for structuring the target standard information in the target knowledge base;

the acquisition module is used for establishing a communication connection between the target knowledge base and a target terminal of a user and acquiring, from the target terminal, a target clustering algorithm selected by the user from a fixed number of preset clustering algorithms;

the clustering module is used for carrying out online clustering analysis on the structured target standard information in the target knowledge base by utilizing the target clustering algorithm to obtain a clustering result;

and the updating module is used for updating the clustering result to the target knowledge base.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.

FIG. 1 is a flowchart of a method for managing a standard knowledge base based on cluster analysis according to the present invention;

FIG. 2 is another flowchart of the standard knowledge base management method based on cluster analysis according to the present invention;

FIG. 3 is a screenshot of a workflow of a method for managing a standard knowledge base based on cluster analysis according to the present invention;

FIG. 4 is a structural diagram of a standard knowledge base management system based on cluster analysis according to the present invention.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

As set out in the Background, existing knowledge base systems do not structure standard information before it is input, which makes it difficult to mine the value of the standard information with mathematical algorithms, and they rely on manual operation to determine the degree of association of the standard information, which consumes manpower and yields results of low accuracy. In order to solve these problems, the present embodiment discloses a standard knowledge base management method based on cluster analysis.

A standard knowledge base management method based on cluster analysis, as shown in fig. 1, includes the following steps:

Step S101, structuring target standard information in a target knowledge base;

Step S102, establishing a communication connection between the target knowledge base and a target terminal of a user, and acquiring, from the target terminal, a target clustering algorithm selected by the user from a fixed number of preset clustering algorithms;

Step S103, carrying out online clustering analysis on the structured target standard information in the target knowledge base by using the target clustering algorithm to obtain a clustering result;

Step S104, updating the clustering result to the target knowledge base.

The working principle of the technical scheme is as follows: structuring target standard information in a target knowledge base, connecting the target knowledge base with a target terminal of a user in a communication manner, acquiring a target clustering algorithm selected by the user from a fixed number of preset clustering algorithms from the target terminal, performing online clustering analysis on the structured target standard information in the target knowledge base by using the target clustering algorithm, acquiring a clustering result, and updating the clustering result into the target knowledge base.
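The four steps can be pictured as a single pipeline. The sketch below is only an illustration under assumptions: the knowledge base is modeled as an in-memory dictionary, structuring is reduced to producing a feature vector, and the `manage_knowledge_base` function and its parameter names are hypothetical.

```python
from typing import Callable, Dict, List

ClusterFn = Callable[[List[List[float]]], List[int]]


def manage_knowledge_base(
    knowledge_base: Dict[str, dict],
    structure: Callable[[str], List[float]],
    choose_algorithm: Callable[[], ClusterFn],
) -> Dict[str, dict]:
    """Run structure -> select algorithm -> cluster -> update in memory."""
    ids = list(knowledge_base)
    # S101: structure each standard (here: raw text to a feature vector).
    features = [structure(knowledge_base[i]["text"]) for i in ids]
    # S102: obtain the algorithm the user selected at the terminal.
    algorithm = choose_algorithm()
    # S103: cluster the structured information.
    labels = algorithm(features)
    # S104: write each cluster label back into the knowledge base.
    for i, label in zip(ids, labels):
        knowledge_base[i]["cluster"] = label
    return knowledge_base
```

Passing the algorithm in as a callable keeps the pipeline independent of which preset or user-defined algorithm is chosen.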

The beneficial effects of the above technical scheme are: structuring the standard information in the knowledge base allows the user to subsequently select a clustering algorithm to cluster it, so that more of the value of the standard information can be mined and the user can learn more knowledge or access more data, which greatly improves the working efficiency or learning efficiency of the user. Meanwhile, performing online cluster analysis on the standard information with the target clustering algorithm reduces manual intervention, realizes intelligent automatic clustering, and ensures the accuracy and stability of the degree of association between items of standard information. The user can accurately and quickly acquire the required learning data or working data, which improves the user experience.

In one embodiment, as shown in fig. 2, before structuring the standard information in the knowledge base, the method further comprises: constructing the target knowledge base, wherein the steps comprise:

Step S201, constructing a knowledge point identification model and a knowledge point judgment model in advance;

Step S202, acquiring a plurality of preset knowledge points, inputting the preset knowledge points into the knowledge point identification model for training, and acquiring a trained knowledge point identification model;

Step S203, acquiring a plurality of knowledge point judgment attributes in the trained knowledge point identification model;

Step S204, training the knowledge point judgment model by using the plurality of knowledge point judgment attributes to obtain a trained knowledge point judgment model;

Step S205, constructing the target knowledge base by using the trained knowledge point identification model and the trained knowledge point judgment model.

The beneficial effects of the above technical scheme are: by constructing the knowledge point identification model and the knowledge point judgment model and then training them with preset data, a target knowledge base capable of intelligently judging knowledge points can be constructed. This effectively avoids the situation in which useless data stored in the target knowledge base leaves its data disordered and impossible to classify effectively, and thus safeguards the precision of the standard information.

In one embodiment, prior to structuring the standard information in the knowledge base, the method further comprises: acquiring a plurality of first standard information, checking the plurality of first standard information to obtain the target standard information, wherein the steps comprise:

checking the integrity of the plurality of first standard information;

The beneficial effects of the above technical scheme are: checking the integrity of the first standard information and completing missing information ensures that the standard information in the target knowledge base meets the required quality, which in turn safeguards the accuracy of the subsequent intelligent clustering result. Deleting the first standard information that appears repeatedly ensures that no duplicate data enters the subsequent intelligent cluster analysis, which reduces its workload and improves its efficiency. Furthermore, deleting the standard information that cannot be stored in the knowledge base effectively removes useless data, which further reduces the workload of the intelligent cluster analysis and improves its efficiency.

In one embodiment, structuring the target standard information in the target knowledge base includes:

determining a segmentation rule of each target segmentation configuration file;

determining an extraction rule of each target extraction configuration file;

The beneficial effects of the above technical scheme are: the corresponding segmentation configuration file and the extraction configuration file are called according to the text type of the target standard information, the text is segmented according to the segmentation configuration file, and the required structural information is extracted from each segmented segment according to the extraction configuration file, so that the structuralization of the target standard information of different text types can be quickly realized, and the stability is ensured.

In one embodiment, establishing the communication connection between the target knowledge base and the target terminal of the user, and acquiring, from the target terminal, a target clustering algorithm selected by the user from a fixed number of preset clustering algorithms includes:

The beneficial effects of the above technical scheme are: verifying the security of the location and of the communication connection of the target terminal helps ensure that the user terminal does not carry Trojan horse viruses, which safeguards the target standard information in the knowledge base. Meanwhile, the user can either select a preset clustering algorithm or customize a clustering algorithm based on the preset ones, which further improves the user experience.

In one embodiment, the fixed number of preset clustering algorithms comprises: a hierarchical clustering algorithm, a divisive clustering algorithm, a fuzzy clustering algorithm, a k-means algorithm, a k-medoids algorithm and a cosine similarity algorithm.

The beneficial effects of the above technical scheme are: while the user selects a preferred algorithm, the preset algorithms can also serve as samples for a user-defined clustering algorithm, which further improves the user experience. In addition, providing several clustering algorithms ensures that, if one algorithm cannot be used, another remains available, which effectively safeguards the subsequent intelligent cluster analysis.

In one embodiment, the performing online clustering analysis on the structured target standard information in the target knowledge base by using the target clustering algorithm to obtain a clustering result includes:

The beneficial effects of the above technical scheme are: building the spatial index tree to determine the first association degree between every two items of target standard information allows all standard information in the knowledge base to be clustered accurately. Determining the second association degree from the keywords in each item of target standard information allows the association coefficient between every two items to be determined from multiple aspects. The target standard information in the knowledge base can then be clustered rapidly according to the association coefficient, with a small amount of calculation and an accurate result, which effectively shortens the clustering time and improves the clustering efficiency.

In one embodiment, updating the clustering result into the target knowledge base includes:

The beneficial effects of the above technical scheme are: displaying the clustered target standard information in the target knowledge base by category presents a better visual experience when the user calls the standard information, and lets the user locate the target standard information to be called by category in a very short time, which further improves the user experience.

In one embodiment, checking the integrity of the plurality of first standard information includes:

The beneficial effects of the above technical scheme are: performing continuous feature decomposition on each piece of first standard information allows the subsequent calculation to be based on the decomposed features, which requires less calculation and is more accurate than calculating directly on the text content. Calculating the key information ratio of each piece of first standard information accurately determines its degree of importance and the layout of its text composition, and the ratio gives a preliminary indication of how the integrity of each piece of first standard information relates to its key information. Calculating the integrity coefficient then determines the integrity of each piece of first standard information from its composition and the parameters of its key information. The whole process is calculated automatically by a terminal, which avoids the influence of human factors and improves the overall stability and the accuracy of the final integrity determination.

In one embodiment, as shown in FIG. 3, the method includes:

1. Standard structuring

The knowledge base management of the invention operates on structured standards, so the standards need to be structured before cluster analysis is performed.

2. User selection or customization of the clustering algorithm

The user can select or define a clustering algorithm in the system. The algorithms already covered by the system include mainstream algorithms such as hierarchical clustering, divisive clustering, fuzzy clustering, k-means and k-medoids; the system also supports similarity algorithms such as cosine similarity. Covering a variety of cluster analysis algorithms gives the system strong universality.

3. Online cluster analysis

The cluster analysis can be carried out online without downloading the standards. During the analysis, the system can automatically adjust to the optimal threshold for the data being analyzed, automatically judge the analysis result and reject unreasonable data, thereby realizing intelligent analysis.

During the analysis, a similarity index can be defined to reflect the degree of similarity, and classification accuracy is achieved through continuous optimization of this index.

4. Updating a knowledge base

After the cluster analysis is finished, the clustering result of the standard information is updated to the knowledge base. As the knowledge base is continuously updated and expanded, the information it covers becomes increasingly complete, and the value of the standards can be realized.

The beneficial effects of the above technical scheme are: 1. the invention processes structured standard data and classifies the standards at the knowledge element level, since the value of the standards can only be exploited through horizontal and vertical comparison of the standards; 2. during use of the standards, the system performs cluster analysis automatically, which reduces manual intervention and improves efficiency and accuracy.

A standard knowledge base management system based on cluster analysis is shown in FIG. 4. The system comprises:

a structuring module 401, configured to structure the target standard information in the target knowledge base;

an obtaining module 402, configured to connect the target knowledge base to a target terminal of a user, and obtain, from the target terminal, a target clustering algorithm selected by the user from a fixed number of preset clustering algorithms;

a clustering module 403, configured to perform online clustering analysis on the structured target standard information in the target knowledge base by using the target clustering algorithm, to obtain a clustering result;

an updating module 404, configured to update the clustering result to the target knowledge base.
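The cooperation of modules 401 to 404 could be sketched as four functions called in sequence. Everything here is an assumption made for illustration: standards are given as raw "key: value" text, the structuring step simply parses those lines (the segmentation and extraction configuration files are not modeled), and `by_category` is a toy preset algorithm rather than one of the algorithms named in the description.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Clusterer = Callable[[List[Record]], List[int]]


def structure(raw_standards: List[str]) -> List[Record]:
    """Module 401 stand-in: parse 'key: value' lines into one record per standard."""
    records = []
    for raw in raw_standards:
        fields = (line.split(":", 1) for line in raw.splitlines() if ":" in line)
        records.append({key.strip(): value.strip() for key, value in fields})
    return records


def obtain(presets: Dict[str, Clusterer], user_choice: str) -> Clusterer:
    """Module 402 stand-in: resolve the user's choice among the preset algorithms."""
    if user_choice not in presets:
        raise ValueError(f"{user_choice!r} is not a preset algorithm")
    return presets[user_choice]


def cluster(records: List[Record], algorithm: Clusterer) -> List[int]:
    """Module 403 stand-in: run the selected algorithm on the structured records."""
    return algorithm(records)


def update(records: List[Record], labels: List[int]) -> List[Record]:
    """Module 404 stand-in: store each cluster label back on its record."""
    return [{**record, "cluster": str(label)} for record, label in zip(records, labels)]


def by_category(records: List[Record]) -> List[int]:
    """Toy preset: records sharing the same 'category' field get the same label."""
    seen: Dict[str, int] = {}
    return [seen.setdefault(r.get("category", ""), len(seen)) for r in records]


# Hypothetical input; the numbers and categories are made up for the example.
raw = [
    "number: A-1\ncategory: safety",
    "number: A-2\ncategory: testing",
    "number: A-3\ncategory: safety",
]
records = structure(raw)
algorithm = obtain({"by_category": by_category}, "by_category")
updated = update(records, cluster(records, algorithm))
print(updated)
```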

The working principle and advantageous effects of the above technical solution have already been explained for the method and are not repeated here.

It will be understood by those skilled in the art that the terms "first" and "second" in the present invention merely refer to different stages of application.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A standard knowledge base management method based on cluster analysis is characterized by comprising the following steps:

structuring target standard information in a target knowledge base;

and updating the clustering result to the target knowledge base.

2. The method of claim 1, wherein prior to structuring the standard information in the knowledge base, the method further comprises: constructing the target knowledge base, wherein the steps comprise:

3. The method of claim 1, wherein prior to structuring the standard information in the knowledge base, the method further comprises: acquiring a plurality of first standard information, checking the plurality of first standard information to obtain the target standard information, wherein the steps comprise:

checking the integrity of the plurality of first standard information;

4. The method of claim 3, wherein the step of structuring the target standard information in the target knowledge base comprises:

determining a segmentation rule of each target segmentation configuration file;

determining an extraction rule of each target extraction configuration file;

5. The method as claimed in claim 1, wherein the step of establishing the communication connection between the target knowledge base and the target terminal of the user and obtaining, from the target terminal, the target clustering algorithm selected by the user from a fixed number of preset clustering algorithms comprises:

6. The method of claim 5, wherein the fixed number of preset clustering algorithms comprises: a hierarchical clustering algorithm, a split clustering algorithm, a fuzzy clustering algorithm, a k-means algorithm, a k-medoids (k-center point) algorithm, and a cosine similarity algorithm.

7. The method for managing standard knowledge base based on cluster analysis according to claim 4, wherein the performing online cluster analysis on the structured target standard information in the target knowledge base by using the target clustering algorithm to obtain a clustering result comprises:

8. The method of claim 7, wherein updating the clustered results to the target knowledge base comprises:

9. The method of claim 3, wherein the checking the integrity of the plurality of first standard information comprises:

wherein S_i denotes the key information ratio of the i-th first standard information; m_i denotes the embedding dimension of the key information in the i-th first standard information; j_i denotes the time window function of the key information features in the i-th first standard information; x(t) denotes the time series of the continuous feature decomposition of the plurality of first standard information; Δt denotes the time interval of the continuous feature processing; log₂ denotes the base-2 logarithm; a_i denotes the fuzzy feature distribution quantity of the key information in the i-th first standard information; b denotes a preset key information utilization rate scheduling index; c_i denotes the feature distribution set of the i-th first standard information; N denotes the number of first standard information; and p_i denotes the decomposition feature set of the i-th first standard information;

10. A standard knowledge base management system based on cluster analysis, the system comprising: