CN112199376A - Standard knowledge base management method and system based on cluster analysis - Google Patents


Info

Publication number
CN112199376A
Authority
CN
China
Prior art keywords
target
standard information
information
clustering
knowledge base
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011224456.8A
Other languages
Chinese (zh)
Other versions
CN112199376B (en)
Inventor
金震
王兆君
李明
吴长征
张先峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing SunwayWorld Science and Technology Co Ltd
Original Assignee
Beijing SunwayWorld Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing SunwayWorld Science and Technology Co Ltd
Priority to CN202011224456.8A
Publication of CN112199376A
Application granted
Publication of CN112199376B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a standard knowledge base management method and system based on cluster analysis. The method comprises the following steps: structuring target standard information in a target knowledge base; establishing a communication connection between the target knowledge base and a target terminal of a user, and obtaining from the target terminal a target clustering algorithm selected by the user from a fixed number of preset clustering algorithms; performing online cluster analysis on the structured target standard information in the target knowledge base using the target clustering algorithm to obtain a clustering result; and updating the clustering result into the target knowledge base. The method greatly improves the user's working or learning efficiency, reduces manual intervention, realizes intelligent automatic clustering, and helps ensure that the degree of association between items of standard information is correct and stable. The user can accurately and quickly obtain the required learning or working data, which improves the user experience.

Description

Standard knowledge base management method and system based on cluster analysis
Technical Field
The invention relates to the technical field of knowledge base management, in particular to a standard knowledge base management method and system based on cluster analysis.
Background
At present, with the development of big data technology, more and more companies build their own knowledge bases to store the company's various index data, so that the data can be called up at any time during meetings and employees can learn from the knowledge in the knowledge base. Most knowledge base systems record standard information in a variety of formats. Such a system is a centralized storage system that allows standard documents to be shared; it builds a tree-structured directory that can be expanded over multiple levels; it records the version of a standard whenever the standard changes, so that historical versions can be queried; and it controls access to standards through permission control. The workflow of an existing knowledge base system is to enter standard information into the knowledge base, manually cluster the associated standard information, display it through the tree-structured directory, and then provide it to users for learning or data retrieval. This approach has the following drawbacks: 1. The standard information is not structured before it is entered, so it is difficult to mine its value with mathematical algorithms, which reduces the user's working or learning efficiency. 2. Determining the degree of association between items of standard information manually consumes a great deal of labor, and the manually determined results are not very accurate, so the user may be unable to obtain the required standard information effectively, which degrades the user experience.
Disclosure of Invention
In view of the above problems, the invention provides a standard knowledge base management method and system based on cluster analysis. They address two problems: because the standard information is not structured before it is entered, it is difficult to mine its value with mathematical algorithms, which reduces the user's working or learning efficiency; and determining the degree of association between items of standard information manually consumes a great deal of labor and yields results of limited accuracy, so that the user may be unable to obtain the required standard information effectively, which degrades the user experience.
A standard knowledge base management method based on cluster analysis comprises the following steps:
structuring target standard information in a target knowledge base;
establishing a communication connection between the target knowledge base and a target terminal of a user, and acquiring from the target terminal a target clustering algorithm selected by the user from a fixed number of preset clustering algorithms;
carrying out online clustering analysis on the structured target standard information in the target knowledge base by using the target clustering algorithm to obtain a clustering result;
and updating the clustering result to the target knowledge base.
Preferably, before structuring the standard information in the knowledge base, the method further comprises: constructing the target knowledge base, wherein the steps comprise:
constructing a knowledge point identification model and a knowledge point judgment model in advance;
acquiring a plurality of preset knowledge points, inputting the preset knowledge points into the knowledge point identification model for training, and acquiring a trained knowledge point identification model;
acquiring a plurality of knowledge point judgment attributes in the trained knowledge point identification model;
training the knowledge point judgment model by using the plurality of knowledge point judgment attributes to obtain a trained knowledge point judgment model;
and constructing the target knowledge base by using the trained knowledge point identification model and the trained knowledge point judgment model.
Preferably, before structuring the standard information in the knowledge base, the method further comprises: acquiring a plurality of first standard information, checking the plurality of first standard information to obtain the target standard information, wherein the steps comprise:
checking the integrity of the plurality of first standard information;
when some second standard information among the plurality of first standard information fails the integrity check, completing the missing information in the second standard information and, once the completion is finished, returning the second standard information to the plurality of first standard information;
determining whether the first standard information which appears repeatedly exists in the plurality of pieces of first standard information, if so, deleting the first standard information which appears repeatedly to obtain a first target number of pieces of third standard information;
inputting the first target number of third standard information into the knowledge base, counting successfully stored fourth standard information, and marking unsuccessfully stored fifth standard information;
and deleting the fifth standard information, and confirming fourth standard information with a second target number as the target standard information.
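The checking steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the `StandardInfo` record, the non-empty-field integrity test, the `UNKNOWN` filler used for completion, and the `store` callback are all hypothetical stand-ins for details the text leaves open.

```python
from dataclasses import dataclass


# Hypothetical record type; the field names are illustrative assumptions.
@dataclass(frozen=True)
class StandardInfo:
    number: str
    title: str
    body: str


def is_complete(item):
    """Stand-in integrity check: every field must be non-empty."""
    return all(value.strip() for value in (item.number, item.title, item.body))


def complete(item):
    """Stand-in for information completion: fill empty fields with a marker."""
    def fill(value):
        return value if value.strip() else "UNKNOWN"
    return StandardInfo(fill(item.number), fill(item.title), fill(item.body))


def verify(first_items, store):
    """Return the target standard information derived from the first standard information.

    `store` is a hypothetical callback that returns True when an item is
    stored in the knowledge base successfully.
    """
    # Check integrity; complete failing items and return them to the pool.
    pool = [item if is_complete(item) else complete(item) for item in first_items]
    # Delete repeated items, keeping first-seen order (the "third" information).
    third = list(dict.fromkeys(pool))
    # Keep items that were stored (the "fourth"), drop the rest (the "fifth").
    return [item for item in third if store(item)]
```

Using `dict.fromkeys` on frozen dataclass instances removes exact duplicates while preserving input order; a real system would probably need a looser notion of "repeated" than field-by-field equality.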
Preferably, the target standard information in the target knowledge base is structured, including:
determining the text types of the target standard information with the second target quantity;
acquiring a target segmentation configuration file and a target extraction configuration file corresponding to each target standard information according to the text type of each target standard information;
determining a segmentation rule of each target segmentation configuration file;
acquiring a segmentation segment corresponding to the target segmentation rule from a target standard file corresponding to each target segmentation configuration file according to the segmentation rule of each target segmentation configuration file;
determining an extraction rule of each target extraction configuration file;
matching each extraction rule with the corresponding segmentation segment, and acquiring the structural information of each target standard information from the segmentation segment of each target standard information according to the matching result;
and generating a structured text of each target standard information according to the structured information of each target standard information.
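The structuring steps above can be sketched in Python as follows, assuming that the segmentation and extraction configuration files reduce to regular expressions keyed by text type. The `CONFIGS` table, the `national_standard` text type and the field names `clause` and `heading` are hypothetical; the text does not specify the format of the configuration files.

```python
import re

# Hypothetical configuration keyed by text type; real configuration files
# would be loaded from storage rather than written inline.
CONFIGS = {
    "national_standard": {
        # Segmentation rule: split before each line that starts with a clause number.
        "segment": r"(?m)^(?=\d+\s)",
        # Extraction rules: field name -> pattern applied to each segment.
        "extract": {
            "clause": r"^(\d+)\s",
            "heading": r"^\d+\s+(.+)",
        },
    }
}


def structure(text, text_type):
    """Segment `text` and extract structured fields from every segment."""
    config = CONFIGS[text_type]
    segments = [s.strip() for s in re.split(config["segment"], text) if s.strip()]
    records = []
    for segment in segments:
        record = {}
        for field, pattern in config["extract"].items():
            match = re.search(pattern, segment)
            if match:
                record[field] = match.group(1).strip()
        if record:
            # Keep the full segment alongside the extracted fields.
            record["text"] = segment
            records.append(record)
    return records
```

Keeping the rules in data rather than in code mirrors the idea of one configuration per text type: supporting a new text type means adding an entry to the table, not changing `structure`.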
Preferably, establishing the communication connection between the target knowledge base and the target terminal of the user and acquiring from the target terminal the target clustering algorithm selected by the user from a fixed number of preset clustering algorithms includes:
verifying the position safety and the communication connection safety of the target terminal;
when the target terminal passes the verification, sending a fixed number of preset clustering algorithms to the target terminal;
receiving a target clustering algorithm fed back by the target terminal, confirming whether the target clustering algorithm is one of a fixed number of preset clustering algorithms, if so, binding the target clustering algorithm with the target knowledge base, otherwise, confirming that the target clustering algorithm is a user-defined algorithm, storing the target clustering algorithm into the fixed number of preset clustering algorithms, and simultaneously binding the target clustering algorithm with the target knowledge base.
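The selection and binding logic above can be sketched in Python as follows. The sketch assumes that terminal verification has already been reduced to a boolean and that algorithms are identified by name; the `KnowledgeBase` class, the `select_algorithm` function and the algorithm labels are hypothetical.

```python
# Hypothetical labels for the preset clustering algorithms.
PRESET_ALGORITHMS = {
    "hierarchical", "divisive", "fuzzy",
    "k-means", "k-medoids", "cosine-similarity",
}


class KnowledgeBase:
    """Minimal stand-in that only tracks presets and the bound algorithm."""

    def __init__(self, presets):
        self.presets = set(presets)  # copy, so the shared constant is not mutated
        self.bound_algorithm = None


def select_algorithm(kb, terminal_verified, choice):
    """Bind `choice` to the knowledge base; return True if it was user-defined."""
    if not terminal_verified:
        # Presets are only offered to a terminal that passed verification.
        raise PermissionError("terminal failed verification")
    is_custom = choice not in kb.presets
    if is_custom:
        # A user-defined algorithm is stored among the presets for later reuse.
        kb.presets.add(choice)
    kb.bound_algorithm = choice
    return is_custom
```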
Preferably, the fixed number of preset clustering algorithms includes: a hierarchical clustering algorithm, a divisive (split) clustering algorithm, a fuzzy clustering algorithm, a k-means algorithm, a k-medoids algorithm and a cosine similarity algorithm.
Preferably, the performing online clustering analysis on the structured target standard information in the target knowledge base by using the target clustering algorithm to obtain a clustering result includes:
preprocessing each target standard information in the second target number of target standard information, and removing useless data in each target standard information;
constructing a struct array, extracting first target standard information and second target standard information from the second target number of target standard information, and storing the first target standard information and the second target standard information in the struct array, wherein the first target standard information is any target standard information in the second target number of target standard information, and the second target standard information is any target standard information other than the first target standard information in the second target number of target standard information;
setting numbers for the first target standard information and the second target standard information, and marking the association state of the first target standard information and the second target standard information;
sorting and clustering the first target standard information and the second target standard information using a preset position-coordinate quicksort method and the target clustering algorithm, so as to construct a spatial index tree;
determining a first association degree between the first target standard information and the second target standard information according to the spatial index tree;
extracting a plurality of first keywords in the first target standard information and a plurality of second keywords in the second target standard information;
determining a second degree of association between the first target standard information and the second target standard information according to the plurality of first keywords and the plurality of second keywords;
and comprehensively determining a current association coefficient between the first target standard information and the second target standard information according to the first association degree and the second association degree, comparing the current association coefficient with a preset association coefficient, determining that the clustering result is that the first target standard information and the second target standard information are associated when the current association coefficient is greater than or equal to the preset association coefficient, and determining that the clustering result is that the first target standard information and the second target standard information are not associated when the current association coefficient is less than the preset association coefficient.
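The final comparison step can be sketched in Python as follows. The text does not state how the two degrees of association are combined or how the keyword-based degree is computed, so this sketch assumes a Jaccard overlap for the second degree and a weighted average for the current association coefficient; the `weight` and `threshold` values are hypothetical, and the first degree (from the spatial index tree) is taken as a given number.

```python
def keyword_association(first_keywords, second_keywords):
    """Assumed second degree of association: Jaccard overlap of the keyword sets."""
    a, b = set(first_keywords), set(second_keywords)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_associated(first_degree, second_degree, weight=0.5, threshold=0.5):
    """Combine both degrees and compare with the preset association coefficient.

    `weight` and `threshold` are illustrative values, not taken from the text.
    Returns (associated, current_coefficient).
    """
    coefficient = weight * first_degree + (1.0 - weight) * second_degree
    return coefficient >= threshold, coefficient
```

The `>=` comparison follows the rule stated above: the pair counts as associated when the current coefficient is greater than or equal to the preset one.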
Preferably, the updating the clustering result to the target knowledge base includes:
clustering target standard information in the target knowledge base according to the clustering result;
and after finishing clustering, displaying the target standard information in the target knowledge base according to the category.
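Displaying the standard information by category can be sketched in Python as follows, assuming the clustering result is available as a mapping from a standard's identifier to a cluster label. The function names and the plain-text rendering are hypothetical; an actual system might render the tree-structured directory mentioned in the Background instead.

```python
from collections import defaultdict


def group_by_category(labels):
    """Group standard identifiers by cluster label, sorted for stable display."""
    categories = defaultdict(list)
    for standard_id, label in labels.items():
        categories[label].append(standard_id)
    return {label: sorted(ids) for label, ids in sorted(categories.items())}


def render(categories):
    """Render the categories as an indented two-level listing."""
    lines = []
    for label, ids in categories.items():
        lines.append(f"Category {label}")
        lines.extend(f"  {standard_id}" for standard_id in ids)
    return "\n".join(lines)
```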
Preferably, the checking the integrity of the plurality of first standard information includes:
carrying out continuous characteristic decomposition processing on each piece of first standard information;
calculating the key information ratio of each piece of first standard information:
Figure BDA0002763187350000051
wherein $S_i$ is the key information ratio of the $i$-th first standard information; $m_i$ is the embedding dimension of the key information in the $i$-th first standard information; $j_i$ is the time window function of the key information features in the $i$-th first standard information; $x(t)$ is the time series for the continuous feature decomposition of the plurality of first standard information; $\Delta t$ is the time interval of the continuous feature processing; $\log_2$ is the base-2 logarithm; $a_i$ is the fuzzy feature distribution quantity of the key information in the $i$-th first standard information; $b$ is a preset key information utilization rate scheduling index; $c_i$ is the feature distribution set of the $i$-th first standard information; $N$ is the number of first standard information; and $p_i$ is the decomposition feature set of the $i$-th first standard information;
calculating the integrity coefficient of each first standard information according to the key information ratio of each first standard information:
Figure BDA0002763187350000052
wherein $Q_i$ is the integrity coefficient of the $i$-th first standard information; $\tau_i$ is the self-first-closing extension coefficient of the key information in the $i$-th first standard information, with a value in $[0.3, 0.8]$; $L_{i1}$ is the text length of the key information in the $i$-th first standard information; $L_{i2}$ is the text length of the $i$-th first standard information; $K_i$ is the useless interference factor in the $i$-th first standard information, with a value in $[0.05, 0.2]$; and $e$ is the natural constant, taken as 2.72;
and determining the integrity of each piece of first standard information by using a preset integrity coefficient conversion method according to the integrity coefficient of each piece of first standard information.
A standard knowledge base management system based on cluster analysis, the system comprising:
the structuring module is used for structuring the target standard information in the target knowledge base;
the acquisition module is used for establishing a communication connection between the target knowledge base and a target terminal of a user and acquiring from the target terminal a target clustering algorithm selected by the user from a fixed number of preset clustering algorithms;
the clustering module is used for carrying out online clustering analysis on the structured target standard information in the target knowledge base by utilizing the target clustering algorithm to obtain a clustering result;
and the updating module is used for updating the clustering result to the target knowledge base.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
FIG. 1 is a flowchart of a method for managing a standard knowledge base based on cluster analysis according to the present invention;
FIG. 2 is another flowchart of the standard knowledge base management method based on cluster analysis according to the present invention;
FIG. 3 is a screenshot of a workflow of a method for managing a standard knowledge base based on cluster analysis according to the present invention;
FIG. 4 is a structural diagram of a standard knowledge base management system based on cluster analysis according to the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
As described in the Background, existing knowledge base systems do not structure standard information before it is entered, which makes it difficult to mine the value of the standard information with mathematical algorithms and reduces the user's working or learning efficiency. They also rely on manual operation to determine the degree of association between items of standard information, which consumes a great deal of labor and yields results of limited accuracy, so that the user may be unable to obtain the required standard information effectively. In order to solve the above problems, the present embodiment discloses a standard knowledge base management method based on cluster analysis.
A standard knowledge base management method based on cluster analysis, as shown in fig. 1, includes the following steps:
Step S101, structuring target standard information in a target knowledge base;
Step S102, establishing a communication connection between the target knowledge base and a target terminal of a user, and acquiring from the target terminal a target clustering algorithm selected by the user from a fixed number of preset clustering algorithms;
Step S103, carrying out online clustering analysis on the structured target standard information in the target knowledge base by using the target clustering algorithm to obtain a clustering result;
Step S104, updating the clustering result to the target knowledge base.
The working principle of the above technical solution is as follows: the target standard information in the target knowledge base is structured; a communication connection is established between the target knowledge base and a target terminal of a user; a target clustering algorithm selected by the user from a fixed number of preset clustering algorithms is acquired from the target terminal; online cluster analysis is performed on the structured target standard information in the target knowledge base using the target clustering algorithm to obtain a clustering result; and the clustering result is updated into the target knowledge base.
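The flow of steps S101 to S104 can be sketched in Python as follows. This is only a skeleton under stated assumptions: the structuring function, the user's choice and the clustering algorithms are passed in as callables, the knowledge base is modelled as a dictionary, and `cluster_by_first_word` is a toy stand-in for a real clustering algorithm. All names are hypothetical.

```python
def manage_knowledge_base(standards, structure, choose_algorithm, algorithms):
    """Run the four steps on `standards`, a mapping of identifier -> raw text."""
    # S101: structure every item of target standard information.
    structured = {sid: structure(text) for sid, text in standards.items()}
    # S102: offer the preset algorithm names to the user and take the choice.
    name = choose_algorithm(sorted(algorithms))
    # S103: cluster the structured information with the chosen algorithm.
    labels = algorithms[name](structured)
    # S104: write the clustering result back next to each item.
    return {
        sid: {"structured": structured[sid], "cluster": labels[sid]}
        for sid in standards
    }


def cluster_by_first_word(structured):
    """Toy stand-in for a clustering algorithm: one cluster per first token."""
    seen = {}
    return {sid: seen.setdefault(tokens[0], len(seen)) for sid, tokens in structured.items()}
```

Passing the algorithms in as a name-to-callable mapping keeps the pipeline independent of any particular clustering method, which matches the idea that the user picks the algorithm at run time.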
The beneficial effects of the above technical solution are as follows: structuring the standard information in the knowledge base allows the user to subsequently select a clustering algorithm to cluster it, so that more of the value of the standard information can be mined and the user can learn more knowledge or understand more data, which greatly improves the user's working or learning efficiency. In addition, performing online cluster analysis on the standard information with the target clustering algorithm reduces manual intervention, realizes intelligent automatic clustering, and helps ensure that the degree of association between items of standard information is accurate and stable. The user can accurately and quickly obtain the required learning or working data, which improves the user experience.
In one embodiment, as shown in fig. 2, before structuring the standard information in the knowledge base, the method further comprises: constructing the target knowledge base, wherein the steps comprise:
Step S201, constructing a knowledge point identification model and a knowledge point judgment model in advance;
Step S202, acquiring a plurality of preset knowledge points, inputting the preset knowledge points into the knowledge point identification model for training, and acquiring a trained knowledge point identification model;
Step S203, acquiring a plurality of knowledge point judgment attributes in the trained knowledge point identification model;
Step S204, training the knowledge point judgment model by using the plurality of knowledge point judgment attributes to obtain a trained knowledge point judgment model;
Step S205, constructing the target knowledge base by using the trained knowledge point identification model and the trained knowledge point judgment model.
The beneficial effects of the above technical solution are as follows: by constructing the knowledge point identification model and the knowledge point judgment model and then training them with preset data, a target knowledge base capable of intelligently judging knowledge points can be built. This prevents useless data from being stored in the target knowledge base, where it would make the data disordered and impossible to classify effectively, and thereby ensures the precision of the standard information.
In one embodiment, prior to structuring the standard information in the knowledge base, the method further comprises: acquiring a plurality of first standard information, checking the plurality of first standard information to obtain the target standard information, wherein the steps comprise:
checking the integrity of the plurality of first standard information;
when some second standard information among the plurality of first standard information fails the integrity check, completing the missing information in the second standard information and, once the completion is finished, returning the second standard information to the plurality of first standard information;
determining whether the first standard information which appears repeatedly exists in the plurality of pieces of first standard information, if so, deleting the first standard information which appears repeatedly to obtain a first target number of pieces of third standard information;
inputting the first target number of third standard information into the knowledge base, counting successfully stored fourth standard information, and marking unsuccessfully stored fifth standard information;
and deleting the fifth standard information, and confirming fourth standard information with a second target number as the target standard information.
The beneficial effects of the above technical solution are as follows: checking the integrity of the first standard information and completing any missing information ensures that the standard information in the target knowledge base meets the required quality, which in turn supports the accuracy of the subsequent intelligent clustering result. Deleting the first standard information that appears repeatedly ensures that no duplicate data enters the subsequent intelligent cluster analysis, which reduces its workload and improves its efficiency. Furthermore, deleting the standard information that cannot be stored in the knowledge base removes useless data, which further reduces the workload of the intelligent cluster analysis and improves its efficiency.
In one embodiment, structuring the target standard information in the target knowledge base includes:
determining the text types of the target standard information with the second target quantity;
acquiring a target segmentation configuration file and a target extraction configuration file corresponding to each target standard information according to the text type of each target standard information;
determining a segmentation rule of each target segmentation configuration file;
acquiring a segmentation segment corresponding to the target segmentation rule from a target standard file corresponding to each target segmentation configuration file according to the segmentation rule of each target segmentation configuration file;
determining an extraction rule of each target extraction configuration file;
matching each extraction rule with the corresponding segmentation segment, and acquiring the structural information of each target standard information from the segmentation segment of each target standard information according to the matching result;
and generating a structured text of each target standard information according to the structured information of each target standard information.
The beneficial effects of the above technical scheme are: the corresponding segmentation configuration file and the extraction configuration file are called according to the text type of the target standard information, the text is segmented according to the segmentation configuration file, and the required structural information is extracted from each segmented segment according to the extraction configuration file, so that the structuralization of the target standard information of different text types can be quickly realized, and the stability is ensured.
In one embodiment, establishing the communication connection between the target knowledge base and the target terminal of the user and acquiring from the target terminal the target clustering algorithm selected by the user from a fixed number of preset clustering algorithms includes:
verifying the position safety and the communication connection safety of the target terminal;
when the target terminal passes the verification, sending a fixed number of preset clustering algorithms to the target terminal;
receiving a target clustering algorithm fed back by the target terminal, confirming whether the target clustering algorithm is one of a fixed number of preset clustering algorithms, if so, binding the target clustering algorithm with the target knowledge base, otherwise, confirming that the target clustering algorithm is a user-defined algorithm, storing the target clustering algorithm into the fixed number of preset clustering algorithms, and simultaneously binding the target clustering algorithm with the target knowledge base.
The beneficial effects of the above technical scheme are: verifying the position safety and the communication connection safety of the target terminal ensures that the user terminal does not carry Trojan horse viruses, which protects the target standard information in the knowledge base; in addition, the user can either select one of the preset clustering algorithms or define a custom clustering algorithm based on them, which further improves the user experience.
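The selection logic of this embodiment can be sketched as below. The function name, the dictionary used to represent the knowledge base, and the boolean verification flag are all hypothetical; how the terminal's position and connection are actually verified is not specified and is left outside the sketch.

```python
# Hypothetical preset list; the names are labels chosen for illustration only.
PRESET_ALGORITHMS = {"hierarchical", "divisive", "fuzzy", "k-means", "k-medoids", "cosine"}


def bind_algorithm(knowledge_base, terminal_verified, chosen, presets):
    """Bind the chosen algorithm to the knowledge base.

    Returns True when the choice was a user-defined algorithm that had to be
    added to the presets, and False when it was already a preset.
    """
    if not terminal_verified:
        # Presets are only sent to, and choices accepted from, a verified terminal.
        raise PermissionError("target terminal failed verification")
    is_custom = chosen not in presets
    if is_custom:
        presets.add(chosen)  # store the user-defined algorithm with the presets
    knowledge_base["algorithm"] = chosen
    return is_custom
```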
In one embodiment, the fixed number of preset clustering algorithms comprises: a hierarchical clustering algorithm, a divisive (split) clustering algorithm, a fuzzy clustering algorithm, a k-means algorithm, a k-medoids (k-center point) algorithm, and a cosine similarity algorithm.
The beneficial effects of the above technical scheme are: while the user selects a preferred algorithm, the preset algorithms also serve as samples for user-defined clustering algorithms, which further improves the user experience; moreover, providing a plurality of clustering algorithms ensures that if one algorithm cannot be used another remains available, which safeguards the subsequent intelligent cluster analysis.
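As a point of reference for the last item in the list, cosine similarity between two texts can be computed over their term-count vectors. The sketch below assumes simple whitespace tokenization, which the specification does not prescribe.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0
```

The result lies between 0 (no shared terms) and 1 (identical term distribution), so it can be compared directly against a preset threshold.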
In one embodiment, the performing online clustering analysis on the structured target standard information in the target knowledge base by using the target clustering algorithm to obtain a clustering result includes:
preprocessing each target standard information in the second target number of target standard information, and removing useless data in each target standard information;
constructing a structure body array, extracting first target standard information and second target standard information from the second target quantity of target standard information, and storing the first target standard information and the second target standard information into the structure body array, wherein the first target standard information is any target standard information in the second target quantity of target standard information, and the second target standard information is any target standard information except the first target standard information in the second target quantity of target standard information;
setting numbers for the first target standard information and the second target standard information, and marking the association state of the first target standard information and the second target standard information;
sorting and clustering the first target standard information and the second target standard information by using a preset position-coordinate quick-sorting method and the target clustering algorithm to construct a spatial index tree;
determining a first association degree between the first target standard information and the second target standard information according to the spatial index tree;
extracting a plurality of first keywords in the first target standard information and a plurality of second keywords in the second target standard information;
determining a second degree of association between the first target standard information and the second target standard information according to the plurality of first keywords and the plurality of second keywords;
and comprehensively determining a current association coefficient between the first target standard information and the second target standard information according to the first association degree and the second association degree, comparing the current association coefficient with a preset association coefficient, determining that the clustering result is that the first target standard information and the second target standard information are associated when the current association coefficient is greater than or equal to the preset association coefficient, and determining that the clustering result is that the first target standard information and the second target standard information are not associated when the current association coefficient is less than the preset association coefficient.
The beneficial effects of the above technical scheme are: building the spatial index tree to determine the first association degree between every two pieces of target standard information allows all standard information in the knowledge base to be clustered accurately; determining the second association degree from the keywords in each piece of target standard information allows the association coefficient between every two pieces of target standard information to be determined from multiple aspects; the target standard information in the knowledge base can then be clustered rapidly according to the association coefficient, with a small amount of calculation and accurate results, which effectively shortens the clustering time and improves the clustering efficiency.
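The final comparison step can be sketched as follows. The specification does not state how the second association degree is computed from the keywords or how the two degrees are combined, so the Jaccard overlap, the weighted average, and the default values of `preset` and `weight` below are all assumptions made for illustration.

```python
def keyword_association(first_keywords, second_keywords):
    """Jaccard overlap of two keyword lists (assumed stand-in for the second association degree)."""
    a, b = set(first_keywords), set(second_keywords)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_associated(first_degree, second_degree, preset=0.5, weight=0.5):
    """Combine both degrees and compare against the preset association coefficient.

    The weighted average is a hypothetical combination rule; the threshold
    test (greater than or equal to the preset) follows the embodiment.
    """
    current = weight * first_degree + (1 - weight) * second_degree
    return current >= preset
```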
In one embodiment, updating the clustering result into the target knowledge base includes:
clustering target standard information in the target knowledge base according to the clustering result;
and after finishing clustering, displaying the target standard information in the target knowledge base according to the category.
The beneficial effects of the above technical scheme are: displaying the clustered target standard information in the target knowledge base by category gives the user a better visual experience when retrieving standard information, and lets the user locate the required target standard information by category in a very short time, which further improves the user experience.
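One way to turn pairwise "associated" results into displayable categories is to treat them as edges and take connected components. This grouping rule is an assumption, not something the specification states; the sketch uses a small union-find structure and hypothetical identifiers.

```python
from collections import defaultdict


def group_by_association(item_ids, associated_pairs):
    """Group items into categories where associated items share a category."""
    parent = {item: item for item in item_ids}

    def find(item):
        # Follow parent links to the root, shortening the path along the way.
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for a, b in associated_pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for item in item_ids:
        groups[find(item)].append(item)
    # Sorted output gives a stable order for display by category.
    return sorted(sorted(group) for group in groups.values())
```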
In one embodiment, the checking the integrity of the plurality of first standard information includes:
carrying out continuous characteristic decomposition processing on each piece of first standard information;
calculating the key information ratio of each piece of first standard information:
Figure BDA0002763187350000131
wherein $S_i$ denotes the key information ratio of the $i$-th first standard information; $m_i$ denotes the embedding dimension of the key information in the $i$-th first standard information; $j_i$ denotes the time window function of the key information feature in the $i$-th first standard information; $x(t)$ denotes the time sequence of the continuous feature decomposition of the plurality of first standard information; $\Delta t$ denotes the time interval of the continuous feature processing; $\log_2$ denotes the base-2 logarithm; $a_i$ denotes the fuzzy feature distribution quantity of the key information in the $i$-th first standard information; $b$ denotes a preset key information utilization rate scheduling index; $c_i$ denotes the feature distribution set in the $i$-th first standard information; $N$ denotes the number of first standard information; and $p_i$ denotes the decomposition feature set of the $i$-th first standard information;
calculating the integrity coefficient of each first standard information according to the key information ratio of each first standard information:
Figure BDA0002763187350000132
wherein $Q_i$ denotes the integrity coefficient of the $i$-th first standard information; $\tau_i$ denotes the self-first-closing extension coefficient of the key information in the $i$-th first standard information, with a value in [0.3, 0.8]; $L_{i1}$ denotes the text length of the key information in the $i$-th first standard information; $L_{i2}$ denotes the text length of the $i$-th first standard information; $K_i$ denotes the useless interference factor in the $i$-th first standard information, with a value in [0.05, 0.2%]; and $e$ is the natural constant, taken as 2.72;
and determining the integrity of each piece of first standard information by using a preset integrity coefficient conversion method according to the integrity coefficient of each piece of first standard information.
The beneficial effects of the above technical scheme are: performing continuous feature decomposition on each piece of first standard information allows subsequent calculation to be based on the decomposed features, which requires less calculation and is more accurate than calculating directly on the text content; calculating the key information ratio of each piece of first standard information accurately determines its importance and the layout of its text composition, so that the correlation between the integrity of each piece of first standard information and its key information can be preliminarily determined from the key information ratio; further, calculating the integrity coefficient of each piece of first standard information allows its integrity to be determined from its composition information and the parameters of its key information; the whole process is calculated automatically by the terminal without the influence of human factors, which improves the overall stability and the accuracy of the finally determined integrity.
In one embodiment, as shown in FIG. 3, the method includes:
1. Standard structuring
Knowledge base management performed by the invention is performed based on structured standards, so that the standards need to be structured before cluster analysis is performed.
2. User selection or customization of the clustering algorithm
The user can select or define a clustering algorithm in the system. The algorithms already included in the system cover mainstream algorithms such as hierarchical clustering, divisive (split) clustering, fuzzy clustering, k-means, and k-medoids (k-center point); the system also supports similarity algorithms such as cosine similarity. Covering a variety of cluster analysis algorithms gives the system strong universality.
3. Online cluster analysis
The cluster analysis can be carried out online without downloading the standards; during the analysis, the system can automatically adjust to the optimal threshold for the analyzed data, automatically evaluate the analysis results, and reject unreasonable data, thereby realizing intelligent analysis.
In the analysis process, a similarity index can be defined to reflect the degree of similarity, and classification accuracy is improved through continuous optimization of the index.
4. Updating a knowledge base
After the cluster analysis is finished, the clustering result of the standard information is updated to the knowledge base; the knowledge base is continuously updated and expanded, and the information it covers is continuously improved, so that the value of the standards can be realized.
The beneficial effects of the above technical scheme are: 1. the invention processes structured standard data and classifies the standards at the knowledge element level, since the value of the standards can be exploited only by comparing the standards transversely and longitudinally; 2. in the course of using the standards, the system can automatically perform cluster analysis, reducing manual intervention and improving efficiency and accuracy.
A standard knowledge base management system based on cluster analysis, as shown in fig. 4, the system comprising:
a structuring module 401, configured to structure the target standard information in the target knowledge base;
an obtaining module 402, configured to connect the target knowledge base to a target terminal of a user, and obtain, from the target terminal, a target clustering algorithm selected by the user from a fixed number of preset clustering algorithms;
a clustering module 403, configured to perform online clustering analysis on the structured target standard information in the target knowledge base by using the target clustering algorithm, to obtain a clustering result;
an updating module 404, configured to update the clustering result to the target knowledge base.
The working principle and the advantageous effects of the above technical solution have been explained in the method claims, and are not described herein again.
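The cooperation of the four modules can be sketched as a single pipeline object. The class name, the dictionary standing in for the target knowledge base, and the callables passed in are hypothetical; the `cluster` argument plays the role of the target clustering algorithm obtained by the obtaining module 402.

```python
class KnowledgeBaseManager:
    """Minimal sketch of modules 401 to 404 wired together (names are assumptions)."""

    def __init__(self, structure, cluster):
        self.structure = structure  # structuring routine used by module 401
        self.cluster = cluster      # algorithm selected by the user (module 402)
        self.knowledge_base = {"standards": [], "clusters": None}

    def run(self, raw_standards):
        # Module 401: structure every piece of target standard information.
        structured = [self.structure(item) for item in raw_standards]
        self.knowledge_base["standards"] = structured
        # Module 403: cluster the structured information with the selected algorithm.
        result = self.cluster(structured)
        # Module 404: write the clustering result back to the knowledge base.
        self.knowledge_base["clusters"] = result
        return result
```

Passing the structuring and clustering routines in as callables mirrors the specification's point that the algorithm is selected by the user rather than fixed in the system.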
It will be understood by those skilled in the art that the terms "first" and "second" in the present invention refer to different stages of application.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A standard knowledge base management method based on cluster analysis is characterized by comprising the following steps:
structuring target standard information in a target knowledge base;
establishing the communication connection between the target knowledge base and a target terminal of a user, and acquiring, from the target terminal, a target clustering algorithm selected by the user from a fixed number of preset clustering algorithms;
carrying out online clustering analysis on the structured target standard information in the target knowledge base by using the target clustering algorithm to obtain a clustering result;
and updating the clustering result to the target knowledge base.
2. The method of claim 1, wherein prior to structuring the standard information in the knowledge base, the method further comprises: constructing the target knowledge base, wherein the steps comprise:
a knowledge point identification model and a knowledge point judgment model are constructed in advance;
acquiring a plurality of preset knowledge points, inputting the preset knowledge points into the knowledge point identification model for training, and acquiring a trained knowledge point identification model;
acquiring a plurality of knowledge point judgment attributes in the trained knowledge point identification model;
training the knowledge point judgment model by using the plurality of knowledge point judgment attributes to obtain a trained knowledge point judgment model;
and constructing the target knowledge base by using the trained knowledge point identification model and the trained knowledge point judgment model.
3. The method of claim 1, wherein prior to structuring the standard information in the knowledge base, the method further comprises: acquiring a plurality of first standard information, checking the plurality of first standard information to obtain the target standard information, wherein the steps comprise:
checking the integrity of the plurality of first standard information;
when part of the plurality of first standard information, referred to as second standard information, fails the integrity check, completing the information of the second standard information and, after the completion, reclassifying the second standard information into the plurality of first standard information;
determining whether the first standard information which appears repeatedly exists in the plurality of pieces of first standard information, if so, deleting the first standard information which appears repeatedly to obtain a first target number of pieces of third standard information;
inputting the first target number of third standard information into the knowledge base, counting successfully stored fourth standard information, and marking unsuccessfully stored fifth standard information;
and deleting the fifth standard information, and confirming fourth standard information with a second target number as the target standard information.
4. The method of claim 3, wherein the step of structuring the target standard information in the target knowledge base comprises:
determining the text types of the target standard information with the second target quantity;
acquiring a target segmentation configuration file and a target extraction configuration file corresponding to each target standard information according to the text type of each target standard information;
determining a segmentation rule of each target segmentation configuration file;
acquiring a segmentation segment corresponding to the target segmentation rule from a target standard file corresponding to each target segmentation configuration file according to the segmentation rule of each target segmentation configuration file;
determining an extraction rule of each target extraction configuration file;
matching each extraction rule with the corresponding segmentation segment, and acquiring the structural information of each target standard information from the segmentation segment of each target standard information according to the matching result;
and generating a structured text of each target standard information according to the structured information of each target standard information.
5. The method as claimed in claim 1, wherein the step of establishing the communication connection between the target knowledge base and the target terminal of the user, and acquiring, from the target terminal, the target clustering algorithm selected by the user from a fixed number of preset clustering algorithms comprises:
verifying the position safety and the communication connection safety of the target terminal;
when the target terminal passes the verification, sending a fixed number of preset clustering algorithms to the target terminal;
receiving a target clustering algorithm fed back by the target terminal, confirming whether the target clustering algorithm is one of a fixed number of preset clustering algorithms, if so, binding the target clustering algorithm with the target knowledge base, otherwise, confirming that the target clustering algorithm is a user-defined algorithm, storing the target clustering algorithm into the fixed number of preset clustering algorithms, and simultaneously binding the target clustering algorithm with the target knowledge base.
6. The method of claim 5, wherein the fixed number of preset clustering algorithms comprises: a hierarchical clustering algorithm, a divisive (split) clustering algorithm, a fuzzy clustering algorithm, a k-means algorithm, a k-medoids (k-center point) algorithm, and a cosine similarity algorithm.
7. The method for managing standard knowledge base based on cluster analysis according to claim 4, wherein the performing online cluster analysis on the structured target standard information in the target knowledge base by using the target clustering algorithm to obtain a clustering result comprises:
preprocessing each target standard information in the second target number of target standard information, and removing useless data in each target standard information;
constructing a structure body array, extracting first target standard information and second target standard information from the second target quantity of target standard information, and storing the first target standard information and the second target standard information into the structure body array, wherein the first target standard information is any target standard information in the second target quantity of target standard information, and the second target standard information is any target standard information except the first target standard information in the second target quantity of target standard information;
setting numbers for the first target standard information and the second target standard information, and marking the association state of the first target standard information and the second target standard information;
sorting and clustering the first target standard information and the second target standard information by using a preset position-coordinate quick-sorting method and the target clustering algorithm to construct a spatial index tree;
determining a first association degree between the first target standard information and the second target standard information according to the spatial index tree;
extracting a plurality of first keywords in the first target standard information and a plurality of second keywords in the second target standard information;
determining a second degree of association between the first target standard information and the second target standard information according to the plurality of first keywords and the plurality of second keywords;
and comprehensively determining a current association coefficient between the first target standard information and the second target standard information according to the first association degree and the second association degree, comparing the current association coefficient with a preset association coefficient, determining that the clustering result is that the first target standard information and the second target standard information are associated when the current association coefficient is greater than or equal to the preset association coefficient, and determining that the clustering result is that the first target standard information and the second target standard information are not associated when the current association coefficient is less than the preset association coefficient.
8. The method of claim 7, wherein updating the clustered results to the target knowledge base comprises:
clustering target standard information in the target knowledge base according to the clustering result;
and after finishing clustering, displaying the target standard information in the target knowledge base according to the category.
9. The method of claim 3, wherein the checking the integrity of the plurality of first standard information comprises:
carrying out continuous characteristic decomposition processing on each piece of first standard information;
calculating the key information ratio of each piece of first standard information:
Figure FDA0002763187340000041
wherein $S_i$ denotes the key information ratio of the $i$-th first standard information; $m_i$ denotes the embedding dimension of the key information in the $i$-th first standard information; $j_i$ denotes the time window function of the key information feature in the $i$-th first standard information; $x(t)$ denotes the time sequence of the continuous feature decomposition of the plurality of first standard information; $\Delta t$ denotes the time interval of the continuous feature processing; $\log_2$ denotes the base-2 logarithm; $a_i$ denotes the fuzzy feature distribution quantity of the key information in the $i$-th first standard information; $b$ denotes a preset key information utilization rate scheduling index; $c_i$ denotes the feature distribution set in the $i$-th first standard information; $N$ denotes the number of first standard information; and $p_i$ denotes the decomposition feature set of the $i$-th first standard information;
calculating the integrity coefficient of each first standard information according to the key information ratio of each first standard information:
Figure FDA0002763187340000051
wherein $Q_i$ denotes the integrity coefficient of the $i$-th first standard information; $\tau_i$ denotes the self-first-closing extension coefficient of the key information in the $i$-th first standard information, with a value in [0.3, 0.8]; $L_{i1}$ denotes the text length of the key information in the $i$-th first standard information; $L_{i2}$ denotes the text length of the $i$-th first standard information; $K_i$ denotes the useless interference factor in the $i$-th first standard information, with a value in [0.05, 0.2%]; and $e$ is the natural constant, taken as 2.72;
and determining the integrity of each piece of first standard information by using a preset integrity coefficient conversion method according to the integrity coefficient of each piece of first standard information.
10. A standard knowledge base management system based on cluster analysis, the system comprising:
the structuring module is used for structuring the target standard information in the target knowledge base;
the acquisition module is used for establishing the communication connection between the target knowledge base and a target terminal of a user, and acquiring, from the target terminal, a target clustering algorithm selected by the user from a fixed number of preset clustering algorithms;
the clustering module is used for carrying out online clustering analysis on the structured target standard information in the target knowledge base by utilizing the target clustering algorithm to obtain a clustering result;
and the updating module is used for updating the clustering result to the target knowledge base.
CN202011224456.8A 2020-11-05 2020-11-05 Standard knowledge base management method and system based on cluster analysis Active CN112199376B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011224456.8A CN112199376B (en) 2020-11-05 2020-11-05 Standard knowledge base management method and system based on cluster analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011224456.8A CN112199376B (en) 2020-11-05 2020-11-05 Standard knowledge base management method and system based on cluster analysis

Publications (2)

Publication Number Publication Date
CN112199376A true CN112199376A (en) 2021-01-08
CN112199376B CN112199376B (en) 2021-07-20

Family

ID=74033284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011224456.8A Active CN112199376B (en) 2020-11-05 2020-11-05 Standard knowledge base management method and system based on cluster analysis

Country Status (1)

Country Link
CN (1) CN112199376B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113392104A (en) * 2021-05-19 2021-09-14 江苏星月测绘科技股份有限公司 CIM-based mass data analysis method and system
CN117312480A (en) * 2023-10-24 2023-12-29 广东广信通信服务有限公司 Knowledge base updating and optimizing method and system based on multiple fields

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110099131A1 (en) * 2009-10-22 2011-04-28 Yahoo! Inc. Pairwise ranking-based classifier
CN103093394A (en) * 2013-01-23 2013-05-08 广东电网公司信息中心 Clustering fusion method based on user electrical load data subdivision
CN103544299A (en) * 2013-10-30 2014-01-29 刘峰 Construction method for commercial intelligent cloud computing system
CN105183804A (en) * 2015-08-26 2015-12-23 陕西师范大学 Ontology based clustering service method

Also Published As

Publication number Publication date
CN112199376B (en) 2021-07-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant