CN116578611A - Knowledge management method and system for inoculated knowledge

Knowledge management method and system for inoculated knowledge

Info

Publication number
CN116578611A
CN116578611A CN202310555201.7A CN202310555201A CN116578611A CN 116578611 A CN116578611 A CN 116578611A CN 202310555201 A CN202310555201 A CN 202310555201A CN 116578611 A CN116578611 A CN 116578611A
Authority
CN
China
Prior art keywords
knowledge
inoculated
population
knowledge data
individuals
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310555201.7A
Other languages
Chinese (zh)
Other versions
CN116578611B (en)
Inventor
杨钢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shengcheng Mother Network Technology Co ltd
Original Assignee
Guangzhou Shengcheng Mother Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shengcheng Mother Network Technology Co ltd filed Critical Guangzhou Shengcheng Mother Network Technology Co ltd
Priority to CN202310555201.7A priority Critical patent/CN116578611B/en
Publication of CN116578611A publication Critical patent/CN116578611A/en
Application granted granted Critical
Publication of CN116578611B publication Critical patent/CN116578611B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H40/00 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H40/20 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • General Business, Economics & Management (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Biomedical Technology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data mining, and in particular to a knowledge management method and system for inoculated knowledge. The method comprises the following steps: acquiring inoculated knowledge data through multiple channels; preprocessing the acquired inoculated knowledge data; mining the preprocessed inoculated knowledge data based on a knowledge mining algorithm; and creating a search index over the inoculated knowledge set obtained by mining so that users can query it. By collecting and mining inoculation data, the invention obtains the connections between different inoculation data, provides users with a reference of implicit inoculation knowledge, and extracts the rules implicit in the inoculation knowledge, which improves the scientific soundness of the inoculation knowledge and users' understanding of it. A decision tree is also generated, which facilitates the classification of inoculation data and user queries, improves the quality of prenatal and postnatal care, and reduces the probability of maternal and infant diseases.

Description

Knowledge management method and system for inoculated knowledge
Technical Field
The invention relates to the technical field of data mining, in particular to a knowledge management method and system for inoculated knowledge.
Background
At present, about 17 million babies are born in China each year, and the number of pregnant women each year is larger still. Enabling more people to master correct inoculation knowledge and methods, providing personalized guidance according to each person's environment and individual differences, and continuously improving the quality of prenatal and postnatal care are therefore problems that every family, and society as a whole, must keep working to solve.
In the prior art, inoculated knowledge is organized and optimized by machine learning on the basis of correlation, but the implicit knowledge contained in it is not mined. As a result, users cannot be offered a reference of implicit inoculated knowledge, and their understanding of inoculated knowledge cannot be improved.
Disclosure of Invention
The invention aims to remedy the defects described in the background by providing a knowledge management method and system for inoculated knowledge.
The technical scheme adopted by the invention is as follows:
A knowledge management method for inoculated knowledge comprises the following steps:
S1: acquiring inoculated knowledge data through multiple channels;
S2: preprocessing the acquired inoculated knowledge data;
S3: mining the preprocessed inoculated knowledge data based on a knowledge mining algorithm;
S4: creating a search index over the inoculated knowledge set obtained by mining, for users to query.
As a preferred technical scheme of the invention: in S1, acquiring inoculated knowledge data through multiple channels specifically comprises acquiring the data from computer terminals, from mobile devices, and by manual collection.
As a preferred technical scheme of the invention: in S2, the acquired inoculated knowledge data is sorted and cleaned; after word segmentation, stop-word removal, punctuation removal, and case normalization, it forms an inoculated knowledge data set consisting of a series of words.
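The preprocessing described for S2 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the stop-word list `STOP_WORDS`, the function name `preprocess`, and the whitespace-based word segmentation are all assumptions made for illustration (Chinese text would need a dedicated word segmenter instead of `split`).

```python
import re

# Hypothetical stop-word list; a real system would load a much larger one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "is", "in", "for"}

def preprocess(text, stop_words=STOP_WORDS):
    """Turn one raw knowledge record into a list of cleaned words."""
    lowered = text.lower()                        # case normalization
    no_punct = re.sub(r"[^\w\s]", " ", lowered)   # punctuation removal
    tokens = no_punct.split()                     # naive whitespace word segmentation
    return [t for t in tokens if t not in stop_words]  # stop-word removal

if __name__ == "__main__":
    print(preprocess("The Diet of the Mother, and REST!"))
```

Applying `preprocess` to every collected record yields the word-level data set that the later mining step consumes.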
As a preferred technical scheme of the invention: the knowledge mining algorithm mines the inoculated knowledge data by combining the CART algorithm with an improved genetic algorithm.
As a preferred technical scheme of the invention: the CART algorithm is specifically as follows:
Let $X_1, X_2, \ldots, X_N$ denote the N attributes contained in a sample of inoculated knowledge data, and let Y denote the category to which the attributes belong; each attribute has a fixed output value. The j-th attribute $X_j$ of the inoculated knowledge data set and a value z of attribute $X_j$ are selected as the split point of the regression tree, which divides the data set into two regions, $X_1(j,z)=\{x \mid x^{(j)}\le z\}$ and $X_2(j,z)=\{x \mid x^{(j)}>z\}$. The best split point is the one that minimizes the squared error:
$$\min_{j,z}\left[\min_{c_1}\sum_{x_1\in X_1(j,z)}(y_1-c_1)^2+\min_{c_2}\sum_{x_2\in X_2(j,z)}(y_2-c_2)^2\right]$$
where $y_1$ denotes the attribute category of inoculated knowledge data $x_1$ in region $X_1$, $c_1$ denotes the output value of $x_1$ in region $X_1$, $y_2$ denotes the attribute category of inoculated knowledge data $x_2$ in region $X_2$, and $c_2$ denotes the output value of $x_2$ in region $X_2$;
The N-dimensional space of the inoculated knowledge data set is recursively divided into non-overlapping rectangles. The optimal feature attribute is selected by calculating the Gini index, and the Gini index also determines the optimal binary split point of that attribute;
Let the probability that a point of the inoculated knowledge data set D belongs to the k-th class be $p_k$; the Gini index G(p) of the probability distribution is:
$$G(p)=\sum_{k}p_k(1-p_k)=1-\sum_{k}p_k^2$$
Given an attribute F of the inoculated knowledge data set D and one of its values f, the data set is divided into two regions $D_1=\{(x,y)\in D \mid F(x)=f\}$ and $D_2=D-D_1$, where x denotes inoculated knowledge data, y denotes the attribute category of x, and $F(x)=f$ means that attribute F takes the value f on x. Under the condition of attribute F, the Gini index G(D, F) of the inoculated knowledge data set D is:
$$G(D,F)=\frac{|D_1|}{|D|}G(D_1)+\frac{|D_2|}{|D|}G(D_2)$$
where $G(D_1)$ denotes the Gini index of the inoculated knowledge data subset $D_1$ and $G(D_2)$ denotes the Gini index of the subset $D_2$;
A decision tree is generated with the information gain of the inoculated knowledge data as the fitness function, where the fitness function Gain(F) is specifically:
$$\mathrm{Gain}(F)=H(D)-\sum_{j}\frac{|D_j|}{|D|}H(D_j),\qquad H(D)=-\sum_{i}p_i\log_2 p_i$$
where $p_i$ denotes the probability that a point in the inoculated knowledge data set belongs to the i-th class, and $D_j$ denotes the subset of the inoculated knowledge data set obtained by dividing it by the j-th value of the attribute.
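The impurity index $1-\sum_k p_k^2$ of a class distribution, commonly known as the Gini index, and the weighted index G(D, F) of a binary split can be illustrated with a short Python sketch. The function names, the tuple-based row layout, and the toy data are assumptions made for illustration; the sketch only covers equality splits on categorical values.

```python
from collections import Counter

def gini(labels):
    """Gini index 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(rows, labels, attr, value):
    """Weighted Gini index G(D, F) for the split rows[i][attr] == value."""
    d1 = [y for x, y in zip(rows, labels) if x[attr] == value]
    d2 = [y for x, y in zip(rows, labels) if x[attr] != value]
    n = len(labels)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)

def best_binary_split(rows, labels):
    """Return (attr, value, gini) for the split with the smallest weighted Gini index."""
    best = None
    for attr in range(len(rows[0])):
        for value in sorted({x[attr] for x in rows}):
            g = gini_split(rows, labels, attr, value)
            if best is None or g < best[2]:
                best = (attr, value, g)
    return best

if __name__ == "__main__":
    # Toy data: attribute 0 separates the two classes perfectly.
    rows = [("early", "diet"), ("early", "exercise"), ("late", "diet"), ("late", "exercise")]
    labels = ["A", "A", "B", "B"]
    print(best_binary_split(rows, labels))
```

A lower weighted index means purer subsets, so the attribute and value returned by `best_binary_split` play the role of the optimal binary split point.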
As a preferred technical scheme of the invention: the decision tree generation is specifically as follows:
The inoculated knowledge data set is encoded to generate an initial population. With the information gain of the inoculated knowledge data as the fitness function, knowledge mining and attribute division are performed on the data and the fitness of each individual in the population is calculated. The selected inoculated knowledge data forms the dominant population and the unselected data forms the auxiliary population. Adaptive crossover and adaptive mutation are applied to the individuals of the dominant population, and the convergence condition is then checked. If it is met, the procedure ends; otherwise adaptive crossover and adaptive mutation are applied to the individuals of the auxiliary population, the iteration count is incremented, and the procedure returns to the fitness calculation and repeats the subsequent steps until convergence, that is, until the population individuals corresponding to the inoculated knowledge data are divided into the same class according to their attributes.
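The control flow of this two-population procedure can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the patented algorithm: individuals are bit lists, the fitness function is supplied by the caller, the crossover and mutation probabilities are fixed rather than adaptive, the top half of the ranked population is taken as the dominant population, and the best individual is carried over unchanged (elitism). The name `evolve` and all parameters are hypothetical.

```python
import random

def evolve(population, fitness, crossover_p=0.8, mutation_p=0.05,
           max_iters=50, target=None, seed=0):
    """Return (best individual, iterations used) for a list of bit-list individuals."""
    rng = random.Random(seed)

    def vary(group):
        """One-point crossover on consecutive pairs, then bit-flip mutation."""
        out = [ind[:] for ind in group]
        for i in range(0, len(out) - 1, 2):
            if rng.random() < crossover_p:
                cut = rng.randrange(1, len(out[i]))
                out[i][cut:], out[i + 1][cut:] = out[i + 1][cut:], out[i][cut:]
        for ind in out:
            for k in range(len(ind)):
                if rng.random() < mutation_p:
                    ind[k] ^= 1
        return out

    for iteration in range(max_iters):
        ranked = sorted(population, key=fitness, reverse=True)
        half = len(ranked) // 2
        dominant, auxiliary = ranked[:half], ranked[half:]   # selected / unselected
        # Evolve the dominant population first; keep the current best (assumed elitism).
        dominant = [ranked[0][:]] + vary(dominant)[1:]
        best = max(dominant, key=fitness)
        if target is not None and fitness(best) >= target:   # convergence check
            return best, iteration
        # Not converged: evolve the auxiliary population and iterate again.
        population = dominant + vary(auxiliary)
    return max(population, key=fitness), max_iters

if __name__ == "__main__":
    start = [[0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0],
             [1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1], [1, 0, 1, 0, 1, 0], [0, 0, 0, 0, 0, 0]]
    print(evolve(start, sum, max_iters=30))  # toy fitness: number of 1 bits
```

In the method described above the fitness would be the information gain of the attribute division encoded by an individual; the toy fitness `sum` only stands in for it.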
As a preferred technical scheme of the invention: the adaptive crossover operation in S3 is specifically as follows:
where $p_a$ denotes the crossover probability of the dominant population, and $p_{a1}$ and $p_{a2}$ denote the minimum and maximum crossover probabilities of the dominant population, respectively; $p_b$ denotes the crossover probability of the auxiliary population, and $p_{b1}$ and $p_{b2}$ denote the minimum and maximum crossover probabilities of the auxiliary population, respectively; $f_{ai}$ and $f_{bi}$ denote the fitness of an individual of the dominant population and of the auxiliary population, respectively; $f_{a,\max}$ and $f_{b,\max}$ denote the maximum fitness of the individuals of the dominant population and of the auxiliary population, respectively; the average fitness of the individuals of the dominant population and of the auxiliary population is also used;
the adaptive mutation operation in S3 is specifically as follows:
where $p_A$ denotes the mutation probability of the dominant population, and $p_{A1}$ and $p_{A2}$ denote the minimum and maximum mutation probabilities of the dominant population, respectively; $p_B$ denotes the mutation probability of the auxiliary population, and $p_{B1}$ and $p_{B2}$ denote the minimum and maximum mutation probabilities of the auxiliary population, respectively; $f_{Ai}$ and $f_{Bi}$ denote the fitness of an individual of the dominant population and of the auxiliary population, respectively; $f_{A,\max}$ and $f_{B,\max}$ denote the maximum fitness of the individuals of the dominant population and of the auxiliary population, respectively; the average fitness of the individuals of the dominant population and of the auxiliary population is also used.
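How such a probability can adapt to fitness may be illustrated with the following Python sketch. The interpolation rule is an assumption: it is a linear form commonly used in adaptive genetic algorithms and is not taken from the text, whose own formulas are not reproduced here. Individuals at or above the average fitness receive a probability that falls linearly from the maximum to the minimum as their fitness approaches the population maximum, and below-average individuals receive the maximum. The function name and parameters are hypothetical.

```python
def adaptive_probability(f, f_avg, f_max, p_min, p_max):
    """Crossover or mutation probability for an individual with fitness f.

    Assumed rule (not from the source): below-average individuals use p_max;
    others interpolate linearly from p_max (at f_avg) down to p_min (at f_max).
    """
    if f < f_avg or f_max == f_avg:
        return p_max
    return p_max - (p_max - p_min) * (f - f_avg) / (f_max - f_avg)

if __name__ == "__main__":
    # Hypothetical bounds for one population.
    for f in (3.0, 5.0, 7.5, 10.0):
        print(f, adaptive_probability(f, f_avg=5.0, f_max=10.0, p_min=0.25, p_max=0.75))
```

The same function can serve both populations and both operators by passing each population's own bounds, average fitness, and maximum fitness.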
As a preferred technical scheme of the invention: after the decision tree is generated, setting the leaf node number of the decision tree as L, and defining a loss function delta (t) as:
δ(t)=σ(t)+τ|L|
where σ(t) denotes the prediction error of leaf node t, τ is a model parameter, and |L| denotes the complexity of the model;
The empirical entropy of each leaf node is calculated, and the nodes are traversed recursively upward from the leaf nodes. For each node, it is judged whether the value of the loss function decreases after a leaf node is deleted; if it does, the parent node becomes a new leaf node. The traversal continues until every node has been judged.
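The bottom-up pruning pass can be sketched in Python as below. The sketch makes assumptions that go beyond the text: the loss of a whole tree is taken as the sum of the leaf errors σ(t) plus τ times the number of leaves, each node stores the error it would have as a leaf, and a subtree is collapsed only when the loss strictly decreases. The `Node` class and function names are hypothetical.

```python
class Node:
    def __init__(self, error, children=None):
        self.error = error              # sigma(t): error if this node were a leaf
        self.children = children or []

def leaves(node):
    """All leaf nodes below (or equal to) the given node."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

def loss(root, tau):
    """Assumed tree loss: sum of leaf errors plus tau * number of leaves."""
    lv = leaves(root)
    return sum(leaf.error for leaf in lv) + tau * len(lv)

def prune(node, root, tau):
    """Visit nodes bottom-up; turn a node into a leaf when that lowers the loss."""
    for child in node.children:
        prune(child, root, tau)
    if node.children:
        before = loss(root, tau)
        saved = node.children
        node.children = []              # tentatively make the parent a leaf
        if loss(root, tau) >= before:
            node.children = saved       # no improvement: restore the subtree
    return root

if __name__ == "__main__":
    a = Node(3.0, [Node(1.0), Node(1.5)])
    tree = Node(10.0, [a, Node(2.0)])
    prune(tree, tree, tau=1.0)
    print(len(leaves(tree)), loss(tree, 1.0))
```

A larger τ penalizes leaves more heavily and therefore yields smaller trees, while τ = 0 keeps every split that reduces the error.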
As a preferred technical scheme of the invention: the search index in S4 is created through the following steps:
S4.1: generating an inoculated knowledge set from the mining result of the knowledge mining algorithm;
S4.2: creating a query statement;
S4.3: performing lexical analysis on the query statement to form a series of words;
S4.4: searching the index by means of the decision tree to obtain the indexed knowledge related to the query statement, and applying intersection, difference, and union operations to that knowledge to obtain a result knowledge set;
S4.5: calculating the relevance between the result knowledge of S4.4 and the query statement;
S4.6: sorting the query results by relevance and outputting them.
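The query steps above can be illustrated with a small Python sketch. Note the substitutions: a plain inverted index (word to item ids) stands in for the decision-tree-based index lookup, and relevance is approximated by counting query-word occurrences, since the text does not specify the relevance measure. All names and the sample items are hypothetical.

```python
from collections import Counter, defaultdict

def tokenize(text):
    """Minimal lexical analysis: lowercase and split on whitespace."""
    return text.lower().split()

def build_index(docs):
    """Map each word to the set of ids of the knowledge items containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, docs, all_of, any_of=(), none_of=()):
    """Combine postings with set operations, then rank by assumed relevance."""
    result = set(docs)
    for word in all_of:
        result &= index.get(word, set())          # intersection
    if any_of:
        union = set()
        for word in any_of:
            union |= index.get(word, set())       # union
        result &= union
    for word in none_of:
        result -= index.get(word, set())          # difference
    terms = list(all_of) + list(any_of)

    def score(doc_id):
        counts = Counter(tokenize(docs[doc_id]))
        return sum(counts[w] for w in terms)      # term-frequency relevance

    return sorted(result, key=lambda d: (-score(d), d))

if __name__ == "__main__":
    docs = {1: "diet advice for early pregnancy",
            2: "exercise advice for late pregnancy",
            3: "diet diet plan after birth"}
    idx = build_index(docs)
    print(search(idx, docs, all_of=["diet"]))
```

The three keyword groups map directly onto the intersection, union, and difference operations of S4.4, and the final `sorted` call corresponds to S4.5 and S4.6.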
A knowledge management system for inoculated knowledge is also provided, comprising:
a knowledge acquisition module: for collecting inoculation knowledge through multiple channels;
a knowledge preprocessing module: for sorting and cleaning the acquired inoculation knowledge;
a knowledge mining module: for mining the preprocessed inoculation knowledge based on a knowledge mining algorithm;
an index creation module: for creating a search index over the inoculated knowledge set for users to query.
Compared with the prior art, the knowledge management method and system for inoculated knowledge provided by the invention have the beneficial effects that:
By collecting and mining inoculation data, the invention obtains the connections between different inoculation data, provides users with a reference of implicit inoculation knowledge, and extracts the rules implicit in the inoculation knowledge, which improves the scientific soundness of the inoculation knowledge and users' understanding of it. A decision tree is also generated, which facilitates the classification of inoculation data and user queries, improves the quality of prenatal and postnatal care, and reduces the probability of maternal and infant diseases.
Drawings
FIG. 1 is a flow chart of the method in a preferred embodiment of the present invention;
FIG. 2 is a block diagram of the system in a preferred embodiment of the present invention.
The reference numerals in the figures denote: 100, knowledge acquisition module; 200, knowledge preprocessing module; 300, knowledge mining module; 400, index creation module.
Detailed Description
It should be noted that, where no conflict arises, the embodiments of the present invention and the features in the embodiments may be combined with each other. The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of the invention.
Referring to FIG. 1, a preferred embodiment of the present invention provides a knowledge management method for inoculated knowledge, comprising the following steps:
S1: acquiring inoculated knowledge data through multiple channels;
S2: preprocessing the acquired inoculated knowledge data;
S3: mining the preprocessed inoculated knowledge data based on a knowledge mining algorithm;
S4: creating a search index over the inoculated knowledge set obtained by mining, for users to query.
In S1, acquiring inoculated knowledge data through multiple channels specifically comprises acquiring the data from computer terminals, from mobile devices, and by manual collection.
In S2, the acquired inoculated knowledge data is sorted and cleaned; after word segmentation, stop-word removal, punctuation removal, and case normalization, it forms an inoculated knowledge data set consisting of a series of words.
The knowledge mining algorithm mines the inoculated knowledge data by combining the CART algorithm with an improved genetic algorithm.
The CART algorithm is specifically as follows:
Let $X_1, X_2, \ldots, X_N$ denote the N attributes contained in a sample of inoculated knowledge data, and let Y denote the category to which the attributes belong; each attribute has a fixed output value. The j-th attribute $X_j$ of the inoculated knowledge data set and a value z of attribute $X_j$ are selected as the split point of the regression tree, which divides the data set into two regions, $X_1(j,z)=\{x \mid x^{(j)}\le z\}$ and $X_2(j,z)=\{x \mid x^{(j)}>z\}$. The best split point is the one that minimizes the squared error:
$$\min_{j,z}\left[\min_{c_1}\sum_{x_1\in X_1(j,z)}(y_1-c_1)^2+\min_{c_2}\sum_{x_2\in X_2(j,z)}(y_2-c_2)^2\right]$$
where $y_1$ denotes the attribute category of inoculated knowledge data $x_1$ in region $X_1$, $c_1$ denotes the output value of $x_1$ in region $X_1$, $y_2$ denotes the attribute category of inoculated knowledge data $x_2$ in region $X_2$, and $c_2$ denotes the output value of $x_2$ in region $X_2$;
The N-dimensional space of the inoculated knowledge data set is recursively divided into non-overlapping rectangles. The optimal feature attribute is selected by calculating the Gini index, and the Gini index also determines the optimal binary split point of that attribute;
Let the probability that a point of the inoculated knowledge data set D belongs to the k-th class be $p_k$; the Gini index G(p) of the probability distribution is:
$$G(p)=\sum_{k}p_k(1-p_k)=1-\sum_{k}p_k^2$$
Given an attribute F of the inoculated knowledge data set D and one of its values f, the data set is divided into two regions $D_1=\{(x,y)\in D \mid F(x)=f\}$ and $D_2=D-D_1$, where x denotes inoculated knowledge data, y denotes the attribute category of x, and $F(x)=f$ means that attribute F takes the value f on x. Under the condition of attribute F, the Gini index G(D, F) of the inoculated knowledge data set D is:
$$G(D,F)=\frac{|D_1|}{|D|}G(D_1)+\frac{|D_2|}{|D|}G(D_2)$$
where $G(D_1)$ denotes the Gini index of the inoculated knowledge data subset $D_1$ and $G(D_2)$ denotes the Gini index of the subset $D_2$;
A decision tree is generated with the information gain of the inoculated knowledge data as the fitness function, where the fitness function Gain(F) is specifically:
$$\mathrm{Gain}(F)=H(D)-\sum_{j}\frac{|D_j|}{|D|}H(D_j),\qquad H(D)=-\sum_{i}p_i\log_2 p_i$$
where $p_i$ denotes the probability that a point in the inoculated knowledge data set belongs to the i-th class, and $D_j$ denotes the subset of the inoculated knowledge data set obtained by dividing it by the j-th value of the attribute.
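The information gain used as the fitness function can be computed as in the following Python sketch. It assumes the standard definition, entropy of the whole data set minus the size-weighted entropy of the subsets produced by one attribute; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy -sum_i p_i * log2(p_i) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after grouping by attribute value."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)             # one subset per attribute value
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

if __name__ == "__main__":
    labels = ["A", "A", "B", "B"]
    print(information_gain(["early", "early", "late", "late"], labels))  # informative attribute
    print(information_gain(["x", "y", "x", "y"], labels))                # uninformative attribute
```

An attribute that separates the classes well leaves little entropy in its subsets and therefore scores a high gain, which is what makes the quantity usable as a fitness value.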
When split attributes are selected, the information gain Gain(F) used as the fitness function may be biased toward attributes with more values, because such attributes tend to produce more branches and therefore a lower information entropy. This does not mean that an attribute with more values is necessarily the best split attribute. To address this, the present embodiment improves the fitness function by using the information gain ratio.
The information gain ratio introduces the intrinsic value of an attribute as a normalization factor, which reduces the weight of attributes with more values. It is defined as follows:
$$\mathrm{GainRatio}(F)=\frac{\mathrm{Gain}(F)}{\mathrm{IV}(F)},\qquad \mathrm{IV}(F)=-\sum_{j=1}^{m}\frac{|D_j|}{|D|}\log_2\frac{|D_j|}{|D|}$$
where
m is the number of values of attribute F;
$|D_j|$ and $|D|$ denote the sizes of subset $D_j$ and of the full data set D, respectively.
Using the information gain ratio as the fitness function overcomes the drawbacks of the information gain to some extent, so that split attributes are selected better when the decision tree is generated.
The decision tree generation is specifically as follows:
The inoculated knowledge data set is encoded to generate an initial population. With the information gain of the inoculated knowledge data as the fitness function, knowledge mining and attribute division are performed on the data and the fitness of each individual in the population is calculated. The selected inoculated knowledge data forms the dominant population and the unselected data forms the auxiliary population. Adaptive crossover and adaptive mutation are applied to the individuals of the dominant population, and the convergence condition is then checked. If it is met, the procedure ends; otherwise adaptive crossover and adaptive mutation are applied to the individuals of the auxiliary population, the iteration count is incremented, and the procedure returns to the fitness calculation and repeats the subsequent steps until convergence, that is, until the population individuals corresponding to the inoculated knowledge data are divided into the same class according to their attributes.
The adaptive crossover operation in S3 is specifically as follows:
where $p_a$ denotes the crossover probability of the dominant population, and $p_{a1}$ and $p_{a2}$ denote the minimum and maximum crossover probabilities of the dominant population, respectively; $p_b$ denotes the crossover probability of the auxiliary population, and $p_{b1}$ and $p_{b2}$ denote the minimum and maximum crossover probabilities of the auxiliary population, respectively; $f_{ai}$ and $f_{bi}$ denote the fitness of an individual of the dominant population and of the auxiliary population, respectively; $f_{a,\max}$ and $f_{b,\max}$ denote the maximum fitness of the individuals of the dominant population and of the auxiliary population, respectively; the average fitness of the individuals of the dominant population and of the auxiliary population is also used;
the adaptive mutation operation in S3 is specifically as follows:
where $p_A$ denotes the mutation probability of the dominant population, and $p_{A1}$ and $p_{A2}$ denote the minimum and maximum mutation probabilities of the dominant population, respectively; $p_B$ denotes the mutation probability of the auxiliary population, and $p_{B1}$ and $p_{B2}$ denote the minimum and maximum mutation probabilities of the auxiliary population, respectively; $f_{Ai}$ and $f_{Bi}$ denote the fitness of an individual of the dominant population and of the auxiliary population, respectively; $f_{A,\max}$ and $f_{B,\max}$ denote the maximum fitness of the individuals of the dominant population and of the auxiliary population, respectively; the average fitness of the individuals of the dominant population and of the auxiliary population is also used.
After the decision tree is generated, setting the leaf node number of the decision tree as L, and defining a loss function delta (t) as:
δ(t)=σ(t)+τ|L|
where σ(t) denotes the prediction error of leaf node t, τ is a model parameter, and |L| denotes the complexity of the model;
The empirical entropy of each leaf node is calculated, and the nodes are traversed recursively upward from the leaf nodes. For each node, it is judged whether the value of the loss function decreases after a leaf node is deleted; if it does, the parent node becomes a new leaf node. The traversal continues until every node has been judged.
The search index in S4 is created through the following steps:
S4.1: generating an inoculated knowledge set from the mining result of the knowledge mining algorithm;
S4.2: creating a query statement;
S4.3: performing lexical analysis on the query statement to form a series of words;
S4.4: searching the index by means of the decision tree to obtain the indexed knowledge related to the query statement, and applying intersection, difference, and union operations to that knowledge to obtain a result knowledge set;
S4.5: calculating the relevance between the result knowledge of S4.4 and the query statement;
S4.6: sorting the query results by relevance and outputting them.
Referring to FIG. 2, a knowledge management system for inoculated knowledge is provided, comprising:
a knowledge acquisition module 100: for collecting inoculation knowledge through multiple channels;
a knowledge preprocessing module 200: for sorting and cleaning the acquired inoculation knowledge;
a knowledge mining module 300: for mining the preprocessed inoculation knowledge based on a knowledge mining algorithm;
an index creation module 400: for creating a search index over the inoculated knowledge set for users to query.
In this embodiment, the knowledge acquisition module 100 acquires inoculated knowledge data from computer terminals, such as web pages and APPs, from mobile devices, such as mobile phones and smart bracelets, and by manual collection, such as clothes; the inoculated knowledge also includes maternal and infant knowledge. The knowledge preprocessing module 200 sorts and cleans the acquired inoculated knowledge data, applying word segmentation, stop-word removal, punctuation removal, and case normalization to form an inoculated knowledge data set consisting of a series of words.
The knowledge mining module 300 performs the following inoculated knowledge mining operations. Let $X_1, X_2, \ldots, X_{50}$ denote the 50 attributes contained in a sample of inoculated knowledge data. The 25th attribute $X_{25}$ of the inoculated knowledge data set and a value z of attribute $X_{25}$ are selected as the split point of the regression tree, which divides the data set into two regions, $X_1(25,z)=\{x \mid x^{(25)}\le z\}$ and $X_2(25,z)=\{x \mid x^{(25)}>z\}$. The best split point is the one that minimizes the squared error:
$$\min_{z}\left[\min_{c_1}\sum_{x_1\in X_1(25,z)}(y_1-c_1)^2+\min_{c_2}\sum_{x_2\in X_2(25,z)}(y_2-c_2)^2\right]$$
where $y_1$ denotes the attribute category of inoculated knowledge data $x_1$ in region $X_1$, $c_1$ denotes the output value of $x_1$ in region $X_1$, $y_2$ denotes the attribute category of inoculated knowledge data $x_2$ in region $X_2$, and $c_2$ denotes the output value of $x_2$ in region $X_2$;
The 50-dimensional space of the inoculated knowledge data set is recursively divided into non-overlapping rectangles. The optimal feature attribute is selected by calculating the Gini index, and the Gini index also determines the optimal binary split point of that attribute;
Let the probability that a point of the inoculated knowledge data set D belongs to the k-th class be $p_k$; the Gini index G(p) of the probability distribution is:
$$G(p)=\sum_{k}p_k(1-p_k)=1-\sum_{k}p_k^2$$
Given an attribute F of the inoculated knowledge data set D and one of its values f, the data set is divided into two regions $D_1=\{(x,y)\in D \mid F(x)=f\}$ and $D_2=D-D_1$, where x denotes inoculated knowledge data, y denotes the attribute category of x, and $F(x)=f$ means that attribute F takes the value f on x. Under the condition of attribute F, the Gini index G(D, F) of the inoculated knowledge data set D is:
$$G(D,F)=\frac{|D_1|}{|D|}G(D_1)+\frac{|D_2|}{|D|}G(D_2)$$
where $G(D_1)$ denotes the Gini index of the inoculated knowledge data subset $D_1$ and $G(D_2)$ denotes the Gini index of the subset $D_2$;
A decision tree is generated with the information gain of the inoculated knowledge data as the fitness function, where the fitness function Gain(F) is specifically as follows:
where $p_i$ denotes the probability that a point in the inoculated knowledge data set belongs to the i-th class, and $D_{25}$ denotes the subset of the inoculated knowledge data set obtained by dividing it by the 25th attribute.
The inoculated knowledge data set is encoded to generate an initial population; attribute division is performed on the inoculated knowledge data using the information gain of the inoculated knowledge data as the fitness function, and the fitness of each individual in the population is calculated. The selected inoculated knowledge data form the dominant population and the unselected inoculated knowledge data form the auxiliary population, and an adaptive crossover operation is performed on the individuals of the dominant population:
where p_a is the crossover probability of the dominant population, and p_a1 and p_a2 are the minimum and maximum crossover probabilities of the dominant population, respectively; p_b is the crossover probability of the auxiliary population, and p_b1 and p_b2 are the minimum and maximum crossover probabilities of the auxiliary population, respectively; f_ai and f_bi are the fitness of individuals of the dominant population and of the auxiliary population, respectively; f_a,max and f_b,max are the maximum fitness of individuals of the dominant population and of the auxiliary population, respectively; and the two mean-fitness terms are the average fitness of individuals of the dominant population and of the auxiliary population, respectively;
followed by an adaptive mutation operation:
where p_A is the mutation probability of the dominant population, and p_A1 and p_A2 are the minimum and maximum mutation probabilities of the dominant population, respectively; p_B is the mutation probability of the auxiliary population, and p_B1 and p_B2 are the minimum and maximum mutation probabilities of the auxiliary population, respectively; f_Ai and f_Bi are the fitness of individuals of the dominant population and of the auxiliary population, respectively; f_A,max and f_B,max are the maximum fitness of individuals of the dominant population and of the auxiliary population, respectively; and the two mean-fitness terms are the average fitness of individuals of the dominant population and of the auxiliary population, respectively.
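The adaptive crossover and mutation expressions themselves are not reproduced in this text. Purely as an assumption-laden sketch, one common adaptive scheme that uses the same quantities (a minimum and maximum probability, an individual's fitness, and the maximum and average fitness of its population) interpolates linearly between the two bounds. The function names and the linear form below are hypothetical and are not asserted to be the expressions of the embodiment.

```python
def adaptive_probability(f_i, f_max, f_avg, p_min, p_max):
    """Assumed linear adaptive rule for a crossover or mutation probability.

    Individuals at or below the average fitness receive p_max; fitter
    individuals receive a lower probability, reaching p_min at f_max.
    """
    if f_i <= f_avg or f_max == f_avg:
        return p_max
    return p_max - (p_max - p_min) * (f_i - f_avg) / (f_max - f_avg)


def population_probabilities(fitnesses, p_min, p_max):
    """Apply the rule to every individual of one non-empty population."""
    f_max = max(fitnesses)
    f_avg = sum(fitnesses) / len(fitnesses)
    return [adaptive_probability(f, f_max, f_avg, p_min, p_max)
            for f in fitnesses]
```

Under this assumed rule, the same helper would be called once for the dominant population with its own bounds and once for the auxiliary population with its own bounds, so that well-adapted individuals are disturbed less than poorly adapted ones.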
Crossover and mutation are applied to the inoculated knowledge data in order to mine inoculated knowledge. After mining, it is judged whether the convergence condition is met. If so, the process ends; if not, adaptive crossover and adaptive mutation operations are performed on the individuals of the auxiliary population, the iteration count is incremented, and the process returns to recalculate individual fitness and repeats the subsequent steps until convergence, at which point the population individuals corresponding to the inoculated knowledge data have been divided into classes according to their attributes.
A decision tree is generated from the inoculated knowledge data, with data sharing the same attribute treated as one class. After the decision tree is generated, the number of leaf nodes of the decision tree is set to 500, and the loss function δ(150) of the 150th leaf node is defined as:
δ(150)=σ(150)+τ|L|
where σ(150) is the prediction error of the 150th leaf node, τ is a model parameter, and |L| is the complexity of the model;
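As a minimal, non-limiting sketch of how a penalized loss of the form δ = σ + τ|L| can drive a pruning decision, the example below compares the loss of a group of sibling leaves with the loss obtained when they are replaced by their parent. It assumes that the prediction errors of the leaves are summed and that |L| is taken as the number of leaves; both readings, and the function names, are assumptions made for illustration.

```python
def penalized_loss(leaf_errors, tau):
    """Sum of leaf prediction errors plus tau times the number of leaves."""
    return sum(leaf_errors) + tau * len(leaf_errors)


def should_prune(child_leaf_errors, parent_error, tau):
    """True if replacing the child leaves by their parent lowers the loss."""
    before = penalized_loss(child_leaf_errors, tau)   # keep the leaves
    after = penalized_loss([parent_error], tau)       # parent becomes a leaf
    return after < before
```

With leaf errors 0.10 and 0.12 and a parent error of 0.25, a penalty of τ = 0.05 favors pruning (0.30 against 0.32), whereas τ = 0.01 does not (0.26 against 0.24), which shows how τ trades prediction error against tree size.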
calculating the empirical entropy of each leaf node, recursively traversing each node upwards from the leaf nodes, judging whether the value of the loss function decreases after a given leaf node is deleted, and, if it decreases, taking the parent node as a new leaf node; the traversal continues until every node has been judged.
Once the decision tree has been created and the inoculated knowledge set has been generated from it, the index creation module 400 collects the user's query statement and performs lexical analysis on it to form a series of words; the index is searched using the decision tree to obtain the indexed knowledge related to the query statement, and intersection, difference, and union operations are applied to that knowledge to obtain a result knowledge set; the relevance between the result knowledge and the query statement is calculated; and the results are sorted by relevance and output for display in descending order of relevance.
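The query flow described above (lexical analysis, index lookup, set operations, relevance ranking) may be sketched as follows. This is an illustration under stated assumptions only: a plain dictionary-based inverted index stands in for the decision-tree-based lookup, relevance is taken as the fraction of query words found in an entry, and the tokenizer handles lowercase alphanumeric words only. None of these choices, nor the function names, are specified by the embodiment.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(knowledge):
    """Map each word to the set of knowledge ids whose text contains it."""
    index = {}
    for kid, text in knowledge.items():
        for word in set(tokenize(text)):
            index.setdefault(word, set()).add(kid)
    return index


def search(index, knowledge, query, exclude="", require_all=False):
    """Return the ids matching the query, most relevant first."""
    words = tokenize(query)
    if not words:
        return []
    postings = [index.get(w, set()) for w in words]
    # Intersection keeps entries containing every word; union keeps any.
    combine = set.intersection if require_all else set.union
    result = combine(*postings)
    for w in tokenize(exclude):
        result -= index.get(w, set())   # difference: drop excluded words

    def relevance(kid):
        entry_words = set(tokenize(knowledge[kid]))
        return sum(w in entry_words for w in words) / len(words)

    # Sort by descending relevance, then by id for a stable order.
    return sorted(result, key=lambda kid: (-relevance(kid), kid))
```

Here `require_all` selects between the intersection and the union of the per-word result sets, and `exclude` applies the difference operation before the remaining entries are ranked.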
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution. This manner of description is adopted for clarity only; those skilled in the art should treat the specification as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments that will be apparent to those skilled in the art.

Claims (10)

1. A knowledge management method for inoculated knowledge, characterized in that the method comprises the following steps:
S1: acquiring inoculated knowledge data through multiple channels;
S2: preprocessing the acquired inoculated knowledge data;
S3: mining the preprocessed inoculated knowledge data based on a knowledge mining algorithm;
S4: creating a search index over the inoculated knowledge set obtained through mining, for users to query.
2. The knowledge management method for inoculated knowledge according to claim 1, wherein acquiring inoculated knowledge data through multiple channels in S1 specifically comprises acquiring inoculated knowledge data via a computer terminal, a mobile device terminal, and manual collection.
3. The knowledge management method for inoculated knowledge according to claim 1, wherein S2 comprises sorting and cleaning the acquired inoculated knowledge data, and forming an inoculated knowledge data set consisting of a series of words after performing word segmentation, stop-word removal, punctuation removal, and uppercase-to-lowercase conversion on the acquired inoculated knowledge data.
4. The knowledge management method for inoculated knowledge according to claim 1, wherein the knowledge mining algorithm mines the inoculated knowledge data based on a combination of the CART algorithm and an improved genetic algorithm.
5. The knowledge management method for inoculated knowledge according to claim 4, wherein the CART algorithm is specifically as follows:
letting X_1, X_2, …, X_N denote the N attributes contained in a sample of inoculated knowledge data and Y denote the category to which the attributes belong, each attribute having a fixed output value; selecting the j-th attribute X_j in the inoculated knowledge data set and a value z of attribute X_j as the split point of the regression tree, thereby dividing the inoculated knowledge data set into two regions X_1(j, z) = {x | x(j) ≤ z} and X_2(j, z) = {x | x(j) > z}; and finding the optimal split point for X_j, i.e. the point at which the squared error is minimized:
where y_1 is the attribute category of inoculated knowledge data x_1 in region X_1, c_1 is the output value of inoculated knowledge data x_1 in region X_1, y_2 is the attribute category of inoculated knowledge data x_2 in region X_2, and c_2 is the output value of inoculated knowledge data x_2 in region X_2;
recursively partitioning the N-dimensional space of the inoculated knowledge data set into mutually non-overlapping rectangles, selecting the optimal feature attribute by calculating the Gini index, and determining the optimal binary split point of that attribute from the Gini index;
letting the probability that a point of the inoculated knowledge data set D belongs to the k-th class be p_k, the Gini index G(p) of the probability distribution is:
G(p) = Σ_k p_k (1 - p_k) = 1 - Σ_k p_k^2
determining whether an attribute F of the inoculated knowledge data set D takes a given value f, and dividing the inoculated knowledge data set into two regions D_1 and D_2 accordingly, with D_1 = {(x, y) ∈ D | F(x) = f} and D_2 = D - D_1, where x represents inoculated knowledge data, y represents the attribute category of inoculated knowledge data x, and F(x) = f means that attribute F takes the value f on inoculated knowledge data x; given attribute F, the Gini index G(D, F) of the inoculated knowledge data set D is:
G(D, F) = (|D_1| / |D|) G(D_1) + (|D_2| / |D|) G(D_2)
where G(D_1) is the Gini index of the inoculated knowledge data subset D_1, and G(D_2) is the Gini index of the inoculated knowledge data subset D_2;
and generating a decision tree using the information gain of the inoculated knowledge data as the fitness function, where the fitness function Gain(F) is:
where p_i is the probability that a point in the inoculated knowledge data set belongs to the i-th class, and D_j is the inoculated knowledge data subset obtained by dividing the inoculated knowledge data set on the j-th attribute.
6. The knowledge management method for inoculated knowledge according to claim 5, wherein the decision tree generation is specifically as follows:
encoding the inoculated knowledge data set to generate an initial population; performing knowledge mining and attribute division on the inoculated knowledge data using the information gain of the inoculated knowledge data as the fitness function, and calculating the fitness of each individual in the population; taking the selected inoculated knowledge data as the dominant population and the unselected inoculated knowledge data as the auxiliary population; performing an adaptive crossover operation and an adaptive mutation operation on the individuals of the dominant population; judging whether the convergence condition is met, and ending if it is met; otherwise performing adaptive crossover and adaptive mutation operations on the individuals of the auxiliary population, incrementing the iteration count, returning to recalculate individual fitness, and repeating the subsequent steps until convergence, at which point the population individuals corresponding to the inoculated knowledge data have been divided into classes according to their attributes.
7. The knowledge management method for inoculated knowledge according to claim 6, wherein the adaptive crossover operation in S3 is specifically as follows:
where p_a is the crossover probability of the dominant population, and p_a1 and p_a2 are the minimum and maximum crossover probabilities of the dominant population, respectively; p_b is the crossover probability of the auxiliary population, and p_b1 and p_b2 are the minimum and maximum crossover probabilities of the auxiliary population, respectively; f_ai and f_bi are the fitness of individuals of the dominant population and of the auxiliary population, respectively; f_a,max and f_b,max are the maximum fitness of individuals of the dominant population and of the auxiliary population, respectively; and the two mean-fitness terms are the average fitness of individuals of the dominant population and of the auxiliary population, respectively;
the adaptive mutation operation in S3 is specifically as follows:
where p_A is the mutation probability of the dominant population, and p_A1 and p_A2 are the minimum and maximum mutation probabilities of the dominant population, respectively; p_B is the mutation probability of the auxiliary population, and p_B1 and p_B2 are the minimum and maximum mutation probabilities of the auxiliary population, respectively; f_Ai and f_Bi are the fitness of individuals of the dominant population and of the auxiliary population, respectively; f_A,max and f_B,max are the maximum fitness of individuals of the dominant population and of the auxiliary population, respectively; and the two mean-fitness terms are the average fitness of individuals of the dominant population and of the auxiliary population, respectively.
8. The knowledge management method for inoculated knowledge according to claim 7, wherein after the decision tree is generated, the number of leaf nodes of the decision tree is set to |L|, and a loss function δ(t) is defined as:
δ(t)=σ(t)+τ|L|
where σ(t) is the prediction error of leaf node t, τ is a model parameter, and |L| is the complexity of the model;
calculating the empirical entropy of each leaf node, recursively traversing each node upwards from the leaf nodes, judging whether the value of the loss function decreases after a given leaf node is deleted, and, if it decreases, taking the parent node as a new leaf node; the traversal continues until every node has been judged.
9. The knowledge management method for inoculated knowledge according to claim 1, wherein the search index in S4 is created by the following steps:
S4.1: generating an inoculated knowledge set based on the mining result of the knowledge mining algorithm;
S4.2: creating a query statement;
S4.3: performing lexical analysis to form a series of words;
S4.4: searching the index using the decision tree to obtain the indexed knowledge related to the query statement, and applying intersection, difference, and union operations to that knowledge to obtain a result knowledge set;
S4.5: calculating the relevance between the result knowledge of S4.4 and the query statement;
S4.6: sorting the query results by relevance and outputting them.
10. A knowledge management system for inoculated knowledge, based on the knowledge management method for inoculated knowledge according to any one of claims 1-9, characterized by comprising:
a knowledge acquisition module (100): for collecting inoculated knowledge through multiple channels;
a knowledge preprocessing module (200): for sorting and cleaning the acquired inoculated knowledge;
a knowledge mining module (300): for mining the preprocessed inoculated knowledge based on the knowledge mining algorithm;
an index creation module (400): for creating a search index over the inoculated knowledge set for users to query.
CN202310555201.7A 2023-05-16 2023-05-16 Knowledge management method and system for inoculated knowledge Active CN116578611B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310555201.7A CN116578611B (en) 2023-05-16 2023-05-16 Knowledge management method and system for inoculated knowledge

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310555201.7A CN116578611B (en) 2023-05-16 2023-05-16 Knowledge management method and system for inoculated knowledge

Publications (2)

Publication Number Publication Date
CN116578611A true CN116578611A (en) 2023-08-11
CN116578611B CN116578611B (en) 2023-11-03

Family

ID=87537276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310555201.7A Active CN116578611B (en) 2023-05-16 2023-05-16 Knowledge management method and system for inoculated knowledge

Country Status (1)

Country Link
CN (1) CN116578611B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050177351A1 (en) * 2004-02-09 2005-08-11 The Board Of Trustees Of The University Of Illinois Methods and program products for optimizing problem clustering
CN101763529A (en) * 2010-01-14 2010-06-30 中山大学 Rough set attribute reduction method based on genetic algorithm
CN106096641A (en) * 2016-06-07 2016-11-09 南京邮电大学 A kind of multi-modal affective characteristics fusion method based on genetic algorithm
CN106529666A (en) * 2016-11-17 2017-03-22 衢州学院 Difference evolution algorithm for controlling parameter adaptive and strategy adaptive
CN109902740A (en) * 2019-02-27 2019-06-18 浙江理工大学 It is a kind of based on more algorithm fusions it is parallel learn Industry Control intrusion detection method again
CN110365648A (en) * 2019-06-14 2019-10-22 东南大学 A kind of vehicle-mounted CAN bus method for detecting abnormality based on decision tree
US20190377870A1 (en) * 2018-06-11 2019-12-12 Palo Alto Research Center Incorporated System and method for remotely detecting an anomaly
CN111813669A (en) * 2020-07-04 2020-10-23 毛澄映 Adaptive random test case generation method based on multi-target group intelligence
CN114185800A (en) * 2021-12-16 2022-03-15 中国电信股份有限公司 Test case generation method, system, equipment and medium based on genetic algorithm
CN114373467A (en) * 2022-01-11 2022-04-19 山东大学 Antagonistic audio sample generation method based on three-group parallel genetic algorithm
CN114758023A (en) * 2022-03-30 2022-07-15 桂林电子科技大学 Stomach electrical impedance tomography method of self-adaptive genetic algorithm
CN115248592A (en) * 2022-01-10 2022-10-28 齐齐哈尔大学 Multi-robot autonomous exploration method and system based on improved rapid exploration random tree

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GEORGE D. SMITH: "Evolutionary feature construction using information Gain and Gini index", 《GENETIC PROGRAMMING》, pages 379 - 388 *
刘鹏莎: "基于遗传规划算法的高维数据特征选择与特征构造方法研究", 《中国优秀硕士论文全文数据库 医药卫生科技辑 》, pages 072 - 24 *
马捷等: "基于CART算法的医疗隐性知识挖掘研究——以中医医案为例", 《情报科学》, vol. 39, no. 6, pages 84 - 91 *

Also Published As

Publication number Publication date
CN116578611B (en) 2023-11-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant