CN116578611A - Knowledge management method and system for inoculated knowledge
- Publication number
- CN116578611A (application number CN202310555201.7A)
- Authority
- CN
- China
- Prior art keywords
- knowledge
- inoculated
- population
- knowledge data
- individuals
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H40/00—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
- G16H40/20—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms
Abstract
The invention relates to the technical field of data mining, and in particular to a knowledge management method and system for inoculated knowledge (knowledge about pregnancy and child-rearing), comprising the following steps: acquiring inoculated knowledge data through multiple channels; preprocessing the acquired inoculated knowledge data; mining the preprocessed inoculated knowledge data based on a knowledge mining algorithm; and creating a search index over the inoculated knowledge set obtained through mining so that users can query it. By collecting and mining inoculated knowledge data, the invention obtains the connections between different data, derives the implicit rules contained in the inoculated knowledge and offers them to the user as a reference, makes the knowledge more scientific, and improves the user's understanding of it. A decision tree is also generated, which facilitates the classification of inoculated knowledge data and user queries, improves the quality of prenatal and postnatal care, and reduces the probability of maternal and infant diseases.
Description
Technical Field
The invention relates to the technical field of data mining, in particular to a knowledge management method and system for inoculated knowledge.
Background
At present, about 17 million babies are born in China each year, and the number of people who are pregnant in a given year is larger still. Helping more people master correct inoculated knowledge and methods, providing personalized guidance according to each person's environment and individual differences, and continuously improving the quality of prenatal and postnatal care is therefore a problem that every family, and society as a whole, needs to keep solving.
In the prior art, inoculated knowledge is organized and optimized on the basis of correlation using machine learning, but the implicit knowledge it contains is not mined. As a result, no reference to that implicit knowledge can be provided to the user, and the user's understanding of inoculated knowledge cannot be improved.
Disclosure of Invention
The invention aims to remedy the defects described in the background by providing a knowledge management method and system for inoculated knowledge.
The technical scheme adopted by the invention is as follows:
The knowledge management method for inoculated knowledge comprises the following steps:
S1: acquiring inoculated knowledge data through multiple channels;
S2: preprocessing the acquired inoculated knowledge data;
S3: mining the preprocessed inoculated knowledge data based on a knowledge mining algorithm;
S4: creating a search index over the inoculated knowledge set obtained through mining so that users can query it.
As a preferred technical scheme of the invention: the multi-channel acquisition of inoculated knowledge data in S1 specifically comprises acquiring the data through computer terminals, mobile device terminals, and manual collection.
As a preferred technical scheme of the invention: in S2, the acquired inoculated knowledge data are sorted and cleaned; after word segmentation, stop-word removal, punctuation removal, and case normalization, the data form an inoculated knowledge data set consisting of a series of words.
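The preprocessing in S2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `preprocess`, the tiny `STOP_WORDS` list, and the whitespace split that stands in for a real word segmenter are all hypothetical.

```python
import string

# Hypothetical stop-word list; a real system would load a fuller, language-specific one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "is", "in", "for"}

def preprocess(text: str) -> list[str]:
    """Lowercase, strip punctuation, split into words, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()  # whitespace split stands in for a real word segmenter
    return [t for t in tokens if t not in STOP_WORDS]

words = preprocess("The Newborn sleeps, and the mother rests!")
print(words)
```

For Chinese-language sources, the whitespace split would need to be replaced by a dedicated word segmentation step.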
As a preferred technical scheme of the invention: the knowledge mining algorithm mines the inoculated knowledge data by combining the CART algorithm with an improved genetic algorithm.
As a preferred technical scheme of the invention: the CART algorithm is specifically as follows:
Let $X_1, X_2, \ldots, X_N$ denote the N attributes contained in a sample of inoculated knowledge data, and let Y denote the category to which the attributes belong; each attribute has a fixed output value. Select the j-th attribute $X_j$ of the inoculated knowledge data set and a value z of attribute $X_j$ as the split point of the regression tree, dividing the data set into two regions, $X_1(j,z)=\{x \mid x^{(j)} \le z\}$ and $X_2(j,z)=\{x \mid x^{(j)} > z\}$. The optimal split point for $X_j$ is the one that minimizes the squared error:
$$\min_{j,z}\left[\min_{c_1}\sum_{x_1\in X_1(j,z)}(y_1-c_1)^2+\min_{c_2}\sum_{x_2\in X_2(j,z)}(y_2-c_2)^2\right]$$
where $y_1$ is the attribute category of the inoculated knowledge data $x_1$ in region $X_1$, $c_1$ is the output value of $x_1$ in region $X_1$, $y_2$ is the attribute category of the inoculated knowledge data $x_2$ in region $X_2$, and $c_2$ is the output value of $x_2$ in region $X_2$.
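The search for a split point on a single attribute can be sketched as below. This is an illustrative least-squares split under assumptions: the function name `best_split` and the toy data are hypothetical, and the region outputs $c_1$ and $c_2$ are taken to be the mean outputs of each region, which is the choice that minimizes squared error.

```python
def best_split(values, targets):
    """Return (z, error) for the threshold z on one attribute that minimises squared error."""
    best_z, best_err = None, float("inf")
    for z in sorted(set(values))[:-1]:  # the largest value would leave region 2 empty
        left = [y for x, y in zip(values, targets) if x <= z]
        right = [y for x, y in zip(values, targets) if x > z]
        c1 = sum(left) / len(left)      # mean output minimises squared error in region 1
        c2 = sum(right) / len(right)    # mean output minimises squared error in region 2
        err = sum((y - c1) ** 2 for y in left) + sum((y - c2) ** 2 for y in right)
        if err < best_err:
            best_z, best_err = z, err
    return best_z, best_err

z, err = best_split([1, 2, 3, 10, 11, 12], [1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
print(z, err)
```

Repeating this search over every attribute j and keeping the pair (j, z) with the lowest error gives the outer minimization.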
The N-dimensional space of the inoculated knowledge data set is recursively divided into mutually non-overlapping rectangles. The optimal feature attribute is selected by computing the Gini index, which also determines the optimal binary split point of that attribute.
Let the probability that a point of the inoculated knowledge data set D belongs to the k-th class be $p_k$. The Gini index G(p) of the probability distribution is:
$$G(p)=\sum_{k}p_k(1-p_k)=1-\sum_{k}p_k^2$$
Suppose the inoculated knowledge data set D is divided into two regions $D_1$ and $D_2$ according to whether attribute F takes a given value f, with $D_1=\{(x,y)\in D \mid F(x)=f\}$ and $D_2=D-D_1$, where x represents inoculated knowledge data, y represents the attribute category of x, and $F(x)=f$ means that attribute F takes the value f on x. Given attribute F, the Gini index G(D, F) of the data set D is:
$$G(D,F)=\frac{|D_1|}{|D|}G(D_1)+\frac{|D_2|}{|D|}G(D_2)$$
where $G(D_1)$ is the Gini index of the sub-set $D_1$, $G(D_2)$ is the Gini index of the sub-set $D_2$, and $|D|$, $|D_1|$, $|D_2|$ are the numbers of samples in each set.
A decision tree is generated using the information gain of the inoculated knowledge data as the fitness function. The fitness function Gain(F) is specifically as follows:
where $p_i$ is the probability that a point in the inoculated knowledge data set belongs to the i-th class, and $D_j$ is the sub-set of inoculated knowledge data obtained by dividing the data set according to the j-th attribute.
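A standard form of information gain that is consistent with the symbols $p_i$ and $D_j$ can be written as follows. The entropy notation $H$ is introduced here for illustration and is an assumption rather than notation taken from the source.

```latex
H(D) = -\sum_{i} p_i \log_2 p_i, \qquad
\mathrm{Gain}(F) = H(D) - \sum_{j} \frac{|D_j|}{|D|}\, H(D_j)
```

Here $|D_j|$ and $|D|$ are the sizes of the sub-set $D_j$ and of the data set $D$, and $H(D_j)$ is the same entropy computed on the sub-set.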
As a preferred technical scheme of the invention: the decision tree generation is specifically as follows:
Encode the inoculated knowledge data set to generate an initial population. Using the information gain of the inoculated knowledge data as the fitness function, perform knowledge mining and attribute division on the data and calculate the fitness of each individual in the population. The selected inoculated knowledge data form the dominant population and the unselected data form the auxiliary population. Apply the adaptive crossover and adaptive mutation operations to the individuals of the dominant population, then judge whether the convergence condition is met. If it is met, the procedure ends. If it is not, apply adaptive crossover and adaptive mutation to the individuals of the auxiliary population, increase the iteration count, and return to the fitness calculation, repeating the subsequent steps until convergence, at which point the individuals corresponding to the inoculated knowledge data have been divided into classes according to their attributes.
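The control flow of this two-population loop can be sketched as below. It is a simplified illustration under several assumptions: individuals are bit lists, fitness is supplied by the caller, the top half by fitness is treated as the dominant population, the crossover and mutation probabilities are fixed constants instead of adaptive ones, and convergence means reaching a target fitness. All names are hypothetical.

```python
import random

def crossover_and_mutate(group, p_cross, p_mut, rng):
    """One-point crossover on consecutive pairs, then bit-flip mutation."""
    group = [ind[:] for ind in group]            # work on copies
    for a, b in zip(group[::2], group[1::2]):
        if rng.random() < p_cross:
            cut = rng.randrange(1, len(a))
            a[cut:], b[cut:] = b[cut:], a[cut:]  # swap tails in place
    for ind in group:
        for i in range(len(ind)):
            if rng.random() < p_mut:
                ind[i] = 1 - ind[i]
    return group

def evolve(population, fitness, target, max_iter=100, seed=0):
    """Return (final population, iterations used)."""
    rng = random.Random(seed)
    for iteration in range(max_iter):
        ranked = sorted(population, key=fitness, reverse=True)
        half = len(ranked) // 2
        dominant, auxiliary = ranked[:half], ranked[half:]
        dominant = crossover_and_mutate(dominant, 0.8, 0.02, rng)
        if max(fitness(ind) for ind in dominant) >= target:   # convergence check
            return dominant + auxiliary, iteration
        auxiliary = crossover_and_mutate(auxiliary, 0.9, 0.10, rng)
        population = dominant + auxiliary
    return population, max_iter

init_rng = random.Random(1)
start = [[init_rng.randint(0, 1) for _ in range(8)] for _ in range(10)]
final, iterations = evolve(start, fitness=sum, target=8, max_iter=30)
print(iterations, max(sum(ind) for ind in final))
```

In the method described above, the fitness would be the information gain of the attribute division encoded by each individual, not the bit count used in this toy run.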
As a preferred technical scheme of the invention: the adaptive crossover operation in S3 is specifically as follows:
where $p_a$ is the crossover probability of the dominant population, and $p_{a1}$ and $p_{a2}$ are its minimum and maximum crossover probabilities; $p_b$ is the crossover probability of the auxiliary population, and $p_{b1}$ and $p_{b2}$ are its minimum and maximum crossover probabilities; $f_{ai}$ and $f_{bi}$ are the fitness of an individual of the dominant population and of the auxiliary population, respectively; $f_{a,\max}$ and $f_{b,\max}$ are the corresponding maximum individual fitness values; and the remaining two terms are the average individual fitness of the dominant population and of the auxiliary population, respectively.
The adaptive mutation operation in S3 is specifically as follows:
where $p_A$ is the mutation probability of the dominant population, and $p_{A1}$ and $p_{A2}$ are its minimum and maximum mutation probabilities; $p_B$ is the mutation probability of the auxiliary population, and $p_{B1}$ and $p_{B2}$ are its minimum and maximum mutation probabilities; $f_{Ai}$ and $f_{Bi}$ are the fitness of an individual of the dominant population and of the auxiliary population, respectively; $f_{A,\max}$ and $f_{B,\max}$ are the corresponding maximum individual fitness values; and the remaining two terms are the average individual fitness of the dominant population and of the auxiliary population, respectively.
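The exact adaptive formulas are not reproduced in this text. As an illustration only, the sketch below uses one commonly used adaptive scheme, which is an assumption and may differ from the patented formulas: individuals at or above the average fitness receive a probability that falls linearly from the maximum to the minimum as their fitness approaches the population maximum, and below-average individuals receive the maximum probability. The function and parameter names are hypothetical.

```python
def adaptive_probability(f_i, f_avg, f_max, p_min, p_max):
    """Assumed adaptive rule: fitter individuals are disturbed less often."""
    if f_i < f_avg or f_max == f_avg:
        return p_max  # below-average individual, or a population with no fitness spread
    return p_max - (p_max - p_min) * (f_i - f_avg) / (f_max - f_avg)

# The same rule could serve crossover and mutation, and both populations,
# by passing the respective bounds (for example p_a1 and p_a2 as p_min and p_max).
print(adaptive_probability(10.0, 5.0, 10.0, 0.4, 0.9))  # best individual
print(adaptive_probability(3.0, 5.0, 10.0, 0.4, 0.9))   # below-average individual
```

Under this assumed rule, good individuals are preserved while weak ones are recombined and mutated more aggressively.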
As a preferred technical scheme of the invention: after the decision tree is generated, let the number of leaf nodes of the decision tree be |L| and define the loss function δ(t) as:
δ(t)=σ(t)+τ|L|
where σ(t) is the prediction error of leaf node t, τ is a model parameter, and |L| represents the complexity of the model.
The empirical entropy of each leaf node is calculated, and the nodes are traversed recursively upward from the leaves. For each leaf node, it is judged whether the value of the loss function decreases after the node is deleted; if it does, the parent node becomes a new leaf node. The traversal continues until all nodes have been judged.
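The pruning rule can be sketched as a bottom-up pass over the tree. This is a minimal illustration under assumptions: the tree is a nested dictionary, each node stores a hypothetical `error` value (its prediction error if it were a leaf), and the loss of a subtree is the summed leaf error plus τ times the number of leaves.

```python
def leaves_and_error(node):
    """Return (number of leaves, summed leaf error) of the subtree rooted at node."""
    if not node["children"]:
        return 1, node["error"]
    parts = [leaves_and_error(c) for c in node["children"]]
    return sum(n for n, _ in parts), sum(e for _, e in parts)

def prune(node, tau):
    """Collapse a node into a leaf whenever that lowers error + tau * leaves."""
    for child in node["children"]:
        prune(child, tau)                      # children first: bottom-up traversal
    if node["children"]:
        n_leaves, err = leaves_and_error(node)
        if node["error"] + tau < err + tau * n_leaves:
            node["children"] = []              # the parent becomes the new leaf
    return node

tree = {"error": 10.0, "children": [
    {"error": 4.0, "children": [
        {"error": 2.0, "children": []},
        {"error": 1.9, "children": []}]},
    {"error": 1.0, "children": []}]}
prune(tree, tau=0.5)
print(tree)
```

With τ = 0.5, the inner node is collapsed (loss 4.5 as a leaf against 4.9 as a subtree), while the root is kept because collapsing it would raise the loss.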
As a preferred technical scheme of the invention: the search index in S4 is created through the following steps:
S4.1: generating an inoculated knowledge set from the mining result of the knowledge mining algorithm;
S4.2: creating a query statement;
S4.3: performing lexical analysis on the query statement to form a series of words;
S4.4: searching the index using the decision tree to obtain the indexed knowledge related to the query statement, and applying intersection, difference, and union operations to that knowledge to obtain a result knowledge set;
S4.5: calculating the relevance between the result knowledge of S4.4 and the query statement;
S4.6: sorting the query results by relevance and outputting them.
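The set operations in S4.4 can be illustrated with a plain inverted index. Note the substitution: the method above searches the index through the decision tree, whereas this sketch uses a simple word-to-items dictionary to keep the example short. The function names and the sample knowledge items are hypothetical.

```python
from collections import defaultdict

def build_index(knowledge):
    """Map each word to the set of knowledge-item ids that contain it."""
    index = defaultdict(set)
    for item_id, words in knowledge.items():
        for w in words:
            index[w].add(item_id)
    return index

def lookup(index, must, must_not=()):
    """Intersect the postings of `must` words, then subtract those of `must_not`."""
    sets = [index.get(w, set()) for w in must]
    result = set.intersection(*sets) if sets else set()
    for w in must_not:
        result = result - index.get(w, set())
    return result

knowledge = {1: ["newborn", "sleep"], 2: ["newborn", "feeding"], 3: ["prenatal", "diet"]}
index = build_index(knowledge)
newborn_not_sleep = lookup(index, must=["newborn"], must_not=["sleep"])  # intersection, then difference
either = index["sleep"] | index["diet"]                                   # union
print(newborn_not_sleep, either)
```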
A knowledge management system for inoculated knowledge is also provided, comprising:
a knowledge acquisition module, for collecting inoculated knowledge through multiple channels;
a knowledge preprocessing module, for sorting and cleaning the acquired inoculated knowledge;
a knowledge mining module, for mining the preprocessed inoculated knowledge based on a knowledge mining algorithm;
an index creation module, for creating a search index over the inoculated knowledge set so that users can query it.
Compared with the prior art, the knowledge management method and system for inoculated knowledge provided by the invention have the following beneficial effects:
By collecting and mining inoculated knowledge data, the invention obtains the connections between different data, derives the implicit rules contained in the inoculated knowledge and offers them to the user as a reference, makes the knowledge more scientific, and improves the user's understanding of it. A decision tree is also generated, which facilitates the classification of inoculated knowledge data and user queries, improves the quality of prenatal and postnatal care, and reduces the probability of maternal and infant diseases.
Drawings
FIG. 1 is a flow chart of the method in a preferred embodiment of the present invention;
FIG. 2 is a block diagram of the system in a preferred embodiment of the present invention.
The reference numerals in the figures denote: 100, knowledge acquisition module; 200, knowledge preprocessing module; 300, knowledge mining module; 400, index creation module.
Detailed Description
It should be noted that, provided there is no conflict, the embodiments of the present invention and the features in the embodiments may be combined with each other. The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.
Referring to FIG. 1, a preferred embodiment of the present invention provides a knowledge management method for inoculated knowledge, comprising the following steps:
S1: acquiring inoculated knowledge data through multiple channels;
S2: preprocessing the acquired inoculated knowledge data;
S3: mining the preprocessed inoculated knowledge data based on a knowledge mining algorithm;
S4: creating a search index over the inoculated knowledge set obtained through mining so that users can query it.
The multi-channel acquisition of inoculated knowledge data in S1 specifically comprises acquiring the data through computer terminals, mobile device terminals, and manual collection.
In S2, the acquired inoculated knowledge data are sorted and cleaned; after word segmentation, stop-word removal, punctuation removal, and case normalization, the data form an inoculated knowledge data set consisting of a series of words.
The knowledge mining algorithm mines the inoculated knowledge data by combining the CART algorithm with an improved genetic algorithm.
The CART algorithm is specifically as follows:
Let $X_1, X_2, \ldots, X_N$ denote the N attributes contained in a sample of inoculated knowledge data, and let Y denote the category to which the attributes belong; each attribute has a fixed output value. Select the j-th attribute $X_j$ of the inoculated knowledge data set and a value z of attribute $X_j$ as the split point of the regression tree, dividing the data set into two regions, $X_1(j,z)=\{x \mid x^{(j)} \le z\}$ and $X_2(j,z)=\{x \mid x^{(j)} > z\}$. The optimal split point for $X_j$ is the one that minimizes the squared error:
$$\min_{j,z}\left[\min_{c_1}\sum_{x_1\in X_1(j,z)}(y_1-c_1)^2+\min_{c_2}\sum_{x_2\in X_2(j,z)}(y_2-c_2)^2\right]$$
where $y_1$ is the attribute category of the inoculated knowledge data $x_1$ in region $X_1$, $c_1$ is the output value of $x_1$ in region $X_1$, $y_2$ is the attribute category of the inoculated knowledge data $x_2$ in region $X_2$, and $c_2$ is the output value of $x_2$ in region $X_2$.
The N-dimensional space of the inoculated knowledge data set is recursively divided into mutually non-overlapping rectangles. The optimal feature attribute is selected by computing the Gini index, which also determines the optimal binary split point of that attribute.
Let the probability that a point of the inoculated knowledge data set D belongs to the k-th class be $p_k$. The Gini index G(p) of the probability distribution is:
$$G(p)=\sum_{k}p_k(1-p_k)=1-\sum_{k}p_k^2$$
Suppose the inoculated knowledge data set D is divided into two regions $D_1$ and $D_2$ according to whether attribute F takes a given value f, with $D_1=\{(x,y)\in D \mid F(x)=f\}$ and $D_2=D-D_1$, where x represents inoculated knowledge data, y represents the attribute category of x, and $F(x)=f$ means that attribute F takes the value f on x. Given attribute F, the Gini index G(D, F) of the data set D is:
$$G(D,F)=\frac{|D_1|}{|D|}G(D_1)+\frac{|D_2|}{|D|}G(D_2)$$
where $G(D_1)$ is the Gini index of the sub-set $D_1$, $G(D_2)$ is the Gini index of the sub-set $D_2$, and $|D|$, $|D_1|$, $|D_2|$ are the numbers of samples in each set.
A decision tree is generated using the information gain of the inoculated knowledge data as the fitness function. The fitness function Gain(F) is specifically as follows:
where $p_i$ is the probability that a point in the inoculated knowledge data set belongs to the i-th class, and $D_j$ is the sub-set of inoculated knowledge data obtained by dividing the data set according to the j-th attribute.
The decision tree generation is specifically as follows:
Encode the inoculated knowledge data set to generate an initial population. Using the information gain of the inoculated knowledge data as the fitness function, perform knowledge mining and attribute division on the data and calculate the fitness of each individual in the population. The selected inoculated knowledge data form the dominant population and the unselected data form the auxiliary population. Apply the adaptive crossover and adaptive mutation operations to the individuals of the dominant population, then judge whether the convergence condition is met. If it is met, the procedure ends. If it is not, apply adaptive crossover and adaptive mutation to the individuals of the auxiliary population, increase the iteration count, and return to the fitness calculation, repeating the subsequent steps until convergence, at which point the individuals corresponding to the inoculated knowledge data have been divided into classes according to their attributes.
When used to select split attributes, the information gain Gain(F) can be biased towards attributes with more values, because such attributes tend to produce more branches and therefore a lower information entropy. This does not mean that an attribute with more values is necessarily the best split attribute. To solve this problem, the present embodiment uses the information gain ratio instead.
The information gain ratio introduces the intrinsic value of an attribute as a normalization factor in the calculation, which reduces the weight of attributes with more values. It is defined as follows:
$$\mathrm{GainRatio}(F)=\frac{\mathrm{Gain}(F)}{\mathrm{IV}(F)},\qquad \mathrm{IV}(F)=-\sum_{j=1}^{m}\frac{|D_j|}{|D|}\log_2\frac{|D_j|}{|D|}$$
where m is the number of values of attribute F, and $|D_j|$ and $|D|$ are the sizes of the sub-set $D_j$ and of the whole data set D, respectively.
Using the information gain ratio as the fitness function overcomes the drawbacks of the information gain to some extent, so that better split attributes are selected when the decision tree is generated.
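The gain ratio can be sketched in a few lines, assuming the standard definition (information gain divided by the intrinsic value of the attribute). The function names and sample data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Information gain of an attribute divided by its intrinsic value."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)      # one sub-set D_j per attribute value
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    intrinsic = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / intrinsic if intrinsic > 0 else 0.0

print(gain_ratio(["a", "a", "b", "b"], [0, 0, 1, 1]))  # attribute separates the classes
print(gain_ratio(["a", "a", "a", "a"], [0, 0, 1, 1]))  # constant attribute carries no information
```

An attribute with many distinct values has a large intrinsic value, so its gain is scaled down, which is the normalization effect described above.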
The adaptive crossover operation in S3 is specifically as follows:
where $p_a$ is the crossover probability of the dominant population, and $p_{a1}$ and $p_{a2}$ are its minimum and maximum crossover probabilities; $p_b$ is the crossover probability of the auxiliary population, and $p_{b1}$ and $p_{b2}$ are its minimum and maximum crossover probabilities; $f_{ai}$ and $f_{bi}$ are the fitness of an individual of the dominant population and of the auxiliary population, respectively; $f_{a,\max}$ and $f_{b,\max}$ are the corresponding maximum individual fitness values; and the remaining two terms are the average individual fitness of the dominant population and of the auxiliary population, respectively.
The adaptive mutation operation in S3 is specifically as follows:
where $p_A$ is the mutation probability of the dominant population, and $p_{A1}$ and $p_{A2}$ are its minimum and maximum mutation probabilities; $p_B$ is the mutation probability of the auxiliary population, and $p_{B1}$ and $p_{B2}$ are its minimum and maximum mutation probabilities; $f_{Ai}$ and $f_{Bi}$ are the fitness of an individual of the dominant population and of the auxiliary population, respectively; $f_{A,\max}$ and $f_{B,\max}$ are the corresponding maximum individual fitness values; and the remaining two terms are the average individual fitness of the dominant population and of the auxiliary population, respectively.
After the decision tree is generated, let the number of leaf nodes of the decision tree be |L| and define the loss function δ(t) as:
δ(t)=σ(t)+τ|L|
where σ(t) is the prediction error of leaf node t, τ is a model parameter, and |L| represents the complexity of the model.
The empirical entropy of each leaf node is calculated, and the nodes are traversed recursively upward from the leaves. For each leaf node, it is judged whether the value of the loss function decreases after the node is deleted; if it does, the parent node becomes a new leaf node. The traversal continues until all nodes have been judged.
The search index in S4 is created through the following steps:
S4.1: generating an inoculated knowledge set from the mining result of the knowledge mining algorithm;
S4.2: creating a query statement;
S4.3: performing lexical analysis on the query statement to form a series of words;
S4.4: searching the index using the decision tree to obtain the indexed knowledge related to the query statement, and applying intersection, difference, and union operations to that knowledge to obtain a result knowledge set;
S4.5: calculating the relevance between the result knowledge of S4.4 and the query statement;
S4.6: sorting the query results by relevance and outputting them.
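Steps S4.5 and S4.6 can be illustrated with a simple relevance score. The source does not say how relevance is computed, so the Jaccard word overlap used here is an assumption chosen for brevity; the function name `rank` and the sample items are hypothetical.

```python
def rank(query_words, candidates):
    """Score each candidate by Jaccard overlap with the query, highest first."""
    q = set(query_words)
    scored = []
    for item_id, words in candidates.items():
        w = set(words)
        union = q | w
        score = len(q & w) / len(union) if union else 0.0
        scored.append((score, item_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by item id
    return [(item_id, score) for score, item_id in scored]

candidates = {1: ["newborn", "sleep"], 2: ["newborn", "feeding"], 3: ["prenatal", "diet"]}
results = rank(["newborn", "sleep"], candidates)
print(results)
```

Other relevance measures, such as term-frequency weighting, could replace the overlap score without changing the sort-and-output step.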
Referring to FIG. 2, a knowledge management system for inoculated knowledge is provided, comprising:
knowledge acquisition module 100: for collecting inoculated knowledge through multiple channels;
knowledge preprocessing module 200: for sorting and cleaning the acquired inoculated knowledge;
knowledge mining module 300: for mining the preprocessed inoculated knowledge based on a knowledge mining algorithm;
index creation module 400: for creating a search index over the inoculated knowledge set so that users can query it.
In this embodiment, the knowledge acquisition module 100 acquires inoculated knowledge data through computer terminals such as web pages and apps, through mobile device terminals such as mobile phones and smart bracelets, and through manual collection, such as clothes; the inoculated knowledge also includes maternal and infant knowledge. The knowledge preprocessing module 200 sorts and cleans the acquired data, applying word segmentation, stop-word removal, punctuation removal, and case normalization to form an inoculated knowledge data set consisting of a series of words.
The knowledge mining module 300 performs the following inoculated knowledge mining operations. Let $X_1, X_2, \ldots, X_{50}$ denote the 50 attributes contained in a sample of inoculated knowledge data. Select the 25th attribute $X_{25}$ of the inoculated knowledge data set and a value z of attribute $X_{25}$ as the split point of the regression tree, dividing the data set into two regions, $X_1(25,z)=\{x \mid x^{(25)} \le z\}$ and $X_2(25,z)=\{x \mid x^{(25)} > z\}$. The optimal split point for $X_{25}$ is the one that minimizes the squared error:
$$\min_{z}\left[\min_{c_1}\sum_{x_1\in X_1(25,z)}(y_1-c_1)^2+\min_{c_2}\sum_{x_2\in X_2(25,z)}(y_2-c_2)^2\right]$$
where $y_1$ is the attribute category of the inoculated knowledge data $x_1$ in region $X_1$, $c_1$ is the output value of $x_1$ in region $X_1$, $y_2$ is the attribute category of the inoculated knowledge data $x_2$ in region $X_2$, and $c_2$ is the output value of $x_2$ in region $X_2$.
The 50-dimensional space of the inoculated knowledge data set is recursively divided into mutually non-overlapping rectangles. The optimal feature attribute is selected by computing the Gini index, which also determines the optimal binary split point of that attribute.
Let the probability that a point of the inoculated knowledge data set D belongs to the k-th class be $p_k$. The Gini index G(p) of the probability distribution is:
$$G(p)=\sum_{k}p_k(1-p_k)=1-\sum_{k}p_k^2$$
Suppose the inoculated knowledge data set D is divided into two regions $D_1$ and $D_2$ according to whether attribute F takes a given value f, with $D_1=\{(x,y)\in D \mid F(x)=f\}$ and $D_2=D-D_1$, where x represents inoculated knowledge data, y represents the attribute category of x, and $F(x)=f$ means that attribute F takes the value f on x. Given attribute F, the Gini index G(D, F) of the data set D is:
$$G(D,F)=\frac{|D_1|}{|D|}G(D_1)+\frac{|D_2|}{|D|}G(D_2)$$
where $G(D_1)$ is the Gini index of the sub-set $D_1$, $G(D_2)$ is the Gini index of the sub-set $D_2$, and $|D|$, $|D_1|$, $|D_2|$ are the numbers of samples in each set.
A decision tree is generated using the information gain of the inoculated knowledge data as the fitness function. The fitness function Gain(F) is specifically as follows:
where $p_i$ is the probability that a point in the inoculated knowledge data set belongs to the i-th class, and $D_{25}$ is the sub-set of inoculated knowledge data obtained by dividing the data set according to the 25th attribute.
The inoculated knowledge data set is encoded to generate an initial population. With the information gain of the inoculated knowledge data as the fitness function, the data are divided by attribute and the fitness of each individual in the population is calculated. The selected inoculated knowledge data form the dominant population and the unselected data form the auxiliary population. An adaptive crossover operation is performed on the individuals of the dominant population:
where $p_a$ denotes the crossover probability of the dominant population, and $p_{a1}$ and $p_{a2}$ denote the minimum and maximum crossover probabilities of the dominant population, respectively; $p_b$ denotes the crossover probability of the auxiliary population, and $p_{b1}$ and $p_{b2}$ denote the minimum and maximum crossover probabilities of the auxiliary population, respectively; $f_{ai}$ and $f_{bi}$ denote the fitness of individuals of the dominant population and of the auxiliary population, respectively; $f_{a,\max}$ and $f_{b,\max}$ denote the maximum fitness of individuals of the dominant population and of the auxiliary population, respectively; and $\bar{f}_a$ and $\bar{f}_b$ denote the average fitness of individuals of the dominant population and of the auxiliary population, respectively;
and an adaptive mutation operation:
where $p_A$ denotes the mutation probability of the dominant population, and $p_{A1}$ and $p_{A2}$ denote the minimum and maximum mutation probabilities of the dominant population, respectively; $p_B$ denotes the mutation probability of the auxiliary population, and $p_{B1}$ and $p_{B2}$ denote the minimum and maximum mutation probabilities of the auxiliary population, respectively; $f_{Ai}$ and $f_{Bi}$ denote the fitness of individuals of the dominant population and of the auxiliary population, respectively; $f_{A,\max}$ and $f_{B,\max}$ denote the maximum fitness of individuals of the dominant population and of the auxiliary population, respectively; and $\bar{f}_A$ and $\bar{f}_B$ denote the average fitness of individuals of the dominant population and of the auxiliary population, respectively.
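To make the role of the minimum probability, maximum probability, individual fitness, maximum fitness, and average fitness concrete, the sketch below assumes one commonly used form of adaptive probability. Under this assumed rule, an individual at or below the population average keeps the maximum probability, and the probability falls linearly to the minimum as fitness approaches the population maximum. This rule and the function names are illustrative assumptions and are not necessarily the formula of this embodiment. The same function would serve for crossover and for mutation, and for the dominant and the auxiliary population, with different bounds.

```python
def adaptive_probability(fitness, f_avg, f_max, p_min, p_max):
    """Assumed linear adaptive rule: fitter individuals are disturbed less.

    fitness: fitness of the individual
    f_avg, f_max: average and maximum fitness of its population
    p_min, p_max: lower and upper bounds of the probability
    """
    if fitness < f_avg or f_max == f_avg:
        # Below-average individuals (or a fully uniform population) use p_max.
        return p_max
    return p_max - (p_max - p_min) * (fitness - f_avg) / (f_max - f_avg)


def population_probabilities(fitnesses, p_min, p_max):
    """Apply the assumed adaptive rule to every individual of one population."""
    f_avg = sum(fitnesses) / len(fitnesses)
    f_max = max(fitnesses)
    return [adaptive_probability(f, f_avg, f_max, p_min, p_max) for f in fitnesses]
```

A rule of this shape protects the best individuals from disruption while keeping weaker ones exploring, which is the usual motivation for making the probabilities depend on fitness.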
Crossover and mutation are performed on the inoculated knowledge data to mine inoculated knowledge. After mining, it is judged whether the convergence condition is met. If it is, the process ends; if not, adaptive crossover and adaptive mutation are performed on the individuals of the auxiliary population, the iteration count is incremented, and the process returns to recalculating individual fitness and executes the subsequent steps until convergence, that is, until the population individuals corresponding to the inoculated knowledge data are divided into the same class according to their attributes.
A decision tree is generated by treating inoculated knowledge data with the same attribute as one class. After the decision tree is generated, the number of its leaf nodes is set to 500, and the loss function $\delta(150)$ of the 150th leaf node is defined as:
$$\delta(150) = \sigma(150) + \tau|L|$$
where $\sigma(150)$ denotes the prediction error of the 150th leaf node, $\tau$ denotes the model parameter, and $|L|$ denotes the complexity of the model.
The empirical entropy of each leaf node is calculated, and each node is traversed recursively upward from the leaf nodes. At each step it is judged whether the value of the loss function decreases after a leaf node is deleted; if it does, the parent node becomes a new leaf node. The traversal continues until all nodes have been judged. This completes the decision tree, and an inoculated knowledge set is generated from it.
The index creation module 400 collects the user's query statement and performs lexical analysis on it to form a series of words; searches the index using the decision tree to obtain the indexed knowledge related to the query statement, and applies intersection, difference, and union operations to that knowledge to obtain a result knowledge set; calculates the relevance between the result knowledge and the query statement; and sorts the results by relevance, outputting and displaying them in descending order of relevance.
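The bottom-up pruning step with the loss $\delta(t) = \sigma(t) + \tau|L|$ can be sketched as follows. The sketch assumes that the prediction error of a leaf is its sample count times its empirical entropy and that $|L|$ is the number of leaves of the subtree being compared. It also collapses a subtree when the loss does not increase, rather than only when it strictly decreases. The `Node` class and all function names are hypothetical.

```python
from dataclasses import dataclass, field
from math import log2


@dataclass
class Node:
    class_counts: dict                      # class label -> number of samples at this node
    children: list = field(default_factory=list)

    def is_leaf(self):
        return not self.children


def leaf_error(node):
    """Assumed prediction error: sample count times empirical entropy."""
    n = sum(node.class_counts.values())
    return -sum(c * log2(c / n) for c in node.class_counts.values() if c)


def leaves(node):
    """Collect the leaf nodes of the subtree rooted at node."""
    if node.is_leaf():
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def loss(node, tau):
    """Loss of a subtree: summed leaf errors plus tau times the leaf count."""
    subtree_leaves = leaves(node)
    return sum(leaf_error(leaf) for leaf in subtree_leaves) + tau * len(subtree_leaves)


def prune(node, tau):
    """Prune children first, then collapse this node into a leaf if doing so
    does not increase the loss."""
    for child in node.children:
        prune(child, tau)
    if not node.is_leaf():
        collapsed = leaf_error(node) + tau  # loss if this node were a single leaf
        if collapsed <= loss(node, tau):
            node.children = []
    return node
```

A larger `tau` penalizes leaves more heavily and therefore yields a smaller tree; a `tau` of zero never prunes a split that reduces entropy.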
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. The specification is written this way for clarity only; those skilled in the art should take the specification as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments that those skilled in the art can understand.
Claims (10)
1. A knowledge management method for inoculated knowledge, characterized in that the method comprises the following steps:
S1: acquiring inoculated knowledge data through multiple channels;
S2: preprocessing the acquired inoculated knowledge data;
S3: mining the preprocessed inoculated knowledge data based on a knowledge mining algorithm;
S4: creating a search index for the inoculated knowledge set obtained by mining, for querying by a user.
2. The knowledge management method for inoculated knowledge according to claim 1, characterized in that: acquiring inoculated knowledge data through multiple channels in S1 specifically comprises acquiring inoculated knowledge data via a computer terminal, a mobile device terminal, and manual collection.
3. The knowledge management method for inoculated knowledge according to claim 1, characterized in that: in S2, the acquired inoculated knowledge data are sorted and cleaned, and an inoculated knowledge data set consisting of a series of words is formed after word segmentation, stop-word removal, punctuation removal, and conversion of uppercase letters to lowercase.
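A minimal sketch of the cleaning operations named in claim 3 is given below. It assumes whitespace-delimited text and a tiny hypothetical stop-word list; text in a language without word delimiters would need a dedicated word segmenter, which is not shown. The function name and the stop-word list are illustrative only.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "is", "and", "to", "for"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, strip punctuation, split into words, and drop stop words."""
    lowered = text.lower()                        # uppercase to lowercase
    no_punct = re.sub(r"[^\w\s]", " ", lowered)   # punctuation removal
    tokens = no_punct.split()                     # whitespace word segmentation
    return [t for t in tokens if t not in stop_words]
```

For example, `preprocess("Store the vaccine in a refrigerator, and check the label.")` yields the word series `store`, `vaccine`, `in`, `refrigerator`, `check`, `label`.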
4. The knowledge management method for inoculated knowledge according to claim 1, characterized in that: the knowledge mining algorithm mines the inoculated knowledge data based on a combination of the CART algorithm and an improved genetic algorithm.
5. The knowledge management method for inoculated knowledge according to claim 4, characterized in that the CART algorithm is specifically as follows:
let $X_1, X_2, \ldots, X_N$ denote the $N$ attributes contained in a sample of inoculated knowledge data, and let $Y$ denote the category to which an attribute belongs, each attribute having a fixed output value; the $j$-th attribute $X_j$ of the inoculated knowledge data set and a value $z$ of that attribute are selected as the split variable and split point of the regression tree, dividing the inoculated knowledge data set into two regions, $X_1(j, z) = \{x \mid x^{(j)} \le z\}$ and $X_2(j, z) = \{x \mid x^{(j)} > z\}$; the optimal split point of $X_j$ is the point that minimizes the squared error:
$$\min_{j,z}\left[\min_{c_1}\sum_{x_1 \in X_1(j,z)} (y_1 - c_1)^2 + \min_{c_2}\sum_{x_2 \in X_2(j,z)} (y_2 - c_2)^2\right]$$
where $y_1$ denotes the attribute category of inoculated knowledge data $x_1$ in region $X_1$, $c_1$ denotes the output value for $x_1$ in region $X_1$, $y_2$ denotes the attribute category of inoculated knowledge data $x_2$ in region $X_2$, and $c_2$ denotes the output value for $x_2$ in region $X_2$;
the $N$-dimensional space of the inoculated knowledge data set is recursively partitioned into non-overlapping rectangles, the optimal feature attribute is selected by calculating the Gini index, and the Gini index also determines the optimal binary split point of that attribute;
let the probability that a point of the inoculated knowledge data set $D$ belongs to the $k$-th of $K$ classes be $p_k$; the Gini index $G(p)$ of the probability distribution is:
$$G(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2$$
according to whether attribute $F$ takes a given value $f$, the inoculated knowledge data set $D$ is divided into two parts, $D_1 = \{(x, y) \in D \mid F(x) = f\}$ and $D_2 = D - D_1$, where $x$ denotes inoculated knowledge data, $y$ denotes the attribute category of $x$, and $F(x) = f$ means that attribute $F$ takes the value $f$ for $x$; given attribute $F$, the Gini index $G(D, F)$ of the inoculated knowledge data set $D$ is:
$$G(D, F) = \frac{|D_1|}{|D|} G(D_1) + \frac{|D_2|}{|D|} G(D_2)$$
where $G(D_1)$ denotes the Gini index of the inoculated knowledge data subset $D_1$, and $G(D_2)$ denotes the Gini index of the inoculated knowledge data subset $D_2$;
and a decision tree is generated with the information gain of the inoculated knowledge data as the fitness function, the fitness function Gain(F) being specifically:
where $p_i$ denotes the probability that a point in the inoculated knowledge data set belongs to the $i$-th class, and $D_j$ denotes the subset of inoculated knowledge data obtained by dividing the inoculated knowledge data set on the $j$-th attribute.
6. The knowledge management method for inoculated knowledge according to claim 5, characterized in that the decision tree generation is specifically as follows:
the inoculated knowledge data set is encoded to generate an initial population; with the information gain of the inoculated knowledge data as the fitness function, knowledge mining and attribute division are performed on the inoculated knowledge data and the fitness of each individual in the population is calculated; the selected inoculated knowledge data form the dominant population and the unselected inoculated knowledge data form the auxiliary population; an adaptive crossover operation and an adaptive mutation operation are performed on the individuals of the dominant population, and it is judged whether the convergence condition is met; if it is, the process ends; if not, adaptive crossover and adaptive mutation are performed on the individuals of the auxiliary population, the iteration count is incremented, and the process returns to recalculating individual fitness and executes the subsequent steps until convergence, that is, until the population individuals corresponding to the inoculated knowledge data are divided into the same class according to their attributes.
7. The knowledge management method for inoculated knowledge according to claim 6, characterized in that the adaptive crossover operation in S3 is specifically as follows:
where $p_a$ denotes the crossover probability of the dominant population, and $p_{a1}$ and $p_{a2}$ denote the minimum and maximum crossover probabilities of the dominant population, respectively; $p_b$ denotes the crossover probability of the auxiliary population, and $p_{b1}$ and $p_{b2}$ denote the minimum and maximum crossover probabilities of the auxiliary population, respectively; $f_{ai}$ and $f_{bi}$ denote the fitness of individuals of the dominant population and of the auxiliary population, respectively; $f_{a,\max}$ and $f_{b,\max}$ denote the maximum fitness of individuals of the dominant population and of the auxiliary population, respectively; and $\bar{f}_a$ and $\bar{f}_b$ denote the average fitness of individuals of the dominant population and of the auxiliary population, respectively;
the adaptive mutation operation in S3 is specifically as follows:
where $p_A$ denotes the mutation probability of the dominant population, and $p_{A1}$ and $p_{A2}$ denote the minimum and maximum mutation probabilities of the dominant population, respectively; $p_B$ denotes the mutation probability of the auxiliary population, and $p_{B1}$ and $p_{B2}$ denote the minimum and maximum mutation probabilities of the auxiliary population, respectively; $f_{Ai}$ and $f_{Bi}$ denote the fitness of individuals of the dominant population and of the auxiliary population, respectively; $f_{A,\max}$ and $f_{B,\max}$ denote the maximum fitness of individuals of the dominant population and of the auxiliary population, respectively; and $\bar{f}_A$ and $\bar{f}_B$ denote the average fitness of individuals of the dominant population and of the auxiliary population, respectively.
8. The knowledge management method for inoculated knowledge according to claim 7, characterized in that: after the decision tree is generated, the number of leaf nodes of the decision tree is set to $L$, and the loss function $\delta(t)$ is defined as:
$$\delta(t) = \sigma(t) + \tau|L|$$
where $\sigma(t)$ denotes the prediction error of leaf node $t$, $\tau$ denotes the model parameter, and $|L|$ denotes the complexity of the model;
the empirical entropy of each leaf node is calculated, and each node is traversed recursively upward from the leaf nodes; it is judged whether the value of the loss function decreases after a leaf node is deleted, and if it does, the parent node becomes a new leaf node; the traversal continues until all nodes have been judged.
9. The knowledge management method for inoculated knowledge according to claim 1, characterized in that the search index in S4 is created by the following steps:
S4.1: generating an inoculated knowledge set based on the mining result of the knowledge mining algorithm;
S4.2: creating a query statement;
S4.3: performing lexical analysis on the query statement to form a series of words;
S4.4: searching the index using the decision tree to obtain the indexed knowledge related to the query statement, and applying intersection, difference, and union operations to that knowledge to obtain a result knowledge set;
S4.5: calculating the relevance between the result knowledge of S4.4 and the query statement;
S4.6: sorting the query results by relevance and outputting them.
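Steps S4.3 to S4.6 can be illustrated with the following Python sketch. Note the substitution: it looks words up in a plain inverted index (a dictionary from word to knowledge ids) rather than through the decision tree named in S4.4, and it scores relevance as the fraction of query words found in an entry. Both choices, and all function names, are assumptions made only to show the set operations and the ranking.

```python
def build_index(knowledge):
    """Map each word to the set of ids of the knowledge entries containing it."""
    index = {}
    for kid, text in knowledge.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(kid)
    return index


def retrieve(index, all_of=(), any_of=(), none_of=()):
    """Combine posting sets with intersection, union, and difference."""
    def postings(word):
        return index.get(word, set())

    result = set()
    if all_of:
        result = set.intersection(*(postings(w) for w in all_of))  # intersection
    for w in any_of:
        result = result | postings(w)                               # union
    for w in none_of:
        result = result - postings(w)                               # difference
    return result


def relevance(query_words, text):
    """Assumed relevance score: fraction of query words present in the entry."""
    words = set(text.lower().split())
    return sum(1 for w in query_words if w in words) / len(query_words)


def ranked(index, knowledge, query, none_of=()):
    """Lexical analysis by whitespace split, retrieval, then sort by relevance
    in descending order (ties broken by id)."""
    words = query.lower().split()
    hits = retrieve(index, any_of=words, none_of=none_of)
    return sorted(hits, key=lambda kid: (-relevance(words, knowledge[kid]), kid))
```

With three hypothetical entries, `measles vaccine schedule`, `measles vaccine adverse reaction`, and `influenza vaccine schedule`, the query `measles schedule` ranks the first entry highest because it matches both query words.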
10. A knowledge management system for inoculated knowledge, based on the knowledge management method for inoculated knowledge according to any one of claims 1-9, characterized by comprising:
a knowledge acquisition module (100), for collecting inoculated knowledge through multiple channels;
a knowledge preprocessing module (200), for sorting and cleaning the acquired inoculated knowledge;
a knowledge mining module (300), for mining the preprocessed inoculated knowledge based on a knowledge mining algorithm;
an index creation module (400), for creating a search index for the inoculated knowledge set for querying by a user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310555201.7A CN116578611B (en) | 2023-05-16 | 2023-05-16 | Knowledge management method and system for inoculated knowledge |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116578611A (en) | 2023-08-11 |
CN116578611B CN116578611B (en) | 2023-11-03 |
Family
ID=87537276
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202310555201.7A, CN116578611B (en) | Active | 2023-05-16 | 2023-05-16 | Knowledge management method and system for inoculated knowledge |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116578611B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050177351A1 (en) * | 2004-02-09 | 2005-08-11 | The Board Of Trustees Of The University Of Illinois | Methods and program products for optimizing problem clustering |
CN101763529A (en) * | 2010-01-14 | 2010-06-30 | 中山大学 | Rough set attribute reduction method based on genetic algorithm |
CN106096641A (en) * | 2016-06-07 | 2016-11-09 | 南京邮电大学 | A kind of multi-modal affective characteristics fusion method based on genetic algorithm |
CN106529666A (en) * | 2016-11-17 | 2017-03-22 | 衢州学院 | Difference evolution algorithm for controlling parameter adaptive and strategy adaptive |
CN109902740A (en) * | 2019-02-27 | 2019-06-18 | 浙江理工大学 | It is a kind of based on more algorithm fusions it is parallel learn Industry Control intrusion detection method again |
CN110365648A (en) * | 2019-06-14 | 2019-10-22 | 东南大学 | A kind of vehicle-mounted CAN bus method for detecting abnormality based on decision tree |
US20190377870A1 (en) * | 2018-06-11 | 2019-12-12 | Palo Alto Research Center Incorporated | System and method for remotely detecting an anomaly |
CN111813669A (en) * | 2020-07-04 | 2020-10-23 | 毛澄映 | Adaptive random test case generation method based on multi-target group intelligence |
CN114185800A (en) * | 2021-12-16 | 2022-03-15 | 中国电信股份有限公司 | Test case generation method, system, equipment and medium based on genetic algorithm |
CN114373467A (en) * | 2022-01-11 | 2022-04-19 | 山东大学 | Antagonistic audio sample generation method based on three-group parallel genetic algorithm |
CN114758023A (en) * | 2022-03-30 | 2022-07-15 | 桂林电子科技大学 | Stomach electrical impedance tomography method of self-adaptive genetic algorithm |
CN115248592A (en) * | 2022-01-10 | 2022-10-28 | 齐齐哈尔大学 | Multi-robot autonomous exploration method and system based on improved rapid exploration random tree |
Non-Patent Citations (3)
Title |
---|
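George D. Smith: "Evolutionary feature construction using information gain and Gini index", Genetic Programming, pages 379-388 * |
Liu Pengsha: "Research on feature selection and feature construction methods for high-dimensional data based on genetic programming", China Master's Theses Full-text Database, Medicine and Health Sciences, pages 072-24 * |
Ma Jie et al.: "Research on tacit medical knowledge mining based on the CART algorithm: taking traditional Chinese medicine case records as an example", Information Science, vol. 39, no. 6, pages 84-91 * |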
Also Published As
Publication number | Publication date |
---|---|
CN116578611B (en) | 2023-11-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110298037B (en) | Convolutional neural network matching text recognition method based on enhanced attention mechanism | |
CN109635291B (en) | Recommendation method for fusing scoring information and article content based on collaborative training | |
CN107818164A (en) | A kind of intelligent answer method and its system | |
CN111340661B (en) | Automatic application problem solving method based on graph neural network | |
CN111291188B (en) | Intelligent information extraction method and system | |
CN112051986B (en) | Code search recommendation device and method based on open source knowledge | |
CN117290489B (en) | Method and system for quickly constructing industry question-answer knowledge base | |
CN116524960A (en) | Speech emotion recognition system based on mixed entropy downsampling and integrated classifier | |
CN111666374A (en) | Method for integrating additional knowledge information into deep language model | |
CN114332519A (en) | Image description generation method based on external triple and abstract relation | |
CN112489689B (en) | Cross-database voice emotion recognition method and device based on multi-scale difference countermeasure | |
CN116628173B (en) | Intelligent customer service information generation system and method based on keyword extraction | |
CN116578611B (en) | Knowledge management method and system for inoculated knowledge | |
CN116050419A (en) | Unsupervised identification method and system oriented to scientific literature knowledge entity | |
CN113392191B (en) | Text matching method and device based on multi-dimensional semantic joint learning | |
CN111782964B (en) | Recommendation method of community posts | |
CN112465054B (en) | FCN-based multivariate time series data classification method | |
CN113535928A (en) | Service discovery method and system of long-term and short-term memory network based on attention mechanism | |
CN114328633A (en) | Wrong question knowledge point strengthening training test question recommendation method based on concept lattice | |
CN114372148A (en) | Data processing method based on knowledge graph technology and terminal equipment | |
CN113495964A (en) | Method, device and equipment for screening triples and readable storage medium | |
CN112818122A (en) | Dialog text-oriented event extraction method and system | |
CN112307288A (en) | User clustering method for multiple channels | |
CN111581326A (en) | Method for extracting answer information based on heterogeneous external knowledge source graph structure | |
CN114036946B (en) | Text feature extraction and auxiliary retrieval system and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||