CN116578611A - Knowledge management method and system for inoculated knowledge

Info

Publication number: CN116578611A
Application number: CN202310555201.7A
Authority: CN
Inventors: 杨钢
Original assignee: Guangzhou Shengcheng Mother Network Technology Co ltd
Current assignee: Guangzhou Shengcheng Mother Network Technology Co ltd
Priority date: 2023-05-16
Filing date: 2023-05-16
Publication date: 2023-08-11
Anticipated expiration: 2043-05-16
Also published as: CN116578611B

Abstract

The invention relates to the technical field of data mining, and in particular to a knowledge management method and system for inoculated knowledge. The method comprises the following steps: acquiring inoculated knowledge data through multiple channels; preprocessing the acquired inoculated knowledge data; mining the preprocessed inoculated knowledge data based on a knowledge mining algorithm; and creating a search index over the inoculated knowledge set obtained by mining, so that it can be queried by a user. By collecting and mining inoculated knowledge data, the invention obtains the connections between different items of inoculated knowledge data, provides the user with a reference to implicit inoculated knowledge, and extracts the rules implicit in the inoculated knowledge, thereby improving the scientific soundness of the inoculated knowledge and the user's understanding of it. A decision tree is also generated, which facilitates the classification of inoculated knowledge data and user queries, improves the quality of prenatal and postnatal care, and reduces the probability of maternal and infant diseases.

Description

Knowledge management method and system for inoculated knowledge

Technical Field

The invention relates to the technical field of data mining, in particular to a knowledge management method and system for inoculated knowledge.

Background

At present, about 17 million newborns are born in China each year, and the number of pregnant women each year is larger still. Enabling more people to master correct inoculated knowledge (that is, knowledge about pregnancy and childbirth) and methods, giving personalized guidance according to each person's environment and individual differences, and continuously improving the quality of prenatal and postnatal care are problems that every family, and society as a whole, must keep working to solve.

In the prior art, inoculated knowledge is organized and optimized on the basis of correlation by means of machine learning, but the implicit knowledge contained in the inoculated knowledge is not mined. As a result, no reference to implicit inoculated knowledge can be provided to the user, and the user's understanding of inoculated knowledge cannot be improved.

Disclosure of Invention

The invention aims to remedy the defects described in the background by providing a knowledge management method and system for inoculated knowledge.

The technical scheme adopted by the invention is as follows:

The knowledge management method for inoculated knowledge comprises the following steps:

S1: acquiring inoculated knowledge data through multiple channels;

S2: preprocessing the acquired inoculated knowledge data;

S3: mining the preprocessed inoculated knowledge data based on a knowledge mining algorithm;

S4: creating a search index over the inoculated knowledge set obtained by mining, so that it can be queried by a user.

As a preferred technical scheme of the invention: acquiring inoculated knowledge data through multiple channels in S1 specifically comprises acquiring the data through a computer terminal, a mobile device terminal and manual collection.

As a preferred technical scheme of the invention: in S2, the acquired inoculated knowledge data is sorted and cleaned; after the lexical operations of word segmentation, stop-word removal, punctuation removal and case normalization, an inoculated knowledge data set consisting of a series of words is formed.
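By way of non-limiting illustration, the preprocessing of S2 can be sketched as follows. The stop-word list and the function name `preprocess` are hypothetical, and whitespace splitting merely stands in for a real word segmenter, which Chinese text would require.

```python
import string

# Hypothetical stop-word list; a real system would use a domain-specific one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}


def preprocess(text: str) -> list[str]:
    """Turn one raw text record into a list of normalized words."""
    # Case normalization: fold everything to lowercase.
    lowered = text.lower()
    # Punctuation removal: replace each ASCII punctuation mark with a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    cleaned = lowered.translate(table)
    # Word segmentation (assumption: whitespace-delimited text).
    tokens = cleaned.split()
    # Stop-word removal.
    return [tok for tok in tokens if tok not in STOP_WORDS]
```

Applying `preprocess` to every acquired record yields the data set "consisting of a series of words" on which the mining step operates.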

As a preferred technical scheme of the invention: the knowledge mining algorithm mines the inoculated knowledge data by combining the CART algorithm with an improved genetic algorithm.

As a preferred technical scheme of the invention: the CART algorithm is specifically as follows:

Let X₁, X₂, …, X_N denote the N attributes contained in a sample of inoculated knowledge data, and let Y denote the category to which the attributes belong; each attribute has a fixed output value. The j-th attribute X_j of the inoculated knowledge data set and a value z of that attribute are selected as the splitting variable and splitting point of the regression tree, thereby dividing the inoculated knowledge data set into two regions X₁(j, z) = {x | x(j) ≤ z} and X₂(j, z) = {x | x(j) > z}. The optimal splitting point is then sought, that is, the point at which the squared error is minimal; in the standard CART least-squares form this is:

$$\min_{j,z}\left[\min_{c_1}\sum_{x_1\in X_1(j,z)}(y_1-c_1)^2+\min_{c_2}\sum_{x_2\in X_2(j,z)}(y_2-c_2)^2\right]$$

wherein y₁ represents the attribute category of the inoculated knowledge data x₁ in region X₁, c₁ represents the output value for the inoculated knowledge data x₁ in region X₁, y₂ represents the attribute category of the inoculated knowledge data x₂ in region X₂, and c₂ represents the output value for the inoculated knowledge data x₂ in region X₂;
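As a non-limiting sketch of the split search described above, the following function scans the candidate thresholds z of a single attribute and returns the one with the smallest summed squared error, assuming (as in standard least-squares CART) that the output value of each region is the mean of its targets. The function name `best_split` is hypothetical; repeating the search for every attribute j and keeping the overall minimum gives the pair (j, z).

```python
from typing import Optional


def best_split(xs: list[float], ys: list[float]) -> tuple[Optional[float], float]:
    """Return (z, error) minimizing the squared error of regions x <= z and x > z."""

    def sse(values: list[float]) -> float:
        # The best constant output of a region is the mean of its targets.
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best_z, best_err = None, float("inf")
    # Candidate thresholds: every distinct value except the largest,
    # so that both regions are non-empty.
    for z in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= z]
        right = [y for x, y in zip(xs, ys) if x > z]
        err = sse(left) + sse(right)
        if err < best_err:
            best_z, best_err = z, err
    return best_z, best_err
```

If the attribute takes only one distinct value, no split is possible and the function returns `(None, inf)`.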

The N-dimensional space of the inoculated knowledge data set is recursively divided into mutually non-overlapping rectangles; the optimal feature attribute is selected by calculating the Gini index, and the optimal binary splitting point of that attribute is likewise determined by the Gini index;

let the probability that a point of the inoculated knowledge data set D belongs to the k-th class be p_k; the Gini index G(p) of the probability distribution is then, in its standard form:

$$G(p)=\sum_{k}p_k(1-p_k)=1-\sum_{k}p_k^2$$

according to whether a certain attribute F takes the value f, the inoculated knowledge data set D is divided into two regions D₁ and D₂, with D₁ = {(x, y) ∈ D | F(x) = f} and D₂ = D − D₁, wherein x represents inoculated knowledge data, y represents the attribute category of the inoculated knowledge data x, and F(x) = f indicates that attribute F takes the value f for the inoculated knowledge data x; under the condition of attribute F, the Gini index G(D, F) of the inoculated knowledge data set D is:

$$G(D,F)=\frac{|D_1|}{|D|}G(D_1)+\frac{|D_2|}{|D|}G(D_2)$$

wherein G(D₁) represents the Gini index of the inoculated knowledge data subset D₁, and G(D₂) represents the Gini index of the inoculated knowledge data subset D₂;
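The Gini index and its weighted form for a binary division can be illustrated with the following minimal sketch, which assumes the standard definitions (one minus the sum of squared class probabilities, and the size-weighted sum over D₁ and D₂). The record layout (one dictionary per sample) and the function names are hypothetical.

```python
from collections import Counter


def gini(labels: list[str]) -> float:
    """Gini index 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(rows: list[dict], labels: list[str], attr: str, value) -> float:
    """Weighted Gini index G(D, F) for the division F == value versus the rest."""
    d1 = [lab for row, lab in zip(rows, labels) if row[attr] == value]
    d2 = [lab for row, lab in zip(rows, labels) if row[attr] != value]
    n = len(labels)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)
```

The attribute and value giving the smallest `gini_split` would be chosen as the binary splitting point.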

a decision tree is generated by taking the information gain of the inoculated knowledge data as the fitness function, the fitness function Gain(F) being specifically as follows:

wherein p_i represents the probability that a point in the inoculated knowledge data set belongs to the i-th class, and D_j represents the inoculated knowledge data subset obtained by dividing the inoculated knowledge data set by the j-th attribute.
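As an illustrative sketch only, the information gain used as the fitness function can be computed as below. This assumes the usual entropy-based definition, namely the entropy of the whole data set minus the size-weighted entropy of the subsets obtained by dividing on one attribute; the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels: list[str]) -> float:
    """Shannon entropy -sum_i p_i log2 p_i of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values: list[str], labels: list[str]) -> float:
    """Entropy of the data set minus the weighted entropy of its subsets.

    values[i] is the value of the dividing attribute for sample i.
    """
    subsets: dict[str, list[str]] = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)
    n = len(labels)
    remainder = sum(len(sub) / n * entropy(sub) for sub in subsets.values())
    return entropy(labels) - remainder
```

An attribute that separates the classes perfectly attains the full entropy of the data set as its gain, while an attribute with a single value has a gain of zero.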

As a preferred technical scheme of the invention: the decision tree generation is specifically as follows:

An initial population is generated by encoding the inoculated knowledge data set; knowledge mining and attribute division are performed on the inoculated knowledge data with the information gain of the inoculated knowledge data as the fitness function, and the fitness of each individual in the population is calculated. The selected inoculated knowledge data is taken as the leading population and the unselected inoculated knowledge data as the auxiliary population. An adaptive crossover operation and an adaptive mutation operation are performed on the individuals of the leading population, and it is then judged whether the convergence condition is met. If it is met, the process ends; if not, adaptive crossover and adaptive mutation operations are performed on the individuals of the auxiliary population, the iteration count is increased, and the process returns to recalculating individual fitness and executes the subsequent steps until convergence, that is, until the population individuals corresponding to the inoculated knowledge data are divided into the same class according to their attributes.
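The control flow of this two-population loop can be sketched as follows. Several points are assumptions made only for illustration: individuals are encoded as bit strings, the fitter half of the population is taken as the leading population, fixed crossover and mutation probabilities stand in for the adaptive ones, and convergence is declared when the best fitness has not improved for `patience` iterations. The function name `evolve` and all parameters are hypothetical.

```python
import random
from typing import Callable


def evolve(fitness: Callable[[list[int]], float], n_bits: int,
           pop_size: int = 20, max_iter: int = 100, patience: int = 10,
           p_cross: float = 0.8, p_mut: float = 0.05, seed: int = 0) -> list[int]:
    """Two-population genetic loop; returns the fittest bit string found.

    Requires n_bits >= 2 so that a crossover point exists.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def vary(group: list[list[int]]) -> list[list[int]]:
        """One-point crossover on adjacent pairs, then bit-flip mutation."""
        children = [ind[:] for ind in group]
        for a, b in zip(children[::2], children[1::2]):
            if rng.random() < p_cross:
                cut = rng.randrange(1, n_bits)
                a[cut:], b[cut:] = b[cut:], a[cut:]
        for ind in children:
            for i in range(n_bits):
                if rng.random() < p_mut:
                    ind[i] ^= 1
        return children

    best = max(pop, key=fitness)[:]
    stale = 0
    for _ in range(max_iter):
        # Calculate fitness and divide into leading and auxiliary populations.
        ranked = sorted(pop, key=fitness, reverse=True)
        half = pop_size // 2
        # Crossover and mutation on the leading population.
        leading = vary(ranked[:half])
        champion = max(leading, key=fitness)
        if fitness(champion) > fitness(best):
            best, stale = champion[:], 0
        else:
            stale += 1
        # Convergence check (assumed criterion: no recent improvement).
        if stale >= patience:
            break
        # Not converged: crossover and mutation on the auxiliary population.
        auxiliary = vary(ranked[half:])
        pop = leading + auxiliary
    return best
```

With a fixed `seed` the run is reproducible, which is convenient when comparing different fitness functions on the same data.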

As a preferred technical scheme of the invention: the adaptive crossover operation in S3 is specifically as follows:

wherein p_a represents the crossover probability of the leading population, and p_a1 and p_a2 respectively represent the minimum and maximum crossover probabilities of the leading population; p_b represents the crossover probability of the auxiliary population, and p_b1 and p_b2 respectively represent the minimum and maximum crossover probabilities of the auxiliary population; f_ai and f_bi respectively represent the fitness of an individual of the leading population and of an individual of the auxiliary population; f_a,max and f_b,max respectively represent the maximum fitness of the individuals of the leading population and of the auxiliary population; the mean fitness of the individuals of the leading population and of the auxiliary population is likewise used;

the adaptive mutation operation in S3 is specifically as follows:

wherein p_A represents the mutation probability of the leading population, and p_A1 and p_A2 respectively represent the minimum and maximum mutation probabilities of the leading population; p_B represents the mutation probability of the auxiliary population, and p_B1 and p_B2 respectively represent the minimum and maximum mutation probabilities of the auxiliary population; f_Ai and f_Bi respectively represent the fitness of an individual of the leading population and of an individual of the auxiliary population; f_A,max and f_B,max respectively represent the maximum fitness of the individuals of the leading population and of the auxiliary population; the mean fitness of the individuals of the leading population and of the auxiliary population is likewise used.
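One possible form of such an adaptive probability is sketched below purely as an assumption; it is not asserted to be the formula of the invention. It uses exactly the quantities named above (individual fitness, mean fitness, maximum fitness, minimum and maximum probability): individuals at or below the mean fitness receive the maximum probability, and fitter individuals receive a probability that falls linearly to the minimum at the maximum fitness. The same function would serve for crossover and for mutation, and for either population, by passing the corresponding bounds.

```python
def adaptive_probability(f_i: float, f_avg: float, f_max: float,
                         p_min: float, p_max: float) -> float:
    """Assumed linear rule for an adaptive crossover or mutation probability."""
    # Below-average individuals are varied with the maximum probability;
    # the second condition avoids division by zero in a uniform population.
    if f_i < f_avg or f_max == f_avg:
        return p_max
    # Above-average individuals are protected: the probability decreases
    # linearly from p_max (at the mean) to p_min (at the maximum fitness).
    return p_max - (p_max - p_min) * (f_i - f_avg) / (f_max - f_avg)
```

The design intent of rules of this kind is to preserve good individuals while disturbing poor ones more strongly.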

As a preferred technical scheme of the invention: after the decision tree is generated, let the number of leaf nodes of the decision tree be L, and define the loss function δ(t) as:

δ(t)＝σ(t)+τ|L|

wherein σ(t) represents the prediction error of leaf node t, τ represents a model parameter, and |L| represents the complexity of the model;

the empirical entropy of each leaf node is calculated, and each node is traversed recursively upwards from the leaf nodes; for each node it is judged whether the value of the loss function decreases after a leaf node is deleted, and if it decreases, the parent node is taken as a new leaf node; the traversal continues until all nodes have been judged.
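The bottom-up pruning described above can be illustrated by the following sketch. It assumes that the loss of a subtree is the sum of the prediction errors of its leaves plus τ times the number of leaves, and that each node stores the error it would have if it were a leaf; the `Node` class and the function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    error: float  # prediction error sigma(t) if this node were a leaf
    children: list["Node"] = field(default_factory=list)


def leaves(node: Node) -> list[Node]:
    """Collect the leaf nodes of the subtree rooted at node."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def loss(node: Node, tau: float) -> float:
    """Sum of leaf errors plus tau times the number of leaves."""
    found = leaves(node)
    return sum(leaf.error for leaf in found) + tau * len(found)


def prune(node: Node, tau: float) -> Node:
    """Traverse upwards from the leaves, collapsing subtrees that lower the loss."""
    for child in node.children:
        prune(child, tau)
    if node.children:
        kept = loss(node, tau)
        collapsed = node.error + tau  # the node itself becomes a single leaf
        # Because the loss is additive over leaves, comparing the two local
        # values is equivalent to comparing the loss of the whole tree.
        if collapsed < kept:
            node.children = []
    return node
```

A larger τ penalizes leaves more heavily and therefore yields a smaller tree.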

As a preferred technical scheme of the invention: the creating step of the search index in S4 is as follows:

S4.1: generating an inoculated knowledge set based on the mining result of the knowledge mining algorithm;

S4.2: creating a query statement;

S4.3: performing lexical analysis on the query statement to form a series of words;

S4.4: searching the index by means of the decision tree to obtain the indexed knowledge related to the query statement, and performing intersection, difference and union operations on that knowledge to obtain a result knowledge set;

S4.5: calculating the relevance between the result knowledge of S4.4 and the query statement;

S4.6: sorting the query results by relevance and outputting them.
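The query flow of S4.2 to S4.6 can be sketched as follows. As a simplification, a plain inverted index (word to knowledge entries) is substituted here for the decision-tree lookup, and relevance is assumed to be the number of query-word occurrences in an entry; both choices, and all names, are hypothetical.

```python
def tokenize(text: str) -> list[str]:
    """Lexical analysis (assumption: lowercase, whitespace-delimited words)."""
    return text.lower().split()


def build_index(entries: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the identifiers of the entries that contain it."""
    index: dict[str, set[str]] = {}
    for entry_id, text in entries.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(entry_id)
    return index


def search(entries: dict[str, str], index: dict[str, set[str]],
           query: str, exclude: str = "") -> list[str]:
    """Return entry identifiers sorted from most to least relevant."""
    words = tokenize(query)
    if not words:
        return []
    postings = [index.get(word, set()) for word in words]
    # Intersection: entries containing every query word ...
    matched = set.intersection(*postings)
    # ... falling back to the union when no entry contains all of them.
    if not matched:
        matched = set.union(*postings)
    # Difference: drop entries containing an excluded word.
    for word in tokenize(exclude):
        matched = matched - index.get(word, set())

    def relevance(entry_id: str) -> int:
        entry_words = tokenize(entries[entry_id])
        return sum(entry_words.count(word) for word in words)

    # Sort by descending relevance; ties are broken by identifier.
    return sorted(matched, key=lambda e: (-relevance(e), e))
```

For example, with three entries about folic acid, iron supplements and infant sleep, a query for "folic acid" that excludes "iron" returns only the entry mentioning folic acid without iron.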

A knowledge management system for inoculated knowledge is also provided, comprising:

a knowledge acquisition module: for collecting inoculated knowledge through multiple channels;

a knowledge preprocessing module: for sorting and cleaning the collected inoculated knowledge;

a knowledge mining module: for mining the preprocessed inoculated knowledge based on a knowledge mining algorithm;

an index creation module: for creating a search index over the inoculated knowledge set for user queries.
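The way the four modules hand data to one another can be pictured with the following skeleton. The data types passed between the stages and the class name `KnowledgeSystem` are assumptions chosen only to make the sketch concrete; each stage is supplied as a function so that any implementation can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable

Words = list[str]


@dataclass
class KnowledgeSystem:
    # Knowledge acquisition module: returns raw text records.
    acquire: Callable[[], list[str]]
    # Knowledge preprocessing module: raw records -> word lists.
    preprocess: Callable[[list[str]], list[Words]]
    # Knowledge mining module: word lists -> knowledge set keyed by identifier.
    mine: Callable[[list[Words]], dict[str, Words]]
    # Index creation module: knowledge set -> word-to-identifiers index.
    build_index: Callable[[dict[str, Words]], dict[str, set[str]]]

    def run(self) -> dict[str, set[str]]:
        """Run the four modules in order and return the search index."""
        raw = self.acquire()
        cleaned = self.preprocess(raw)
        knowledge = self.mine(cleaned)
        return self.build_index(knowledge)
```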

Compared with the prior art, the knowledge management method and system for inoculated knowledge provided by the invention have the beneficial effects that:

By collecting and mining inoculated knowledge data, the invention obtains the connections between different items of inoculated knowledge data, provides the user with a reference to implicit inoculated knowledge, and extracts the rules implicit in the inoculated knowledge, thereby improving the scientific soundness of the inoculated knowledge and the user's understanding of it. A decision tree is also generated, which facilitates the classification of inoculated knowledge data and user queries, improves the quality of prenatal and postnatal care, and reduces the probability of maternal and infant diseases.

Drawings

FIG. 1 is a flow chart of a method of a preferred embodiment of the present invention;

FIG. 2 is a block diagram of a system in a preferred embodiment of the present invention.

The meaning of each label in the figures is: 100. a knowledge acquisition module; 200. a knowledge preprocessing module; 300. a knowledge mining module; 400. an index creation module.

Detailed Description

It should be noted that, where no conflict arises, the embodiments of the present invention and the features in the embodiments may be combined with each other. The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of the invention.

Referring to FIG. 1, a preferred embodiment of the present invention provides a knowledge management method for inoculated knowledge, comprising the following steps:

S1: acquiring inoculated knowledge data through multiple channels;

S2: preprocessing the acquired inoculated knowledge data;

S3: mining the preprocessed inoculated knowledge data based on a knowledge mining algorithm;

S4: creating a search index over the inoculated knowledge set obtained by mining, so that it can be queried by a user.

Acquiring inoculated knowledge data through multiple channels in S1 specifically comprises acquiring the data through a computer terminal, a mobile device terminal and manual collection.

In S2, the acquired inoculated knowledge data is sorted and cleaned; after the lexical operations of word segmentation, stop-word removal, punctuation removal and case normalization, an inoculated knowledge data set consisting of a series of words is formed.

The knowledge mining algorithm mines the inoculated knowledge data by combining the CART algorithm with an improved genetic algorithm.

The CART algorithm is specifically as follows:

wherein y₁ represents the attribute category of the inoculated knowledge data x₁ in region X₁, c₁ represents the output value for the inoculated knowledge data x₁ in region X₁, y₂ represents the attribute category of the inoculated knowledge data x₂ in region X₂, and c₂ represents the output value for the inoculated knowledge data x₂ in region X₂;

according to whether a certain attribute F takes the value f, the inoculated knowledge data set D is divided into two regions D₁ and D₂, with D₁ = {(x, y) ∈ D | F(x) = f} and D₂ = D − D₁, wherein x represents inoculated knowledge data, y represents the attribute category of the inoculated knowledge data x, and F(x) = f indicates that attribute F takes the value f for the inoculated knowledge data x; under the condition of attribute F, the Gini index G(D, F) of the inoculated knowledge data set D is:

$$G(D,F)=\frac{|D_1|}{|D|}G(D_1)+\frac{|D_2|}{|D|}G(D_2)$$

The decision tree generation is specifically as follows:

wherein m is the number of values of attribute F, and |D_j| and |D| respectively represent the sizes of the subset D_j and of the whole data set D.

Using the information gain ratio as the fitness function overcomes the drawbacks of the information gain to some extent, so that splitting attributes are better selected when the decision tree is generated.

When splitting attributes are selected, the information gain used as the fitness function Gain(F) may be biased towards attributes with more values, because such attributes tend to produce more branches and therefore a lower information entropy. This does not mean, however, that an attribute with more values is necessarily the best splitting attribute. To solve this problem, the present embodiment improves on it by using the information gain ratio.

The information gain ratio introduces the intrinsic value of an attribute as a normalization factor in the calculation, thereby reducing the weight of attributes with more values. The specific definition is as follows:
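The normalizing effect described above can be illustrated with the following sketch, which assumes the usual definition of the gain ratio: the information gain divided by the intrinsic value of the attribute, the latter being the entropy of the attribute's own value distribution. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(items: list[str]) -> float:
    """Shannon entropy of the empirical distribution of items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values: list[str], labels: list[str]) -> float:
    """Information gain of an attribute divided by its intrinsic value."""
    subsets: dict[str, list[str]] = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)
    n = len(labels)
    gain = entropy(labels) - sum(len(s) / n * entropy(s) for s in subsets.values())
    # Intrinsic value: entropy of the attribute's value distribution.
    intrinsic = entropy(values)
    # An attribute with a single value carries no information.
    return gain / intrinsic if intrinsic > 0 else 0.0
```

On four samples with two classes, an identifier-like attribute with four distinct values and a two-valued attribute aligned with the classes both have an information gain of 1, but their gain ratios are 0.5 and 1.0 respectively, so the many-valued attribute is no longer favoured.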

the adaptive crossover operation in S3 is specifically as follows:

wherein p_a represents the crossover probability of the leading population, and p_a1 and p_a2 respectively represent the minimum and maximum crossover probabilities of the leading population; p_b represents the crossover probability of the auxiliary population, and p_b1 and p_b2 respectively represent the minimum and maximum crossover probabilities of the auxiliary population; f_ai and f_bi respectively represent the fitness of an individual of the leading population and of an individual of the auxiliary population; f_a,max and f_b,max respectively represent the maximum fitness of the individuals of the leading population and of the auxiliary population; the mean fitness of the individuals of the leading population and of the auxiliary population is likewise used;

the adaptive mutation operation in S3 is specifically as follows:

wherein p_A represents the mutation probability of the leading population, and p_A1 and p_A2 respectively represent the minimum and maximum mutation probabilities of the leading population; p_B represents the mutation probability of the auxiliary population, and p_B1 and p_B2 respectively represent the minimum and maximum mutation probabilities of the auxiliary population; f_Ai and f_Bi respectively represent the fitness of an individual of the leading population and of an individual of the auxiliary population; f_A,max and f_B,max respectively represent the maximum fitness of the individuals of the leading population and of the auxiliary population; the mean fitness of the individuals of the leading population and of the auxiliary population is likewise used.

After the decision tree is generated, let the number of leaf nodes of the decision tree be L, and define the loss function δ(t) as:

δ(t)＝σ(t)+τ|L|

The creating step of the search index in S4 is as follows:

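S4.1: generating an inoculated knowledge set based on the mining result of the knowledge mining algorithm;

S4.2: creating a query statement;

S4.3: performing lexical analysis on the query statement to form a series of words;

S4.4: searching the index by means of the decision tree to obtain the indexed knowledge related to the query statement, and performing intersection, difference and union operations on that knowledge to obtain a result knowledge set;

S4.5: calculating the relevance between the result knowledge of S4.4 and the query statement;

S4.6: sorting the query results by relevance and outputting them.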

Referring to FIG. 2, a knowledge management system for inoculated knowledge is provided, comprising:

a knowledge acquisition module 100: for collecting inoculated knowledge through multiple channels;

a knowledge preprocessing module 200: for sorting and cleaning the collected inoculated knowledge;

a knowledge mining module 300: for mining the preprocessed inoculated knowledge based on a knowledge mining algorithm;

an index creation module 400: for creating a search index over the inoculated knowledge set for user queries.

In this embodiment, the knowledge acquisition module 100 collects inoculated knowledge data through computer terminals (such as web pages and apps) and mobile device terminals (such as mobile phones and smart bracelets), together with manually collected inoculated knowledge data, such as clothes; the inoculated knowledge further includes maternal and infant knowledge. The knowledge preprocessing module 200 sorts and cleans the collected inoculated knowledge data, performing lexical operations such as word segmentation, stop-word removal, punctuation removal and case normalization on it to form an inoculated knowledge data set consisting of a series of words.

The knowledge mining module 300 performs the following inoculated knowledge mining operations. Let X₁, X₂, …, X₅₀ denote the 50 attributes contained in a sample of inoculated knowledge data. The 25th attribute X₂₅ of the inoculated knowledge data set and a value z of that attribute are selected as the splitting variable and splitting point of the regression tree, thereby dividing the inoculated knowledge data set into two regions X₁(25, z) = {x | x(25) ≤ z} and X₂(25, z) = {x | x(25) > z}. The optimal splitting point is then sought, that is, the point at which the squared error is minimal; in the standard CART least-squares form this is:

$$\min_{z}\left[\min_{c_1}\sum_{x_1\in X_1(25,z)}(y_1-c_1)^2+\min_{c_2}\sum_{x_2\in X_2(25,z)}(y_2-c_2)^2\right]$$

wherein y₁ represents the attribute category of the inoculated knowledge data x₁ in region X₁, c₁ represents the output value for the inoculated knowledge data x₁ in region X₁, y₂ represents the attribute category of the inoculated knowledge data x₂ in region X₂, and c₂ represents the output value for the inoculated knowledge data x₂ in region X₂;

The 50-dimensional space of the inoculated knowledge data set is recursively divided into mutually non-overlapping rectangles; the optimal feature attribute is selected by calculating the Gini index, and the optimal binary splitting point of that attribute is likewise determined by the Gini index;

according to whether a certain attribute F takes the value f, the inoculated knowledge data set D is divided into two regions D₁ and D₂, with D₁ = {(x, y) ∈ D | F(x) = f} and D₂ = D − D₁, wherein x represents inoculated knowledge data, y represents the attribute category of the inoculated knowledge data x, and F(x) = f indicates that attribute F takes the value f for the inoculated knowledge data x; under the condition of attribute F, the Gini index G(D, F) of the inoculated knowledge data set D is:

$$G(D,F)=\frac{|D_1|}{|D|}G(D_1)+\frac{|D_2|}{|D|}G(D_2)$$

wherein, for the information gain used as the fitness function, p_i represents the probability that a point in the inoculated knowledge data set belongs to the i-th class, and D₂₅ represents the inoculated knowledge data subset obtained by dividing the inoculated knowledge data set by the 25th attribute.

The inoculated knowledge data set is encoded to generate an initial population; attribute division is performed on the inoculated knowledge data with the information gain of the inoculated knowledge data as the fitness function, and the fitness of each individual in the population is calculated. The selected inoculated knowledge data is taken as the leading population and the unselected inoculated knowledge data as the auxiliary population, and the adaptive crossover operation is performed on the individuals of the leading population:

and the adaptive mutation operation:

wherein p_A represents the mutation probability of the leading population, and p_A1 and p_A2 respectively represent the minimum and maximum mutation probabilities of the leading population; p_B represents the mutation probability of the auxiliary population, and p_B1 and p_B2 respectively represent the minimum and maximum mutation probabilities of the auxiliary population; f_Ai and f_Bi respectively represent the fitness of an individual of the leading population and of an individual of the auxiliary population; f_A,max and f_B,max respectively represent the maximum fitness of the individuals of the leading population and of the auxiliary population; the mean fitness of the individuals of the leading population and of the auxiliary population is likewise used.

Crossover and mutation are performed on the inoculated knowledge data in order to mine the inoculated knowledge. After mining is completed, it is judged whether the convergence condition is met. If it is met, the process ends; if not, adaptive crossover and adaptive mutation operations are performed on the individuals of the auxiliary population, the iteration count is increased, and the process returns to recalculating individual fitness and executes the subsequent steps until convergence, that is, until the population individuals corresponding to the inoculated knowledge data are divided into the same class according to their attributes.

A decision tree is generated from the inoculated knowledge data grouped into classes by attribute. After the decision tree is generated, let the number of leaf nodes of the decision tree be 500, and define the loss function δ(150) of the 150th leaf node as:

δ(150)＝σ(150)+τ|L|

where σ(150) represents the prediction error of the 150th leaf node, τ represents a model parameter, and |L| represents the complexity of the model;

the empirical entropy of each leaf node is calculated, and each node is traversed recursively upwards from the leaf nodes; for each node it is judged whether the value of the loss function decreases after a leaf node is deleted, and if it decreases, the parent node is taken as a new leaf node; the traversal continues until all nodes have been judged. Once the decision tree has been created, an inoculated knowledge set is generated from it. The index creation module 400 collects the user's query statement and performs lexical analysis on it to form a series of words; searches the index by means of the decision tree to obtain the indexed knowledge related to the query statement, and performs intersection, difference and union operations on that knowledge to obtain a result knowledge set; calculates the relevance between the result knowledge and the query statement; and sorts the results by relevance, outputting and displaying them in descending order of relevance.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. This manner of description is adopted for clarity only; those skilled in the art should take the specification as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments that will be apparent to those skilled in the art.

Claims

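1. A knowledge management method for inoculated knowledge, characterized in that the method comprises the following steps:

S1: acquiring inoculated knowledge data through multiple channels;

S2: preprocessing the acquired inoculated knowledge data;

S3: mining the preprocessed inoculated knowledge data based on a knowledge mining algorithm;

S4: creating a search index over the inoculated knowledge set obtained by mining, so that it can be queried by a user.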

2. The knowledge management method for inoculated knowledge according to claim 1, wherein: acquiring inoculated knowledge data through multiple channels in S1 specifically comprises acquiring the data through a computer terminal, a mobile device terminal and manual collection.

3. The knowledge management method for inoculated knowledge according to claim 1, wherein: in S2, the acquired inoculated knowledge data is sorted and cleaned; after the lexical operations of word segmentation, stop-word removal, punctuation removal and case normalization, an inoculated knowledge data set consisting of a series of words is formed.

4. The knowledge management method for inoculated knowledge according to claim 1, wherein: the knowledge mining algorithm mines the inoculated knowledge data by combining the CART algorithm with an improved genetic algorithm.

5. The knowledge management method for inoculated knowledge according to claim 4, wherein: the CART algorithm is specifically as follows:

6. The knowledge management method for inoculated knowledge according to claim 5, wherein: the decision tree generation is specifically as follows:

7. The knowledge management method for inoculated knowledge according to claim 6, wherein: the adaptive crossover operation in S3 is specifically as follows:

the adaptive mutation operation in S3 is specifically as follows:

8. The knowledge management method for inoculated knowledge according to claim 7, wherein: after the decision tree is generated, let the number of leaf nodes of the decision tree be L, and define the loss function δ(t) as:

δ(t)＝σ(t)+τ|L|

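9. The knowledge management method for inoculated knowledge according to claim 1, wherein: the steps of creating the search index in S4 are as follows:

S4.1: generating an inoculated knowledge set based on the mining result of the knowledge mining algorithm;

S4.2: creating a query statement;

S4.3: performing lexical analysis on the query statement to form a series of words;

S4.4: searching the index by means of the decision tree to obtain the indexed knowledge related to the query statement, and performing intersection, difference and union operations on that knowledge to obtain a result knowledge set;

S4.5: calculating the relevance between the result knowledge of S4.4 and the query statement;

S4.6: sorting the query results by relevance and outputting them.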

10. A knowledge management system for inoculated knowledge, based on the knowledge management method for inoculated knowledge according to any one of claims 1-9, characterized by comprising:

a knowledge acquisition module (100): for collecting inoculated knowledge through multiple channels;

a knowledge preprocessing module (200): for sorting and cleaning the collected inoculated knowledge;

a knowledge mining module (300): for mining the preprocessed inoculated knowledge based on a knowledge mining algorithm;

an index creation module (400): for creating a search index over the inoculated knowledge set for user queries.