CN117591668A

CN117591668A - Science popularization theme and core knowledge extraction method based on concept lattice

Info

Publication number: CN117591668A
Application number: CN202311603695.8A
Authority: CN
Inventors: 郭修远; 马超民; 贾振华; 阎明哲; 蔡明杰; 宋秉华; 朱云
Original assignee: Hunan University
Current assignee: Hunan University
Priority date: 2023-11-28
Filing date: 2023-11-28
Publication date: 2024-02-23

Abstract

A science popularization theme and core knowledge extraction method based on concept lattices comprises the following steps: S1, acquiring science popularization themes as an object set O, acquiring the core knowledge corresponding to each theme as an attribute set A, and acquiring the audience group of each theme's content as a label set D; S2, constructing a decision formal context F; S3, selecting a subset of the decision formal context as a decision formal sub-context; S4, constructing the concept lattices of the decision formal sub-context according to the definition of a concept lattice; S5, constructing the concept lattices of the full formal contexts from the sub-context concept lattices by an incremental method; S6, acquiring the non-trivial decision rules and putting all non-trivial decision rules that satisfy the conditions into a rule set R.

Description

Science popularization theme and core knowledge extraction method based on concept lattice

Technical Field

The invention relates to the technical field of formal concept analysis in granular computing, and in particular to a science popularization theme and core knowledge extraction method based on concept lattices.

Background

With the development of mobile internet technology, knowledge is expressed in increasingly diverse forms, such as videos, texts, pictures and songs, which has fostered a "fragmented" learning habit among readers. Fragmented learning has the characteristics of the digital age. First, it is highly flexible and not limited by time or place. Second, it is highly targeted: learners can focus on the content they want to learn. Third, it is efficient, because each fragment takes little time to study, which helps sustain readers' interest. Given this fragmentation of knowledge, how to provide science popularization to people in a systematic and personalized way has become an urgent problem.

A concept lattice is a mathematical tool of concept analysis that can be used to process and analyze data, information and knowledge. Its core idea is to represent data or knowledge as a conceptual network structure built from objects and attributes. The underlying data are given as a table or matrix (the formal context), in which rows represent objects, columns represent attributes, and each cell records whether the object has the corresponding attribute. Analyzing the resulting structure identifies the different concepts, that is, the association relations between groups of objects and groups of attributes. Formal concept analysis can therefore be used for knowledge extraction: a formal context provides a solid basis for it and represents science popularization topics and core knowledge in a structured manner. Formal concept analysis also helps to extract and analyze key knowledge systematically, making knowledge extraction more accurate and reliable and thereby supporting science popularization and education.
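To make the formal-context view concrete, the following minimal sketch stores each row of the 0/1 table as the set of attributes an object has and implements the two derivation operators that this document writes as f and g: f(X) is the set of attributes shared by all objects in X, and g(B) is the set of objects that have every attribute in B. A pair (X, B) is a formal concept exactly when f(X) = B and g(B) = X. The objects o1 to o3 and attributes a1 to a3 are hypothetical toy data.

```python
# Hypothetical toy formal context: each object maps to the attributes whose
# cells hold a 1 in that object's row of the 0/1 table.
CONTEXT = {
    "o1": frozenset({"a1", "a2"}),
    "o2": frozenset({"a2", "a3"}),
    "o3": frozenset({"a2"}),
}
ATTRIBUTES = frozenset({"a1", "a2", "a3"})


def f(objects):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set(ATTRIBUTES)
    for o in objects:
        shared &= CONTEXT[o]
    return frozenset(shared)


def g(attributes):
    """Objects that possess every attribute in `attributes`."""
    return frozenset(o for o, owned in CONTEXT.items() if attributes <= owned)


def is_concept(extent, intent):
    """(X, B) is a formal concept iff f(X) == B and g(B) == X."""
    return f(extent) == intent and g(intent) == extent


print(is_concept(frozenset({"o1", "o2", "o3"}), frozenset({"a2"})))  # True
print(is_concept(frozenset({"o1", "o3"}), frozenset({"a2"})))        # False: g({a2}) also contains o2
```

Closing any object set X as (g(f(X)), f(X)) always yields a concept, which is the idea behind the enumeration used later in the embodiment.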

Decision-making research is a comprehensive, multidisciplinary field of inquiry with a place in information science, management science, systems science, behavioral science and even psychology. In general, data carrying decisions are of greater interest than data without decisions. On the one hand, data with decisions can comprehensively reflect the relations among various factors and hide inherent patterns waiting to be discovered. On the other hand, one of the main purposes of human cognition is to make predictions and decisions; only by deriving, from accumulated knowledge, general rules or guidance for subsequent life, scientific research and creation can people make greater progress. It is therefore desirable to obtain more valuable information, in particular information related to decision rules, from data with decisions.

However, current recommendation methods generally lack the systematic, hierarchical structure of formal concept analysis and the deep mining and comprehensive understanding of knowledge offered by decision-making research, so their recommendation results are one-sided and limited. Moreover, these methods often rely too heavily on labels: their performance depends on accurate and comprehensive label information, so when labels are inaccurate or insufficient, the robustness of the recommendation system may drop sharply, degrading the user experience and recommendation quality.

Disclosure of Invention

In view of this, the present invention provides a concept-lattice-based method for extracting science popularization themes and core knowledge, which at least solves the problem that the performance of recommendation models in the prior art depends excessively on correct labels.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

A science popularization theme and core knowledge extraction method based on concept lattices comprises the following steps:

S1, acquiring science popularization themes as an object set $O=\{o_1, o_2, \dots, o_n\}$, acquiring the core knowledge corresponding to each science popularization theme as an attribute set $A=\{a_1, a_2, \dots, a_m\}$, and acquiring the audience groups of the science popularization theme contents as a label set $D=\{d_1, d_2, \dots, d_r\}$, where $n$, $m$ and $r$ are all greater than 0;

S2, constructing a decision formal context $F=(O, A, I_{OA}, D, I_{OD})$, where $I_{OA}$ is the binary relation between the object set $O$ and the attribute set $A$: if $I_{OA}(o_i, a_j)=1$, object $o_i$ possesses attribute $a_j$; conversely, if $I_{OA}(o_i, a_j)=0$, object $o_i$ does not possess attribute $a_j$. $I_{OD}$ is the binary relation between the object set $O$ and the label set $D$: if $I_{OD}(o_i, d_j)=1$, the audience group of object $o_i$ is $d_j$; conversely, if $I_{OD}(o_i, d_j)=0$, the audience group of object $o_i$ is not $d_j$;

S3, selecting a subset of objects $O' \subseteq O$ from the decision formal context to form a decision formal sub-context;

S4, on the formal sub-contexts $(O', A, I_{O'A})$ and $(O', D, I_{O'D})$, constructing the concept lattices $L(O', A, I_{O'A})$ and $L(O', D, I_{O'D})$ respectively according to the definition of a concept lattice;

S5, starting from the concept lattices $L(O', A, I_{O'A})$ and $L(O', D, I_{O'D})$, constructing by an incremental method the concept lattices $L(O, A, I_{OA})$ and $L(O, D, I_{OD})$ of the formal contexts $(O, A, I_{OA})$ and $(O, D, I_{OD})$;

S6, acquiring the non-trivial decision rules according to the definition of a decision rule and putting all non-trivial decision rules that satisfy the conditions into a rule set $R$, which completes the extraction of rules and knowledge.
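Steps S2 and S3 can be sketched with plain Python containers. This is a minimal illustration under the assumption that each object's row of the 0/1 table is available as a set of attributes and a set of labels; the `DecisionContext` class, its method names, and the sample rows t1 to t3 are hypothetical and not part of the method's text.

```python
from dataclasses import dataclass


@dataclass
class DecisionContext:
    """Decision formal context F = (O, A, I_OA, D, I_OD), stored row by row."""
    attributes: frozenset  # attribute set A (core knowledge)
    labels: frozenset      # label set D (audience groups)
    i_oa: dict             # object -> {a in A : I_OA(o, a) = 1}
    i_od: dict             # object -> {d in D : I_OD(o, d) = 1}

    @property
    def objects(self):
        """The object set O."""
        return frozenset(self.i_oa)

    def has_attribute(self, o, a):
        """I_OA(o, a) as 1 or 0."""
        return int(a in self.i_oa[o])

    def has_label(self, o, d):
        """I_OD(o, d) as 1 or 0."""
        return int(d in self.i_od[o])

    def sub_context(self, chosen):
        """S3: keep only the rows of the object subset O' = chosen."""
        chosen = frozenset(chosen)
        if not chosen <= self.objects:
            raise ValueError("O' must be a subset of O")
        return DecisionContext(
            self.attributes,
            self.labels,
            {o: self.i_oa[o] for o in chosen},
            {o: self.i_od[o] for o in chosen},
        )


# Hypothetical rows: object -> (attributes it has, labels it has).
ROWS = {
    "t1": ({"a1"}, {"d1"}),
    "t2": ({"a1", "a2"}, {"d2"}),
    "t3": ({"a2"}, {"d1"}),
}
F = DecisionContext(
    attributes=frozenset({"a1", "a2"}),
    labels=frozenset({"d1", "d2"}),
    i_oa={o: frozenset(a) for o, (a, _) in ROWS.items()},
    i_od={o: frozenset(d) for o, (_, d) in ROWS.items()},
)
F_sub = F.sub_context({"t1", "t2"})  # decision formal sub-context F'
print(sorted(F_sub.objects))  # ['t1', 't2']
```

Storing rows as sets rather than a dense matrix keeps the derivation operators used in S4 and S5 to simple set intersections.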

Preferably, obtaining the concept lattice $L(O, A, I_{OA})$ in S5 specifically comprises:

if a new object $o_i$ with $o_i \in O$ and $o_i \notin O'$ is added to the concept lattice $L(O', A, I_{O'A})$, each original concept $(X, B)$ of $L(O', A, I_{O'A})$ changes according to the following cases:

(1) if $B \cap f(o_i) = \emptyset$, where $f(o_i)$ denotes the set of attributes possessed by object $o_i$, the concept $(X, B)$ is not updated, unless $B$ is the empty set;

(2) If B.u.f (o) _i ) The concept (X, B) will be updated to (X ∈ { o) _i }，B)；

(3) if $B \cap f(o_i) \neq \emptyset$ and $B \cap f(o_i) \neq B$, and there is no object $o_j$ other than $o_i$ with $o_j \notin X$ and $B \cap f(o_i) \subseteq f(o_j)$, a new concept $(X \cup \{o_i\}, B \cap f(o_i))$ is generated from the concept $(X, B)$;

(4) if there is no $j < i$ such that $f(o_i) \subseteq f(o_j)$, object $o_i$ generates a new concept $(\{o_i\}, f(o_i))$ by itself;

By adding objects one at a time in this way, the concept lattice $L(O, A, I_{OA})$ of the formal context $(O, A, I_{OA})$ is finally obtained.
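The update cases above can be sketched as follows, under our reading of the conditions and in the style of classic incremental lattice construction. Concepts are stored as (extent, intent) pairs of frozensets. For each existing concept (X, B) the code computes Z = B ∩ f(o_i): if Z = B the extent grows (case (2)); otherwise the old concept is kept and (X ∪ {o_i}, Z) is added only when g(Z) = X in the old context, which is our interpretation of the "no other object o_j" test of case (3). Running that test also when Z is empty leaves every concept except the top one unchanged, as in case (1), and creates a new top concept when one is needed. The helper names and the toy context are hypothetical; the result is checked against enumeration by definition. The same routine should apply to the label relation $I_{OD}$ with the label set in place of `ATTRS`.

```python
from itertools import combinations

# Hypothetical attribute set; the helpers take the context explicitly so they
# can be evaluated before and after an object is added.
ATTRS = frozenset({"a", "b", "c", "d"})


def f(objs, context):
    """Attributes common to all objects in objs (every attribute if objs is empty)."""
    common = set(ATTRS)
    for o in objs:
        common &= context[o]
    return frozenset(common)


def g(attrs, context):
    """Objects owning every attribute in attrs."""
    return frozenset(o for o, owned in context.items() if attrs <= owned)


def all_concepts(context):
    """Reference by definition: close every object subset X as (g(f(X)), f(X))."""
    found = set()
    objs = list(context)
    for k in range(len(objs) + 1):
        for x in combinations(objs, k):
            b = f(x, context)
            found.add((g(b, context), b))
    return found


def add_object(lattice, context, new_obj, new_attrs):
    """Update the lattice of `context` after adding new_obj with f(new_obj) = new_attrs."""
    new_attrs = frozenset(new_attrs)
    updated = set()
    for extent, intent in lattice:
        z = intent & new_attrs
        if z == intent:
            # Case (2): B ⊆ f(o_i), so o_i joins the extent.
            updated.add((extent | {new_obj}, intent))
            continue
        # Cases (1) and (3): the old concept itself stays unchanged.
        updated.add((extent, intent))
        # Case (3) test: (X ∪ {o_i}, Z) is new only if no old object outside X owns all of Z.
        if g(z, context) == extent:
            updated.add((extent | {new_obj}, z))
    if all(intent != new_attrs for _, intent in updated):
        # Case (4): add ({o_i}, f(o_i)) only if no concept with that intent exists yet.
        updated.add((frozenset({new_obj}), new_attrs))
    new_context = {**context, new_obj: new_attrs}
    return updated, new_context


ctx = {"o1": frozenset({"a", "b"}), "o2": frozenset({"b", "c"})}
lattice = all_concepts(ctx)
lattice, ctx = add_object(lattice, ctx, "o3", {"a", "b", "d"})
print(lattice == all_concepts(ctx))  # True
```

With this reading, case (4) never adds anything that the case (3) test has not already produced, which matches the embodiment's remark that the case (4) concept already exists.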

Preferably, obtaining the concept lattice $L(O, D, I_{OD})$ in S5 specifically comprises:

if a new object $o_i$ with $o_i \in O$ and $o_i \notin O'$ is added to the concept lattice $L(O', D, I_{O'D})$, each original concept $(Y, C)$ of $L(O', D, I_{O'D})$ changes according to the following cases:

(1) if $C \cap f(o_i) = \emptyset$, where $f(o_i)$ here denotes the set of labels possessed by object $o_i$, the concept $(Y, C)$ is not updated, unless $C$ is the empty set;

(2) If C.u.f (o) _i ) The concept (Y, C) will be updated to (Y ≡o- _i }，C)；

(3) if $C \cap f(o_i) \neq \emptyset$ and $C \cap f(o_i) \neq C$, and there is no object $o_j$ other than $o_i$ with $o_j \notin Y$ and $C \cap f(o_i) \subseteq f(o_j)$, a new concept $(Y \cup \{o_i\}, C \cap f(o_i))$ is generated from the concept $(Y, C)$;

By adding objects one at a time in this way, the concept lattice $L(O, D, I_{OD})$ of the formal context $(O, D, I_{OD})$ is finally obtained.

Preferably, acquiring the non-trivial decision rules according to the definition of a decision rule in S6 specifically comprises: for $(X, B) \in L(O, A, I_{OA})$ and $(Y, C) \in L(O, D, I_{OD})$, if $X \subseteq Y$, then $B \rightarrow C$ is called a decision rule; if $X$, $Y$, $B$ and $C$ are all non-empty, $B \rightarrow C$ is called a non-trivial decision rule.
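The rule definition translates directly into a double loop over the two lattices. In this minimal sketch both lattices are given as sets of (extent, intent) frozenset pairs; they were worked out by hand for a hypothetical two-object context (o1 has attributes a1 and a2 and label d1; o2 has attribute a2 and label d2), and the function name `decision_rules` is illustrative.

```python
def decision_rules(lattice_a, lattice_d):
    """Non-trivial rules B -> C for (X, B) in L(O, A, I_OA) and (Y, C) in L(O, D, I_OD)."""
    rules = set()
    for x, b in lattice_a:
        for y, c in lattice_d:
            # Decision rule: X ⊆ Y; non-trivial: X, Y, B and C are all non-empty.
            if x <= y and x and y and b and c:
                rules.add((b, c))
    return rules


def fs(*items):
    """Shorthand for a frozenset of the given items."""
    return frozenset(items)


# Hand-computed lattices for the hypothetical context described in the lead-in.
L_A = {(fs("o1"), fs("a1", "a2")), (fs("o1", "o2"), fs("a2"))}
L_D = {(fs("o1", "o2"), fs()), (fs("o1"), fs("d1")), (fs("o2"), fs("d2")), (fs(), fs("d1", "d2"))}

for b, c in decision_rules(L_A, L_D):
    print(", ".join(sorted(b)), "->", ", ".join(sorted(c)))  # a1, a2 -> d1
```

The attribute a2 alone yields no non-trivial rule here, because it is shared by objects with different labels and its extent {o1, o2} is contained only in the label concept with an empty intent.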

Compared with the prior art, the disclosed concept-lattice-based method for extracting science popularization themes and core knowledge has the following beneficial effects:

Formal concept analysis helps identify diverse and accurate concepts that better meet users' needs, and representing different concepts in a concept lattice provides more diversity. Decision rules based on formal concept analysis are constructed from concept lattices and have a clear hierarchical structure, which makes them easier to analyze and interpret and helps avoid contradictory decisions. In addition, because of the multi-level relationships between concepts, the model adapts better to changeable and complex situations; in particular, it can still provide a considerable number of rules when the data volume is small, so it is widely applicable to relatively small datasets. Decision rules based on a decision formal context are also general and adaptable, and can easily be applied to fields such as recommendation models by constructing indices such as similarity. Most importantly, more decision rules are generated from the decision formal context, and they are not easily affected by inaccurate or incomplete labels, so compared with the prior art the method is more robust and performs better in fields such as recommendation models.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic flow chart of a science popularization topic and core knowledge extraction method based on concept lattices.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The invention provides a science popularization theme and core knowledge extraction method based on concept lattices, which is shown in fig. 1 and comprises the following steps:

S1, acquiring science popularization themes (for example, various algorithm-related science popularization themes) as an object set $O=\{o_1, o_2, \dots, o_n\}$, acquiring the core knowledge corresponding to each science popularization theme (for example, decision tree algorithms, clustering algorithms, neural network algorithms, etc.) as an attribute set $A=\{a_1, a_2, \dots, a_m\}$, and acquiring the audience groups of the science popularization themes (for example, primary school, junior middle school, high school, university and above) as a label set $D=\{d_1, d_2, \dots, d_r\}$, where $n$, $m$ and $r$ are all greater than 0;

S4, on the formal sub-contexts $(O', A, I_{O'A})$ and $(O', D, I_{O'D})$, constructing the concept lattices $L(O', A, I_{O'A})$ and $L(O', D, I_{O'D})$ respectively according to the definition of a concept lattice;

S5, on the basis of the concept lattices $L(O', A, I_{O'A})$ and $L(O', D, I_{O'D})$, constructing by an incremental method the concept lattices $L(O, A, I_{OA})$ and $L(O, D, I_{OD})$ corresponding to the formal contexts $(O, A, I_{OA})$ and $(O, D, I_{OD})$;

S6, acquiring non-trivial decision rules according to the decision rule definition, putting all non-trivial decision rules meeting the conditions into a rule set R, and taking the non-trivial decision rules as finishing rules and knowledge extraction.

The content acquired in the present embodiment includes:

object set $O$ = {algorithm 1, algorithm 2, algorithm 3, algorithm 4};

attribute set $A$ = {cluster, neural network, decision tree, naive Bayes};

label set $D$ = {primary and secondary school, university and above};

The decision formal context of this embodiment is shown in Table 1, where "1" indicates that the science popularization content has the corresponding knowledge (or the corresponding label) and "0" indicates that it does not.

TABLE 1

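In S3, a smaller decision formal sub-context is selected from the decision formal context. In this embodiment, only the object set O' = {algorithm 1, algorithm 2, algorithm 3} is considered, giving the decision formal sub-context $F'=(O', A, I_{O'A}, D, I_{O'D})$. In the formal sub-context $(O', A, I_{O'A})$, a pair $(X, B)$ is a concept when $f(X)=B$ and $g(B)=X$, where $f(X)$ is the set of attributes shared by all objects in $X$ and $g(B)$ is the set of objects possessing all attributes in $B$. Every subset $X \subseteq O'$ is therefore considered, the pair $(gf(X), f(X))$ is formed for each, and duplicate concepts are removed, which yields $L(O', A, I_{O'A})$ = {(O', ∅), ({algorithm 1, algorithm 2}, {cluster}), ({algorithm 2}, {cluster, neural network}), ({algorithm 3}, {decision tree, naive Bayes}), (∅, A)}. In the same way, $L(O', D, I_{O'D})$ is obtained.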
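The enumeration described above can be sketched in a few lines of Python. Table 1 is not reproduced in this text, so the rows below are an assumption: they are the attribute sets we infer for algorithms 1 to 3 from the concepts and updates discussed in this embodiment. The helper names are hypothetical.

```python
from itertools import combinations

A = frozenset({"cluster", "neural network", "decision tree", "naive Bayes"})
# Assumed rows of Table 1 for O' = {algorithm 1, algorithm 2, algorithm 3} (inferred, not quoted).
ROWS = {
    "algorithm 1": frozenset({"cluster"}),
    "algorithm 2": frozenset({"cluster", "neural network"}),
    "algorithm 3": frozenset({"decision tree", "naive Bayes"}),
}


def f(objs):
    """Attributes shared by all objects in objs (all of A for the empty set)."""
    common = set(A)
    for o in objs:
        common &= ROWS[o]
    return frozenset(common)


def g(attrs):
    """Objects of O' that possess every attribute in attrs."""
    return frozenset(o for o, owned in ROWS.items() if attrs <= owned)


def concepts_by_definition():
    """Form (gf(X), f(X)) for every X ⊆ O'; the set drops duplicate concepts."""
    found = set()
    objs = sorted(ROWS)
    for k in range(len(objs) + 1):
        for x in combinations(objs, k):
            found.add((g(f(x)), f(x)))
    return found


L_SUB = concepts_by_definition()
for extent, intent in sorted(L_SUB, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), "|", sorted(intent))
```

Enumerating all 2^|O'| subsets is only practical for small sub-contexts, which is presumably why the method builds the full lattices incrementally in S5 instead.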

In order to further implement the above technical solution, obtaining the concept lattice $L(O, A, I_{OA})$ in S5 specifically comprises:

(3) if $B \cap f(o_i) \neq \emptyset$ and $B \cap f(o_i) \neq B$, and there is no object $o_j$ other than $o_i$ with $o_j \notin X$ and $B \cap f(o_i) \subseteq f(o_j)$, a new concept $(X \cup \{o_i\}, B \cap f(o_i))$ is generated from the concept $(X, B)$;

In order to further implement the above technical solution, obtaining the concept lattice $L(O, D, I_{OD})$ in S5 specifically comprises:

if a new object $o_i$ with $o_i \in O$ and $o_i \notin O'$ is added to the concept lattice $L(O', D, I_{O'D})$, each original concept $(Y, C)$ of $L(O', D, I_{O'D})$ changes according to the following cases:

In this embodiment:

A new object, algorithm 4, is now introduced. According to the incremental method, for the concept (∅, A), since A ∩ f({algorithm 4}) ≠ ∅ and A ∩ f({algorithm 4}) ≠ A, a new concept ({algorithm 4}, {cluster, neural network, decision tree}) is generated according to case (3);

for the concept ({algorithm 1, algorithm 2}, {cluster}), since {cluster} ∩ f({algorithm 4}) = {cluster}, the concept is updated according to case (2) to ({algorithm 1, algorithm 2, algorithm 4}, {cluster});

for the concept ({algorithm 3}, {decision tree, naive Bayes}), since {decision tree, naive Bayes} ∩ f({algorithm 4}) = {decision tree}, which is non-empty and not equal to {decision tree, naive Bayes}, a new concept ({algorithm 3, algorithm 4}, {decision tree}) is generated according to case (3);

for the concept ({algorithm 2}, {cluster, neural network}), since {cluster, neural network} ∩ f({algorithm 4}) = {cluster, neural network}, the concept is updated according to case (2) to ({algorithm 2, algorithm 4}, {cluster, neural network});

in particular, since ∅ ⊆ f({algorithm 4}), the concept (O', ∅) is updated to (O, ∅). Case (4) would generate the concept ({algorithm 4}, {cluster, neural network, decision tree}), but this concept already exists and is therefore not considered.

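Therefore, $L(O, A, I_{OA})$ = {(O, ∅), ({algorithm 1, algorithm 2, algorithm 4}, {cluster}), ({algorithm 2, algorithm 4}, {cluster, neural network}), ({algorithm 3}, {decision tree, naive Bayes}), ({algorithm 3, algorithm 4}, {decision tree}), ({algorithm 4}, {cluster, neural network, decision tree}), (∅, A)}.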

$L(O, D, I_{OD})$ is obtained in the same way.
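The update of the running example can be replayed with a short sketch. It starts from the five concepts of $L(O', A, I_{O'A})$ handled in the walk-through above, assumes f(algorithm 4) = {cluster, neural network, decision tree} (as implied by the new concept generated from (∅, A)), and uses inferred rows for algorithms 1 to 3, since Table 1 is not reproduced here. The function `add_object` is a hypothetical helper implementing cases (2) and (3) as we read them; case (4) is included but, as noted above, does not add anything in this example.

```python
# Assumed rows for O' (inferred from this embodiment; Table 1 itself is not reproduced).
CONTEXT = {
    "algorithm 1": frozenset({"cluster"}),
    "algorithm 2": frozenset({"cluster", "neural network"}),
    "algorithm 3": frozenset({"decision tree", "naive Bayes"}),
}
A = frozenset({"cluster", "neural network", "decision tree", "naive Bayes"})


def concept(extent, intent):
    """Build an (extent, intent) pair of frozensets."""
    return frozenset(extent), frozenset(intent)


# The five concepts of L(O', A, I_O'A), from the top concept (O', ∅) to (∅, A).
L_SUB = {
    concept(["algorithm 1", "algorithm 2", "algorithm 3"], []),
    concept(["algorithm 1", "algorithm 2"], ["cluster"]),
    concept(["algorithm 2"], ["cluster", "neural network"]),
    concept(["algorithm 3"], ["decision tree", "naive Bayes"]),
    concept([], A),
}


def g(attrs, context):
    """Objects owning every attribute in attrs."""
    return frozenset(o for o, owned in context.items() if attrs <= owned)


def add_object(lattice, context, new_obj, new_attrs):
    """Apply the update cases to every concept (X, B) when new_obj is added."""
    new_attrs = frozenset(new_attrs)
    result = set()
    for extent, intent in lattice:
        z = intent & new_attrs
        if z == intent:                    # case (2): o_i joins the extent
            result.add((extent | {new_obj}, intent))
            continue
        result.add((extent, intent))       # the old concept is kept as it is
        if g(z, context) == extent:        # case (3): no other object owns all of Z
            result.add((extent | {new_obj}, z))
    if all(intent != new_attrs for _, intent in result):
        result.add((frozenset({new_obj}), new_attrs))  # case (4)
    return result


L_FULL = add_object(L_SUB, CONTEXT, "algorithm 4", {"cluster", "neural network", "decision tree"})
for extent, intent in sorted(L_FULL, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), "|", sorted(intent))
```

Under these assumptions the result has seven concepts: the three updated by case (2), the two generated by case (3), and the unchanged concepts ({algorithm 3}, {decision tree, naive Bayes}) and (∅, A).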

In order to further implement the above technical solution, acquiring the non-trivial decision rules according to the definition of a decision rule in S6 specifically comprises: for $(X, B) \in L(O, A, I_{OA})$ and $(Y, C) \in L(O, D, I_{OD})$, if $X \subseteq Y$, then $B \rightarrow C$ is called a decision rule; if $X$, $Y$, $B$ and $C$ are all non-empty, $B \rightarrow C$ is called a non-trivial decision rule.

In this embodiment:

According to the definition of a decision rule, $L(O, A, I_{OA}) \times L(O, D, I_{OD})$ is traversed to find concept pairs whose extents satisfy the condition. For example, for ({algorithm 3}, {decision tree, naive Bayes}) ∈ $L(O, A, I_{OA})$ and ({algorithm 1, algorithm 3}, {primary and secondary school}) ∈ $L(O, D, I_{OD})$, since {algorithm 3} ⊆ {algorithm 1, algorithm 3}, {decision tree, naive Bayes} → {primary and secondary school} is a non-trivial decision rule.

Repeating the above steps yields all non-trivial decision rules R = {{decision tree, naive Bayes} → {primary and secondary school}, {cluster, neural network} → {university and above}, {cluster, neural network, decision tree} → {university and above}}.
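As a check on the rule set, the sketch below applies the decision-rule definition to the final lattices. $L(O, A, I_{OA})$ is the seven-concept lattice obtained in the walk-through above; $L(O, D, I_{OD})$ is an assumption, inferred from the labels implied by this embodiment (algorithms 1 and 3 for primary and secondary school, algorithms 2 and 4 for university and above), because only the concept ({algorithm 1, algorithm 3}, {primary and secondary school}) is stated explicitly. The helper names are hypothetical.

```python
def concept(extent, intent):
    """Build an (extent, intent) pair of frozensets."""
    return frozenset(extent), frozenset(intent)


ALL = ["algorithm 1", "algorithm 2", "algorithm 3", "algorithm 4"]
A = ["cluster", "neural network", "decision tree", "naive Bayes"]
D = ["primary and secondary school", "university and above"]

L_OA = {
    concept(ALL, []),
    concept(["algorithm 1", "algorithm 2", "algorithm 4"], ["cluster"]),
    concept(["algorithm 2", "algorithm 4"], ["cluster", "neural network"]),
    concept(["algorithm 3"], ["decision tree", "naive Bayes"]),
    concept(["algorithm 3", "algorithm 4"], ["decision tree"]),
    concept(["algorithm 4"], ["cluster", "neural network", "decision tree"]),
    concept([], A),
}
# Assumed label lattice (inferred; see lead-in).
L_OD = {
    concept(ALL, []),
    concept(["algorithm 1", "algorithm 3"], ["primary and secondary school"]),
    concept(["algorithm 2", "algorithm 4"], ["university and above"]),
    concept([], D),
}


def decision_rules(lattice_a, lattice_d):
    """B -> C for every pair with X ⊆ Y and X, Y, B, C all non-empty."""
    return {
        (b, c)
        for x, b in lattice_a
        for y, c in lattice_d
        if x <= y and x and y and b and c
    }


R = decision_rules(L_OA, L_OD)
for b, c in sorted(R, key=lambda r: sorted(r[0])):
    print("{" + ", ".join(sorted(b)) + "} -> {" + ", ".join(sorted(c)) + "}")
```

Under these assumptions the loop returns exactly the three rules of R; concepts such as ({algorithm 1, algorithm 2, algorithm 4}, {cluster}) produce no rule because their extents span both audience groups.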

The invention applies decision rules based on formal concept analysis to the extraction of science popularization content, which brings several benefits. First, decision rules based on formal concept analysis can identify and extract content related to science popularization themes more accurately, ensuring the quality and relevance of science popularization resources without relying excessively on labels. Second, concept lattices based on formal concept analysis represent science popularization content in a structured manner, making it easier to understand and organize and improving the discoverability and usability of knowledge. Finally, the resulting decision rules can be used to build more intelligent science popularization platforms for education and training, or to provide decision support that helps users make informed decisions in scientific and technical fields, thereby improving scientific literacy and scientific decision-making ability.

The above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims

1. A science popularization theme and core knowledge extraction method based on concept lattices, characterized by comprising the following steps:

2. The science popularization theme and core knowledge extraction method based on concept lattices according to claim 1, wherein obtaining the concept lattice $L(O, A, I_{OA})$ in step S5 specifically comprises:

3. The science popularization theme and core knowledge extraction method based on concept lattices according to claim 1, wherein obtaining the concept lattice $L(O, D, I_{OD})$ in step S5 specifically comprises:

4. The science popularization theme and core knowledge extraction method based on concept lattices according to claim 1, wherein acquiring the non-trivial decision rules according to the definition of a decision rule in S6 specifically comprises: for $(X, B) \in L(O, A, I_{OA})$ and $(Y, C) \in L(O, D, I_{OD})$, if $X \subseteq Y$, then $B \rightarrow C$ is called a decision rule; if $X$, $Y$, $B$ and $C$ are all non-empty, $B \rightarrow C$ is called a non-trivial decision rule.