CN114595337A - Method for constructing curriculum knowledge graph based on GMM - Google Patents
- Publication number
- CN114595337A (application CN202210109036.8A)
- Authority
- CN
- China
- Prior art keywords
- knowledge
- test question
- test
- chapter
- question
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a method for constructing a curriculum knowledge graph based on a Gaussian mixture model (GMM), which comprises the following steps: grouping the test questions by chapter, preprocessing and cleaning the test questions in each chapter to obtain structured test question data, and then performing jieba word segmentation on the valid test question strings to generate the word frequency matrix of the test questions; based on the processed word frequency matrix, clustering the test question knowledge points and extracting their features with a GMM to generate a test question knowledge point model; meanwhile, identifying chapter knowledge points and extracting their features from the structured test question data to generate a chapter knowledge point model; and, based on the test question knowledge point model and the chapter knowledge point model, integrating the two types of knowledge points into a course knowledge graph by using knowledge graph technology. The invention takes massive numbers of course test questions as the research object, identifies test question knowledge points and their associations by Gaussian mixture clustering, combines them with the existing chapter knowledge point system, and reconstructs the course knowledge system by using knowledge graph technology.
Description
Technical Field
The invention belongs to the technical field of education data mining, and particularly relates to a method for constructing a course knowledge graph based on GMM.
Background
To meet the needs of sustainable economic and social development, China is committed to building a high-quality education system. The rapid development of information technology and the continuous emergence of educational big data provide new opportunities for improving the quality of education and teaching. In the existing teaching model, the course syllabus is an important basis for course teaching and for evaluating teaching quality, and course knowledge points are generally organized into a tree-shaped knowledge system that follows the chapter hierarchy, so as to guide student learning and establish course assessment standards. However, under intense academic competition, especially in secondary education, examinations serve a selection function, so test questions are designed from novel angles and emphasize the fusion and comprehensive application of knowledge points. Students therefore have to rely on intensive drilling with large numbers of exercises to strengthen their understanding and comprehensive application of knowledge, and course assessment gradually departs from the paradigm prescribed by the course syllabus. The tree-shaped knowledge system in the existing course syllabus can no longer meet the needs of course teaching and examination.
In addition, as society and the economy develop, new technologies and knowledge keep emerging, and the expression and understanding of course knowledge points change dynamically. For example, in Chinese middle school textbooks of different periods, the treatment of crystals has evolved: earlier textbooks used the term "crystallization" and covered crystals only as a knowledge point within the concept of solutions, whereas later textbooks list crystals as a separate topic, adding crystal classification, structure models, unit cells, close packing and simple crystal-related calculations. Therefore, a course knowledge system should present knowledge point features comprehensively, accurately and dynamically, and it should be a network structure that expresses many-to-many associations between knowledge points.
Disclosure of Invention
The invention provides a method for constructing a course knowledge graph based on a Gaussian mixture model (GMM), which aims to solve the problem that the tree-shaped knowledge system in the existing course syllabus cannot present knowledge point features comprehensively, accurately and dynamically.
To achieve this purpose, the invention adopts the following technical scheme:
a method for constructing a curriculum knowledge graph based on GMM comprises the following steps:
step A, grouping the test questions by chapter, and preprocessing the test questions in each chapter and performing Chinese word segmentation, which includes cleaning the test questions containing invalid characters and picture content to obtain structured test question data, then performing jieba word segmentation on the valid test question strings to generate the word frequency matrix of the test questions, and processing the word frequency matrix;
step B, identifying the two types of knowledge points (test question knowledge points and chapter knowledge points) and extracting their features, which includes clustering the test question knowledge points and extracting their features with a GMM based on the processed word frequency matrix to generate a test question knowledge point model; meanwhile, identifying chapter knowledge points and extracting their features based on the structured test question data to generate a chapter knowledge point model;
step C, based on the generated test question knowledge point model and chapter knowledge point model, integrating the two types of knowledge points into a course knowledge graph by using knowledge graph technology.
Further, step A includes:
step A1, modeling the test questions, including deleting the picture part of each test question and cleaning useless characters, wherein the useless characters include numbers, letters, spaces and line feeds; organizing the structure of the test question data, introducing the attributes "chapter", "question type" and "score", and creating a structured test question data model; the question bank after cleaning and preprocessing is defined as $Q = \{q_i \mid i = 1, \dots, I\}$, wherein $q_i = \{V, W, e, t\}$ represents the i-th test question, V represents the cleaned test question string, W represents the valid word set of the test question (the segmentation result), e represents the chapter to which the test question belongs, t represents the test question type, and I represents the number of test questions in the question bank;
step A2, generating the word frequency matrix of the test questions, including performing Chinese word segmentation on the valid test question strings with jieba, decomposing each valid test question string into a phrase set $W_a$, and then filtering out irrelevant phrases and low-frequency phrases by using the stop-word list $W_s$; the complete phrase set of all test questions is denoted $\mathcal{W} = \{w_j\}$, wherein $w_j$ is the j-th phrase; for the question bank $Q$, the word frequency matrix is $F = [f_{ij}]$, wherein $f_{ij}$ indicates whether the j-th phrase $w_j$ appears in the i-th test question $q_i$ of the question bank;
step A3, de-duplicating the word frequency matrix $F$ by merging strongly correlated phrase columns: a phrase correlation threshold $\rho$ is defined, and if the Pearson correlation coefficient of any two phrases x and y satisfies $\rho_{x,y} > 0.8$, the two phrases are merged into a new phrase $w_{x,y} = w_x \cup w_y$, and $w_x$ and $w_y$ are deleted; upper and lower phrase coverage thresholds $[\eta_{min}, \eta_{max}]$ are defined, and only phrase features within this range are used for knowledge point identification.
Further, step B includes:
step B1, for the question bank $Q$, the probability distribution $p(q)$ of the Gaussian mixture model (GMM) of a test question q is given by formula (1):
$$p(q) = \sum_{k=1}^{K} \alpha_k \, \phi(q \mid \mu_k, \Sigma_k) \tag{1}$$
wherein K represents the number of knowledge points contained in the test questions; $\phi(q \mid \mu_k, \Sigma_k)$ is the Gaussian density function of the k-th knowledge point; $\alpha_k$ represents the probability that the test question contains the k-th knowledge point and satisfies $\sum_{k=1}^{K} \alpha_k = 1$; $\mu_k$ represents the mean; $\Sigma_k$ represents the covariance;
the parameters $\alpha_k$, $\mu_k$ and $\Sigma_k$ are estimated with the EM algorithm: first, the probability $\gamma_{ik}$ that test question $q_i$ contains knowledge point k is calculated, as shown in formula (2); then $\alpha_k$, $\mu_k$ and $\Sigma_k$ are calculated as shown in formulas (3) to (5);
given a convergence threshold $\varepsilon$, formulas (2) to (5) are computed iteratively until $|\alpha'_k - \alpha'_{k-1}| \le \varepsilon$, yielding $\alpha_k$, $\mu_k$ and $\Sigma_k$;
the distribution of knowledge points in the test questions is determined by using formula (1), and a many-to-many mapping between test questions and knowledge points is constructed; a range for the number of clusters is set, the clustering results are evaluated with the Bayesian information criterion (BIC), and the optimal number of clusters, namely the number of knowledge points in each chapter, is selected;
step B2, according to the clustering result of the test questions, let $\mathcal{K} = \{K_i\}$ denote the complete set of test question knowledge points, wherein $K_i$ is the i-th test question knowledge point; the features of a test question knowledge point are expressed by the phrases that co-occur most frequently in the test questions belonging to it: let $K = \{\langle q \rangle, \langle w \rangle\}$ denote a test question knowledge point, wherein $\langle q \rangle$ and $\langle w \rangle$ are the test question set and the feature set of K respectively; the feature set $\langle w \rangle$ consists of the several phrases with the highest coverage of the test question knowledge point, and the overall coverage of $\langle w \rangle$ over the test question knowledge point is 100%;
step B3, identifying chapter knowledge points and extracting their features: based on the chapter attribute q.e of each test question in the structured question bank $Q$, the chapter-level knowledge structure of the course is identified and each chapter knowledge point is identified; a set of high-coverage phrases is used to represent the features of a chapter knowledge point: let $\mathcal{C} = \{C_i\}$ denote the complete set of course chapters and let $C = \{\langle q \rangle, \langle w \rangle\}$ denote a chapter knowledge point, wherein $\langle q \rangle$ and $\langle w \rangle$ are the test question set and the feature set of C respectively; the feature set $\langle w \rangle$ consists of the several phrases with the highest coverage of the chapter knowledge point, and the overall coverage of $\langle w \rangle$ over the chapter knowledge point is 100%.
Further, step C includes:
adopting the RDF technology of knowledge graphs, the course knowledge system is defined as a set of triples that describes two kinds of association: the association between knowledge point entities, and the association between chapter entities and knowledge point entities; the first kind of triple represents knowledge point entities and their relationship, wherein the association between knowledge point entities $K_a$ and $K_b$ is the set of phrases that co-occur in $K_a$ and $K_b$; $G_c = \langle\langle C_x, e, K_y \rangle\rangle$ represents that the association between chapter entity $C_x$ and knowledge point entity $K_y$ is e, which is consistent with the chapter attribute of the test questions.
Compared with the prior art, the invention has the following beneficial effects:
The invention takes massive numbers of course test questions as the research object, identifies test question knowledge points and their associations by Gaussian mixture clustering, combines them with the existing chapter knowledge point system, and reconstructs the course knowledge system by using knowledge graph technology. Because knowledge points and their associations are expressed with characteristic phrases of the test questions, the knowledge system can be constructed simply and efficiently, and the need to understand and compute the complex semantics of knowledge points is avoided.
Drawings
Fig. 1 is a flowchart of a method for constructing a curriculum knowledge graph based on a GMM according to an embodiment of the present invention.
Detailed Description
The invention is further illustrated by the following embodiment in conjunction with the accompanying drawing.
The concepts related to knowledge points are defined first. A "test question knowledge point" is a knowledge entity that is derived from the test questions and characterized by several phrases; a "chapter knowledge point" is a knowledge entity in the course syllabus that is defined by the chapter hierarchy. The invention aims to identify and describe test question knowledge points and chapter knowledge points and to reconstruct the course knowledge graph from the data of a course test question bank.
The method is roughly divided into three steps. First, the original test question data is preprocessed and segmented into Chinese words, which includes test question modeling, word frequency matrix generation, and phrase merging and screening, and produces a structured question bank and a word frequency matrix. Second, the two types of knowledge points are identified and their features extracted: the test question knowledge points are clustered and their features extracted with a Gaussian mixture model (GMM) to generate a test question knowledge point model, and at the same time the chapter knowledge points are identified and their features extracted to generate a chapter knowledge point model. Finally, the two types of knowledge points are integrated into a course knowledge graph by using knowledge graph technology. The technical roadmap is shown in Fig. 1.
Specifically, the method for constructing the curriculum knowledge graph based on GMM comprises the following steps:
Step A: the test questions are grouped by chapter, and the test questions in each chapter are preprocessed and segmented into Chinese words. To allow quantitative computation on the test questions, test questions containing text, pictures and other content are first preprocessed and cleaned to obtain structured test question data, and jieba word segmentation is then performed on the valid test question strings to generate the word frequency matrix of the test questions. The preprocessing and cleaning of the test question data comprises test question modeling, word frequency matrix generation, and phrase merging and screening.
The step A comprises the following specific steps:
Step A1, the test questions are modeled first. Because the method only processes the text part of a test question, pictures are invalid information and are deleted; in addition, useless characters (such as numbers, letters, spaces and line feeds) are removed to improve the accuracy of word segmentation. Preprocessing means organizing the structure of the test question data, introducing attributes such as "chapter", "question type" and "score", and creating a structured test question data model. The question bank after cleaning and preprocessing is defined as $Q = \{q_i \mid i = 1, \dots, I\}$, where $q_i = \{V, W, e, t\}$ represents the i-th test question, V represents the cleaned test question string, W represents the valid word set (valid test question string set), i.e., the segmentation result, e represents the chapter to which the test question belongs, t represents the test question type, and I represents the number of test questions in the question bank.
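The cleaning and modeling in step A1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Question` class, the `clean` and `build_bank` helpers, and the use of `<img ...>` tags to stand in for picture content are all hypothetical; the description only names the fields V, W, e and t.

```python
import re
from dataclasses import dataclass, field

# Assumption: pictures appear as <img ...> tags in the raw text, and
# "useless characters" are digits, ASCII letters and all whitespace.
_IMG = re.compile(r"<img[^>]*>")
_USELESS = re.compile(r"[0-9A-Za-z\s]+")


@dataclass
class Question:
    V: str                                  # cleaned test question string
    e: str                                  # chapter the question belongs to
    t: str                                  # question type
    W: list = field(default_factory=list)   # valid word set, filled after segmentation


def clean(raw: str) -> str:
    """Drop picture markup, then digits, letters, spaces and line feeds."""
    return _USELESS.sub("", _IMG.sub("", raw))


def build_bank(rows):
    """rows: iterable of (raw_text, chapter, question_type).
    Questions that are empty after cleaning are skipped."""
    bank = []
    for raw, chapter, qtype in rows:
        v = clean(raw)
        if v:
            bank.append(Question(V=v, e=chapter, t=qtype))
    return bank
```

The `W` field is left empty here because segmentation happens in the next step; the "score" attribute mentioned in the text could be added as one more field in the same way.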
Step A2, the word frequency matrix of the test questions is generated. Chinese word segmentation of the test question strings is performed with jieba, a widely used word segmentation component: each valid test question string is decomposed into a phrase set $W_a$, and irrelevant phrases (such as prepositions and adjectives) and low-frequency phrases are then filtered out by using the stop-word list $W_s$. The complete phrase set of all test questions is denoted $\mathcal{W} = \{w_j\}$, where $w_j$ is the j-th phrase. For the question bank $Q$, the word frequency matrix is $F = [f_{ij}]$, where $f_{ij}$ indicates whether the j-th phrase $w_j$ appears in the i-th test question $q_i$ of the question bank.
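As a rough sketch of step A2, the function below builds a binary occurrence matrix from already-segmented questions. The function name `occurrence_matrix` and the `min_count` cutoff for "low-frequency phrases" are assumptions; the source gives no numeric cutoff. The token lists could be produced with jieba (for example `jieba.lcut(text)`), but the sketch takes them as input so that it runs without third-party packages.

```python
def occurrence_matrix(token_lists, stop_words=frozenset(), min_count=2):
    """Return (vocab, F), where F[i][j] is 1 if vocab[j] occurs in question i.

    token_lists: one list of segmented phrases per test question.
    stop_words:  phrases to discard (the stop-word list W_s).
    min_count:   assumed cutoff; phrases found in fewer questions are dropped.
    """
    # Valid word set of each question: segmentation output minus stop words.
    word_sets = [set(tokens) - set(stop_words) for tokens in token_lists]

    # Number of questions in which each phrase occurs.
    counts = {}
    for ws in word_sets:
        for w in ws:
            counts[w] = counts.get(w, 0) + 1

    # Drop low-frequency phrases; sort for a reproducible column order.
    vocab = sorted(w for w, c in counts.items() if c >= min_count)
    F = [[1 if w in ws else 0 for w in vocab] for ws in word_sets]
    return vocab, F
```

Each row of `F` corresponds to one test question and each column to one phrase, matching the "whether it appears" reading of $f_{ij}$ in the text.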
Step A3, the word frequency matrix $F$ is then de-duplicated: strongly correlated phrase columns are merged to improve the independence of the columns and support the subsequent analysis. A phrase correlation threshold $\rho$ is defined; if the Pearson correlation coefficient of any two phrases x and y satisfies $\rho_{x,y} > 0.8$, the two phrases are merged into a new phrase $w_{x,y} = w_x \cup w_y$, and $w_x$ and $w_y$ are deleted. In addition, to improve the discriminative power of the phrase features, upper and lower phrase coverage thresholds $[\eta_{min}, \eta_{max}]$ are defined, and only phrase features within this range are used for knowledge point identification. Because test question knowledge points and chapter knowledge points serve different computational goals, their coverage thresholds differ slightly: the coverage threshold for test question knowledge points is [4%, 60%], and the coverage threshold for chapter knowledge points is [4%, 100%].
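The merging and screening of step A3 might look like the sketch below. It is an illustration only: the names `pearson` and `merge_and_filter` are hypothetical, and reading the union $w_x \cup w_y$ as "the merged column is 1 when either phrase occurs" is an assumption, since the text does not say how the merged column is computed. The default thresholds are the values quoted for test question knowledge points.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length columns; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def merge_and_filter(vocab, F, rho=0.8, eta_min=0.04, eta_max=0.60):
    """Merge columns whose correlation exceeds rho, then keep only columns whose
    coverage (share of questions containing the phrase) is in [eta_min, eta_max]."""
    cols = [[row[j] for row in F] for j in range(len(vocab))]
    names = [(w,) for w in vocab]          # each column is a tuple of merged phrases
    j = 0
    while j < len(cols):
        k = j + 1
        while k < len(cols):
            if pearson(cols[j], cols[k]) > rho:
                # Assumed reading of w_x ∪ w_y: the merged phrase occurs in a
                # question if either member phrase occurs in it.
                cols[j] = [max(a, b) for a, b in zip(cols[j], cols[k])]
                names[j] += names[k]
                del cols[k], names[k]
            else:
                k += 1
        j += 1
    n = len(F)
    kept = [(nm, c) for nm, c in zip(names, cols) if eta_min <= sum(c) / n <= eta_max]
    return [nm for nm, _ in kept], [c for _, c in kept]
```

The function returns the surviving phrase groups and their columns; for chapter knowledge points the same call would be made with `eta_max=1.0`.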
Step B: after the preprocessing of step A, the test question word frequency matrix is taken to follow a Gaussian mixture distribution over several knowledge features, so a Gaussian mixture model (GMM) can be used to identify the test question knowledge points and express their features. The chapter hierarchy of the course knowledge can be identified by querying the question bank, and the features of chapter knowledge points are expressed with high-frequency phrases.
The step B comprises the following specific steps:
Step B1, a GMM represents data as a weighted combination of several Gaussian probability density functions, which allows the data to be quantified precisely. It is suited to data records that contain several distributional features, so a test question can be regarded as a mixed probability distribution over several knowledge features.
For the question bank $Q$, assuming that the test questions contain K knowledge points, the probability distribution of the Gaussian mixture model of a test question q is given by formula (1):
$$p(q) = \sum_{k=1}^{K} \alpha_k \, \phi(q \mid \mu_k, \Sigma_k) \tag{1}$$
where $\phi(q \mid \mu_k, \Sigma_k)$ is the Gaussian density function of the k-th knowledge point, $\alpha_k$ represents the probability that the test question contains the k-th knowledge point and satisfies $\sum_{k=1}^{K} \alpha_k = 1$, $\mu_k$ represents the mean, and $\Sigma_k$ represents the covariance.
The parameters $\alpha_k$, $\mu_k$ and $\Sigma_k$ in formula (1) are estimated with the EM (expectation maximization) algorithm. First, the probability $\gamma_{ik}$ that test question $q_i$ contains knowledge point k is calculated, as shown in formula (2); then $\alpha_k$, $\mu_k$ and $\Sigma_k$ are calculated as shown in formulas (3) to (5).
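The formula images for (2) to (5) are not reproduced in this text. For orientation, the standard textbook EM updates for a Gaussian mixture are given below; they are an assumption about what formulas (2) to (5) contain, and their order here simply follows the order in which the parameters are named. The symbol $\phi(q \mid \mu_k, \Sigma_k)$ for the Gaussian density of the k-th knowledge point is likewise assumed, and $q_i$ stands for the feature vector of the i-th test question (its row of the word frequency matrix).

```latex
% E-step: responsibility of knowledge point k for test question q_i
\gamma_{ik} = \frac{\alpha_k \, \phi(q_i \mid \mu_k, \Sigma_k)}
                   {\sum_{j=1}^{K} \alpha_j \, \phi(q_i \mid \mu_j, \Sigma_j)}

% M-step: mixing weight, mean and covariance of knowledge point k
\alpha_k = \frac{1}{I} \sum_{i=1}^{I} \gamma_{ik}

\mu_k = \frac{\sum_{i=1}^{I} \gamma_{ik} \, q_i}{\sum_{i=1}^{I} \gamma_{ik}}

\Sigma_k = \frac{\sum_{i=1}^{I} \gamma_{ik} \, (q_i - \mu_k)(q_i - \mu_k)^{\top}}
                {\sum_{i=1}^{I} \gamma_{ik}}
```

By construction the weights $\alpha_k$ sum to 1, which agrees with the constraint stated for formula (1).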
Given a convergence threshold $\varepsilon$, formulas (2) to (5) are computed iteratively until $|\alpha'_k - \alpha'_{k-1}| \le \varepsilon$, yielding $\alpha_k$, $\mu_k$ and $\Sigma_k$. Using formula (1), the distribution of knowledge points in the test questions can be determined, and a many-to-many mapping between test questions and knowledge points is constructed. The number of clusters is searched over the range [2, 30], the clustering results are evaluated with the Bayesian information criterion (BIC), and the optimal number of clusters, namely the number of knowledge points in each chapter, is selected.
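The iterate-until-convergence loop and the BIC-based choice of the number of clusters can be illustrated with a deliberately simplified sketch. It fits a one-dimensional mixture in pure Python, whereas the description applies the GMM to rows of the word frequency matrix; the function names, the quantile-based initialization, the variance floor and the use of the largest change in mixing weights as the stopping test are all assumptions made for the illustration.

```python
from math import exp, log, pi, sqrt


def fit_gmm_1d(xs, K, eps=1e-6, max_iter=200):
    """EM for a 1-D Gaussian mixture; returns (alphas, mus, variances, log_likelihood)."""
    n = len(xs)
    srt = sorted(xs)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    mus = [srt[int((k + 0.5) * n / K)] for k in range(K)]
    mean = sum(xs) / n
    var0 = max(sum((x - mean) ** 2 for x in xs) / n, 1e-6)
    vars_ = [var0] * K
    alphas = [1.0 / K] * K

    def pdf(x, m, v):
        return exp(-(x - m) ** 2 / (2 * v)) / sqrt(2 * pi * v)

    for _ in range(max_iter):
        # E-step: responsibility gamma[i][k] of component k for sample i.
        gamma = []
        for x in xs:
            w = [a * pdf(x, m, v) for a, m, v in zip(alphas, mus, vars_)]
            s = sum(w) or 1e-300
            gamma.append([wk / s for wk in w])
        # M-step: re-estimate weights, means and variances.
        new_alphas = []
        for k in range(K):
            nk = sum(g[k] for g in gamma) or 1e-300
            new_alphas.append(nk / n)
            mus[k] = sum(g[k] * x for g, x in zip(gamma, xs)) / nk
            vars_[k] = max(
                sum(g[k] * (x - mus[k]) ** 2 for g, x in zip(gamma, xs)) / nk, 1e-6
            )
        # Stop when the mixing weights change by at most eps between iterations.
        delta = max(abs(a - b) for a, b in zip(new_alphas, alphas))
        alphas = new_alphas
        if delta <= eps:
            break

    ll = sum(
        log(sum(a * pdf(x, m, v) for a, m, v in zip(alphas, mus, vars_)) or 1e-300)
        for x in xs
    )
    return alphas, mus, vars_, ll


def select_k_by_bic(xs, k_range):
    """Return the K in k_range with the lowest BIC = p * ln(n) - 2 * ln(L)."""
    best = None
    for K in k_range:
        *_, ll = fit_gmm_1d(xs, K)
        p = 3 * K - 1          # K means + K variances + (K - 1) free weights
        bic = p * log(len(xs)) - 2 * ll
        if best is None or bic < best[1]:
            best = (K, bic)
    return best[0]
```

In practice a library implementation with full covariance matrices would normally replace `fit_gmm_1d`; the point of the sketch is only the structure of the E-step, the M-step, the $\varepsilon$ stopping test and the comparison of candidate cluster counts by BIC, with `k_range` playing the role of the [2, 30] search range.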
Step B2, according to the clustering result of the test questions, let $\mathcal{K} = \{K_i\}$ denote the complete set of test question knowledge points, where $K_i$ is the i-th test question knowledge point. To describe the features of a test question knowledge point fully, they are expressed by the phrases that co-occur most frequently in the test questions belonging to that knowledge point. Let $K = \{\langle q \rangle, \langle w \rangle\}$ denote a test question knowledge point, where $\langle q \rangle$ and $\langle w \rangle$ are the test question set and the feature set of K respectively. The feature set $\langle w \rangle$ satisfies two conditions: (1) its phrases are those with the highest coverage of the knowledge point K, and (2) together they cover 100% of the test question set. In short, the phrases with the highest coverage of a test question knowledge point are chosen to express its features, such that their overall coverage of the knowledge point is 100%.
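One way to satisfy both conditions on the feature set is a greedy selection, sketched below. The greedy strategy and the function name `feature_phrases` are assumptions; the description states the two conditions but not the procedure used to meet them.

```python
def feature_phrases(question_words):
    """Pick feature phrases for one knowledge point.

    question_words: list of phrase sets, one per test question of the knowledge point.
    Greedily takes the phrase that covers the most still-uncovered questions until
    every question that has at least one phrase is covered.
    """
    uncovered = {i for i, ws in enumerate(question_words) if ws}
    chosen = []
    while uncovered:
        counts = {}
        for i in uncovered:
            for w in question_words[i]:
                counts[w] = counts.get(w, 0) + 1
        # Highest coverage first; ties are broken by phrase order for determinism.
        best = min(counts, key=lambda w: (-counts[w], w))
        chosen.append(best)
        uncovered = {i for i in uncovered if best not in question_words[i]}
    return chosen
```

The same routine could be applied to the questions of a chapter to obtain the feature set of a chapter knowledge point.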
Step B3, chapter knowledge points are identified and their features extracted. Each test question in the structured question bank $Q$ has a chapter attribute q.e, from which the chapter-level knowledge structure of the course can be identified and each chapter knowledge point identified. As with test question knowledge points, a set of high-coverage phrases is used to represent the features of a chapter knowledge point. Let $\mathcal{C} = \{C_i\}$ denote the complete set of course chapters and let $C = \{\langle q \rangle, \langle w \rangle\}$ denote a chapter knowledge point. The feature phrase set $\langle w \rangle$ is extracted in the same way as for test question knowledge points, so the procedure is not repeated here.
Step C, on the basis of the feature expression of the course knowledge, knowledge graph technology is used to compute the knowledge associations and construct the course knowledge graph. Using the RDF (Resource Description Framework) technology of knowledge graphs, the course knowledge system is defined as a set of triples that describes two kinds of association: associations between knowledge point entities, and associations between chapter entities and knowledge point entities. The first kind of triple represents knowledge point entities and their relationship: the association between knowledge point entities $K_a$ and $K_b$ is the set of phrases that co-occur in $K_a$ and $K_b$. Expressing knowledge points and their associations with characteristic phrases of the test questions makes it possible to construct the knowledge system simply and efficiently while avoiding the need to understand and compute the complex semantics of knowledge points. $G_c = \langle\langle C_x, e, K_y \rangle\rangle$ represents that the association between chapter entity $C_x$ and knowledge point entity $K_y$ is e, which is consistent with the chapter attribute of the test questions.
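The two kinds of triple can be sketched with plain Python tuples. This is an illustration under assumptions, not an RDF serialization: the function name `build_triples`, the input dictionaries and the predicate label `"chapter_of"` are hypothetical, and linking a chapter to a knowledge point whenever they share at least one test question is one reading of "consistent with the chapter attribute of the test questions".

```python
def build_triples(knowledge_points, chapters):
    """Build (subject, relation, object) tuples for the course knowledge graph.

    knowledge_points: {name: {"q": set of question ids, "w": set of feature phrases}}
    chapters:         {chapter name: set of question ids carrying that chapter attribute}
    Returns (kp_triples, chapter_triples).
    """
    names = sorted(knowledge_points)

    # Knowledge point to knowledge point: the relation is the set of shared phrases.
    kp_triples = []
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            shared = knowledge_points[a]["w"] & knowledge_points[b]["w"]
            if shared:
                kp_triples.append((a, frozenset(shared), b))

    # Chapter to knowledge point: linked when they share at least one question.
    chapter_triples = []
    for c in sorted(chapters):
        for k in names:
            if chapters[c] & knowledge_points[k]["q"]:
                chapter_triples.append((c, "chapter_of", k))

    return kp_triples, chapter_triples
```

Because a knowledge point may share phrases with several others and may draw questions from several chapters, the resulting structure is a network rather than a tree, which is the property the description asks of the course knowledge system.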
In conclusion, the invention takes massive numbers of course test questions as the research object, identifies test question knowledge points and their associations by Gaussian mixture clustering, combines them with the existing chapter knowledge point system, and reconstructs the course knowledge system by using knowledge graph technology. Because knowledge points and their associations are expressed with characteristic phrases of the test questions, the knowledge system can be constructed simply and efficiently, and the need to understand and compute the complex semantics of knowledge points is avoided.
The above describes only the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and such modifications and improvements shall also fall within the protection scope of the present invention.
Claims (4)
1. A method for constructing a curriculum knowledge graph based on GMM, characterized by comprising the following steps:
step A, grouping the test questions by chapter, and preprocessing the test questions in each chapter and performing Chinese word segmentation, which includes cleaning the test questions containing invalid characters and picture content to obtain structured test question data, then performing jieba word segmentation on the valid test question strings to generate the word frequency matrix of the test questions, and processing the word frequency matrix;
step B, identifying the two types of knowledge points (test question knowledge points and chapter knowledge points) and extracting their features, which includes clustering the test question knowledge points and extracting their features with a Gaussian mixture model (GMM) based on the processed word frequency matrix to generate a test question knowledge point model; meanwhile, identifying chapter knowledge points and extracting their features based on the structured test question data to generate a chapter knowledge point model;
step C, based on the generated test question knowledge point model and chapter knowledge point model, integrating the two types of knowledge points into a course knowledge graph by using knowledge graph technology.
2. The method of claim 1, wherein step A comprises:
step A1, modeling the test questions, including deleting the picture part of each test question and cleaning useless characters, wherein the useless characters include numbers, letters, spaces and line feeds; organizing the structure of the test question data, introducing the attributes "chapter", "question type" and "score", and creating a structured test question data model; the question bank after cleaning and preprocessing is defined as $Q = \{q_i \mid i = 1, \dots, I\}$, wherein $q_i = \{V, W, e, t\}$ represents the i-th test question, V represents the cleaned test question string, W represents the valid word set of the test question (the segmentation result), e represents the chapter to which the test question belongs, t represents the test question type, and I represents the number of test questions in the question bank;
step A2, generating the word frequency matrix of the test questions, including performing Chinese word segmentation on the valid test question strings with jieba, decomposing each valid test question string into a phrase set $W_a$, and then filtering out irrelevant phrases and low-frequency phrases by using the stop-word list $W_s$; the complete phrase set of all test questions is denoted $\mathcal{W} = \{w_j\}$, wherein $w_j$ is the j-th phrase; for the question bank $Q$, the word frequency matrix is $F = [f_{ij}]$, wherein $f_{ij}$ indicates whether the j-th phrase $w_j$ appears in the i-th test question $q_i$ of the question bank;
step A3, de-duplicating the word frequency matrix $F$ by merging strongly correlated phrase columns: a phrase correlation threshold $\rho$ is defined, and if the Pearson correlation coefficient of any two phrases x and y satisfies $\rho_{x,y} > 0.8$, the two phrases are merged into a new phrase $w_{x,y} = w_x \cup w_y$, and $w_x$ and $w_y$ are deleted; upper and lower phrase coverage thresholds $[\eta_{min}, \eta_{max}]$ are defined, and only phrase features within this range are used for knowledge point identification.
3. The method of claim 2, wherein step B comprises:
step B1, for the question bank $Q$, the Gaussian mixture model (GMM) probability distribution $p(q)$ of a test question $q$ is given by formula (1):
$$p(q) = \sum_{k=1}^{K} \alpha_k \, \phi(q \mid \mu_k, \Sigma_k) \tag{1}$$
wherein $K$ is the number of knowledge points contained in the test questions; $\phi(q \mid \mu_k, \Sigma_k)$ is the Gaussian density function of the $k$-th knowledge point; $\alpha_k$ is the probability that a test question contains the $k$-th knowledge point, satisfying $\alpha_k \ge 0$ and $\sum_{k=1}^{K} \alpha_k = 1$; $\mu_k$ is the mean; and $\Sigma_k$ is the covariance;
the EM algorithm is used to estimate the parameters $\alpha_k$, $\mu_k$ and $\Sigma_k$: first, the probability $\gamma_{ik}$ that test question $q_i$ contains knowledge point $k$ is calculated, as shown in formula (2); then $\alpha_k$, $\mu_k$ and $\Sigma_k$ are calculated as shown in formulas (3) to (5);
given a convergence threshold $\varepsilon$, formulas (2) to (5) are computed iteratively until $|\alpha'_k - \alpha'_{k-1}| \le \varepsilon$, which yields $\alpha_k$, $\mu_k$ and $\Sigma_k$;
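Formulas (2) to (5) are not reproduced in this text. As a hedged reading, the standard EM updates for a Gaussian mixture are sketched below, with $q_i$ taken as the feature vector of the $i$-th test question and $I$ as the number of test questions. The density symbol $\phi$ and the assignment of the numbers (3) to (5) to $\alpha_k$, $\mu_k$ and $\Sigma_k$, in the order the text lists them, are assumptions.

```latex
\begin{align}
\gamma_{ik} &= \frac{\alpha_k \,\phi(q_i \mid \mu_k, \Sigma_k)}
                    {\sum_{j=1}^{K} \alpha_j \,\phi(q_i \mid \mu_j, \Sigma_j)} \tag{2} \\
\alpha'_k &= \frac{1}{I} \sum_{i=1}^{I} \gamma_{ik} \tag{3} \\
\mu'_k &= \frac{\sum_{i=1}^{I} \gamma_{ik}\, q_i}{\sum_{i=1}^{I} \gamma_{ik}} \tag{4} \\
\Sigma'_k &= \frac{\sum_{i=1}^{I} \gamma_{ik}\,(q_i - \mu'_k)(q_i - \mu'_k)^{\top}}
                  {\sum_{i=1}^{I} \gamma_{ik}} \tag{5}
\end{align}
```

Formula (2) is the E-step and formulas (3) to (5) are the M-step; primed symbols denote the updated estimates used in the next iteration.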
the distribution of knowledge points across the test questions is determined using formula (1), and a many-to-many mapping between test questions and knowledge points is constructed; candidate numbers of clusters are set, each clustering result is evaluated with the Bayesian information criterion (BIC), and the optimal number of clusters is selected as the number of knowledge points in each chapter;
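Selecting the number of clusters by BIC can be sketched as follows, assuming the maximized log-likelihood of each candidate model is already available from a fitted GMM. The function names are hypothetical, and the parameter count assumes full covariance matrices, which the text does not specify.

```python
import math


def gmm_param_count(k, d):
    """Free parameters of a k-component, d-dimensional GMM with full covariances."""
    # (k - 1) mixing weights + k*d mean entries + k*d*(d+1)/2 covariance entries.
    return (k - 1) + k * d + k * d * (d + 1) // 2


def bic(log_likelihood, k, d, n):
    """BIC = p * ln(n) - 2 * ln(L); lower values indicate a better trade-off."""
    return gmm_param_count(k, d) * math.log(n) - 2.0 * log_likelihood


def select_k(log_likelihoods, d, n):
    """log_likelihoods: dict mapping candidate k -> maximized log-likelihood."""
    return min(log_likelihoods, key=lambda k: bic(log_likelihoods[k], k, d, n))
```

The penalty term grows with the number of components, so a larger model is chosen only when it improves the likelihood enough to offset its extra parameters.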
step B2, according to the clustering result of the test questions, let $\mathcal{K} = \{K_1, K_2, \dots\}$ denote the full set of test question knowledge points, wherein $K_i$ is the $i$-th test question knowledge point; the phrases that co-occur most frequently in the test questions belonging to a knowledge point are used to express the features of that knowledge point: let $K = \{\langle q \rangle, \langle w \rangle\}$ denote a test question knowledge point, wherein $\langle q \rangle$ and $\langle w \rangle$ are respectively the test question set and the feature set of $K$; the feature set $\langle w \rangle$ consists of the phrases with the highest coverage of the test question knowledge point, and its overall coverage of the test question knowledge point is 100%;
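One possible reading of the feature extraction in step B2 is a greedy cover: repeatedly pick the phrase that occurs in the most still-uncovered test questions of the knowledge point until every question is covered. The greedy strategy, the tie-breaking rule and the function name are assumptions; the text only requires high-coverage phrases whose overall coverage is 100%.

```python
def select_features(question_phrases):
    """question_phrases: dict mapping question id -> set of phrases in that question.

    Returns a list of phrases that together cover every question that has
    at least one phrase, chosen greedily by coverage.
    """
    uncovered = {q for q, phrases in question_phrases.items() if phrases}
    features = []
    while uncovered:
        # Count how many uncovered questions each phrase occurs in.
        counts = {}
        for q in uncovered:
            for phrase in question_phrases[q]:
                counts[phrase] = counts.get(phrase, 0) + 1
        # Highest count wins; ties are broken alphabetically for determinism.
        best = min(counts, key=lambda p: (-counts[p], p))
        features.append(best)
        uncovered = {q for q in uncovered if best not in question_phrases[q]}
    return features
```

The same routine would apply to chapter knowledge points by passing the test questions of a chapter instead of those of a cluster.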
step B3, performing chapter knowledge point identification and feature extraction: based on the chapter attribute $q.e$ of each test question in the structured question bank $Q$, the chapter-level knowledge structure of the course is identified, and each chapter knowledge point is identified; a set of high-coverage phrases is used to represent the features of a chapter knowledge point: let $\mathcal{C} = \{C_1, C_2, \dots\}$ denote the full set of course chapters and $C = \{\langle q \rangle, \langle w \rangle\}$ denote a chapter knowledge point, wherein $\langle q \rangle$ and $\langle w \rangle$ are respectively the test question set and the feature set of $C$; the feature set $\langle w \rangle$ consists of the phrases with the highest coverage of the chapter knowledge point, and its overall coverage of the chapter knowledge point is 100%.
4. The method of claim 3, wherein step C comprises:
using the RDF technology of knowledge graphs, the course knowledge system is defined as $G = \{G_k, G_c\}$ to describe the associations between knowledge point entities of the course and the associations between chapter entities and knowledge point entities; $G_k = \langle K_a, r, K_b \rangle$ represents knowledge point entities and their relationship, wherein the relationship $r$ between knowledge point entities $K_a$ and $K_b$ is the set of phrases that co-occur in $K_a$ and $K_b$; $G_c = \langle C_x, e, K_y \rangle$ represents the association $e$ between chapter entity $C_x$ and knowledge point entity $K_y$, wherein $e$ is consistent with the chapter attribute of the test question.
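The two kinds of triples can be sketched with plain tuples standing in for RDF statements. The function name, the `chapter:` prefix for chapter entities and the choice to link only knowledge points that share at least one phrase are assumptions made for illustration.

```python
from itertools import combinations


def build_triples(knowledge_points, chapters):
    """knowledge_points: dict mapping knowledge point name -> set of feature phrases.
    chapters: dict mapping chapter attribute e -> list of knowledge point names.
    Returns (g_k, g_c) as lists of (subject, predicate, object) triples.
    """
    g_k = []
    for a, b in combinations(sorted(knowledge_points), 2):
        shared = knowledge_points[a] & knowledge_points[b]
        if shared:  # link only knowledge points with co-occurring phrases
            g_k.append((a, frozenset(shared), b))
    # The predicate is the chapter attribute e of the underlying test questions.
    g_c = [("chapter:" + e, e, kp)
           for e, kps in sorted(chapters.items()) for kp in kps]
    return g_k, g_c
```

In a real deployment the tuples would be serialized with an RDF library and proper IRIs; tuples are used here only to keep the example self-contained.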
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210109036.8A CN114595337B (en) | 2022-01-28 | 2022-01-28 | Method for constructing course knowledge graph based on GMM |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114595337A | 2022-06-07 |
CN114595337B | 2024-06-28 |
Family
ID=81806283
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210109036.8A Active CN114595337B (en) | 2022-01-28 | 2022-01-28 | Method for constructing course knowledge graph based on GMM |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114595337B (en) |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |