CN115906867B

CN115906867B - Test question feature extraction and knowledge point labeling method based on hidden knowledge space mapping

Info

Publication number: CN115906867B
Application number: CN202211520312.6A
Authority: CN
Inventors: 李浩; 杜旭; 王静; 普旭; 胡壮
Original assignee: Central China Normal University
Current assignee: Central China Normal University
Priority date: 2022-11-30
Filing date: 2022-11-30
Publication date: 2023-10-31
Anticipated expiration: 2042-11-30
Also published as: CN115906867A

Abstract

The invention belongs to the technical field of text processing and discloses a test question feature extraction and knowledge point labeling method based on hidden knowledge space mapping. First, the method simulates an annotator's knowledge understanding of the test question text: it constructs a heterogeneous knowledge attribute graph and uses a double-stacked aggregation method to capture the application context and sequence information of the knowledge attribute words, forming a knowledge vector representation of the test question text. Second, it simulates the annotator's knowledge processing: it constructs a hidden knowledge space based on a scene association matrix and projects the knowledge vectors into that space through a learnable mapping matrix, with the weighted sum of the basis vectors serving as the knowledge feature of the test question. Finally, a vector knowledge attention mechanism deeply fuses the explicit semantic features with the implicit knowledge features to achieve accurate automatic knowledge point labeling. By simulating the annotator's cognitive process, the invention enriches the feature representation of test questions at the two levels of semantic understanding and knowledge understanding, thereby improving the accuracy of automatic knowledge point labeling.

Description

Test question feature extraction and knowledge point labeling method based on hidden knowledge space mapping

Technical Field

The invention belongs to the technical field of text processing, and particularly relates to a test question feature extraction and knowledge point labeling method based on hidden knowledge space mapping.

Background

With the rapid development of internet education, educational resources, and test question resources in particular, are growing rapidly. Test questions are the most typical learning resource for evaluating students' mastery of knowledge points and diagnosing their learning blind spots, and they are widely used in educational scenarios across subjects. Effectively organizing and managing test question resources is a key technology supporting intelligent education applications such as knowledge tracing, educational resource recommendation and cognitive diagnosis. Knowledge point labeling, a topic at the intersection of educational data mining and information processing, therefore aims to enrich the feature representation of test questions, understand their knowledge intention, assign the most relevant knowledge points to each test question, and so organize and manage test question resources effectively at the knowledge level. Traditionally, knowledge points are labeled manually by experienced teachers or domain experts, but the process is time-consuming and expensive. With the rapid development of deep learning, automatic labeling has become the mainstream approach to knowledge point labeling and has improved both labeling efficiency and accuracy.

Knowledge point labeling is generally regarded as a specific text classification task in which knowledge points are automatically assigned to test questions as knowledge labels. Natural language processing (NLP) algorithms and deep learning (DL) techniques are therefore widely used for this task, with work focusing on two aspects: 1) learning a more powerful semantic representation of the test question through a language model; and 2) designing effective labeling algorithms based on deep learning, such as CNNs, RNNs, GNNs and attention mechanisms. Although existing research achieves encouraging labeling results by extracting explicit semantic information from test question text, it can mislabel test questions that are semantically similar but differ in knowledge intention. Besides semantic information, the knowledge intention of a test question is therefore key to improving labeling accuracy, and the prior art does not fully consider it.

The above analysis shows the following problem and defect in the prior art: only the explicit semantic information of the test question text is extracted, while the implicit knowledge information in the test question is ignored. Test questions with similar semantics but different knowledge connotations are therefore easily mislabeled, which limits the accuracy of knowledge point labeling.

Disclosure of Invention

Aiming at the problems existing in the prior art, the invention provides a test question feature extraction and knowledge point labeling method based on hidden knowledge space mapping, which is used for improving the accuracy of knowledge point labeling.

The invention provides a knowledge feature extraction module (KMN) that mines the knowledge connotation of test question text and identifies its knowledge intention by understanding the context and sequence information of the knowledge attribute words in the question. First, an annotator's knowledge understanding of the test question text is simulated: based on a large number of subject education resources, a subject knowledge attribute dictionary is determined from the two dimensions of formal knowledge and practical knowledge, and the knowledge attribute words in the test question text are accurately located. On this basis, a heterogeneous knowledge attribute graph is constructed for each test question, and a double-stacked "pyramid" aggregation method is proposed to capture the application context and sequence information of the knowledge attribute words, forming a knowledge vector representation of the test question text. Second, the annotator's cognitive processing is simulated: the knowledge meaning of each knowledge scene is represented mathematically, a hidden knowledge space is constructed based on a scene association matrix, and the knowledge vectors are projected into the hidden knowledge space through a learnable mapping matrix, with the weighted sum of the basis vectors serving as the knowledge feature of the test question. Finally, a vector knowledge attention mechanism deeply fuses the explicit semantic features with the implicit knowledge features to achieve accurate automatic knowledge point labeling. By simulating the annotator's cognitive process, the invention enriches the feature representation of test questions at the two levels of semantic understanding and knowledge understanding, thereby improving the accuracy of automatic knowledge point labeling.

Further, the method comprises:

step one, simulating an annotator's knowledge understanding of the test question text, determining a subject knowledge attribute dictionary from the two dimensions of formal knowledge and practical knowledge based on a large number of subject education resources, and accurately locating the knowledge attribute words in the test question text;

step two, constructing a heterogeneous knowledge attribute graph with the knowledge attribute words in the test question text as key nodes, and capturing the application context and sequence information of the knowledge attribute words with a double-stacked "pyramid" aggregation method to form a knowledge vector representation of the test question text;

step three, simulating the annotator's knowledge processing, mathematically representing the knowledge meaning of each knowledge scene, constructing a hidden knowledge space based on a scene association matrix, and projecting the knowledge vector into the hidden knowledge space through a learnable mapping matrix, with the weighted sum of the basis vectors serving as the knowledge feature of the test question;

step four, deeply fusing the explicit semantic features and the implicit knowledge features with a vector knowledge attention mechanism to achieve accurate automatic knowledge point labeling.

Further, in step two, constructing a heterogeneous knowledge attribute graph for each test question and capturing the application context and sequence information of the knowledge attribute words with the double-stacked "pyramid" aggregation method to form the knowledge vector representation of the test question text specifically comprises the following steps:

first, an adaptive knowledge attribute graph is generated for each test question, whose node set comprises the test question, the knowledge attribute words and the neighborhood words;

then, "pyramid" aggregation is proposed as a new knowledge aggregation method to fuse the context information of the knowledge attribute words;

finally, the aggregated knowledge attribute words are arranged in their original order to form the knowledge vector representation of the test question text.

Further, the knowledge attribute graph construction method comprises the following steps:

step 1: determining the positions of the knowledge attribute words in the test question: first, word segmentation preprocessing is performed on the test question text to generate a word sequence; then the words common to the knowledge attribute dictionary and the test question text are computed to obtain the knowledge attribute word set of the current test question and the corresponding indices;

step 2: based on the indices, selecting the knowledge attribute words in the test question text as first-order neighbor nodes of the test question node in the knowledge attribute graph;

step 3: according to the position of each knowledge attribute word, extracting its neighborhood words with a sliding window to learn the application scene of the knowledge attribute word, and determining the sequence information from the relative positions of the different knowledge attribute words in the test question.
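The three graph-construction steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `build_knowledge_attribute_graph`, the `"QUESTION"` node label, the `window` parameter and the pre-tokenized example sentence are all hypothetical.

```python
from typing import List, Set, Tuple

def build_knowledge_attribute_graph(tokens: List[str],
                                    dictionary: Set[str],
                                    window: int = 2):
    """Return (indices, edges, order) for one test question.

    Nodes are written as (position, word) pairs; the question itself is
    the hypothetical node "QUESTION".
    """
    # Step 1: positions of the knowledge attribute words in the word sequence.
    indices = [i for i, tok in enumerate(tokens) if tok in dictionary]
    edges: List[Tuple[object, object]] = []
    for i in indices:
        # Step 2: each attribute word is a first-order neighbor of the question.
        edges.append(("QUESTION", (i, tokens[i])))
        # Step 3: neighborhood words inside a sliding window around position i.
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                edges.append(((i, tokens[i]), (j, tokens[j])))
    # Sequence information: attribute words in their order of appearance.
    order = [tokens[i] for i in indices]
    return indices, edges, order

tokens = ["a", "car", "moves", "at", "uniform", "speed", "of", "5", "m/s"]
dictionary = {"speed", "m/s", "uniform"}
indices, edges, order = build_knowledge_attribute_graph(tokens, dictionary, window=1)
print(indices)  # [4, 5, 8]
print(order)    # ['uniform', 'speed', 'm/s']
```

Storing the position alongside each word keeps repeated words distinct and preserves the sequence information that the later aggregation step relies on.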

Further, the construction method of the knowledge attribute dictionary comprises:

(1) Selection of knowledge attribute words in formal knowledge: excluding stop words, high-frequency words in formal knowledge such as textbooks and teaching aids are selected as knowledge attribute words;

(2) Selection of knowledge attribute words in practical knowledge: common words in practical knowledge such as test questions, homework and test papers are selected;

(3) Determination of the final knowledge attribute dictionary by combining formal knowledge and practical knowledge: the word weights from the two kinds of knowledge are mapped to a certain numerical range through a function transformation and added to obtain the importance score w of each word, and the words with higher weights are selected as knowledge attribute words to form the knowledge attribute dictionary.

Further, in the knowledge attribute graph, the neighborhood word is directly associated with the corresponding knowledge attribute word, and sequence information of the knowledge attribute word in the test question text is marked.

Further, the specific steps of pyramid aggregation include:

(1) The purpose of the first-layer aggregation is to expand the receptive field of the knowledge attribute words, using a sliding window C_win ∈ R^p with a weight of 1 to capture the context word sequence of each knowledge attribute word;

first, the knowledge attribute word is located at the last position of the sliding window C_win, so that the context preceding the knowledge attribute word is learned; the sliding window is then moved backward step by step until the knowledge attribute word is located at the initial position of the sliding window, which completes the learning of the subsequent context;

each time C_win slides, agg(·) is applied to aggregate the semantic information of the p words in the sliding window; p vectors fusing context information are thus generated for each knowledge attribute word, where p is the length of the sliding window and determines the range of the receptive field of the knowledge attribute word.

(2) The second-layer aggregation takes the feature vectors generated by the first-layer aggregation and applies the aggregation function agg(·) again to generate a knowledge attribute word vector representation that fuses the context information; agg(·) integrates the context information into the knowledge attribute word vector in one of three aggregation modes, namely sum aggregation agg_s, mean aggregation agg_m and concatenation aggregation agg_c.

Finally, the feature-enhanced knowledge attribute words are arranged in the order in which they appear in the test question text to form the knowledge representation Q_k of the test question text.
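As a rough illustration of the two stacked aggregations described above, the following Python sketch slides a window of length p over a knowledge attribute word and aggregates twice. It is a sketch under assumptions rather than the patent's implementation: the names `agg` and `pyramid_aggregate` are hypothetical, word vectors are plain lists of floats, and positions outside the sentence are zero-padded, which the source does not specify.

```python
from typing import List

Vector = List[float]

def agg(vectors: List[Vector], mode: str = "sum") -> Vector:
    """Aggregate equal-length vectors by sum, mean, or concatenation."""
    if mode == "concat":
        return [x for vec in vectors for x in vec]
    summed = [sum(col) for col in zip(*vectors)]
    if mode == "sum":
        return summed
    if mode == "mean":
        return [s / len(vectors) for s in summed]
    raise ValueError(f"unknown aggregation mode: {mode}")

def pyramid_aggregate(embeddings: List[Vector], index: int, p: int,
                      mode: str = "sum") -> Vector:
    """Two-layer aggregation for the word at `index` with window length p."""
    zero = [0.0] * len(embeddings[0])  # assumed padding outside the sentence
    first_layer = []
    # The window starts with the word in its last slot (start = index - p + 1)
    # and slides until the word is in its first slot (start = index).
    for start in range(index - p + 1, index + 1):
        window = [embeddings[j] if 0 <= j < len(embeddings) else zero
                  for j in range(start, start + p)]
        first_layer.append(agg(window, mode))   # layer 1: p context vectors
    return agg(first_layer, mode)               # layer 2: one enhanced vector

embeddings = [[1.0], [2.0], [3.0], [4.0], [5.0]]
enhanced = pyramid_aggregate(embeddings, index=2, p=3, mode="sum")
print(enhanced)  # [27.0]

# Knowledge representation: enhanced vectors in order of appearance.
Q_k = [pyramid_aggregate(embeddings, i, p=3) for i in (1, 3)]
```

With sum aggregation and p = 3, the five words around the center contribute with weights 1, 2, 3, 2, 1 (1 + 4 + 9 + 8 + 5 = 27), which is one plausible reading of the "pyramid" name.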

Further, the method for constructing the hidden knowledge space based on the scene association matrix in step three comprises the following steps:

(1) For a knowledge attribute word that appears in multiple knowledge scenes, its weights in the different knowledge scenes are summed; the proportion of the weight of knowledge attribute word a_i in the current knowledge scene s_j within that sum is then calculated and used as the attenuation factor;

(2) The initial weight is multiplied by the attenuation factor to obtain the updated scene association weight; conversely, if a knowledge attribute word appears in only one knowledge scene, the word is unique to that knowledge scene and its association weight increases accordingly.
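The attenuation described in these two steps can be illustrated with a short Python sketch. The function name `update_scene_weights` and the example weights are hypothetical. The sketch assumes that a word occurring in a single scene simply keeps its initial weight (attenuation factor 1), so that it ends up larger relative to the attenuated words; the source states only that its weight "increases accordingly".

```python
from typing import List

def update_scene_weights(A: List[List[float]]) -> List[List[float]]:
    """A[i][j] is the initial weight of attribute word i in knowledge scene j."""
    updated = []
    for row in A:
        total = sum(row)                 # summed weight over all scenes
        if total == 0:
            updated.append(list(row))    # word unused everywhere: leave as is
            continue
        # attenuation factor = share of this scene in the word's total weight
        updated.append([w * (w / total) for w in row])
    return updated

A = [
    [0.5, 0.5, 0.0],   # word shared by two scenes: each weight is halved
    [0.8, 0.0, 0.0],   # word unique to one scene: weight unchanged
]
print(update_scene_weights(A))  # [[0.25, 0.25, 0.0], [0.8, 0.0, 0.0]]
```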

Further, projecting the knowledge vector into the hidden knowledge space through the learnable mapping matrix in step three specifically includes:

mapping the knowledge representation Q_k of the test question to a d-dimensional hidden knowledge space to extract high-dimensional knowledge features, wherein the knowledge vector representation is mapped onto each scene basis of the hidden knowledge space through a learnable parameter matrix H to obtain the feature representation on the corresponding basis vector;

given that the test question contains n knowledge attribute words, the scene basis vectors corresponding to the test question are A_q ∈ R^(n×k); to obtain the adaptive projection vector Q'_k of each test question on the scene bases, a knowledge mapping layer is designed;

in this layer, H ∈ R^(v×d) is a learnable mapping matrix, v is the dimension of the word vectors, A_q enters through its transpose, ReLU is the activation function, Q'_k ∈ R^(k×d) is the knowledge vector of each test question, and d is the dimension of the knowledge features.
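The formula of the knowledge mapping layer is not reproduced in this text. One form that agrees with every dimension stated here is Q'_k = ReLU(A_q^T · Q_k · H), with Q_k taken to be n×v (one v-dimensional vector per knowledge attribute word). The following Python sketch implements that form; it is an assumption offered for illustration, not the patent's confirmed formula, and the helper names `matmul`, `transpose`, `relu` and `knowledge_mapping` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Plain matrix product of X (a x b) and Y (b x c)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def transpose(X: Matrix) -> Matrix:
    return [list(col) for col in zip(*X)]

def relu(X: Matrix) -> Matrix:
    return [[max(0.0, x) for x in row] for row in X]

def knowledge_mapping(A_q: Matrix, Q_k: Matrix, H: Matrix) -> Matrix:
    """Assumed layer: (k x n) @ (n x v) @ (v x d) -> k x d, then ReLU."""
    return relu(matmul(matmul(transpose(A_q), Q_k), H))

A_q = [[1.0, 0.0], [0.5, 0.5]]    # n = 2 attribute words, k = 2 scenes
Q_k = [[1.0, 2.0], [3.0, -4.0]]   # n = 2 words, v = 2 word-vector dimensions
H = [[1.0], [1.0]]                # v = 2, d = 1 (learned in practice)
Q_prime = knowledge_mapping(A_q, Q_k, H)
print(Q_prime)  # [[2.5], [0.0]]
```

Each row of the result is the projection of the whole question onto one scene basis, so the output has one d-dimensional vector per knowledge scene.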

Further, using the weighted sum of the basis vectors as the knowledge feature of the test question in step three specifically includes:

taking the weighted sum over the different scene bases as the final knowledge feature of the test question:

Y = tanh(Q'_k)

α = softmax(w^T Y)

γ = Q'_k α^T

where Q'_k is the output vector of the knowledge mapping layer, w^T is a trainable parameter matrix, α represents the weights of the output vector, and γ is the knowledge feature vector that contains the knowledge information in the test question.
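The three equations above amount to attention pooling over the scene bases, which the following Python sketch illustrates. It makes a layout assumption that should be read with care: Q'_k is stored as k rows of length d (one row per scene basis) and w as a vector of length d, which matches the printed equations applied to the transposed d×k layout. The function name `knowledge_feature` and the example values are hypothetical.

```python
import math
from typing import List, Tuple

def knowledge_feature(Q_prime: List[List[float]],
                      w: List[float]) -> Tuple[List[float], List[float]]:
    """Return (alpha, gamma): scene weights and the pooled knowledge feature."""
    # Y = tanh(Q'_k), scored against w: one scalar score per scene basis.
    scores = [sum(wi * math.tanh(x) for wi, x in zip(w, row))
              for row in Q_prime]
    # alpha = softmax(scores), shifted by the maximum for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    # gamma = weighted sum of the scene basis rows.
    d = len(Q_prime[0])
    gamma = [sum(alpha[j] * Q_prime[j][c] for j in range(len(Q_prime)))
             for c in range(d)]
    return alpha, gamma

Q_prime = [[2.0, 0.0], [0.0, 0.0]]   # k = 2 scene bases, d = 2
w = [1.0, 1.0]                        # trainable in practice
alpha, gamma = knowledge_feature(Q_prime, w)
```

In this example the first scene basis receives the larger weight because its score tanh(2.0) exceeds the second basis's score of 0, so the pooled feature is dominated by the first row.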

In view of the above technical solution and the technical problems it solves, the claimed technical solution has the following advantages and positive effects:

1) The invention discloses a test question knowledge feature extraction method based on hidden knowledge space mapping. It simulates an annotator's thinking process when reading a test question: on the basis of reading and understanding the semantics of the test question text, it goes on to understand the knowledge intention of the test question. The invention provides a knowledge feature extraction module that mines the knowledge connotation of the test question text by understanding the context and sequence information of its knowledge attribute words. By simulating the annotator's cognitive process, the invention enriches the feature representation of test questions at the two levels of semantic understanding and knowledge understanding, thereby improving the accuracy of automatic knowledge point labeling.

2) Knowledge attribute graph construction simulates an annotator's knowledge understanding of the test question text and gives the deep learning model interpretability.

A knowledge attribute graph is constructed for each test question, integrating the context and sequence information of the knowledge attribute words to form the knowledge representation of the test question text. The graph structure is used to mine the knowledge connotation implicit in the test question text and to convert the text into a machine-readable feature representation.

3) The invention provides a new knowledge aggregation method, "pyramid" aggregation, which expands the receptive field of the knowledge attribute words while further enhancing their feature representation, so that the context information of the neighborhood is aggregated effectively.

Hidden knowledge space construction: a hidden knowledge space is constructed with the scene association matrix as its basis, and knowledge features are extracted by jointly considering the mapping vectors on the different scene bases. The closer two knowledge vectors are in the hidden knowledge space, the more similar the knowledge connotations of the corresponding test questions.

Knowledge mapping layer design: the knowledge representation of the test question text is mapped onto the scene bases of the hidden knowledge space through a learnable parameter matrix, and adaptive knowledge features are obtained by weighted summation of the different scene basis vectors.

4) The expected benefits and commercial value after the technical solution of the invention is commercialized are as follows:

the test question knowledge feature extraction method based on hidden knowledge space mapping provides important underlying technical support for intelligent education applications such as intelligent test paper assembly, test question recommendation and personalized homework generation. Knowledge point labeling is basic and important work, and the accuracy of the labeling results determines the quality and effect of the intelligent services. Applying the automatic knowledge point labeling algorithm to an intelligent service platform greatly reduces the manpower and material resources spent on manual labeling, facilitates the knowledge-level organization and management of test question resources, and provides learners with rich, personalized intelligent learning services.

5) The technical solution of the invention fills a technical gap in the industry at home and abroad:

the test question knowledge feature extraction method based on hidden knowledge space mapping simulates an annotator's knowledge understanding and cognitive processing of test questions, learns high-level features automatically from test question text, and recognizes knowledge intention while understanding the semantics of the text. Compared with existing knowledge point labeling approaches, which represent test questions with vectors built from syntactic, lexical or semantic features, KMN proposes a new feature dimension, the knowledge feature, which differs from general semantic features in that it analyzes the knowledge intention of a test question from the perspective of knowledge understanding. Bert-KMN deeply fuses the semantic features and knowledge features of the test question text, enriching its feature representation and achieving accurate knowledge point labeling. Because it recognizes knowledge intention in addition to understanding text semantics, Bert-KMN has the advantages of interpretability and strong generalization capability.

Drawings

FIG. 1 is a flowchart of a method for extracting test question knowledge features based on hidden knowledge space mapping and Bert-KMN provided by an embodiment of the invention;

FIG. 2 is a flow chart of context-aware knowledge vector representations provided by an embodiment of the invention;

FIG. 3 is a flowchart of knowledge feature extraction based on the hidden knowledge space provided by an embodiment of the invention.

Detailed Description

The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

So that those skilled in the art can fully understand how the invention is embodied, this section provides illustrative embodiments that elaborate on the claimed technical solution.

In order to facilitate an understanding of the present invention, the following related concepts need to be explained and illustrated:

Knowledge attribute dictionary: some words in test questions have a more definite knowledge connotation; for example, "speed", "uniform speed" and "m/s" are often used to explain the "speed" knowledge point. Such words are called "knowledge attribute words". The knowledge attribute dictionary is composed of knowledge attribute words extracted from a large number of learning resources such as textbooks, teaching aids and test questions; m represents the number of knowledge attribute words in the dictionary.

Knowledge attribute graph: in order to capture the relations among a test question, its knowledge attribute words and their contexts, the invention constructs a heterogeneous knowledge attribute graph for each test question to highlight implicit knowledge information. The graph consists of the set of vertices, the edge set ε representing the relations between vertices, and the sequence information of the knowledge attribute words. The test question itself is the center of the graph, the knowledge attribute words are first-order nodes directly connected to the test question, the neighborhood words are directly connected to their corresponding knowledge attribute words, and the order in which the knowledge attribute words appear in the test question is also taken into account, forming a complete knowledge attribute graph.

Scene association matrix: some knowledge attribute words are used in multiple knowledge scenes, but their importance differs from scene to scene. The association weight w_ij is therefore calculated to represent the correlation between knowledge attribute word a_i and knowledge scene s_j. The scene association matrix A = {[w_11, w_12, …, w_1k], …, [w_m1, w_m2, …, w_mk]} is composed of the association weights between all words in the knowledge attribute dictionary and the different application scenes, where k represents the number of knowledge scenes, that is, knowledge points.

Hidden knowledge space: in order to enrich the connotation and extension of knowledge points, a hidden knowledge space S = {s_1, s_2, …, s_k} is constructed based on the scene association matrix. In this space, knowledge vectors that are closer together have more similar knowledge features, and test question texts with similar knowledge feature representations are more likely to be labeled with the same knowledge point.

Knowledge features: knowledge features are a new feature dimension, distinct from semantic features, used to highlight and measure the knowledge information implicit in a test question. By mapping the knowledge representation of the test question to the hidden knowledge space, a high-dimensional knowledge feature representation is obtained from the projection vectors on the different scene bases.

As shown in FIG. 1, this embodiment provides a test question knowledge feature extraction method (KMN) based on hidden knowledge space mapping, which can be summarized as two main stages: context-aware knowledge vector representation, and knowledge feature extraction based on the hidden knowledge space.

1. Context-aware knowledge vector representation

In test question text, some words play an important role in conveying knowledge information; the invention calls them "knowledge attribute words". A knowledge attribute dictionary is constructed from the two dimensions of formal knowledge and practical knowledge by learning from a large corpus of educational resources, including textbooks, teaching aids and practice problems. Because word-level information is limited, the invention also considers the context and sequence information of the knowledge attribute words in the test question text. First, an adaptive knowledge attribute graph is generated for each test question, comprising knowledge attribute words, neighborhood words and sequence information. Then, "pyramid" aggregation is proposed as a new knowledge aggregation operation to fuse the context information of the knowledge attribute words. Finally, the aggregated knowledge attribute words are arranged in their original order to form the knowledge vector representation of the test question. The whole flow is shown in FIG. 2.

(1) Knowledge attribute dictionary generation. In test question text, some words play an important role in conveying knowledge information, such as concept words, phrases, units and symbols. To extract knowledge attribute words rich in knowledge connotation, the invention takes a large amount of teaching resources, such as textbooks, teaching aids and practice problems, as the corpus. Because these teaching resources differ in application scene and form of expression, they can be further divided into formal knowledge and practical knowledge.

TABLE 1 Primary types of knowledge attribute words

Step one: selection of knowledge attribute words in formal knowledge. Textbooks and teaching aids provide the most direct explanation and illustration of the principles of knowledge points and equipment scenes; they are also the standard form for storing, organizing and expressing subject knowledge, and belong to "formal knowledge". In these teaching resources, words that carry knowledge content appear more frequently. Thus, excluding stop words, high-frequency words in the formal knowledge are selected as knowledge attribute words.

Step two: selection of knowledge attribute words in practical knowledge. After-class homework and test questions are important tools for diagnosing students' knowledge; they are also flexible applications of subject knowledge in specific scenes, and belong to "practical knowledge". Here the high-frequency words are not the most important; words common in the knowledge application scene are more significant, such as "rainbow" and "triangular prism" for the "light dispersion" knowledge point, and "glasses", "microscope" and "magnifier" for the "lens" knowledge point.

Step three: the formal knowledge and the practical knowledge are combined to determine the final knowledge attribute dictionary. The word weights from the two kinds of knowledge are mapped to a certain numerical interval through a function transformation and added to obtain the importance score w of each word. This normalization is applied to keywords common to formal and practical knowledge; for keywords that belong to only one of the two types of knowledge, the corresponding weight retains its initial value. Finally, the words with higher weights are selected as knowledge attribute words to form the knowledge attribute dictionary.
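The three dictionary-building steps can be sketched in Python as below. Several choices here are assumptions, since the source does not fix them: min-max scaling stands in for the unspecified "function transformation", a top-N cut stands in for "higher weight", and a word found in only one source keeps its single scaled score (the source does not say whether the retained "initial value" is the raw or the scaled one). The names `min_max` and `build_dictionary` and the example counts are hypothetical.

```python
from typing import Dict, List, Set, Tuple

def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to [0, 1]; an assumed stand-in for the transformation."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores match
    return {word: (s - lo) / span for word, s in scores.items()}

def build_dictionary(formal: Dict[str, float],
                     practical: Dict[str, float],
                     stop_words: Set[str],
                     top_n: int) -> Tuple[List[str], Dict[str, float]]:
    # Step one: high-frequency words from formal knowledge, minus stop words.
    formal = {w: s for w, s in formal.items() if w not in stop_words}
    n_formal, n_practical = min_max(formal), min_max(practical)
    importance: Dict[str, float] = {}
    # Step three: add the scaled scores of words common to both sources.
    for word in set(formal) | set(practical):
        if word in formal and word in practical:
            importance[word] = n_formal[word] + n_practical[word]
        elif word in formal:
            importance[word] = n_formal[word]
        else:
            importance[word] = n_practical[word]
    # Keep the highest-scoring words; ties are broken alphabetically.
    ranked = sorted(importance, key=lambda w: (-importance[w], w))
    return ranked[:top_n], importance

formal = {"the": 50, "speed": 10, "force": 6, "lens": 2}   # textbook counts
practical = {"speed": 4, "rainbow": 2, "lens": 8}          # exercise counts
top, importance = build_dictionary(formal, practical, {"the"}, top_n=2)
print(top)  # ['speed', 'lens']
```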

(2) Knowledge attribute graph construction. Considering the knowledge attribute words, application context and sequence information in each test question, each test question is expressed as a knowledge attribute graph. The graph contains three types of nodes, namely the test question, the knowledge attribute words and the related neighborhood words; ε denotes the edge set of relations between the different types of nodes; and the sequence information records the order of the knowledge attribute words in the test question text.

Step one: determine the positions of the knowledge attribute words in the test question. First, word segmentation preprocessing is performed on the test question text to generate a word sequence; then the words common to the knowledge attribute dictionary and the test question text are computed to obtain the knowledge attribute word set of the current test question and the corresponding indices.

Step two: based on the indices, the knowledge attribute words in the test question text are selected as first-order neighbor nodes of the test question node in the knowledge attribute graph.

Step three: according to the position of each knowledge attribute word, its neighborhood words are extracted with a sliding window to capture the application scene of the knowledge attribute word, and the sequence information is determined from the relative positions of the different knowledge attribute words in the test question. In the knowledge attribute graph, each neighborhood word is directly associated with its corresponding knowledge attribute word, and the sequence information of the knowledge attribute words in the test question text is marked.

(3) "pyramid" aggregation. In order to integrate the application context information of the knowledge attribute words, the invention provides a new knowledge aggregation operation, namely 'pyramid aggregation', which realizes two stacked aggregation operations, carries out convolution-like operation on the knowledge attribute graph, and enriches the characteristic representation of the knowledge attribute words by aggregating the neighborhood context information.

Step one: the purpose of the first layer aggregation is to expand the receptive field of knowledge attribute words. Using a sliding window C with a weight of 1 _win ∈R ^p To capture a sequence of contextual words of knowledge attribute words. First, knowledge attribute wordsLocated in sliding window C _win For learning contextual information in front of knowledge attribute words; the sliding window is then moved backwards in sequence until +.>Finishing the alignment when the sliding window is positioned at the initial position of the sliding window>Subsequent learning of the context.

C _win Each slide would then apply agg (·) to aggregate semantic information for p words within the sliding window. Finally, p vector sequences fusing the contexts are generated for each knowledge attribute wordWherein p represents the length of the sliding window, which determines the range of knowledge attribute word receptive fields, ++>Representing a "pyramid" aggregation.

Step two: The second-layer aggregation takes the feature vectors generated by the first-layer aggregation and again applies the aggregation function agg(·) to generate a knowledge attribute word vector representation fused with context information. In the pyramid aggregation process, the two stacked aggregation operations enhance the features of the knowledge attribute word itself and fuse the context information within its receptive field, thereby mining the knowledge connotation of the test question text. agg(·) integrates the context information into the knowledge attribute word vector using three aggregation modes: summation aggregation agg_s, average aggregation agg_m, and concatenation aggregation agg_c. Finally, the feature-enhanced knowledge attribute words are arranged in the order in which they appear in the test question text to form the knowledge representation Q_k of the test question text.
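The two stacked aggregations can be sketched as follows. This is an illustrative reading only: the function names, the zero-padding used when the window runs past the ends of the text, and the toy one-dimensional embeddings are assumptions that do not come from the source.

```python
def agg_sum(vectors):
    """Summation aggregation (agg_s): element-wise sum."""
    return [sum(col) for col in zip(*vectors)]


def agg_mean(vectors):
    """Average aggregation (agg_m): element-wise mean."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def agg_concat(vectors):
    """Concatenation aggregation (agg_c): join the vectors end to end."""
    return [x for v in vectors for x in v]


def pyramid_aggregate(embeddings, pos, p, agg=agg_mean):
    """Two-layer aggregation for the attribute word at index `pos`.

    Layer 1: a window of length p slides so that the attribute word moves from
             the last window position to the first; each slide is aggregated,
             giving p context-fused vectors.
    Layer 2: the p vectors are aggregated again into one vector.
    """
    dim = len(embeddings[0])
    pad = [0.0] * dim  # assumption: positions outside the text contribute zeros

    def vec(i):
        return embeddings[i] if 0 <= i < len(embeddings) else pad

    first_layer = []
    for start in range(pos - p + 1, pos + 1):
        window = [vec(i) for i in range(start, start + p)]
        first_layer.append(agg(window))
    return agg(first_layer)


embeddings = [[1.0], [2.0], [3.0], [4.0], [5.0]]
print(pyramid_aggregate(embeddings, pos=2, p=2, agg=agg_sum))
print(pyramid_aggregate(embeddings, pos=2, p=2, agg=agg_mean))
print(pyramid_aggregate(embeddings, pos=2, p=2, agg=agg_concat))
```

Note that with concatenation the output dimension grows with p at each layer, whereas summation and averaging keep the original word-vector dimension.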

2. Knowledge feature extraction based on hidden knowledge space

Because a knowledge attribute word carries a different knowledge meaning in different knowledge scenes, the invention quantifies knowledge connotation by means of the importance of the knowledge attribute words in multiple knowledge scenes and generates a mathematical expression of the hidden knowledge space. First, a scene association matrix is constructed based on the weights of the knowledge attribute words and the mutual influence between knowledge scenes, and a hidden knowledge space is generated with the scene association matrix as its basis. Then, the knowledge vector representation of the test question text is mapped into the hidden knowledge space through a learnable parameter matrix to obtain adaptive vector representations on the different scene bases. Finally, the weighted sum of the scene basis vectors is taken as the knowledge feature of the test question. The whole flow is shown in Fig. 3.

(1) Constructing a hidden knowledge space based on the scene association matrix. To analyze how knowledge attribute words express knowledge in multiple scenes, a hidden knowledge space is created with the scene association matrix as its basis vectors, which expands the connotation and extension of the knowledge points. The scene association matrix consists of the basis vectors of a plurality of knowledge scenes, and in this space, feature vectors that are closer together have more similar knowledge connotations. As shown in Fig. 2, the rows represent knowledge attribute words, the columns represent different knowledge scenes, and each intersection gives the importance weight of the word in that application scene.

Step one: There are two types of knowledge attribute words in the scene association matrix. The first type occurs with high frequency and is widely used in multiple knowledge scenes; for example, "speed" and "constant speed" have high weights in both the "speed" and "force" knowledge scenes. Because these words appear in multiple knowledge scenes and can therefore cause the scenes to be confused, their weights are reduced relative to their original values.

Step two: The second type of knowledge attribute word has a lower weight but is relatively specific to a single knowledge scene. For example, the word "celsius" appears relatively infrequently in the "temperature" scene. However, because this type of word is unique to the "temperature" scene, its association weight should be increased.

Step three: In addition to the weight of a knowledge attribute word in the current knowledge scene, the weights of the word in the other knowledge scenes are also taken into account, and the scene association matrix is updated accordingly.

Step four: The invention provides a formula for calculating the weights of knowledge attribute words in different knowledge scenes, which is based on the importance weight of a knowledge attribute word in a single knowledge scene and also considers the mutual influence among different knowledge scenes. In the formula, the initial weight serves as the initial value of the scene association matrix.

Step five: For a knowledge attribute word that appears repeatedly in multiple knowledge scenes, its weights in the different knowledge scenes are added to obtain a total weight. The proportion of the weight of knowledge attribute word a_i in the current knowledge scene s_j to this total weight is then calculated and used as the attenuation factor.

Step six: Multiplying the initial weight by the attenuation factor yields the updated scene association weight. Conversely, if a knowledge attribute word appears in only one knowledge scene, the word is unique to that knowledge scene and its association weight increases accordingly.
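Steps five and six can be sketched numerically as below. The function name update_scene_weights, the dictionary layout, and all weight values are hypothetical; the sketch assumes the attenuation factor is simply the share of a word's total weight that falls in the current scene, as described in step five.

```python
def update_scene_weights(initial):
    """Attenuate weights of words that are spread over several knowledge scenes.

    initial : {word: {scene: initial weight}}  (hypothetical initial matrix)
    returns : {word: {scene: initial weight * attenuation factor}}
    """
    updated = {}
    for word, row in initial.items():
        total = sum(row.values())  # step five: sum of the word's weights over all scenes
        updated[word] = {}
        for scene, w in row.items():
            factor = w / total if total > 0 else 0.0  # share of the total weight in this scene
            updated[word][scene] = w * factor         # step six: initial weight * factor
    return updated


initial = {
    "constant speed": {"speed": 0.5, "force": 0.5, "temperature": 0.0},
    "celsius": {"speed": 0.0, "force": 0.0, "temperature": 0.25},
}
for word, row in update_scene_weights(initial).items():
    print(word, row)
```

Under this reading, a word confined to one scene has an attenuation factor of 1 and keeps its full weight, so its weight rises relative to words shared across scenes, whose weights shrink.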

(2) Knowledge mapping layer. The knowledge representation Q_k of the test question is mapped to a d-dimensional hidden knowledge space to extract high-dimensional knowledge features. Because the scene basis vectors are the basic components of the hidden knowledge space, the knowledge vector representation is mapped onto each scene basis in the hidden knowledge space through a learnable parameter matrix H, giving a feature representation on the corresponding basis vector. The scene basis vectors in the hidden knowledge space adapt to each test question and are determined by the knowledge attribute words in that test question. For example, if a test question contains n knowledge attribute words, the scene basis corresponding to the test question is A_q ∈ R^(n×k). To obtain the adaptive projection vector Q′_k of each test question on the scene bases, a knowledge mapping layer is designed:

In the formula, H ∈ R^(v×d) is a learnable mapping matrix, v is the dimension of the word vectors, A_q^T is the transpose of A_q, and the activation function is ReLU. Q′_k ∈ R^(k×d) is the knowledge vector of each test question, and d is the dimension of the knowledge feature.
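The mapping formula itself is not reproduced in this text, so the following sketch uses one dimensionally consistent reading of the stated shapes: Q′_k = ReLU(A_q^T · Q_k · H), with A_q ∈ R^(n×k), H ∈ R^(v×d), and Q_k assumed to be an n×v matrix of attribute word vectors. The product order, the shape of Q_k, the helper names, and the numbers are all assumptions.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    """Element-wise ReLU activation."""
    return [[max(0.0, x) for x in row] for row in m]


def knowledge_mapping(A_q, Q_k, H):
    """Assumed mapping layer: (k x n) @ (n x v) @ (v x d) -> k x d, then ReLU."""
    return relu(matmul(matmul(transpose(A_q), Q_k), H))


# Toy sizes: n = 2 attribute words, k = 2 scenes, v = 2, d = 1.
A_q = [[1.0, 0.0], [0.5, 0.5]]   # hypothetical scene weights per attribute word
Q_k = [[1.0, 2.0], [3.0, -4.0]]  # hypothetical attribute word vectors
H = [[1.0], [1.0]]               # stands in for the learnable matrix
print(knowledge_mapping(A_q, Q_k, H))
```

The result has one row per knowledge scene, which matches the stated shape Q′_k ∈ R^(k×d); in a trained model H would be learned rather than fixed.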

(3) Knowledge feature extraction. The feature representation of each test question in the hidden knowledge space is formed by combining the mapping vectors on all scene bases, but the scene bases differ in how strongly they highlight the knowledge connotation. To emphasize the knowledge scene information related to the knowledge connotation of the test question, an attention mechanism is applied to automatically assign attention weights to the k scene mapping vectors of the hidden knowledge space. The weighted sum over the different scene bases is then used as the final knowledge feature of the test question:

Y = tanh(Q′_k)

α = softmax(w^T Y)

γ = Q′_k α^T
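The three attention equations can be sketched as follows. For the softmax to produce one weight per scene, the sketch treats each of the k rows of Q′_k as one scene mapping vector and scores it against a d-dimensional vector w; this row-wise layout, the function name scene_attention, and the sample values are assumptions, and w would be a learned parameter in practice.

```python
import math


def scene_attention(Q_prime, w):
    """Attention-weighted sum of the k scene mapping vectors.

    Q_prime : k rows, each a d-dimensional scene mapping vector
    w       : d-dimensional scoring vector (fixed here, learned in practice)
    returns : (alpha, gamma), the k attention weights and the d-dimensional feature
    """
    # Y = tanh(Q'), scored per scene by w
    scores = [sum(wi * math.tanh(x) for wi, x in zip(w, row)) for row in Q_prime]
    # alpha = softmax(scores), shifted by the maximum for numerical stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    # gamma = weighted sum of the scene mapping vectors
    d = len(Q_prime[0])
    gamma = [sum(a * row[j] for a, row in zip(alpha, Q_prime)) for j in range(d)]
    return alpha, gamma


alpha, gamma = scene_attention([[2.0, 0.0], [0.1, 0.3], [0.0, 0.0]], w=[1.0, 1.0])
print(alpha)
print(gamma)
```

Scenes whose mapping vectors score higher against w receive larger weights, so γ is dominated by the scenes most relevant to the test question.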

To demonstrate the inventiveness and technical value of the technical solution of the invention, this section gives application examples of the claimed technical solution in specific products or related technologies.

Automatic test paper assembly and cognitive diagnosis: The test question knowledge feature extraction method based on hidden knowledge space mapping provided by the invention can be applied to the evaluation of learners' knowledge and ability. Taking classroom exercises, homework, unit tests and the like in middle school physics classroom teaching as actual application scenes, it supports the rapid assembly of homework and test papers, quality analysis of the questions in them, rapid correction of learners' homework and test papers, and in-depth analysis of evaluation results for individual learners and groups of learners.

Knowledge organization and management of test question banks: Taking primary schools as the target, the physics test question knowledge point labeling tool and method are applied to the construction and management of a test question bank, and application research on test question knowledge organization and management is carried out. The test question resources accumulated in the school's teaching activities over past years are quickly built into a question bank using the knowledge point labeling tool and method developed in this project. The question bank then assists teachers through information technology in collaborative lesson preparation, classroom teaching, homework correction, after-class tutoring and the like, thereby reducing the teaching workload.

The embodiments of the invention show clear advantages in research, development, and use, as described below with reference to the data, charts and the like from the testing process.

To verify the knowledge feature extraction method provided by the invention, the invention also provides BERT-KMN, a framework that fuses knowledge features and semantic features to simulate the process by which an annotator thinks through a question, including knowledge understanding and cognitive processing. Most automatic labeling models focus only on the global semantic features of the test question, whereas BERT-KMN also recognizes the knowledge intent while understanding the text semantics of the test question. The invention compares the BERT-KMN framework with several deep-learning-based text classification models such as TextGCN and BERT. The experimental results show that, compared with models that extract only semantic features, the BERT-KMN framework with dual-feature fusion effectively improves the accuracy of knowledge point labeling. This indicates that the knowledge features extracted by KMN differ from the general semantic features of the test question text and play an important role in understanding the knowledge intent.

The foregoing describes only specific embodiments of the invention, and the scope of protection of the invention is not limited thereto. Any modification, equivalent replacement, improvement, or alternative made by a person skilled in the art within the technical scope disclosed by the invention and within its spirit and principles shall fall within the scope of protection of the invention.

Claims

1. A method for extracting test question features and labeling knowledge points based on hidden knowledge space mapping, characterized by: first, simulating an annotator's knowledge understanding of the test question text, constructing a heterogeneous knowledge attribute graph, capturing the application context and sequence information of knowledge attribute words by using a double-stack aggregation method, and forming a knowledge vector representation of the test question text; second, simulating the knowledge processing process of an annotator, constructing a hidden knowledge space based on a scene association matrix, and projecting the knowledge vectors into the hidden knowledge space through a learnable mapping matrix so as to obtain the basis vectors of different knowledge scenes in the hidden knowledge space, wherein the weighted sum of the basis vectors is used as the knowledge feature of the test question; finally, deeply fusing the explicit semantic features and the implicit knowledge features by using a vector knowledge attention mechanism, so as to realize accurate automatic labeling of knowledge points;

the automatic labeling of the knowledge points comprises the following steps:

step two, constructing a heterogeneous knowledge attribute graph aiming at each test question, capturing application context and sequence information of knowledge attribute words by using a double-stack aggregation method, and forming knowledge vector representation of test question text;

step three, simulating the knowledge processing process of an annotator, mathematically representing the knowledge connotation of a knowledge scene, constructing a hidden knowledge space based on a scene association matrix, and projecting the knowledge vector into the hidden knowledge space through a learnable mapping matrix, wherein the weighted sum of the knowledge scene basis vectors is used as the knowledge feature of the test question;

step four, utilizing a vector knowledge attention mechanism to deeply fuse the explicit semantic features and the implicit knowledge features, so as to realize accurate knowledge point automatic labeling;

in step two, constructing a heterogeneous knowledge attribute graph for each test question, capturing the application context and sequence information of knowledge attribute words by using a double-stack aggregation method, and forming a knowledge vector representation of the test question text specifically comprises:

firstly, generating a self-adaptive knowledge attribute graph for each test question, wherein the self-adaptive knowledge attribute graph comprises knowledge attribute words, neighborhood words and sequence information;

then, a pyramid aggregation is proposed as a new knowledge aggregation operation to fuse the context information of the knowledge attribute words;

finally, arranging the aggregated knowledge attribute words in sequence to form a knowledge vector representation of the test question text;

the construction method of the knowledge attribute graph comprises the following steps:

the first step: determining the positions of the knowledge attribute words in the test question: first, word segmentation preprocessing is performed on the test question text to generate a word sequence; the knowledge attribute dictionary is then matched against the test question text to obtain the knowledge attribute words and their corresponding indices;

the second step: based on the indices, selecting the knowledge attribute words in the test question text to serve as first-order neighbor nodes of the test question node in the knowledge attribute graph;

the third step: according to the positions of the knowledge attribute words, extracting the neighborhood words of each knowledge attribute word by using a sliding window so as to capture the application scene of the knowledge attribute word, and determining the sequence information of the knowledge attribute words according to the positional relations of the different knowledge attribute words in the test question;

the pyramid aggregation comprises the following specific steps:

at each slide of the sliding window C_win, the aggregation function agg(·) is applied once to aggregate the semantic information of the p words in the sliding window; a sequence of p context-fused vectors is generated for each knowledge attribute word, where p represents the length of the sliding window, which determines the range of the receptive field of the knowledge attribute word;

The method for constructing the hidden knowledge space based on the scene association matrix comprises the following steps:

(1) For a knowledge attribute word that appears repeatedly in multiple knowledge scenes, its weights in the different knowledge scenes are added to obtain a total weight; the proportion of the weight of knowledge attribute word a_i in the current knowledge scene s_j to this total weight is then calculated and used as the attenuation factor;

in step three, projecting the knowledge vector into the hidden knowledge space through the learnable mapping matrix specifically comprises:

mapping the knowledge representation Q_k of the test question to a d-dimensional hidden knowledge space to extract high-dimensional knowledge features, and mapping the knowledge vector representation onto each knowledge scene basis in the hidden knowledge space through a learnable parameter matrix H to obtain the feature representation on the corresponding basis vector;

in the formula, H ∈ R^(v×d) is a learnable mapping matrix, v is the dimension of the word vectors, A_q^T is the transpose of A_q, the activation function is ReLU, Q′_k ∈ R^(k×d) is the knowledge vector of each test question, and d is the dimension of the knowledge feature;

in step three, taking the weighted sum of the basis vectors as the knowledge feature of the test question specifically comprises:

Y = tanh(Q′_k)

α = softmax(w^T Y)

γ = Q′_k α^T

2. The method for extracting test question features and labeling knowledge points based on hidden knowledge space mapping as claimed in claim 1, wherein the construction method of the knowledge attribute dictionary comprises:

(1) Selection of knowledge attribute words in formal knowledge: selecting, from the formal knowledge, high-frequency words other than stop words as knowledge attribute words;

(2) Selection of knowledge attribute words in practical knowledge: selecting common words in a knowledge application scene;

(3) Determining the final knowledge attribute dictionary by combining formal knowledge and practical knowledge: the keyword weights are mapped to a certain numerical interval through a function transformation, the mapped keyword weights from formal knowledge and practical knowledge are added to obtain the importance score w of each word, and the words with higher weights are selected as knowledge attribute words to form the knowledge attribute dictionary.

3. The method for extracting test question features and labeling knowledge points based on hidden knowledge space mapping according to claim 1, wherein in the knowledge attribute graph, the neighborhood words are directly associated with the corresponding knowledge attribute words, and sequence information of the knowledge attribute words in the test question text is labeled.