CN110968708A - Method and system for labeling education information resource attributes - Google Patents


Info

Publication number
CN110968708A
Authority
CN
China
Prior art keywords
information
information resource
resource units
relation
education
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911330362.6A
Other languages
Chinese (zh)
Inventor
何彬
余新国
夏盟
李心宇
张哲进
陈胜男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central China Normal University
Original Assignee
Central China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central China Normal University
Priority to CN201911330362.6A
Publication of CN110968708A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/45 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for labeling the attributes of educational information resources. The method comprises: classifying the information formats of educational information resource units, preprocessing the content of the units, and acquiring an element information set; extracting the association relations among the element information to form an element relation set; constructing, by reasoning, a complete logical relationship for the educational information resource units, and acquiring a target relation set from that logical relationship; converting the element relation set and the target relation set into the corresponding knowledge point names, and acquiring one or more knowledge labels for each educational information resource unit to form a knowledge sequence; and outputting the educational information resource units together with the knowledge sequence to complete the attribute labeling of the current educational information resource. The technical scheme addresses the heavy workload, low accuracy, and high cost of existing educational information labeling methods by combining test question knowledge attribute labeling with knowledge attribute mining, so that educational information resources can be labeled accurately and efficiently.

Description

Method and system for labeling education information resource attributes
Technical Field
The invention belongs to the field of education information intellectualization, and particularly relates to an education information resource attribute labeling method and system.
Background
With the development of intelligent education and online education, students can conveniently obtain the learning resources they need for individualized learning from various open learning platforms. Test questions are an important educational information resource and are widely used in cognitive diagnosis and personalized learning recommendation. However, the set of knowledge points a student must master is relatively fixed, while the number of available test questions is far larger than the number of knowledge points. If questions are not screened and students simply drill through large volumes of them, mastery becomes uneven across knowledge points, which does not support students' all-round development. How to label the attributes of online test questions automatically, accurately, and efficiently has therefore become a research hotspot for refining adaptive learning services in intelligent education.
In the field of education informatization, test question knowledge attributes (also called test question attributes) are the set of knowledge points used to describe how a test question is understood and solved. Accurately describing these attributes greatly helps in diagnosing a student's mastery of each knowledge point from the student's answer records, locating weak links precisely, and then providing personalized resource recommendation and learning services. At present there are two main test question attribute labeling methods. The first is manual labeling: experts rely on their own experience to fully understand a test question, analyze the subject information, the examination intent of the question, and so on, and extract one or more keywords as the knowledge attribute labels of the question. Manual labeling has been used extensively in traditional test question bank construction. The second is statistics-based labeling: for example, word frequency statistics are used to find the keywords that occur most often in the question stem, and the knowledge point names corresponding to these high-frequency keywords are then looked up in a pre-designed knowledge point name library and used as the attribute labels of the question. With the rise of education informatization, attribute labeling methods based on machine learning have also been proposed.
Starting from the attribute calibration results of a subset of test questions, such a method trains a single classifier or multiple classifiers by machine learning and establishes a class label for each attribute; during labeling, the classifiers assign test questions to the label categories, thereby labeling the attributes of the remaining test questions.
However, existing test question attribute labeling methods have two disadvantages. First, labeling efficiency is low and large-scale application is difficult. Although manual labeling offers high precision and strong adaptability on small samples, test question attribute calibration is complex: when there are many knowledge and skill attributes or many questions, calibration done entirely by experts suffers from heavy workload, strong subjectivity, and overly coarse knowledge granularity. Second, labeling quality is low and corpus cost is high. Both statistics-based and machine-learning-based attribute labeling methods generally fail to incorporate teaching and research experience and produce insufficiently rich knowledge labels. They predict especially poorly for knowledge points with few labeled examples, so building a corpus large enough for high-quality knowledge labeling is extremely costly.
Disclosure of Invention
In view of the above deficiencies of the prior art and the need for improvement, the present invention provides a method for labeling the attributes of educational information resources that at least partially solves the above problems. The technical scheme addresses the heavy workload, low accuracy, and high cost of existing educational information labeling methods by combining test question knowledge attribute labeling with knowledge attribute mining, so that educational information resources can be labeled accurately and efficiently.
To achieve the above object, according to one aspect of the present invention, there is provided an educational information resource attribute labeling method, comprising:
S1, classifying the information formats of the educational information resource units, preprocessing the content of the units according to the classification, and acquiring the element information set of the units;
S2, extracting the association relations among the element information of the educational information resource units according to the units and/or the element information sets, to form an element relation set;
S3, constructing, by reasoning, a complete logical relationship of the educational information resource units by using the units, the element information and/or the association relations, and acquiring a target relation set according to the logical relationship;
S4, converting the element relation set and the target relation set into the corresponding knowledge point names, and acquiring one or more knowledge labels corresponding to the educational information resource units to form a knowledge sequence;
S5, outputting the educational information resource units and the knowledge sequence, thereby completing the attribute labeling of the current educational information resource.
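As a reading aid, steps S1 to S5 can be sketched as a small pipeline. This is only an illustrative skeleton under stated assumptions: the names `ResourceUnit` and `label_unit`, and the four callables passed in, are hypothetical and do not appear in the patent, which does not prescribe any particular data structure.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ResourceUnit:
    """Hypothetical container for one educational information resource unit."""
    content: str
    elements: List[str] = field(default_factory=list)            # S1 output
    element_relations: List[str] = field(default_factory=list)   # S2 output
    target_relations: List[str] = field(default_factory=list)    # S3 output
    knowledge_sequence: List[str] = field(default_factory=list)  # S4 output


def label_unit(
    unit: ResourceUnit,
    preprocess: Callable[[str], List[str]],
    extract_relations: Callable[[str, List[str]], List[str]],
    build_targets: Callable[[List[str], List[str]], List[str]],
    to_knowledge_names: Callable[[List[str]], List[str]],
) -> ResourceUnit:
    """Run S1..S4 in order and return the labeled unit (S5)."""
    unit.elements = preprocess(unit.content)                                      # S1
    unit.element_relations = extract_relations(unit.content, unit.elements)       # S2
    unit.target_relations = build_targets(unit.elements, unit.element_relations)  # S3
    names = to_knowledge_names(unit.element_relations + unit.target_relations)    # S4
    # Drop repeated knowledge point names while keeping first-seen order.
    unit.knowledge_sequence = list(dict.fromkeys(names))
    return unit                                                                   # S5
```

Each stage is injected as a function, so text-based and graphics-based implementations of S1 and S2 can be swapped without changing the overall flow.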
As a preferable embodiment of the present invention, the preprocessing in step S1 includes text preprocessing and graphics preprocessing, and the text preprocessing includes:
S111, decomposing the text to obtain the combination of words forming an educational information resource unit;
S121, recognizing, classifying and marking the words;
S131, acquiring a text information set of the educational information resource unit.
As a preferable embodiment of the present invention, the preprocessing in step S1 includes text preprocessing and graphics preprocessing, and the graphics preprocessing includes:
S112, detecting, identifying and classifying the image information to obtain the composition information of the image information;
S122, determining the image composition elements and their relationships according to the composition information;
S132, obtaining an image information set based on the image composition elements and the relationships.
As a preferable aspect of the present invention, obtaining the association relations in step S2 includes syntactic-semantic matching and topology analysis, and the syntactic-semantic matching includes:
S211, defining a syntactic-semantic template according to the elements, the element properties and/or the element relations;
S221, building the text semantic vectors corresponding to the element information according to the syntactic-semantic template, and training a deep network model with the text semantic vectors;
S231, inputting the element information into the trained deep network model, and deriving the association relations between the elements.
As a preferable aspect of the present invention, obtaining the association relations in step S2 includes syntactic-semantic matching and topology analysis, and the topology analysis includes:
S212, determining the basic topological structure type of the current educational information resource unit according to the properties of the unit;
S222, traversing the image elements according to the type, and mapping the basic topological structure to element relations;
S232, deriving the association relations between the elements according to the mapping result.
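To illustrate what mapping a topological structure to element relations can look like for circuit graphs, the sketch below derives series and parallel relations from a list of edges. It is a simplified, degree-based check written for illustration only, and it is not the topology contraction algorithm referred to elsewhere in this document. The function name `element_relations` and the `(u, v, name)` edge format are assumptions.

```python
from collections import Counter
from itertools import combinations


def element_relations(edges):
    """Derive pairwise series/parallel relations from circuit edges.

    edges: list of (u, v, name), where u and v are vertex ids and name is
    the component on that edge. Self-loops are not handled.
    """
    # Count how many components meet at each vertex.
    degree = Counter()
    for u, v, _ in edges:
        degree[u] += 1
        degree[v] += 1

    relations = []
    for (u1, v1, a), (u2, v2, b) in combinations(edges, 2):
        ends_a, ends_b = {u1, v1}, {u2, v2}
        if ends_a == ends_b:
            # Same pair of end vertices: the two components are in parallel.
            relations.append(("parallel", a, b))
        else:
            shared = ends_a & ends_b
            # Exactly one shared vertex with nothing else attached: series.
            if len(shared) == 1 and degree[next(iter(shared))] == 2:
                relations.append(("series", a, b))
    return relations
```

For a source E feeding R1, which in turn feeds R2 and R3 connected side by side, calling `element_relations([(0, 1, "E"), (1, 2, "R1"), (2, 0, "R2"), (2, 0, "R3")])` reports E and R1 as series and R2 and R3 as parallel.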
As a preferable aspect of the present invention, step S3 includes:
S31, initializing the element names corresponding to each association relation;
S32, creating attribute variables and initializing the attribute variables;
S33, retrieving each <element, attribute variable> pair in the element relation set, and marking the pair as 1 if both are known and as 0 otherwise;
S34, selecting any <element, attribute variable> pairs marked as 1, using them together with the corresponding logic rule to verify any pair marked as 0, and adding the logic rule to the target relation set if the verification result is 1.
According to an aspect of the present invention, there is provided an educational information resource attribute labeling system, comprising
The element information module is used for classifying the information formats of the education information resource units, preprocessing the content of the education information resource units according to the classification and acquiring an element information set of the education information resource units;
the element relation module is used for extracting the association relation among a plurality of element information of the education information resource units according to the education information resource units and/or the element information sets to form an element relation set;
the target relation module is used for reasoning and constructing a complete educational information resource unit logical relation by using the educational information resource units, the element information and/or the association relation, and acquiring a target relation set according to the logical relation;
the knowledge sequence module is used for converting the element relation set and the target relation set into corresponding knowledge point names and acquiring one or more knowledge labels corresponding to the education information resource units to form a knowledge sequence;
and the attribute labeling module is used for outputting 'educational information resource units and knowledge sequences' and finishing the attribute labeling of the current educational information resources.
As a preferable aspect of the technical scheme of the invention, the preprocessing in the element information module comprises text preprocessing and graphics preprocessing, and the text preprocessing comprises
The text decomposition module is used for decomposing the text to obtain the combination of words forming the educational information resource unit;
the text processing module is used for identifying, classifying and marking words;
and the text information module is used for acquiring the text information set of the education information resource unit.
As a preferable aspect of the technical scheme of the invention, the preprocessing in the element information module comprises text preprocessing and graphics preprocessing, and the graphics preprocessing comprises
The image decomposition module is used for detecting, identifying and classifying the image information to obtain the composition information of the image information;
the image processing module is used for determining image composition elements and relationships according to the composition information;
and the image information module is used for obtaining an image information set based on the image composition elements and the relationship.
As a preferred embodiment of the present invention, the obtaining of the association relationship in the element relationship module includes syntactic and semantic matching and topology analysis, the syntactic and semantic matching includes,
the syntactic semantic module is used for defining a syntactic semantic template according to the elements, the element properties and/or the element relations;
the training module is used for building a text semantic vector corresponding to the element information according to the syntactic semantic template and training the deep network model by using the text semantic vector;
and the syntactic relation module is used for inputting the element information into the trained deep network model and deriving the association relations between the elements.
As a preferred embodiment of the present invention, the obtaining of the association relationship in the element relationship module includes syntactic-semantic matching and topology analysis, the topology analysis includes,
the basic topological structure module is used for determining the basic topological structure type according to the properties of the current educational information resource units;
the mapping processing module is used for traversing the image elements according to the types and executing the mapping processing from the basic topological structure to the element relation;
and the topological relation module is used for deriving the incidence relation between the elements according to the mapping result.
As a preferable aspect of the present invention, the target relation module includes,
the element initialization module is used for initializing the element names corresponding to each association relation;
the attribute variable module is used for creating an attribute variable and initializing the attribute variable;
the retrieval module is used for retrieving the < element, attribute variable > in the element relation set, if the < element, attribute variable > are known, the < element, attribute variable > is marked as 1, and if the < element, attribute variable > is not known, the < element, attribute variable > is marked as 0;
and the verification module is used for selecting the element and the attribute variable which are arbitrarily marked as 1, verifying the element and the attribute variable which are arbitrarily marked as 0 by combining with the corresponding logic rule, and adding the logic rule into the target relation set if the verification result is 1.
According to one aspect of the invention, there is provided a memory device having stored therein a plurality of instructions adapted to be loaded and executed by a processor:
s1, classifying the information formats of the education information resource units, preprocessing the content of the education information resource units according to the classification, and acquiring the element information set of the education information resource units;
s2, extracting the association relation among a plurality of element information of the education information resource units according to the education information resource units and/or the element information sets to form an element relation set;
s3, reasoning and constructing a complete logic relationship of the educational information resource units by using the educational information resource units, the element information and/or the association relationship, and acquiring a target relationship set according to the logic relationship;
s4, converting the element relation set and the target relation set into corresponding knowledge point names, and acquiring one or more knowledge labels corresponding to the educational information resource units to form a knowledge sequence;
s5, the education information resource unit and the knowledge sequence are output, and the attribute labeling of the current education information resource is completed.
According to an aspect of the present invention, there is provided a terminal comprising a processor adapted to implement instructions; and a storage device adapted to store a plurality of instructions, the instructions adapted to be loaded and executed by the processor to:
s1, classifying the information formats of the education information resource units, preprocessing the content of the education information resource units according to the classification, and acquiring the element information set of the education information resource units;
s2, extracting the association relation among a plurality of element information of the education information resource units according to the education information resource units and/or the element information sets to form an element relation set;
s3, reasoning and constructing a complete logic relationship of the educational information resource units by using the educational information resource units, the element information and/or the association relationship, and acquiring a target relationship set according to the logic relationship;
s4, converting the element relation set and the target relation set into corresponding knowledge point names, and acquiring one or more knowledge labels corresponding to the educational information resource units to form a knowledge sequence;
s5, the education information resource unit and the knowledge sequence are output, and the attribute labeling of the current education information resource is completed.
Generally, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects:
1) The technical scheme of the invention provides a relation-template-based method for mining knowledge attributes from question text. The question stem is an important source of a test question's knowledge attributes. Traditional methods for labeling the knowledge attributes of question text predict those attributes from the perspective of natural language understanding, using explicit lexical, semantic, and syntactic features. The relation-template-based attribute mining proposed by the invention works instead from the perspective of knowledge semantics: by designing a group of relation templates, it captures the educational knowledge semantics in the question text more deeply, thereby improving attribute labeling quality.
2) The technical scheme of the invention provides a topology-contraction-based method for mining knowledge from question graphics. Question graphics are another important source of educational knowledge attributes. Traditional graphic understanding stops at detecting and recognizing graphic elements and rarely addresses the connection relations among elements (also called topological relations, such as series and parallel connection). The topology-contraction-based method further analyzes the topological relations among elements on the basis of the element recognition results, and mines the educational knowledge attributes in the graphic by using a topology contraction algorithm.
3) The technical scheme of the invention provides a method for mining implicit question knowledge based on constructing the problem-solving logic. The two methods above mine the knowledge attributes stated in the question mainly from the perspective of question understanding. This method instead builds a problem-solving model and, from the perspective of the solving logic, mines the implicit knowledge semantics in the question's examination intent.
Drawings
FIG. 1 is a flow chart of a method for labeling the knowledge attributes of educational information resources in an embodiment of the technical solution of the present invention;
FIG. 2 shows the process of generating test question labeling information in an embodiment of the technical solution of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments described below may be combined with each other as long as they do not conflict.
The technical scheme of the invention provides an educational information resource attribute labeling method comprising two parts: test question knowledge attribute labeling and knowledge attribute mining. This embodiment describes the method using physics circuit questions as a preferred example. The educational information resources may equally be other types of questions, introductory material, questions from other subjects, or any other data objects containing educational information; this embodiment places no specific limit on them.
If a test question is taken as an educational information resource unit, then, as shown in FIG. 1, the test question knowledge attribute labeling framework of this embodiment includes: preprocessing, question-stem relation extraction, target relation construction, and test question attribute labeling.
First, preprocessing reads in and preprocesses test question content given as text, graphics, or a combination of both, and obtains the element information of the test question. In other words, the test question is converted into an educational information resource information set in a format suitable for the subsequent steps.
The text preprocessing of physics test questions mainly comprises part-of-speech tagging and named entity tagging; the tagging information and the original question information together serve as one kind of input for question-stem relation extraction. Part-of-speech tagging and named entity tagging can be done with a third-party word segmentation tool (such as NLPIR-ICTCLAS, the automatic analysis system of the Chinese Academy of Sciences, or the Chinese word segmentation tool jieba). To keep the part-of-speech tags produced by different tools consistent, this embodiment labels nouns, numerals, and quantifiers uniformly: nouns as /n, numerals as /m, and quantifiers as /u. To further distinguish noun categories, the noun tags are preferably refined by means of a user dictionary. Taking physics circuit questions as an example, the refined tags fall into eight categories: voltage (/ju), current (/ji), resistance (/jr), electrothermal (/jh), electric power (/jp), electric power (/jq), series (/sc), and parallel (/sp).
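The tag unification described above can be sketched as a post-processing step applied to the (word, tag) pairs returned by any segmentation tool. The sketch does not call NLPIR-ICTCLAS or jieba. The raw tool tags `n`, `m`, `q`, the English tokens, and the names `TOOL_TAG_MAP`, `USER_DICT`, and `unify_tags` are assumptions made for illustration; only the unified tags (/n, /m, /u, /ju, /ji, /jr, /sc, /sp) come from the text.

```python
# Assumed mapping from one tool's raw tags to the unified tags in the text.
TOOL_TAG_MAP = {"n": "/n", "m": "/m", "q": "/u"}

# Hypothetical user dictionary that refines nouns into circuit categories.
USER_DICT = {
    "voltage": "/ju",
    "current": "/ji",
    "resistance": "/jr",
    "series": "/sc",
    "parallel": "/sp",
}


def unify_tags(tagged_words, tool_tag_map=TOOL_TAG_MAP, user_dict=USER_DICT):
    """Map (word, raw_tag) pairs to (word, unified_tag) pairs."""
    unified = []
    for word, tag in tagged_words:
        if word in user_dict:
            # The user dictionary takes priority and refines the category.
            unified.append((word, user_dict[word]))
        elif tag in tool_tag_map:
            unified.append((word, tool_tag_map[tag]))
        else:
            # Tags outside the unified set are kept, prefixed for consistency.
            unified.append((word, "/" + tag))
    return unified
```

Supporting a second segmentation tool then only requires a different `tool_tag_map`; the user dictionary and the downstream relation extraction stay unchanged.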
In this embodiment, preprocessing the test question graphic converts a circuit diagram carried by an image into a graph G composed of vertices and edges, which serves as the other input for question-stem relation extraction. Graphic preprocessing mainly comprises detecting and recognizing the circuit symbols in the image and detecting the connections between the circuit symbols. The detection and recognition of circuit symbols may be implemented by training an SVM classifier or a similar model, and the connections between circuit symbols may be found with a line segment detection algorithm. In a preferred embodiment, the detected line segments are taken as the edges (E) of graph G, the intersection points of the line segments are taken as the vertices (V) of graph G, and the recognized circuit symbols are attached to the edges as their elements. Together these form the question input information obtained from preprocessing the image.
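A minimal sketch of assembling graph G from detection results is given below. It assumes that the line segment detector has already split segments at their intersection points, so that every intersection coincides with a segment endpoint, and that symbol recognition returns a mapping from segment index to symbol name. The function name `build_graph`, the input formats, and the default label `"wire"` are hypothetical.

```python
def build_graph(segments, symbols):
    """Build G = (V, E) from detected segments and recognized symbols.

    segments: list of ((x1, y1), (x2, y2)) endpoint pairs.
    symbols:  dict mapping a segment index to the circuit symbol on it.
    Returns (vertices, edges): vertices maps a point to a vertex id, and
    edges is a list of (u, v, symbol) tuples.
    """
    vertices = {}
    edges = []
    for index, (p, q) in enumerate(segments):
        for point in (p, q):
            # Each distinct endpoint becomes one vertex of G.
            vertices.setdefault(point, len(vertices))
        # Segments without a recognized symbol are treated as plain wire.
        edges.append((vertices[p], vertices[q], symbols.get(index, "wire")))
    return vertices, edges
```

Real detector output would first need endpoint snapping (merging nearly coincident points), which is omitted here.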
Second, the association relations among the elements are acquired, that is, the question-stem relations are extracted. For the questions in this embodiment, an association relation may be a mathematical expression or take another form; this embodiment places no specific limit on it.
Taking circuit questions as an example, question-stem relation extraction in this embodiment means extracting the circuit relations stated directly in the question stem. This includes extracting circuit relations from the question text and extracting circuit relations (element relations) from the question graphic, such as "the resistor R is connected in series with the capacitor C". The relations differ by question category. For physics circuit questions, there are six kinds: the Ohm's law relation, the electric power relation, the electric heating relation, the electric power relation, the series relation, and the parallel relation. For physics mechanics questions, they include Newton's first and second laws of motion, Hooke's law, and so on. By analogy, mathematics questions on sequences, trigonometric functions, and the like each have corresponding knowledge point information.
Question-stem relation extraction in this embodiment is carried out by a question-stem relation extraction method. The extracted circuit relations constitute the question-stem relation set, which is used as input both for constructing the target relation set and for generating the knowledge labels.
Third, a target relationship set is constructed.
Taking circuit questions as an example, target relationship set construction in this embodiment is the process of building the set of solution relationships for the target variable from the question-stem relationship set and the set of theorem/law relationships in the knowledge base (hereinafter the implicit relationship set). Specifically, the target relationship set is made up of a group of relationship subsets, and each subset is built by one relationship derivation step. The derivation starts from the set of assigned attributes of the circuit entities in the question-stem relationship set, selects n relationships from the implicit relationship set each time (n = 1, 2, …, adjustable), and verifies whether an unassigned attribute of a circuit entity in the question-stem relationship set can be solved. If the verification succeeds, the set of assigned attributes, the n selected implicit relationships and the newly solved attribute together form one relationship subset. The relationship subsets are arranged in the order in which the unassigned attributes are solved, so that the attribute solved by the last subset is the target variable of the question; the set of relationship subsets obtained in this way is the target relationship set. Construction of the target relationship set is implemented by the problem-solving logic construction processing, and the resulting target relationship set, together with the question-stem relationship set, serves as input to knowledge tag generation.
Fourth, knowledge tags are generated.
Taking circuit questions as an example, knowledge tag generation in this embodiment converts the circuit relationships in the question-stem relationship set (i.e., the set of association relationships between elements) and in the target relationship set into specific knowledge point names, which are used as the knowledge tags of the test question. The conversion from circuit relationship to knowledge point name is implemented by the knowledge point label generation method. One test question may correspond to several knowledge points; all the knowledge points of a question form the knowledge point sequence (knowledge sequence) of that question.
Fifth, the result is output.
Taking circuit questions as an example, the final output of this embodiment is a two-tuple <test question, knowledge point sequence>. The test question is the original input question, and the knowledge point sequence is the set of knowledge points obtained by the labeling of this scheme.
Further, the knowledge attribute mining method of this embodiment comprises question-stem relationship extraction processing, target relationship set construction processing, knowledge point label generation processing, and the like. The description below continues with circuit questions as the example; questions of other subjects can be processed by analogy.
Step one: question-stem relationship extraction. This processing extracts the valid physical relationships from the question surface (also called the question stem). It comprises syntactic semantic matching and topological structure analysis, which extract circuit relationships from the question text and from the question graphic, respectively.
Specifically, syntactic semantic matching mines the circuit relationships stated in the question text by matching the syntactic and semantic structural features of those statements. The process comprises three stages: template definition, model training and structure prediction.
In the template definition stage, the invention defines a group of (seven types of) syntactic semantic templates (T-S²) to match the circuit relationships in the question text. The T-S² template structure is defined as follows:
$$M_t = (K_t, P_t, R_t) \tag{1}$$
where $K_t$ denotes the keywords of circuit elements, $P_t$ is a part-of-speech pattern, and $R_t$ is the circuit relationship between elements; $\Sigma M = \{ M_{t_i} = (K_i, P_i, R_i) \mid i = 1, 2, \ldots, m \}$ is called the T-S² template pool. In this embodiment, X denotes a question text and R denotes the set of knowledge relationships extracted from X, written $(r_1, r_2, \ldots, r_n)$; X is a sequence of natural language sentences, written $(x_1, x_2, \ldots, x_m)$. Syntactic semantic matching is thereby cast as finding, for each natural sentence $x_i$ detected in X, the corresponding template $m_j$, i.e.
[Formula image BDA0002329403440000091]
where $n_t$ denotes the parameters of the deep network model and $f(x_i, m_j)$ is the text semantic vector corresponding to $m_j$.
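To make the template triple (K, P, R) concrete, the following is a minimal Python sketch of a template pool and a naive matcher. It is only an illustration: a regular expression stands in for the part-of-speech pattern P (a real system would match over tagged tokens), matching is rule-based rather than learned, and the names `Template`, `POOL` and `match` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Template:
    keywords: Tuple[str, ...]  # K: element keywords that must occur in the sentence
    pattern: str               # P: stand-in for the part-of-speech pattern (a regex here)
    relation: str              # R: relationship to emit, with slots for the matched elements


# A two-entry pool for illustration; the patent describes seven template types.
POOL: List[Template] = [
    Template(("series",), r"(\w+) is connected in series with the \w+ (\w+)", "series({0}, {1})"),
    Template(("parallel",), r"(\w+) is connected in parallel with the \w+ (\w+)", "parallel({0}, {1})"),
]


def match(sentence: str, pool: List[Template]) -> Optional[str]:
    """Return the relationship of the first template whose keywords and pattern both match."""
    for template in pool:
        if all(keyword in sentence for keyword in template.keywords):
            found = re.search(template.pattern, sentence)
            if found:
                return template.relation.format(*found.groups())
    return None


relation = match("The resistor R is connected in series with the capacitor C.", POOL)
```

In the embodiment the choice of template is made by a trained deep network rather than by the first-match rule used in this sketch.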
For other question types in other subjects, the required information may likewise be element keywords of the kind described above. In the model training stage, word embedding is used to vectorize the question text, and a multi-class deep neural network is trained for structure prediction. Specifically, question text vectorization means embedding the question text sequence with the BERT Chinese pre-trained model and taking the sum of the word vector, the position vector and the sentence segmentation (segment) vector as the input for training the deep neural network. The output of model training is a set of deep neural network parameters, referred to as the deep network model for short.
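The summation of the three vectors can be shown with a toy Python sketch. The lookup tables below are tiny hand-written stand-ins, not BERT weights, and the function name `input_embeddings` is hypothetical; the sketch only shows the element-wise sum that forms the network input for each token.

```python
from typing import Dict, List


def input_embeddings(
    tokens: List[str],
    segments: List[int],
    token_table: Dict[str, List[float]],
    position_table: List[List[float]],
    segment_table: List[List[float]],
) -> List[List[float]]:
    """Return, per token, the element-wise sum of word, position and segment vectors."""
    vectors = []
    for position, (token, segment) in enumerate(zip(tokens, segments)):
        parts = (token_table[token], position_table[position], segment_table[segment])
        vectors.append([sum(values) for values in zip(*parts)])
    return vectors


# Toy two-dimensional tables (assumed values, chosen to be exact in floating point).
token_table = {"R": [1.0, 0.0], "series": [0.0, 1.0]}
position_table = [[0.25, 0.25], [0.5, 0.5]]
segment_table = [[0.125, 0.0]]

embedded = input_embeddings(["R", "series"], [0, 0], token_table, position_table, segment_table)
```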
In the structure prediction stage, this embodiment preferably uses the deep network model to predict the T-S² template. The prediction process uses a maximum preference mechanism to rank the syntactic semantic templates by similarity. The maximum preference mechanism consists of a maximum preference probability function and a preference probability objective function. Let $A = \{a_1, a_2, \ldots, a_m\}$ be the candidate T-S² set of natural sentence $x_i$; its maximum preference probability function is:
[Formula image BDA0002329403440000092]
where $n_a$ are the model parameters. The preference of each
[Formula image BDA0002329403440000093]
is obtained by training an SVM classifier. The preference probability objective function may be defined as follows:
[Formula image BDA0002329403440000094]
where the preference feature vector $f(x_i, a_k)$ is related to the part of speech of the keyword K and the syntactic semantic pattern P in the T-S² template, and + and − denote correct and incorrect instances, respectively.
For each natural sentence of the question text, the Top-N (N ≤ 3) results of the syntactic semantic template ranking are taken as the output of syntactic semantic matching. The matched outputs of all natural sentences of the question text form a subset of the final question-stem relationship set.
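The Top-N selection can be sketched in Python as follows. The scoring itself is not reproduced here: the sketch assumes that some model has already produced one raw score per candidate template, and it normalizes those scores with a softmax purely as an illustrative choice (the patent's own probability function is given only as an image). The name `top_n_templates` is hypothetical.

```python
import math
from typing import Dict, List, Tuple


def top_n_templates(scores: Dict[str, float], n: int = 3) -> List[Tuple[str, float]]:
    """Rank candidate templates by softmax probability and keep the best n (n <= 3)."""
    if not 1 <= n <= 3:
        raise ValueError("n must be between 1 and 3")
    peak = max(scores.values())
    # Subtract the maximum before exponentiating for numerical stability.
    weights = {name: math.exp(score - peak) for name, score in scores.items()}
    total = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [(name, weight / total) for name, weight in ranked[:n]]


# Assumed raw scores for one sentence against four candidate templates.
scores = {"series": 2.0, "parallel": 0.5, "ohm": 1.0, "power": -1.0}
best_two = top_n_templates(scores, n=2)
```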
In the topological structure analysis of this embodiment, the circuit relationships in the question graphic are preferably mined by analyzing the topology type of the edges E in graph G and then mapping basic topological structures to circuit relationships. The topology types of this embodiment comprise two basic topological structures, the series structure and the parallel structure, and hybrid topological structures composed of one or more series structures and/or parallel structures. Mapping a basic topological structure to circuit relationships means writing out, according to the laws and rules of physical circuits, the circuit relationship expressions that correspond to the two basic structures, series and parallel.
For basic topological structures, this embodiment identifies series and parallel structures with a branch search method and maps each identified series or parallel structure to a group of circuit relationships. The branch search comprises the following steps:
a: Main path traversal. In graph G, starting from the vertex V0 that corresponds to the power source, traverse depth-first, visiting every edge E at most once, and finally return to vertex V0; among the loops obtained, the one with the most vertices is taken as the main path.
b: Simple path traversal. In graph G, a path consisting of a run of consecutive edges that have not yet been visited is treated as a simple path. All simple paths are found by searching and added to the simple path set.
c: Series structure marking. Branch node detection is performed on each simple path in the simple path set, and each simple path that contains no branch node is marked as a series structure.
d: Parallel structure marking. For each series structure, the main path is searched for a sub-path with the same end points, and the sub-path and the corresponding series structure are marked together as a parallel structure.
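Steps a to d can be sketched in Python on a small multigraph. This is a simplified illustration under stated assumptions: the power source is modeled as vertex V0, edges are `(u, v, label)` triples, steps b and c are merged by chaining leftover edges through degree-2 vertices, and every leftover chain is assumed to end on the main path. All function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (vertex u, vertex v, element label; "" for a plain wire)


def other_end(edge: Edge, vertex: str) -> str:
    u, v, _ = edge
    return v if vertex == u else u


def main_path(edges: List[Edge], v0: str) -> List[int]:
    """Step a: depth-first search from v0, each edge used at most once;
    return the edge indices of the loop back to v0 with the most vertices."""
    best: List[int] = []

    def dfs(vertex, used, seen, path):
        nonlocal best
        for i, edge in enumerate(edges):
            if i in used or vertex not in edge[:2]:
                continue
            nxt = other_end(edge, vertex)
            if nxt == v0:
                if len(path) + 1 > len(best):
                    best = path + [i]
            elif nxt not in seen:
                dfs(nxt, used | {i}, seen | {nxt}, path + [i])

    dfs(v0, frozenset(), frozenset({v0}), [])
    return best


def series_structures(edges: List[Edge], main: List[int]):
    """Steps b and c: chain the edges off the main path through non-branch vertices."""
    degree = Counter()
    for u, v, _ in edges:
        degree[u] += 1
        degree[v] += 1
    left = [i for i in range(len(edges)) if i not in main]
    chains = []
    while left:
        chain = [left.pop(0)]
        ends = list(edges[chain[0]][:2])
        for k in (0, 1):  # grow the chain from both of its end vertices
            grown = True
            while grown and degree[ends[k]] == 2:
                grown = False
                for i in left:
                    if ends[k] in edges[i][:2]:
                        left.remove(i)
                        chain.append(i)
                        ends[k] = other_end(edges[i], ends[k])
                        grown = True
                        break
        chains.append((ends[0], ends[1], chain))
    return chains


def parallel_structures(edges: List[Edge], v0: str):
    """Step d: pair each series structure with the main-path sub-path sharing its end points."""
    main = main_path(edges, v0)
    order = [v0]  # vertices along the main path, starting at v0
    for i in main[:-1]:
        order.append(other_end(edges[i], order[-1]))
    pairs = []
    for p, q, chain in series_structures(edges, main):
        if p in order and q in order:
            a, b = sorted((order.index(p), order.index(q)))
            pairs.append((main[a:b], chain))
    return main, pairs


# R1 between A and B, in parallel with the branch A -R2- C -R3- B.
edges = [("V0", "A", ""), ("A", "B", "R1"), ("A", "C", "R2"), ("C", "B", "R3"), ("B", "V0", "")]
main, pairs = parallel_structures(edges, "V0")
```

For this example the main path runs through R2 and R3, and R1 is reported as parallel to the sub-path made of R2 and R3.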
For hybrid topological structures, this scheme simplifies the structure with a structure contraction method so that it breaks down into several basic topological structures. Structure contraction builds on the branch search method and repeatedly performs the following steps:
e: Series structure contraction. For each series structure, the simple path corresponding to it is replaced by a single edge;
f: Parallel structure contraction. For each parallel structure, each of its simple paths or sub-paths is first contracted into one edge (as in step e), and all the edges so obtained are then contracted into one new edge;
g: Steps a to f are repeated until the main path has contracted to <V0, V0>.
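The contraction loop can be illustrated with a short Python sketch that reduces a series-parallel circuit to a single edge from V0 back to V0 while recording how the elements were combined. It is a simplification under stated assumptions: the power source is modeled as vertex V0, edges are `(u, v, expression)` triples with `""` for plain wires, series contraction is tried before parallel contraction, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Edge = Tuple[str, str, str]  # (vertex u, vertex v, expression; "" for a plain wire)


def contract_series(edges: List[Edge], v0: str) -> Optional[List[Edge]]:
    """Replace the two edges meeting at a non-branch vertex (not v0) by one edge."""
    vertices = sorted({x for u, v, _ in edges for x in (u, v)} - {v0})
    for w in vertices:
        touching = [e for e in edges if w in e[:2]]
        if len(touching) == 2 and all(u != v for u, v, _ in touching):
            (u1, v1, a), (u2, v2, b) = touching
            x = v1 if u1 == w else u1
            y = v2 if u2 == w else u2
            label = f"({a} + {b})" if a and b else a or b  # wires add nothing
            rest = [e for e in edges if w not in e[:2]]
            return rest + [(x, y, label)]
    return None


def contract_parallel(edges: List[Edge]) -> Optional[List[Edge]]:
    """Replace two edges that share both end points by one edge."""
    for i in range(len(edges)):
        for j in range(i + 1, len(edges)):
            (u, v, a), (p, q, b) = edges[i], edges[j]
            if u != v and {u, v} == {p, q}:
                rest = [e for k, e in enumerate(edges) if k not in (i, j)]
                return rest + [(u, v, f"({a} || {b})")]
    return None


def shrink(edges: List[Edge], v0: str) -> str:
    """Repeat the contractions until only the edge <v0, v0> remains; return its expression."""
    current = list(edges)
    while not (len(current) == 1 and current[0][:2] == (v0, v0)):
        step = contract_series(current, v0) or contract_parallel(current)
        if step is None:
            raise ValueError("not reducible by series/parallel contraction")
        current = step
    return current[0][2]


# R1 in parallel with the series branch R2-R3; wires connect the block to the source vertex V0.
circuit = [("V0", "A", ""), ("A", "B", "R1"), ("A", "C", "R2"), ("C", "B", "R3"), ("B", "V0", "")]
expression = shrink(circuit, "V0")
```

The resulting expression records the nesting of basic structures, from which series and parallel circuit relationships could then be written out.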
Step two: target relationship set construction. Specifically, relationship reasoning is performed on the question-stem relationship extraction results to simulate the problem-solving process and to construct the set of relationships that solve the target variable (i.e., the variable the test question asks for), called the target relationship set for short, which serves as input to test question attribute labeling.
Relationship reasoning on the extracted question-stem relationships means using the extracted question-stem relationships and the implicit relationships to verify the solvability of the variable states in an inference table until the target variable is verified as solvable. The inference table of this embodiment is a two-dimensional table T, which is created and initialized as follows:
(1) Initializing element names: a row is created for each circuit element that appears in a circuit relationship of the question-stem relationship set, and the row is named after that circuit element;
(2) Initializing variable names: seven attribute variables are created for each row, namely voltage U, current I, resistance R, electric work W, electric power P, electric heat Q and time parameter t;
(3) Setting variable states: for each table cell T[i, j], the question-stem relationship set is searched; if the physical quantity corresponding to T[i, j] is known, T[i, j] is set to 1, otherwise T[i, j] is set to 0. Here i and j are the row and column coordinates of table T.
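The three initialization steps can be sketched in Python as follows. The encoding of the question-stem relationship set as tuples such as `("given", element, attribute, value)` and `("series", element, element)` is an assumption made for this illustration, as is the function name `init_table`.

```python
from typing import Dict, List, Tuple

ATTRIBUTES = ["U", "I", "R", "W", "P", "Q", "t"]  # the seven attribute variables


def init_table(relations: List[Tuple]) -> Dict[str, Dict[str, int]]:
    """Build inference table T: one row per circuit element, one 0/1 state per attribute."""
    table: Dict[str, Dict[str, int]] = {}
    # (1) and (2): one row per element named in any relationship, seven columns each.
    for kind, *args in relations:
        names = [args[0]] if kind == "given" else args
        for name in names:
            table.setdefault(name, {attribute: 0 for attribute in ATTRIBUTES})
    # (3): mark as 1 every quantity that the question stem states explicitly.
    for kind, *args in relations:
        if kind == "given":
            element, attribute, _value = args
            table[element][attribute] = 1
    return table


# Hypothetical question-stem relationship set for a two-resistor series circuit.
stem_relations = [
    ("given", "R1", "R", 10.0),
    ("given", "R1", "U", 5.0),
    ("series", "R1", "R2"),
]
table = init_table(stem_relations)
```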
Solvability verification takes the set of variables with T[i, j] = 1, selects an arbitrary implicit relationship r (for physical circuit questions, one of the theorem/law expressions in the table below), and verifies whether the value of a new variable whose table cell has T[i', j'] = 0 can be calculated. If the verification succeeds, the expression r and the set of variables with T[i, j] = 1 are added to the target relationship set.
The theorem/law expressions comprise 20 relational expression templates in six classes; the implicit relationship set they form is shown in the following table.
[Table image BDA0002329403440000111: implicit relationship set of theorem/law expressions]
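The solvability verification loop can be sketched in Python as a simple forward-chaining pass over the inference table. The four rules below are a small illustrative selection of standard circuit formulas, not the patent's table of 20 templates, and they apply within a single element row only; the names `RULES` and `verify` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Rule = Tuple[str, Tuple[str, ...], str]  # (expression, required known attributes, solved attribute)

# Illustrative implicit relationships (a small assumed subset).
RULES: List[Rule] = [
    ("I = U / R", ("U", "R"), "I"),
    ("U = I * R", ("I", "R"), "U"),
    ("P = U * I", ("U", "I"), "P"),
    ("W = P * t", ("P", "t"), "W"),
]


def verify(table: Dict[str, Dict[str, int]], rules: List[Rule], target: Tuple[str, str]) -> Optional[list]:
    """Mark attributes as solvable until the target is reached; return the relationships used.

    Note: the table is updated in place (0 becomes 1 for every solved attribute).
    """
    element, attribute = target
    target_set = []
    progress = True
    while table[element][attribute] == 0 and progress:
        progress = False
        for row_name, row in table.items():
            for expression, inputs, output in rules:
                if row[output] == 0 and all(row[a] == 1 for a in inputs):
                    row[output] = 1
                    target_set.append((expression, row_name, inputs, output))
                    progress = True
    return target_set if table[element][attribute] == 1 else None


# One element with U, R and t known; the question asks for its electric work W.
table = {"R1": {"U": 1, "I": 0, "R": 1, "W": 0, "P": 0, "Q": 0, "t": 1}}
steps = verify(table, RULES, ("R1", "W"))
```

Each recorded step corresponds to one successful verification, and the last step solves the target variable.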
Step three: test question attribute labeling.
In this embodiment, test question attribute labeling is the process of generating the labeling information of a test question from the test question features, the solution features, a knowledge base and a tag library. As shown in Fig. 2, the test question features comprise the question-stem relationship set and the element keywords. The element keywords are generated by word frequency statistics over the test question text. The solution features are the target relationship set. The knowledge base is the collection of knowledge about all circuit theorems, rules, laws and the like, and the tag library is the collection of knowledge point names; both are compiled manually from textbooks.
Further, the generation of labeling information in this embodiment comprises knowledge point mapping and concept mapping. Knowledge point mapping maps the circuit relationships in the target relationship set to the corresponding circuit knowledge points through the knowledge base, then matches the resulting knowledge points character by character against the knowledge tags in the tag library, and records the successfully matched knowledge tags as output.
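Knowledge point mapping can be sketched in Python as a two-stage lookup. The knowledge base and tag library below are tiny hand-written examples, character matching is approximated by a substring test, and the name `map_knowledge_points` is hypothetical.

```python
from typing import Dict, List


def map_knowledge_points(
    target_relations: List[str],
    knowledge_base: Dict[str, str],
    tag_library: List[str],
) -> List[str]:
    """Map each relationship to a knowledge point, then keep the tags that contain that point."""
    tags: List[str] = []
    for relation in target_relations:
        point = knowledge_base.get(relation)
        if point is None:
            continue  # no knowledge point is recorded for this relationship
        for tag in tag_library:
            if point in tag and tag not in tags:
                tags.append(tag)
    return tags


# Assumed example data.
knowledge_base = {"I = U / R": "Ohm's law", "P = U * I": "electric power"}
tag_library = ["Ohm's law and its application", "Calculation of electric power", "Joule's law"]

tags = map_knowledge_points(["I = U / R", "P = U * I", "series(R1, R2)"], knowledge_base, tag_library)
```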
Concept mapping maps the element keywords corresponding to the valid question-stem relationship set to knowledge concepts and outputs them. The valid question-stem relationship set consists of the circuit relationships in the question-stem relationship set that take part in solving the target variable; it can be obtained by intersecting the question-stem relationship set with the target relationship set.
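Concept mapping can be sketched in Python with a set intersection followed by a keyword lookup. The data layout (a dictionary from relationship to element keyword) and the name `map_concepts` are assumptions made for this illustration.

```python
from typing import Dict, List, Set


def map_concepts(
    stem_relations: Dict[str, str],
    target_relations: Set[str],
    concepts: Dict[str, str],
) -> List[str]:
    """Keep the stem relationships that also occur in the target set, then map their keywords."""
    valid = set(stem_relations) & target_relations  # valid question-stem relationship set
    return sorted({concepts[stem_relations[r]] for r in valid if stem_relations[r] in concepts})


# Assumed example: relationship -> element keyword found in the question text.
stem_relations = {"R1 = 10": "resistor", "series(R1, R2)": "series", "closed(S)": "switch"}
target_relations = {"R1 = 10", "series(R1, R2)", "I = U / R"}
concepts = {"resistor": "resistance", "series": "series circuit", "switch": "switch state"}

concept_tags = map_concepts(stem_relations, target_relations, concepts)
```

The switch relationship is dropped because it does not appear in the target relationship set and so plays no part in solving the target variable.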
Finally, the knowledge point labels and knowledge concept labels generated in the above steps are output together as the final labeling information of the test question.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention and is not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims (14)

1. An educational information resource attribute labeling method, characterized by comprising:
S1, classifying the information formats of the educational information resource units, preprocessing the content of the educational information resource units according to the classification, and acquiring the element information set of the educational information resource units;
S2, extracting the association relationships among a plurality of pieces of element information of the educational information resource units according to the educational information resource units and/or the element information set, to form an element relationship set;
S3, reasoning out and constructing the complete logical relationship of the educational information resource units by using the educational information resource units, the element information and/or the association relationships, and acquiring a target relationship set according to the logical relationship;
S4, converting the element relationship set and the target relationship set into the corresponding knowledge point names, and acquiring one or more knowledge tags corresponding to the educational information resource units to form a knowledge sequence;
S5, outputting the educational information resource unit and the knowledge sequence to complete the attribute labeling of the current educational information resource.
2. The educational information resource attribute labeling method according to claim 1, wherein the preprocessing of step S1 comprises text preprocessing and graphic preprocessing, and the text preprocessing comprises the following steps:
S111, decomposing the text to obtain the combination of words forming the educational information resource unit;
S121, recognizing, classifying and marking the words;
S131, acquiring the text information set of the educational information resource unit.
3. The educational information resource attribute labeling method according to claim 1, wherein the preprocessing of step S1 comprises text preprocessing and graphic preprocessing, and the graphic preprocessing comprises the following steps:
S112, detecting, recognizing and classifying the image information to obtain the composition information of the image;
S122, determining the image composition elements and their relationships according to the composition information;
S132, obtaining the image information set based on the image composition elements and relationships.
4. The educational information resource attribute labeling method according to any one of claims 1 to 3, wherein the acquisition of the association relationships in step S2 comprises syntactic semantic matching and topological structure analysis, and the syntactic semantic matching comprises:
S211, defining syntactic semantic templates according to the elements, the element properties and/or the element relationships;
S221, building the text semantic vectors corresponding to the element information according to the syntactic semantic templates, and training the deep network model with the text semantic vectors;
S231, inputting the element information into the trained deep network model, and deriving the association relationships between the elements.
5. The educational information resource attribute labeling method according to any one of claims 1 to 3, wherein the acquisition of the association relationships in step S2 comprises syntactic semantic matching and topological structure analysis, and the topological structure analysis comprises:
S212, determining the basic topological structure type according to the properties of the current educational information resource unit;
S222, traversing the image elements according to the type, and performing the mapping from basic topological structures to element relationships;
S232, deriving the association relationships between the elements according to the mapping result.
6. The educational information resource attribute labeling method according to any one of claims 1 to 3, wherein step S3 comprises:
S31, initializing the element names corresponding to each association relationship;
S32, creating attribute variables and initializing the attribute variables;
S33, searching the element relationship set for each <element, attribute variable> pair, and marking the pair as 1 if it is known and as 0 otherwise;
S34, selecting any <element, attribute variable> pairs marked as 1, verifying, in combination with the corresponding logic rule, any <element, attribute variable> pair marked as 0, and adding the logic rule to the target relationship set if the verification result is 1.
7. An educational information resource attribute labeling system, characterized by comprising:
an element information module, used for classifying the information formats of the educational information resource units, preprocessing the content of the educational information resource units according to the classification, and acquiring the element information set of the educational information resource units;
an element relationship module, used for extracting the association relationships among a plurality of pieces of element information of the educational information resource units according to the educational information resource units and/or the element information set, to form an element relationship set;
a target relationship module, used for reasoning out and constructing the complete logical relationship of the educational information resource units by using the educational information resource units, the element information and/or the association relationships, and acquiring a target relationship set according to the logical relationship;
a knowledge sequence module, used for converting the element relationship set and the target relationship set into the corresponding knowledge point names and acquiring one or more knowledge tags corresponding to the educational information resource units to form a knowledge sequence;
and an attribute labeling module, used for outputting the educational information resource unit and the knowledge sequence to complete the attribute labeling of the current educational information resource.
8. The educational information resource attribute labeling system according to claim 7, wherein the preprocessing in the element information module comprises text preprocessing and graphic preprocessing, and the text preprocessing comprises:
a text decomposition module, used for decomposing the text to obtain the combination of words forming the educational information resource unit;
a text processing module, used for recognizing, classifying and marking the words;
and a text information module, used for acquiring the text information set of the educational information resource unit.
9. The educational information resource attribute labeling system according to claim 7, wherein the preprocessing in the element information module comprises text preprocessing and graphic preprocessing, and the graphic preprocessing comprises:
an image decomposition module, used for detecting, recognizing and classifying the image information to obtain the composition information of the image;
an image processing module, used for determining the image composition elements and their relationships according to the composition information;
and an image information module, used for obtaining the image information set based on the image composition elements and relationships.
10. The educational information resource attribute labeling system according to any one of claims 7 to 9, wherein the acquisition of the association relationships in the element relationship module comprises syntactic semantic matching and topological structure analysis, and the syntactic semantic matching comprises:
a syntactic semantic module, used for defining syntactic semantic templates according to the elements, the element properties and/or the element relationships;
a training module, used for building the text semantic vectors corresponding to the element information according to the syntactic semantic templates and training the deep network model with the text semantic vectors;
and a syntactic relationship module, used for inputting the element information into the trained deep network model and deriving the association relationships between the elements.
11. The educational information resource attribute labeling system according to any one of claims 7 to 9, wherein the acquisition of the association relationships in the element relationship module comprises syntactic semantic matching and topological structure analysis, and the topological structure analysis comprises:
a basic topological structure module, used for determining the basic topological structure type according to the properties of the current educational information resource unit;
a mapping processing module, used for traversing the image elements according to the type and performing the mapping from basic topological structures to element relationships;
and a topological relationship module, used for deriving the association relationships between the elements according to the mapping result.
12. The educational information resource attribute labeling system according to any one of claims 7 to 9, wherein the target relationship module comprises:
an element initialization module, used for initializing the element names corresponding to each association relationship;
an attribute variable module, used for creating attribute variables and initializing the attribute variables;
a retrieval module, used for searching the element relationship set for each <element, attribute variable> pair, and marking the pair as 1 if it is known and as 0 otherwise;
and a verification module, used for selecting any <element, attribute variable> pairs marked as 1, verifying, in combination with the corresponding logic rule, any <element, attribute variable> pair marked as 0, and adding the logic rule to the target relationship set if the verification result is 1.
13. A storage device storing a plurality of instructions, the instructions being adapted to be loaded by a processor to execute:
S1, classifying the information formats of the educational information resource units, preprocessing the content of the educational information resource units according to the classification, and acquiring the element information set of the educational information resource units;
S2, extracting the association relationships among a plurality of pieces of element information of the educational information resource units according to the educational information resource units and/or the element information set, to form an element relationship set;
S3, reasoning out and constructing the complete logical relationship of the educational information resource units by using the educational information resource units, the element information and/or the association relationships, and acquiring a target relationship set according to the logical relationship;
S4, converting the element relationship set and the target relationship set into the corresponding knowledge point names, and acquiring one or more knowledge tags corresponding to the educational information resource units to form a knowledge sequence;
S5, outputting the educational information resource unit and the knowledge sequence to complete the attribute labeling of the current educational information resource.
14. A terminal, comprising a processor adapted to implement instructions, and a storage device adapted to store a plurality of instructions, the instructions being adapted to be loaded by the processor to execute:
S1, classifying the information formats of the educational information resource units, preprocessing the content of the educational information resource units according to the classification, and acquiring the element information set of the educational information resource units;
S2, extracting the association relationships among a plurality of pieces of element information of the educational information resource units according to the educational information resource units and/or the element information set, to form an element relationship set;
S3, reasoning out and constructing the complete logical relationship of the educational information resource units by using the educational information resource units, the element information and/or the association relationships, and acquiring a target relationship set according to the logical relationship;
S4, converting the element relationship set and the target relationship set into the corresponding knowledge point names, and acquiring one or more knowledge tags corresponding to the educational information resource units to form a knowledge sequence;
S5, outputting the educational information resource unit and the knowledge sequence to complete the attribute labeling of the current educational information resource.
CN201911330362.6A 2019-12-20 2019-12-20 Method and system for labeling education information resource attributes Pending CN110968708A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911330362.6A CN110968708A (en) 2019-12-20 2019-12-20 Method and system for labeling education information resource attributes

Publications (1)

Publication Number Publication Date
CN110968708A true CN110968708A (en) 2020-04-07

Family

ID=70035618


* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107783960A (en) * 2017-10-23 2018-03-09 百度在线网络技术(北京)有限公司 Method, apparatus and equipment for Extracting Information
US20190122145A1 (en) * 2017-10-23 2019-04-25 Baidu Online Network Technology (Beijing) Co., Ltd. Method, apparatus and device for extracting information
CN108334493A (en) * 2018-01-07 2018-07-27 深圳前海易维教育科技有限公司 A kind of topic knowledge point extraction method based on neural network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XINGUO YU ET AL: "Extraction Algebraic Relations from Circuit Images Using Topology Breaking Down and Shrinking", 《PACIFIC-RIM SYMPOSIUM ON IMAGE AND VIDEO TECHNOLOGY》 *
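何彬 et al.: "A test question knowledge point labeling model based on deep mining of attribute relationships", Journal of Nanjing University of Information Science and Technology *
菅朋朋 et al.: "An automatic solving method for circuit questions based on text and diagram understanding", Communications Technology *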

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418088A (en) * 2020-11-23 2021-02-26 华中师范大学 Video learning resource extraction and knowledge annotation method and system based on crowd-sourcing
CN112418088B (en) * 2020-11-23 2022-04-29 华中师范大学 Video learning resource extraction and knowledge annotation method and system based on crowd-sourcing
CN112507931A (en) * 2020-12-16 2021-03-16 华南理工大学 Deep learning-based information chart sequence detection method and system
CN112507931B (en) * 2020-12-16 2023-12-22 华南理工大学 Deep learning-based information chart sequence detection method and system
CN113743974A (en) * 2021-01-14 2021-12-03 北京沃东天骏信息技术有限公司 Resource recommendation method and device, equipment and storage medium
CN113657325A (en) * 2021-08-24 2021-11-16 北京百度网讯科技有限公司 Method, apparatus, medium, and program product for determining annotation style information
CN113657325B (en) * 2021-08-24 2024-04-12 北京百度网讯科技有限公司 Method, apparatus, medium and program product for determining annotation style information
CN114416890A (en) * 2022-01-21 2022-04-29 中国人民解放军国防科技大学 Heterogeneous knowledge point integrated representation, storage, retrieval, generation and interaction method
CN118608344A (en) * 2024-08-08 2024-09-06 西安启光信息技术有限责任公司 Resource management platform and management method based on educational knowledge

Similar Documents

Publication Publication Date Title
Neculoiu et al. Learning text similarity with siamese recurrent networks
CN110968708A (en) Method and system for labeling education information resource attributes
CN107943784B (en) Relation extraction method based on generative adversarial network
CN111639171A (en) Knowledge graph question-answering method and device
CN111475629A (en) Knowledge graph construction method and system for math tutoring question-answering system
CN111738007B (en) Chinese named entity recognition data augmentation algorithm based on sequence generative adversarial network
Contreras et al. Automated essay scoring with ontology based on text mining and nltk tools
CN112115238A (en) Question-answering method and system based on BERT and knowledge base
CN111209384A (en) Question and answer data processing method and device based on artificial intelligence and electronic equipment
CN111858896B (en) Knowledge base question-answering method based on deep learning
CN108021703B (en) Conversation type intelligent teaching system
CN114238653B (en) Method for constructing and completing a programming education knowledge graph and for intelligent question answering
CN114218379B (en) Attribution method for question answering incapacity of intelligent question answering system
CN112328800A (en) System and method for automatically generating programming specification question answers
CN113505589B (en) MOOC learner cognitive behavior recognition method based on BERT model
CN112686025A (en) Method for generating distractors for Chinese multiple-choice questions based on free text
CN114970563B (en) Chinese question generation method and system fusing content and form diversity
Agarwal et al. Autoeval: A nlp approach for automatic test evaluation system
CN115309910A (en) Language piece element and element relation combined extraction method and knowledge graph construction method
CN112966518B (en) High-quality answer identification method for large-scale online learning platform
CN117540063A (en) Education field knowledge base searching optimization method and device based on problem generation
CN112397201A (en) Restated sentence generation optimization method for intelligent inquiry system
CN117216617A (en) Text classification model training method, device, computer equipment and storage medium
CN113157932B (en) Metaphor calculation and device based on knowledge graph representation learning
CN115964486A (en) Few-shot intent recognition method based on data augmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2020-04-07