Automatic test question classification system and method
Technical Field
The invention relates to the technical field of network education, in particular to an automatic test question classification system and method.
Background
The continued development of computer and network technology has led to the growing use of paperless practice and examination modes with randomly drawn questions in all kinds of examination systems, and these modes depend on an electronic question bank. In an electronic question bank, test questions are entered manually and can be reused for a long time once entered. However, as large numbers of test questions are continually added, question bank management becomes a prominent problem: conventional manual classification and checking can no longer cope with the workload. How to classify test questions effectively is therefore a problem to be solved in question bank management.
The test question library management has very important significance and value, and the good test question library management system can provide effective and rich test question data for an examination system and also can provide a new teaching platform for teaching and learning activities of teachers and students. However, the sharing of information resources and the explosive growth of the amount of information data brought by the network make the traditional manual information processing method impractical, so that an information processing method with higher automation degree and better efficiency is needed to help people to perform test question classification more efficiently.
At present, test question banks are managed either by manual classification, or by determining the category of the test questions before they are stored and importing them into the database under that category. Machine learning and deep learning methods are also used for text classification, but their accuracy has reached a bottleneck.
Disclosure of Invention
The invention aims at overcoming the defects in the prior art and provides an automatic test question classification system and method.
The aim of the invention is achieved by the following technical scheme: the automatic test question classification system comprises a test question database to be classified, an automatic classification module, a tag library, a test question database to be checked, a manual checking module, a manual classification module, an automatic classification correction module, a classified test question database and a similar question searching module;
the automatic classification module is used for matching the test questions in the test question database to be classified with the labels of the label library and transmitting the matched test questions to the test question database to be checked;
the manual checking module is used for checking the correctness of automatic classification of the test questions, transmitting the test questions to the classified test question database if the automatic classification is correct, and transmitting the test questions to the manual classification module if the automatic classification is incorrect;
the manual classification module is used for matching the test questions with the labels of the label library, if the labels matched with the test questions exist in the label library, directly matching the test questions with the labels, if the labels matched with the test questions do not exist in the label library, creating new labels matched with the test questions, adding the new labels into the label library, and transmitting the matched test questions to the classified test question database;
the automatic classification correction module is used for improving the correctness of the automatic classification module;
the similar question searching module is used for further classifying the questions of the classified question database.
A classification method comprising the steps of:
step A: the automatic classification module matches the test questions in the test question database to be classified with the labels of the label library, and transmits the matched test questions to the test question database to be checked;
step B: the manual checking module checks the automatically classified test questions in the test question database to be checked; if the automatic classification is correct, the test questions are transmitted to the classified test question database, and if the automatic classification is incorrect, the test questions are transmitted to the manual classification module;
step C: the manual classification module matches the test questions with the labels of the label library, if the labels matched with the test questions exist in the label library, the test questions are directly matched with the labels, if the labels matched with the test questions do not exist in the label library, new labels are created to be matched with the test questions, the new labels are added into the label library, and the matched test questions are transmitted to the classified test question database;
step D: the automatic classification correction module feeds the accuracy of the automatic classification and the accuracy of the manual classification back to the automatic classification module to improve the accuracy of the automatic classification module;
step E: the similar question searching module further classifies the questions of the classified question database.
The invention further provides that the step A comprises the following steps:
A1: extracting test question feature words from the test questions to be classified in the test question database to be classified;
A2: storing the test question feature words into a test question feature word set;
A3: extracting tag feature words from the existing tags in the tag library;
A4: storing the tag feature words into a tag feature word set;
A5: inputting the test question feature word set and the tag feature word set into a K-nearest-neighbor algorithm model;
A6: matching the most similar labels to the test questions to be classified.
The invention further provides that the step A1 of extracting the test question feature words comprises the following steps:
a1: preprocessing the test questions to be classified in the test question database to be classified;
a2: taking one test question from the preprocessed test questions;
a3: performing word segmentation on the test question;
a4: obtaining a plurality of candidate words after word segmentation;
a5: calculating the weight of each candidate word;
a6: obtaining the test question feature words of the test question to be classified.
The invention further provides that the step D comprises the following steps:
b1: obtaining a new tag library through the manual classification module, and manually extracting a new tag feature word set from the new tag library;
b2: counting the number x of correct matches between the test question feature word set and the automatically extracted tag feature word set;
b3: counting the number y of correct matches between the test question feature word set and the manually extracted tag feature word set;
b4: if y is larger than x, the manually extracted tag feature word set replaces the automatically extracted tag feature word set the next time the automatic classification module performs automatic classification; if y is smaller than x, the automatically extracted tag feature word set used last time continues to be used the next time the automatic classification module performs automatic classification.
The invention further provides that the step E comprises the following steps:
c1: extracting a test question feature word set of each test question;
c2: determining the similarity between the test question feature word sets;
c3: finding similar questions.
The invention is further arranged that one test question can be matched with a plurality of labels.
The invention is further arranged that the candidate words include nouns, formulas, symbols, and graphics.
The invention has the following beneficial effects: 1. when test questions are automatically classified, the tag feature word set with higher accuracy replaces the one with lower accuracy, so that classification accuracy improves continuously; 2. test question feature words are extracted from the test questions and tag feature words from the labels, and the corresponding labels are found with a K-nearest-neighbor algorithm, which greatly increases the speed of test question classification; 3. a sound manual review mechanism is established, which ensures the accuracy of question bank classification.
Drawings
The invention will be further described with reference to the accompanying drawings, in which embodiments do not constitute any limitation of the invention, and other drawings can be obtained by one of ordinary skill in the art without inventive effort from the following drawings.
FIG. 1 is a system flow diagram of the present invention;
FIG. 2 is a flow chart of the automatic classification module of the present invention;
FIG. 3 is a flow chart of the extraction of test question feature words of the present invention;
FIG. 4 is a flow chart of the manual checking module of the present invention;
FIG. 5 is a flow chart of the manual classification module of the present invention;
FIG. 6 is a flow chart of the automatic classification correction module of the present invention;
FIG. 7 is a flow chart of the similar question searching module of the present invention.
Detailed Description
The invention will be further described with reference to the following examples.
As shown in FIGS. 1 to 7, the automatic test question classification system of this embodiment comprises a test question database to be classified, an automatic classification module, a tag library, a test question database to be checked, a manual checking module, a manual classification module, an automatic classification correction module, a classified test question database and a similar question searching module;
the automatic classification module is used for matching the test questions in the test question database to be classified with the labels of the label library and transmitting the matched test questions to the test question database to be checked;
the manual checking module is used for checking the correctness of automatic classification of the test questions, transmitting the test questions to the classified test question database if the automatic classification is correct, and transmitting the test questions to the manual classification module if the automatic classification is incorrect;
the manual classification module is used for matching the test questions with the labels of the label library, if the labels matched with the test questions exist in the label library, directly matching the test questions with the labels, if the labels matched with the test questions do not exist in the label library, creating new labels matched with the test questions, adding the new labels into the label library, and transmitting the matched test questions to the classified test question database;
the automatic classification correction module is used for improving the correctness of the automatic classification module;
the similar question searching module is used for further classifying the questions of the classified question database.
When test questions are automatically classified, the tag feature word set with higher accuracy replaces the one with lower accuracy, so that classification accuracy improves continuously; test question feature words are extracted from the test questions and tag feature words from the labels, and the corresponding labels are found with a K-nearest-neighbor algorithm, which greatly increases the speed of test question classification; a sound manual review mechanism is established, which ensures the accuracy of question bank classification.
The automatic test question classification method of the embodiment comprises the following steps:
step A: the automatic classification module matches the test questions in the test question database to be classified with the labels of the label library, and transmits the matched test questions to the test question database to be checked;
step B: the manual checking module checks the automatically classified test questions in the test question database to be checked; if the automatic classification is correct, the test questions are transmitted to the classified test question database, and if the automatic classification is incorrect, the test questions are transmitted to the manual classification module;
step C: the manual classification module matches the test questions with the labels of the label library, if the labels matched with the test questions exist in the label library, the test questions are directly matched with the labels, if the labels matched with the test questions do not exist in the label library, new labels are created to be matched with the test questions, the new labels are added into the label library, and the matched test questions are transmitted to the classified test question database;
step D: the automatic classification correction module feeds the accuracy of the automatic classification and the accuracy of the manual classification back to the automatic classification module to improve the accuracy of the automatic classification module;
step E: the similar question searching module further classifies the questions of the classified question database.
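The routing described in steps A to C can be sketched as a small function. This is only an illustration under stated assumptions, not the implementation of the invention: the names `Question` and `route_question` and the parameters `reviewer_accepts` and `manual_labels` are hypothetical, the tag library is modeled as a Python set, and the classified test question database is modeled as a list.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    labels: list = field(default_factory=list)


def route_question(question, auto_labels, reviewer_accepts, manual_labels,
                   tag_library, classified_db):
    """Route one question through steps A to C (hypothetical sketch).

    auto_labels:      labels proposed by the automatic classification module
    reviewer_accepts: True if the manual checking module confirms them
    manual_labels:    labels chosen by the manual classification module
    """
    if reviewer_accepts:
        # Step B, correct branch: keep the automatic labels.
        question.labels = list(auto_labels)
    else:
        # Step C: manual classification; labels that are not yet in the
        # tag library are added to it.
        for label in manual_labels:
            if label not in tag_library:
                tag_library.add(label)
        question.labels = list(manual_labels)
    # In both branches the question ends up in the classified database.
    classified_db.append(question)
    return question
```

In this sketch a rejected automatic classification both relabels the question and may enlarge the tag library, which is the side effect that step D later relies on.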
In the automatic test question classification method of this embodiment, step A comprises the following steps:
A1: extracting test question feature words from the test questions to be classified in the test question database to be classified;
A2: storing the test question feature words into a test question feature word set;
A3: extracting tag feature words from the existing tags in the tag library;
A4: storing the tag feature words into a tag feature word set;
A5: inputting the test question feature word set and the tag feature word set into a K-nearest-neighbor algorithm model;
A6: matching the most similar labels to the test questions to be classified.
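The matching of a test question feature word set against the tag feature word sets can be sketched as a nearest-neighbor ranking. The text does not state the distance measure or the value of K, so the Jaccard similarity used here, the default `k=1`, and the function names `jaccard` and `nearest_labels` are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two word sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_labels(question_words, label_words, k=1):
    """Return the k labels whose feature word sets are most similar.

    label_words maps each label to its feature word set. Ties are broken
    alphabetically by label name so the result is deterministic.
    """
    ranked = sorted(
        label_words,
        key=lambda label: (-jaccard(question_words, label_words[label]), label),
    )
    return ranked[:k]
```

Calling `nearest_labels` with `k` greater than 1 returns several labels for one question, which is one possible way to let a test question match a plurality of labels.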
In the automatic test question classification method of this embodiment, extracting the test question feature words in step A1 comprises the following steps:
a1: preprocessing the test questions to be classified in the test question database to be classified;
a2: taking one test question from the preprocessed test questions;
a3: performing word segmentation on the test question;
a4: obtaining a plurality of candidate words after word segmentation;
a5: calculating the weight of each candidate word;
a6: obtaining the test question feature words of the test question to be classified.
A test question usually targets knowledge in a particular field, and that knowledge has corresponding attributes or feature words; the feature words of a piece of knowledge indicate which category the test question belongs to.
This embodiment uses an unsupervised text keyword extraction method, namely keyword extraction based on statistical features, whose idea is to extract the keywords of a document from statistics of the words in that document. The text is usually preprocessed first to obtain a set of candidate words, and keywords are then selected from the candidate set by quantifying feature values. The statistical-feature method quantifies features through word weights, which mainly take into account part of speech, word frequency, word position and the like. For test question classification, the word-weight quantization focuses mainly on part of speech, nouns and formulas; the invention increases the weight of nouns and reduces the weight of adjectives, verbs and the like.
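The weighting idea can be sketched as follows. The sketch assumes that word segmentation and part-of-speech tagging have already been done by an upstream tool that is not shown; the weight values in `POS_WEIGHTS`, the scoring rule (frequency multiplied by part-of-speech weight), and the name `extract_feature_words` are hypothetical and are not taken from the text.

```python
from collections import Counter

# Hypothetical part-of-speech weights: nouns and formulas are boosted,
# adjectives and verbs are damped. The values are illustrative only.
POS_WEIGHTS = {"noun": 2.0, "formula": 2.0, "verb": 0.5, "adjective": 0.5}


def extract_feature_words(tagged_words, top_n=3):
    """Score candidate words by frequency times part-of-speech weight.

    tagged_words is a list of (word, part_of_speech) pairs produced by an
    upstream word segmentation step that is not shown here.
    """
    counts = Counter(tagged_words)
    scores = {}
    for (word, pos), freq in counts.items():
        weight = POS_WEIGHTS.get(pos, 1.0)  # unknown tags get a neutral weight
        scores[word] = scores.get(word, 0.0) + freq * weight
    # Highest score first; alphabetical order breaks ties deterministically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]
```

Word position, which the text also names as a statistical feature, is left out of this sketch to keep it short.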
In the automatic test question classification method of this embodiment, step D comprises the following steps:
b1: obtaining a new tag library through the manual classification module, and manually extracting a new tag feature word set from the new tag library;
b2: counting the number x of correct matches between the test question feature word set and the automatically extracted tag feature word set;
b3: counting the number y of correct matches between the test question feature word set and the manually extracted tag feature word set;
b4: if y is larger than x, the manually extracted tag feature word set replaces the automatically extracted tag feature word set the next time the automatic classification module performs automatic classification; if y is smaller than x, the automatically extracted tag feature word set used last time continues to be used the next time the automatic classification module performs automatic classification.
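The comparison of x and y in step D can be sketched as below. The function names, the toy classifier `overlap_classify`, and the representation of reviewed questions as `(feature_words, reviewed_label)` pairs are assumptions for illustration. The text does not say what happens when y equals x; this sketch keeps the automatically extracted set in that case, which is also an assumption.

```python
def overlap_classify(words, label_words):
    """Toy classifier: pick the label sharing the most words with the question."""
    return max(
        sorted(label_words),
        key=lambda label: len(set(words) & set(label_words[label])),
    )


def count_correct(questions, label_words, classify):
    """Count questions whose predicted label equals the reviewed label.

    questions is a list of (feature_words, reviewed_label) pairs.
    """
    return sum(
        1 for words, truth in questions
        if classify(words, label_words) == truth
    )


def choose_label_word_set(questions, auto_words, manual_words, classify):
    """Return the tag feature word set to use for the next automatic run."""
    x = count_correct(questions, auto_words, classify)    # automatic set
    y = count_correct(questions, manual_words, classify)  # manual set
    # y == x is not covered by the text; keeping the automatic set is an assumption.
    return manual_words if y > x else auto_words
```

Passing the classifier in as a parameter keeps the comparison independent of how the automatic classification module actually matches questions to labels.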
In the automatic test question classification method of this embodiment, step E comprises the following steps:
c1: extracting a test question feature word set of each test question;
c2: determining the similarity between the test question feature word sets;
c3: finding similar questions.
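The similar question search can be sketched as a pairwise comparison of feature word sets. The text does not name a similarity measure, so the Jaccard index, the `threshold` parameter, and the function name `find_similar_questions` are assumptions for illustration.

```python
from itertools import combinations


def find_similar_questions(feature_sets, threshold=0.5):
    """Return (id_a, id_b, similarity) for pairs at or above the threshold.

    feature_sets maps a question id to its feature word set. Similarity is
    the Jaccard index; pairs are sorted from most to least similar.
    """
    pairs = []
    for id_a, id_b in combinations(sorted(feature_sets), 2):
        a, b = set(feature_sets[id_a]), set(feature_sets[id_b])
        union = a | b
        sim = len(a & b) / len(union) if union else 0.0
        if sim >= threshold:
            pairs.append((id_a, id_b, sim))
    pairs.sort(key=lambda pair: -pair[2])
    return pairs
```

The pairwise loop grows quadratically with the number of questions, so a large question bank would probably need an index or a pre-grouping by label before this comparison.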
According to the test question automatic classification method, one test question can be matched with a plurality of labels.
According to the test question automatic classification method, the candidate words comprise nouns, formulas, symbols and figures.
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the scope of the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.