Disclosure of Invention
The purpose of the invention is as follows:
The invention provides a question bank generating system based on a web crawler, and an application method thereof, to solve two problems of existing question bank systems: they cannot automatically generate test papers, and their test question content is not novel enough.
The technical scheme is as follows:
A question bank generating system based on a web crawler comprises a system development framework module, a database module and a server, wherein the system development framework module is connected to the database module, and both are built on the server;
a crawler module, a question bank management module and an intelligent paper assembly algorithm module are nested in the system development framework module; the three modules are independent of one another and cooperate with one another;
the crawler module captures test question content from web pages, and after an administrator preliminarily marks the captured content, the test questions are stored in the source test question resource library module;
the question bank management module stores the online exercise resources dynamically collected by the crawler module in the test question resource library by knowledge point, and provides the test question source for the intelligent paper assembly algorithm module;
the intelligent paper assembly algorithm module manages the test questions in fragmented form on the basis of knowledge points and, when a paper is assembled, screens the test questions to form a complete test paper;
the database module comprises a source test question resource library module and a user test question resource library module;
the source test question resource library module stores the preliminarily marked test question resource information and provides the question source from which tenants create their own test question resource libraries;
the user test question resource library module stores user-defined course information, test question resources entered by the user and test paper resources generated by the user.
The server is a WSGI server running the Flask framework.
An application method of a question bank generating system based on a web crawler comprises the following steps:
1) collecting and updating test questions: test questions are captured by the crawler module or written by users, and the collected test questions are stored in the source test question resource library module;
2) creating an outline: a course outline customized by the user according to the user's own requirements is stored in the user test question resource library module;
3) screening test questions: the scores and difficulty levels of the test questions extracted from the user test question resource library module are marked, and the test questions are stored in the user test question resource library module, wherein repeated test questions are not stored;
4) generating a test paper: the intelligent paper assembly algorithm module extracts test questions from those stored in the user test question resource library module to form a test paper.
In step 1), the crawler module captures test question resources from the web page at a given URL, marks the question types of the captured resources, and finally stores them in the source test question resource library module.
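The capture step can be illustrated with a minimal sketch that uses only the Python standard library. It assumes, purely for illustration, that each test question sits inside a `div` element with the class `question`; the tag name, the class name and the helper names `extract_questions` and `crawl_and_mark` are hypothetical and do not come from the original system. Fetching the page from the URL is omitted, so the sketch starts from the HTML text.

```python
from html.parser import HTMLParser


class QuestionExtractor(HTMLParser):
    """Collect the text of every element with a given tag and class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0        # > 0 while inside a matching element
        self.buffer = []      # text pieces of the current question
        self.questions = []   # finished question texts

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.tag:
                self.depth += 1   # nested element with the same tag name
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                # Collapse runs of whitespace left over from the markup.
                text = " ".join("".join(self.buffer).split())
                if text:
                    self.questions.append(text)
                self.buffer = []

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_questions(html, tag="div", css_class="question"):
    """Return the text of each designated tag found in the page."""
    parser = QuestionExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.questions


def crawl_and_mark(html, question_type):
    """Attach the administrator's preliminary question-type mark."""
    return [{"content": q, "type": question_type}
            for q in extract_questions(html)]
```

The records returned by `crawl_and_mark` stand in for what would be written to the source test question resource library module after the administrator has reviewed them.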
In step 4), the intelligent paper assembly algorithm of the intelligent paper assembly algorithm module comprises the following steps:
step 1, quantifying the test questions of each course through analysis of the knowledge points and the test questions, to obtain fragmented test questions;
step 2, sorting out the association relations among the fragmented test questions, to determine the constraint conditions of intelligent paper assembly;
step 3, searching with the Elasticsearch search engine, adding weights to the different index keywords, and finally assembling the paper.
The constraint conditions include:
condition 1: $\sum_{i=1}^{n} s_i = S$,
where $s_i$ is the score of test question $i$ in the test paper, $n$ is the total number of test questions in the test paper, and $S$ is the full score, that is, the score of the test paper is to reach the full score;
condition 2: $\frac{1}{n}\sum_{i=1}^{n} d_i = D$,
where $d_i$ is the difficulty value of test question $i$ in the test paper, $n$ is the total number of test questions in the test paper, and $D$ is the difficulty of the test paper, which is determined according to the requirements of the user;
condition 3: the exposure of test question i in the test paper is smaller than the average exposure of the test questions of the same question type in the same chapter;
condition 4: the recently-selected flag of the test question is false, which keeps the repetition rate low between the test questions selected for two consecutive test papers.
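The four constraint conditions can be checked mechanically once a candidate paper has been chosen. The following is a minimal sketch, not the patented algorithm itself. It assumes that condition 2 means the mean difficulty of the selected questions equals the user's target, which is one reading of the text, and the names `PaperQuestion` and `check_constraints` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PaperQuestion:
    score: float             # score of the question in the paper
    difficulty: int          # difficulty value of the question
    exposure: int            # how many times the question has been selected
    avg_exposure: float      # average exposure for the same type and chapter
    recently_selected: bool  # recently-selected flag


def check_constraints(questions, full_score, target_difficulty, tol=1e-9):
    """Report, per condition, whether a candidate paper satisfies it."""
    n = len(questions)
    if n == 0:
        return {f"condition_{k}": False for k in range(1, 5)}
    total_score = sum(q.score for q in questions)
    # Assumption: paper difficulty is the plain mean of question difficulties.
    mean_difficulty = sum(q.difficulty for q in questions) / n
    return {
        "condition_1": abs(total_score - full_score) <= tol,
        "condition_2": abs(mean_difficulty - target_difficulty) <= tol,
        "condition_3": all(q.exposure < q.avg_exposure for q in questions),
        "condition_4": all(not q.recently_selected for q in questions),
    }
```

A caller would build a candidate list of `PaperQuestion` records and reject or repair the paper whenever any entry of the returned dictionary is false.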
Advantageous effects
The invention uses web crawler technology to traverse the Web, moving continuously from one site to another and building indexes automatically, and consolidates test questions from many sources into one database, so that the test question content stays novel. It classifies the test questions and assembles test papers automatically, and the resulting papers can be browsed online or printed in the traditional way. The invention is tailored to teachers, supports their teaching, and thereby helps improve teaching quality.
Detailed Description
The invention is described in more detail below with reference to the accompanying drawings.
The invention develops a web-based automatic test paper generation system that keeps its test question resources up to date by crawling online resources. This greatly reduces the workload of teachers: the question bank system automatically generates test papers with novel content and improves the efficiency of paper assembly. The work is meaningful and meets a real and pressing need.
The intelligent question bank generating system based on a web crawler comes in two versions. The first version has three functional modules: a question collection module, a question sorting module and a test paper management module. The question collection module lets an administrator add test questions manually to update the database. The question sorting module provides six functions: course classification, question type classification, chapter classification, difficulty classification, test question score and test paper exposure. The test paper management module provides three functions: intelligent paper assembly, manual paper assembly and test paper setting. The second version adds, on top of the first, automatic addition of test question resources by the web crawler, which improves the efficiency of updating the test question resources and solves the problem of test question content not being novel. The functional module diagram is shown in Fig. 1. For the database, the project designs 12 tables to store basic user information and test question resource information; an overview is given in Table 1.
Table 1 database overview
As shown in Fig. 1, an ordinary user who browses the website in an unregistered state is offered only online resource viewing, limited to the 10 most recently generated sets of test question resources;
creating an outline: the user can create a course outline according to the user's own requirements; the process is to create a course, then the chapters of the course, then the knowledge points of each chapter;
screening test questions: the user can select any number of questions from the source test question resource library according to the user's own requirements, and the selected questions are added to the user's own test question resource library under the created outline as needed;
generating a test paper: the user chooses, from the user's own test question resource library, how many test questions of each type are wanted under each knowledge point, and the system screens out the best-matching test questions according to factors such as exposure and difficulty and generates a completely formatted test paper;
updating crawler resources: the administrator fills in a URL in the background and selects a course; the system automatically crawls the test question resources of the page at that URL and shows them in the content display bar, and after simple typesetting the administrator uploads them to the database to update the test question resources.
The above are the main functions of the system. In addition, the system makes extensive use of technologies such as caching to improve retrieval efficiency.
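The outline-creation function described above (a course, then its chapters, then the knowledge points of each chapter) maps naturally onto a three-level hierarchy. The following is a minimal in-memory sketch. The class and method names are hypothetical, and the real system stores this information in the user test question resource library rather than in Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgePoint:
    name: str


@dataclass
class Chapter:
    title: str
    knowledge_points: list = field(default_factory=list)

    def add_knowledge_point(self, name):
        """Create a knowledge point under this chapter."""
        point = KnowledgePoint(name)
        self.knowledge_points.append(point)
        return point


@dataclass
class Course:
    name: str
    chapters: list = field(default_factory=list)

    def add_chapter(self, title):
        """Create a chapter under this course."""
        chapter = Chapter(title)
        self.chapters.append(chapter)
        return chapter

    def knowledge_point_paths(self):
        """List every (course, chapter, knowledge point) triple in order."""
        return [(self.name, chapter.title, point.name)
                for chapter in self.chapters
                for point in chapter.knowledge_points]
```

Each triple returned by `knowledge_point_paths` identifies one slot under which screened test questions can be filed.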
To realize these functions, the invention adopts the following technical scheme: a question bank generating system based on a web crawler, comprising a system development framework module, a database module and a server, wherein the system development framework module is connected to the database module, and both are built on the server;
a crawler module, a question bank management module and an intelligent paper assembly algorithm module are nested in the system development framework module; the three modules are independent of one another and cooperate with one another;
the crawler module captures test question content from web pages by relying on the rule that the test question content of a page always sits inside designated HTML tags, and after an administrator preliminarily marks the captured content, the test questions are stored in the source test question resource library module.
The question bank management module stores the online exercise resources dynamically collected by the crawler module in the test question resource library by knowledge point, and provides the test question source for the intelligent paper assembly algorithm module;
the intelligent paper assembly algorithm module manages the test questions in fragmented form on the basis of knowledge points and, when a paper is assembled, screens the test questions by factors such as question type, discrimination and exposure to form a complete test paper.
The database module uses MySQL 5 and comprises a source test question resource library module and a user test question resource library module. The source test question resource library module stores the preliminarily marked test question resource information and provides the question source from which tenants create their own test question resource libraries. The user test question resource library module stores user-defined course information, the test question resources entered by the user, the test paper resources generated by the user, and other related information.
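The split between the two resource libraries, together with the rule that repeated test questions are not stored, can be sketched with two tables and uniqueness constraints. The sketch below uses Python's built-in `sqlite3` module as a stand-in so that it runs without a database server; the original system uses MySQL 5, and the table names, column names and helper functions here are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE source_question (
    id            INTEGER PRIMARY KEY,
    content       TEXT NOT NULL UNIQUE,   -- no duplicate source questions
    question_type TEXT NOT NULL
);
CREATE TABLE user_question (
    id              INTEGER PRIMARY KEY,
    user_id         INTEGER NOT NULL,
    source_id       INTEGER NOT NULL REFERENCES source_question(id),
    knowledge_point TEXT NOT NULL,
    score           REAL NOT NULL,
    difficulty      INTEGER NOT NULL CHECK (difficulty BETWEEN 1 AND 5),
    UNIQUE (user_id, source_id)           -- a user stores a question once
);
"""


def open_db():
    """Create an in-memory database holding both resource libraries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_source_question(conn, content, question_type):
    """Store a marked question; return False if it is a duplicate."""
    try:
        conn.execute(
            "INSERT INTO source_question (content, question_type) VALUES (?, ?)",
            (content, question_type),
        )
    except sqlite3.IntegrityError:
        return False
    return True


def copy_to_user_library(conn, user_id, source_id, knowledge_point,
                         score, difficulty):
    """Copy a source question into a user's library; skip duplicates."""
    try:
        conn.execute(
            "INSERT INTO user_question "
            "(user_id, source_id, knowledge_point, score, difficulty) "
            "VALUES (?, ?, ?, ?, ?)",
            (user_id, source_id, knowledge_point, score, difficulty),
        )
    except sqlite3.IntegrityError:
        return False
    return True
```

Letting the database reject duplicates through `UNIQUE` constraints keeps the check in one place, whichever module performs the insert.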
The system development framework module is the Flask framework, which is lightweight, concise and highly extensible; it improves the maintainability of the system and facilitates later maintenance and extension.
The server is a WSGI server running the Flask framework.
The question bank generating system based on a web crawler is built by the following steps:
1) building the basic system framework from the system development framework module, the database module and the server;
2) developing the intelligent paper assembly algorithm module and the crawler module, and nesting them in the development framework;
3) the administrator crawls test question resources from web pages by URL to enrich the source test question resource library module, and users extract test questions from the source test question resource library module according to their own requirements to enrich their personal test question resource libraries.
The application method of the question bank generating system based on a web crawler comprises the following steps:
1) in the background, the administrator uses the crawler module to crawl test question resources from a web page by URL, briefly marks the question types of the captured content, and finally stores the content in the source test question resource library module;
2) in the foreground, the ordinary user defines a course outline according to the user's own requirements; the outline is created level by level as courses, course chapters and chapter knowledge points, and is stored in the user test question resource library module;
3) the ordinary user extracts a specified number of test questions from the source test question resource library module according to the user's own requirements, marks their scores and difficulty levels, and selectively stores them under the specified courses, chapters and knowledge points; that is, repeated test questions are not stored, while test questions that are not repeated are stored in the user test question resource library module at the user's choice, so that a complete and comprehensive test paper can later be assembled by knowledge point;
4) the ordinary user specifies, according to the user's own requirements, the number of test questions of each type under each knowledge point and the difficulty level of the test paper; the intelligent paper assembly algorithm module extracts the specified number of test questions from the user test question resource library module by knowledge point, gives priority to test questions with low exposure, and calculates the score and difficulty at the same time so that the total score is 100 and the average difficulty equals the level specified by the user, thereby achieving randomness, comprehensiveness, novelty and a low repetition rate in the test paper;
5) besides extracting test questions from the source test question resource library module and storing them in the user test question resource library module, the ordinary user can also write test questions and enter them into the user test question resource library module;
6) the finished test paper can be read online or printed for use.
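Step 4) can be illustrated with a simplified greedy selection. This is a sketch under stated assumptions rather than the algorithm of the invention: for each knowledge point and question type it skips questions whose recently-selected flag is set, prefers low exposure, and breaks ties by closeness to the target difficulty. It reports the total score and average difficulty instead of enforcing them, and it does not reset the flags of the previous paper. The dictionary keys and the function name `assemble_paper` are hypothetical.

```python
def assemble_paper(pool, demand, target_difficulty):
    """Pick questions per (knowledge point, type), preferring low exposure.

    pool:   list of dicts with keys id, knowledge_point, type, score,
            difficulty, exposure, recently_selected
    demand: dict mapping (knowledge_point, type) to the number wanted
    Returns (paper, total_score, average_difficulty).
    """
    paper = []
    for (point, qtype), count in demand.items():
        candidates = [
            q for q in pool
            if q["knowledge_point"] == point
            and q["type"] == qtype
            and not q["recently_selected"]
        ]
        # Lowest exposure first, then closest to the target difficulty.
        candidates.sort(key=lambda q: (
            q["exposure"],
            abs(q["difficulty"] - target_difficulty),
            q["id"],
        ))
        if len(candidates) < count:
            raise ValueError(f"not enough questions for {point}/{qtype}")
        paper.extend(candidates[:count])

    # Record the selection so that the next paper avoids these questions.
    for q in paper:
        q["exposure"] += 1
        q["recently_selected"] = True

    total_score = sum(q["score"] for q in paper)
    average = sum(q["difficulty"] for q in paper) / len(paper) if paper else 0.0
    return paper, total_score, average
```

A fuller implementation would compare the returned total and average with the required full score and target difficulty, and swap questions until both conditions hold.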
Evaluation of product quality
Functionality: all basic functions of the product are implemented and can be tested with a large number of samples.
Reliability: the program handles most exceptions and normally does not crash.
Ease of use: the system interface is clean and tidy and its labels are clear, so most users can use the system without consulting a user manual.
Efficiency: the system uses technologies such as multithreading and caching and optimizes the intelligent retrieval algorithm, which improves its operating efficiency.
Maintainability: the product is developed on the Flask framework, which is lightweight, concise and highly extensible; this improves the maintainability of the system and facilitates later extension.
Portability: the product runs in a Linux environment; on the user side, Chrome, Firefox, and IE10 and later browsers are supported.
Evaluation of technical methods
A system development framework module: the product is built mainly on the Flask framework. Flask is one of the younger members of the web framework family, so it has absorbed the strengths of other frameworks, and it is aimed chiefly at small projects. It is also extensible: Flask lets developers choose which database plug-in to use for storing their data. It is a microframework for simple requirements and small applications, and many websites with simple functions and good performance are built on it. The language chosen for developing the product is Python, which integrates seamlessly with Flask and further improves the efficiency of the system.
The intelligent paper assembly algorithm: on the basis of an analysis of recent domestic and foreign literature on online education and online examination systems, and in light of front-line teaching practice, the invention designs and implements a fragmentation-oriented intelligent question bank system. The specific scheme is as follows:
(1) First, a test question organization model is constructed through analysis of the knowledge points and the test questions. In the design of the model, the test questions are organized by a strategy that associates them with knowledge points, and systematizing the knowledge points sorts out the association relations among the fragmented test questions, so that the intelligent question bank system can better organize and manage the test questions. The test question organization model divides the difficulty value of a test question into five grades: very difficult (5), difficult (4), medium (3), fairly easy (2) and easy (1). The score of a test question is set by the teacher according to the question type. Each question carries a flag indicating whether it has been selected recently, and a total exposure, that is, the total number of times it has been selected, is calculated for each question.
(2) On the basis of an in-depth analysis of the basic attributes of the test questions, the key attributes, including question type, discrimination and exposure, are considered together to determine the constraint conditions of intelligent paper assembly. The constraint conditions include:
condition 1: $\sum_{i=1}^{n} s_i = S$, where $s_i$ is the score of test question $i$ in the test paper, $n$ is the total number of test questions in the test paper, and $S$ is the full score, that is, the score of the test paper is to reach the full score;
condition 2: $\frac{1}{n}\sum_{i=1}^{n} d_i = D$, where $d_i$ is the difficulty value of test question $i$ in the test paper, $n$ is the total number of test questions in the test paper, and $D$ is the difficulty of the test paper, which is determined according to the requirements of the user;
condition 3: the exposure of test question $i$ in the test paper is smaller than the average exposure of the test questions of the same question type in the same chapter, so that, as far as possible, each question in the test paper is among the least used or the longest unused;
condition 4: the recently-selected flag of the test question is false, which keeps the repetition rate between the test questions selected for two consecutive test papers as low as possible.
(3) Finally, for searching resources such as test questions and knowledge points, the Elasticsearch search engine is adopted and weights are added to the different index keywords, which improves the efficiency of searching such resources; the paper is then assembled.
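Adding weights to index keywords corresponds to per-field boosts in the Elasticsearch query DSL, where a field name followed by `^` and a number raises that field's contribution to the relevance score. The following minimal sketch only builds the request body as a Python dictionary and does not contact a cluster. The function name `build_question_query` and the field names `knowledge_point` and `content` are hypothetical, since the original index layout is not given.

```python
def build_question_query(keywords, field_weights, size=20):
    """Build a multi_match search body with a boost per indexed field.

    keywords:      the search text, e.g. a knowledge point name
    field_weights: dict mapping a field name to its weight (boost)
    size:          maximum number of hits to return
    """
    # "field^3" tells Elasticsearch to weight matches in that field 3x.
    fields = [f"{name}^{weight}" for name, weight in field_weights.items()]
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": keywords,
                "fields": fields,
            }
        },
    }
```

For example, `build_question_query("binary tree", {"knowledge_point": 3, "content": 1})` ranks a match in the knowledge point field above a match in the question text; the resulting body would be sent to the `_search` endpoint of the test question index.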
Web crawlers: to store and update the test questions efficiently, the web crawler and its related components are packaged inside the system. The crawler module captures test question content from web pages by relying on the rule that the content sits inside designated HTML tags, and after an administrator preliminarily marks the captured content, the test questions are stored in the source test question resource library module, which is thereby updated.
Combining a web crawler with a system platform: the crawler module is packaged in the background of the product, and an administrator only needs to fill in a URL to obtain all the test question content at that URL. Batch addition can be built on this, which greatly improves efficiency, and a complete question bank with widely sourced and varied test questions is constructed.
In terms of project maintenance, the development framework of the system is Flask, which is free, flexible and highly extensible and offers a wide choice of third-party libraries, facilitating later maintenance and extension of the system.
In terms of project promotion, examinations remain the main way of assessing the work of teachers and the performance of students, so the system has strong promotional and practical significance. The system can save teachers a large amount of time. At a later stage it can be extended to students, and a mobile application with a wider audience can be developed, which can effectively improve students' grades and enthusiasm for learning.
The system updates the test question resource library mainly by using a web crawler to capture test question resources, which are then processed, sorted and classified. The resources are divided simply into three question types: multiple-choice questions, fill-in-the-blank questions and short-answer questions. An administrator then stores them in the source test question resource library module for users to select and use. Intelligent natural language recognition technology is used to process test questions rapidly in batches, which reduces the burden on the administrator and improves the efficiency of resource updating.
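The three-way classification can be approximated with simple text patterns. The sketch below is a deliberately simple heuristic, offered in place of the natural language recognition technology mentioned above, whose details the source does not give. The patterns, the type labels and the function name `guess_question_type` are assumptions, and an administrator would still review the result.

```python
import re

# A line that starts with an option letter, e.g. "A. Flask" or "B) MySQL".
OPTION_PATTERN = re.compile(r"(?m)^\s*[A-D][.)、．]\s*\S")
# A blank to fill in: a run of underscores or an empty pair of brackets.
BLANK_PATTERN = re.compile(r"_{3,}|\(\s{2,}\)|（\s*）")


def guess_question_type(text):
    """Guess 'choice', 'fill_in_blank' or 'short_answer' from the text."""
    if len(OPTION_PATTERN.findall(text)) >= 2:
        return "choice"
    if BLANK_PATTERN.search(text):
        return "fill_in_blank"
    return "short_answer"
```

Option lines are checked first because a multiple-choice stem often also contains an empty pair of brackets for the answer.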