CN113569540B

CN113569540B - Test paper generation method and device based on social science teaching materials

Info

Publication number: CN113569540B
Application number: CN202110528654.1A
Authority: CN
Inventors: 李冠艺; 徐林海; 赵南南; 华解语; 郑孔迪
Original assignee: Nanjing Allpass Information Industry Co ltd
Current assignee: Nanjing Allpass Information Industry Co ltd
Priority date: 2021-05-14
Filing date: 2021-05-14
Publication date: 2023-10-20
Anticipated expiration: 2041-05-14
Also published as: CN113569540A

Abstract

The application provides a test paper generation method and device based on social science teaching materials. The method comprises: recognizing paper social science teaching materials to obtain an electronic document corresponding to the teaching materials; searching the electronic document for a plurality of text segments that conform to specified text features (a text segment being formed by one or more consecutive sentences in the electronic document), and using a term recognition model to identify the text segments that contain technical terms; generating at least one question from each text segment containing a technical term; extracting a plurality of knowledge points from the catalogue of the electronic document and from each question; combining the knowledge points with the position information of the text segments used to generate the questions, and constructing a knowledge graph through knowledge fusion; and searching the knowledge graph according to specified test paper generation parameters to obtain a set of test paper questions conforming to those parameters, and combining the questions in the set into a test paper. Because the questions are generated automatically from the teaching materials, the efficiency of automatic test paper generation is effectively improved.

Description

Test paper generation method and device based on social science teaching materials

Technical Field

The application relates to the field of computer-aided teaching, in particular to a test paper generation method and device based on social science teaching materials.

Background

With the development of computer technology, computer-aided teaching carried out with various computer devices has come into wide use. Computer-aided automatic test paper generation is one important application of computer-aided teaching. Existing automatic test paper generation technology generally collects a large number of manually written questions (including stems and answers) into a question bank in advance; when a test paper is needed, a certain number of questions are read from the question bank and provided to the test setters, who use them to compose a test paper.

It can be seen that existing automatic test paper generation technology can only assemble test papers from pre-written questions, and the large number of questions in the question bank still has to be written manually by the relevant personnel, so building the question bank is time-consuming and inefficient.

Disclosure of Invention

In view of the problems in the prior art, the application provides a test paper generation method and device based on social science teaching materials, so as to provide a scheme that uses social science teaching materials to automatically generate questions and assemble them into a test paper.

The application provides a test paper generation method based on social science textbooks, which comprises the following steps:

recognizing paper social science teaching materials using optical character recognition technology to obtain an electronic document corresponding to the social science teaching materials;

searching the electronic document, using pre-constructed regular expressions, for a plurality of text segments conforming to any one of a first text feature, a second text feature and a third text feature, identifying, using a pre-trained semantic recognition model, the text segments that contain technical terms, and recording the position information of each text segment in the electronic document; wherein a text segment is formed by one or more consecutive sentences in the electronic document;

generating at least one question from each text segment containing a technical term; the question types include fill-in-the-blank questions and multiple-choice questions;

extracting a plurality of knowledge points from the catalogue of the electronic document and from each question respectively; wherein knowledge points refer to the entities, relationships and attributes contained in the catalogue or the question;

combining the knowledge points with the position information of the text segments used to generate the questions, and constructing a knowledge graph through knowledge fusion; the knowledge graph comprises nodes and edges connecting the nodes; each node represents a knowledge point, a question, or a subtitle of the catalogue;

searching the knowledge graph according to specified test paper generation parameters to obtain a set of test paper questions conforming to the test paper generation parameters, and combining the questions contained in the set into a test paper; the test paper generation parameters comprise a knowledge point parameter, a question type parameter, a difficulty parameter and a knowledge relevance parameter;

the first text feature is that the font of the text segment is a preset target font, and the text segment contains a specified first feature word;

the second text feature is that the text segment is a single sentence located at the beginning or the end of any paragraph of the electronic document, the sentence ends with a period, and the sentence contains a specified second feature word and does not contain a specified third feature word;

the third text feature is that the text segment comprises a plurality of sentences, wherein the first sentence comprises a designated fourth feature word, and the sentences except the first sentence are provided with sequence numbers positioned at the beginning of the sentence;

wherein generating at least one question from the text segment comprises:

for a text segment conforming to the first text feature or the second text feature, extracting at least one keyword from the text segment using natural language processing technology to obtain a fill-in-the-blank question; the text segment with the keywords removed serves as the stem of the fill-in-the-blank question, and the extracted keywords serve as its answers;

and for a text segment conforming to the third text feature, selecting at least one of the numbered sentences in the text segment as a correct answer of a multiple-choice question, replacing the keywords of the numbered sentences other than the correct answer with similar words to obtain the wrong answers of the multiple-choice question, and using the first sentence of the text segment as the stem, thereby obtaining the multiple-choice question.

Optionally, replacing the keywords of the numbered sentences other than the correct answer with similar words includes:

obtaining a preset target difficulty, and determining the similarity threshold corresponding to the target difficulty; wherein the similarity threshold is positively correlated with the target difficulty;

for the keywords of the numbered sentences other than the correct answer, searching for words whose similarity to the keyword is not greater than the similarity threshold corresponding to the target difficulty as similar words;

and replacing the keywords of the numbered sentences other than the correct answer with the similar words.

Optionally, searching the knowledge graph according to the specified test paper generation parameters to obtain a set of test paper questions conforming to the test paper generation parameters includes:

searching the knowledge graph for each knowledge point contained in the knowledge point parameter, and taking each knowledge point found as a first knowledge point;

searching the knowledge graph, according to the knowledge relevance parameter, for second knowledge points connected to the first knowledge points;

extracting the questions containing the first knowledge points and/or the second knowledge points to obtain a question set;

and adjusting the proportion of questions of different difficulty in the question set according to the difficulty parameter, and adjusting the proportion of questions of different types in the question set according to the question type parameter, to obtain the set of test paper questions.
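The optional search steps above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the implementation of the application: the graph is a plain adjacency dictionary, the knowledge relevance parameter is reduced to "0 = first knowledge points only, 1 = also include directly connected knowledge points", and the difficulty and question type parameters are expressed as hypothetical per-level and per-type quotas filled greedily. All names (`search_questions`, `type_quota`, `difficulty_quota`) are invented for the example.

```python
def search_questions(edges, kinds, meta, params):
    """Pick questions for a test paper from a toy knowledge graph."""
    # Step 1: first knowledge points are the requested points present in the graph.
    first = [p for p in params["knowledge_points"] if kinds.get(p) == "knowledge_point"]

    # Step 2: second knowledge points are knowledge points connected to a first one.
    second = set()
    if params["relevance"] >= 1:
        for p in first:
            second |= {n for n in edges.get(p, ()) if kinds.get(n) == "knowledge_point"}
    second -= set(first)

    # Step 3: collect every question attached to a first or second knowledge point.
    pool = sorted({n for p in [*first, *second]
                   for n in edges.get(p, ()) if kinds.get(n) == "question"})

    # Step 4: enforce the (assumed) quotas per question type and per difficulty.
    type_left = dict(params["type_quota"])
    diff_left = dict(params["difficulty_quota"])
    chosen = []
    for q in pool:
        q_type, q_diff = meta[q]["type"], meta[q]["difficulty"]
        if type_left.get(q_type, 0) > 0 and diff_left.get(q_diff, 0) > 0:
            chosen.append(q)
            type_left[q_type] -= 1
            diff_left[q_diff] -= 1
    return chosen


# Toy data: kp1 is associated with kp2; kp3 is unrelated.
edges = {"kp1": {"q1", "q2", "kp2"}, "kp2": {"q3", "kp1"}, "kp3": {"q4"}}
kinds = {"kp1": "knowledge_point", "kp2": "knowledge_point", "kp3": "knowledge_point",
         "q1": "question", "q2": "question", "q3": "question", "q4": "question"}
meta = {"q1": {"type": "blank", "difficulty": "easy"},
        "q2": {"type": "choice", "difficulty": "easy"},
        "q3": {"type": "blank", "difficulty": "hard"},
        "q4": {"type": "choice", "difficulty": "hard"}}
params = {"knowledge_points": ["kp1"], "relevance": 1,
          "type_quota": {"blank": 2, "choice": 1},
          "difficulty_quota": {"easy": 1, "hard": 1}}
paper = search_questions(edges, kinds, meta, params)
```

With the relevance set to 1 the sketch also draws questions from `kp2`; setting it to 0 restricts the pool to questions attached to `kp1`.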

Optionally, recording the position information of each text segment in the electronic document includes:

recording the logical position and the structural position of each text segment in the electronic document.

Optionally, after the knowledge points are combined with the position information of the text segments used to generate the questions and the knowledge graph is constructed through knowledge fusion, the method further includes:

counting the question coverage and the question difficulty distribution of the questions recorded in the knowledge graph.

The application also provides a test paper generating device based on the social science teaching material, which comprises:

the identification unit is used for recognizing paper social science teaching materials using optical character recognition technology to obtain an electronic document corresponding to the social science teaching materials;

the searching unit is used for searching the electronic document, using pre-constructed regular expressions, for a plurality of text segments conforming to any one of the first text feature, the second text feature and the third text feature, identifying, using a pre-trained semantic recognition model, the text segments that contain technical terms, and recording the position information of each text segment in the electronic document; wherein a text segment is formed by one or more consecutive sentences in the electronic document;

the generation unit is used for generating at least one question from each text segment containing a technical term; the question types include fill-in-the-blank questions and multiple-choice questions;

the extraction unit is used for extracting a plurality of knowledge points from the catalogue of the electronic document and from each question respectively; wherein knowledge points refer to the entities, relationships and attributes contained in the catalogue or the question;

the construction unit is used for combining the knowledge points with the position information of the text segments used to generate the questions, and constructing a knowledge graph through knowledge fusion; the knowledge graph comprises nodes and edges connecting the nodes; each node represents a knowledge point, a question, or a subtitle of the catalogue;

the searching unit is used for searching the knowledge graph according to specified test paper generation parameters to obtain a plurality of questions conforming to the test paper generation parameters, and combining the questions found into a test paper; the test paper generation parameters comprise a knowledge point parameter, a question type parameter, a difficulty parameter and a knowledge relevance parameter;

the generation unit is specifically configured to, when generating at least one question according to the text segment:

Optionally, when the generating unit replaces the keywords of the sentences with serial numbers except the correct answer with similar words, the generating unit is specifically configured to:

Optionally, the searching unit is specifically configured to, according to the specified test paper generation parameter, when the knowledge graph searches for a set of test paper questions that meets the test paper generation parameter:

Optionally, when the searching unit records the position information of each text segment in the electronic document, the searching unit is specifically configured to:

Optionally, the apparatus further includes:

and the statistics unit is used for counting the question coverage and the question difficulty distribution of the questions recorded in the knowledge graph.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a test paper generation method based on social science teaching materials according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a knowledge graph according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a test paper generation device based on social science teaching materials according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

Referring to FIG. 1, the application provides a test paper generation method based on social science teaching materials, which comprises the following steps:

S101, recognizing paper social science teaching materials using optical character recognition technology to obtain an electronic document corresponding to the social science teaching materials.

Optical character recognition (Optical Character Recognition, OCR) technology is an existing character recognition technology that can recognize characters on a paper teaching material by collecting optical information of the paper teaching material, thereby converting the paper teaching material into an electronic document.

In general, the converted electronic document may be in HTML format or PDF format. The electronic document is used as basic data of the test paper generation method provided by the application.

S102, identifying text segments containing technical terms in the electronic document, and recording position information of each text segment.

Wherein the text segment is made up of one or more successive sentences in the electronic document.

Specifically, pre-constructed regular expressions are used to search the electronic document for a plurality of text segments conforming to any one of the first text feature, the second text feature and the third text feature; a pre-trained semantic recognition model is used to identify the text segments that contain technical terms; and the position information of each text segment in the electronic document is recorded.

A regular expression (abbreviated RE) is a common technical means in computer science, typically used to retrieve or replace text in an electronic document that matches certain patterns.

Thus, in the present invention, it is possible to design corresponding regular expressions based on several specific text features listed below, and then identify text segments conforming to any one of the text features from the electronic document obtained in step S101 using these regular expressions.

The first text feature is that the font of the text segment is a preset target font, and the text segment contains a specified first feature word.

Generally, a text segment conforming to the first text feature is a term-definition text segment, mainly used to explain a concept appearing in the teaching material. Specifically, the target font may be bold, and the first feature word may be set empirically and may include phrases such as "is", "the concept is", "meaning" and "connotation". In other words, the first text feature may be that the text segment is in bold and contains phrases such as "is", "the concept is", "meaning" and "connotation".

The second text feature is that the text segment is a single sentence located at the beginning or end of any paragraph of the electronic document, the sentence ends with a period, and the sentence contains a specified second feature word and does not contain a specified third feature word.

Similarly, the second feature word and the third feature word may be set empirically. For example, the second feature word may include words such as "is", "i.e." and "split", and the third feature word may include words such as "but" and "even".

In other words, the second text feature may be that the text segment is the first or last sentence of any paragraph, ends with a period, contains words such as "is", "i.e." and "split", and does not contain words such as "but" and "even".

The third text feature is that the text segment includes a plurality of sentences, wherein the first sentence contains a specified fourth feature word, and the sentences except the first sentence each have a sequence number at the beginning of the sentence.

The fourth feature word may include phrases such as "meaning", "social influence", "effect", "historical meaning", "feature:" and "action:". The sequence number in the third text feature may take different forms, for example "1. 2. 3. 4. ……" or "(1) (2) (3) ……".

Specifically, if a sentence (or a paragraph) ending with a period is entirely in bold and contains a first feature word, that sentence (or paragraph) may be identified as a text segment conforming to the first text feature.

If a sentence ends with a period, is the first or last sentence of a paragraph, contains a second feature word and does not contain a third feature word, the sentence may be identified as a text segment conforming to the second text feature.

If there are several consecutive sentences each beginning with a sequence number of the form "1. 2. 3. 4. ……" or "(1) (2) (3) ……", and the sentence preceding the sentence numbered 1 contains a fourth feature word, then the text segment formed by that preceding sentence together with the group of numbered sentences may be identified as a text segment conforming to the third text feature.
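As a rough illustration of how the second and third text features might be matched with regular expressions, the following sketch uses Python's `re` module. It is not the regular expressions of the application: the English feature words are hypothetical stand-ins for the empirically chosen ones, sentences are assumed to be split on `.`, `!` and `?`, and each numbered sentence is assumed to sit on its own line. The first text feature depends on font information, which plain text does not carry, so it is not covered here.

```python
import re

# Hypothetical English stand-ins for the empirically chosen feature words.
SECOND_WORDS = ("is", "namely")
THIRD_WORDS = ("but", "even")
FOURTH_WORDS = ("meaning", "social influence", "effect")

NUMBER = r"(?:\(\d+\)|\d+\.)"  # a sequence number such as "(1)" or "1."

# Third text feature: a header line containing a fourth feature word,
# followed by at least two lines that each start with a sequence number.
THIRD_FEATURE = re.compile(
    r"^(?!" + NUMBER + r")(?P<head>[^\n]*(?:"
    + "|".join(map(re.escape, FOURTH_WORDS))
    + r")[^\n]*)\n(?P<items>(?:" + NUMBER + r"[ \t]*[^\n]+\n?){2,})",
    re.MULTILINE,
)


def find_third_feature(text):
    """Return (header sentence, [numbered sentences]) for each matching segment."""
    results = []
    for m in THIRD_FEATURE.finditer(text):
        items = re.findall(r"^" + NUMBER + r"[ \t]*(.+)$", m.group("items"), re.MULTILINE)
        results.append((m.group("head").strip(), items))
    return results


def has_word(sentence, words):
    """True if any of the words occurs in the sentence as a whole word."""
    return any(re.search(r"\b" + re.escape(w) + r"\b", sentence, re.IGNORECASE)
               for w in words)


def second_feature_sentences(paragraph):
    """Return the first/last sentences of a paragraph that match the second feature."""
    sentences = [s.strip() for s in re.findall(r"[^.!?]+[.!?]", paragraph)]
    if not sentences:
        return []
    hits = []
    for s in dict.fromkeys([sentences[0], sentences[-1]]):  # de-duplicated, ordered
        if s.endswith(".") and has_word(s, SECOND_WORDS) and not has_word(s, THIRD_WORDS):
            hits.append(s)
    return hits


sample = ("Some intro line.\n"
          "The social influence of the reform:\n"
          "(1) Trade expanded.\n"
          "(2) Cities grew.\n"
          "(3) Prices fell.\n"
          "Next paragraph.")
paragraph = "A tariff is a tax on imported goods. It was widely used. But it is even now debated."
third_hits = find_third_feature(sample)
second_hits = second_feature_sentences(paragraph)
```

In the sample paragraph only the opening sentence qualifies, because the closing sentence contains the excluded words "but" and "even".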

Optionally, recording the position information of each text segment in the electronic document includes:

recording the logical position and the structural position of each text segment in the electronic document.

The logical position refers to the chapter, section, item and paragraph of the teaching material's logical content in which a text segment is located. For example, a text segment may be located under "I. Recent major events" in the first section, "Development profile", of "Chapter 1: Overview". Logical positions can generally be recorded through the relationships between nodes and edges in the graph, and can be looked up along paths in the graph when needed.

The structural position may be position information marked with XPath (XML Path Language, a language for addressing nodes in XML documents). For example, "xml.selectNodes("/book/chapter[1]/title");" selects the title of the first chapter node.
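To make the idea of a structural position concrete, the following sketch resolves an XPath-style path with Python's standard `xml.etree.ElementTree`. The XML layout (a `book` root with `chapter` and `title` elements) and the sample titles are assumptions made for the example; the application does not specify the document schema or the library used.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of a textbook; the real schema is not given in the source.
BOOK_XML = """<book>
  <chapter><title>Chapter 1 Overview</title></chapter>
  <chapter><title>Chapter 2 Reform</title></chapter>
</book>"""

root = ET.fromstring(BOOK_XML)

# ElementTree evaluates paths relative to the root element, so the absolute
# path "/book/chapter[1]/title" is written here as "./chapter[1]/title".
first_title = root.find("./chapter[1]/title")
second_title = root.find("./chapter[2]/title")
```

Storing such a path alongside each text segment lets the segment be traced back to its place in the document structure.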

The semantic recognition model used in step S102 may be a neural network model trained on a large amount of pre-labelled text.

Specifically, text is extracted from other teaching materials, the words in the text that are technical terms are labelled manually, and a pre-constructed neural network model is trained on the labelled text until it converges, yielding the semantic recognition model used in step S102. The semantic recognition model can recognize whether each text segment contains a technical term; if a text segment conforms to any of the text features and contains a technical term, it can be used to generate one or more questions.

S103, generating at least one question from each text segment containing a technical term.

The question types include fill-in-the-blank questions and multiple-choice questions.

Step S103 may specifically include generating fill-in-the-blank questions from text segments and generating multiple-choice questions from text segments.

Fill-in-the-blank questions are generated from text segments as follows:

for a text segment conforming to the first text feature or the second text feature, at least one keyword is extracted from the text segment using natural language processing technology to obtain a fill-in-the-blank question; the text segment with the keywords removed serves as the stem of the fill-in-the-blank question, and the extracted keywords serve as its answers.

The keywords in the text segment can be identified using any existing natural language processing (NLP) technique.

For example, assume a sentence used to generate a question is:

"The meaning of A is B, C, D and E."

Here A is the technical term identified in step S102, and B, C, D and E are keywords. In step S103, any one or more of B, C, D and E may be extracted; for example, extracting B and D gives the following fill-in-the-blank stem:

"The meaning of A is ( ), C, ( ) and E."

The brackets indicate the blanks to be filled, and the answers to the question are B and D respectively.

Further, if a text segment has several extractable keywords, different numbers of keywords can be extracted according to a set difficulty level, yielding several fill-in-the-blank questions of different difficulty.

The difficulty may be divided into three levels: easy, medium and hard. Taking the same sentence as an example, extracting one keyword from "The meaning of A is B, C, D and E" gives an easy fill-in-the-blank question:

the stem is "The meaning of A is ( ), C, D and E", and the answer is B.

Extracting two keywords from the sentence gives the fill-in-the-blank question shown above, whose difficulty is medium.

Further, extracting every keyword other than the technical term gives a hard fill-in-the-blank question:

the stem is "The meaning of A is ( ), ( ), ( ) and ( )", and the answers are B, C, D and E.
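The difficulty-dependent blanking described above can be sketched as follows. This is a minimal illustration, assuming the keywords have already been identified by some NLP step and that the number of blanks per level is 1 for easy, 2 for medium and all keywords for hard. The function name, the mapping and the choice of blanking the first keywords in order are assumptions of the sketch; the source allows any keywords to be chosen.

```python
# Assumed mapping from difficulty level to number of blanks; "hard" blanks every keyword.
BLANKS_PER_LEVEL = {"easy": 1, "medium": 2}


def make_fill_in_blank(sentence, keywords, level):
    """Blank out keywords in the sentence and return the stem with its answers."""
    count = min(BLANKS_PER_LEVEL.get(level, len(keywords)), len(keywords))
    answers = keywords[:count]  # simplification: take the first keywords in order
    stem = sentence
    for keyword in answers:
        stem = stem.replace(keyword, "( )", 1)  # blank the first occurrence only
    return {"stem": stem, "answers": answers, "difficulty": level}


sentence = ("The meaning of mercantilism is trade surplus, bullion reserves, "
            "high tariffs and state control.")
keywords = ["trade surplus", "bullion reserves", "high tariffs", "state control"]
easy = make_fill_in_blank(sentence, keywords, "easy")
hard = make_fill_in_blank(sentence, keywords, "hard")
```

Each returned question carries its difficulty label, which matches the idea of marking every generated question with a difficulty level.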

Multiple-choice questions are generated from text segments as follows:

in general, multiple-choice questions are generated from text segments that conform to the aforementioned third text feature and contain technical terms.

For a text segment conforming to the third text feature, at least one of the numbered sentences in the text segment is selected as a correct answer of the multiple-choice question, the keywords of the numbered sentences other than the correct answer are replaced with similar words to obtain the wrong answers, and the first sentence of the text segment is used as the stem, thereby obtaining the multiple-choice question.

For example, assume a text segment that conforms to the third text feature and contains a technical term is:

"The social impact of the A movement is:

(1)……；

(2)……；

(3)……；

(4)……。”

where "the A movement" is a technical term identified in the preceding steps, and the ellipses stand for specific text content in the teaching material.

For this text segment, one or more of the numbered sentences can be designated as correct answers. If one sentence is designated, the generated question is a single-answer multiple-choice question; if several are designated, it is a multiple-choice question with an indefinite number of correct answers.

For example, sentence (4) above may be designated as the correct answer.

The sentences numbered (1) to (3) can then be altered to obtain the wrong options.

Taking sentence (1) as an example, altering it comprises:

identifying one or more keywords in sentence (1) as keywords to be replaced; generating, using natural language processing technology, the word vector corresponding to each keyword to be replaced (denoted the word vector to be replaced); finding a corresponding replacement word in a vocabulary library for each keyword to be replaced; and replacing the corresponding keyword in sentence (1) with the replacement word. After the replacement, sentence (1) forms a wrong option of the multiple-choice question.

A replacement word is a word whose word vector has a similarity to the word vector to be replaced that is not greater than a set similarity threshold (the similarity may be cosine similarity; for the calculation method, refer to the prior art). That is, if the similarity between the word vector of a word Y and the word vector of the keyword X to be replaced is not greater than the similarity threshold, Y can be used as the replacement word for X.

Optionally, replacing the keywords of the numbered sentences other than the correct answer with similar words includes:

obtaining a preset target difficulty, and determining the corresponding target similarity interval according to the target difficulty; the lower limit of the target similarity interval is positively correlated with the target difficulty;

for the keywords of the numbered sentences other than the correct answer, searching for words whose similarity to the keyword falls within the target similarity interval as similar words (namely the replacement words);

and replacing the keywords (namely the keywords to be replaced) of the numbered sentences other than the correct answer with the similar words.

That is, the above similarity threshold may be adjusted according to the difficulty level. As described above, the difficulty may be divided, from high to low, into hard, medium and easy; the higher the difficulty, the higher the corresponding similarity threshold, in other words, the more similar the replacement word is to the keyword it replaces.

When the similarity between a replacement word and the keyword it replaces is too high, the altered sentence may still be a correct option. To avoid this problem, multiple-choice questions generated at the hard difficulty level may be reviewed manually.
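The difficulty-dependent choice of replacement words can be illustrated with a small cosine-similarity sketch. Everything concrete here is an assumption made for the example: the two-dimensional toy word vectors, the interval bounds per difficulty level, and the rule of taking the most similar candidate inside the interval. A real system would use word vectors produced by an NLP model and a full vocabulary library.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Assumed similarity intervals (lower, upper); the lower bound rises with difficulty.
INTERVALS = {"easy": (0.0, 0.5), "medium": (0.5, 0.8), "hard": (0.8, 0.99)}


def pick_replacement(keyword, vectors, level):
    """Return the most similar vocabulary word whose similarity lies in the interval."""
    low, high = INTERVALS[level]
    target = vectors[keyword]
    in_range = [(cosine(target, vec), word)
                for word, vec in vectors.items()
                if word != keyword and low <= cosine(target, vec) <= high]
    return max(in_range)[1] if in_range else None


def make_distractor(sentence, keyword, vectors, level):
    """Replace the keyword in a numbered sentence to form a wrong option."""
    replacement = pick_replacement(keyword, vectors, level)
    if replacement is None:
        return None  # no suitable word; such a case would need manual handling
    return sentence.replace(keyword, replacement, 1)


# Toy vocabulary library with hand-made vectors.
VECTORS = {
    "export": (1.0, 0.0),
    "import": (0.9, 0.4),   # very similar to "export"
    "tariff": (0.6, 0.8),   # moderately similar
    "harvest": (0.0, 1.0),  # unrelated
}
hard_option = make_distractor("It increased export volume.", "export", VECTORS, "hard")
easy_option = make_distractor("It increased export volume.", "export", VECTORS, "easy")
```

The upper bound below 1.0 for the hard level is one way to reduce the risk, noted above, that a near-synonym leaves the altered sentence still correct; it does not remove the need for manual review.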

Taking the above text segment as an example, the multiple-choice question generated from it may be:

"(Single-answer question) The social impact of the A movement includes ( ):

(1)……；

(2)……；

(3)……；

(4)……。”

The empty brackets are to be filled with one of options (1) to (4), where options (1) to (3) are the altered sentences and option (4) is the original text of sentence (4) in the text segment.

Generally, in step S103, each generated question may be labelled with a difficulty level; for example, one multiple-choice question may be labelled medium and one fill-in-the-blank question hard.

S104, extracting a plurality of knowledge points from the catalogue of the electronic document and from each question.

Here, knowledge points refer to the entities, relationships and attributes contained in the catalogue (also called the outline) or in a question.

For the specific definitions of entities, relationships and attributes and the corresponding identification and extraction methods, refer to existing knowledge graph technology; they are not repeated here.

In step S104, knowledge graph technology is mainly used to link knowledge points with questions.

Specifically, in step S104, the catalogue of the entire teaching material is first extracted from the aforementioned electronic document. In general, the catalogue should be extracted down to the third level, that is, the extracted content includes the first-level, second-level and third-level titles of the catalogue (referred to as subtitles). The entities, relationships and attributes of each subtitle are then extracted using knowledge graph technology; this is the extraction of knowledge points from the catalogue in step S104.

At the same time, the entities, relationships and attributes in the fill-in-the-blank questions and multiple-choice questions are extracted using knowledge graph technology; this is the extraction of knowledge points from the questions in step S104. For multiple-choice questions, knowledge points can be extracted from the stem and from the correct options respectively.

S105, combining the knowledge points and position information of the text segment for generating the questions, and constructing a knowledge graph through knowledge fusion.

The knowledge graph comprises nodes and edges for connecting the nodes; the nodes characterize knowledge points, topics or subtitles of the directory.

Step S105 is equivalent to combining the positions of the catalogue, the blank filling questions and the selected question stem (namely, the position information of the text segment), aligning and fusing knowledge points of the outline and the questions, thereby constructing a graph database (namely, a knowledge graph) and forming the question storage of the structural knowledge.

Taking fig. 2 as an example, assuming that knowledge point 1 is extracted from topics 1 to 3 and knowledge point 1 is also extracted from sub-topic 1, through knowledge fusion, nodes representing knowledge point 1, nodes representing topics 1 to 3, and nodes representing sub-topic 1 can be generated in the knowledge graph, respectively, and then the above nodes are connected in the manner shown in fig. 2 to illustrate that topics 1 to 3 contain knowledge point 1, and sub-topic 1 contains knowledge point 1.

Further, the relationship between the plurality of knowledge points appearing in the knowledge graph may be analyzed by the aforementioned electronic document, for example, assuming that the knowledge point 1 and the knowledge point 2 in fig. 2 appear in the same sentence of the electronic document, for example, "the knowledge point 1 includes the knowledge point 2", or, alternatively, the knowledge point 1 appears in the sub-title 1 and the upper-level title of the sub-title 1 at the same time, and the upper-level title also includes the sub-title 2 (the sub-title 2 includes the knowledge point 2 again), then the nodes representing the knowledge point 1 and the knowledge point 2 may be connected in the manner shown in fig. 2 to represent that the two knowledge points have the association relationship.

Further, when a topic contains a plurality of knowledge points, the topic may be fused into the knowledge graph according to the position information of the text segment used to generate the topic. Specifically, assume that topic 1 contains knowledge point 1 and knowledge point 2, but the text segment from which topic 1 was generated belongs to the section under subtitle 1; in step S105, topic 1 may therefore be connected to knowledge point 1, which is contained in subtitle 1, rather than to knowledge point 2.
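
The fusion rule described above can be illustrated with a minimal Python sketch. The class `KnowledgeGraph`, the function `build_graph`, the dictionary layout of the inputs, and the fallback used when a topic shares no knowledge point with its section are all assumptions made for illustration; the application does not prescribe a particular data structure.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Undirected graph; each node is a (kind, name) pair (hypothetical layout)."""

    def __init__(self):
        self.adj = defaultdict(set)

    def connect(self, a, b):
        self.adj[a].add(b)
        self.adj[b].add(a)


def build_graph(subtitle_points, topics):
    """subtitle_points: {subtitle: set of knowledge points extracted from it}.
    topics: dicts with 'name', 'points' (set) and 'subtitle', the section
    that contains the text segment used to generate the topic."""
    graph = KnowledgeGraph()
    for subtitle, points in subtitle_points.items():
        for point in points:
            graph.connect(("subtitle", subtitle), ("point", point))
    for topic in topics:
        # Keep only the knowledge points that also belong to the section
        # the text segment came from.
        local = topic["points"] & subtitle_points.get(topic["subtitle"], set())
        # Assumption: if nothing overlaps, fall back to all points of the topic.
        for point in local or topic["points"]:
            graph.connect(("topic", topic["name"]), ("point", point))
    return graph


subtitle_points = {
    "subtitle 1": {"knowledge point 1"},
    "subtitle 2": {"knowledge point 2"},
}
topics = [{
    "name": "topic 1",
    "points": {"knowledge point 1", "knowledge point 2"},
    "subtitle": "subtitle 1",
}]
graph = build_graph(subtitle_points, topics)
```

In this example topic 1 is connected only to knowledge point 1, because its source text segment lies under subtitle 1.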

Optionally, after combining the knowledge points and the position information of the text segment for generating the questions and constructing the knowledge graph through knowledge fusion, the method further comprises:

and counting the topic coverage and topic difficulty distribution of topics recorded in the knowledge graph.

That is, in step S105, the topic coverage amount and the topic difficulty distribution may be counted in synchronization.

The topic coverage refers to the number of topics generated for each chapter in the directory. For example, if 50 topics are generated from the text of chapter 1, the topic coverage of chapter 1 is 50. The topic coverage is counted in order to analyze whether too few topics have been generated for any chapter of the teaching material; if so, topics need to be written manually for that chapter to ensure that every chapter has a sufficient number of topics.

The topic difficulty distribution refers to the proportion of the generated topics that falls at each difficulty level. A condition is placed on this distribution, namely that topics of one difficulty level account for the largest proportion of the generated topics while topics of medium difficulty account for a smaller proportion. If the computed distribution does not meet the condition, the generated topics need to be modified to adjust their difficulty.
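
As a hedged illustration of these two statistics, the following Python sketch counts topics per chapter and computes the share of each difficulty level. The field names `chapter` and `difficulty`, the helper `under_covered` and the threshold of 2 are hypothetical and are not taken from the application.

```python
from collections import Counter


def topic_coverage(topics):
    """Number of generated topics per chapter."""
    return Counter(t["chapter"] for t in topics)


def difficulty_distribution(topics):
    """Share of the generated topics at each difficulty level."""
    counts = Counter(t["difficulty"] for t in topics)
    total = sum(counts.values())
    return {level: n / total for level, n in counts.items()}


def under_covered(coverage, chapters, minimum):
    """Chapters with fewer than `minimum` topics (candidates for manual question writing)."""
    return [c for c in chapters if coverage.get(c, 0) < minimum]


topics = [
    {"chapter": "chapter 1", "difficulty": "easy"},
    {"chapter": "chapter 1", "difficulty": "medium"},
    {"chapter": "chapter 1", "difficulty": "medium"},
    {"chapter": "chapter 2", "difficulty": "hard"},
]
coverage = topic_coverage(topics)
distribution = difficulty_distribution(topics)
sparse = under_covered(coverage, ["chapter 1", "chapter 2", "chapter 3"], minimum=2)
```

Here chapter 1 has a coverage of 3, and chapters 2 and 3 are reported as having too few topics.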

S106, searching in the knowledge graph according to the designated test paper generation parameters to obtain a set of test paper questions which accord with the test paper generation parameters, and combining a plurality of questions contained in the set of test paper questions into the test paper.

The test paper generation parameters comprise knowledge point parameters, question type parameters, difficulty parameters and knowledge relevance parameters.

The test paper generation parameters can be entered by the instructor through the front-end query function.

Step S106 is equivalent to using the knowledge graph and a pre-constructed front-end query function to set questions automatically according to knowledge points, question types, difficulty and knowledge relevance.
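
One possible way to carry the four test paper generation parameters is a small validated record, sketched below in Python. The class name `PaperParams`, its field names, and the choice to express the question type and difficulty parameters as ratios summing to 1 are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class PaperParams:
    knowledge_points: list   # knowledge point parameter
    type_ratio: dict         # question type parameter, e.g. {"multiple-choice": 0.6}
    difficulty_ratio: dict   # difficulty parameter, e.g. {"easy": 0.3}
    relevance: int = 1       # knowledge relevance parameter (search distance)

    def __post_init__(self):
        # Assumption: both ratio dictionaries must sum to 1.
        for name in ("type_ratio", "difficulty_ratio"):
            total = sum(getattr(self, name).values())
            if not math.isclose(total, 1.0):
                raise ValueError(f"{name} must sum to 1, got {total}")
        if self.relevance < 0:
            raise ValueError("relevance must be non-negative")


params = PaperParams(
    knowledge_points=["knowledge point 1"],
    type_ratio={"fill-in-the-blank": 0.4, "multiple-choice": 0.6},
    difficulty_ratio={"easy": 0.3, "medium": 0.5, "hard": 0.2},
    relevance=2,
)
```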

Optionally, searching in the knowledge graph according to the specified test paper generation parameters to obtain a set of test paper questions conforming to the test paper generation parameters comprises:

searching each knowledge point contained in the knowledge point parameters in the knowledge graph, and taking the searched knowledge point as a first knowledge point.

And searching a second knowledge point connected with the first knowledge point in the knowledge graph according to the knowledge relevance parameter.

Specifically, the knowledge point parameter may include one or more knowledge points, and taking fig. 2 as an example, the knowledge point parameter may include knowledge point 1, and then, when step S106 is performed, knowledge point 1 may be found as the aforementioned first knowledge point.

The knowledge relevance parameter may be an integer characterizing the maximum distance allowed when searching for second knowledge points. Taking fig. 2 as an example, if the knowledge relevance parameter is 1, a search starting from knowledge point 1 can only find knowledge point 2, whose distance is 1, as a second knowledge point; if the knowledge relevance parameter is 2, a search starting from knowledge point 1 can find knowledge point 2 and knowledge point 3, whose distances are less than or equal to 2, as second knowledge points.
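
The distance-limited search can be sketched as a breadth-first traversal over the edges between knowledge point nodes. In the Python example below, the function name `related_points`, the adjacency dictionary `point_edges`, and the assumed chain kp1, kp2, kp3 (chosen to match the distances given for fig. 2) are illustrative only.

```python
from collections import deque


def related_points(point_edges, first_points, relevance):
    """Knowledge points within `relevance` hops of any first knowledge point,
    excluding the first knowledge points themselves."""
    dist = {p: 0 for p in first_points}
    queue = deque(first_points)
    while queue:
        current = queue.popleft()
        if dist[current] == relevance:
            continue  # do not expand beyond the allowed distance
        for neighbour in point_edges.get(current, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    return {p for p, d in dist.items() if d > 0}


# Assumed edges between knowledge points: kp1 - kp2 - kp3.
point_edges = {"kp1": {"kp2"}, "kp2": {"kp1", "kp3"}, "kp3": {"kp2"}}
near = related_points(point_edges, ["kp1"], relevance=1)
far = related_points(point_edges, ["kp1"], relevance=2)
```

With a relevance of 1 only kp2 is returned; with a relevance of 2 both kp2 and kp3 are returned.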

And extracting topics containing the first knowledge points and/or the second knowledge points to obtain a topic set.

As described above, in the constructed knowledge graph, if a topic contains a certain knowledge point, the node representing the topic is connected to the node representing that knowledge point. The above step is therefore equivalent to reading from the knowledge graph the topics connected to the nodes of the first knowledge points and/or the second knowledge points, to obtain the topic set. Taking fig. 2 as an example, assuming that knowledge point 1 is the first knowledge point and knowledge points 2 and 3 are the second knowledge points, the topic set obtained from the knowledge graph of fig. 2 includes topics 1 to 7.
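
A minimal Python sketch of this extraction step follows. The function `collect_topics` and the mapping `point_topics` are hypothetical. The text states only that topics 1 to 3 contain knowledge point 1, so the assignment of topics 4 to 7 to knowledge points 2 and 3 is an assumption.

```python
def collect_topics(point_topics, first_points, second_points):
    """Union of the topics connected to any first or second knowledge point."""
    selected = set()
    for point in set(first_points) | set(second_points):
        selected |= point_topics.get(point, set())
    return selected


# Topics 1 to 3 contain knowledge point 1 (from the description);
# the split of topics 4 to 7 below is assumed for illustration.
point_topics = {
    "knowledge point 1": {1, 2, 3},
    "knowledge point 2": {4, 5},
    "knowledge point 3": {6, 7},
}
topic_set = collect_topics(
    point_topics,
    first_points=["knowledge point 1"],
    second_points=["knowledge point 2", "knowledge point 3"],
)
```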

And adjusting the proportions of topics of different difficulty in the topic set according to the difficulty parameter, and adjusting the proportions of topics of different question types in the topic set according to the question type parameter, to obtain the test paper question set.

Specifically, the question type parameter may specify the proportions of the different question types in the test paper question set, that is, in the finally generated test paper; for example, 40% of the questions in the generated test paper are fill-in-the-blank questions and 60% are multiple-choice questions. Similarly, the difficulty parameter may specify the proportions of topics of different difficulty in the test paper question set, for example 30% easy topics, 20% difficult topics and 50% medium topics.

Therefore, after the topic set is obtained, it can be judged whether the topic set conforms to the proportions specified in the question type parameter and the difficulty parameter. If so, the topic set is directly determined to be the test paper question set; if not, the proportions of topics of different question types, or of topics of different difficulty, need to be adjusted to obtain a test paper question set that conforms to the question type parameter and the difficulty parameter.
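
The application does not state how the proportions are adjusted. One simple possibility, sketched below in Python, is a greedy pass that converts both sets of ratios into quotas for a paper of a given size and keeps a topic only while both of its quotas are still open. The functions `proportions` and `select_by_ratio`, the field names, and the paper size of 10 are assumptions. A greedy pass like this is not guaranteed to fill every quota for an arbitrary topic set.

```python
from collections import Counter
from itertools import product


def proportions(topics, key):
    """Share of topics per value of `key`, used to check against the parameters."""
    counts = Counter(t[key] for t in topics)
    return {k: n / len(topics) for k, n in counts.items()}


def select_by_ratio(topics, total, type_ratio, difficulty_ratio):
    """Greedily pick topics while both the type quota and the difficulty quota allow."""
    type_quota = {k: round(total * r) for k, r in type_ratio.items()}
    diff_quota = {k: round(total * r) for k, r in difficulty_ratio.items()}
    chosen = []
    for t in topics:
        if type_quota.get(t["type"], 0) > 0 and diff_quota.get(t["difficulty"], 0) > 0:
            type_quota[t["type"]] -= 1
            diff_quota[t["difficulty"]] -= 1
            chosen.append(t)
    return chosen


# Hypothetical topic set: five topics for every (type, difficulty) pair.
pool = [
    {"id": f"{qtype}-{level}-{i}", "type": qtype, "difficulty": level}
    for qtype, level, i in product(
        ["fill-in-the-blank", "multiple-choice"], ["easy", "medium", "hard"], range(5)
    )
]
paper = select_by_ratio(
    pool,
    total=10,
    type_ratio={"fill-in-the-blank": 0.4, "multiple-choice": 0.6},
    difficulty_ratio={"easy": 0.3, "medium": 0.5, "hard": 0.2},
)
```

Calling `proportions(paper, "type")` and `proportions(paper, "difficulty")` afterwards gives the achieved shares, which can be compared with the requested ratios before the set is accepted.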

After the test paper question set is obtained, the stems of the topics are typeset according to a given test paper template to obtain an electronic test paper, and the electronic test paper is printed to obtain a plurality of paper test papers.

Meanwhile, after the examination is completed, the answers of the topics can be used for automatic marking; that is, the answers written by the examinees are compared with the answers determined when the topics were generated, so as to determine whether the examinees' answers are correct. The specific process is not described in detail here.
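
Since the marking process is not detailed, the following Python sketch shows only the basic comparison. The functions `normalize` and `mark_paper` are hypothetical, and ignoring letter case and surrounding whitespace is an assumption rather than something stated in the application.

```python
def normalize(answer):
    """Collapse whitespace and ignore letter case before comparing (assumed rule)."""
    return " ".join(answer.split()).casefold()


def mark_paper(answer_key, responses):
    """Map each topic id to True if the examinee's answer matches the stored answer."""
    return {
        topic_id: normalize(responses.get(topic_id, "")) == normalize(correct)
        for topic_id, correct in answer_key.items()
    }


answer_key = {1: "Inflation", 2: "B"}
responses = {1: "  inflation ", 2: "C"}
results = mark_paper(answer_key, responses)
correct_count = sum(results.values())
```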

The application provides a test paper generation method based on social science teaching materials, which comprises: recognizing a paper social science teaching material to obtain an electronic document corresponding to the teaching material; searching the electronic document for a plurality of text segments (each formed by one or a plurality of continuous sentences in the electronic document) that conform to text features, and identifying, using a term identification model, the text segments that contain a term of art; generating at least one topic according to each text segment containing a term of art; extracting a plurality of knowledge points from the catalogue of the electronic document and from each topic respectively; combining the knowledge points and the position information of the text segments used to generate the topics, and constructing a knowledge graph through knowledge fusion; and searching in the knowledge graph according to the specified test paper generation parameters to obtain a set of test paper questions conforming to the test paper generation parameters, and combining the topics contained in the set of test paper questions into a test paper. According to the scheme, topics are generated automatically from the teaching materials, and the efficiency of automatic test paper generation is effectively improved.

In combination with the test paper generating method based on the social science textbook provided by the embodiment of the application, the embodiment of the application also provides a test paper generating device based on the social science textbook, referring to fig. 3, the device can comprise the following units:

The recognition unit 301 is configured to recognize the paper social science-class textbook according to the optical character recognition technology, and obtain an electronic document corresponding to the social science-class textbook.

The searching unit 302 is configured to search, in the electronic document, a plurality of text segments conforming to any one of the first text feature, the second text feature and the third text feature by using a pre-constructed regular expression, identify, in the plurality of text segments, a text segment containing a term of art by using a pre-trained semantic recognition model, and record location information of each text segment in the electronic document.

A generating unit 303, configured to generate at least one topic according to the text segment for each text segment containing a term of art.

The extracting unit 304 is configured to extract a plurality of knowledge points from the catalog of the electronic document and each topic, respectively.

Where knowledge points refer to entities, relationships, and attributes contained by a directory or topic.

A construction unit 305, configured to construct a knowledge graph through knowledge fusion by combining knowledge points and position information of text segments for generating topics.

And the searching unit 306 is configured to search in the knowledge graph according to the specified test paper generation parameters to obtain a plurality of questions corresponding to the test paper generation parameters, and combine the plurality of questions obtained by searching into the test paper.

The first text feature is that the font of the text segment is a preset target font and the text segment contains a designated first feature word;

the third text feature is that the text segment comprises a plurality of sentences, wherein the first sentence contains a designated fourth feature word, and the sentences except the first sentence are provided with sequence numbers positioned at the beginning of the sentence;

the generating unit 303 is specifically configured to, when generating at least one question according to the text segment:

extracting at least one keyword from the text segment by using a natural language processing technology aiming at the text segment conforming to the first text feature or the second text feature to obtain a blank filling problem; the text segment after the keyword is extracted is used as a question stem of the gap-filling question, and the extracted keyword is used as an answer of the gap-filling question;

aiming at the keywords of other sentences with sequence numbers except the correct answers, searching the words with similarity not larger than the similarity threshold corresponding to the target difficulty as similar words;

and replacing the keywords of other sentences with sequence numbers except the correct answers with similar words.
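
The fill-in-the-blank generation described above (the text segment with the keyword removed becomes the stem, and the removed keyword becomes the answer) can be sketched in Python as follows. The function `make_fill_in_blank` and the blank marker are hypothetical, and the keyword is assumed to have been chosen already, since the natural language processing step used for keyword extraction is not specified here.

```python
def make_fill_in_blank(segment, keyword, blank="____"):
    """Build a fill-in-the-blank topic by blanking out the first occurrence of `keyword`."""
    if keyword not in segment:
        raise ValueError("keyword does not occur in the text segment")
    return {
        "type": "fill-in-the-blank",
        "stem": segment.replace(keyword, blank, 1),  # segment with the keyword removed
        "answer": keyword,                           # removed keyword is the answer
    }


question = make_fill_in_blank(
    "Inflation is a sustained rise in the general price level.",
    "Inflation",
)
```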

Optionally, when searching in the knowledge graph according to the specified test paper generation parameters to obtain a set of test paper questions conforming to the test paper generation parameters, the searching unit is specifically configured to:

Searching each knowledge point contained in the knowledge point parameters in the knowledge graph, and taking the searched knowledge point as a first knowledge point;

searching and obtaining a second knowledge point connected with the first knowledge point in the knowledge graph according to the knowledge relevance parameter;

Optionally, the apparatus further comprises:

The specific working principle of the test paper generating device based on the social science textbook provided by the embodiment of the application can refer to the relevant steps in the test paper generating method based on the social science textbook provided by the embodiment of the application, and the description thereof is omitted here.

The application provides a test paper generating device based on social science teaching materials, wherein the recognition unit 301 is configured to recognize a paper social science teaching material according to an optical character recognition technology to obtain an electronic document corresponding to the social science teaching material; the searching unit 302 is configured to search the electronic document, using a pre-constructed regular expression, for a plurality of text segments that conform to any one of the first text feature, the second text feature and the third text feature, to identify, using a pre-trained semantic recognition model, the text segments that contain a term of art, and to record the position information of each text segment in the electronic document, wherein a text segment is composed of one or a plurality of continuous sentences in the electronic document; the generating unit 303 is configured to generate, for each text segment containing a term of art, at least one topic according to the text segment, wherein the question types of the topics comprise fill-in-the-blank questions and multiple-choice questions; the extracting unit 304 is configured to extract a plurality of knowledge points from the catalogue of the electronic document and from each topic respectively, wherein knowledge points refer to the entities, relationships and attributes contained in the catalogue or a topic; the construction unit 305 is configured to combine the knowledge points and the position information of the text segments used to generate the topics, and to construct a knowledge graph through knowledge fusion, wherein the knowledge graph comprises nodes and edges connecting the nodes, and the nodes represent knowledge points, topics or subtitles of the catalogue; the searching unit 306 is configured to search in the knowledge graph according to the specified test paper generation parameters to obtain a plurality of topics that conform to the test paper generation parameters, and to combine the topics obtained by searching into a test paper, wherein the test paper generation parameters comprise knowledge point parameters, question type parameters, difficulty parameters and knowledge relevance parameters. According to the scheme, topics are generated automatically from the teaching materials, and the efficiency of automatic test paper generation is effectively improved.

The embodiment of the application also provides a computer storage medium for storing a computer program, which is particularly used for realizing the test paper generation method based on the social science textbook provided by any embodiment of the application when being executed.

Referring to fig. 4, the embodiment of the present application further provides an electronic device, where the electronic device includes a memory 401 and a processor 402, and the memory 401 is used to store a computer program, and the processor 402 is used to execute the computer program, and specifically is used to implement the test paper generating method based on the social science textbook provided by any embodiment of the present application.

Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

It should be noted that the terms "first," "second," and the like herein are merely used for distinguishing between different devices, modules, or units and not for limiting the order or interdependence of the functions performed by such devices, modules, or units.

Those skilled in the art will be able to make or use the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A test paper generation method based on social science teaching materials is characterized by comprising the following steps:

wherein the generating at least one topic from the text segment comprises:

2. The method of claim 1, wherein replacing keywords of other numbered sentences except the correct answer with similar words comprises:

3. The method of claim 1, wherein searching the knowledge-graph for a set of test-paper topics meeting the test-paper generation parameters according to the specified test-paper generation parameters comprises:

4. The method according to claim 1, wherein said recording positional information of each of said text segments in said electronic document comprises:

5. The method according to claim 1, wherein after the knowledge points are combined with the location information for generating the text segment of the topic, the method further comprises:

6. A test paper generating device based on social science teaching material is characterized by comprising:

7. The apparatus of claim 6, wherein when the generating unit replaces the keywords of the sentences with sequence numbers except the correct answer with similar words, the generating unit is specifically configured to:

8. The apparatus of claim 6, wherein, when searching in the knowledge graph according to the specified test paper generation parameters to obtain a set of test paper questions conforming to the test paper generation parameters, the searching unit is specifically configured to:

9. The apparatus according to claim 6, wherein the search unit is configured to, when recording the location information of each text segment in the electronic document:

10. The apparatus of claim 6, wherein the apparatus further comprises: