CN110674354A - Test paper content extraction method, test paper matching method, device, equipment and medium - Google Patents

Publication number
CN110674354A
Authority
CN
China
Legal status
Pending
Application number
CN201910876239.8A
Other languages
Chinese (zh)
Inventor
朱达华
徐宋传
陈晓宇
Current Assignee
Guangzhou Everbright Education Software Polytron Technologies Inc
Original Assignee
Guangzhou Everbright Education Software Polytron Technologies Inc
Application filed by Guangzhou Everbright Education Software Polytron Technologies Inc
Priority to CN201910876239.8A
Publication of CN110674354A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/83 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance

Abstract

The invention relates to the field of computer technology, and in particular to a test paper content extraction method, a test paper matching method, a device, equipment and a medium. The test paper content extraction method comprises the following steps: S10: if document test questions are obtained, obtaining a test question document from the document test questions; S20: acquiring a document content file from the test question document, wherein the document content file is a file in xml format; S30: traversing the document content file and acquiring document paragraph data from it; S40: acquiring the text content in each document paragraph data and forming that text content into a corresponding paragraph object; S50: adding the paragraph objects to a set plist and taking the set plist as the test paper content set. The method and the device make it possible to quickly acquire the content of a test question document and to acquire specific test questions from that content.

Description

Test paper content extraction method, test paper matching method, device, equipment and medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method for extracting test paper content, a method for matching test paper, an apparatus, a device, and a medium.
Background
At present, students in schools, especially high school students preparing for the college entrance examination, take a large number of examinations or work through a large number of test questions so that they encounter more questions and improve their scores.
When a teacher sets questions, the corresponding questions need to be selected from a large number of question banks or test papers and then assembled into a new test paper or exercise paper. A question bank is usually created by splitting existing test papers into individual questions and storing the split questions. However, when an existing test paper document is split, the test paper has to be marked manually, which requires considerable manual effort, is tedious and is error-prone, so there is room for improvement.
Disclosure of Invention
The invention aims to provide a test paper content extraction method for rapidly acquiring test question document content.
The above object of the present invention is achieved by the following technical solutions:
A test paper content extraction method comprises the following steps:
S10: if document test questions are obtained, obtaining a test question document from the document test questions;
S20: acquiring a document content file from the test question document, wherein the document content file is a file in xml format;
S30: traversing the document content file, and acquiring document paragraph data from the document content file;
S40: acquiring the text content in each document paragraph data, and forming the text content in each document paragraph data into a corresponding paragraph object;
S50: adding the paragraph object to a set plist, and taking the set plist as a test paper content set.
By adopting this technical solution, the document content file in xml format is obtained from the test question document in the document test questions, so that the tags of the document paragraphs can be read conveniently, the corresponding paragraphs and the document paragraph data in each paragraph can be obtained from those tags, and the paragraph objects can be read from the document paragraph data. Meanwhile, the paragraph objects are added to the set plist, so that the preset rules for acquiring specific test questions in the test question document can be stored in the file form of the plist, the specific test questions can be acquired automatically from the test question document, and the identification and splitting of the test question document are facilitated.
The invention is further configured to: step S20 includes:
S21: obtaining a document format from the test question document;
S22: judging the compatibility of the document format, and if the document format is judged to be incompatible, converting the document format into a compatible format;
S23: obtaining the document content file from the test question document in the compatible format.
By adopting the technical scheme, the compatibility of the document is judged, and the format of the test question document in the incompatible format is converted, so that the accuracy in obtaining the document content file can be ensured, and the subsequent extraction and splitting of the test questions are facilitated.
The second object of the invention is to provide a test paper matching method capable of acquiring specific test questions from the content of a test question document.
A test paper matching method comprises the following steps:
S60: acquiring a preset matching rule, and traversing a test paper content set according to the matching rule to obtain major-question paragraph data, wherein the test paper content set is acquired by the above test paper content extraction method;
S70: acquiring corresponding question type description information from the major-question paragraph data, and acquiring corresponding sub-question paragraph data according to the question type description information;
S80: composing the major-question paragraph data and the sub-question paragraph data into a replacement file;
S90: replacing the document content file in the test paper content set with the replacement file to obtain a test question file.
By adopting this technical solution, the matching rule is preset, and the major-question paragraph data is obtained by traversing, according to the matching rule, the test paper content set produced by the test paper content extraction method, so that the sub-questions in each major question of the test paper, and the content corresponding to each sub-question, can be further matched from the major-question paragraph data. Because the specific test question content in the test paper content set is identified through matching, the questions in the document test paper can be split and stored separately. This reduces manual intervention in splitting test papers, reduces errors during splitting, helps teachers build a question bank, and makes it easier for teachers to produce test papers.
The invention is further configured to: after step S60, before step S70, the test paper matching method further includes:
S61: if the major-question paragraph data is the first matched major-question paragraph data, and it is not the first paragraph object of the test paper content set, acquiring a title matching rule from the matching rule;
S62: acquiring the object sequence number of the first matched major-question paragraph data, and acquiring from the test paper content set the paragraph objects whose sequence numbers are smaller than that object sequence number;
S63: matching those paragraph objects using the title matching rule, and if the matching is successful, taking the matching result as the test paper title.
By adopting this technical solution, once the matched major-question paragraph data is determined to be the first matched major-question paragraph data, the test paper title of the document test paper can be matched from the paragraph objects whose object sequence numbers are smaller than that of this major-question paragraph data.
The invention is further configured to: step S70 includes:
S71: acquiring, from the matching rules, a sub-question matching rule corresponding to each major-question paragraph data according to the question type description information;
S72: traversing the test paper content set according to the sub-question matching rule to obtain a sub-question list;
S73: traversing all the sub-question objects in the sub-question list, and assigning each sub-question object to its corresponding major-question object to obtain the sub-question paragraph data.
By adopting this technical solution, the corresponding sub-question list can be matched from each major-question paragraph data according to the different question type description information by using the sub-question matching rule, and the sub-question objects traversed from the sub-question list are associated with the corresponding major-question objects, so that the specific question content in each major question can be obtained.
The third object of the invention is realized by the following technical scheme:
A test paper content extraction device comprises:
a test question acquisition module, used for obtaining a test question document from the document test questions if the document test questions are obtained;
a content acquisition module, used for acquiring a document content file from the test question document, wherein the document content file is a file in xml format;
a paragraph traversing module, used for traversing the document content file and acquiring document paragraph data from the document content file;
an object acquisition module, used for acquiring the text content in each document paragraph data and forming the text content in each document paragraph data into a corresponding paragraph object;
and an object adding module, used for adding the paragraph object to a set plist and taking the set plist as a test paper content set.
By adopting this technical solution, the document content file in xml format is obtained from the test question document in the document test questions, so that the tags of the document paragraphs can be read conveniently, the corresponding paragraphs and the document paragraph data in each paragraph can be obtained from those tags, and the paragraph objects can be read from the document paragraph data. Meanwhile, the paragraph objects are added to the set plist, so that the preset rules for acquiring specific test questions in the test question document can be stored in the file form of the plist, the specific test questions can be acquired automatically from the test question document, and the identification and splitting of the test question document are facilitated.
The fourth object of the invention is realized by the following technical scheme:
A test paper matching device comprises:
a major-question paragraph traversing module, used for acquiring a preset matching rule and traversing a test paper content set according to the matching rule to obtain major-question paragraph data, wherein the test paper content set is acquired by the above test paper content extraction method;
a sub-question paragraph acquisition module, used for acquiring corresponding question type description information from the major-question paragraph data and acquiring corresponding sub-question paragraph data according to the question type description information;
a replacement file acquisition module, used for composing the major-question paragraph data and the sub-question paragraph data into a replacement file;
and a replacing module, used for replacing the document content file in the test paper content set with the replacement file to obtain the test question file.
By adopting this technical solution, the matching rule is preset, and the major-question paragraph data is obtained by traversing, according to the matching rule, the test paper content set produced by the test paper content extraction method, so that the sub-questions in each major question of the test paper, and the content corresponding to each sub-question, can be further matched from the major-question paragraph data. Because the specific test question content in the test paper content set is identified through matching, the questions in the document test paper can be split and stored separately. This reduces manual intervention in splitting test papers, reduces errors during splitting, helps teachers build a question bank, and makes it easier for teachers to produce test papers.
The fifth object of the present invention is achieved by the following technical solutions:
A computer device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above test paper content extraction method when executing the computer program.
The sixth object of the present invention is achieved by the following technical solutions:
A computer-readable storage medium stores a computer program which, when executed by a processor, carries out the steps of the above test paper matching method.
In conclusion, the beneficial technical effects of the invention are as follows:
1. The document content file in xml format is obtained from the test question document in the document test questions, so that the tags of the document paragraphs can be read conveniently, the corresponding paragraphs and the document paragraph data in each paragraph can be obtained from those tags, and the paragraph objects can be read from the document paragraph data. Meanwhile, the paragraph objects are added to the set plist, so that the preset rules for acquiring specific test questions in the test question document can be stored in the file form of the plist, the specific test questions can be acquired automatically from the test question document, and the identification and splitting of the test question document are facilitated;
2. The matching rule is preset, and the major-question paragraph data is obtained by traversing, according to the matching rule, the test paper content set produced by the test paper content extraction method, so that the sub-questions in each major question of the test paper, and the content corresponding to each sub-question, can be further matched from the major-question paragraph data. Because the specific test question content in the test paper content set is identified through matching, the questions in the document test paper can be split and stored separately. This reduces manual intervention in splitting test papers, reduces errors during splitting, helps teachers build a question bank, and makes it easier for teachers to produce test papers.
Drawings
FIG. 1 is a flow chart of a method for extracting test paper content according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an implementation of step S20 in the test paper content extraction method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for matching test paper according to an embodiment of the present invention;
FIG. 4 is another flow chart of a method of matching test sheets in an embodiment of the present invention;
FIG. 5 is a flowchart illustrating the implementation of step S70 in the test paper matching method according to an embodiment of the present invention;
FIG. 6 is a schematic block diagram of a test paper content extracting apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic block diagram of a test paper matching apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a computer device according to an embodiment of the invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
Embodiment one:
In an embodiment, as shown in FIG. 1, the present invention discloses a test paper content extraction method, which specifically includes the following steps:
S10: if document test questions are obtained, obtaining the test question document from the document test questions.
In this embodiment, the document test questions refer to a test paper written as a Word document that needs to be identified and split. The test question document refers to the document test questions after they have been imported into the corresponding system for identifying and splitting documents.
Specifically, the test question document is obtained after a teacher or other operator imports the document test questions into the system for splitting documents.
S20: acquiring a document content file from the test question document, wherein the document content file is a file in xml format.
In this embodiment, the document content file is a file that stores the content of the test question document in xml (Extensible Markup Language) format.
Specifically, the test question document is opened as a compressed package using zip technology, the "document.xml" file is obtained from the resulting compressed package, and this "document.xml" file is used as the document content file.
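The archive step in S20 can be sketched with Python's standard zipfile module. This is only an illustrative sketch and not the patent's implementation: it assumes the main body of a .docx package is stored at word/document.xml (the usual location in Office Open XML packages), and make_demo_docx is a hypothetical helper that builds a tiny in-memory archive standing in for a real test question document.

```python
import io
import zipfile

# Assumed location of the main document body inside a .docx package.
DOCUMENT_XML = "word/document.xml"


def read_document_xml(docx_bytes: bytes) -> bytes:
    """Open a .docx as a zip archive and return the raw document.xml bytes."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return archive.read(DOCUMENT_XML)


def make_demo_docx(xml: bytes) -> bytes:
    """Hypothetical helper: build a minimal in-memory archive for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(DOCUMENT_XML, xml)
    return buffer.getvalue()


demo_xml = b"<document/>"
content_file = read_document_xml(make_demo_docx(demo_xml))
```

Reading from bytes keeps the sketch independent of the file system; a real caller could pass a path to zipfile.ZipFile instead.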
S30: traversing the document content file and acquiring document paragraph data from the document content file.
In this embodiment, the document paragraph data refers to the content data of each paragraph in the document content file.
Specifically, the document paragraph data is obtained from the document content file by using the dom4j tool, which is a tool for reading and writing xml files.
S40: acquiring the text content in each document paragraph data, and forming the text content in each document paragraph data into a corresponding paragraph object.
In this embodiment, the text content refers to the specific words or text in each document paragraph data.
Specifically, all paragraph nodes in the document paragraph data are traversed, the child-node tags w:r/w:t in each paragraph node element are obtained, and the text contents under those tags are concatenated to give the full text content of one paragraph; this text is used as the paragraph object of that document paragraph data.
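Steps S30 and S40 can be illustrated as follows. The patent names dom4j (a Java library); this sketch swaps in Python's standard xml.etree.ElementTree purely for illustration. It assumes the standard WordprocessingML namespace, and SAMPLE is a made-up two-paragraph document, not data from the patent.

```python
import xml.etree.ElementTree as ET
from typing import List

# WordprocessingML namespace used by the w: prefix in document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def paragraph_texts(document_xml: str) -> List[str]:
    """Return one string per w:p paragraph, joining the text of its w:r/w:t nodes."""
    root = ET.fromstring(document_xml)
    texts = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        runs = paragraph.findall("w:r/w:t", NS)
        texts.append("".join(node.text or "" for node in runs))
    return texts


# Made-up sample: the first paragraph is split across two runs.
SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>I. Choice </w:t></w:r><w:r><w:t>questions</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>1. 2 + 2 = ?</w:t></w:r></w:p>"
    "</w:body></w:document>"
)

texts = paragraph_texts(SAMPLE)
```

Concatenating the runs matters because Word frequently splits one visible sentence into several w:r elements. Text nested deeper than w:r/w:t (for example inside hyperlinks) is not collected by this sketch.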
S50: adding the paragraph object to the set plist, and taking the set plist as the test paper content set.
In this embodiment, the set plist refers to a collection that stores the paragraph objects obtained through the dom4j tool. The test paper content set is the set plist in which the paragraph objects are stored.
Specifically, after the paragraph objects are obtained, they are put into the set plist in sequence, giving the content set of the document test questions.
When the paragraph objects are put into the set plist, each paragraph object can be labeled according to the order of the test questions in the document test questions; after labeling, the paragraph objects are put into the set plist in ascending order of their labels.
In this embodiment, the document content file in xml format is obtained from the test question document in the document test questions, so that the tags of the document paragraphs can be read conveniently, the corresponding paragraphs and the document paragraph data in each paragraph can be obtained from those tags, and the paragraph objects can be read from the document paragraph data. Meanwhile, the paragraph objects are added to the set plist, so that the preset rules for acquiring specific test questions in the test question document can be stored in the file form of the plist, the specific test questions can be acquired automatically from the test question document, and the identification and splitting of the test question document are facilitated.
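The labeling described for S50 can be sketched as an ordered list of small records. ParagraphObject and build_plist are hypothetical names chosen for this illustration; the patent does not specify the structure of a paragraph object beyond its text and its sequence label.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParagraphObject:
    index: int  # sequence label, assigned in document order
    text: str   # concatenated text content of the paragraph


def build_plist(texts: List[str]) -> List[ParagraphObject]:
    """Label each paragraph in document order and collect the objects into plist."""
    return [ParagraphObject(index, text) for index, text in enumerate(texts)]


plist = build_plist(["Midterm paper", "I. Choice questions", "1. 2 + 2 = ?"])
```

Keeping the label on each object lets later steps compare positions, for example to find the paragraphs that precede the first major question.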
In an embodiment, as shown in FIG. 2, step S20, that is, obtaining the document content file from the test question document, specifically includes the following steps:
S21: acquiring a document format from the test question document.
In this embodiment, the document format refers to the suffix name of the test question document, for example .doc or .docx.
Specifically, the document format of the test question document is obtained.
S22: judging the compatibility of the document format, and if the document format is judged to be incompatible, converting the document format into a compatible format.
In this embodiment, the compatible format refers to a test question document with the suffix name .docx.
Specifically, it is determined whether the suffix name of the test question document is .docx. If not, for example if the document format is .doc, the document is converted into the compatible format with the suffix name .docx.
S23: obtaining the document content file from the test question document in the compatible format.
Specifically, the document content file is acquired from the test question document in the compatible format, using the method in step S20.
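The compatibility check in S21 and S22 amounts to a suffix test, sketched below. The function name needs_conversion is hypothetical, and the conversion itself is deliberately left out because the patent does not say which tool performs it.

```python
from pathlib import Path

# Per the description, only the .docx suffix counts as the compatible format.
COMPATIBLE_SUFFIX = ".docx"


def needs_conversion(filename: str) -> bool:
    """Return True when the document suffix is not the compatible .docx format."""
    return Path(filename).suffix.lower() != COMPATIBLE_SUFFIX


pending = [name for name in ("paper.doc", "quiz.docx") if needs_conversion(name)]
```

Lower-casing the suffix is an assumption of this sketch so that names such as "PAPER.DOCX" are treated as compatible.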
In an embodiment, as shown in FIG. 3, the present invention discloses a test paper matching method, which specifically includes the following steps:
S60: acquiring a preset matching rule, and traversing the test paper content set according to the matching rule to obtain major-question paragraph data, wherein the test paper content set is acquired by the test paper content extraction method.
In this embodiment, the matching rule is a preset rule for obtaining specific test questions from the test paper content set through matching. The major-question paragraph data is a paragraph object that describes all the questions under one major question of the test paper.
Specifically, the matching rule is preset, for example as in the following table:
[Table image BDA0002204451620000071: preset matching rules (image not reproduced in the text)]
Specifically, the test paper content set is traversed using the major-question matching rule to obtain the major-question paragraph data.
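Since the rule table is only available as an image, the following sketch uses a made-up rule to show the shape of step S60. MAJOR_RULE is a hypothetical regular expression that treats lines such as "I. Choice questions" (Roman numeral, dot, heading) as major-question headings; the patent's actual rules may differ.

```python
import re
from typing import List, Tuple

# Hypothetical major-question rule: Roman numeral, dot, then the heading text.
MAJOR_RULE = re.compile(r"^([IVX]+)\.\s+(.+)$")


def find_major_questions(plist: List[str]) -> List[Tuple[int, str]]:
    """Traverse the content set and return (sequence number, heading) for each match."""
    hits = []
    for index, text in enumerate(plist):
        match = MAJOR_RULE.match(text.strip())
        if match:
            hits.append((index, match.group(2)))
    return hits


# Made-up content set for illustration.
plist = [
    "Grade 10 Mathematics Midterm",
    "I. Choice questions (3 points each)",
    "1. 2 + 2 = ?",
    "A. 3  B. 4",
    "II. Fill-in-the-blank questions",
    "2. 3 x 3 = ____",
]
majors = find_major_questions(plist)
```

Returning the sequence number alongside the heading keeps the position information that the later title-matching step relies on.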
S70: acquiring corresponding question type description information from the major-question paragraph data, and acquiring corresponding sub-question paragraph data according to the question type description information.
In this embodiment, the question type description information refers to information describing the question type of each major-question paragraph data. The sub-question paragraph data refers to the paragraph objects corresponding to the sub-questions in each major question. The question type description information may refer to the following table:
[Table images BDA0002204451620000072 and BDA0002204451620000081: question type description information (images not reproduced in the text)]
Specifically, the major-question paragraph data is matched according to the question type description information, and the corresponding sub-question paragraph data is obtained from each major-question paragraph data by matching. For example, to match the sub-question paragraph data from the major-question paragraph data of the choice questions, that is, to match the specific questions among the choice questions, the following method may be adopted: the test paper content set is traversed according to the question type description information of choice questions, and the major-question paragraph data that conforms to this description information, for example "choice questions", is acquired from the test paper content set; the sub-question paragraph data, that is, the specific choice questions in the test paper, is then obtained from the text content of that major-question paragraph data according to the sub-question matching rule. Preferably, the score of the major-question paragraph data can also be matched from the stem part of the major question.
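Deriving the question type and the score from a major-question stem could look like the sketch below. The keyword table TYPE_KEYWORDS and the pattern SCORE_RULE are invented for illustration, because the patent's question type description table is only available as an image.

```python
import re
from typing import Optional, Tuple

# Hypothetical keyword table mapping heading keywords to question types.
TYPE_KEYWORDS = {
    "choice": "choice",
    "fill-in": "fill_in_blank",
    "short answer": "short_answer",
}
# Hypothetical score pattern, e.g. "3 points".
SCORE_RULE = re.compile(r"(\d+)\s*points")


def describe_major_question(heading: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (question type, score) found in a major-question stem, or None for each."""
    lowered = heading.lower()
    qtype = next((name for key, name in TYPE_KEYWORDS.items() if key in lowered), None)
    score = SCORE_RULE.search(lowered)
    return qtype, int(score.group(1)) if score else None


info = describe_major_question("I. Choice questions (3 points each)")
```

The question type returned here is what a later step would use to look up the sub-question matching rule for that major question.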
S80: composing the major-question paragraph data and the sub-question paragraph data into a replacement file.
In this embodiment, the replacement file refers to a file used to replace the text content in the test paper content set.
Specifically, after the specific questions have been identified from the test paper content set and the major-question paragraph data of each major question and the corresponding sub-question paragraph data have been obtained, an st_source file is generated and used as the replacement file.
S90: replacing the document content file in the test paper content set with the replacement file to obtain the test question file.
In this embodiment, the test question file is a file that identifies the specific questions of the test paper content set.
Specifically, the st_source file is opened using zip technology and a corresponding st.xml file is generated; the "document.xml" file in the test paper content set is then replaced by the st.xml file to obtain the test question file.
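The replacement in S90 can be sketched with Python's zipfile module. Entries in a zip archive cannot be overwritten in place, so this sketch copies the archive and substitutes one entry. The path word/document.xml is an assumption about the package layout, demo_archive is a hypothetical helper, and how st.xml is generated from the st_source file is not modeled here.

```python
import io
import zipfile

# Assumed location of the main document body inside a .docx package.
DOCUMENT_XML = "word/document.xml"


def replace_document_xml(docx_bytes: bytes, new_xml: bytes) -> bytes:
    """Copy a .docx archive, substituting new_xml for the document.xml entry."""
    output = io.BytesIO()
    source = zipfile.ZipFile(io.BytesIO(docx_bytes))
    with source, zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():
            data = new_xml if item.filename == DOCUMENT_XML else source.read(item.filename)
            target.writestr(item.filename, data)
    return output.getvalue()


def demo_archive() -> bytes:
    """Hypothetical helper: a tiny archive with a body part and one other part."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(DOCUMENT_XML, b"<old/>")
        archive.writestr("word/styles.xml", b"<styles/>")
    return buffer.getvalue()


result = zipfile.ZipFile(io.BytesIO(replace_document_xml(demo_archive(), b"<new/>")))
```

All other parts of the package (styles, media, relationships) are copied unchanged, so only the document body differs in the resulting file.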
It should be noted that the test paper content extraction method above is used to identify the text part of the document test questions, while the test paper matching method above is used to further match the specific test question content from the identified text and to classify the identified test questions by question type, that is, by the type of major question.
In this embodiment, the matching rule is preset, and the major-question paragraph data is obtained by traversing, according to the matching rule, the test paper content set produced by the test paper content extraction method, so that the sub-questions in each major question of the test paper, and the content corresponding to each sub-question, can be further matched from the major-question paragraph data. Because the specific test question content in the test paper content set is identified through matching, the questions in the document test paper can be split and stored separately. This reduces manual intervention in splitting test papers, reduces errors during splitting, helps teachers build a question bank, and makes it easier for teachers to produce test papers.
In an embodiment, as shown in FIG. 4, after step S60 and before step S70, the test paper matching method further includes:
S61: if the major-question paragraph data is the first matched major-question paragraph data, and it is not the first paragraph object of the test paper content set, acquiring the title matching rule from the matching rule.
In this embodiment, the title matching rule refers to a rule for matching the test paper title in the test paper content set.
Specifically, using the labels assigned to the paragraph objects in step S50, the major-question paragraph data in the test paper content set is traversed in ascending order of label. When the first major-question paragraph data is reached, its label is used to determine whether it is the first paragraph object in the test paper content set; if it is not, the title matching rule for matching the test paper title is obtained from the matching rule.
S62: acquiring the object sequence number of the first matched major-question paragraph data, and acquiring from the test paper content set the paragraph objects whose sequence numbers are smaller than that object sequence number.
In this embodiment, the object sequence number refers to the label of the paragraph object described in step S50 and step S61.
Specifically, after the first major-question paragraph data is matched, its object sequence number is obtained. The paragraph objects whose object sequence numbers are smaller than that of this major-question paragraph data are then acquired.
S63: matching those paragraph objects using the title matching rule, and if the matching is successful, taking the matching result as the test paper title.
In this embodiment, the test paper title refers to the title of the document test questions.
Specifically, the title matching rule is applied to the paragraph objects whose object sequence numbers are smaller than that of the first matched major-question paragraph data, to check whether the text content and format of each paragraph object conform to the format of a test paper title; if so, the matching is judged successful and that paragraph object is taken as the test paper title. It can be understood that, when matching the test paper title, the major-question content is matched first; once the major-question paragraph data corresponding to the first major question has been matched, the test paper title is matched among the preceding paragraphs.
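Steps S61 to S63 can be sketched as a search over the paragraphs that precede the first major question. TITLE_RULE is a hypothetical keyword pattern invented for this illustration, and scanning in document order is likewise an assumption; the patent only states that the title is matched among the paragraphs before the first major question.

```python
import re
from typing import List, Optional

# Hypothetical title rule: a paragraph mentioning an exam-like keyword.
TITLE_RULE = re.compile(r"(exam|test|midterm|final)", re.IGNORECASE)


def find_title(plist: List[str], first_major_index: int) -> Optional[str]:
    """Match the test paper title among paragraphs before the first major question."""
    if first_major_index == 0:
        return None  # S61: nothing precedes the first major question
    for text in plist[:first_major_index]:  # S62: smaller sequence numbers only
        if TITLE_RULE.search(text):         # S63: apply the title matching rule
            return text
    return None


title = find_title(
    ["Grade 10 Mathematics Midterm", "Name: ____", "I. Choice questions"],
    first_major_index=2,
)
```

Restricting the search to the slice before first_major_index is what prevents question text from being mistaken for the title.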
In one embodiment, as shown in fig. 5, in step S70, obtaining corresponding topic type description information from the large topic paragraph data, and obtaining corresponding small topic paragraph data according to the topic type description information specifically includes the following steps:
S71: Acquire, from the matching rules, the sub-question matching rule corresponding to each large-topic paragraph data according to the question type description information.
In this embodiment, a sub-question matching rule is a rule for matching the specific test questions that belong to each large-topic paragraph data.
Specifically, according to the question type description information, a sub-question matching rule of each type of big questions is obtained from the matching rules, namely, a sub-question matching rule corresponding to each big-question paragraph data is obtained.
S72: and traversing the test paper content set according to the sub-question matching rule to obtain a sub-question list.
In this embodiment, the sub-question list is a data table in which each specific test question is recorded.
Specifically, all objects in the test paper content set are traversed by using the corresponding sub-question matching rules one by one, and then the identified objects are stored in a preset list, so that the sub-question list is obtained.
S73: Traverse all the sub-topic objects in the sub-question list, and assign each sub-topic object to its corresponding large-topic object to obtain the small-topic paragraph data.
In this embodiment, a sub-topic object refers to the object of a specific test question in the test paper content set, and a large-topic object refers to the object of the stem part of a large topic in the test paper content set. One large-topic object contains a plurality of sub-topic objects.
Specifically, the sub-topic objects are classified according to the question type description information and the corresponding sub-question matching rules; for example, the sub-topic objects found by traversal using the question type description information for choice questions and the corresponding sub-question matching rule are grouped into one class.
Furthermore, the sub-topic objects of each class are assigned to the corresponding large-topic object; that is, the sub-topic objects of the class are taken as child objects, and the corresponding large-topic object is set as their parent object. The small-topic paragraph data of that question type is thereby obtained. For example, for choice questions, the stem part of the choice-question section is matched and taken as the large-topic object; all choice questions in the test paper content set are then traversed and assigned as child objects to that large-topic object, yielding the small-topic paragraph data corresponding to the choice questions.
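The parent and child relationship built in steps S71 to S73 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patented implementation: the question types, the regular expressions in `BIG_RULES` and `SUB_RULES`, the `TopicObject` class, and the function `assign_sub_topics` are all hypothetical, and the content set is simplified to a list of `(index, text)` pairs.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical rules keyed by question type; not taken from the source.
BIG_RULES = {
    "choice": re.compile(r"multiple[- ]choice", re.IGNORECASE),
    "blank": re.compile(r"fill in the blanks?", re.IGNORECASE),
}
SUB_RULES = {
    "choice": re.compile(r"^\d+\.\s.*\?$"),
    "blank": re.compile(r"^\d+\.\s.*_{2,}"),
}


@dataclass
class TopicObject:
    index: int
    text: str
    qtype: str
    parent_index: Optional[int] = None  # set only for sub-topic objects
    children: List["TopicObject"] = field(default_factory=list)


def assign_sub_topics(content_set: List[Tuple[int, str]]) -> List[TopicObject]:
    # Identify large-topic objects and their question type description.
    large_topics: List[TopicObject] = []
    for index, text in content_set:
        for qtype, rule in BIG_RULES.items():
            if rule.search(text):
                large_topics.append(TopicObject(index, text, qtype))
                break
    for large in large_topics:
        # S71: pick the sub-question rule that matches the question type.
        sub_rule = SUB_RULES[large.qtype]
        # S72: traverse the whole content set to build the sub-question list.
        sub_list = [
            TopicObject(i, t, large.qtype)
            for i, t in content_set
            if sub_rule.match(t)
        ]
        # S73: make each sub-topic object a child of its large-topic object.
        for sub in sub_list:
            sub.parent_index = large.index
            large.children.append(sub)
    return large_topics
```

Storing only `parent_index` on the child, instead of a reference to the parent object, keeps the structure free of reference cycles, which is a design choice of this sketch.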
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Example two:
in an embodiment, a test paper content extraction device is provided, and the test paper content extraction device corresponds to the test paper content extraction method in the above embodiment one to one. As shown in fig. 6, the test paper content extraction apparatus includes a test question acquisition module 10, a content acquisition module 20, a paragraph traversal module 30, an object acquisition module 40, and an object addition module 50. The functional modules are explained in detail as follows:
the test question acquiring module 10 is used for acquiring a test question document from the document test questions if the document test questions are acquired;
a content obtaining module 20, configured to obtain a document content file from the test question document, where the document content file is a file in an xml format;
a paragraph traversing module 30, configured to traverse the document content file, and obtain document paragraph data from the document content file;
an object obtaining module 40, configured to obtain text content in each document paragraph data, and form a corresponding paragraph object from the text content in each document paragraph data;
and an object adding module 50, configured to add the paragraph object to a set plist, and use the set plist as a test paper content set.
Preferably, the content obtaining module 20 includes:
the format obtaining submodule 21 is used for obtaining a document format from the test question document;
the compatibility judging submodule 22 is used for judging the compatibility of the document format, and if the document format is judged to be incompatible, the document format is converted into a compatible format;
and the content obtaining sub-module 23 is configured to obtain a document content file from the test question document in the compatible format.
For specific limitations of the test paper content extraction device, reference may be made to the above limitations of the test paper content extraction method, which will not be described herein again. The modules in the test paper content extraction device can be wholly or partially implemented by software, hardware, or a combination thereof. The modules can be embedded in, or be independent of, a processor in the computer device in hardware form, and can also be stored in a memory in the computer device in software form, so that the processor can call and execute operations corresponding to the modules.
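As an illustration of the work done by modules 20 to 50 (steps S20 to S50), the following Python sketch assumes the test question document is a `.docx` file. A `.docx` file is a ZIP archive whose main body is stored as `word/document.xml`, with paragraphs in `w:p` elements and text runs in `w:t` elements. The function name `extract_content_set`, the `(label, text)` tuple used as a paragraph object, and the choice to skip empty paragraphs are assumptions of this sketch, not details from the source.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List, Tuple

# WordprocessingML main namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_content_set(docx_bytes: bytes) -> List[Tuple[int, str]]:
    # S20: read the XML content file out of the .docx ZIP archive.
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)

    plist: List[Tuple[int, str]] = []
    # S30: traverse every paragraph element (w:p) in document order.
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # S40: join the text of all runs (w:t) into one paragraph string.
        text = "".join(t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t"))
        # Assumption of this sketch: empty paragraphs are not kept.
        if text.strip():
            # S50: add the paragraph object, labelled by its position.
            plist.append((len(plist), text))
    return plist
```

The returned list plays the role of the set plist; the integer label is what later steps would use as the object sequence number.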
In an embodiment, a test paper matching device is provided, and the test paper matching device corresponds to the test paper matching method in the above embodiment one to one. As shown in fig. 7, the test paper matching apparatus includes a large-question paragraph traversing module 60, a small-question paragraph obtaining module 70, a replacement file obtaining module 80, and a replacement module 90. The functional modules are explained in detail as follows:
the large-topic paragraph traversing module 60 is configured to obtain a preset matching rule, and traverse a test paper content set according to the matching rule to obtain large-topic paragraph data, where the test paper content set is obtained by using a test paper content extraction method;
a small-question paragraph obtaining module 70, configured to obtain corresponding question type description information from the large-topic paragraph data, and obtain corresponding small-topic paragraph data according to the question type description information;
a substitute file obtaining module 80, configured to combine the large-question paragraph data and the small-question paragraph data into a substitute file;
and the replacing module 90 is used for replacing the document content file in the test paper content set with a replacing file to obtain the test question file.
Preferably, the test paper matching apparatus further includes:
the matching sub-module 61 is configured to obtain a title matching rule from the matching rule if the large-topic paragraph data is the first large-topic paragraph data obtained through matching and the first large-topic paragraph data obtained through matching is not the first paragraph object of the test paper content set;
the object obtaining sub-module 62 is configured to obtain an object sequence number of the first matching-obtained large-topic paragraph data, and obtain a corresponding paragraph object smaller than the object sequence number from the test paper content set;
and the title matching submodule 63 is configured to match the corresponding paragraph object smaller than the object sequence number by using a title matching rule, and if matching is successful, the matching result is used as a test paper title.
Preferably, the small-question paragraph obtaining module 70 includes:
the matching rule obtaining submodule 71 is configured to obtain a sub-question matching rule corresponding to each piece of major-question paragraph data from the matching rules according to the question type description information;
the sub-question traversing sub-module 72 is used for traversing the test paper content set according to the sub-question matching rule to obtain a sub-question list;
the paragraph acquisition sub-module 73 is configured to traverse all the sub-topic objects in the sub-question list, and assign the sub-topic objects to the corresponding large-topic objects to obtain the small-topic paragraph data.
For specific limitations of the test paper matching device, reference may be made to the above limitations of the test paper matching method, which will not be described herein again. The modules in the test paper matching device can be wholly or partially implemented by software, hardware, or a combination thereof. The modules can be embedded in, or be independent of, a processor in the computer device in hardware form, and can also be stored in a memory in the computer device in software form, so that the processor can call and execute operations corresponding to the modules.
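The role of the replacement file obtaining module 80 can be sketched as follows. The source does not specify the layout of the replacement file, so the XML element names (`paper`, `largeTopic`, `heading`, `subTopic`), the dictionary keys, and the function `build_replacement_xml` are purely hypothetical; the sketch only shows large-topic and small-topic paragraph data being combined into one structured file.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_replacement_xml(large_topics: List[Dict]) -> str:
    # Root of the hypothetical replacement file.
    root = ET.Element("paper")
    for large in large_topics:
        # One element per large topic, tagged with its question type.
        large_el = ET.SubElement(root, "largeTopic", {"type": large["type"]})
        ET.SubElement(large_el, "heading").text = large["heading"]
        # Sub-topics are nested under the large topic they belong to.
        for sub in large["subs"]:
            ET.SubElement(large_el, "subTopic").text = sub
    # Serialize to a string; special characters are escaped by ElementTree.
    return ET.tostring(root, encoding="unicode")
```

A caller would pass the grouped data produced by the earlier matching steps and write the returned string in place of the original document content file.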
Example three:
In one embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in fig. 8. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing the test paper content set and the test question file. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by the processor to implement a test paper content extraction method; alternatively, the computer program is executed by the processor to implement a test paper matching method.
In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
s10: if the document test questions are obtained, obtaining test question documents from the document test questions;
s20: acquiring a document content file from the test question document, wherein the document content file is a file in an xml format;
s30: traversing the document content file, and acquiring document paragraph data from the document content file;
s40: acquiring text content in each document paragraph data, and forming the text content in each document paragraph data into a corresponding paragraph object;
s50: the paragraph object is added to the set plist, and the set plist is taken as a test paper content set.
Alternatively, the processor may also implement the following steps when executing the computer program:
s60: acquiring a preset matching rule, traversing a test paper content set according to the matching rule to obtain large-topic paragraph data, wherein the test paper content set is acquired by adopting a test paper content extraction method;
s70: acquiring corresponding question type description information from the big question paragraph data, and acquiring corresponding small question paragraph data according to the question type description information;
s80: composing the paragraph data of the big question and the paragraph data of the small question into a replacement file;
s90: and replacing the document content files in the test paper content set with the replacement files to obtain the test question files.
Example four:
in one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
s10: if the document test questions are obtained, obtaining test question documents from the document test questions;
s20: acquiring a document content file from the test question document, wherein the document content file is a file in an xml format;
s30: traversing the document content file, and acquiring document paragraph data from the document content file;
s40: acquiring text content in each document paragraph data, and forming the text content in each document paragraph data into a corresponding paragraph object;
s50: the paragraph object is added to the set plist, and the set plist is taken as a test paper content set.
Alternatively, the computer program when executed by the processor may further implement the steps of:
s60: acquiring a preset matching rule, traversing a test paper content set according to the matching rule to obtain large-topic paragraph data, wherein the test paper content set is acquired by adopting a test paper content extraction method;
s70: acquiring corresponding question type description information from the big question paragraph data, and acquiring corresponding small question paragraph data according to the question type description information;
s80: composing the paragraph data of the big question and the paragraph data of the small question into a replacement file;
s90: and replacing the document content files in the test paper content set with the replacement files to obtain the test question files.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A test paper content extraction method is characterized by comprising the following steps:
s10: if the document test questions are obtained, obtaining test question documents from the document test questions;
s20: acquiring a document content file from the test question document, wherein the document content file is a file in an xml format;
s30: traversing the document content file, and acquiring document paragraph data from the document content file;
s40: acquiring text content in each document paragraph data, and forming the text content in each document paragraph data into a corresponding paragraph object;
s50: adding the paragraph object to a set plist, and taking the set plist as a test paper content set.
2. The test paper content extraction method according to claim 1, wherein the step S20 includes:
s21: obtaining a document format from the test question document;
s22: judging the compatibility of the document format, and if the document format is judged to be incompatible, converting the document format into a compatible format;
s23: and obtaining a document content file from the test question document in the compatible format.
3. A test paper matching method is characterized by comprising the following steps:
s60: acquiring a preset matching rule, and traversing a test paper content set according to the matching rule to obtain large-topic paragraph data, wherein the test paper content set is acquired by adopting the test paper content extraction method of any one of claims 1-2;
s70: acquiring corresponding question type description information from the big question paragraph data, and acquiring corresponding small question paragraph data according to the question type description information;
s80: composing the big-question paragraph data and the small-question paragraph data into a replacement file;
s90: and replacing the document content files in the test paper content set with the replacement files to obtain test question files.
4. The paper matching method of claim 1, wherein after step S60 and before step S70, the paper matching method further comprises:
s61: if the large-topic paragraph data is the first matched large-topic paragraph data and the first matched large-topic paragraph data is not the first paragraph object of the test paper content set, acquiring a title matching rule from the matching rule;
s62: acquiring the object serial number of the large-topic paragraph data obtained by the first matching, and acquiring the corresponding paragraph object smaller than the object serial number from the test paper content set;
s63: and matching the corresponding paragraph objects smaller than the object sequence number by using the title matching rule, and if the matching is successful, taking the matching result as the test paper title.
5. The test paper matching method of claim 1, wherein the step S70 includes:
s71: acquiring a sub-question matching rule corresponding to each major-question paragraph data from the matching rules according to the question type description information;
s72: traversing the test paper content set according to the sub-question matching rule to obtain a sub-question list;
s73: traversing all the sub-topic objects in the sub-question list, and assigning the sub-topic objects to the corresponding large-topic objects to obtain the small-topic paragraph data.
6. A test paper content extraction device characterized by comprising:
the test question acquisition module is used for acquiring a test question document from the document test questions if the document test questions are acquired;
the content acquisition module is used for acquiring a document content file from the test question document, wherein the document content file is a file in an xml format;
the paragraph traversing module is used for traversing the document content file and acquiring document paragraph data from the document content file;
the object acquisition module is used for acquiring text contents in each document paragraph data and forming the text contents in each document paragraph data into a corresponding paragraph object;
and the object adding module is used for adding the paragraph object into a set plist and taking the set plist as a test paper content set.
7. A test paper matching apparatus, characterized in that the test paper matching apparatus comprises:
the large-topic paragraph traversal module is used for acquiring a preset matching rule, and traversing a test paper content set according to the matching rule to acquire large-topic paragraph data, wherein the test paper content set is acquired by adopting the test paper content extraction method of any one of claims 1-2;
the small topic paragraph acquisition module is used for acquiring corresponding topic type description information from the large topic paragraph data and acquiring corresponding small topic paragraph data according to the topic type description information;
the replacing file obtaining module is used for forming a replacing file by the big question paragraph data and the small question paragraph data;
and the replacing module is used for replacing the document content file in the test paper content set with the replacing file to obtain the test question file.
8. The test paper matching apparatus of claim 7, wherein the test paper matching apparatus further comprises: the matching sub-module is used for acquiring a title matching rule from the matching rule if the large-topic paragraph data is the first matched large-topic paragraph data and the first matched large-topic paragraph data is not the first paragraph object of the test paper content set;
the object acquisition sub-module is used for acquiring the object serial number of the large-topic paragraph data obtained by the first matching and acquiring the corresponding paragraph object smaller than the object serial number from the test paper content set;
and the title matching sub-module is used for matching the corresponding paragraph objects smaller than the object serial number by using the title matching rule, and if the matching is successful, the matching result is used as the test paper title.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the test paper content extraction method according to any one of claims 1 to 2 when executing the computer program or implements the steps of the test paper matching method according to any one of claims 3 to 5 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the test paper content extraction method according to any one of claims 1 to 2; alternatively, the computer program realizes the steps of the test paper matching method according to any one of claims 3 to 5 when executed by a processor.
CN201910876239.8A 2019-09-17 2019-09-17 Test paper content extraction method, test paper matching method, device, equipment and medium Pending CN110674354A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910876239.8A CN110674354A (en) 2019-09-17 2019-09-17 Test paper content extraction method, test paper matching method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN110674354A true CN110674354A (en) 2020-01-10

Family

ID=69077999

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910876239.8A Pending CN110674354A (en) 2019-09-17 2019-09-17 Test paper content extraction method, test paper matching method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN110674354A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112001183A (en) * 2020-07-26 2020-11-27 湖南省侍禾教育科技有限公司 Segmentation and extraction method and system for primary and secondary school test questions based on paragraph semantics

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102193906A (en) * 2010-03-11 2011-09-21 北京商纳科技有限公司 Method for automatically introducing examination paper in WORD format into database system
US20140178848A1 (en) * 2012-12-24 2014-06-26 Teracle, Inc. Method and apparatus for administering learning contents
CN104298652A (en) * 2013-07-19 2015-01-21 深圳习习网络科技有限公司 Electronic test paper format conversion method and device
CN106354740A (en) * 2016-05-04 2017-01-25 上海秦镜网络科技有限公司 Electronic examination paper inputting method
CN109614594A (en) * 2018-11-27 2019-04-12 浙江万朋教育科技股份有限公司 A method of topic document is resolved into exam pool data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200110
