CN110879987B - Method for identifying answer content of test questions - Google Patents


Info

Publication number
CN110879987B
CN110879987B (application CN201911149719.0A)
Authority
CN
China
Prior art keywords
test question
picture
answer
photographed
anchor points
Prior art date
Legal status
Active
Application number
CN201911149719.0A
Other languages
Chinese (zh)
Other versions
CN110879987A (en)
Inventor
王红接
刘林
刘恒鲁
Current Assignee
Chengdu Dongfang Wendao Technology Development Co ltd
Original Assignee
Chengdu Dongfang Wendao Technology Development Co ltd
Priority date
Filing date
Publication date
Application filed by Chengdu Dongfang Wendao Technology Development Co ltd
Priority to CN201911149719.0A
Publication of CN110879987A
Application granted
Publication of CN110879987B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/40: Extraction of image or video features
    • G06V10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/148: Segmentation of character regions
    • G06V30/153: Segmentation of character regions using recognition of characters or words
    • G06V30/40: Document-oriented image-based pattern recognition
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to the technical field of intelligent recognition, and discloses a method for recognizing the answer content of test questions. The invention provides a new method capable of identifying the answer area identification position and the answer content of a test question without manual intervention. The method comprises a test question pre-preparation stage and a test question answer photographing stage: in the pre-preparation stage, a plurality of anchor points of a test question master and the manually labelled answer area positions are determined; in the photographing stage, the accurate positions of several anchor points in the photographed picture are determined through image feature matching, a perspective transformation matrix is then computed from the positions of these anchor points in the scanned picture and the photographed picture, and finally the answer area identification position and the answer content of the test question are obtained from the labelled answer area position and the perspective transformation matrix. A user can therefore obtain the answer content directly after photographing the answered test question, without manual intervention. In addition, the method is applicable to test questions of general subjects at all grade levels, which facilitates practical application and popularization.

Description

Method for identifying answer content of test questions
Technical Field
The invention belongs to the technical field of intelligent recognition, and particularly relates to a method for recognizing answer contents of test questions.
Background
In recent years, with the continuous upgrading of information technology, mobile terminal devices have become widespread, and more convenient, rapid and efficient ways of working and learning have become popular. The traditional education field has gradually begun to explore a new generation of education informatization. At the basic education stage in China, the learning progress of students is still mainly assessed through various types of examinations, including the college entrance examination, the high school entrance examination, daily homework assigned by teachers, unit tests, end-of-term examinations, survey tests, joint examinations, mock examinations and the like. As a result, teachers expend a great amount of effort correcting homework and test papers. Various auxiliary examination methods have therefore gradually come into use in examination scenarios, such as collecting answer content by photographing answers, so as to achieve remote guidance, remote marking, automatic marking and the like.
Current photographing-based answering approaches fall mainly into the following two types.
(1) The photographed page has a corresponding electronic page: the service provider prepares the electronic page in advance and records information such as the area of each question, the answer area and the standard answer content on the page; the user designates the page in the APP and then photographs it, so the APP knows which electronic page corresponds to the current photo; after photographing, the APP determines the position of the page in the photo through edge detection, then rectifies the page into a standard rectangle through perspective transformation; the question numbers are located in the transformed picture and, combined with the previously recorded question information, the positions of the questions and answer areas are determined; finally the answer area is cropped out and recognized, converting it into electronic answer content. This approach has two drawbacks: first, it can only process mathematics questions, so subject coverage is limited; second, the edge detection is not accurate enough, and the user is required to adjust it to provide correct edge information.
(2) The photographed page has no corresponding electronic page: the service provider enters the questions as text; the user photographs in the APP, which then recognizes horizontal and column-form arithmetic expressions in the photo and performs simple arithmetic-expression checking; and/or the APP recognizes the answer text using OCR technology and searches the background for the corresponding question on the page for judging. This approach has the drawback that it can only process primary-school mathematics, so subject coverage is likewise limited.
Disclosure of Invention
In order to solve the problems that edge recognition is inaccurate and manual intervention is required in the current process of acquiring test question answer content, the invention aims to provide a new method for identifying the answer content of test questions.
The technical scheme adopted by the invention is as follows:
A method for identifying the answer content of test questions comprises a test question pre-preparation stage and a test question answer photographing stage;
the test question pre-preparation stage comprises the following steps S101 to S103:
S101, acquiring a scanned picture of a test question page, and equally dividing the scanned picture into an n×n grid of sub-scanned pictures, wherein n is a natural number not less than 3;
S102, for each outermost sub-scanned picture, scanning with a square sliding window for the region containing the most image feature points, taking that region as the anchor point of the corresponding sub-scanned picture, and finally obtaining a test question master containing all anchor points, wherein the side length of the square sliding window is smaller than the width of the corresponding sub-scanned picture;
S103, acquiring question information manually labelled on the scanned picture, wherein the question information includes an answer area position;
the test question answer photographing stage comprises the following steps S201 to S207:
S201, acquiring a photographed picture of an answered test question page, and acquiring the test question master and question information corresponding to the answered test question page;
S202, selecting the m anchor points in the test question master closest to the edge of the scanned picture, estimating the preliminary position of each in the photographed picture, and finally cropping out m sub-photographed pictures each containing a different anchor point according to the preliminary positions, wherein m is a natural number not less than 4 and not more than 4×(n-1);
S203, sending the m sub-photographed pictures and the test question master to a matching server, and sending the photographed picture to a handwriting recognition server;
S204, receiving a matching result from the matching server, wherein the matching result is the accurate position of each selected anchor point in its corresponding sub-photographed picture;
S205, determining the accurate position of each selected anchor point in the photographed picture according to its accurate position in the corresponding sub-photographed picture, selecting from the m selected anchor points the 4 anchor points closest to the edge of the photographed picture, and finally calculating the perspective transformation matrix of the photographed picture according to the accurate positions of these 4 anchor points in the photographed picture and their positions in the scanned picture;
S206, obtaining the answer area identification position in the photographed picture according to the perspective transformation matrix and the answer area position in the question information;
S207, obtaining the answer content corresponding to the answer area identification position according to the answer area identification position and the handwriting recognition result from the handwriting recognition server.
Preferably, when the question information further includes a stem position, the method further comprises the following step before step S207:
identifying the stem content in the handwriting recognition result through character comparison, determining the coordinate position of the stem content in the photographed picture, and correcting the answer area identification position according to the mapping relation between the stem position and that coordinate position, so as to obtain a more accurate answer area identification position.
Preferably, the question information further comprises standard answer content.
Preferably, in the step S102, the image feature value of each anchor point is also calculated, and the test question master contains the image feature values of the anchor points.
Preferably, the method further comprises a server matching processing stage: performing image feature matching between each sub-photographed picture and the corresponding selected anchor point in the test question master to obtain the accurate position of each selected anchor point in the corresponding sub-photographed picture.
Specifically, image feature matching is performed using an OpenCV-based SURF, SIFT, ORB or FAST algorithm.
Preferably, the method further comprises a server recognition processing stage: recognizing each character on the photographed picture using a handwriting recognition model trained in advance through deep learning, and obtaining a handwriting recognition result comprising the stem content and the answer content.
Specifically, deep learning is performed using a YOLO target detection network model to obtain the handwriting recognition model.
Specifically, when n is 3, m takes a value of 6.
Specifically, the side length of the square sliding window is 1/30-1/10 of the width of the scanned picture or the sub-scanned picture.
The beneficial effects of the invention are as follows:
(1) The invention provides a new method capable of identifying the answer area identification position and the answer content of a test question without manual intervention. The method comprises a test question pre-preparation stage and a test question answer photographing stage: in the pre-preparation stage, a plurality of anchor points of the test question master and the labelled answer area positions are determined; in the photographing stage, the accurate positions of several anchor points in the photographed picture are determined through image feature matching, a perspective transformation matrix is then computed from the positions of these anchor points in the scanned picture and the photographed picture, and finally the answer area identification position and the answer content of the test question are obtained from the labelled answer area position and the perspective transformation matrix. A user can therefore obtain the answer content directly after photographing, without manual intervention, which greatly improves user experience;
(2) A handwriting recognition model trained in advance through deep learning is used to recognize each character on the photographed picture, so that various characters, symbols, mathematical formulas and the like can be recognized. The model generalizes well, so the method is further applicable to test questions of general subjects at all grade levels, which facilitates practical application and popularization.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a method for identifying answer content of a test question.
Detailed Description
The invention will be further elucidated with reference to the drawings and to specific embodiments. The present invention is not limited to these examples, although they are described in order to assist understanding of the present invention. Specific structural and functional details disclosed herein are merely representative of example embodiments of the invention. This invention may, however, be embodied in many alternate forms and should not be construed as limited to the embodiments set forth herein.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of example embodiments of the present invention.
It should be understood that the term "and/or" as may appear herein merely describes an association relationship between associated objects, meaning that three relationships may exist: for example, "A and/or B" may represent A alone, B alone, or both A and B. The term "/and" as may appear herein describes another association relationship, meaning that two relationships may exist: for example, "A /and B" may represent A alone, or A and B together. In addition, the character "/" as may appear herein generally indicates an "or" relationship between the associated objects.
It will be understood that when an element is referred to herein as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. In contrast, when an element is referred to as being "directly connected" or "directly coupled" to another element, there are no intervening elements present. Other words used to describe relationships between elements (e.g., "between" versus "directly between", "adjacent" versus "directly adjacent", etc.) should be interpreted in a similar manner.
It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit example embodiments of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used herein, specify the presence of stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
It should be appreciated that in some alternative embodiments, the functions/acts noted may occur out of the order noted in the figures. For example, two steps shown in succession may in fact be executed substantially concurrently, or may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
It should be understood that specific details are provided in the following description to provide a thorough understanding of the example embodiments. However, it will be understood by those of ordinary skill in the art that the example embodiments may be practiced without these specific details. For example, a system may be shown in block diagrams in order to avoid obscuring the examples with unnecessary detail. In other instances, well-known processes, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the example embodiments.
Example 1
As shown in fig. 1, the method for identifying the answer content of test questions provided in this embodiment comprises a test question pre-preparation stage and a test question answer photographing stage.
The test question pre-preparation stage is performed on the test question library side, and may include, but is not limited to, the following steps S101 to S103.
S101, acquiring a scanned picture of a test question page, and equally dividing the scanned picture into an n×n grid of sub-scanned pictures, wherein n is a natural number not less than 3.
In the step S101, the scanned picture is imported by the service provider. The equal division of the picture is a conventional processing step; for example, with n = 3 the scanned picture is equally divided into 9 sub-scanned pictures.
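As an illustration of step S101, below is a minimal sketch (the patent specifies no implementation) that splits a scanned page into an n×n grid using OpenCV and NumPy slicing; the file name is hypothetical.

    import cv2

    def split_into_grid(image, n=3):
        """Equally divide a scanned page into an n x n grid of sub-pictures."""
        h, w = image.shape[:2]
        cell_h, cell_w = h // n, w // n
        cells = []
        for row in range(n):
            for col in range(n):
                cell = image[row * cell_h:(row + 1) * cell_h,
                             col * cell_w:(col + 1) * cell_w]
                cells.append(((row, col), cell))
        return cells  # 9 sub-scanned pictures when n == 3

    scan = cv2.imread("scanned_page.png")  # hypothetical input file
    sub_pictures = split_into_grid(scan, n=3)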
S102, for each outermost sub-scanned picture, scanning with a square sliding window for the region containing the most image feature points, taking that region as the anchor point of the corresponding sub-scanned picture, and finally obtaining a test question master containing all anchor points, wherein the side length of the square sliding window is smaller than the width of the corresponding sub-scanned picture.
In the step S102, if the scanned picture is equally divided into 9 sub-scanned pictures, the 8 outermost sub-scanned pictures are used (i.e. the central sub-scanned picture is excluded); scanning for image feature points with a square sliding window is an existing conventional technique. The side length of the square sliding window is preferably 1/30 to 1/10 of the width of the scanned picture or sub-scanned picture, for example 1/20. In addition, in order to reduce the data volume of the test question master, which saves storage space and transmission bandwidth, the image feature values of the anchor points are also calculated in the step S102, and the test question master contains these image feature values; the calculation of image feature values is an existing conventional technique.
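The patent does not name the feature detector used to count "image feature points"; the sketch below assumes ORB keypoints as a stand-in, and the window size and stride are illustrative choices.

    import cv2
    import numpy as np

    def find_anchor(sub_picture, win, stride):
        """Slide a square window; return the region with the most keypoints."""
        gray = cv2.cvtColor(sub_picture, cv2.COLOR_BGR2GRAY)  # assumes BGR input
        keypoints = cv2.ORB_create().detect(gray, None)
        pts = (np.array([kp.pt for kp in keypoints])
               if keypoints else np.empty((0, 2)))
        h, w = sub_picture.shape[:2]
        best_count, best_xy = -1, (0, 0)
        for y in range(0, h - win + 1, stride):
            for x in range(0, w - win + 1, stride):
                inside = ((pts[:, 0] >= x) & (pts[:, 0] < x + win) &
                          (pts[:, 1] >= y) & (pts[:, 1] < y + win))
                if int(inside.sum()) > best_count:
                    best_count, best_xy = int(inside.sum()), (x, y)
        x, y = best_xy
        return sub_picture[y:y + win, x:x + win], best_xy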
S103, acquiring question information manually labelled on the scanned picture, wherein the question information includes an answer area position.
In the step S103, the question information is labelled manually by the service provider on a human-machine interface. Specifically, the question information may further include a stem position, standard answer content and the like.
The test question answer photographing stage is performed on the user side, and may include, but is not limited to, the following steps S201 to S207.
S201, acquiring a photographed picture of an answered test question page, and acquiring the test question master and question information corresponding to the answered test question page.
In the step S201, the photographed picture is imported by the user after photographing the answered test question page, for example through a mobile phone APP. In addition, the test question master and the question information can be obtained by accessing the test question library and querying by the test question page number.
S202, selecting the m anchor points in the test question master closest to the edge of the scanned picture, estimating the preliminary position of each in the photographed picture, and finally cropping out m sub-photographed pictures each containing a different anchor point according to the preliminary positions, wherein m is a natural number not less than 4 and not more than 4×(n-1).
In the step S202, the preliminary position of a single selected anchor point may be estimated, for example, by taking the position of the selected anchor point in the scanned picture as its preliminary position in the photographed picture. Once the preliminary positions are determined, sub-photographed pictures corresponding one-to-one to the selected anchor points can be cropped out conventionally, and the sub-photographed pictures may be of the same or different sizes. For example, when n is 3, m takes the value 6, that is, the 6 anchor points closest to the edge of the scanned picture are selected from the test question master.
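A sketch of this estimate-and-crop step under the simple strategy above (reusing the scanned-picture coordinates, rescaled to the photographed picture's size); the margin value is an illustrative choice, not from the patent, and the inputs are assumed to be NumPy image arrays.

    def crop_around(photo, anchor_xy_scan, scan_size, margin=120):
        """Crop a sub-photographed picture around an estimated anchor position."""
        ph, pw = photo.shape[:2]
        sh, sw = scan_size
        # rescale scanned-picture coordinates to the photographed picture
        x = int(anchor_xy_scan[0] * pw / sw)
        y = int(anchor_xy_scan[1] * ph / sh)
        x0, y0 = max(0, x - margin), max(0, y - margin)
        x1, y1 = min(pw, x + margin), min(ph, y + margin)
        return photo[y0:y1, x0:x1], (x0, y0)  # crop plus its offset in the photo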
S203, sending the m sub-photographed pictures and the test question master to a matching server, and sending the photographed picture to a handwriting recognition server.
In the step S203, the matching server is configured to perform image feature matching between each sub-photographed picture and its corresponding selected anchor point (i.e. the anchor point region or its calculated image feature values), so as to obtain the accurate position of each selected anchor point in the corresponding sub-photographed picture. That is, the method for identifying the answer content of test questions further comprises a server matching processing stage: performing image feature matching between each sub-photographed picture and the corresponding selected anchor point in the test question master to obtain the accurate position of each selected anchor point in the corresponding sub-photographed picture. Specifically, but not exclusively, existing algorithms can be used for the image feature matching, such as the SURF algorithm (Speeded-Up Robust Features, a well-known scale-invariant feature detection method), the SIFT algorithm (Scale-Invariant Feature Transform, another well-known scale-invariant feature detection method), the ORB algorithm (Oriented FAST and Rotated BRIEF, an improved BRIEF-based algorithm roughly 100 times faster than SIFT and 10 times faster than SURF) or the FAST algorithm (Features from Accelerated Segment Test, which detects interest points quickly because only a few pixels need to be compared to decide whether a point is a keypoint).
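As one sketch of what the matching server might do, the following uses OpenCV's ORB detector with a brute-force Hamming matcher and a RANSAC homography to localise one anchor template inside its sub-photographed picture; the patent only requires some such feature-matching algorithm, so this particular pipeline is an assumption.

    import cv2
    import numpy as np

    def locate_anchor(anchor_img, sub_photo):
        """Return the anchor centre's coordinates inside the sub-photo."""
        orb = cv2.ORB_create()
        kp1, des1 = orb.detectAndCompute(
            cv2.cvtColor(anchor_img, cv2.COLOR_BGR2GRAY), None)
        kp2, des2 = orb.detectAndCompute(
            cv2.cvtColor(sub_photo, cv2.COLOR_BGR2GRAY), None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(des1, des2),
                         key=lambda m: m.distance)[:30]  # keep best matches
        src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        h, w = anchor_img.shape[:2]
        centre = np.float32([[[w / 2, h / 2]]])
        return cv2.perspectiveTransform(centre, H)[0][0]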
In the step S203, the handwriting recognition server is configured to perform handwriting content recognition on the photographed picture and obtain information such as the answer content and the stem content. That is, the method for identifying the answer content of test questions further comprises a server recognition processing stage: recognizing each character on the photographed picture using a handwriting recognition model trained in advance through deep learning, and obtaining a handwriting recognition result comprising the stem content and the answer content. Specifically, it is preferable to perform deep learning with a YOLO target detection network model to obtain the handwriting recognition model. Since YOLO is an end-to-end target detection algorithm, it outputs the category, confidence and coordinate position directly from the network without first extracting region proposals; it therefore detects quickly, which is favourable for rapidly detecting and learning a large number of characters. In addition, to speed up picture transmission, the photographed picture may be compressed before being sent to the handwriting recognition server.
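The patent names a YOLO detection model but fixes no framework or weights. Purely as a hypothetical sketch, the following assumes the ultralytics Python package and a custom-trained handwriting weights file; both are assumptions, not part of the patent.

    from ultralytics import YOLO

    model = YOLO("handwriting.pt")            # hypothetical custom weights
    results = model("photographed_page.jpg")  # hypothetical photographed picture
    for box in results[0].boxes:
        label = results[0].names[int(box.cls)]  # recognised character class
        conf = float(box.conf)                  # confidence
        x1, y1, x2, y2 = box.xyxy[0].tolist()   # coordinate position
        print(label, conf, (x1, y1, x2, y2))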
S204, receiving a matching result from the matching server, wherein the matching result is the accurate position of each selected anchor point in its corresponding sub-photographed picture.
S205, determining the accurate position of each selected anchor point in the photographed picture according to its accurate position in the corresponding sub-photographed picture, selecting from the m selected anchor points the 4 anchor points closest to the edge of the photographed picture, and finally calculating the perspective transformation matrix of the photographed picture according to the accurate positions of these 4 anchor points in the photographed picture and their positions in the scanned picture.
In the step S205, the perspective transformation matrix is calculated in an existing conventional manner.
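For concreteness, a sketch of the matrix computation with OpenCV; the four point pairs below are placeholder coordinates standing in for the 4 selected anchor positions in the scanned and photographed pictures.

    import cv2
    import numpy as np

    # positions of the 4 selected anchor points (placeholder values)
    scan_pts = np.float32([[60, 55], [1180, 50], [1185, 1650], [55, 1645]])
    photo_pts = np.float32([[102, 88], [980, 70], [1010, 1400], [80, 1420]])

    # maps scanned-picture coordinates into photographed-picture coordinates
    M = cv2.getPerspectiveTransform(scan_pts, photo_pts)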
S206, obtaining the answer area identification position in the photographed picture according to the perspective transformation matrix and the answer area position in the question information.
In the step S206, specifically, the answer area position in the question information is transformed by the perspective transformation matrix, so as to obtain the answer area identification position in the photographed picture.
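Continuing the sketch above (reusing M and the imports), the labelled answer-area corners can be pushed through the matrix to obtain the identification position in the photographed picture; the corner coordinates are placeholders.

    # labelled answer-area corners in the scanned picture (placeholders)
    region_scan = np.float32([[[200, 300]], [[900, 300]],
                              [[900, 520]], [[200, 520]]])
    region_photo = cv2.perspectiveTransform(region_scan, M)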
S207, obtaining the answer content corresponding to the answer area identification position according to the answer area identification position and the handwriting recognition result from the handwriting recognition server.
In the step S207, since the stem content and the answer content in the handwriting recognition result each have a corresponding coordinate position in the photographed picture, the answer content corresponding to the answer area identification position can be found through position matching. Since the stem content is printed, it can be identified directly through character comparison, so its coordinate position in the photographed picture can be determined first. Therefore, in order to locate the answer area identification position more precisely, when the question information further includes a stem position, the method further comprises the following step before the step S207: identifying the stem content in the handwriting recognition result through character comparison, determining the coordinate position of the stem content in the photographed picture, and correcting the answer area identification position according to the mapping relation between the stem position and that coordinate position, so as to obtain a more accurate answer area identification position.
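A sketch of this correction under the simplest assumption, a pure translation between the expected stem position (the labelled stem position pushed through the matrix) and the stem position actually detected in the photographed picture; the patent leaves the exact mapping relation open, so this is illustrative only.

    import numpy as np

    def correct_region(region_photo, stem_expected_xy, stem_detected_xy):
        """Shift the answer-area corners by the observed stem offset."""
        offset = (np.asarray(stem_detected_xy, dtype=np.float32)
                  - np.asarray(stem_expected_xy, dtype=np.float32))
        return region_photo + offset.reshape(1, 1, 2)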
The application process of steps S101 to S207 is specifically, but not exclusively, as follows: the service provider first scans the test question pages, then uses steps S101 to S103 to produce an electronic book containing a plurality of test question pages; when a teacher assigns homework through the APP and students answer on paper test sheets, the answers can be photographed through the APP and the answer content obtained directly through steps S201 to S207; finally the teacher can continue to use the APP to review the answer content, achieving remote marking and the like.
In summary, the method for identifying the answer content of the test questions provided by the embodiment has the following technical effects:
(1) This embodiment provides a new method capable of identifying the answer area identification position and the answer content of a test question without manual intervention. The method comprises a test question pre-preparation stage and a test question answer photographing stage: in the pre-preparation stage, a plurality of anchor points of the test question master and the labelled answer area positions are determined; in the photographing stage, the accurate positions of several anchor points in the photographed picture are determined through image feature matching, a perspective transformation matrix is then computed from the positions of these anchor points in the scanned picture and the photographed picture, and finally the answer area identification position and the answer content of the test question are obtained from the labelled answer area position and the perspective transformation matrix. A user can therefore obtain the answer content directly after photographing, without manual intervention, which greatly improves user experience;
(2) A handwriting recognition model trained in advance through deep learning is used to recognize each character on the photographed picture, so that various characters, symbols, mathematical formulas and the like can be recognized. The model generalizes well, so the method is further applicable to test questions of general subjects at all grade levels, which facilitates practical application and popularization.
The embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without undue burden.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some of the technical features thereof can be replaced by equivalents. Such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Finally, it should be noted that the invention is not limited to the alternative embodiments described above; anyone may derive various other forms of products in light of the present invention. The above detailed description should not be construed as limiting the scope of the invention, which is defined in the claims; the description may be used to interpret the claims.

Claims (9)

1. A method for identifying the answer content of test questions, characterized by comprising a test question pre-preparation stage and a test question answer photographing stage;
the test question pre-preparation stage comprises the following steps S101 to S103:
S101, acquiring a scanned picture of a test question page, and equally dividing the scanned picture into an n×n grid of sub-scanned pictures, wherein n is a natural number not less than 3;
S102, for each outermost sub-scanned picture, scanning with a square sliding window for the region containing the most image feature points, taking that region as the anchor point of the corresponding sub-scanned picture, and finally obtaining a test question master containing all anchor points, wherein the side length of the square sliding window is smaller than the width of the corresponding sub-scanned picture;
S103, acquiring question information manually labelled on the scanned picture, wherein the question information includes an answer area position and a stem position;
the test question answer photographing stage comprises the following steps S201 to S207:
S201, acquiring a photographed picture of an answered test question page, and acquiring the test question master and question information corresponding to the answered test question page;
S202, selecting the m anchor points in the test question master closest to the edge of the scanned picture, estimating the preliminary position of each in the photographed picture, and finally cropping out m sub-photographed pictures each containing a different anchor point according to the preliminary positions, wherein m is a natural number not less than 4 and not more than 4×(n-1);
S203, sending the m sub-photographed pictures and the test question master to a matching server, and sending the photographed picture to a handwriting recognition server, wherein the matching server is configured to perform image feature matching between each sub-photographed picture and the corresponding selected anchor point to obtain the accurate position of each selected anchor point in its corresponding sub-photographed picture, and the handwriting recognition server is configured to perform handwriting content recognition on the photographed picture to obtain the answer content and the stem content;
S204, receiving a matching result from the matching server, wherein the matching result is the accurate position of each selected anchor point in its corresponding sub-photographed picture;
S205, determining the accurate position of each selected anchor point in the photographed picture according to its accurate position in the corresponding sub-photographed picture, selecting from the m selected anchor points the 4 anchor points closest to the edge of the photographed picture, and finally calculating the perspective transformation matrix of the photographed picture according to the accurate positions of these 4 anchor points in the photographed picture and their positions in the scanned picture;
S206, obtaining the answer area identification position in the photographed picture according to the perspective transformation matrix and the answer area position in the question information;
S207, identifying the stem content in the handwriting recognition result through character comparison, determining the coordinate position of the stem content in the photographed picture, correcting the answer area identification position according to the mapping relation between the stem position and that coordinate position to obtain a more accurate answer area identification position, and finally finding the answer content corresponding to the answer area identification position through position matching according to the answer area identification position and the handwriting recognition result from the handwriting recognition server.
2. The method of claim 1, wherein the question information further comprises standard answer content.
3. The method of claim 1, wherein in the step S102, the image feature value of each anchor point is also calculated, and the test question master contains the image feature values of the anchor points.
4. The method for identifying the answer content of test questions of claim 1, further comprising a server matching processing stage: performing image feature matching between each sub-photographed picture and the corresponding selected anchor point in the test question master to obtain the accurate position of each selected anchor point in its corresponding sub-photographed picture.
5. The method for identifying the answer content of test questions of claim 4, wherein image feature matching is performed using an OpenCV-based SURF, SIFT, ORB or FAST algorithm.
6. The method for identifying the answer content of test questions of claim 1, further comprising a server recognition processing stage: recognizing each character on the photographed picture using a handwriting recognition model trained in advance through deep learning, and obtaining a handwriting recognition result containing the stem content and the answer content.
7. The method for identifying the answer content of test questions of claim 6, wherein deep learning is performed using a YOLO target detection network model to obtain the handwriting recognition model.
8. The method of claim 1, wherein when n is 3, m is 6.
9. The method for identifying answer contents of test questions as claimed in claim 1, wherein the side length of the square sliding window is 1/30-1/10 of the width of the scanning picture or the sub-scanning picture.
CN201911149719.0A 2019-11-21 2019-11-21 Method for identifying answer content of test questions Active CN110879987B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911149719.0A CN110879987B (en) 2019-11-21 2019-11-21 Method for identifying answer content of test questions


Publications (2)

Publication Number Publication Date
CN110879987A CN110879987A (en) 2020-03-13
CN110879987B true CN110879987B (en) 2023-06-09

Family

ID=69729099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911149719.0A Active CN110879987B (en) 2019-11-21 2019-11-21 Method for identifying answer content of test questions

Country Status (1)

Country Link
CN (1) CN110879987B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111709375A (en) * 2020-06-18 2020-09-25 武汉唯理科技有限公司 Method for correcting operation test questions in batches
CN112966105B (en) * 2021-03-04 2021-09-10 南京审计大学 Method for automatically generating audit test questions by using violation problem analysis

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109257582A (en) * 2018-09-26 2019-01-22 上海顺久电子科技有限公司 A kind of bearing calibration of projection device and device
CN109409374A (en) * 2018-10-11 2019-03-01 东莞市七宝树教育科技有限公司 One kind is based in combination the same as batch paper answer region cutting method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101420549B1 (en) * 2009-12-02 2014-07-16 Qualcomm Inc. Method, device and processor-readable medium for feature matching by clustering detected keypoints in query and model images
ES2752728T3 (en) * 2014-02-10 2020-04-06 Geenee Gmbh Systems and methods for recognition based on image characteristics
CN108845746A (en) * 2018-07-13 2018-11-20 成都东方闻道科技发展有限公司 A kind of answer component
CN110008933B (en) * 2019-04-18 2022-02-11 江苏曲速教育科技有限公司 Universal intelligent marking system and method


Also Published As

Publication number Publication date
CN110879987A (en) 2020-03-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant