CN111507251B - Method and device for positioning answering area in test question image, electronic equipment and computer storage medium - Google Patents

Method and device for positioning answering area in test question image, electronic equipment and computer storage medium

Info

Publication number
CN111507251B
Authority
CN
China
Prior art keywords
text
area
image
test question
handwritten
Prior art date
Legal status
Active
Application number
CN202010300296.4A
Other languages
Chinese (zh)
Other versions
CN111507251A (en)
Inventor
袁枫
何小坤
单海蛟
Current Assignee
Beijing Century TAL Education Technology Co Ltd
Original Assignee
Beijing Century TAL Education Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Century TAL Education Technology Co Ltd
Priority to CN202010300296.4A
Publication of CN111507251A
Application granted
Publication of CN111507251B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Character Input (AREA)

Abstract

The embodiment of the application provides a method and a device for positioning an answering area in a test question image, an electronic device and a computer storage medium. The method for positioning the answering area in the test question image comprises the following steps: determining an answering area mark in the test question image according to a target text line detected from the test question image, and determining a corresponding answering area in the test question image according to the determined answering area mark and a handwritten text area included in a target text area detected from the test question image. According to the method and the device, the answering area mark is searched for in the test question image according to the target text line, so that the answering area corresponding to the test question image is obtained according to the answering area mark and the handwritten text area in the target text area.

Description

Method and device for positioning answer area in test question image, electronic equipment and computer storage medium
Technical Field
The embodiment of the application relates to the technical field of electronic information, in particular to a method and a device for positioning an answering area in a test question image, an electronic device and a computer storage medium.
Background
With the rapid development of computer and internet technologies, teaching content has become steadily richer: students now have a large amount of homework and many examinations, problems and difficulties arise in that homework, and teachers need to correct test papers. A user can convert a paper test paper or assignment into an image, for example by photographing it, and then import the image into an automatic marking system for relevant processing. For example, the objective-question portion of a test paper contains both handwritten and printed text; after the image is captured and uploaded, the answering areas of the objective-question portion in the captured picture are found by locating the handwritten text in the test paper image, thereby achieving the purpose of using automatic equipment to mark the test paper image or correct the homework.
In the prior art, when the answering area of the objective questions in a test paper must be distinguished in a complex scene, the answering area can only be located from the test paper image by means of a fixed template, and the image must meet requirements on format, definition and the like; for example, the answering area in the test paper image is located by comparing a fixed template of the test paper, stored in advance in the system, with a test paper image in a specific format.
However, not all test papers have a matching fixed template, and without a fixed template it is difficult to locate the answering area in the test paper image.
Disclosure of Invention
In view of the above, one of the technical problems to be solved by the embodiments of the present application is to provide a method, an apparatus, an electronic device and a computer storage medium for positioning an answering area in a test question image, so as to overcome the drawbacks of the prior art.
In a first aspect, an embodiment of the present application provides a method for positioning an answering area in a test question image, where the method includes:
determining an answering area mark in the test question image according to a target text line detected from the test question image;
and determining a corresponding answering area in the test question image according to the determined answering area mark and a handwritten text area included in a target text area detected from the test question image.
In a second aspect, an embodiment of the present application provides an apparatus for positioning an answering area in a test question image, where the apparatus includes:
the answering area mark determining module is used for determining an answering area mark in the test question image according to the target text line detected from the test question image;
and the answering area determining module is used for determining a corresponding answering area in the test question image according to the determined answering area mark and a handwritten text area included in a target text area detected from the test question image.
In a third aspect, an embodiment of the present application provides an electronic device, including: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus; the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the method for positioning the answer area in the test question image according to the first aspect.
In a fourth aspect, the present application provides a computer storage medium on which a computer program is stored; when the computer program is executed by a processor, the method for positioning an answering area in a test question image as described in the first aspect or any one of its embodiments is implemented.
To sum up, the embodiment of the present application provides a method and an apparatus for positioning an answering area in a test question image, an electronic device, and a computer storage medium. The method for positioning the answering area in the test question image comprises the following steps: determining an answering area mark in the test question image according to a target text line detected from the test question image, and determining a corresponding answering area in the test question image according to the determined answering area mark and a handwritten text area included in a target text area detected from the test question image. According to the method and the device, the answering area mark is searched for in the test question image according to the target text line, so that the answering area corresponding to the test question image is obtained according to the answering area mark and the handwritten text area in the target text area.
Drawings
Some specific embodiments of the present application will be described in detail below by way of example and not by way of limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:
fig. 1 is a flowchart of a method for positioning an answering area in a test question image according to an embodiment of the present application;
fig. 2 is an application scenario diagram provided in the embodiment of the present application;
fig. 3 is a structural diagram of a TextSnake model provided in an embodiment of the present application;
fig. 4 is a text line detection result provided in the embodiment of the present application;
fig. 5 is a structural diagram of a Mask RCNN model according to an embodiment of the present application;
fig. 6 is a flowchart of correcting a test question image according to an embodiment of the present application;
fig. 7 is a block diagram of a device for positioning an answering area in a test question image according to an embodiment of the present application;
fig. 8 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application shall fall within the scope of the protection of the embodiments in the present application.
It should be noted that "target" in the present application merely denotes a singular instance and does not single out any particular one: for example, the target text area refers to a text area obtained by detecting a test question image with a model and may be obtained from any test question image, and the target text block refers to any one text block. "First" and "second" in this application serve only to distinguish names; they do not represent a sequential relationship and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated, as with a first coincidence length threshold and a second coincidence length threshold.
The following further describes specific implementations of embodiments of the present application with reference to the drawings of the embodiments of the present application.
Example one
An embodiment of the present application provides a method for positioning an answering area in a test question image. Fig. 1 is a flowchart of the method for positioning an answering area in a test question image provided in the embodiment of the present application; as shown in fig. 1, the method includes the following steps:
step S101, according to the target text line detected from the test question image, a answering area mark in the test question image is determined.
In the embodiment of the application, the test questions may be test papers, examination questions or homework, and the format of the test question image is not limited; for example, the test question image may be presented in the form of a picture, and may be a test paper image to be reviewed or a homework image for a student's self-examination.
Here, an example scene to which the method for positioning the answering area in the test question image can be applied is described. As shown in fig. 2, fig. 2 is an application scenario diagram provided in the embodiment of the present application; fig. 2 is only one implementable scene and does not limit the application scenarios of the present application, which may also be applied to other scenes. In the application scenario shown in fig. 2, a test paper is photographed to obtain a test question image, and the image is uploaded to a system containing a computer program capable of locating the answering area in the test question image. Text area detection is performed on the test question image to obtain a printed text area and a handwritten text area; text blocks in the image are detected to obtain text lines; an answering area mark is obtained according to the text lines and the printed text area; an extended handwritten text area is constructed according to the position of the answering area mark; and the answering area is determined according to the handwritten text area and the constructed extended handwritten text area. In this scene the answering area can be located from the image simply by photographing and uploading the test paper, with no special requirements on the image format and no need for a matching template of the test paper; the method is applicable to images with free layouts and improves the efficiency of locating the answering area.
Before step S101, rotation processing may be performed on an initial image to obtain the test question image; for example, preprocessing such as binarization and tilt correction may be performed on the acquired test paper image.
In a shooting scene, the shooting position and angle of the capture device are uncertain, so the collected initial image may be tilted, and the sizes of images obtained by different capture devices differ; the collected initial image is therefore preprocessed to obtain the test question image. It should be noted that in some application scenarios, if the initial image has no tilt, or the tilt is negligible, the initial image may be used directly as the test question image.
Here, preprocessing an initial image to obtain a test question image is taken as an example. Optionally, in an embodiment of the present application, the preprocessing of the image includes the following steps S1001 to S1004:
step S1001, detecting a text region in the initial image according to a preset initial text region detection model to obtain an initial text region.
The acquisition method of the initial image includes, but is not limited to, taking a picture with the camera of a mobile device and uploading it, or scanning with a scanner. In the embodiment of the present application, for images with particularly high resolution, the image size is normalized: given a normalized width Nw and a normalized height Nh, scaling is performed only when the width w of the image is greater than Nw and the height h of the image is greater than Nh, with scaling ratio ZR = min(Nw/w, Nh/h), where min denotes the minimum value. This is only an exemplary description; other size normalization methods may also be used.
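As an illustration, a minimal Python/OpenCV sketch of this normalization follows; the values of Nw and Nh are assumed examples, not values fixed by the patent.

```python
import cv2

def normalize_size(img, Nw=1600, Nh=1600):
    """Downscale only when both width > Nw and height > Nh, keeping aspect ratio."""
    h, w = img.shape[:2]
    if w > Nw and h > Nh:
        zr = min(Nw / w, Nh / h)  # scaling ratio ZR = min(Nw/w, Nh/h)
        img = cv2.resize(img, (int(w * zr), int(h * zr)),
                         interpolation=cv2.INTER_AREA)
    return img
```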
It should be noted that the initial text region detection model in the present application may be a pre-trained TextSnake model or a CTPN model; the embodiment of the present application is not limited thereto. Here, the TextSnake model is taken as an example. As shown in fig. 3, fig. 3 is a structure diagram of the TextSnake model provided in this embodiment of the present application. TextSnake is a flexible representation for detecting text of arbitrary shape; because it adapts to the diversity of text structures, fitting its shape to the surroundings the way a snake does, it is called TextSnake. In fig. 3, the initial image is input into the TextSnake model as the input image. The model consists of feature extraction, feature merging and an output layer. Feature extraction is performed through five convolution stages (Conv stages): the number of convolution kernels is 32 for the first stage f1, 64 for the second stage f2, 128 for the third stage f3, 256 for the fourth stage f4, and 512 for the fifth stage f5. Features are then merged into the feature maps h1, h2, h3, h4 and h5 stage by stage, where each merging stage combines the feature map produced by the previous stage with the corresponding backbone layer:
h_1 = f_5, and h_i = conv_{3×3}(conv_{1×1}[f_{6-i}; UpSampling_{×2}(h_{i-1})]) for i ≥ 2
where UpSampling denotes upsampling: the feature map h2 is obtained from h1 (that is, f5) and the fourth-stage feature f4, h3 is obtained from h2 and the third-stage feature f3, and h4 is obtained from h3 and the second-stage feature f2. At the output layer, after the final merging stage yields h5, a new feature map
h_final = UpSampling_{×2}(h_5)
is obtained, whose size is 1/2 that of the original input image; an upsampling layer and 2 convolutional layers are then added to generate the final pixel-level prediction output
P = conv_{1×1}(conv_{3×3}(h_final))
(the output layer P is denoted by "predict" in fig. 3). The initial image is input into the TextSnake model, which outputs an initial region detection energy map; the energy map is binarized using the Otsu method (OTSU) to obtain the initial text region.
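For illustration, a minimal PyTorch sketch of one such feature-merging stage follows, under the assumption that the merge is conv3×3(conv1×1([f; UpSampling×2(h)])) as written above; the channel counts in the usage lines are illustrative assumptions, not the patent's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MergeUnit(nn.Module):
    """One feature-merging stage: h_i = conv3x3(conv1x1([f; up_x2(h_prev)]))."""
    def __init__(self, f_channels, h_channels, out_channels):
        super().__init__()
        self.conv1x1 = nn.Conv2d(f_channels + h_channels, out_channels, 1)
        self.conv3x3 = nn.Conv2d(out_channels, out_channels, 3, padding=1)

    def forward(self, f, h_prev):
        h_up = F.interpolate(h_prev, scale_factor=2, mode="bilinear",
                             align_corners=False)  # UpSampling_x2(h_(i-1))
        x = torch.cat([f, h_up], dim=1)            # [f; up(h_(i-1))]
        return self.conv3x3(self.conv1x1(x))

# e.g. merging f4 (256 ch, 64x64) with h1 = f5 (512 ch, 32x32) into h2 (128 ch):
merge = MergeUnit(f_channels=256, h_channels=512, out_channels=128)
h2 = merge(torch.randn(1, 256, 64, 64), torch.randn(1, 512, 32, 32))
```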
Step S1002, masking the initial image according to the initial text area to obtain an initial text image.
Adaptive binarization is performed on the initial image (or the normalized initial image), and the initial text region obtained in step S1001 is used as a mask to obtain an initial text image S(x, y), which is a binarized image. The area required in the embodiment of the application is the text area; areas other than text are not required. Masking the initial image with the initial text region therefore yields an initial text image that includes both handwritten text and printed text.
Step S1003, determining at least one rotation angle according to the connected components detected from the initial text image.
The binarized image S(x, y) obtained in step S1002 is morphologically dilated. Generally, the dilation kernel used for morphological dilation has equal height and width; in the embodiment of the present application, a kernel that is wider than it is tall is selected. This is because the dilation is performed on a test question image containing an objective-question portion, which consists of lines of text that are wider than they are tall; setting the kernel width greater than its height saves computation time and memory. Connected domains are then detected in the dilated image, the minimum circumscribed rectangle of each connected domain is calculated, and small rectangles whose width is below a set threshold are filtered out. For example, some topic-description parts are relatively far from the objective-question portion and do not belong to it; filtering out rectangles narrower than the threshold removes parts that do not belong to the objective-question content. When the text is inclined relative to the image coordinate system xoy (in which the upper left corner of the image is the origin o, rightward is the positive x axis and downward is the positive y axis), there is an included angle between each minimum circumscribed rectangle and the image coordinate axes; the included angles of all minimum circumscribed rectangles are calculated to obtain at least one rotation angle.
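A hedged OpenCV sketch of this skew estimation follows: dilation with a kernel wider than it is tall, minimum circumscribed rectangles of connected domains, width filtering, and angle collection. The kernel size, width threshold and angle normalization are assumptions.

```python
import cv2

def estimate_skew_angles(binary_img, min_width=40):
    # dilation kernel wider than it is tall, per the text-line shape
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25, 5))
    dilated = cv2.dilate(binary_img, kernel)
    contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    angles = []
    for c in contours:
        (cx, cy), (w, h), angle = cv2.minAreaRect(c)
        if max(w, h) < min_width:   # drop rectangles narrower than the threshold
            continue
        if angle < -45:             # normalize OpenCV's rectangle-angle convention
            angle += 90
        angles.append(angle)
    return angles
```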
Step S1004, angle correction is carried out on pixel points in the initial text image according to the average value of at least one rotation angle, and a test question image is obtained.
The embodiment of the application calculates the average value θ of all the included angles and takes θ as the image rotation correction angle; the rotation transformation is shown as formula one:
x = x_0·cosθ - y_0·sinθ, y = x_0·sinθ + y_0·cosθ (formula one)
In formula one, (x_0, y_0) are the coordinates of a pixel point on the binarized image s(x, y), and (x, y) are the coordinates of that pixel point on the rotated image.
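A minimal OpenCV sketch of this angle correction follows, assuming θ is the mean included angle expressed in degrees; cv2.getRotationMatrix2D builds the rotation of formula one about the image center.

```python
import cv2
import numpy as np

def rotate_correct(img, angles):
    theta = float(np.mean(angles))  # mean included angle; sign convention
                                    # depends on how the angles were measured
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), theta, 1.0)
    return cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_LINEAR,
                          borderValue=255)  # white border for a binarized page
```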
The initial image is masked with the initial text area detected by the model to obtain the initial text image of interest, and angle correction is then performed on the pixel points in the initial text image to obtain the test question image. This corrects phenomena such as image tilt caused by the uncertain shooting angle, for example when the test paper is not aligned in the two-dimensional plane, so that the paper in the actually captured test question image is inclined relative to the image coordinate system xoy.
It should be noted that this embodiment is only exemplary and does not mean that the test question image in the present application must be obtained by performing rotation processing on an initial image; performing operations such as rotation processing reduces unnecessary interference in the subsequent detection of the test questions and improves detection efficiency.
In step S101, in order to determine the answering area mark in the test question image, the target text line must first be detected from the test question image. Optionally, the target text line in the test question image may be determined in the following specific implementation manner, but is not limited thereto.
Text blocks in the test question image are detected, text blocks meeting preset conditions form the character set of a text line, and a text line rectangle is constructed according to the circumscribed rectangles of the characters, thereby obtaining the target text line.
For example, the target text line detected from the test question image in step S101 may be determined by the following steps a, B, C and D:
and step A, determining a plurality of text blocks according to the connected domain detected from the test question image.
The test question image rotated in step S1003 is a binarized image. The test question image obtained in step S1003 is morphologically dilated, connected domains are detected in the dilated image, and the minimum circumscribed rectangle of each connected domain is calculated; each minimum circumscribed rectangle is a text block, so a plurality of text blocks is obtained. It should be noted that the image on which the connected domains are detected may be the test question image rotated in step S1003 or the binarized initial image; the embodiment of the present application is not limited thereto, and the rotated test question image is used only as an example.
Step B, detecting the adjacent text blocks corresponding to any text block.
Optionally, in an embodiment of the application, when detecting the adjacent text blocks of any text block in step B, the adjacent text blocks may be detected according to at least one of: the Euclidean distance between the center points of the text blocks, and the slope of the line connecting the center points. When adjacent text blocks are obtained according to the Euclidean distances between center points, a text block whose center-point distance from the target text block is smaller than a set Euclidean distance threshold is taken as an adjacent text block of the target text block; when adjacent text blocks are obtained according to the slope of the line between center points, a text block for which the slope of the line connecting its center point with that of the target text block is smaller than a set slope threshold is taken as an adjacent text block of the target text block.
Here, a specific example is given: an arbitrary text block A is selected from the text blocks, the remaining text blocks are traversed, and any circumscribed rectangle satisfying at least one of the following conditions is taken as an adjacent text block of text block A: its center-point Euclidean distance from text block A is smaller than the Euclidean distance threshold; the slope of the line connecting its center point with that of text block A is smaller than the slope threshold. The Euclidean distance threshold and the slope threshold can be set according to actual needs. In the embodiment of the application, the rectangles whose center-point distance from text block A and whose center-line slope are both smaller than the thresholds are taken as the adjacent text blocks of text block A. When selecting adjacent text blocks, not only the center-point distance but also the slope of the center line is considered, because detected text blocks may be short or long: the distance between the centers of two long text blocks A and B in the same line may be larger than the distance between A and a text block C below A. Considering the slope of the center line in addition to the Euclidean distance avoids mistaking vertically stacked text blocks for neighbors, so the selected adjacent text blocks are more accurate. A sketch of this neighbor test is shown below.
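In the hedged sketch that follows, blocks are (x, y, w, h) rectangles, and both thresholds are assumed illustrative values; the patent allows using either criterion alone or both together.

```python
import math

def neighbors(blocks, i, dist_thr=80.0, slope_thr=0.15):
    """Indices of blocks adjacent to blocks[i] by center distance and slope."""
    xi, yi, wi, hi = blocks[i]
    ci = (xi + wi / 2, yi + hi / 2)
    result = []
    for j, (x, y, w, h) in enumerate(blocks):
        if j == i:
            continue
        cj = (x + w / 2, y + h / 2)
        dist = math.hypot(cj[0] - ci[0], cj[1] - ci[1])
        dx = abs(cj[0] - ci[0])
        slope = abs(cj[1] - ci[1]) / dx if dx > 0 else float("inf")
        if dist < dist_thr and slope < slope_thr:
            result.append(j)
    return result
```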
Step C, respectively generating corresponding adjacent text block sets according to the adjacent text blocks corresponding to the text blocks.
Here, text block A is taken as an example: any text block A has at least one adjacent text block; these adjacent text blocks are put into a set, generating the adjacent text block set corresponding to text block A, so each text block corresponds to one adjacent text block set. Step B is repeated until all text blocks have been traversed.
Step D, determining a target text line according to the adjacent text block sets.
Optionally, in an embodiment of the present application, when determining a target text line according to the adjacent text block sets in step D, taking any one text block as the target text block, the method may include: determining a left text block in the adjacent text block set that is located to the left of the target text block and whose distance from the target text block is smaller than a set left distance threshold; determining a right text block in the adjacent text block set that is located to the right of the target text block and whose distance from the target text block is smaller than a set right distance threshold; and determining the target text line according to the left text block and/or the right text block together with the target text block.
Here, text block A is taken as an example: the two text blocks closest to the left and right sides of text block A, namely the left text block and the right text block, are searched for in the adjacent text block set of text block A; if no text block satisfies the condition, the corresponding slot is empty. When two text blocks are each other's nearest blocks, they are considered mutually adjacent. As an example: when the adjacent text block set of text block A is examined, text block B is obtained as the right text block of text block A; meanwhile, when the adjacent text block set of text block B is examined, text block A is obtained as the left text block of text block B. Text blocks A and B are then mutually adjacent, that is, the right text block of A is B and the left text block of B is A. These steps are repeated until the nearest adjacent text blocks (left and right) of all text blocks have been found.
All adjacent text sets with overlapping parts are merged. As an example: if text blocks A and B are adjacent and text blocks B and C are adjacent, then A, B and C constitute one adjacent text set, and the text blocks in that set belong to the same text line. A plurality of adjacent text sets is obtained in this way, and a text line rectangular box is then obtained from the circumscribed rectangles of the characters in each adjacent text set, yielding the target text lines, as shown in fig. 4; fig. 4 is a text line detection result provided by the embodiment of the present application. A sketch of this merging appears below.
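The merging of overlapping adjacent sets can be sketched with a simple union-find; the helper below is an illustration under the assumptions stated, not the patent's implementation, and takes the adjacency pairs found in the previous steps.

```python
def merge_into_lines(blocks, pairs):
    """blocks: list of (x, y, w, h); pairs: list of (i, j) adjacent indices."""
    parent = list(range(len(blocks)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    for i, j in pairs:                     # union every adjacent pair
        parent[find(i)] = find(j)

    lines = {}
    for idx, (x, y, w, h) in enumerate(blocks):
        lines.setdefault(find(idx), []).append((x, y, x + w, y + h))

    rects = []
    for members in lines.values():         # circumscribed rectangle per line
        x0 = min(m[0] for m in members); y0 = min(m[1] for m in members)
        x1 = max(m[2] for m in members); y1 = max(m[3] for m in members)
        rects.append((x0, y0, x1 - x0, y1 - y0))
    return rects
```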
In some application scenarios, the answering area is an area reserved in advance, indicated by the answering area mark, as the place where a user needs to answer, for example where a student writes the answer to an objective question such as a fill-in-the-blank or multiple-choice question. The answering area mark may be a bracket, a horizontal line, a square frame, a mark of another shape, or a combination of such marks, for example a bracket together with a horizontal line.
Step S102, determining a corresponding answering area in the test question image according to the determined answering area mark and a handwritten text area included in a target text area detected from the test question image.
In this embodiment, the objective-question portion of the test question includes handwritten text and printed text, so detection of the target text region in the test question image may be performed before step S102. Optionally, the target text region may be detected according to a preset target text region detection model, where the target text region includes a handwritten text region and a printed text region. The target text region detection model in the embodiment of the present application is a pre-trained Mask RCNN model; it may also be a Faster R-CNN model or a YOLOv3 model, and the embodiment of the present application is not limited thereto. Here, the Mask RCNN model is taken as an example, as shown in fig. 5, which is a structural diagram of a Mask RCNN model provided in the embodiment of the present application. The Mask RCNN model includes a backbone network (conv); RoI pooling crops a part of the feature map of the test question image and resizes it to a fixed size, while the RoIAlign method samples points of the feature map using bilinear interpolation, and the network outputs the classes and boxes of the handwritten and printed text regions. The test question image is input into the Mask RCNN model, and after model detection the bounding boxes of the handwritten text region and the printed text region are output.
When the answering area in the test question image is determined according to the handwritten text area and the answering area mark, the method may specifically include: constructing an extended handwritten text area according to the position of the determined answering area mark, and determining the corresponding answering area in the test question image according to the constructed extended handwritten text area and the handwritten text area included in the target text area detected from the test question image.
The Mask RCNN model is adequate for handwritten text detection in most scenes: the test question image is input into the Mask RCNN model to obtain the handwritten text area, which could then be input into a text recognition model to obtain the content of the answering area. However, in scenes where the handwriting resembles the printed font, the color of the handwritten text is close to that of the printed text, or the handwritten text is very close to the printed text, the Mask RCNN model cannot reliably detect all handwritten text areas in the image. Therefore, to prevent the content of the answering area from being incomplete when the Mask RCNN model misses part of the handwritten text, the present application additionally considers the extended handwritten text area constructed from the position of the answering area mark, and combines the handwritten text area with the extended handwritten text area to obtain the answering area of the test question image, guaranteeing the integrity of the answering area. In the embodiment of the present application, the answering area in an objective question generally appears on a horizontal line or within parentheses, but it may of course also be located in figures such as a square frame; the embodiment of the present application is not limited thereto.
Here, in some application scenarios, constructing an extended handwritten text area according to the position of the answering area mark may be omitted, and the corresponding answering area in the test question image may be determined directly according to the handwritten text area included in the target text area detected from the test question image.
Here, two examples describe how to construct the extended handwritten text region. In the first example, if the answering area mark is a horizontal line, a rectangle is constructed in the image coordinate system xoy (with point o as the origin) using the starting point of the horizontal line as the lower-left point, the length of the horizontal line as the width, and the height of a single text line as the height; this rectangle is the handwritten answer frame extended from the horizontal line and is used as the extended handwritten text area. In the second example, if the answering area mark is a pair of parentheses, the extended handwritten text area is constructed from the positions of the pair: in the image coordinate system xoy, a preset offset is applied to the detected x value of the left bracket to obtain the x value of the upper-left vertex of the extended area, the distance between the pair of parentheses plus the preset offset is taken as the width, and the height of a single text line is taken as the height; the resulting rectangle is the handwritten answer frame extended from the pair of parentheses, that is, the extended handwritten text area. The offset can be set according to the actual situation: when actually filling in an answer, students sometimes write outside the parentheses, so taking the bracket distance plus the offset as the rectangle width extends the content within the parentheses to cover content written outside them. When the corresponding answering area in the test question image is determined according to the handwritten text area and the extended handwritten text area, handwritten text written outside the brackets is not missed, which ensures the integrity of the answering area.
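A hedged sketch of the two construction rules follows; coordinates use the image system xoy (origin at the top left, y downward), and line_height and the bracket offset, including the choice to widen the bracket rectangle on both sides, are assumptions rather than values fixed by the patent.

```python
def extend_from_underline(x_start, y, length, line_height):
    """Rectangle (x, y, w, h) with the underline start as its lower-left point."""
    return (x_start, y - line_height, length, line_height)

def extend_from_brackets(x_left, x_right, y_top, line_height, offset=15):
    """Rectangle widened by an offset so answers written outside '(' ')' are kept."""
    x0 = x_left - offset                     # assumed: widen on both sides
    width = (x_right - x_left) + 2 * offset
    return (x0, y_top, width, line_height)
```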
When determining the corresponding answering area in the test question image according to the constructed extended handwritten text area and the detected handwritten text area, optionally, in an embodiment of the present application: a coincidence area between the constructed extended handwritten text area and the handwritten text area is determined; according to the coincidence area, de-duplication processing is performed on the extended handwritten text area and the handwritten text area to obtain a handwriting area set; and the corresponding answering area in the test question image is determined according to the handwriting area set.
The constructed extended handwritten text area and the detected handwritten text area are combined into a candidate answering area set, and the set is traversed. Detecting the handwritten text area with a preset model may miss some handwritten text, while the extended handwritten text area is expanded from the answering area mark, so repeated text can exist between the two; the combined candidate set is therefore de-duplicated according to the coincidence area between the handwritten text area and the extended handwritten text area, that is, the duplicated text in the coincidence area is removed, yielding the handwriting area set. The handwriting area set contains all the handwritten text, so the obtained handwritten text area is more comprehensive and the integrity of the answering area is guaranteed.
Optionally, when obtaining the handwriting area set, in an embodiment of the present application, the extended handwritten text area and the handwritten text area whose coincidence area is larger than a set coincidence area threshold are merged, thereby de-duplicating the constructed extended handwritten text area and the handwritten text area included in the target text area detected from the test question image to obtain the handwriting area set.
The extended handwritten text area is combined with the handwritten text area to obtain a set of candidate answering areas, and the set is traversed. If the coincidence area of two candidate areas is larger than the set coincidence area threshold, the two areas are merged, and after the duplicated text in the coincidence area is removed, the resulting handwritten text area is taken as the corresponding answering area in the test question image. If the coincidence area of two candidate areas is smaller than the threshold, no processing is performed and both areas are directly taken as corresponding answering areas in the test question image; in this way the obtained handwritten text areas are more comprehensive and the integrity of the answering area is guaranteed. Furthermore, because an offset is added when constructing the extended handwritten text rectangle from horizontal lines or brackets, printed text may be included in the extension; therefore, after the handwritten text area detected by the preset model is merged with the extended area, the printed text added during the extension is filtered out.
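A hedged sketch of this coincidence-area merging follows; regions are (x, y, w, h) rectangles and the area threshold is an assumed value.

```python
def overlap_area(a, b):
    """Area of the intersection of two (x, y, w, h) rectangles."""
    ax, ay, aw, ah = a; bx, by, bw, bh = b
    ox = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    oy = max(0, min(ay + ah, by + bh) - max(ay, by))
    return ox * oy

def dedup_regions(regions, area_thr=200):
    """Merge any two regions whose overlap exceeds the threshold; keep the rest."""
    regions = list(regions)
    merged = True
    while merged:
        merged = False
        for i in range(len(regions)):
            for j in range(i + 1, len(regions)):
                if overlap_area(regions[i], regions[j]) > area_thr:
                    ax, ay, aw, ah = regions[i]; bx, by, bw, bh = regions[j]
                    x0 = min(ax, bx); y0 = min(ay, by)
                    x1 = max(ax + aw, bx + bw); y1 = max(ay + ah, by + bh)
                    regions[i] = (x0, y0, x1 - x0, y1 - y0)  # union rectangle
                    del regions[j]
                    merged = True
                    break
            if merged:
                break
    return regions
```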
Example two
Step S101, determining the answering area mark in the test question image according to the target text line detected from the test question image, can be implemented by the following method.
Optionally, in an embodiment of the present application, a horizontal straight line score is determined for each pixel in the test question image, a horizontal line is determined according to the pixel points whose horizontal straight line score is greater than a horizontal threshold, and when the length of the horizontal line is greater than a set horizontal line threshold and the horizontal line is located in the lower half of the corresponding target text line, the horizontal line is determined to be an answering area mark in the test question image.
Traversing the test question image, the horizontal straight line score of each pixel is calculated as shown in formula two:
F[x, y] = max(P[x, y], F[x-1, y] + P[x, y], F[x-1, y-1] + α·P[x-1, y-1], F[x-1, y+1] + α·P[x-1, y+1]) (formula two)
In formula two, F[x, y] is the horizontal straight line score of pixel (x, y) and P[x, y] is the pixel value of pixel (x, y); F[x-1, y], F[x-1, y-1] and F[x-1, y+1] are the horizontal straight line scores of the pixels (x-1, y), (x-1, y-1) and (x-1, y+1), and P[x-1, y-1] and P[x-1, y+1] are the pixel values of the pixels (x-1, y-1) and (x-1, y+1); α is a penalty coefficient. If F[x, y] is greater than P[x, y], the candidate pixel ((x-1, y), (x-1, y-1) or (x-1, y+1)) that produced the maximum in formula two is recorded as the leading pixel of pixel (x, y); otherwise the leading pixel of (x, y) is empty. The penalty coefficient can be set according to the actual situation: if the horizontal lines are not level, the inclination is larger and the penalty coefficient α should be smaller, since the upper and lower pixel points need to be considered; if the horizontal lines are level, there is no need to consider too many upper and lower pixel points and the penalty coefficient can be larger.
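A hedged numpy sketch of formula two with leading-pixel tracking follows; the value of the penalty coefficient alpha and the ink/paper encoding of P are assumptions.

```python
import numpy as np

def horizontal_scores(P, alpha=0.8):
    """P: 2-D float array of pixel values (assumed 1.0 for ink, 0.0 for paper)."""
    h, w = P.shape
    F = np.zeros_like(P)
    prev = np.full((h, w, 2), -1, dtype=np.int32)  # leading pixel (y, x) or -1
    F[:, 0] = P[:, 0]
    for x in range(1, w):
        for y in range(h):
            cands = [(P[y, x], None)]  # start a new horizontal line here
            cands.append((F[y, x - 1] + P[y, x], (y, x - 1)))
            if y > 0:
                cands.append((F[y - 1, x - 1] + alpha * P[y - 1, x - 1], (y - 1, x - 1)))
            if y < h - 1:
                cands.append((F[y + 1, x - 1] + alpha * P[y + 1, x - 1], (y + 1, x - 1)))
            best, pre = max(cands, key=lambda c: c[0])
            F[y, x] = best
            if pre is not None and best > P[y, x]:
                prev[y, x] = pre           # record the leading pixel
    return F, prev
```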
Optionally, in an embodiment of the present application, determining the horizontal straight line score of each pixel in the test question image may include determining the horizontal straight line score of each pixel in the printed text region; horizontal lines are then obtained according to the pixel scores, and a horizontal line whose length is greater than the set threshold and which is located in the lower half of a target text line is determined to be the answering area mark.
It should be noted that a student's handwritten answer often intersects the horizontal lines used as answering area marks; if one only searched for horizontal lines inside the printed text area, some horizontal lines might be missed. Therefore, this application detects the horizontal straight line score of every pixel in the test question image, selects the pixel points whose score is greater than the set threshold, and traverses the leading pixels point by point until the leading pixel is empty, at which point one horizontal line is obtained. If the length of the horizontal line is greater than the set threshold and the horizontal line is located in the lower half of the target text line, the horizontal line is an answering area mark. In one implementation, the answering area may be found by comparing the test question image with a fixed matching template, which may be a blank template as originally printed, not yet filled with handwritten text; in another implementation, auxiliary additional marks may be added at the position of the answering area in the test question image, where the additional marks are special marks other than printed text, for example added manually. In contrast, the present method traverses the leading pixels point by point based on the horizontal straight line score of each pixel, obtains a horizontal line, and uses it as the answering area mark; the position of the answering area can thus be found without a fixed matching template, for test question images of any layout and any type, which makes extraction of the answering area position more convenient and improves the efficiency of locating the answering area in the test question image.
Optionally, in an embodiment of the present application, distortion correction processing is performed on the test question image according to the boundary of the printed text region included in the target text region detected from the test question image, so as to obtain a distortion-corrected image, and the horizontal straight line score of each pixel is calculated on the distortion-corrected image. The distortion correction processing includes at least one of perspective transformation processing and affine transformation processing.
In a shooting scene, the initial image obtained by the capture device may be distorted or deformed because of the uncertainty of the shooting position and angle, or because the photographed paper is bent; such distortion is a three-dimensional deformation, for example a concave or convex test paper. The present application does not limit the distortion correction processing to perspective transformation or affine transformation: any projective-type transformation that performs distortion correction on the test question image falls within the scope of the present application. Affine transformation is a transformation of coordinates in a two-dimensional plane, based on 3 fixed vertices, which preserves the straightness and parallelism of a two-dimensional figure (straightness means that a straight line remains a straight line after the transformation; parallelism means that the relative positional relationships within the figure remain unchanged). Perspective transformation is a transformation of three-dimensional space coordinates, based on 4 fixed vertices, which, unlike affine transformation, may change the parallel relationships between straight lines. Both affine transformation and perspective transformation may be used for the distortion correction processing; here perspective transformation is taken as the example, but it should be understood that the test question image may also be processed by other distortion-correcting transformations, and the embodiment of the present application is not limited thereto. Perspective transformation maps the current image onto another plane by projection: to correct a distorted image, a group of 4 point coordinates on the distorted image and a corresponding group of 4 point coordinates on the target image are required; the transformation matrix of the perspective transformation can be calculated from the two groups of coordinate points, and applying the transformation matrix to the whole original image then realizes image correction.
The target text area obtained in step S101 includes a handwritten text area and a printed text area. In the present application, the boundary of the printed text area is obtained from the printed text area; since the printed area occupies most of an objective-question image, the boundary of the printed text area can be regarded approximately as the text area of the whole test question image. The four corners of this text area are used as correction points, the transformation matrix of the perspective transformation is calculated, and the image deformation caused by the shooting angle is corrected using the perspective transformation. The perspective transformation is shown as formula three:
[x, y, z]^T = [[a_11, a_12, a_13], [a_21, a_22, a_23], [a_31, a_32, a_33]] · [u, v, l]^T (formula three)
In formula three, (x, y, z) are the coordinates of a pixel point after the perspective transformation, and (u, v, l) are the coordinates of the pixel point before the transformation. Since the test question image is a two-dimensional plane, l is the constant 1; a_ij are the parameters of the transformation matrix, where i and j both take the values 1, 2, 3. The coordinates of a pixel point on the image after the perspective transformation need to be converted into two-dimensional coordinates, calculated as shown in formula four:
x′ = x/z, y′ = y/z, z′ = z/z = 1 (formula four)
In formula four, the three-dimensional coordinates (x, y, z) of a pixel point are each divided by z, and z′ is the constant 1; since the test question image is a two-dimensional plane, the default value of z′ is 1.
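A hedged OpenCV sketch of formulas three and four follows: the 3×3 matrix a_ij is estimated from four corner correspondences and applied to every pixel, with warpPerspective performing the divide-by-z step of formula four internally. The corner points are assumed to come from the printed-text-region boundary described above.

```python
import cv2
import numpy as np

def correct_warp(img, src_corners, dst_corners, out_size):
    """src/dst_corners: 4x2 point arrays; out_size: (width, height)."""
    M = cv2.getPerspectiveTransform(np.float32(src_corners),
                                    np.float32(dst_corners))
    # warpPerspective computes (x, y, z) = M (u, v, 1) and then divides by z
    # for every pixel, i.e. formula three followed by formula four.
    return cv2.warpPerspective(img, M, out_size)
```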
According to the method and the device, distortion correction processing is performed on the test question image according to the boundary of the printed text area, which eliminates the distortion caused by the uncertainty of the shooting scene and angle or by bending of the photographed paper; the calculation of horizontal straight line scores in the distortion-corrected image is then not affected by image distortion, so the corresponding horizontal lines in the test question image can be obtained more accurately.
Optionally, in an embodiment of the present application, the connected domains in a target text line are detected, the area of each connected domain is calculated, and the tables in the test question image are filtered out according to the connected domain areas.
It should be noted that the test question image or the distortion-corrected image often contains tables and the like; if horizontal lines are obtained by calculating the horizontal line scores in the image, the lines of these tables are easily picked up as well. Tables here also include auxiliary ruled lines such as binding lines. The horizontal line used for marking an answering area is located in the lower half of the corresponding target text line, whereas a table-type ruled line is generally located in the middle and not in the lower half of a target text line, which distinguishes it from the answering area lines; therefore, in the embodiment of the present application, table-type ruled lines are filtered out according to the position of the line and the area of the connected domain in which it lies. For example, in the embodiment of the present application, the target text line image obtained in step S102 is binarized to obtain B(x, y), and the binarized image B(x, y) is morphologically dilated; the connected domains are detected, and the connected domains whose area is larger than a preset threshold are filtered out in combination with their positions, where the threshold can be set according to the actual situation. This reduces the influence of table boundaries in the test question image or distortion-corrected image on the detection of horizontal lines and improves the efficiency of detecting horizontal lines.
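A hedged OpenCV sketch of this table filter follows; the dilation kernel and area threshold are assumed values, and the position check that the patent combines with the area test is omitted for brevity.

```python
import cv2

def filter_tables(binary_img, area_thr=5000):
    """Erase connected components large enough to be table-type ruled lines."""
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 3))
    dilated = cv2.dilate(binary_img, kernel)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(dilated)
    out = binary_img.copy()
    for i in range(1, n):                      # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] > area_thr:
            out[labels == i] = 0               # erase table-sized components
    return out
```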
Optionally, in an embodiment of the present application, after the tables in a target text line are filtered out, the horizontal straight line score of each pixel in the target text line is calculated, a horizontal line is determined according to the pixel points whose score is greater than the horizontal threshold, and when the length of the horizontal line is greater than the set horizontal line threshold and the horizontal line is located in the lower half of the corresponding target text line, the horizontal line is determined to be the answering area mark in the test question image. Because the target text line obtained by detecting text blocks in the test question image may contain table-type ruled lines, binding lines and the like, filtering out the tables before traversing the horizontal straight line scores of the pixels reduces the influence of table boundaries on horizontal line detection and improves the efficiency of detecting horizontal lines in the target text line.
Example three
Step S101, determining the answering area mark in the test question image according to the target text line detected from the test question image, can also be implemented by the following method. Optionally, in an embodiment of the present application, a printed text line is determined according to the target text line detected from the test question image and the printed text area included in the target text area detected from the test question image, and character recognition is performed on the printed text line according to a preset text detection model to obtain a printed text character string.
The target text area detected from the test question image includes a printed text area and a handwritten text area, and each target text line detected from the test question image is a single text line image; the printed area of each line in the test question image can therefore be obtained from the printed text area and the target text line. The printed area of each line is input into the text detection model; in the embodiment of the application the text detection model is, without limitation, a pre-trained print recognition model, and after the printed text is recognized by the print recognition model, the printed text recognition result, namely the printed text character string, is output.
After the printed text character string is obtained, whether it includes a bracket pair can be detected in order to locate the position of the answering area in the test question image, and the test questions can also be divided according to the printed text character string.
Here, detecting the position of the answering area in the test question image based on whether the printed text character string includes a bracket pair is described.
Optionally, in an embodiment of the present application, a pair of parentheses in the printed text character string is determined, and when the length of the character string inside the pair of parentheses is smaller than a preset character string length threshold, the pair of parentheses is determined to be the answering area mark in the test question image.
The character string of the printed text recognition result is searched for a pair of brackets; if the printed text character string includes a pair of brackets and the length of the characters inside the brackets is smaller than the set threshold, the pair of brackets is determined to be the answering area mark in the test question image.
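A hedged sketch of this bracket-pair check follows; the regular expression, the handling of full-width brackets and the inner-length threshold are assumptions.

```python
import re

def bracket_marks(printed, max_inner_len=4):
    """String positions of '(...)' pairs whose inner text is short enough."""
    marks = []
    for m in re.finditer(r"[(（]([^()（）]*)[)）]", printed):
        if len(m.group(1)) < max_inner_len:   # short content inside the pair
            marks.append((m.start(), m.end()))
    return marks
```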
Example four
After the printed text character string is obtained, question division can be performed according to it. It should be noted that detection of bracket pairs in the printed text character string may be performed after question division, or question division may be performed after the bracket-pair detection; there is no required order, and the application is not limited in this respect.
Furthermore, the questions in the test question image may be divided before or after locating the answering area; there is likewise no order requirement.
Optionally, in an embodiment of the present application, when the ratio of the length of the longest common substring between the printed text string and a question stem to the length of the printed text string is greater than the longest-common-substring ratio threshold, and/or the ratio of the length of the longest common subsequence between the printed text string and the question stem to the length of the printed text string is greater than the longest-common-subsequence ratio threshold, the printed text strings are divided into the same question.
In the embodiment of the application, the obtained printed text character strings are matched one by one against the question stem information stored in a question bank. The matching metrics are the longest common substring matching degree and the longest common subsequence matching degree, calculated by formula five and formula six respectively:
M_substring = L_substring / L_all    (formula five)

M_subsequence = L_subsequence / L_all    (formula six)

In formula five, L_substring denotes the longest common substring length; in formula six, L_subsequence denotes the longest common subsequence length; and L_all denotes the length of the printed text string. The proportion of the longest common substring between the printed text string and the question stem in the printed text string is taken as the longest common substring matching degree M_substring, and the proportion of the longest common subsequence between them in the printed text string is taken as the longest common subsequence matching degree M_subsequence; the ratio thresholds can be set according to the actual situation. The longest common substring is the number of consistent continuous characters in the two strings, while the longest common subsequence is the number of consistent, not necessarily continuous, characters in the two strings. In the embodiment of the present application, if both the longest common substring matching degree and the longest common subsequence matching degree of the two strings are greater than the set thresholds, the line text is considered to match the question stem information stored in the question bank. It can be understood that question division may also be performed according to the longest common substring matching degree alone, or according to the longest common subsequence matching degree alone, which is not limited in this application.
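The two matching degrees can be computed with standard dynamic programming; the sketch below assumes the threshold tuning happens elsewhere.

    def longest_common_substring_len(a, b):
        """Length of the longest run of characters common to a and b."""
        best, prev = 0, [0] * (len(b) + 1)
        for ca in a:
            cur = [0] * (len(b) + 1)
            for j, cb in enumerate(b, 1):
                if ca == cb:
                    cur[j] = prev[j - 1] + 1
                    best = max(best, cur[j])
            prev = cur
        return best

    def longest_common_subsequence_len(a, b):
        """Length of the longest (not necessarily contiguous) common subsequence."""
        prev = [0] * (len(b) + 1)
        for ca in a:
            cur = [0] * (len(b) + 1)
            for j, cb in enumerate(b, 1):
                cur[j] = prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1])
            prev = cur
        return prev[-1]

    def matching_degrees(line_text, stem_text):
        """M_substring and M_subsequence from formulas five and six."""
        l_all = max(len(line_text), 1)  # guard against an empty recognition result
        return (longest_common_substring_len(line_text, stem_text) / l_all,
                longest_common_subsequence_len(line_text, stem_text) / l_all)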
In one embodiment of the application, the questions are divided by the following steps. Step A: match the obtained printed text character strings one by one against the question stem information stored in the question bank, where each printed text character string corresponds to one text line. Step B: if a single text line matches no question stem in the question bank, select the next text line and repeat step A; if there is no next text line, end the question division. If a single text line matches the question stem of a question in the question bank, combine the next text line with the current text line and continue matching against the matched question; if the matching degrees of the longest common subsequence and the longest common substring are both greater than the set thresholds, repeat step B. Steps A and B are repeated until all text lines have been matched, as in the sketch below. According to the embodiment of the application, the printed text character strings are matched against the question stem information in the question bank, and all text lines in a test question image are divided into different question ranges without resorting to a fixed template for the test question image, so that the whole question-division process is simple and requires no additional operations.
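A sketch of the step-A/step-B loop follows. Here bank is assumed to be a list of (question_id, stem_text) pairs, matching_degrees is the helper from the previous sketch, and the thresholds are illustrative.

    def divide_topics(line_texts, bank, th_sub=0.6, th_seq=0.6):
        """Greedily assign consecutive text lines to question stems."""
        topics, i = {}, 0
        while i < len(line_texts):
            match = None
            for qid, stem in bank:           # step A: match one line
                m_sub, m_seq = matching_degrees(line_texts[i], stem)
                if m_sub > th_sub and m_seq > th_seq:
                    match, matched_stem = qid, stem
                    break
            if match is None:                # no stem matched, move on
                i += 1
                continue
            # Step B: keep absorbing following lines while the merged text
            # still matches the same stem.
            merged, j = line_texts[i], i + 1
            while j < len(line_texts):
                m_sub, m_seq = matching_degrees(merged + line_texts[j],
                                                matched_stem)
                if m_sub > th_sub and m_seq > th_seq:
                    merged, j = merged + line_texts[j], j + 1
                else:
                    break
            topics.setdefault(match, []).extend(range(i, j))
            i = j
        return topics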
Furthermore, after the question division is finished, the standard answers can be obtained from the questions matched in the question bank, so that the answering area can later be corrected against them. Because the standard answers are obtained from the question bank after question division rather than being set in advance, the operation flow of correcting the test questions is simplified.
Example V,
The test questions can be corrected after the answering area is obtained according to any one of the above examples. When correcting the test questions, candidate answers are obtained from the answering area, and the candidate answers are then corrected against the standard answers of the questions obtained in Example IV.
Optionally, in an embodiment of the present application, handwritten text recognition is performed on the answering area to obtain a handwritten text character string, and candidate answers are determined from the handwritten text character string.
The answering area is input into a handwriting recognition model to recognize the handwritten characters, yielding the handwritten text character string. The handwriting recognition model is not limited in the embodiment of the application, as long as its output is the handwritten text character string. The handwritten text character string may include complex characters, strings belonging to a calculation process and other content that is not a candidate answer, so such strings are filtered out of the handwritten text character string to determine the candidate answers.
Optionally, in an embodiment of the present application, when determining candidate answers from the handwritten text character string, at least one of the following filtering processes may be applied to the handwritten text character string according to a set character string filtering rule, to obtain the candidate answers:
filtering out handwritten text character strings that include a plurality of complex characters and whose length is smaller than a first preset length;
filtering out handwritten text character strings that include a plurality of calculation symbols and whose length is smaller than a second preset length;
filtering out handwritten text character strings whose distance from the printed text area is greater than the set text area distance threshold;
filtering out handwritten text character strings whose previous and next lines are both handwritten text character strings, where the overlap length of the previous-line and next-line handwritten strings on the horizontal axis is greater than a first overlap length threshold;
filtering out handwritten text character strings whose previous line corresponds to a calculation process, where the overlap length of that calculation process and the handwritten text character string on the horizontal axis is greater than a second overlap length threshold.
Handwritten text character strings meeting the following rules are filtered out to obtain the candidate answers, so that strings irrelevant to the answers, such as parts of a calculation process, are excluded and the accuracy of the candidate answers is improved. The rules may be, but are not limited to, the following. First, the handwritten text character string is short and contains many complex characters (more than 50% of the text length), such as @, _, &, #, $ and the like. Second, the handwritten text character string is within 20 characters and includes a plurality of calculation symbols, such as :, /, = and the like. Third, a candidate answer is generally located not far from the question; when the position of the handwritten text character string in the test question image is more than a set distance threshold from the question position (i.e. the printed text area), where the threshold can be set according to the actual situation, that string may not be a true answer. Fourth, when the lines above and below the handwritten text character string contain only handwritten text and no printed text, and the overlap length of the handwritten strings in those lines on the horizontal axis (the x axis) is greater than the first overlap length threshold, the string is probably part of a calculation process. Fifth, when the previous line of the handwritten text character string has been determined to be a calculation process and the overlap length of the two lines on the horizontal axis is greater than the second overlap length threshold, the string is probably also part of the calculation process. Here, "first" and "second" in the overlap length thresholds only distinguish two different thresholds and do not imply any order; both may be set according to the actual situation, which is not limited in the embodiment of the present application.
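The character-based rules can be sketched as below; the overlap-based rules additionally need per-line layout information, so only the first three are shown, and the symbol sets and thresholds are illustrative assumptions.

    COMPLEX_CHARS = set('@_&#$')
    CALC_SYMBOLS = set('+-*/=')

    def keep_as_candidate(text, dist_to_printed, max_dist=120):
        """Apply the character-based filter rules to one handwritten string."""
        if not text:
            return False
        if len(text) < 10 and sum(c in COMPLEX_CHARS for c in text) / len(text) > 0.5:
            return False          # short string dominated by junk characters
        if len(text) < 20 and sum(c in CALC_SYMBOLS for c in text) >= 3:
            return False          # short string full of calculation symbols
        if dist_to_printed > max_dist:
            return False          # too far from the printed question text
        return True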
Optionally, in an embodiment of the present application, when determining candidate answers from the handwritten text character string, the method may further include: when the distance between two characters in the handwritten text character string is greater than the product of a preset distance and a length coefficient, separating the handwritten text character string at the center point between the two characters to obtain the candidate answers, where the length coefficient is set according to the length of the handwritten text character string.
In some application scenarios, several fill-in-the-blank objective questions are placed next to each other. When detection is performed with the preset model, the handwritten text areas of several such blanks are often detected as a single handwritten text area, making the candidate answers inaccurate. To prevent this situation from affecting correction, an adaptive answer separation algorithm is designed based on the positions of the handwritten text character strings in the handwriting recognition result.
When the distance between two characters in the handwritten text character string is greater than the product of the preset distance and the length coefficient, i.e. the gap between the two characters is large, the two characters are taken to belong to two candidate answers and are separated, yielding two candidate answers. This prevents several handwritten text areas from being treated as a single candidate answer when several fill-in-the-blank questions are adjacent, further improving the accuracy of the candidate answers.
It can be understood that in other application scenarios, a student may write the several characters belonging to one candidate answer far apart. Optionally, in an embodiment of the present application, after the positions of the handwritten text areas are obtained from the preset MaskRCNN model, the distance between every two handwritten text areas is calculated, and if the distance between two handwritten text areas is smaller than a preset distance threshold, the two areas are merged into one handwritten text area. By merging handwritten text areas that are close to each other, characters written apart are joined into a single handwritten text area, and the candidate answer is then obtained from the merged area. This prevents characters of the same candidate answer, written apart by a student, from producing multiple handwritten text areas, further improving the accuracy of the candidate answers.
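A sketch of this merging step is given below; boxes are assumed to be (x, y, w, h) tuples, and the gap metric and threshold are illustrative.

    def merge_close_regions(boxes, max_gap=20):
        """Repeatedly merge handwritten regions whose bounding boxes are
        closer than max_gap, so scattered characters form one region."""
        def gap(a, b):
            ax, ay, aw, ah = a
            bx, by, bw, bh = b
            dx = max(bx - (ax + aw), ax - (bx + bw), 0)
            dy = max(by - (ay + ah), ay - (by + bh), 0)
            return max(dx, dy)

        boxes, merged = list(boxes), True
        while merged:
            merged = False
            for i in range(len(boxes)):
                for j in range(i + 1, len(boxes)):
                    if gap(boxes[i], boxes[j]) < max_gap:
                        ax, ay, aw, ah = boxes[i]
                        bx, by, bw, bh = boxes[j]
                        x, y = min(ax, bx), min(ay, by)
                        w = max(ax + aw, bx + bw) - x
                        h = max(ay + ah, by + bh) - y
                        boxes[i] = (x, y, w, h)   # replace with the union box
                        del boxes[j]
                        merged = True
                        break
                if merged:
                    break
        return boxes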
Optionally, in an embodiment of the present application, the distance between every two adjacent characters in the handwritten text character string is calculated, the average of these distances is computed, and the average distance is used as the preset distance.
Each student's writing habits differ, and the gaps between characters vary from person to person, so the distances between the characters of the recognition results also vary. By calculating the distance between every two adjacent characters in the handwritten text character strings and averaging all the distance values, the resulting average distance better matches the student's writing habits, improving the accuracy of the separated candidate answers.
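The adaptive separation can be sketched as follows: the preset distance is the mean gap between consecutive characters, and a split happens wherever a gap exceeds mean_gap times the length coefficient. Here char_boxes is assumed to hold per-character (x, width) positions from the handwriting recognizer, and the coefficient value is illustrative.

    def split_answers(chars, char_boxes, length_coefficient=2.0):
        """Split one recognized handwritten string into candidate answers
        at unusually wide character gaps."""
        if len(char_boxes) < 2:
            return [''.join(chars)]
        gaps = [char_boxes[i + 1][0] - (char_boxes[i][0] + char_boxes[i][1])
                for i in range(len(char_boxes) - 1)]
        mean_gap = sum(gaps) / len(gaps)     # the "preset distance"
        pieces, start = [], 0
        for i, g in enumerate(gaps):
            if g > mean_gap * length_coefficient:
                pieces.append(''.join(chars[start:i + 1]))
                start = i + 1
        pieces.append(''.join(chars[start:]))
        return pieces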
Further, after the candidate answers are obtained, they may be corrected using the standard answers of the questions obtained during question division. Optionally, in an embodiment of the present application, a candidate answer sequence is determined according to the positions of the candidate answers in the test question image, and the candidate answers are corrected according to the candidate answer sequence and the standard answers of the questions.
In the application, the candidate answers are sorted by line according to their positions in the test question image, and answers in the same line are sorted from left to right, yielding a candidate answer sequence. The candidate answer sequence is then matched and corrected against the order of the standard answers. The correction rule in the embodiment of the application is as follows: each answer in the standard answer sequence is searched for in the candidate answer sequence; if it exists there, that candidate answer is considered correct, the answer and its position in the candidate answer sequence are recorded, and the next answer is checked from that position onward; if it does not exist, the candidate answer is considered wrong, and the search for the next answer continues from the position of the last correct candidate answer, or from the first candidate answer in the candidate answer sequence if there is none. For example, if the correct answer is ABCE but the handwritten text recognized in the answering area is ABE, C is missing, yet E in the candidate answer sequence is still counted as correct. Correcting in candidate-answer-sequence order in this way suits multiple-choice questions and objective questions with diversified answers, avoiding correction errors and misjudgments and improving correction accuracy. It can be understood that the present application can also correct ordering questions, whose answers have an order requirement, by applying the corresponding correction rule to the candidate answer sequence and the sequence of standard answers. It should be noted that the correction rule is not limited to this; any rule that corrects the candidate answers according to the candidate answer sequence and the standard answers of the questions falls within the scope of the present application.
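One reading of this rule, reduced to a single forward scan, is sketched below; the fallback searches in the description above are simplified, so this is an approximation rather than the exact procedure.

    def grade_in_order(candidates, standard):
        """Mark each candidate answer correct if it occurs in the standard
        answer sequence at or after the previous correct candidate."""
        results, pos = [], 0
        for ans in candidates:
            try:
                found = standard.index(ans, pos)
                results.append((ans, True))
                pos = found + 1
            except ValueError:
                results.append((ans, False))
        return results

    # grade_in_order(list("ABE"), list("ABCE"))
    # -> [('A', True), ('B', True), ('E', True)]: E still counts as correct.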
Further, the embodiment of the present application exemplarily describes how to correct a test paper from a test question image. Optionally, as shown in fig. 6, fig. 6 provides a flowchart of correcting a test question image according to an embodiment of the present application.
Step S601, inputting an initial image into the TextSnake model, where the initial image may be provided as a picture;
step S602, detecting a text region in the initial image by using a TextSnake model to obtain an initial text region;
step S603, correcting the rotation angle of the initial image based on the image of the initial text area to obtain a test question image;
step S604, separating a handwritten text area from a printed text area of the test question image by using a Mask Rcnn model;
step S605, performing distortion correction processing on the test question image to obtain a distortion corrected image;
step S606, detecting the text blocks in the test question image or the distortion-corrected image to obtain text line images;
step S607, performing character recognition on the text line images in combination with the printed text area obtained in step S604 to obtain printed text character strings;
step S608, searching questions in a question bank according to the printed text character strings, and dividing the questions;
step S609, locating the answering area and obtaining candidate answers: optionally, traverse the test question image or the distortion-corrected image to find answering horizontal lines, or detect bracket pairs in the printed text character strings; expand handwritten text areas at the positions of the horizontal lines and bracket pairs; combine these with the handwritten text areas obtained in step S604 to obtain the answering area; and take the recognition results in the answering area that meet the preset rules as candidate answers;
step S610, correcting the candidate answers according to the standard answers of the questions in the question bank: optionally, sort the candidate answers by line number according to their positions, sort answers in the same line from left to right, and then match and correct the candidate answer sequence against the order of the standard answers.
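The whole flow of fig. 6 can be summarized as the skeleton below. Every attribute of the pipeline object is a placeholder name for the corresponding component (TextSnake, Mask Rcnn, the recognizers and the helpers sketched earlier), not a real API; the skeleton only shows how the steps chain together.

    def grade_test_sheet(initial_image, question_bank, pipeline):
        """End-to-end sketch of steps S601-S610 with placeholder components."""
        text_region = pipeline.detect_text_region(initial_image)     # S601-S602
        sheet = pipeline.correct_rotation(initial_image, text_region)  # S603
        handwritten, printed = pipeline.separate_regions(sheet)      # S604
        dewarped = pipeline.correct_distortion(sheet)                # S605
        lines = pipeline.detect_text_lines(dewarped)                 # S606
        strings = pipeline.recognize_printed_lines(dewarped, lines,
                                                   printed)          # S607
        topics = pipeline.divide_topics(strings, question_bank)      # S608
        answers = pipeline.locate_and_read_answers(dewarped,
                                                   handwritten,
                                                   strings, lines)   # S609
        return pipeline.grade_answers(answers, topics, question_bank)  # S610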
According to the embodiment of the application, the rotation angle of the image is corrected first. The handwritten text area and the printed text area are then separated on the rotation-corrected test question image using a preset model, the text blocks in the test question image are detected to obtain text line images, and text recognition is performed on the text line images in combination with the printed text area, yielding printed text character strings. Question division is performed according to the printed text character strings; answering horizontal lines in the questions are detected, or bracket pairs in the printed text character strings are detected, to obtain the answering area marks. The handwritten text areas and the handwritten text areas expanded from the answering area marks are combined to obtain the candidate answers, and the test paper is corrected according to the candidate answers and the standard answers of the questions. In this method, locating the answering area places no specific format requirement on the test question image and needs no fixed matching template, so the method is applicable to test question images with free layouts and improves the efficiency of locating the answering area. When dividing the questions in the test question image, no additional auxiliary marks are needed, which improves question-division efficiency. In addition, when correcting the test question image, the candidate answer sequence is matched and corrected against the order of the standard answers, which suits multiple-choice questions, ordering questions and objective questions with diversified answers, avoiding correction errors and misjudgments and improving correction accuracy.
Example VI,
The embodiment of the present application provides an apparatus for locating the answering area in a test question image. As shown in fig. 7, fig. 7 shows such an apparatus provided by the embodiment of the present application; the apparatus 70 includes an answering area mark determining module 701 and an answering area determining module 702.
The answering area mark determining module 701 is configured to determine the answering area mark in the test question image according to the target text line detected from the test question image.
The answering area determining module 702 is configured to determine the corresponding answering area in the test question image according to the determined answering area mark and a handwritten text area included in the target text area detected from the test question image.
Example VII,
Based on the method for locating the answering area in the test question image described in any one of Examples I to V above, the embodiment of the present application provides an electronic device. It should be noted that the method of this embodiment may be executed by any appropriate electronic device with data processing capability, including but not limited to: a server, a mobile terminal (such as a mobile phone, a PAD, etc.), a PC, etc. As shown in fig. 8, fig. 8 is a structural diagram of an electronic device according to an embodiment of the present disclosure; the specific embodiments of the present application do not limit the specific implementation of the electronic device. The electronic device may include: a processor (processor) 802, a communication interface 804, a memory 806, and a communication bus 808.
Wherein: the processor 802, communication interface 804, and memory 806 communicate with each other via a communication bus 808.
A communication interface 804 for communicating with other electronic devices or servers.
The processor 802 is configured to execute the computer program 810, and may specifically execute the relevant steps in the above embodiments of the method for locating the answering area in the test question image.
In particular, the computer program 810 may comprise computer program code comprising computer operating instructions.
The processor 802 may be a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. The electronic device comprises one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
A memory 806 for storing a computer program 810. The memory 806 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The computer program 810 may be specifically configured to cause the processor 802 to perform the operations corresponding to the method for locating the answering area in the test question image described in the above embodiments. For the specific implementation of each step in the computer program 810, reference may be made to the corresponding descriptions of the corresponding steps and units in the embodiments of the method for locating the answering area in the test question image, which are not repeated here. It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not repeated here.
Example VIII,
Based on the method for locating the answering area in the test question image described in Examples I to V, an embodiment of the present application provides a computer storage medium storing a computer program; when executed by a processor, the computer program implements the method described in Examples I to V.
It should be noted that, according to the implementation requirement, each component/step described in the embodiment of the present application may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present application.
The above-described methods according to the embodiments of the present application may be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium such as a CD-ROM, RAM, floppy disk, hard disk or magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network and stored in a local recording medium, so that the methods described herein can be processed by such software on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It will be appreciated that the computer, processor, microprocessor controller or programmable hardware includes memory components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor or hardware, implements the method for locating the answering area in the test question image described herein. Further, when a general-purpose computer accesses code for implementing the method shown here, execution of the code converts the general-purpose computer into a special-purpose computer for executing the method shown here.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus comprising the element.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the system embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and reference may be made to the partial description of the method embodiment for relevant points.
The above embodiments are only used for illustrating the embodiments of the present application, and not for limiting the embodiments of the present application, and those skilled in the relevant art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present application, so that all equivalent technical solutions also belong to the scope of the embodiments of the present application, and the scope of patent protection of the embodiments of the present application should be defined by the claims.

Claims (23)

1. A method for positioning answering areas in test question images is characterized by comprising the following steps:
detecting a text region in the shot initial image according to a preset initial text region detection model to obtain an initial text region;
masking the initial image according to the initial text area to obtain an initial text image;
determining at least one rotation angle according to the connected domain detected from the initial text image;
carrying out angle correction on pixel points in the initial text image according to the average value of the at least one rotation angle to obtain a test question image;
determining a answering area mark in the test question image according to a target text line detected from the test question image;
and determining a corresponding answering area in the test question image according to the determined answering area mark and a handwritten text area included in a target text area detected from the test question image.
2. The method according to claim 1, wherein the determining the corresponding answering area in the test question image according to the determined answering area mark and a handwritten text area included in a target text area detected from the test question image comprises:
constructing an expanded handwritten text area according to the determined position of the mark of the answering area;
and determining a corresponding answering area in the test question image according to the constructed extended handwritten text area and the handwritten text area included in the target text area detected from the test question image.
3. The method according to claim 2, wherein the determining the corresponding answering area in the test question image according to the constructed augmented handwritten text area and the handwritten text area included in the target text area detected from the test question image comprises:
determining a coincidence area between the constructed expanded handwritten text area and the handwritten text area;
according to the overlapping area, carrying out duplication elimination processing on the constructed expanded handwritten text area and the constructed handwritten text area to obtain a handwritten area set;
and determining a corresponding answering area in the test question image according to the handwriting area set.
4. The method according to claim 3, wherein the performing de-duplication processing on the constructed augmented handwritten text region and a handwritten text region included in the target text region detected from the test question image according to the overlapping area to obtain a set of handwritten regions comprises:
and combining the extended handwritten text area and the handwritten text area corresponding to the superposition area larger than the set superposition area threshold value so as to perform de-duplication processing on the constructed extended handwritten text area and the handwritten text area included in the target text area detected from the test question image to obtain a handwritten area set.
5. The method according to claim 1, wherein determining the answering area label in the test question image according to the target text line detected from the test question image comprises:
determining a horizontal straight line score of each pixel in the test question image;
determining a horizontal line according to the pixel points with the horizontal straight line scores larger than the horizontal threshold;
and when the length of the transverse line is greater than a set transverse line threshold value and the transverse line is positioned at the lower half part of the corresponding target text line, determining the transverse line as a response area mark in the test question image.
6. The method of claim 5, wherein determining the horizontal straight line score for each pixel in the test question image comprises:
according to the boundary of a printed text area included in a target text area detected from the test question image, performing distortion correction processing on the test question image to obtain a distortion corrected image;
a horizontal line score is calculated for each pixel in the warp corrected image.
7. The method of claim 6, wherein the warp correction process comprises at least one of a perspective transformation process and an affine transformation process.
8. The method of claim 5 or 6, wherein prior to determining the horizontal straight line score for each pixel in the test question image, the method further comprises:
detecting connected domains in the target text line;
calculating the area of the connected domain;
and filtering the table in the test question image according to the area of the connected domain.
9. The method of claim 1, further comprising:
determining a printing text line according to a target text line detected from the test question image and a printing text area included in the target text area detected from the test question image;
and performing character recognition on the printed text line according to a preset text detection model to obtain a printed text character string.
10. The method of claim 9, wherein determining the corresponding response area in the test question image comprises:
determining a pair of parentheses in the printed text string;
and when the length of the character string in the bracket pair is smaller than a preset character string length threshold value, determining the bracket pair as the answering area mark in the test question image.
11. The method of claim 1, further comprising: determining a target text line in the test question image;
the determining of the target text line in the test question image comprises the following steps:
determining a plurality of text blocks according to the connected domain detected from the test question image;
detecting adjacent text blocks corresponding to any text block;
respectively generating corresponding adjacent text block sets according to adjacent text blocks corresponding to the text blocks;
and determining the target text line according to the adjacent text block set.
12. The method of claim 11, wherein detecting adjacent text blocks corresponding to any of the text blocks comprises:
and detecting adjacent text blocks corresponding to any text block according to at least one of Euclidean distances among the center points of the text blocks and the slope of a connecting line between the center points.
13. The method according to claim 11 or 12, wherein determining the target text line according to the set of adjacent text blocks by using any one of the text blocks as a target text block comprises:
determining a left text block which is positioned at the left side of the target text block and has a distance with the target text block smaller than a set left distance threshold value in the adjacent text block set;
determining a right text block which is positioned at the right side of the target text block and has a distance with the target text block smaller than a set right distance threshold value in the adjacent text block set;
and determining the target text line according to the left text block and/or the right text block and the target text block.
14. The method of claim 1, further comprising:
performing handwritten text recognition on the answering area to obtain a handwritten text character string;
and determining candidate answers according to the handwritten text character strings.
15. The method of claim 14, wherein determining candidate answers from the handwritten text string comprises:
according to a set character string filtering rule, at least one of the following filtering processing is carried out on the handwritten text character string to obtain the candidate answer:
filtering the handwritten text character string which comprises a plurality of complex characters and is shorter than a first preset length;
filtering the handwritten text character strings which comprise a plurality of calculation symbols and are smaller than a second preset length;
filtering the handwritten text character strings of which the distance between the position and a printed text area included in a target text area detected by the test question image is greater than a set text area distance threshold value;
filtering the handwritten text character strings of which the upper line and the lower line are both handwritten text character strings and the coincidence length of the upper line of handwritten text character strings and the lower line of handwritten text character strings on the horizontal axis is larger than a first coincidence length threshold value;
filtering out the corresponding calculation process of the handwritten text character string in the previous line, wherein the coincidence length of the calculation process corresponding to the handwritten text character string in the previous line and the handwritten text character string on the horizontal axis is larger than a second coincidence length threshold value.
16. The method of claim 14, wherein determining candidate answers from the handwritten text string comprises:
and when the distance between two characters in the handwritten text character string is greater than the product of a preset distance and a length coefficient, separating the handwritten text character string according to the position center points of the two characters to obtain the candidate answer, wherein the length coefficient is set according to the length of the handwritten text character string.
17. The method of claim 16, further comprising:
calculating the distance between every two characters in the handwritten text character string;
and calculating an average distance value between characters according to the distance between every two characters in the handwritten text character string, and taking the average distance value as the preset distance.
18. The method of claim 9, further comprising:
and when the ratio of the length of the longest common substring between the printed text string and a question stem to the length of the printed text string is greater than the longest-common-substring ratio threshold, and/or the ratio of the length of the longest common subsequence between the printed text string and the question stem to the length of the printed text string is greater than the longest-common-subsequence ratio threshold, dividing the printed text strings into the same question.
19. The method according to any one of claims 14-18, further comprising:
determining a candidate answer sequence according to the position of the candidate answer in the test question image;
and modifying the candidate answers according to the candidate answer sequence and the standard answers of the questions.
20. The method of claim 1, further comprising:
and detecting a target text area in the test question image according to a preset target text area detection model, wherein the target text area comprises the handwritten text area and a printed text area.
21. An apparatus for locating a response area in a test question image, the apparatus comprising:
the image acquisition module is used for detecting a text region in the shot initial image according to a preset initial text region detection model to obtain an initial text region; masking the initial image according to the initial text area to obtain an initial text image; determining at least one rotation angle according to the connected components detected from the initial text image; carrying out angle correction on pixel points in the initial text image according to the average value of the at least one rotation angle to obtain a test question image;
the answer area mark determining module is used for determining an answer area mark in the test question image according to the target text line detected from the test question image;
and the answer area determining module is used for determining a corresponding answer area in the test question image according to the determined answer area mark and a handwritten text area included in a target text area detected from the test question image.
22. An electronic device, comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the positioning method of the answering area in the test question image according to any one of claims 1 to 20.
23. A computer storage medium having stored thereon a computer program which, when executed by a processor, implements a method of locating a response area in a test question image as claimed in any one of claims 1 to 20.
CN202010300296.4A 2020-04-16 2020-04-16 Method and device for positioning answering area in test question image, electronic equipment and computer storage medium Active CN111507251B (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant