CN111079742A - Method for accurately positioning text block of text area image in scanning test paper - Google Patents


Info

Publication number
CN111079742A
Authority
CN
China
Prior art keywords
scanning
image
rectangular
small
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911218917.8A
Other languages
Chinese (zh)
Inventor
侯冲
程建
陈家海
叶家鸣
吴波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Seven Day Education Technology Co ltd
Original Assignee
Anhui Seven Day Education Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Seven Day Education Technology Co ltd filed Critical Anhui Seven Day Education Technology Co ltd
Priority to CN201911218917.8A priority Critical patent/CN111079742A/en
Publication of CN111079742A publication Critical patent/CN111079742A/en
Withdrawn legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

The invention discloses a method for accurately locating the text block of the composition area in a scanned test-paper image, and relates to the field of image localization. To address the problem of accurately locating text blocks in the composition area of a scanned test-paper image, a localization method is provided in which a rectangular frame of fixed width is scanned with a fixed step length from the outer border of the image toward its center, and the number of handwriting-containing small images within each scan rectangle is predicted to judge whether the scan rectangle lies on a text-block boundary. The method comprises four parts: cyclic scan segmentation, scan-rectangle re-segmentation, small-image prediction, and boundary localization. The small-image prediction part requires constructing a 9-layer convolutional network and training a model to predict whether a small image contains handwritten characters. The cyclic scanning part scans from each of the four directions; each scan cycle segments one scan rectangle and calls the prediction model once, until the scan rectangle in that direction is judged to be a boundary, thereby accurately locating the text block.

Description

Method for accurately positioning text block of text area image in scanning test paper
Technical Field
The invention belongs to the technical field of image processing, and specifically relates to a method for accurately locating the text block of the composition-area image in a scanned test paper.
Background
In recent years, Internet technology has come to influence every aspect of daily life, from online shopping, food ordering, and ticket booking to map navigation and face-recognition check-in. In the education field, various online paper-marking tools have likewise gradually come to market. Such marking tools generally divide the answer sheet into question-type modules and then process each module separately. The composition-correction module, the most laborious and troublesome part of marking, has therefore attracted particular attention. The general approach in a composition module is to recognize and process the composition, but one outstanding problem is that the composition image cropped by the marking tool is usually the entire composition answer area laid out on the answer sheet, of which the student's actual answer occupies only a part. Recognizing such an image directly causes more errors and wastes computation. If the region actually containing the student's answer could be accurately cropped in one step, the whole composition-scoring module would benefit greatly.
Existing image-localization methods are mainly search-based: candidate images of different sizes at different positions are cropped from the source image, and an image classifier then judges whether each candidate is the desired region. The main problem with this approach is that the size and position of the target are unknown, so thousands of candidate crops must be classified, and the many overlapping parts among the crops are computed repeatedly, consuming substantial computational resources and time.
Recently, deep-learning techniques have surpassed traditional methods in the image domain; in particular, convolution extracts local receptive-field information that subsequent network layers can learn from. However, convolving and training directly on a large image still requires substantial computational resources. It is therefore more practical to design, by hand, a computation flow tailored to the business requirement of locating the text block within a composition image.
The invention combines this business requirement with the research above, greatly reduces the computational resources actually needed, and provides a solution not only for the accurate localization of composition text blocks but also for related application scenarios.
Disclosure of Invention
(I) Technical problem to be solved:
The invention solves the problem that the text block within a cropped composition image cannot be accurately located, and provides a method for accurately locating text blocks in composition areas of scanned test-paper images.
(II) Technical scheme
To achieve the above object, the following conclusions were drawn from investigation and experiment: (1) the text block in a composition image forms a single connected whole, so its boundary positions can be found by scanning from the periphery inward; (2) to judge whether an image contains handwriting, the image can first be cut into smaller pieces, each small image classified by a convolutional network, and the judgment made from the proportion of small images that contain handwriting. Based on these conclusions, the invention provides the following technical scheme: a method for accurately locating text blocks in composition areas of scanned test-paper images, comprising four parts: cyclic scan segmentation, scan-rectangle re-segmentation, small-image prediction, and boundary localization.
Preferably, the cyclic scan segmentation is specified as follows: scanning proceeds from the outer border toward the center from each of the four directions. The scan rectangle is 32 × h when scanning from the left or right side toward the middle and w × 32 when scanning from the top or bottom, with a scan step length of 16. When the scan rectangle in a given direction is predicted to contain handwritten characters, scanning in that direction stops.
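The scan-rectangle extraction described above amounts to plain array slicing; a minimal sketch, assuming the image is a NumPy array of shape (h, w). The function name and direction labels are illustrative, not from the patent:

```python
import numpy as np

STRIP = 32  # scan-rectangle width (pixels)
STEP = 16   # scan step length

def scan_strip(img, direction, k):
    """Return the k-th scan rectangle when moving inward from one side.

    'left'/'right' give 32 x h strips; 'top'/'bottom' give w x 32 strips.
    """
    h, w = img.shape[:2]
    off = k * STEP
    if direction == "left":
        return img[:, off:off + STRIP]
    if direction == "right":
        return img[:, w - off - STRIP:w - off]
    if direction == "top":
        return img[off:off + STRIP, :]
    if direction == "bottom":
        return img[h - off - STRIP:h - off, :]
    raise ValueError(direction)
```

Because the step (16) is half the strip width (32), consecutive strips overlap by half, which is what bounds the final boundary error at 16 pixels.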
Preferably, the re-segmentation of the scan rectangle is specified as follows: the scan rectangle is further sliced into w//32 (or h//32) equilateral 32 × 32 squares.
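The re-slicing into equilateral squares can be sketched as follows; the function name is illustrative, and any remainder narrower than 32 pixels is discarded, as the patent's w//32 count implies:

```python
import numpy as np

def tile_strip(strip, side=32):
    """Slice a scan rectangle into side x side equilateral squares.

    A strip of shape (h, 32) yields h//32 tiles; a strip of shape
    (32, w) yields w//32 tiles. Remainders smaller than `side` drop.
    """
    h, w = strip.shape[:2]
    if w == side:  # vertical strip from the left/right side
        return [strip[i * side:(i + 1) * side, :] for i in range(h // side)]
    return [strip[:, i * side:(i + 1) * side] for i in range(w // side)]
```

Scaling each tile to 224 × 224 for the classifier (e.g. with an image-resize routine) is omitted here.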
Preferably, the small-image prediction by the model is specified as follows: each 32 × 32 square is scaled to 224 × 224, a 9-layer convolutional network is constructed, and a binary-classification model is trained to predict whether the scaled 224 × 224 image contains handwriting.
Preferably, the localization of the four boundaries is specified as follows: when scanning has stopped in all four directions, the last scan rectangle in each direction is output, and the coordinate of its mid-line in that direction gives the corresponding boundary of the text block.
A method for accurately locating text blocks in composition areas of scanned test-paper images comprises the following specific steps:
Step 1, data collection: prepare 300 cropped composition answer-area images, covering as many of the answer-sheet types found in the application scenario as possible;
Step 2, data cutting: cut 32 × 32 small images inward from the four borders with a step of 16, and scale them to 224 × 224 for storage;
Step 3, data annotation: label the scaled small images from Step 2 into two classes, one with handwritten text and one without (the latter are mostly blank or contain printed characters and patterns), selecting about 1000 images per class;
Step 4, model training: construct the 9-layer convolutional network and train a binary-classification model on the data labeled in Step 3;
Step 5, scan and judge: deploy the classification model as a service and build the following processing flow:
(1) a composition image is received;
(2) scan inward along one direction, one strip at a time, to obtain a scan rectangle;
(3) slice the scan rectangle into 32 × 32 small images;
(4) feed the scaled small images into the model for prediction;
(5) if the number of small images predicted to contain handwriting exceeds a threshold, stop scanning in that direction; otherwise repeat (2) to (5);
Step 6, accurate localization: the mid-line coordinate of the last scan rectangle in each of the four directions serves as the accurately determined boundary coordinate of the text block in that direction.
(III) Advantageous effects
The invention provides a method for accurately locating text blocks in composition areas of scanned test-paper images, with the following beneficial effects. It solves the problem of accurately locating the text block of the composition area in a scanned test-paper image, and, exploiting the characteristics of the scene, proposes scanning inward from the four borders to search for the text-block boundaries. A localization problem that would otherwise call for a complex deep network is solved with a 9-layer convolutional model plus image scanning, which on the one hand greatly reduces computational resources and running time, and on the other hand locates the text-block boundaries to within a 16-pixel error, meeting practical application requirements.
Drawings
FIG. 1 is an overall schematic view of the present invention;
FIG. 2 is a schematic view of a single direction cyclic scan of a composition image according to the present invention;
FIG. 3 is a schematic diagram of a rectangular scanogram re-segmentation and prediction in the present invention;
FIG. 4 is a 9-layer convolutional network structure for predicting whether a small graph contains handwritten characters in the present invention;
FIG. 5 is a diagram of the accurate-positioning effect of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and examples. The embodiments described herein are merely illustrative and are not intended to be limiting.
As shown in fig. 1, the technical solution of the present invention is a method for accurately locating text blocks in composition areas of scanned test-paper images, which performs localization by scanning from the periphery of the image toward its center and stopping when the text block is met.
Cyclic scan segmentation of the image: the image is scanned from its outer border toward the center from each of the four directions. The scan rectangle is 32 × h when scanning from the left or right toward the middle and w × 32 when scanning from the top or bottom, with a scan step length of 16. When the scan rectangle in a given direction is judged to contain handwritten characters, scanning in that direction stops. Fig. 2 illustrates scanning from top to bottom: in this example two scans are performed (the two dashed boxes), and the scan rectangle of the second scan contains handwriting, so scanning in that direction stops. The four directions do not affect one another, and the mid-line coordinate of the last scan rectangle in a stopped direction is the boundary coordinate of the text block in that direction.
Scan-rectangle re-segmentation and small-image prediction: the scan rectangle is further sliced into w//32 (or h//32) equilateral 32 × 32 squares, each square is scaled to 224 × 224, and a 9-layer convolutional network model is trained to predict whether each 224 × 224 image contains handwriting, as shown in fig. 3. The 9-layer network takes two 3 × 3 convolution layers, one 1 × 1 convolution layer, and one pooling layer as one block; three such blocks are stacked to give 9 convolution layers, with 64, 128, and 256 convolution kernels in the three blocks respectively. The specific network structure is shown in fig. 4, and the other settings are:
(1) learning rate: 0.1, with a decay ratio of 0.9;
(2) optimizer: Adagrad;
(3) batch size: 64;
(4) epochs: 100.
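The block structure above can be sanity-checked by tracing feature-map shapes through the nine convolution layers. This sketch assumes 'same' padding, stride-1 convolutions, and 2 × 2 max-pooling, details the patent does not state:

```python
def trace_shapes(size=224, blocks=(64, 128, 256)):
    """Trace (spatial size, channels) through the 9-layer network.

    Each block: 3x3 conv, 3x3 conv, 1x1 conv (assumed 'same' padding
    and stride 1, so the spatial size is unchanged), followed by a
    2x2 max-pooling layer, which halves the spatial size.
    """
    shapes = []
    for filters in blocks:
        shapes.extend([(size, filters)] * 3)  # three conv layers per block
        size //= 2                            # pooling layer
    return size, shapes

final_size, layer_shapes = trace_shapes()
# nine conv layers in total; spatial size 224 -> 112 -> 56 -> 28
```

Under these assumptions the classifier head after the third block sees a 28 × 28 × 256 feature map.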
Boundary localization: if, for the scan rectangle in a given direction, the ratio of handwriting-containing small images to the total number of small images sliced from it exceeds a threshold, the cyclic scanning in that direction stops; otherwise it continues, as shown in fig. 3. When scanning has stopped in all four directions, the mid-line coordinates of the four last scan rectangles are the four boundary coordinates of the text block; fig. 5 is a schematic outline of these four boundary coordinates.
A method for accurately locating text blocks in composition areas of scanned test-paper images comprises the following specific steps:
Step 1, data collection: prepare 300 cropped composition answer-area images, covering as many of the answer-sheet types found in the application scenario as possible;
Step 2, data cutting: cut 32 × 32 small images inward from the four borders with a step of 16, and scale them to 224 × 224 for storage;
Step 3, data annotation: label the scaled small images from Step 2 into two classes, one with handwritten text and one without (the latter are mostly blank or contain printed characters and patterns), selecting about 1000 images per class;
Step 4, model training: construct the 9-layer convolutional network and train a binary-classification model on the data labeled in Step 3;
Step 5, scan and judge: deploy the classification model as a service and build the following processing flow:
(1) a composition image is received;
(2) scan inward along one direction, one strip at a time, to obtain a scan rectangle;
(3) slice the scan rectangle into 32 × 32 small images;
(4) feed the scaled small images into the model for prediction;
(5) if the number of small images predicted to contain handwriting exceeds a threshold, stop scanning in that direction; otherwise repeat (2) to (5);
Step 6, accurate localization: the mid-line coordinate of the last scan rectangle in each of the four directions serves as the accurately determined boundary coordinate of the text block in that direction.
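The scanning flow above, minus model training, can be sketched end-to-end. Here `predict_strip` is a stand-in for slicing a scan rectangle into tiles and calling the trained classifier, and all names are illustrative:

```python
import numpy as np

STRIP = 32  # scan-rectangle width (pixels)
STEP = 16   # scan step length

def locate_text_block(img, predict_strip):
    """Scan inward from all four sides of a composition image.

    predict_strip(strip) -> True when the strip is judged to contain
    handwriting. Returns (top, bottom, left, right) boundary
    coordinates, each the mid-line of the last (stopping) scan
    rectangle, or None for a direction that never stopped.
    """
    h, w = img.shape[:2]
    bounds = {}
    for d, limit in (("top", h), ("bottom", h), ("left", w), ("right", w)):
        bounds[d] = None
        k = 0
        while k * STEP + STRIP <= limit:
            off = k * STEP
            if d == "top":
                strip, mid = img[off:off + STRIP, :], off + STRIP // 2
            elif d == "bottom":
                strip, mid = img[h - off - STRIP:h - off, :], h - off - STRIP // 2
            elif d == "left":
                strip, mid = img[:, off:off + STRIP], off + STRIP // 2
            else:  # right
                strip, mid = img[:, w - off - STRIP:w - off], w - off - STRIP // 2
            if predict_strip(strip):
                bounds[d] = mid  # boundary found; stop this direction
                break
            k += 1
    return bounds["top"], bounds["bottom"], bounds["left"], bounds["right"]
```

On a synthetic 256 × 256 image whose "handwriting" occupies rows and columns 96 to 159, the returned mid-line coordinates land on the block's edges; in general the error is bounded by the 16-pixel step length, consistent with the advantageous effects stated above.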
For this application scenario the invention provides a simple and fast text-block localization method, converting a complicated deep-learning localization problem into a simple scanning procedure plus a shallow-network classification problem. This greatly reduces the technical difficulty and implementation threshold, removes interference from subsequent text-recognition work, and lays a foundation for improving recognition accuracy.
The above description is intended only as explanation and not as a limitation of the invention; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within its scope of protection.

Claims (5)

1. A method for accurately locating text blocks in composition areas of a scanned test-paper image, characterized in that a rectangular frame is scanned from the periphery of the image toward its center, and the number of handwriting-containing small images within each scan rectangle is predicted to judge whether the scan rectangle lies on a text-block boundary; the method comprises four parts: cyclic scan segmentation of the composition image, scan-rectangle re-segmentation, small-image prediction, and boundary localization.
2. The method according to claim 1, characterized in that the cyclic scan segmentation of the image is specified as follows: the composition image is scanned from the periphery toward the middle from each of the four directions; the scan rectangle is 32 × h when scanning from the left or right toward the middle and w × 32 when scanning from the top or bottom, with a scan step length of 16; when the scan rectangle in a given direction is judged to contain handwritten characters, scanning in that direction stops.
3. The method according to claim 1, characterized in that the scan-rectangle re-segmentation and small-image prediction are specified as follows: the scan rectangle is further sliced into w//32 (or h//32) equilateral 32 × 32 squares, each square is scaled to 224 × 224, and a 9-layer convolutional network model is trained to predict whether each 224 × 224 image contains handwriting.
4. The method according to claim 1, characterized in that the boundary localization is specified as follows: when the number of handwriting-containing small images sliced from a scan rectangle exceeds a threshold, scanning in that direction stops; when scanning has stopped in all four directions, the mid-line coordinate of the last scan rectangle in each direction is output as the corresponding boundary.
5. A method for accurately locating text blocks in composition areas of a scanned test-paper image, characterized by comprising the following specific steps:
Step 1, data collection: prepare 300 cropped composition answer-area images, covering as many of the answer-sheet types found in the application scenario as possible;
Step 2, data cutting: cut 32 × 32 small images inward from the four borders with a step of 16, and scale them to 224 × 224 for storage;
Step 3, data annotation: label the scaled small images from Step 2 into two classes, one with handwritten text and one without (the latter are mostly blank or contain printed characters and patterns), selecting about 1000 images per class;
Step 4, model training: construct the 9-layer convolutional network and train a binary-classification model on the data labeled in Step 3;
Step 5, scan and judge: deploy the classification model as a service and build the following processing flow:
(1) a composition image is received;
(2) scan inward along one direction, one strip at a time, to obtain a scan rectangle;
(3) slice the scan rectangle into 32 × 32 small images;
(4) feed the scaled small images into the model for prediction;
(5) if the number of small images predicted to contain handwriting exceeds a threshold, stop scanning in that direction; otherwise repeat (2) to (5);
Step 6, accurate localization: the mid-line coordinate of the last scan rectangle in each of the four directions serves as the accurately determined boundary coordinate of the text block in that direction.
CN201911218917.8A 2019-11-29 2019-11-29 Method for accurately positioning text block of text area image in scanning test paper Withdrawn CN111079742A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911218917.8A CN111079742A (en) 2019-11-29 2019-11-29 Method for accurately positioning text block of text area image in scanning test paper

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911218917.8A CN111079742A (en) 2019-11-29 2019-11-29 Method for accurately positioning text block of text area image in scanning test paper

Publications (1)

Publication Number Publication Date
CN111079742A true CN111079742A (en) 2020-04-28

Family

ID=70312627

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911218917.8A Withdrawn CN111079742A (en) 2019-11-29 2019-11-29 Method for accurately positioning text block of text area image in scanning test paper

Country Status (1)

Country Link
CN (1) CN111079742A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814789A (en) * 2020-07-15 2020-10-23 中国建设银行股份有限公司 Card number detection method, device, equipment and storage medium
CN112464931A (en) * 2020-11-06 2021-03-09 马上消费金融股份有限公司 Text detection method, model training method and related equipment
CN112464931B (en) * 2020-11-06 2021-07-30 马上消费金融股份有限公司 Text detection method, model training method and related equipment

Similar Documents

Publication Publication Date Title
CN102567300B (en) Picture document processing method and device
CN105989347B (en) Objective item intelligently reading method and system
CN100565559C (en) Image text location method and device based on connected component and support vector machine
CN111401353B (en) Method, device and equipment for identifying mathematical formula
CN107871101A (en) A kind of method for detecting human face and device
CN105426856A (en) Image table character identification method
CN109241861B (en) Mathematical formula identification method, device, equipment and storage medium
CN110598698B (en) Natural scene text detection method and system based on adaptive regional suggestion network
CN111242024A (en) Method and system for recognizing legends and characters in drawings based on machine learning
CN113537227B (en) Structured text recognition method and system
CN104636742B (en) A kind of method by imaging automatic lock onto target topic and transmitting
CN110443235B (en) Intelligent paper test paper total score identification method and system
CN110516554A (en) A kind of more scene multi-font Chinese text detection recognition methods
CN114005123A (en) System and method for digitally reconstructing layout of print form text
CN112069900A (en) Bill character recognition method and system based on convolutional neural network
CN112541922A (en) Test paper layout segmentation method based on digital image, electronic equipment and storage medium
CN111079742A (en) Method for accurately positioning text block of text area image in scanning test paper
CN115761773A (en) Deep learning-based in-image table identification method and system
CN114119949A (en) Method and system for generating enhanced text synthetic image
CN114863408A (en) Document content classification method, system, device and computer readable storage medium
CN114170423B (en) Image document layout identification method, device and system
CN116824608A (en) Answer sheet layout analysis method based on target detection technology
CN114998905A (en) Method, device and equipment for verifying complex structured document content
CN114821620A (en) Text content extraction and identification method based on longitudinal combination of line text boxes
CN110991440A (en) Pixel-driven mobile phone operation interface text detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200428