CN111079742A - Method for accurately positioning text block of text area image in scanning test paper - Google Patents


Info

Publication number
CN111079742A
Authority
CN
China
Prior art keywords
scanning
image
rectangular
small
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911218917.8A
Other languages
Chinese (zh)
Inventor
侯冲
程建
陈家海
叶家鸣
吴波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Seven Day Education Technology Co ltd
Original Assignee
Anhui Seven Day Education Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Seven Day Education Technology Co ltd filed Critical Anhui Seven Day Education Technology Co ltd
Priority to CN201911218917.8A priority Critical patent/CN111079742A/en
Publication of CN111079742A publication Critical patent/CN111079742A/en
Withdrawn legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

The invention discloses a method for accurately locating the text block of the composition area in a scanned test-paper image, and relates to the field of image localization. To address the problem of accurately locating text blocks in the composition area of a scanned test-paper image, a localization method is provided in which a rectangular frame of fixed width is scanned with a fixed step length from the outer border of the image toward its center, and the number of handwriting-containing small images within each scan rectangle is predicted to judge whether the scan rectangle lies on a text-block boundary. The method comprises four parts: cyclic scan segmentation, scan-rectangle re-segmentation, small-image prediction, and boundary localization. The small-image prediction part requires constructing a 9-layer convolutional network and training a model to predict whether a small image contains handwritten characters. The cyclic scanning part scans from each of the four directions; each scan cycle segments one scan rectangle and calls the prediction model once, until the scan rectangle in that direction is judged to be a boundary, thereby accurately locating the text block.

Description

Method for accurately positioning text block of text area image in scanning test paper
Technical Field
The invention belongs to the technical field of image processing, and specifically relates to a method for accurately locating the text block of the composition-area image in a scanned test paper.
Background
In recent years, Internet technology has come to influence every aspect of daily life, from online shopping, food ordering, and ticket booking to map navigation and face-recognition check-in. In the education field, various online paper-marking tools have likewise gradually come to market. Such marking tools generally divide the answer sheet into question-type modules and then process each module separately. The composition-correction module, the most laborious and troublesome part of marking, has therefore attracted particular attention. The general approach in a composition module is to recognize and process the composition, but one outstanding problem is that the composition image cropped by the marking tool is usually the entire composition answer area laid out on the answer sheet, of which the student's actual answer occupies only a part. Recognizing such an image directly causes more errors and wastes computation. If the region actually containing the student's answer could be accurately cropped in one step, the whole composition-scoring module would benefit greatly.
Existing image-localization methods are mainly search-based: candidate images of different sizes at different positions are cropped from the source image, and an image classifier then judges whether each candidate is the desired region. The main problem with this approach is that the size and position of the target are unknown, so thousands of candidate crops must be classified, and the many overlapping parts among the crops are computed repeatedly, consuming substantial computational resources and time.
Recently, deep-learning techniques have surpassed traditional methods in the image domain; in particular, convolution extracts local receptive-field information that subsequent network layers can learn from. However, convolving and training directly on a large image still requires substantial computational resources. It is therefore more practical to design, by hand, a computation flow tailored to the business requirement of locating the text block within a composition image.
The invention combines this business requirement with the research above, greatly reduces the computational resources actually needed, and provides a solution not only for the accurate localization of composition text blocks but also for related application scenarios.
Disclosure of Invention
(I) Technical problem to be solved:
The invention solves the problem that the text block within a cropped composition image cannot be accurately located, and provides a method for accurately locating text blocks in composition areas of scanned test-paper images.
(II) Technical scheme
To achieve the above object, the following conclusions were drawn from investigation and experiment: (1) the text block in a composition image forms a single connected whole, so its boundary positions can be found by scanning from the periphery inward; (2) to judge whether an image contains handwriting, the image can first be cut into smaller pieces, each small image classified by a convolutional network, and the judgment made from the proportion of small images that contain handwriting. Based on these conclusions, the invention provides the following technical scheme: a method for accurately locating text blocks in composition areas of scanned test-paper images, comprising four parts: cyclic scan segmentation, scan-rectangle re-segmentation, small-image prediction, and boundary localization.
Preferably, the cyclic scan segmentation is specified as follows: scanning proceeds from the outer border toward the center from each of the four directions. The scan rectangle is 32 × h when scanning from the left or right side toward the middle and w × 32 when scanning from the top or bottom, with a scan step length of 16. When the scan rectangle in a given direction is predicted to contain handwritten characters, scanning in that direction stops.
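The scan-rectangle extraction described above amounts to plain array slicing; a minimal sketch, assuming the image is a NumPy array of shape (h, w). The function name and direction labels are illustrative, not from the patent:

```python
import numpy as np

STRIP = 32  # scan-rectangle width (pixels)
STEP = 16   # scan step length

def scan_strip(img, direction, k):
    """Return the k-th scan rectangle when moving inward from one side.

    'left'/'right' give 32 x h strips; 'top'/'bottom' give w x 32 strips.
    """
    h, w = img.shape[:2]
    off = k * STEP
    if direction == "left":
        return img[:, off:off + STRIP]
    if direction == "right":
        return img[:, w - off - STRIP:w - off]
    if direction == "top":
        return img[off:off + STRIP, :]
    if direction == "bottom":
        return img[h - off - STRIP:h - off, :]
    raise ValueError(direction)
```

Because the step (16) is half the strip width (32), consecutive strips overlap by half, which is what bounds the final boundary error at 16 pixels.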
Preferably, the re-segmentation of the scan rectangle is specified as follows: the scan rectangle is further sliced into w//32 (or h//32) equilateral 32 × 32 squares.
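The re-slicing into equilateral squares can be sketched as follows; the function name is illustrative, and any remainder narrower than 32 pixels is discarded, as the patent's w//32 count implies:

```python
import numpy as np

def tile_strip(strip, side=32):
    """Slice a scan rectangle into side x side equilateral squares.

    A strip of shape (h, 32) yields h//32 tiles; a strip of shape
    (32, w) yields w//32 tiles. Remainders smaller than `side` drop.
    """
    h, w = strip.shape[:2]
    if w == side:  # vertical strip from the left/right side
        return [strip[i * side:(i + 1) * side, :] for i in range(h // side)]
    return [strip[:, i * side:(i + 1) * side] for i in range(w // side)]
```

Scaling each tile to 224 × 224 for the classifier (e.g. with an image-resize routine) is omitted here.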
Preferably, the small-image prediction by the model is specified as follows: each 32 × 32 square is scaled to 224 × 224, a 9-layer convolutional network is constructed, and a binary-classification model is trained to predict whether the scaled 224 × 224 image contains handwriting.
Preferably, the localization of the four boundaries is specified as follows: when scanning has stopped in all four directions, the last scan rectangle in each direction is output, and the coordinate of its mid-line in that direction gives the corresponding boundary of the text block.
A method for accurately locating text blocks in composition areas of scanned test-paper images comprises the following specific steps:
Step 1, data collection: prepare 300 cropped composition answer-area images, covering as many of the answer-sheet types found in the application scenario as possible;
Step 2, data cutting: cut 32 × 32 small images inward from the four borders with a step of 16, and scale them to 224 × 224 for storage;
Step 3, data annotation: label the scaled small images from Step 2 into two classes, one with handwritten text and one without (the latter are mostly blank or contain printed characters and patterns), selecting about 1000 images per class;
Step 4, model training: construct the 9-layer convolutional network and train a binary-classification model on the data labeled in Step 3;
Step 5, scan and judge: deploy the classification model as a service and build the following processing flow:
(1) a composition image is received;
(2) scan inward along one direction, one strip at a time, to obtain a scan rectangle;
(3) slice the scan rectangle into 32 × 32 small images;
(4) feed the scaled small images into the model for prediction;
(5) if the number of small images predicted to contain handwriting exceeds a threshold, stop scanning in that direction; otherwise repeat (2) to (5);
Step 6, accurate localization: the mid-line coordinate of the last scan rectangle in each of the four directions serves as the accurately determined boundary coordinate of the text block in that direction.
(III) Advantageous effects
The invention provides a method for accurately locating text blocks in composition areas of scanned test-paper images, with the following beneficial effects. It solves the problem of accurately locating the text block of the composition area in a scanned test-paper image, and, exploiting the characteristics of the scene, proposes scanning inward from the four borders to search for the text-block boundaries. A localization problem that would otherwise call for a complex deep network is solved with a 9-layer convolutional model plus image scanning, which on the one hand greatly reduces computational resources and running time, and on the other hand locates the text-block boundaries to within a 16-pixel error, meeting practical application requirements.
Drawings
FIG. 1 is an overall schematic view of the present invention;
FIG. 2 is a schematic view of a single direction cyclic scan of a composition image according to the present invention;
FIG. 3 is a schematic diagram of a rectangular scanogram re-segmentation and prediction in the present invention;
FIG. 4 is a 9-layer convolutional network structure for predicting whether a small graph contains handwritten characters in the present invention;
FIG. 5 is a diagram of the accurate-positioning effect of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and examples. The embodiments described herein are merely illustrative and are not intended to be limiting.
As shown in fig. 1, the technical solution of the present invention is a method for accurately locating text blocks in composition areas of scanned test-paper images, which performs localization by scanning from the periphery of the image toward its center and stopping when the text block is met.
Cyclic scan segmentation of the image: the image is scanned from its outer border toward the center from each of the four directions. The scan rectangle is 32 × h when scanning from the left or right toward the middle and w × 32 when scanning from the top or bottom, with a scan step length of 16. When the scan rectangle in a given direction is judged to contain handwritten characters, scanning in that direction stops. Fig. 2 illustrates scanning from top to bottom: in this example two scans are performed (the two dashed boxes), and the scan rectangle of the second scan contains handwriting, so scanning in that direction stops. The four directions do not affect one another, and the mid-line coordinate of the last scan rectangle in a stopped direction is the boundary coordinate of the text block in that direction.
Scan-rectangle re-segmentation and small-image prediction: the scan rectangle is further sliced into w//32 (or h//32) equilateral 32 × 32 squares, each square is scaled to 224 × 224, and a 9-layer convolutional network model is trained to predict whether each 224 × 224 image contains handwriting, as shown in fig. 3. The 9-layer network takes two 3 × 3 convolution layers, one 1 × 1 convolution layer, and one pooling layer as one block; three such blocks are stacked to give 9 convolution layers, with 64, 128, and 256 convolution kernels in the three blocks respectively. The specific network structure is shown in fig. 4, and the other settings are:
(1) learning rate: 0.1, with a decay ratio of 0.9;
(2) optimizer: Adagrad;
(3) batch size: 64;
(4) epochs: 100.
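The block structure above can be sanity-checked by tracing feature-map shapes through the nine convolution layers. This sketch assumes 'same' padding, stride-1 convolutions, and 2 × 2 max-pooling, details the patent does not state:

```python
def trace_shapes(size=224, blocks=(64, 128, 256)):
    """Trace (spatial size, channels) through the 9-layer network.

    Each block: 3x3 conv, 3x3 conv, 1x1 conv (assumed 'same' padding
    and stride 1, so the spatial size is unchanged), followed by a
    2x2 max-pooling layer, which halves the spatial size.
    """
    shapes = []
    for filters in blocks:
        shapes.extend([(size, filters)] * 3)  # three conv layers per block
        size //= 2                            # pooling layer
    return size, shapes

final_size, layer_shapes = trace_shapes()
# nine conv layers in total; spatial size 224 -> 112 -> 56 -> 28
```

Under these assumptions the classifier head after the third block sees a 28 × 28 × 256 feature map.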
Boundary localization: if, for the scan rectangle in a given direction, the ratio of handwriting-containing small images to the total number of small images sliced from it exceeds a threshold, the cyclic scanning in that direction stops; otherwise it continues, as shown in fig. 3. When scanning has stopped in all four directions, the mid-line coordinates of the four last scan rectangles are the four boundary coordinates of the text block; fig. 5 is a schematic outline of these four boundary coordinates.
A method for accurately locating text blocks in composition areas of scanned test-paper images comprises the following specific steps:
Step 1, data collection: prepare 300 cropped composition answer-area images, covering as many of the answer-sheet types found in the application scenario as possible;
Step 2, data cutting: cut 32 × 32 small images inward from the four borders with a step of 16, and scale them to 224 × 224 for storage;
Step 3, data annotation: label the scaled small images from Step 2 into two classes, one with handwritten text and one without (the latter are mostly blank or contain printed characters and patterns), selecting about 1000 images per class;
Step 4, model training: construct the 9-layer convolutional network and train a binary-classification model on the data labeled in Step 3;
Step 5, scan and judge: deploy the classification model as a service and build the following processing flow:
(1) a composition image is received;
(2) scan inward along one direction, one strip at a time, to obtain a scan rectangle;
(3) slice the scan rectangle into 32 × 32 small images;
(4) feed the scaled small images into the model for prediction;
(5) if the number of small images predicted to contain handwriting exceeds a threshold, stop scanning in that direction; otherwise repeat (2) to (5);
Step 6, accurate localization: the mid-line coordinate of the last scan rectangle in each of the four directions serves as the accurately determined boundary coordinate of the text block in that direction.
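The scanning flow above, minus model training, can be sketched end-to-end. Here `predict_strip` is a stand-in for slicing a scan rectangle into tiles and calling the trained classifier, and all names are illustrative:

```python
import numpy as np

STRIP = 32  # scan-rectangle width (pixels)
STEP = 16   # scan step length

def locate_text_block(img, predict_strip):
    """Scan inward from all four sides of a composition image.

    predict_strip(strip) -> True when the strip is judged to contain
    handwriting. Returns (top, bottom, left, right) boundary
    coordinates, each the mid-line of the last (stopping) scan
    rectangle, or None for a direction that never stopped.
    """
    h, w = img.shape[:2]
    bounds = {}
    for d, limit in (("top", h), ("bottom", h), ("left", w), ("right", w)):
        bounds[d] = None
        k = 0
        while k * STEP + STRIP <= limit:
            off = k * STEP
            if d == "top":
                strip, mid = img[off:off + STRIP, :], off + STRIP // 2
            elif d == "bottom":
                strip, mid = img[h - off - STRIP:h - off, :], h - off - STRIP // 2
            elif d == "left":
                strip, mid = img[:, off:off + STRIP], off + STRIP // 2
            else:  # right
                strip, mid = img[:, w - off - STRIP:w - off], w - off - STRIP // 2
            if predict_strip(strip):
                bounds[d] = mid  # boundary found; stop this direction
                break
            k += 1
    return bounds["top"], bounds["bottom"], bounds["left"], bounds["right"]
```

On a synthetic 256 × 256 image whose "handwriting" occupies rows and columns 96 to 159, the returned mid-line coordinates land on the block's edges; in general the error is bounded by the 16-pixel step length, consistent with the advantageous effects stated above.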
For this application scenario the invention provides a simple and fast text-block localization method, converting a complicated deep-learning localization problem into a simple scanning procedure plus a shallow-network classification problem. This greatly reduces the technical difficulty and implementation threshold, removes interference from subsequent text-recognition work, and lays a foundation for improving recognition accuracy.
The above description is intended only as explanation and not as a limitation of the invention; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within its scope of protection.

Claims (5)

1. A method for accurately locating text blocks in composition areas of a scanned test-paper image, characterized in that a rectangular frame is scanned from the periphery of the image toward its center, and the number of handwriting-containing small images within each scan rectangle is predicted to judge whether the scan rectangle lies on a text-block boundary; the method comprises four parts: cyclic scan segmentation of the composition image, scan-rectangle re-segmentation, small-image prediction, and boundary localization.
2. The method according to claim 1, characterized in that the cyclic scan segmentation of the image is specified as follows: the composition image is scanned from the periphery toward the middle from each of the four directions; the scan rectangle is 32 × h when scanning from the left or right toward the middle and w × 32 when scanning from the top or bottom, with a scan step length of 16; when the scan rectangle in a given direction is judged to contain handwritten characters, scanning in that direction stops.
3. The method according to claim 1, characterized in that the scan-rectangle re-segmentation and small-image prediction are specified as follows: the scan rectangle is further sliced into w//32 (or h//32) equilateral 32 × 32 squares, each square is scaled to 224 × 224, and a 9-layer convolutional network model is trained to predict whether each 224 × 224 image contains handwriting.
4. The method according to claim 1, characterized in that the boundary localization is specified as follows: when the number of handwriting-containing small images sliced from a scan rectangle exceeds a threshold, scanning in that direction stops; when scanning has stopped in all four directions, the mid-line coordinate of the last scan rectangle in each direction is output as the corresponding boundary.
5. A method for accurately locating text blocks in composition areas of a scanned test-paper image, characterized by comprising the following specific steps:
Step 1, data collection: prepare 300 cropped composition answer-area images, covering as many of the answer-sheet types found in the application scenario as possible;
Step 2, data cutting: cut 32 × 32 small images inward from the four borders with a step of 16, and scale them to 224 × 224 for storage;
Step 3, data annotation: label the scaled small images from Step 2 into two classes, one with handwritten text and one without (the latter are mostly blank or contain printed characters and patterns), selecting about 1000 images per class;
Step 4, model training: construct the 9-layer convolutional network and train a binary-classification model on the data labeled in Step 3;
Step 5, scan and judge: deploy the classification model as a service and build the following processing flow:
(1) a composition image is received;
(2) scan inward along one direction, one strip at a time, to obtain a scan rectangle;
(3) slice the scan rectangle into 32 × 32 small images;
(4) feed the scaled small images into the model for prediction;
(5) if the number of small images predicted to contain handwriting exceeds a threshold, stop scanning in that direction; otherwise repeat (2) to (5);
Step 6, accurate localization: the mid-line coordinate of the last scan rectangle in each of the four directions serves as the accurately determined boundary coordinate of the text block in that direction.
CN201911218917.8A 2019-11-29 2019-11-29 Method for accurately positioning text block of text area image in scanning test paper Withdrawn CN111079742A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911218917.8A CN111079742A (en) 2019-11-29 2019-11-29 Method for accurately positioning text block of text area image in scanning test paper

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911218917.8A CN111079742A (en) 2019-11-29 2019-11-29 Method for accurately positioning text block of text area image in scanning test paper

Publications (1)

Publication Number Publication Date
CN111079742A true CN111079742A (en) 2020-04-28

Family

ID=70312627

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911218917.8A Withdrawn CN111079742A (en) 2019-11-29 2019-11-29 Method for accurately positioning text block of text area image in scanning test paper

Country Status (1)

Country Link
CN (1) CN111079742A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814789A (en) * 2020-07-15 2020-10-23 中国建设银行股份有限公司 Card number detection method, device, equipment and storage medium
CN112464931A (en) * 2020-11-06 2021-03-09 马上消费金融股份有限公司 Text detection method, model training method and related equipment
CN112464931B (en) * 2020-11-06 2021-07-30 马上消费金融股份有限公司 Text detection method, model training method and related equipment

Similar Documents

Publication Publication Date Title
CN102567300B (en) Picture document processing method and device
CN105989347B (en) Objective item intelligently reading method and system
CN100565559C (en) Image text location method and device based on connected component and support vector machine
CN111401353B (en) Method, device and equipment for identifying mathematical formula
CN107871101A (en) A kind of method for detecting human face and device
CN105426856A (en) Image table character identification method
CN109241861B (en) Mathematical formula identification method, device, equipment and storage medium
CN110598698B (en) Natural scene text detection method and system based on adaptive regional suggestion network
CN111242024A (en) Method and system for recognizing legends and characters in drawings based on machine learning
CN113537227B (en) Structured text recognition method and system
CN104636742B (en) A kind of method by imaging automatic lock onto target topic and transmitting
CN110443235B (en) Intelligent paper test paper total score identification method and system
CN110516554A (en) A kind of more scene multi-font Chinese text detection recognition methods
CN114005123A (en) System and method for digitally reconstructing layout of print form text
CN112069900A (en) Bill character recognition method and system based on convolutional neural network
CN112541922A (en) Test paper layout segmentation method based on digital image, electronic equipment and storage medium
CN111079742A (en) Method for accurately positioning text block of text area image in scanning test paper
CN115761773A (en) Deep learning-based in-image table identification method and system
CN114119949A (en) Method and system for generating enhanced text synthetic image
CN114863408A (en) Document content classification method, system, device and computer readable storage medium
CN114170423B (en) Image document layout identification method, device and system
CN116824608A (en) Answer sheet layout analysis method based on target detection technology
CN114998905A (en) Method, device and equipment for verifying complex structured document content
CN114821620A (en) Text content extraction and identification method based on longitudinal combination of line text boxes
CN110991440A (en) Pixel-driven mobile phone operation interface text detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200428