CN110175616A - A color-based method for extracting answers from examination paper images

Info

Publication number: CN110175616A
Application number: CN201910406972.3A
Authority: CN
Inventors: 赵海峰; 欧阳广庆; 肖蓉
Original assignee: Nanjing Qingfeng And Intelligent Technology Co Ltd
Current assignee: Nanjing Qingfeng And Intelligent Technology Co Ltd
Priority date: 2019-05-15
Filing date: 2019-05-15
Publication date: 2019-08-27

Abstract

The present invention relates to a color-based method for extracting answers from examination paper images, comprising: step 1, answering questions with pens of different colors according to the question category of the examination paper; step 2, obtaining a digital image of the examination paper, converting the digital image from the RGB color space to the HSV space, obtaining the color region of each color using the component thresholds of that color, and setting gray values to obtain a binary image; step 3, finding all connected domains of the binary image and their corresponding bounding rectangles by chain-code tracing; step 4, for the bounding rectangles of the same color, finding the maximum bounding rectangle to obtain the text region corresponding to each color. The present invention can perform layout analysis on different types of examination papers and effectively extract the different types of answer regions in an examination paper according to color components, meeting the technical requirements of complex examination paper images and higher precision.

Description

A color-based method for extracting answers from examination paper images

Technical field

The present invention relates to the technical field of image processing, and in particular to a color-based method for automatically extracting answers from examination paper images.

Background art

To analyze a printed examination paper automatically, the paper must first be scanned to obtain an examination paper image, which is then analyzed by image processing methods. Examination paper analysis begins with layout analysis, that is, distinguishing the component parts of the paper, including the question regions, the answer regions, the score entry regions, and the examinee information region. Effective extraction of these regions is the basis for subsequent analysis of the paper's content.

In the field of image processing, layout analysis comprises several processes: page segmentation, text block recognition, and layout understanding. Page segmentation divides a document into relatively independent regions according to some logical relationship. Text block recognition then identifies each segmented region as a picture, paragraph, table, and so on.

With current technology, classical page segmentation methods can be divided into hierarchical and non-hierarchical methods. Non-hierarchical methods first segment the original image to obtain a pre-segmentation result, and then perform feature extraction on that result to obtain a more accurate outcome. Such methods achieve high segmentation accuracy, but their algorithmic complexity is high and they are poorly suited to real-time processing. Examples include the blank-background page segmentation method and texture-based page segmentation methods.

Hierarchical methods, as the name suggests, process the document layout level by level. Bottom-up methods model the document image using local features to obtain small regions of the document, and then merge them repeatedly to obtain the regions of the whole document. This approach handles document details well and suits more complex layouts, but its computational complexity is high and it places greater demands on equipment. Top-down methods instead rely on prior knowledge of the document to model its overall distribution and thereby obtain each logical region of the document. This approach is fast but performs poorly on complex layouts. It also requires considerable prior knowledge, which is often difficult to obtain in practice. For example, page segmentation based on context analysis requires the overall shape of the text blocks to be known.

With current technology, whichever method is used is aimed at a specific document structure, and no single method can handle all document cases. Consequently, no document analysis method achieves 100% accuracy. In layout analysis of examination papers, the typesetting of different papers varies widely, and photographs and scans taken at different times by different users are complex and varied. Layout analysis of examination paper images therefore becomes more difficult, and classical algorithms often fail to give satisfactory results across a variety of papers. The present invention provides a layout analysis method based on a different approach, which can effectively overcome the failure of classical algorithms on the layout analysis of different examination papers and is suitable for a variety of scenarios.

Summary of the invention

The purpose of the present invention is to provide a color-based method for extracting answers from examination paper images, which effectively extracts the different types of answer regions in an examination paper by means of color extraction.

To achieve the above purpose, the technical solution of the present invention is as follows:

A color-based method for extracting answers from examination paper images, comprising the following steps:

Step 1: according to the question category of the examination paper, answer the questions with pens of different colors;

Step 2: obtain a digital image of the examination paper, convert the digital image from the RGB color space to the HSV space, obtain the color region of each color using the component thresholds of that color, and set gray values to obtain a binary image;

Step 3: find all connected domains of the binary image and their corresponding bounding rectangles by chain-code tracing;

Step 4: from all the bounding rectangles, find the maximum bounding rectangle to obtain the text region corresponding to each color.

In step 1, the objective-question part of the examination paper is answered with a red pen, the subjective-question part is answered with a green pen, and the drawing part is answered with a blue pen.

In step 2, the red region uses the hue ranges H = [0, 8] and [130, 180] to obtain the binary image of the red region; the green region uses the hue range H = [40, 80] to obtain the binary image of the green region; the blue region uses the hue range H = [100, 124] to obtain the binary image of the blue region.
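As an illustration of the color assignment and hue ranges above, the following minimal Python sketch maps a hue value to a pen color and a question category. It assumes that H is expressed on a 0 to 180 scale (half-degrees, as OpenCV uses for 8-bit images), which the text does not state. The names `HUE_RANGES`, `CATEGORY_BY_COLOR`, `classify_hue`, and `question_category` are hypothetical.

```python
# Hue ranges from the text; the 0-180 scale is an assumption, not stated in the source.
HUE_RANGES = {
    "red": [(0, 8), (130, 180)],
    "green": [(40, 80)],
    "blue": [(100, 124)],
}

# Pen color to question category, following step 1 (labels are illustrative).
CATEGORY_BY_COLOR = {
    "red": "objective",
    "green": "subjective",
    "blue": "drawing",
}


def classify_hue(h):
    """Return the pen color whose hue range contains h, or None if no range matches."""
    for color, ranges in HUE_RANGES.items():
        if any(lo <= h <= hi for lo, hi in ranges):
            return color
    return None


def question_category(h):
    """Return the question category for hue h, or None for an unassigned hue."""
    return CATEGORY_BY_COLOR.get(classify_hue(h))
```

Hues that fall between the listed ranges, such as 90, are left unclassified in this sketch.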

The color-based method for extracting answers from examination paper images of the present invention can perform layout analysis on different types of examination papers and effectively extract the different types of answer regions in an examination paper according to color components, meeting the technical requirements of complex examination paper images and higher precision. In addition, classifying the question types of the paper by color facilitates the construction of classifiers for the comparison process.

Description of the drawings

Fig. 1 is a flow chart of the color-based method for extracting answers from examination paper images in one embodiment of the present invention.

Detailed description of the embodiments

The technical solution of the present invention is described in further detail below with reference to the accompanying drawings and embodiments.

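As shown in Fig. 1, the color-based method for extracting answers from examination paper images of the present invention comprises the following steps:

Step 1: according to the question category of the examination paper, answer the questions with pens of different colors;

Step 2: obtain a digital image of the examination paper, convert the digital image from the RGB color space to the HSV space, obtain the color region of each color using the component thresholds of that color, and set gray values to obtain a binary image;

Step 3: for the binary images of the different color regions, find all connected domains and their corresponding bounding rectangles by chain-code tracing;

Step 4: for the bounding rectangles of the same color, find the maximum bounding rectangle to obtain the text region corresponding to each color.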

The following takes the automatic extraction of answers from a specific answered examination paper as an example.

During answering, the objective-question part of the paper, including multiple-choice, fill-in-the-blank, and true/false questions, is answered with a red pen; the subjective-question part, including short-answer and essay questions, is answered with a green pen; figures are drawn with a blue pen.

The paper is laid flat and photographed with a camera to obtain a digital image of the printed examination paper. The specific color extraction method is to convert the digital image from the RGB color space to the HSV space and to extract the corresponding color regions by applying the H-component thresholds for red, green, and blue.
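The conversion and thresholding described above can be sketched in plain Python as follows. This is a minimal illustration, not the patented implementation: it uses the standard-library `colorsys` module and rescales H to an assumed 0 to 180 range. The function names, the `gray` value of 255, and the `min_s` and `min_v` floors are all assumptions. The floors are not part of the described method; they are added here because white and gray pixels have no meaningful hue and would otherwise be reported with H = 0, which lies inside the red range.

```python
import colorsys


def rgb_to_hsv_180(r, g, b):
    """Convert 8-bit RGB to (H, S, V) with H in 0..179 and S, V in 0..255."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return int(round(h * 180)) % 180, int(round(s * 255)), int(round(v * 255))


def hue_mask(image, hue_ranges, min_s=60, min_v=60, gray=255):
    """Build a binary image from rows of (R, G, B) tuples.

    A pixel is set to `gray` when its hue lies in one of `hue_ranges`
    and it is saturated and bright enough; otherwise it is set to 0.
    The min_s / min_v floors are an assumption of this sketch.
    """
    mask = []
    for row in image:
        out = []
        for r, g, b in row:
            h, s, v = rgb_to_hsv_180(r, g, b)
            in_hue = any(lo <= h <= hi for lo, hi in hue_ranges)
            out.append(gray if in_hue and s >= min_s and v >= min_v else 0)
        mask.append(out)
    return mask


# Tiny 2x2 example image: white paper, red ink, green ink, blue ink.
image = [
    [(255, 255, 255), (220, 20, 30)],
    [(30, 160, 60), (20, 40, 200)],
]
red_mask = hue_mask(image, [(0, 8), (130, 180)])
green_mask = hue_mask(image, [(40, 80)])
blue_mask = hue_mask(image, [(100, 124)])
```

Each call yields one binary image per pen color, which is the input expected by the connected-domain step.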

For the red component in HSV space, values in the ranges H = [0, 8] and [130, 180] are used to obtain the binary image of the red region; the green region uses the hue range H = [40, 80] to obtain the binary image of the green region; the blue region uses the hue range H = [100, 124] to obtain the binary image of the blue region. For the binary image of each color, all connected domains and the bounding rectangles of the corresponding color are found by chain-code tracing.
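The connected-domain step can be illustrated with the short Python sketch below. Note that it does not implement chain-code tracing: it substitutes a breadth-first flood fill with 8-connectivity, which yields the same connected domains and bounding rectangles for a binary image but is simpler to show. The function name `bounding_rectangles` and the `(x_min, y_min, x_max, y_max)` rectangle layout are assumptions.

```python
from collections import deque


def bounding_rectangles(mask):
    """Return (x_min, y_min, x_max, y_max) for each 8-connected region of non-zero pixels."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    rects = []
    for y in range(height):
        for x in range(width):
            if mask[y][x] == 0 or seen[y][x]:
                continue
            # Start a new connected domain and grow it with a flood fill.
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and mask[ny][nx] != 0 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            rects.append((x0, y0, x1, y1))
    return rects


# Binary image with two separate strokes of the same color.
mask = [
    [255, 255, 0, 0, 0],
    [255, 0, 0, 0, 255],
    [0, 0, 0, 255, 255],
    [0, 0, 0, 0, 0],
]
rects = bounding_rectangles(mask)
```

Rectangles are reported in raster-scan order of the first pixel of each connected domain.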

Finally, the maximum bounding rectangle is found among the bounding rectangles of the same color, so as to obtain each connected text region of that color and, ultimately, the text regions of each color category.
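One possible reading of this final step is sketched below in Python. It assumes that the "maximum" bounding rectangle means the smallest rectangle that encloses all bounding rectangles of one color; the text does not define the operation precisely, so this interpretation, the function name `enclosing_rectangle`, and the `(x_min, y_min, x_max, y_max)` layout are all assumptions.

```python
def enclosing_rectangle(rects):
    """Return the smallest rectangle covering all given rectangles, or None if there are none.

    Each rectangle is (x_min, y_min, x_max, y_max).
    """
    if not rects:
        return None
    return (
        min(r[0] for r in rects),
        min(r[1] for r in rects),
        max(r[2] for r in rects),
        max(r[3] for r in rects),
    )


# Hypothetical per-color bounding rectangles from the connected-domain step.
rects_by_color = {
    "red": [(0, 0, 1, 1), (3, 1, 4, 2)],
    "green": [(10, 20, 30, 25)],
    "blue": [],
}
regions = {color: enclosing_rectangle(rects) for color, rects in rects_by_color.items()}
```

A color with no detected strokes maps to `None`, which signals that the corresponding question category has no answer region on the page.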

The color-based method for extracting answers from examination paper images of the present invention can perform layout analysis on different types of examination papers and effectively extract the different types of answer regions in an examination paper according to color components, meeting the technical requirements of complex examination paper images and higher precision. In addition, classifying the question types of the paper by color facilitates designing classifiers by category for the comparison process, and avoids designing a separate classifier for each examination paper, which would be complex, costly, and difficult.

The specific embodiments described above further explain the purpose, technical solution, and beneficial effects of the present invention in detail. It should be understood that the foregoing is merely a specific embodiment of the present invention and is not intended to limit its scope of protection; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A color-based method for extracting answers from examination paper images, characterized by comprising the following steps:

Step 1: according to the question category of the examination paper, answering the questions with pens of different colors;

Step 2: obtaining a digital image of the examination paper, converting the digital image from the RGB color space to the HSV space, obtaining the color region of each color using the component thresholds of that color, and setting gray values to obtain a binary image;

Step 3: for the binary image of each color, finding all connected domains and their corresponding bounding rectangles by chain-code tracing;

Step 4: for the bounding rectangles of the same color, finding the maximum bounding rectangle to obtain the text region corresponding to each color.

2. The color-based method for extracting answers from examination paper images according to claim 1, characterized in that: in step 1, the objective-question part of the examination paper is answered with a red pen, the subjective-question part is answered with a green pen, and the drawing part is answered with a blue pen.

3. The color-based method for extracting answers from examination paper images according to claim 2, characterized in that: in step 2, the red region uses the hue ranges H = [0, 8] and [130, 180] to obtain the binary image of the red region; the green region uses the hue range H = [40, 80] to obtain the binary image of the green region; the blue region uses the hue range H = [100, 124] to obtain the binary image of the blue region.