WO2004084539A1

WO2004084539A1 - Fill-in document creation device and creation method, fill-in content extraction device and extraction method, fill-in document

Info

Publication number: WO2004084539A1
Application number: PCT/JP2004/003359
Authority: WO
Inventors: Akitoshi Tsukamoto; Yoshinari Endo; Masahiko Suzaki
Original assignee: Oki Electric Industry Co., Ltd.
Priority date: 2003-03-17
Filing date: 2004-03-12
Publication date: 2004-09-30
Also published as: JPWO2004084539A1; JP4341620B2

Abstract

It is possible to realize a fill-in document creation device and a fill-in content extraction device which can perform high processing while suppressing labor cost without requiring a special paper sheet. A document image creation section (102) creates document image data of a document having a predetermined fill-in position. An embed information creation section (103) creates format information and fill-in position detection information indicating the fill-in position on the document and the fill-in content as embed information to be embedded in the document image data. A document data creation section (104) combines the document image data with the embed information to create document data and outputs it as a fill-in document (300) by a document output section (105). A fill-in content extraction device (200) extracts the embed information and judges the fill-in content according to the format information and the fill-in position detection information contained in this embed information.

Description

Entry document creation device and creation method, entry content extraction device and extraction method, and entry document

The present invention relates to an entry document creation device and creation method, an entry content extraction device and extraction method, and an entry document, for automatically extracting the content written in a document such as a questionnaire or answer sheet and converting it into data.

Background Art

Conventionally, there have been the following methods for extracting the contents of a questionnaire or answer sheet.

(1) A device using a mark sheet (for example, see Non-Patent Document 1).

(2) A method in which the answers are extracted from an answer sheet using an OCR device (for example, see Patent Documents 1 and 2 and Non-Patent Document 2).

[Patent Document 1]

Japanese Patent Application Laid-Open No. H8-31506

[Patent Document 2]

Japanese Patent Application Laid-Open No. H10-104993

[Non-Patent Document 1]

Education Software Co., Ltd., "OMIR-300" [online], 2002, [searched on March 7, 2003], Internet <URL: http://Ran.Edsoft.Co.jp/2/3/7/index.html>

[Non-Patent Document 2]

Media Drive Corporation, "Score" [online], 2001, [searched on March 7, 2003], Internet <URL: http://www.mediadrive.co.jp/products/solution/saiten/index.html>

Conventionally, converting the contents of questionnaires and answer sheets into data has required either entering the contents by keyboard or using method (1) or (2) above.

However, operator input errors are unavoidable with keyboard input. Therefore, the so-called "verify method" is usually used, in which two operators enter the same contents and the results are compared to find input errors. As a result, labor costs increase and manual input takes time.

On the other hand, conventional method (1) requires special paper (mark sheet paper) printed in a dropout color, and the answers must be filled in on that paper. The special mark sheet form increases the cost. In addition, the question sheet and the answer mark sheet are distributed separately, so the larger amount of material also increases the cost, and the sheets are harder for respondents to handle.

In the case of method (2), plain paper can be used and answers can be written directly on the question sheet, so the method costs less and is easier to handle than method (1). Another advantage is that handwritten characters can be recognized and converted into data. In general, however, handwritten character recognition performs poorly (accurate recognition using OCR alone is very difficult), and a recognition dictionary must be prepared. In addition, the system must hold format information indicating the positions of the characters to be recognized and how to process them (which answer field belongs to which question). When the question creator and the data creator are different (for example, when data creation is outsourced), how to convey this format information becomes a problem. Furthermore, because character recognition is performed, processing is slow.

Disclosure of the Invention

In order to solve the above-mentioned problems, the present invention creates, in advance, format information on the answer fields provided in a document and entry location detection information for detecting the entry locations in the document, and embeds both in the document. That is, the present invention employs the following configurations.

<Configuration 1>

The present invention is an entry document creation device comprising: a document image creation unit that creates document image data of a document having a predetermined entry location; an embedded information creation unit that creates, as embedded information to be embedded in the document image data, format information indicating the entry location on the document and the entry content, and entry location detection information for detecting whether or not the entry location has been filled in; and a document data creation unit that creates document data by combining the document image data and the embedded information.

<Configuration 2>

In addition, the present invention is an entry document creation device comprising: a document image creation unit that creates document image data of a document having a predetermined entry location; an embedded information creation unit that creates, as embedded information to be embedded in the document image data, format information indicating the entry location on the document and the entry content, and entry location detection information for detecting whether or not the entry location has been filled in; a data storage unit that stores the document image data and the embedded information as integrated document data; and a print processing device that combines the document image data and the embedded information stored in the data storage unit to create document data, prints the document data, and outputs an entry document.

<Configuration 3>

Further, in the entry document creation device according to configuration 1 or 2, the present invention comprises a document data creation unit that creates the document data by embedding the embedded information, represented by a dot pattern, in the document image data.

<Configuration 4>

Further, in the entry document creation device according to any one of configurations 1 to 3, the present invention comprises a document data creation unit that creates document data including document identification information.

<Configuration 5>

Further, the present invention is an entry document creation method using an entry document creation device, comprising: a document image data creation step of creating document image data of a document having a predetermined entry location; a format information creation step of creating format information indicating the entry location and the entry content; an entry location detection information creation step of creating entry location detection information for detecting whether or not the entry location has been filled in; and a document data creation step of creating document data in which the document image data, the format information, and the entry location detection information are integrated.

<Configuration 6>

Further, in the entry document creation method according to configuration 5, the document data creation step includes a step of creating the document data by embedding embedded information represented by a dot pattern in the document image data.

<Configuration 7>

The present invention is also an entry content extraction device that extracts entry content from an entry document integrally having format information for determining the position of an entry location in the document and the entry content at that location, and entry location detection information for detecting whether or not the entry location has been filled in, the device comprising: an embedded information extraction unit that extracts the format information and the entry location detection information from the entry document; an entry location detection unit that detects filled-in entry locations using the entry position information in the format information and the entry location detection information; and an answer data conversion unit that determines the entry content using the detected entry locations and the content determination information in the format information.

<Configuration 8>

Further, in the entry content extraction device according to configuration 7, the entry location detection information is represented by a dot pattern, and the entry location detection unit determines the presence or absence of an entry based on a change in the detection state of the dot pattern.

<Configuration 9>

In addition, the entry content extraction device according to configuration 7 or 8 comprises a visual information output unit that, for an entry document having a free entry column, outputs an image for visually confirming the entry content of the free entry column.

<Configuration 10>

In addition, the present invention is an entry content extraction method that uses an entry content extraction device to extract entry content from an entry document integrally having format information for determining the position of an entry location in the document and the entry content at that location, and entry location detection information for detecting whether or not the entry location has been filled in, the method comprising: an embedded information extraction step of extracting the format information and the entry location detection information from the entry document; an entry location detection step of detecting filled-in entry locations using the entry position information in the format information and the entry location detection information; and an answer data conversion step of determining the entry content using the detected entry locations and the content determination information in the format information.

<Configuration 11>

Also, in the entry content extraction method according to configuration 10, the entry location detection information is represented by a dot pattern, and the entry location detection step includes a step of determining the presence or absence of an entry based on a change in the detection state of the dot pattern.

<Configuration 12>

Further, the entry content extraction method according to configuration 10 or 11 comprises a visual information output step of outputting, for an entry document having a free entry column, an image for visually confirming the entry content of the free entry column.

<Configuration 13>

The present invention is also an entry document characterized by integrally having format information for determining the position of an entry location in the document and the entry content at that location, and entry location detection information for detecting whether or not the entry location has been filled in.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a configuration diagram showing a specific example 1 of an entry content extraction device and an entry document creation device of the present invention.

FIG. 2 is a flowchart showing a process for creating a document for entry in Example 1.

Fig. 3 is an explanatory diagram of a survey form image.

Fig. 4 is an explanatory diagram of the answer entry area.

FIG. 5 is an explanatory diagram of the format information.

FIG. 6 is a diagram showing an example of a signal unit.

FIG. 7 is an explanatory diagram showing a change in pixel value.

FIG. 8 is an explanatory diagram of a background image.

FIG. 9 is an explanatory diagram showing an example of a unit pattern and a symbol represented by the unit pattern.

FIG. 10 is a flowchart showing a watermark image forming process.

FIG. 11 is an explanatory diagram of the creation of the symbol unit arrangement availability matrix.

FIG. 12 is an explanatory diagram showing an example of a process of creating a unit pattern arrangement availability matrix.

FIG. 13 is an explanatory diagram showing an example of a unit pattern matrix.

FIG. 14 is an explanatory diagram showing an example of a unit matrix.

FIG. 15 is an explanatory diagram showing an example of creating a watermarked image.

FIG. 16 is a flowchart of an embedded signal number recording process.

FIG. 17 is an explanatory diagram of step S21.

FIG. 18 is an explanatory diagram of step S22 and step S23.

FIG. 19 is a flowchart of the entry content extraction processing in the specific example 1.

FIG. 20 is an operation flowchart of an embedded information retrieval process.

FIG. 21 is an explanatory diagram of a signal area detection method.

FIG. 22 is an explanatory diagram showing an example of a method of restoring the size of the unit matrix embedded in the attribute area.

FIG. 23 is an explanatory diagram of step S42 and step S43.

FIG. 24 is an explanatory diagram showing an example of a method for extracting a codeword from a unit pattern matrix.

FIG. 25 is an explanatory diagram of the embedded signal number detection processing.

FIG. 26 is an explanatory diagram of a filter output value calculation process.

FIG. 27 is an explanatory diagram of the process of determining the optimum value.

FIG. 28 is an explanatory diagram of a detection signal counting process.

FIG. 29 is an explanatory diagram showing an example of a screen display.

FIG. 30 is a configuration diagram of specific example 2.

FIG. 31 is an explanatory diagram of an entry content extraction process in the specific example 2.

FIG. 32 is a block diagram of a specific example 3.

FIG. 33 is a flowchart showing a process for creating a document for entry in Specific Example 3.

BEST MODE FOR CARRYING OUT THE INVENTION

Hereinafter, embodiments of the present invention will be described in detail using specific examples.

《Example 1》

FIG. 1 is a configuration diagram showing a specific example 1 of an entry document creation device and an entry content extraction device of the present invention.

In the figure, 100 is an entry document creation device, 200 is an entry content extraction device, and 300 is an entry document. The entry document creation device 100 comprises a document creation unit 101, a document image creation unit 102, an embedded information creation unit 103, a document data creation unit 104, and a document output unit 105. The document creation unit 101 is a functional unit that creates a document such as a questionnaire, and is realized using, for example, general word processing software. The document image creation unit 102 is a functional unit that converts the data of the document created by the document creation unit 101 into image data. This can be achieved, for example, by using imaging software that records a print image of the document as an image. The embedded information creation unit 103 is a functional unit that uses the document data imaged by the document image creation unit 102 to create embedded information containing the following three pieces of information.

1. "Entry location detection information"

2. "Format information" indicating the correspondence between the answer column position and the question number

3. "Identification information" indicating the identification number of the document

The document data creation unit 104 is a functional unit that records the embedded information as a dot pattern (paper pattern) on paper, and creates image data of the document and the embedded information as integrated document data.

The document output unit 105 is, for example, a printer or the like, and is a functional unit that prints the document data created by the document data creation unit 104 and outputs it as an entry document 300.

The entry content extraction device 200 includes a document reading unit 201, an embedded information extraction unit 202, an entry location detection unit 203, a response data conversion unit 204, and a visual information output unit 205. The document reading unit 201 is equipped with a scanner; it is a functional unit that reads the image of the entry document 300 in which answers have been filled in and outputs the scanned image data. The embedded information extraction unit 202 is a functional unit that extracts the embedded information from the scanned image data output from the document reading unit 201. That is, it has a function of extracting the three pieces of information (entry location detection information, format information, and identification information) created by the embedded information creation unit 103 of the entry document creation device 100. The entry location detection unit 203 is a functional unit that detects filled-in entry locations from the scanned image data output from the document reading unit 201, based on the entry location detection information and the format information in the embedded information extracted by the embedded information extraction unit 202.

The response data conversion unit 204 is a functional unit that uses the format information to convert the entry locations detected by the entry location detection unit 203 into response content. The visual information output unit 205 is a functional unit that outputs the response content obtained by the response data conversion unit 204, together with the scanned image, to a display or the like so that the operator can visually check whether the conversion result is correct.

The document creation unit 101 through the document data creation unit 104 in the entry document creation device 100, and the document reading unit 201 through the visual information output unit 205 in the entry content extraction device 200, are each realized by software corresponding to the function and by hardware, such as a CPU and memory, that executes the software.

Next, the operation of specific example 1 will be described separately for the entry document creation device 100 and the entry content extraction device 200. The following description assumes that the entry document 300 in specific example 1 is a questionnaire.

[Operation of the entry document creation device 100]

FIG. 2 is a flowchart showing the operation of the entry document creation device 100 of specific example 1. First, the document of the questionnaire is created by the document creation unit 101 (step S1); that is, a document including questions and answer columns is created. Next, the document image creation unit 102 converts the document data created by the document creation unit 101 into image data, and creates a questionnaire image (step S2). FIG. 3 is an explanatory diagram of the created questionnaire image.

In such a questionnaire, the answer columns include check entry columns (indicated, for example, by A in the figure) and free entry columns (indicated by B in the figure). A check entry column is answered by entering a check mark, and a free entry column is answered by writing characters.

Next, the embedded information creation unit 103 creates the format information (step S3), the entry location detection information (step S4), and the identification information (step S5), and creates embedded information containing these pieces of information (step S6). Then, the document data creation unit 104 creates document data that combines the embedded information with the document image data created by the document image creation unit 102 (step S7).

Fig. 4 is an explanatory diagram of the answer entry area.

FIG. 4 shows the answer entry area in the questionnaire image of FIG. 3. In the figure, the check entry columns are referred to in order with labels such as C_1 and C_2, and the free description columns with labels such as F_2 and F_3. Each free description column is assumed to be divided into blocks, up to B_28, as shown in the figure. The coordinate positions of each column are defined as shown in the figure, where the origin (0, 0) of the coordinate system is the upper left corner.

First, the creation of the format information in step S3 will be described. The format information indicates, for each answer field in the answer entry area shown in FIG. 4, its position and which answer to which question it represents, and is as follows.

FIG. 5 is an explanatory diagram of the format information.

As shown in the figure, the format information includes the upper left coordinates (A_x, A_y) and lower right coordinates (B_x, B_y) of the answer column, a question number, and information on answer determination.
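As a rough illustration of how such format information might be held and queried in software, the following Python sketch models one row as a small record. The class name `AnswerField`, its field names, the helper `find_field`, and the coordinates are all hypothetical; the source only states that each row carries the two corner coordinates, a question number, and answer determination information.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AnswerField:
    """One row of the format information (names are hypothetical)."""
    ax: int        # upper-left x (A_x)
    ay: int        # upper-left y (A_y)
    bx: int        # lower-right x (B_x)
    by: int        # lower-right y (B_y)
    question: int  # question number
    answer: str    # answer determination information, e.g. "1" or "free"

    def contains(self, x: int, y: int) -> bool:
        # The origin (0, 0) is the upper-left corner, so y grows downward.
        return self.ax <= x <= self.bx and self.ay <= y <= self.by


def find_field(fields: List[AnswerField], x: int, y: int) -> Optional[AnswerField]:
    """Return the first answer field whose rectangle contains (x, y)."""
    for field in fields:
        if field.contains(x, y):
            return field
    return None


# Made-up coordinates, for illustration only.
fields = [
    AnswerField(100, 200, 120, 220, question=1, answer="1"),
    AnswerField(140, 200, 160, 220, question=1, answer="2"),
]
hit = find_field(fields, 150, 210)
print(hit.question, hit.answer)
```

A lookup of this kind is one way a detected entry position could later be mapped back to a question number and answer.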

In the creation of the entry location detection information in step S4, image feature information for the check entry columns and the free entry columns is created, as described later. The entry location detection information is created separately for each check entry column, and separately for each block of a free entry column.

Next, the process of creating identification information in step S5 will be described.

The identification information is the type number of the questionnaire, the ID and page number for each copy, and is used as a key for classifying the response data. This may be given manually or automatically by the system as a serial number.

In addition, in this specific example, the creation of the embedded information in step S6 is performed as an integrated process with the document data creation process in step S7 by the document data creation unit 104. Hereinafter, the process of creating embedded information and the process of creating document data will be described in detail.

First, the principle for representing embedded information with a watermark image composed of a dot pattern will be described.

The watermark signal making up the watermark image is represented by signal units, each of which expresses a wave of arbitrary wavelength and direction by an arrangement of dots (black pixels). FIG. 6 is a diagram showing an example of a signal unit.

Hereinafter, a rectangle whose width and height are Sw and Sh is referred to as one signal unit. The width Sw and the height Sh may differ, but in this specific example it is assumed that Sw = Sh for ease of explanation. Lengths are measured in pixels, and in the example of FIG. 6, Sw = Sh = 12. The printed size of a signal unit depends on the resolution of the image; for example, if the image is a 600 dpi image, the width and height of the signal unit of FIG. 6 on the printed document are 12 / 600 = 0.02 (inch).
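The size calculation above can be checked with a very small Python sketch. The function name `printed_size_inch` is hypothetical; the values 12 pixels and 600 dpi are the ones given in the text.

```python
def printed_size_inch(size_px: int, dpi: int) -> float:
    """Printed length in inches of size_px pixels at the given resolution."""
    return size_px / dpi


# Signal unit of FIG. 6: Sw = Sh = 12 pixels, printed at 600 dpi.
print(printed_size_inch(12, 600))  # 0.02
```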

In FIG. 6 (1), the dots are densely arranged in the direction of arctan(3) with respect to the horizontal axis, and the propagation direction of the wave is arctan(-1/3). Hereinafter, this signal unit is referred to as unit A. In FIG. 6 (2), the dots are densely arranged in the direction of arctan(-3) with respect to the horizontal axis, and the propagation direction of the wave is arctan(1/3). Hereinafter, this signal unit is referred to as unit B.

FIG. 7 is a cross-sectional view of the change in pixel value as seen from the arctan(1/3) direction.

In FIG. 7, the parts where dots are arranged are the antinodes of the minimum value of the wave (the points where the amplitude is maximum), and the parts where no dots are arranged are the antinodes of the maximum value of the wave. Also, since each unit contains two regions where dots are densely arranged, the frequency per unit is 2 in this example. Since the propagation direction of the wave is perpendicular to the direction in which the dots are densely arranged, the wave of unit A is in the direction arctan(-1/3) with respect to the horizontal direction, and the wave of unit B is in the direction arctan(1/3). When the direction of arctan(a) is perpendicular to the direction of arctan(b), a × b = -1. Note that signal units with other dot arrangements are also conceivable.
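To make the idea of a signal unit more concrete, the following Python sketch builds two 12 × 12 dot arrays whose dots lie on lines of slope 3 and slope -3, with two dense lines per unit. This is only an illustrative construction: the exact dot positions of FIG. 6, the sign convention of the vertical axis, and the function names `make_unit_a`, `make_unit_b`, and `render` are assumptions, not taken from the source.

```python
SIZE = 12  # Sw = Sh = 12 pixels, as in the example of FIG. 6


def make_unit_a(size: int = SIZE):
    """Dots on lines 3x - y = const, repeating every size/2 pixels horizontally."""
    period = 3 * (size // 2)  # gives two dense lines per unit (frequency 2)
    return [[1 if (3 * x - y) % period == 0 else 0 for x in range(size)]
            for y in range(size)]


def make_unit_b(size: int = SIZE):
    """Mirror image of unit A, so the dots lie on lines of the opposite slope."""
    return [row[::-1] for row in make_unit_a(size)]


def render(unit) -> str:
    """Draw a unit with '#' for a dot and '.' for blank paper."""
    return "\n".join("".join("#" if v else "." for v in row) for row in unit)


unit_a = make_unit_a()
unit_b = make_unit_b()
print(render(unit_a))
print()
print(render(unit_b))
```

Because unit B is built as a mirror image of unit A, both contain the same number of dots, which matches the requirement stated in the text that all signal units have equal dot counts so that the printed background looks uniform.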

The embedded information can be embedded in the watermark image by assigning a codeword symbol to each signal unit and embedding the signal units in the watermark image. Hereinafter, a signal unit to which a codeword symbol is assigned is referred to as a "symbol unit".

The required number of symbol units is determined by the number of dimensions used to convert the embedded information into codewords. When the information is embedded as a binary code (number of dimensions 2), two types of symbol units (for example, unit A and unit B) are prepared. For example, symbol 0 can be assigned to unit A and symbol 1 to unit B. When the embedded information is encoded with a larger number of dimensions, symbol units corresponding to that number of dimensions are prepared.

Also, for example, a symbol unrelated to the codeword symbols (for example, symbol N when the information is N-ary encoded) can be assigned to unit C, which is then defined as a background unit; arranging background units without gaps forms the background of the watermark image. Hereinafter, a signal unit to which a symbol unrelated to the codeword symbols is assigned is referred to as a "background unit". When symbol units are embedded in a background of background units arranged without gaps, the background unit at each embedding position is replaced with the symbol unit to be embedded.

FIG. 8 is an explanatory diagram of a background image.

FIG. 8 (1) shows unit C defined as a background unit and arranged without gaps to form the background of the watermark image. FIG. 8 (2) shows an example in which unit A is embedded as a symbol unit in the background image of FIG. 8 (1), and FIG. 8 (3) shows an example in which unit B is embedded as a symbol unit in the background image of FIG. 8 (1).

As shown in FIGS. 8 (1) to 8 (3), the number of dots in each signal unit is the same, so arranging these signal units without gaps gives the watermark image a uniform apparent density. On a printed page, it therefore looks as if a gray image of a single density is embedded as the background. There are countless combinations of symbol assignments to signal units, which makes it possible to prevent the respondent or a third party (an unauthorized person) from easily decoding the embedded information. The embedded information can also be embedded in the watermark image simply by arranging, for each symbol of the codeword into which the embedded information has been encoded, the corresponding symbol unit. In this specific example, to further strengthen protection against unauthorized decoding by a third party, an arrangement pattern of signal units (hereinafter referred to as a unit pattern) is defined for each symbol of the codeword, and a method of embedding the embedded information in the watermark image by arranging these unit patterns is described.

Here, one unit pattern is defined as a matrix of signal units of width (columns) × height (rows) = 4 × 2. The background unit is unit C (symbol 2), and the symbol units embedded in it are unit A (symbol 0) and unit B (symbol 1). In FIG. 9 (1), at least a predetermined threshold number (for example, 6) of units A (symbol 0) are arranged, so the unit pattern as a whole represents symbol 0. In FIG. 9 (2), at least a predetermined threshold number (for example, 6) of units B (symbol 1) are arranged, so the unit pattern as a whole represents symbol 1. In FIG. 9 (3), units A and units B are arranged in almost equal numbers (the same number, or one of them slightly more numerous), and the unit pattern as a whole represents symbol 2.
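The threshold rule for a unit pattern can be sketched in Python as follows. The function name `unit_pattern_symbol` and the flat-list representation of the 4 × 2 pattern are assumptions made for illustration; the threshold 6 is the example value given in the text.

```python
UNIT_A, UNIT_B, UNIT_C = 0, 1, 2  # symbols assigned to the signal units
THRESHOLD = 6                      # "for example, 6" in the text


def unit_pattern_symbol(units, threshold: int = THRESHOLD) -> int:
    """Return the symbol represented by a unit pattern.

    `units` lists the symbols of the 8 signal units of one 4 x 2 pattern.
    """
    count_a = sum(1 for u in units if u == UNIT_A)
    count_b = sum(1 for u in units if u == UNIT_B)
    if count_a >= threshold:
        return 0  # pattern dominated by unit A
    if count_b >= threshold:
        return 1  # pattern dominated by unit B
    return 2      # neither dominates: background pattern


print(unit_pattern_symbol([0, 0, 0, 0, 0, 0, 2, 2]))  # 0
print(unit_pattern_symbol([1, 1, 1, 1, 1, 1, 1, 0]))  # 1
print(unit_pattern_symbol([0, 1, 0, 1, 0, 1, 0, 1]))  # 2
```

Requiring only 6 of the 8 units to agree leaves room for a few units that cannot be embedded or cannot be read back reliably.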

Next, an actual watermark image creation process will be described.

It is assumed that the data shown in the questionnaire is binary black and white. For example, it is assumed that the frame and characters of the answer column are black (1) and the background is white (0).

FIG. 10 is a flowchart showing a watermark image forming process.

First, in step S11, the embedded information is converted into an N-ary code. N is arbitrary, but in the following, for simplicity of explanation, N = 2 is assumed (the embedded information is converted into a binary code). The codeword generated in step S11 is therefore represented by a bit string of 0s and 1s. The embedded information may be encoded as it is, or it may be encrypted first and then encoded. Encoding may also be performed using an error correction code.
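For the binary case (N = 2), the conversion of step S11 can be illustrated with a minimal Python sketch that turns bytes into a list of 0/1 symbols. The function name `to_bits` and the most-significant-bit-first order are assumptions; the source does not specify a bit order, and this sketch applies neither encryption nor an error correction code.

```python
def to_bits(data: bytes) -> list:
    """Convert bytes to a list of 0/1 symbols, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


print(to_bits(b"A"))  # [0, 1, 0, 0, 0, 0, 0, 1]
```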

Next, a unit pattern is assigned to each symbol of the codeword as shown in FIG. 9 (step S12).

Next, a symbol unit arrangement availability matrix is defined (step S13). The document image is divided into blocks of Sw (width) × Sh (height) pixels, and each element of the matrix indicates whether a symbol unit can be embedded in the corresponding block. If a symbol unit is embedded in a character area, it cannot be detected, so this matrix designates in advance the places where symbol units may be embedded. If the value of an element is 1, a symbol unit can be embedded in the corresponding block of the document image; if the value is 0, a background unit is embedded there. Here, Sw and Sh are the width and height of the signal unit, respectively. If the size of the input document image is W × H, the number of elements of the unit matrix Um is width (columns) × height (rows) = Mw × Mh = (W / Sw) × (H / Sh).

Each element of the symbol unit arrangement availability matrix is determined by whether or not a character area exists in the corresponding block of the document image. For example, element (X, Y) (row Y, column X) of the symbol unit arrangement availability matrix is set to 1 if the character area (pixels with a luminance value of 0) contained in the block x = X × Sw to (X + 1) × Sw, y = Y × Sh to (Y + 1) × Sh of the input document image is Tn pixels or fewer, and to 0 if the character area is larger than Tn pixels. Tn is a threshold value, a small number no greater than Sw × Sh × 0.5.

FIG. 11 is an explanatory diagram of creating a symbol unit arrangement possibility matrix.

FIG. 11 (1) shows the blocks corresponding to the elements of the symbol unit arrangement availability matrix superimposed on the input document image. FIG. 11 (2) shows that the value of a block is set to 0 when the block contains a character area. FIG. 11 (3) shows the value of each element of the symbol unit arrangement availability matrix determined from the character area determination result.
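The block-wise test of step S13 can be sketched in Python as follows. The sketch assumes the document image is a list of rows in which 1 marks a character (black) pixel and 0 marks background, following the binary convention stated earlier, and it treats each block as a half-open pixel range; the function name `symbol_unit_availability` and the tiny example image are hypothetical.

```python
def symbol_unit_availability(image, sw: int, sh: int, tn: int):
    """Return the Mh x Mw matrix of 1 (symbol unit allowed) / 0 (not allowed).

    image[y][x] is 1 for a character pixel and 0 for background (assumption).
    """
    mh = len(image) // sh      # number of rows, Mh = H / Sh
    mw = len(image[0]) // sw   # number of columns, Mw = W / Sw
    matrix = []
    for block_row in range(mh):
        row = []
        for block_col in range(mw):
            character_pixels = sum(
                image[y][x]
                for y in range(block_row * sh, (block_row + 1) * sh)
                for x in range(block_col * sw, (block_col + 1) * sw)
            )
            # At most Tn character pixels: a symbol unit may be embedded here.
            row.append(1 if character_pixels <= tn else 0)
        matrix.append(row)
    return matrix


# Toy 6 x 4 image with 2 x 2 blocks, for illustration only.
image = [
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(symbol_unit_availability(image, sw=2, sh=2, tn=1))  # [[1, 0, 1], [1, 1, 1]]
```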

Next, a unit pattern arrangement availability matrix is created (step S14). Each element of this matrix is 1 if a unit pattern can be embedded in the corresponding area of the document image, and 0 if it cannot. If the unit pattern is defined as a matrix of signal units of width (columns) × height (rows) = 4 × 2, whether a unit pattern can be embedded is determined as follows. First, the symbol unit arrangement availability matrix shown in FIG. 11 (3) is divided into 4 × 2 areas. If, of the eight signal units making up one area, at least a predetermined threshold Tu (Tu is about 6) can hold a symbol unit (the value of the symbol unit arrangement availability matrix is 1), a unit pattern can be embedded in that area; otherwise, it cannot.

FIG. 12 is an explanatory diagram showing an example of a process of creating a unit pattern arrangement possibility matrix. Fig. 12 (1) shows that one unit pattern is composed of eight signal units. Fig. 12 (2) shows that, for each unit pattern, the elements of the corresponding symbol unit arrangement availability matrix whose value is 1 are counted, and the unit pattern is given 1 if the count is equal to or greater than Tu (= 6) and 0 otherwise. Fig. 12 (3) shows the resulting values of the elements of the unit pattern placement availability matrix.
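The counting rule of step S14 can be sketched as below. The function name `unit_pattern_availability` is hypothetical, and the sketch assumes the availability matrix dimensions are multiples of the 4 × 2 unit pattern size.

```python
def unit_pattern_availability(symbol_avail, pw=4, ph=2, tu=6):
    """Group the symbol unit availability matrix into pw x ph areas.

    An area is marked 1 when at least `tu` of its signal unit positions
    are available (value 1), and 0 otherwise.
    """
    rows = len(symbol_avail) // ph
    cols = len(symbol_avail[0]) // pw
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            usable = sum(
                symbol_avail[r * ph + dy][c * pw + dx]
                for dy in range(ph)
                for dx in range(pw)
            )
            out_row.append(1 if usable >= tu else 0)
        result.append(out_row)
    return result
```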

Next, a unit pattern matrix is created with reference to the unit pattern arrangement availability matrix (step S15). Codeword symbols are set repeatedly in the unit pattern matrix, but are not set in elements where unit patterns cannot be embedded.

FIG. 13 is an explanatory diagram showing an example of a unit pattern matrix.

For example, as shown in FIG. 13, assume that the size of the unit pattern matrix and of the unit pattern arrangement possibility matrix is Pw × Ph = 4 × 3, and that the codeword consists of the 4 bits (0011). In this figure, since the value of the element in the first row and second column of the unit pattern placement availability matrix is 0, symbol 2 is set there instead of the second bit (symbol 0) of the codeword, and the second bit of the codeword is set in the first row, third column.
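The skipping behaviour in this example can be sketched as follows. The raster (row by row) assignment order, the 4-bit codeword 0011 used in the usage note, and the function name `build_unit_pattern_matrix` are assumptions made for illustration.

```python
def build_unit_pattern_matrix(availability, codeword):
    """Fill available elements with codeword bits, repeating the codeword.

    availability: unit pattern arrangement availability matrix (0/1).
    codeword: list of bits (0 or 1).
    Unavailable elements receive symbol 2 and do not consume a bit.
    """
    matrix = []
    i = 0  # index of the next codeword bit to place
    for row in availability:
        out = []
        for ok in row:
            if ok:
                out.append(codeword[i % len(codeword)])
                i += 1
            else:
                out.append(2)
        matrix.append(out)
    return matrix
```

With a 4 × 3 availability matrix whose first-row, second-column element is 0, the second codeword bit lands in the first row, third column, which matches the description of FIG. 13.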

Next, a unit matrix Um is created based on the unit pattern matrix and the symbol unit arrangement availability matrix (step S16). The unit matrix Um is the same size as the symbol unit arrangement availability matrix, and describes the arrangement pattern of the signal units. The rules for arranging signal units are defined as follows.

FIG. 14 is an explanatory diagram showing an example of a unit matrix.

■ Rule 1: The background unit (Symbol 2) is set at the position where the element is 0 in the symbol unit arrangement availability matrix (Fig. 14 (1)).

■ Rule 2: If the element of the unit pattern matrix is a codeword symbol, set the symbol unit corresponding to that symbol in the corresponding area of the unit matrix Um (Fig. 14 (2)).

■ Rule 3: If the element of the unit pattern matrix is not a codeword symbol (the value of the unit pattern placement availability matrix is 0), set equal numbers of symbol units indicating 0 and symbol units indicating 1 in the corresponding area (Fig. 14 (3)).

■ Rule 4: Set the background unit in the areas where no signal unit has been set (Fig. 14 (4)).

In summary, a background symbol is set in character areas; a codeword symbol is assigned to any unit pattern whose background area is Tu (= 6) units or more; and in other cases the two types of symbol unit are allotted in equal numbers in the background area. If the number of background units is odd, the remaining one is given a background symbol. As a result, six or more identical symbol units are set in a unit pattern to which a codeword symbol is assigned, so that at detection time the sum of the filter output values for the embedded symbol unit becomes large, whereas in a unit pattern to which no codeword symbol is assigned, the difference between the sums of the output values of the two filters becomes small. This makes it easy to determine whether a unit pattern is one to which a codeword symbol has been assigned.
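The four rules can be sketched in code as follows. This is an illustrative reading only: the function name `build_unit_matrix`, the use of 2 for the background unit, and the order in which the 0 and 1 units are laid out under Rule 3 are assumptions.

```python
BACKGROUND = 2  # background unit (symbol 2)

def build_unit_matrix(symbol_avail, pattern_matrix, pw=4, ph=2):
    """Apply Rules 1 to 4 to produce the unit matrix Um."""
    # Rules 1 and 4: everything starts as background; only available
    # positions are overwritten below.
    um = [[BACKGROUND] * len(row) for row in symbol_avail]
    for r, prow in enumerate(pattern_matrix):
        for c, symbol in enumerate(prow):
            cells = [
                (r * ph + dy, c * pw + dx)
                for dy in range(ph)
                for dx in range(pw)
                if symbol_avail[r * ph + dy][c * pw + dx] == 1
            ]
            if symbol in (0, 1):
                # Rule 2: place the codeword symbol in every available cell.
                for y, x in cells:
                    um[y][x] = symbol
            else:
                # Rule 3: equal numbers of 0 and 1 units; with an odd number
                # of available cells the last one stays background.
                half = len(cells) // 2
                for k, (y, x) in enumerate(cells[: 2 * half]):
                    um[y][x] = 0 if k < half else 1
    return um
```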

Next, a watermarked image (image in which document image data and embedded information are superimposed) is created (step S17).

In this step S17, the signal units are arranged on the background image according to the unit matrix Um created in step S16 (FIG. 15 (1), (2)). The document image is then superimposed on the background image created in this way to produce a watermarked document image (Fig. 15 (3)).

Next, the number of embedded signals is recorded.

FIG. 16 is a flowchart of the embedded signal number recording process.

First, the unit matrix is divided into blocks (step S21). FIG. 17 is an explanatory diagram of step S21.

In step S21, first, the Iw columns of elements at the left end of the unit matrix Um (FIG. 17 (2)) are reserved as units for recording the number of embedded symbol units (called the recording unit band) (Fig. 17 (3)). Next, the part of the unit matrix Um excluding the recording unit band is divided into (horizontal × vertical =) Bw × Bh blocks; the result is the unit number recording unit matrix Nu(x, y), x = 1 to Bw, y = 1 to Bh. The size of each block, counted in elements of the unit matrix Um, is (width × height =) bw × bh (Fig. 17 (4)). When the recording unit band is placed at the left end of the unit matrix Um, the parameters that can be set for the unit number recording unit matrix are the number of blocks in the horizontal direction and the height of the blocks. The remaining values, namely the number of blocks in the vertical direction and the width of the blocks, are determined automatically from the set parameters, the width of the recording unit band, and the size of the unit matrix Um.

In the following explanation, the size (number of elements) of the unit matrix Um is Mw × Mh, the number of blocks in the horizontal direction is Bw = 4, the block height is bh = 16, and the width of the recording unit band is Iw = 4. Therefore, the number of blocks in the vertical direction is Bh = Mh / bh = Mh / 16, and the block width is bw = (Mw - Iw) / Bw = (Mw - 4) / 4.

Next, the number of signals in each block is measured (step S22), and the number of signals is coded and recorded (step S23).

FIG. 18 is an explanatory diagram of step S22 and step S23. In step S22, the number of symbol units contained in an area corresponding to each element of the unit number recording unit matrix in the unit matrix Um is measured. The example of FIG. 18 shows a method of measuring the number of symbol units in the unit number recording unit matrix Nu (X, Y), and is executed by the following steps.

■ Step 1: Extract the area in the unit matrix Um corresponding to Nu (X, Y) (Fig. 18 ①, ②).

■ Step 2: Count the number of symbol units embedded in the area extracted in step 1 (Fig. 18 ③, ④). Here, as described above, the embedding rule is that symbol units are not embedded in the character area of the input document image. In the example of FIG. 18, it is assumed that the number of symbol units embedded in this area is 71.

In step S23, the number of symbol units measured in step S22 is recorded in the recording unit band. The steps are described below.

■ Step 3: N (X, Y) = 71 is represented as a binary number (Fig. 18 ⑥).

■ Step 4: Set the result of Step 3 in the corresponding area of the recording unit band (Fig. 18 ⑦, ⑧).

In the example shown here, the number of rows bh of the unit matrix Um corresponding to one row of the unit number recording unit matrix is 16 and the width Iw of the recording unit band is 4, so the number of recording units available for one row of the unit number recording unit matrix is Iw × bh = 4 × 16 = 64. In addition, since the number of columns Bw of the unit number recording unit matrix is 4, the number of recording units (called unit recording units) assigned to one element of the unit number recording unit matrix is Iw × bh / Bw = 64 / 4 = 8. Therefore, in the part of the recording unit band corresponding to each row of the unit recording unit matrix, the first and second rows hold the information for the first column of the unit recording unit matrix, the third and fourth rows hold the information for the second column, the fifth and sixth rows hold the information for the third column, and the seventh and eighth rows hold the information for the fourth column, each recorded in unit recording units (8 bits).
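The block layout and the binary recording of a count can be sketched as below. The function names are hypothetical, the block width is derived from the matrix width Mw (an assumption about the intended formula), and the 8-bit field width is taken directly from the text rather than computed.

```python
def recording_layout(mw, mh, bw_blocks=4, bh_size=16, iw=4):
    """Derive the dependent layout values from the settable parameters.

    Returns (Bh, bw, band_units_per_row):
      Bh = Mh / bh            number of blocks in the vertical direction
      bw = (Mw - Iw) / Bw     block width in unit matrix elements
      band_units_per_row = Iw * bh, recording units per row of Nu
    """
    blocks_vertical = mh // bh_size
    block_width = (mw - iw) // bw_blocks
    band_units_per_row = iw * bh_size
    return blocks_vertical, block_width, band_units_per_row


def encode_count(count, bits=8):
    """Represent a symbol unit count as a fixed-width list of bits (MSB first)."""
    if count < 0 or count >= 2 ** bits:
        raise ValueError("count does not fit in the unit recording units")
    return [int(b) for b in format(count, "0{}b".format(bits))]
```

For the count of 71 in the example of FIG. 18, `encode_count(71)` gives the bit pattern 01000111, which would be written as two rows of four recording units.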

In this specific example, the number of units is recorded, but the ratio of the number of symbol units to the maximum number of signal units that can be embedded in the area of the unit matrix Um corresponding to each element of the unit recording unit matrix may be recorded instead. Recording the ratio is effective when the area of the unit matrix Um corresponding to each element of the unit recording unit matrix is large, so that the number of units it contains exceeds the number that can be represented by the unit recording units, or when the number of unit recording units allocated to represent one element of the unit recording unit matrix has been reduced because the number of columns of the unit recording unit matrix has been increased. Also, since the entry location is identified for each element of the unit recording unit matrix, increasing the number of rows and columns of the unit recording unit matrix for the same input document image allows entry locations on the printed document to be identified in finer detail, but requires either a larger recording unit band or fewer unit recording units per element.

The recording unit band is set in the margin of the watermarked image so that it does not overlap the character area of the document image. A similar effect can also be obtained when the recording unit band is set at the right end, upper end, or lower end of the unit matrix Um, provided that the subsequent processing is performed on the assumption that the recording unit band is above and below the document image.

Further, recording unit bands may be set on both the left and right of the unit matrix Um, with the same information set in each. In this case, even if the paper becomes dirty and the information of one recording unit band cannot be read, the information can be read from the other recording unit band, so that the entry point detection processing can be performed stably. The same applies in the vertical direction.

Returning to FIG. 2, in step S8, the document output unit 105 prints the document data output from the document data creation unit 104 and outputs it as the entry document 300. Note that the entry document 300 is, for example, the survey form image shown in FIG. 4 with a pattern such as that shown in FIG. 8 printed on it as a copy-forgery-inhibited pattern.

[Operation of entry content extraction device 200]

FIG. 19 is an operation flowchart of the entry content extraction device 200 in the specific example 1. First, the document reading unit 201 reads the survey form (entry document 300) in which the answers have been entered (step S31). As a result, a scanned image of the survey form is obtained. Next, the embedded information fetching unit 202 fetches the embedded information (step S32), and then separates it into the entry location detection information, the format information, and the identification information (step S33).

Next, the entry point detection unit 203 detects entry points in the survey form using the entry point detection information (step S34). The entry location detection information is recorded for each check entry column or for each block in the free entry column. As a result of entry location detection using this information, a checked column or a filled-in block of the free entry column can be identified. That is, entry location detection is performed by detecting the presence or absence of an entry based on a change in the detection state of the dot pattern. Hereinafter, the operations from extracting the embedded information (step S32) to detecting the entries (step S34) are described in detail.

FIG. 20 is an operation flowchart showing a process of extracting embedded information.

First, an outline of a region where a signal unit is embedded (hereinafter, referred to as a signal region) is detected from a scanned image, and correction such as rotation of the image is performed.

FIG. 21 is an explanatory diagram of a signal area detection method.

FIG. 21 (1) is, for example, the scanned image read in step S31. Here, an example in which the upper end of the signal area is detected is shown. Let the input image Img be I (x, y), x = 0 to Wi - 1, y = 0 to Hi - 1. Also, let the size of the signal unit embedded in the document by the entry document creation device 100 be width × height = Sw × Sh (pixels), the print resolution of the document output unit 105 be Dout (dpi), and the reading resolution of the scanner in the document reading unit 201 be Din (dpi). Then

tSw = Sw × Din / Dout

tSh = Sh × Din / Dout

That is, tSw and tSh are the theoretical width and height of the signal unit in Img, and the signal detection filters such as filter A and filter B are designed based on these values.

In this image Img, sample areas S (x), x = 1 to Sn, are set for detecting the upper end of the signal area. Here Sn = Wi / Np (Np is an integer of about 10 to 20). The width of S (x) is Ws = tSw × Nt (Nt is an integer of 2 to 5), the height is Hs = Hi / Nh (Nh is about 8), and the horizontal position of S (x) in Img is x × Np.

A method of detecting the upper end SY0 (n) of the signal area in an arbitrary S (n) is described below.

■ Step 1: Cut out the region corresponding to S (n) from Img (Fig. 21 ①).

■ Step 2: Apply filter A and filter B to S (n), and record in Fs (y) the maximum output value in the horizontal direction within S (n) (Fig. 21 ②).

■ Step 3: Set a threshold Ty, and let V0 (Ty) be the average value of Fs (1) to Fs (Ty - 1) and V1 (Ty) be the average value of Fs (Ty) to Fs (Hs). The Ty at which V1 (Ty) - V0 (Ty) becomes maximum is taken as SY0 (n), the position of the upper end of the signal area in S (n) (Fig. 21 ③).
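Step 3 amounts to a one-dimensional split search over the filter response profile. A minimal sketch follows, assuming Fs is supplied as a Python list indexed from y = 1 and that V0 averages the samples strictly above the candidate Ty; the function name `find_upper_edge` is hypothetical.

```python
def find_upper_edge(fs):
    """Return the 1-based Ty that maximises V1(Ty) - V0(Ty).

    fs: list of filter outputs Fs(1) .. Fs(Hs); fs[0] corresponds to y = 1.
    V0(Ty) is the mean of Fs(1)..Fs(Ty-1), V1(Ty) the mean of Fs(Ty)..Fs(Hs).
    """
    best_ty, best_gap = None, float("-inf")
    # Start at Ty = 2 so that the region above the split is never empty.
    for ty in range(2, len(fs) + 1):
        above = fs[: ty - 1]
        below = fs[ty - 1:]
        v0 = sum(above) / len(above)
        v1 = sum(below) / len(below)
        if v1 - v0 > best_gap:
            best_gap, best_ty = v1 - v0, ty
    return best_ty
```

For a profile that is low in the margin and high inside the signal area, the returned Ty is the first row of the high region.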

FIG. 21 (4) is a diagram showing the change in the value of Fs (y) with respect to y. As shown in the figure, the average output value of the signal detection filter is small in areas of Img without signal units, whereas in the signal area the output value is large because symbol units (unit A or unit B) are densely arranged (the margins of the document are background, and units are densely embedded there as well). Therefore, the output value of the signal detection filter changes sharply near the boundary between the signal area and the other areas, and this is used for area detection.

The above steps 1 to 3 are performed for S (x), x = 1 to Sn, to obtain SY0 (x), x = 1 to Sn. The upper end of the signal area is obtained by linearly approximating the resulting sample points (x × Np, SY0 (x)), x = 1 to Sn, using the least squares method or the like. The other contour lines are detected using the same method. The image obtained by rotating and moving the scanned image so that the upper end of the signal area is horizontal is hereinafter referred to as the input image.
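The linear approximation of the sample points can be sketched with an ordinary least squares fit. The function name `fit_edge_line` and the use of the arctangent of the slope as the rotation correction angle are assumptions for illustration; the text only states that the least squares method or a similar method is used.

```python
import math

def fit_edge_line(points):
    """Least squares fit y = slope * x + intercept through edge samples.

    points: list of (x, y) pairs, e.g. (x * Np, SY0(x)).
    Returns (slope, intercept, angle_in_radians).
    """
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("sample points must have at least two distinct x values")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    # Angle by which the image would be rotated to make the edge horizontal.
    return slope, intercept, math.atan(slope)
```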

Here, an example is described in which the signal area of the input image is (Ix0, Iy0) to (Ix1, Iy1) and the information in the attribute recording area is restored.

■ Step 1: Cut out the area near (Ix0, Iy0) of the input image (Fig. 22 ①).

■ Step 2: Set the attribute area for the cut-out area (Fig. 22 ②). It is assumed that the attribute area is the same as that set in the entry document creation device 100. For example, when Mw is represented by 16 bits, the most significant bit is detected as being embedded at (Ix0 + tSw, Iy0) and the least significant bit at (Ix0 + tSw × 17, Iy0).

■ Step 3: Filter A and filter B are applied to the Mw embedding area set in step 2, and at each bit position the symbol unit corresponding to the larger of the output values of filter A and filter B is determined to be embedded at that position (Fig. 22 ③).

■ Step 4: Restore the value of Mw by reversing the order in which it was set in the entry document creation device 100 (Fig. 22 ④, ⑤).

Although the theoretical values tSw and tSh of the signal unit size in the input image include errors, the signal detection positions in the attribute recording area are set relative to the boundary detected in Fig. 21. For example, if Sw = Sh = 12, Dout = 600, and Din = 400, then tSw = tSh = 12 × 400 / 600 = 8, so the attribute recording area spans only 8 × 17 = 136 pixels. Even if the error is about 1% (in practice it is less), the error is only about 1 pixel at the position farthest from the reference point of the attribute area, so the signal detection positions can be set almost exactly.
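The arithmetic in this example can be checked with a few lines of code. The function name `theoretical_unit_size` is hypothetical; the 1% relative error is the figure given in the text.

```python
def theoretical_unit_size(size_px, d_in, d_out):
    """Size of a signal unit in the scanned image: size * Din / Dout."""
    return size_px * d_in / d_out


# Values from the example: Sw = Sh = 12, Dout = 600 dpi, Din = 400 dpi.
t_sw = theoretical_unit_size(12, d_in=400, d_out=600)
span = t_sw * 17          # extent of the attribute recording area in pixels
worst_error = span * 0.01 # positional error at the far end for a 1% size error
```

Here `t_sw` evaluates to 8 pixels and `span` to 136 pixels, so a 1% error shifts the farthest detection position by roughly one pixel.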

The true width Siw of the signal unit in the input image can be calculated from the width Mw of the unit matrix extracted from the attribute recording area and the width Ix1 - Ix0 of the signal area obtained from FIG. 21 as

Siw = (Ix1 - Ix0) / Mw

Similarly, the true height Sih of the signal unit can be calculated as

Sih = (Iy1 - Iy0) / Mh

FIG. 23 is an explanatory diagram of step S42 and step S43 in FIG. 20. In step S42, the sum of the filter output values is calculated for each unit pattern. In Fig. 23, the convolution with filter A is calculated for each signal unit constituting the unit pattern U (x, y), and the sum of the convolution output values over the signal units is defined as the output value Fu (A, x, y) of filter A for the unit pattern. Here, the convolution value for each signal unit is the maximum value obtained while shifting the position of filter A horizontally and vertically over that signal unit.

Similarly, for the filter B, the output value F u (B, X, y) for the unit pattern U (X, y) is calculated.

In step S43, Fu (A, x, y) and Fu (B, x, y) are compared. If the absolute value of the difference |Fu (A, x, y) - Fu (B, x, y)| is smaller than a predetermined threshold value Tp, it is determined that no codeword symbol is assigned. Otherwise, it is determined that the symbol corresponding to the larger of Fu (A, x, y) and Fu (B, x, y) is assigned. That is, if Fu (A, x, y) > Fu (B, x, y), symbol 0 is taken to be embedded in U (x, y), and if Fu (A, x, y) < Fu (B, x, y), symbol 1 is taken to be embedded in U (x, y).
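The decision rule of step S43 can be written compactly as below. The function name `decide_symbol` is hypothetical, and returning 2 for "no codeword symbol assigned" follows the symbol 2 convention used for such elements elsewhere in this description.

```python
def decide_symbol(fu_a, fu_b, tp):
    """Classify one unit pattern from the summed filter outputs.

    Returns 0 or 1 for an embedded codeword symbol, or 2 when the two
    filter responses differ by less than the threshold Tp.
    """
    if abs(fu_a - fu_b) < tp:
        return 2
    return 0 if fu_a > fu_b else 1
```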

The processing shown in Fig. 23 is performed for all unit patterns obtained from the input image, and a unit pattern matrix U is created.

In step S44, the embedded information is decoded based on the determined symbol.

FIG. 24 is an explanatory diagram showing an example of a method for extracting a codeword from a unit pattern matrix.

In Fig. 24, it is assumed that the symbol 2 is set to the element to which no symbol is assigned, and the code word is restored by extracting the symbol ignoring the element where the symbol 2 is set.
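Extracting the symbols while ignoring symbol 2 can be sketched as follows. The raster reading order and the function name `extract_symbols` are assumptions; how repeated copies of the codeword are combined is not stated here, so the sketch only returns the raw symbol sequence.

```python
def extract_symbols(unit_pattern_matrix):
    """Read the unit pattern matrix row by row, skipping symbol 2."""
    return [s for row in unit_pattern_matrix for s in row if s != 2]
```

For example, a matrix whose first row is 0, 2, 0, 1 contributes the symbols 0, 0, 1 to the sequence.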

Next, the operation of entry point detection will be described.

The following description assumes that:

■ The size of the signal unit embedded in the document by the entry document creation device 100 is Sw × Sh (pixels).

■ The number of embedded signal units is width × height = nw × nh.

■ There are two types of embedded unit, unit A and unit B.

■ The size of the signal unit in the input image is Siw × Sih.

FIG. 25 is an explanatory diagram of the embedded signal number detection processing.

The detection of the number of embedded signals is performed in the following steps.

■ Step 1: Divide the input image into Sw × Sh blocks and set the unit matrix Um (Fig. 25 ①).

■ Step 2: Take out the part corresponding to the recording unit band of the unit matrix U m (Fig. 25②).

■ Step 3: Restore the embedded bit string by applying the signal detection filters to the recording unit band (Fig. 25 ③, ④). In Fig. 25 ③, the output values of the two filters (filter A and filter B) are calculated for the area of the input image corresponding to each element of the unit matrix Um in the recording unit band, and the symbol unit corresponding to the filter with the larger output value is taken to be embedded. In this example, since the output value of filter A is larger, it is determined that unit A (symbol 0) is embedded.

■ Step 4: Restore the unit number record unit matrix based on the restored bit string (Fig. 25⑤).

Next, a process of calculating a filter output value is performed.

FIG. 26 is an explanatory diagram of a filter output value calculation process.

Here, for each element of the unit matrix Um set in the process of detecting the number of embedded signals, the output value of the signal detection filter is recorded by the following steps.

■ Step 1: Calculate the output values of the signal detection filters (filter A and filter B) for the area of the input image corresponding to an arbitrary element of the unit matrix Um (Fig. 26). Each signal detection filter calculates its output value while shifting the target area up, down, left, and right, and the larger of the maximum output value of filter A and the maximum output value of filter B is taken.

■ Step 2: Perform step 1 for all elements of the unit matrix Um, and record the output values in the corresponding elements of the filter output value matrix Fm (x, y), x = 1 to Sw, y = 1 to Sh.

Next, the optimal threshold is determined.

FIG. 27 is an explanatory diagram of the determination process of the optimum threshold value.

The threshold value here (referred to as Ts) is used to determine whether a symbol unit is embedded in the area of the input image corresponding to each element of the unit matrix Um. If the value of an element of the filter output value matrix exceeds the threshold value Ts, it is determined that a symbol unit is embedded at the corresponding position of the input image.

■ Step 1: Set the initial value of the threshold value ts from the average Fa and the standard deviation Fs of the elements of the filter output value matrix (the output values of the signal detection filter) (Fig. 27 ①). Here, for example, the initial value is ts = Fa - Fs × 3.

■ Step 2: Binarize the filter output value matrix with ts to form a unit extraction image (Fig. 27 ②).

■ Step 3: Apply the unit number recording unit matrix to the unit extraction image (Fig. 27 ③).

■ Step 4: Count the number of symbol units in the area of the unit extraction image corresponding to each element of the unit number recording unit matrix, and record it in the unit number recording unit matrix (Fig. 27).

■ Step 5: For each element of the unit number recording unit matrix, calculate the absolute value of the difference between the number of symbol units recorded in the recording unit band, as decoded in the embedded signal number detection process, and the number of symbol units obtained in step 4. The total over all elements is defined as Sf (ts) (Fig. 27).

■ Step 6: Record as Ts the value of ts at which Sf (ts) becomes minimum (Fig. 27).

■ Step 7: Add Δt to ts to update ts (Fig. 27). Δt may be a predetermined value or may be calculated from the standard deviation Fs obtained in step 1 (for example, Δt = Fs × 0.1).

■ Step 8: When ts has reached the expected value, the process ends. If not, return to step 2 (Fig. 27).
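The threshold search of steps 1 to 8 can be sketched as the loop below. Several details are assumptions made for this illustration: regions are given as (x0, y0, x1, y1) rectangles in filter output matrix coordinates, the population standard deviation is used, and the search stops after a fixed number of increments (60 steps of 0.1 × Fs, reaching roughly Fa + 3 × Fs), since the text does not specify the "expected value" at which the process ends.

```python
def count_above(fm, region, ts):
    """Count elements of fm inside region whose value exceeds ts."""
    x0, y0, x1, y1 = region
    return sum(1 for y in range(y0, y1) for x in range(x0, x1) if fm[y][x] > ts)


def find_threshold(fm, regions, recorded, steps=60):
    """Search for the threshold Ts that best reproduces the recorded counts.

    fm: filter output value matrix (list of rows).
    regions: one rectangle per element of the unit number recording matrix.
    recorded: symbol unit counts decoded from the recording unit band.
    Returns (Ts, Sf(Ts)).
    """
    values = [v for row in fm for v in row]
    fa = sum(values) / len(values)
    fs = (sum((v - fa) ** 2 for v in values) / len(values)) ** 0.5
    ts = fa - 3 * fs          # Step 1: initial threshold
    dt = fs * 0.1             # Step 7: increment derived from Fs
    best_ts, best_sf = ts, float("inf")
    for _ in range(steps):
        # Steps 2 to 5: binarize, count per region, sum absolute differences.
        sf = sum(
            abs(rec - count_above(fm, reg, ts))
            for reg, rec in zip(regions, recorded)
        )
        if sf < best_sf:      # Step 6: remember the best threshold so far
            best_sf, best_ts = sf, ts
        ts += dt
    return best_ts, best_sf
```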

Next, the detection signal is counted.

FIG. 28 is an explanatory diagram of a detection signal counting process.

In this part of the process, almost the same process as the optimal threshold determination process is performed using a unit extracted image obtained by binarizing the filter output value matrix with the optimal threshold Ts obtained in the optimal threshold determination process.

■ Step 1: Binarize the filter output value matrix with Ts to create a unit extraction image (Fig. 28 ①).

■ Step 2: Apply the unit number recording unit matrix to the unit extraction image (Fig. 28 ②).

■ Step 3: Count the number of symbol units in the area of the unit extraction image corresponding to each element of the unit number recording unit matrix, and record it in the unit number recording unit matrix (Fig. 28 ③).

■ Step 4: For each element of the unit number recording unit matrix, calculate the difference D (X, Y) between the number of symbol units recorded in the recording unit band, as decoded in the embedded signal number detection processing, and the number of symbol units obtained in step 3 (Fig. 28). For an arbitrary element Nu (X, Y) of the unit number recording unit matrix, let R (X, Y) be the number of unit symbols restored from the recording unit band and C (X, Y) be the number of unit symbols measured in step 3. Then D (X, Y) = R (X, Y) - C (X, Y).

Next, the entry is judged.

The entry judgment for an arbitrary element Nu (X, Y) of the unit number recording unit matrix is performed using D (X, Y) as follows.

■ Entry with a check mark or characters added: D (X, Y) > TA (TA is a positive integer). When the number of unit symbols detected is smaller than the number of unit symbols recorded, it is judged that originally embedded unit symbols could not be detected because a check mark or characters were written over them.
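The judgment can be sketched as an element-wise comparison. The function name `detect_entries` is hypothetical, and the matrices are assumed to be lists of rows of equal shape.

```python
def detect_entries(recorded, counted, ta):
    """Mark elements where D = R - C exceeds the threshold TA.

    recorded: R(X, Y), counts restored from the recording unit band.
    counted:  C(X, Y), counts measured in the unit extraction image.
    Returns a matrix of 1 (entry detected) and 0 (no entry).
    """
    return [
        [1 if r - c > ta else 0 for r, c in zip(rec_row, cnt_row)]
        for rec_row, cnt_row in zip(recorded, counted)
    ]
```

A block whose detected count falls well below the recorded count is flagged, while small differences caused by noise stay below TA.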

It should be noted that deletion of characters and the like can also be detected by this determination method, but this is not used in this specific example.

Returning to FIG. 19, when the entry locations have been detected, the answer data conversion unit 204 converts the entries into answer data (step S35). Here, the format information obtained in step S33 is used. For example, suppose that in the entry detection of step S34, C11 in FIG. 4 is identified as checked. Referring to the format information shown in FIG. 5, the question number and the answer number of C11 are found to be 1 and 1, respectively. Therefore, it can be seen that the answer to question 1 was answer 1 (that is, male). In addition, it is also possible to cut out the image of a block identified in the entry location detection of step S34 as filled in within the free entry column, and to obtain this image as answer data.

Next, the visual information output unit 205 outputs the data conversion result of the answer, and this is displayed on a display or the like (not shown) (step S36).

FIG. 29 is an explanatory diagram showing an example of a screen display.

In the illustrated example, the scanned image, the entry detection result, the survey form identification information, and the answer data conversion result are displayed as an example of the screen display. The scanned image is the image of the survey form read by the document reading unit 201. The entry detection result shows the result of the detection obtained by the above-described processing from the embedded information extraction unit 202 to the answer data conversion unit 204. In this case, C11, corresponding to "male" in question 1, and C21, corresponding to "company employee" in question 2, are displayed as detected entries. In the free entry column, the entry detection result is displayed for each block, so the blocks in which characters have been written are displayed. The survey form identification information is information such as the ID of the survey form, extracted from the embedded identification information by the embedded information extraction unit 202. The answer data conversion result consists of the question numbers and answer numbers converted by the answer data conversion unit 204 and the image obtained by cutting out the blocks of the free entry column in which an answer has been written.

The operator of the entry content extraction device 200 checks and corrects the answer data conversion result against the displayed result (step S37).

<Effect>

As described above, according to the specific example 1, the format information, the entry location detection information, and the identification information are all carried by the entry document 300 itself, so the entry content extraction device 200 does not need to hold the format information of the entry document 300. Therefore, the entry content extraction device 200 can handle an entry document 300 of any format.

Also, it is not necessary to secure personnel for verification input, to prepare special paper such as mark sheet paper, to transmit format information necessary for OCR processing, and to enter identification numbers. In addition, in the process of extracting the contents of the entry, since character recognition is not performed, there is an effect that the data can be converted into data at a higher speed than in the process such as OCR.

Also, since the embedded information is represented by a dot pattern, there are almost no restrictions on the layout of the entry document 300, and read errors are unlikely to occur even if the entry document 300 is dirty or bent. The reliability of entry point detection can therefore be improved.

Further, according to the specific example 1, the scanned image, the entry detection result, and the answer data conversion result are output as visual information, so the operator can easily check and correct them. In particular, since the contents entered in the free entry column are output, the operator can, for example, concentrate on checking and correcting only that part.

《Example 2》

The specific example 2 differs from the specific example 1 in that the answer data conversion result is stored in an external storage device so that the visual check can be performed separately and more efficiently.

FIG. 30 is a block diagram of a specific example 2.

In the figure, the entry document creating apparatus 100 and the entry document 300 created by it are the same as those in the specific example 1, so corresponding parts are denoted by the same reference numerals and their description is omitted. The entry content extraction device 200a is composed of a document reading unit 201, an embedded information retrieving unit 202, an entry location detecting unit 203, and an answer data converting unit 204. Since these are the same as in the entry content extraction device 200 of the specific example 1, their description is omitted here. The result storage unit 400 is, for example, a hard disk device, and stores the answer data conversion result output from the answer data conversion unit 204. The visual information output unit 500 is a functional unit that displays and outputs the answer data conversion result using the data stored in the result storage unit 400, and is configured by, for example, a personal computer.

<Operation>

The document creation processing by the entry document creation device 100 in the specific example 2 is the same as that of the specific example 1. Therefore, the description is omitted.

FIG. 31 is an explanatory diagram of the entry content extraction process.

The operations in steps S41 to S45 are the same as the operations in steps S31 to S35 in the first specific example. That is, it reads the questionnaire, retrieves the embedded information, separates the retrieved information, detects the entry location, and converts the response data.

Next, the result output by the answer data conversion unit 204 is stored in the result storage unit 400 (step S46). As shown in the figure, the data conversion result includes image data obtained by cutting out the contents of the answers in the free entry column, and is stored together with the scanned image. Next, using the data stored in the result storage unit 400, the visual information output unit 500 displays the answer data conversion result (step S47). The display contents are the same as those shown in FIG. 29.

The operator who visually checks the contents of the entries confirms and corrects the answer data conversion result against this display, as in the specific example 1 (step S48).

<Effect>

As described above, according to the specific example 2, the following effects are obtained in addition to the effects of the specific example 1. Because the answer data conversion result is stored in the result storage unit 400 and the visual check is performed using the stored data, the conversion of entries into data can be carried out at high speed, and the visual check of the results can be carried out separately, for example by several operators at remote locations. As a result, the entry content extraction process can be performed more efficiently.

《Example 3》

In Example 3, the questionnaire image and the embedded information are stored in a portable external storage device so that, for example, a specialized contractor can create the document data for the form and print it.

FIG. 32 is a block diagram of a specific example 3.

The entry document creation device 100a includes a document creation unit 101, a document image creation unit 102, and an embedded information creation unit 103. Since these are configured in the same way as in specific example 1, corresponding parts are denoted by the same reference numerals and their description is omitted. The data storage unit 600 is a portable external storage device, and is desirably a large-capacity portable storage medium such as an MO, CD-ROM, or DVD-RAM.

The print processing device 700 prints a questionnaire based on the questionnaire image and the embedded information stored in the data storage unit 600, and includes a document data creation unit 701 and a document output unit 702. The print processing device 700 is, for example, installed in a different place from the entry document creation device 100a. Like the document data creation unit 104 in specific examples 1 and 2, the document data creation unit 701 in the print processing device 700 has the function of combining the document image data and the embedded information into one piece of document data. Like the document output unit 105 in specific examples 1 and 2, the document output unit 702 has the function of printing the document data created by the document data creation unit 701 to obtain the entry document 300.

The entry content extraction device 200 of specific example 3 is the same as that of specific example 1; corresponding parts are denoted by the same reference numerals and their description is omitted.

The next figure is a flowchart showing the process of creating the entry document 300.

First, the operations in steps S51 to S56 are the same as those in steps S1 to S6 of specific example 1. That is, the entry document creation device 100a creates the survey form, creates the survey form image, creates the format information, the entry location detection information, and the identification information, and creates the embedded information by integrating these pieces of information. Next, the survey form image and the embedded information output from the embedded information creation unit 103 are stored in the data storage unit 600 (step S57).

Next, in the print processing device 700, the document data creation unit 701 uses the questionnaire image and the embedded information stored in the data storage unit 600 to create image data that expresses the embedded information as a dot pattern, and superimposes this on the questionnaire image to create document data (step S58). Then, the document output unit 702 prints the document data created by the document data creation unit 701 and outputs it as the entry document 300 (step S59).

As described above, specific example 3 provides the data storage unit 600, which stores the document image data and the embedded information as integrated document data, and the print processing device 700 prints the entry document 300 using the document data in the data storage unit 600, so the following effect is obtained. For example, by installing the entry document creation device 100a at a research company and the print processing device 700 at a printing company, and using the data storage unit 600 to transfer data between them, it becomes easy to cope flexibly with arrangements in which the document data creation process and the printing process are performed at different places, such as when the research company asks the printing company to print the questionnaire.
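The hand-off between the two sites (steps S57 and S58) can be sketched as below. This is an illustration under stated assumptions, not the encoding used by the invention: the JSON package format, the function names, and the toy dot placement rule (one dot per tile, top-left for a 1 bit and bottom-right for a 0 bit) are all hypothetical. The image is modelled as a list of rows of 0 (white) and 1 (black) pixels.

```python
import json


def pack_for_transfer(image, embedded_bits):
    """Step S57 (sketch): serialize the questionnaire image and embedded information together."""
    return json.dumps({"image": image, "embedded_bits": embedded_bits})


def compose_document(package, cell=4):
    """Step S58 (sketch): render the bits as dots and superimpose them on the image.

    Assumed toy encoding: the image is tiled into cell x cell squares and
    bit i goes to tile i in row-major order. A 1 puts a dot at the tile's
    top-left pixel, a 0 at its bottom-right pixel. Existing black pixels
    are left untouched.
    """
    data = json.loads(package)
    image = [row[:] for row in data["image"]]
    tiles_per_row = len(image[0]) // cell
    capacity = tiles_per_row * (len(image) // cell)
    bits = data["embedded_bits"]
    if len(bits) > capacity:
        raise ValueError("embedded information does not fit in the image")
    for i, bit in enumerate(bits):
        top = (i // tiles_per_row) * cell
        left = (i % tiles_per_row) * cell
        if bit:
            y, x = top, left
        else:
            y, x = top + cell - 1, left + cell - 1
        image[y][x] = 1
    return image


blank = [[0] * 8 for _ in range(8)]
package = pack_for_transfer(blank, [1, 0, 1, 1])   # done at the research company
document = compose_document(package)               # done at the printing company
```

Keeping the image and the embedded information separate until `compose_document` runs mirrors the arrangement described above, in which the dot pattern is generated at the printing site rather than at the site that designs the form.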

《Usage form》

In each of the above specific examples, the embedded information is represented by a dot pattern, and the entry location is detected based on a change in the detection state of the dot pattern. However, the present invention is not limited to these configurations. For example, the embedded information may be recorded in a part of the document other than the character area using a two-dimensional bar code or the like. In this case, the entry location detection information is, for example, the image feature information of the answer column with nothing entered (original image feature information). In the entry location detection process, the entry location detection unit 203 obtains the image feature information of the answer column of the filled-out questionnaire in the same way and compares it with the original image feature information, whereby an entry can be detected.
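The comparison described above can be sketched as follows. The function name, the use of one scalar feature per answer column, and the tolerance value are assumptions chosen for illustration; the specification does not fix how large a difference counts as an entry.

```python
def detect_entries(original_features, scanned_features, tolerance=0.05):
    """Return the indices of answer columns whose feature differs from the original.

    original_features: per-column feature values recorded when the form was
    created (the "original image feature information").
    scanned_features: the same feature computed from the filled-out form.
    tolerance: hypothetical allowance for scan noise.
    """
    if len(original_features) != len(scanned_features):
        raise ValueError("feature lists must describe the same answer columns")
    return [
        index
        for index, (before, after) in enumerate(zip(original_features, scanned_features))
        if abs(after - before) > tolerance
    ]


# Column 1 gained ink, columns 0 and 2 changed only within the noise allowance.
filled = detect_entries([0.10, 0.10, 0.12], [0.11, 0.31, 0.12])
```

In practice the tolerance would have to be tuned to the scanner and the feature used, since too small a value reports noise as entries and too large a value misses faint marks.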

In addition, such image feature information can be expressed by dividing the target area of the survey form into a plurality of blocks and extracting the image feature of each block. The following features can be used.

(1) The values obtained by frequency-transforming the block image and sampling the frequency spectrum.

(2) The values obtained by performing filtering processing (filtering using a band-pass filter or a template of an arbitrary pattern) on the block image.

(3) The ratio of the areas of white pixels (background area) and black pixels (character area) in the block image.

In addition to these, a method of detecting the presence or absence of an entry using the edge length of the character area as the image feature amount may be used.
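Two of the block features mentioned above can be sketched as follows. This is a simplified illustration: the block image is assumed to be already binarized into rows of 0 (white) and 1 (black), the black-pixel fraction is used as one way of expressing the white-to-black area ratio of item (3), and the edge length is approximated by counting neighbouring pixel pairs with different values. All function names are hypothetical.

```python
def split_into_blocks(image, size):
    """Divide a binary image (list of rows) into size x size blocks, row-major."""
    blocks = []
    for top in range(0, len(image), size):
        for left in range(0, len(image[0]), size):
            blocks.append([row[left:left + size] for row in image[top:top + size]])
    return blocks


def black_ratio(block):
    """Fraction of black pixels in the block (0.0 = all white, 1.0 = all black)."""
    total = sum(len(row) for row in block)
    return sum(map(sum, block)) / total


def edge_length(block):
    """Count horizontally or vertically adjacent pixel pairs with different values."""
    edges = 0
    for y, row in enumerate(block):
        for x, value in enumerate(row):
            if x + 1 < len(row) and row[x + 1] != value:
                edges += 1
            if y + 1 < len(block) and block[y + 1][x] != value:
                edges += 1
    return edges


image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
]
blocks = split_into_blocks(image, 2)
ratios = [black_ratio(b) for b in blocks]   # [0.0, 1.0, 0.25, 0.0]
```

A handwritten mark adds black pixels and boundaries, so either value computed on the filled-out form would be expected to rise relative to the value recorded for the empty form.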

Furthermore, in each specific example the embedded information may be expressed by a plurality of methods; for example, the format information and the entry location detection information may be represented by a dot pattern as described above, while the identification information is represented by another means such as a barcode or a two-dimensional barcode. Also, in each specific example the entry location detection information has been described as being created individually for each block in each check entry column and free entry column, but the following configuration is also possible. The entire survey form may be regarded as a free entry field and divided into blocks, with the detection information created individually for each block. In this case, it is only necessary to determine, from the position of each block identified as an entry position and using the format information, which answer entry position received the answer, and to convert this into answer data. With this configuration, if for example the answer is given by circling a number instead of putting a check mark in a check answer column, the answer entry position can be detected accurately even when the circle is drawn large or its position is slightly shifted.
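The mapping from entered blocks to answer data described above can be sketched as follows. The field dictionary layout (`question`, `value`, `x`, `y`, `w`, `h`), the block coordinates, and the rule of testing each block's centre point are assumptions made for this example; the specification states only that the format information is used to decide which answer entry position a block belongs to.

```python
def answers_from_blocks(filled_blocks, fields, block_size):
    """Convert entered blocks into answer data using format information.

    filled_blocks: (column, row) indices of blocks judged to contain an entry.
    fields: format information as a list of dicts, each giving the question,
            the answer value, and the rectangle (x, y, w, h) in pixels.
    block_size: edge length of one square block in pixels.
    """
    answers = {}
    for column, row in filled_blocks:
        # Use the block's centre point to decide which answer rectangle it hits.
        cx = (column + 0.5) * block_size
        cy = (row + 0.5) * block_size
        for field in fields:
            inside_x = field["x"] <= cx < field["x"] + field["w"]
            inside_y = field["y"] <= cy < field["y"] + field["h"]
            if inside_x and inside_y:
                answers.setdefault(field["question"], set()).add(field["value"])
    return {question: sorted(values) for question, values in answers.items()}


fields = [
    {"question": "Q1", "value": "1", "x": 0, "y": 0, "w": 40, "h": 20},
    {"question": "Q1", "value": "2", "x": 40, "y": 0, "w": 40, "h": 20},
]
# A circle around "2" touches three blocks; a stray mark at (9, 5) hits no field.
result = answers_from_blocks([(4, 0), (5, 0), (5, 1), (9, 5)], fields, block_size=10)
```

Because every touched block is resolved through the format information, a mark that spans several blocks still yields a single answer. A mark that spills into a neighbouring field would yield two values for the question, so a practical version might keep the value with the most blocks.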

Further, although the data storage unit 600 is a portable storage medium in specific example 3, any means may be used as long as the document data can be transferred from the entry document creation device 100a to the print processing device 700; for example, transfer over a network or any other communication method may be used.

As described above, according to the present invention, an entry document is created that integrally includes format information for determining the position of the entry location in the document and the entry content, and entry location detection information for detecting whether or not the entry location has been entered, and the entry content of the entry document is determined using this format information and entry location detection information. It is therefore not necessary to hold the format information before extracting the content, so documents of any format can be handled. Also, it is not necessary to secure personnel for verification input, to prepare special paper such as mark sheet paper, to transmit the format information necessary for OCR processing, or to enter identification numbers. In addition, since character recognition is not performed in the process of extracting the entry content, the data can be converted at a higher speed than with processes such as OCR.

Claims

The scope of the claims

1. An entry document creation device, comprising:

a document image creation unit for creating document image data of a document having a predetermined entry location;

an embedded information creation unit for creating, as embedded information to be embedded in the document image data, format information indicating the entry location and entry content on the document, and entry location detection information for detecting whether or not the entry location has been entered; and

a document data creation unit that creates document data by combining the document image data and the embedded information.

2. An entry document creation device, comprising:

a document image creation unit for creating document image data of a document having a predetermined entry location;

an embedded information creation unit for creating, as embedded information to be embedded in the document image data, format information indicating the entry location and entry content on the document, and entry location detection information for detecting whether or not the entry location has been entered;

a data storage unit that stores the document image data and the embedded information as integrated document data; and

a print processing device that combines the document image data and the embedded information stored in the data storage unit to create document data, and prints the document data to output an entry document.

3. The entry document creation device according to claim 1 or 2, comprising a document data creation unit that creates document data by embedding, in the document image data, embedded information represented by a dot pattern.

4. The entry document creation device according to any one of claims 1 to 3, further comprising a document data creation unit that creates document data including document identification information.

5. An entry document creation method using an entry document creation device, the method comprising:

a document image data creation step of creating document image data of a document having a predetermined entry location;

a format information creation step of creating format information indicating the entry location and the entry content;

an entry location detection information creation step of creating entry location detection information for detecting whether or not an entry has been made in the entry location; and

a document data creation step of creating the document image data, the format information, and the entry location detection information as integrated document data.

6. The entry document creation method according to claim 5, wherein the document data creation step creates the document data by embedding, in the document image data, embedded information represented by a dot pattern.

7. An entry content extraction device for extracting entry content from an entry document that integrally includes format information for determining the position of an entry location in the document and the entry content, and entry location detection information for detecting whether or not the entry location has been entered, the device comprising:

an embedded information extraction unit for extracting the format information and the entry location detection information of the entry document;

an entry location detection unit that detects an entry location using the entry location information in the format information and the entry location detection information; and

a response data conversion unit that determines the entry content using the detected entry location and the entry content determination information in the format information.

8. The entry content extraction device according to claim 7, wherein the entry location detection information is represented by a dot pattern, and the entry location detection unit is configured to determine the presence or absence of an entry based on a change in the detection state of the dot pattern.

9. The entry content extraction device according to claim 7 or 8, comprising a visual information output unit that outputs, for an entry document having an entry field, an image for visually confirming the entry content in the entry field.

10. An entry content extraction method for extracting, using an entry content extraction device, entry content from an entry document that has format information for determining the position of an entry location in the document and the entry content, and entry location detection information for detecting whether or not the entry location has been entered, the method comprising:

an embedded information extraction step of extracting the format information and the entry location detection information of the entry document;

an entry location detection step of detecting an entry location using the entry location information in the format information and the entry location detection information; and

a response data conversion step of determining the entry content using the detected entry location and the entry content determination information in the format information.

11. The entry content extraction method according to claim 10, wherein the entry location detection information is represented by a dot pattern, and the entry location detection step determines the presence or absence of an entry based on a change in the detection state of the dot pattern.

12. The entry content extraction method according to claim 10 or 11, further comprising a visual information output step of outputting, for an entry document having a free entry column, an image for visually confirming the entry content of the free entry column.

13. An entry document characterized by integrally including format information for determining the position of an entry location in the document and the entry content, and entry location detection information for detecting whether or not the entry location has been entered.