JP2012194879A - Information processing apparatus, information processing method and program

Info

Publication number
JP2012194879A
Authority
JP
Japan
Prior art keywords
area
region
position information
image data
information processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
JP2011059362A
Other languages
Japanese (ja)
Inventor
Masamitsu Ito
Takashi Sawada
Shigehiro Fujitsuka
Tatsuya Mogi
修光 伊藤
達也 毛木
敬 澤田
誠弘 藤塚
Original Assignee
Pfu Ltd
株式会社Pfu
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pfu Ltd, 株式会社Pfu filed Critical Pfu Ltd
Priority to JP2011059362A
Publication of JP2012194879A
Application status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K 9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K 9/20 Image acquisition
    • G06K 9/2054 Selective acquisition/locating/processing of specific regions, e.g. highlighted text, fiducial marks, predetermined fields, document type identification
    • G06K 9/2063 Selective acquisition/locating/processing of specific regions, e.g. highlighted text, fiducial marks, predetermined fields, document type identification, based on a marking or identifier characterising the document or the area
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K 2209/00 Indexing scheme relating to methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K 2209/01 Character recognition

Abstract

PROBLEM TO BE SOLVED: To provide a technique for improving the efficiency of creating definition information used for OCR software and the like.

SOLUTION: An information processing apparatus according to the invention includes: an area recognition part for recognizing, among areas designated by predetermined expressions in image data, a first area designated by a first area designation expression and a second area designated by a second area designation expression that is different from the first area designation expression; a position information acquisition part for acquiring, in the image data, position information on the first area as position information designating an area that is to be an object of character recognition; and an item name acquisition part for acquiring character information, obtained by recognizing characters in the second area, as an item name of the area that is to be the object of character recognition designated by the position information acquired by the position information acquisition part.

Description

  The present invention relates to an information processing apparatus, an information processing method, and a program.

  In recent years, paperless workflows have been promoted in various businesses from the viewpoint of business improvement and cost reduction. On the other hand, there are still many situations where paper is used, such as transaction documents. Conventionally, OCR (Optical Character Recognition) software has been used to improve the efficiency of operations in which such paper is used.

  In order to designate a reading area or the like in such OCR software, definition information of the reading area or the like is required. Patent Document 1 and Patent Document 2 listed below disclose techniques relating to the definition information.

  Japanese Patent Application Laid-Open No. H10-260260 discloses a technique for reading a character type corresponding to a color by scanning image data for each color. Patent Document 2 discloses a technique for recognizing attribute information entered in a region surrounded by a predetermined color frame and creating an attribute information definition body for a read item.

Patent Document 1: Japanese Utility Model Publication No. 05-008670
Patent Document 2: JP 05-081472 A

  However, in the conventional techniques, when creating definition information for OCR software, the user had to manually set an item name indicating the description content of the reading area for the position information of the reading area acquired from the image data.

  The present invention has been made in consideration of such points, and an object of the present invention is to provide a technique capable of improving the efficiency of creating definition information used in OCR software or the like.

  The present invention employs the following configuration in order to solve the above-described problems.

That is, the information processing apparatus of the present invention includes:
an area recognition unit that recognizes, among areas designated by predetermined expressions in image data, a first area designated by a first area designation expression and a second area designated by a second area designation expression different from the first area designation expression;
a position information acquisition unit that acquires, in the image data, position information of the first area recognized by the area recognition unit as position information for designating an area to be a target of character recognition; and
an item name acquisition unit that acquires character information, obtained by recognizing characters existing in the second area recognized by the area recognition unit, as an item name of the character recognition target area designated by the position information acquired by the position information acquisition unit.

  Here, the area designation expression refers to an expression for designating an area, such as a frame, a fill, or hatching.

  According to the above configuration, the first area and the second area in the image data are recognized. Position information for designating an area to be a target of character recognition is acquired from the first area, and an item name for that character recognition target area is acquired from the second area. This eliminates the need for the user to manually set item names for the character recognition target areas related to the acquired position information. Therefore, according to the above configuration, it is possible to improve the efficiency of creating definition information used for OCR software and the like.

As another aspect of the present invention, the information processing apparatus of the present invention may further include an association unit that associates the first area with the second area, and the item name acquisition unit may acquire the character information obtained from the second area as the item name of the character recognition target area designated by the position information acquired from the first area associated with that second area by the association unit.

  According to the above configuration, the position information for designating the area to be character-recognized is associated with the item name for the area to be character-recognized. This eliminates the need for the user to associate the acquired position information with the item name. Therefore, according to the above configuration, it is possible to improve the efficiency of creating definition information used for OCR software or the like.

  As another form of the present invention, the association unit may associate the first region with the second region closest to the first region in image data.

  As another aspect of the present invention, the association unit may determine whether or not the positional relationship between the position of the first area and the position of the second area satisfies a predetermined condition, and may associate the first area and the second area that are determined to satisfy the predetermined condition.

  As another aspect of the present invention, the association unit may determine that the predetermined condition is satisfied for one first area and one second area that are aligned in the horizontal direction, from among a plurality of first areas aligned in the vertical direction and a plurality of second areas aligned in the vertical direction in the image data.

  As another aspect of the present invention, the association unit may determine that the predetermined condition is satisfied for one first area and one second area that are aligned in the vertical direction, from among a plurality of first areas aligned in the horizontal direction and a plurality of second areas aligned in the horizontal direction in the image data.

  As another aspect of the present invention, the association unit may recognize a predetermined correspondence instruction expression, existing in the image data, that indicates a correspondence between the first area and the second area, and may associate the first area with the second area based on the recognized correspondence.

  As another aspect of the present invention, the information processing apparatus according to the present invention may further include an item definition information creation unit that creates item definition information including the position information acquired by the position information acquisition unit for designating the area to be a target of character recognition and the item name acquired by the item name acquisition unit for the character recognition target area designated by that position information.

  As other aspects of the present invention, an information processing method that realizes each of the above configurations, a program, or a computer-readable storage medium that records such a program may also be provided. Further, as another aspect of the present invention, an information processing system in which a plurality of devices realizing each of the above configurations are configured to be able to communicate with one another may be provided.

  According to the present invention, it is possible to provide a technique capable of improving the efficiency of creating definition information used for OCR software and the like.

FIG. 1 illustrates the processing of the information processing apparatus according to the embodiment.
FIG. 2 illustrates the configuration of the information processing apparatus according to the embodiment.
FIG. 3 is a flowchart illustrating an example of a processing procedure of the information processing apparatus according to the embodiment.
FIG. 4 shows an example of image data processed by the information processing apparatus according to the embodiment.
FIG. 5 shows an example of the scanning order of the first region and the second region.
FIG. 6 shows an example of the association between the first area and the second area.
FIG. 7 shows an example of the association between the first area and the second area.
FIG. 8 shows an example of the association between the first area and the second area.
FIG. 9 shows an example of the association between the first area and the second area.
FIG. 10 shows an example of item definition information acquired from the image data shown in FIG. 4.

  Hereinafter, embodiments of an information processing apparatus, an information processing method, a program, and the like according to one aspect of the present invention (hereinafter also referred to as “this embodiment”) will be described. However, the present embodiment is an exemplification, and the present invention is not limited to the configuration of the present embodiment.

  Although the data appearing in the present embodiment is described in a natural language (such as Japanese), more specifically, it is specified in a pseudo language, a command, a parameter, a machine language, or the like that can be recognized by a computer.

§1 Information processing apparatus

An information processing apparatus according to the present embodiment will be described with reference to FIGS. 1 and 2.

<Overview>
FIG. 1 illustrates processing executed by the information processing apparatus according to the present embodiment. The information processing apparatus according to the present embodiment recognizes the first area 50 and the second area 60 that are areas designated by predetermined expressions in the image data.

  The first area 50 is designated by the first area designation expression, while the second area 60 is designated by the second area designation expression. That is, the area designation expression differs between the first area 50 and the second area 60. An area designation expression is an expression for designating an area, for example a frame, a fill, or various types of hatching. In the example shown in FIG. 1, the first area designation expression is a frame only; that is, no fill or hatching is applied within the frame. On the other hand, the second area designation expression in the example shown in FIG. 1 is a fill.

  The first area 50 is an area designated as a character recognition target in the image data. The second area 60 is an area in which item names for areas designated as character recognition targets exist.

For example, the user draws a frame, a fill, or hatching on a paper document such as a form or a medical record with a marker, a seal, or by printing, and thereby designates the first area 50 and the second area 60. The information processing apparatus acquires image data in which the first area 50 and the second area 60 are designated by reading, with a scanner or the like, the paper on which the areas have been designated in this way.

  The information processing apparatus according to the present embodiment recognizes the first area 50 and the second area 60 that are designated by different area designation expressions. Then, the information processing apparatus according to the present embodiment acquires position information for designating an area that is a character recognition target from the first area 50. In addition, the information processing apparatus according to the present embodiment acquires the item name for the area that is the target of character recognition from the second area 60.

  As described above, the information processing apparatus according to the present embodiment acquires the position information and the item name about the area that is the target of character recognition from the first area and the second area specified on the image data. As a result, the efficiency of the definition information creation by the user is improved.

  The user may specify the first area 50 and the second area 60 on the image data by editing the image data with drawing software or the like.

<Configuration example>
FIG. 2 shows a configuration example of the information processing apparatus 1 according to the present embodiment. As illustrated in FIG. 2, the information processing apparatus 1 includes a storage unit 11, a control unit 12, an input / output unit 14, and the like that are connected to the bus 13 as a hardware configuration.

  The storage unit 11 stores various data and programs used in processing executed by the control unit 12 (not shown). The storage unit 11 is realized by a hard disk, for example. The storage unit 11 may be realized by a recording medium such as a USB memory.

  The various data and programs stored in the storage unit 11 may be obtained from a recording medium such as a CD (Compact Disc) or a DVD (Digital Versatile Disc). The storage unit 11 may be referred to as an auxiliary storage device.

The control unit 12 includes one or more processors such as a microprocessor or a CPU (Central Processing Unit), and peripheral circuits used for the processing of the processor (a ROM (Read Only Memory), a RAM (Random Access Memory), interface circuits, and the like). The control unit 12 implements the processing of the information processing apparatus 1 in the present embodiment by executing the various data and programs stored in the storage unit 11. The ROM, RAM, and the like may be referred to as a main storage device in the sense that they are arranged in the address space handled by the processor in the control unit 12.

  The input / output unit 14 is one or more interfaces for transmitting and receiving data to and from devices outside the information processing apparatus 1. The input / output unit 14 is, for example, an interface for connecting a LAN (Local Area Network) cable, an interface for connecting to a user interface such as an input device or an output device, or an interface such as USB (Universal Serial Bus).

As shown in FIG. 2, the input / output unit 14 may be connected to, for example, the scanner 2. The input / output unit 14 may also be connected to a user interface (not shown), such as input / output devices including a touch panel, a numeric keypad, a keyboard, a mouse, and a display. Further, the input / output unit 14 may be connected to an input / output device such as a CD drive or a DVD drive, to a removable recording medium, or to a non-volatile portable recording medium such as a memory card. The input / output unit 14 may also have a function as an interface (communication unit) for network connection.

  The information processing apparatus according to the present embodiment obtains position information and item names for an area that is a character recognition target, thereby improving the efficiency of definition information creation by the user. This process is realized as a process of the control unit 12.

  As illustrated in FIG. 2, the control unit 12 includes an area recognition unit 31, a position information acquisition unit 32, an item name acquisition unit 33, an association unit 34, and an item definition information creation unit 35 in order to realize the above processing. The area recognition unit 31, the position information acquisition unit 32, the item name acquisition unit 33, the association unit 34, and the item definition information creation unit 35 are realized, for example, by a program stored in the storage unit 11 being loaded into the RAM or another peripheral circuit of the control unit 12 and executed by the processor of the control unit 12.

  The area recognition unit 31 recognizes, among areas designated by predetermined expressions in the image data, the first area designated by the first area designation expression and the second area designated by the second area designation expression different from the first area designation expression. For example, the area recognition unit 31 distinguishes and recognizes the first area 50 and the second area 60 shown in FIG. 1.

  The position information acquisition unit 32 acquires the position information of the first area recognized by the area recognition unit as position information for designating an area to be character-recognized in the image data. As illustrated in FIG. 1, the position information acquisition unit 32 acquires, for example, position information of the first area 50 in the image data as position information for designating an area that is a character recognition target.

  Note that the position information acquisition unit 32 may also acquire position information of the second area for the processing of the association unit 34 described later. For example, the position information acquisition unit 32 may acquire position information of the second area 60 in the image data illustrated in FIG. 1.

  The item name acquisition unit 33 acquires character information, obtained by recognizing the characters existing in the second area recognized by the area recognition unit 31, as the item name of the character recognition target area designated by the position information acquired by the position information acquisition unit 32. As shown in FIG. 1, for example, the item name acquisition unit 33 acquires character information obtained by recognizing the characters existing in the second area 60 as the item name for the first area 50.

  As will be described later, the first area and the second area are associated with each other by the association unit 34. The item name acquisition unit 33 according to the present embodiment acquires the character information obtained from the second area as the item name of the character recognition target area designated by the position information acquired from the first area associated with that second area by the association unit 34.

  The association unit 34 associates the first area with the second area.

  For example, the associating unit 34 associates the first region with the second region closest to the first region in the image data.

  Further, for example, the association unit 34 determines whether or not the positional relationship between the position of the first area and the position of the second area satisfies a predetermined condition, and associates the first area and the second area that are determined to satisfy the predetermined condition. The predetermined condition constrains the positional relationship between a first area and a second area that correspond to each other. Details will be described later.

Further, for example, the association unit 34 recognizes a predetermined correspondence instruction expression indicating the association between the first area and the second area, which exists in the image data. Then, the associating unit 34 associates the first area with the second area based on the recognized correspondence relationship.

  The correspondence relationship instruction expression indicates the correspondence between the first area and the second area. For example, the correspondence relationship instruction expression is an arrow placed between the first area and the second area, a line segment connecting the first area and the second area, or the same symbol or mark written in both the first area and the second area. The correspondence relationship instruction expression may be anything as long as it can indicate the correspondence between the first area and the second area.

  The item definition information creation unit 35 creates item definition information that includes the position information acquired by the position information acquisition unit 32, which designates the area to be a target of character recognition, and the item name acquired by the item name acquisition unit 33 for the character recognition target area designated by that position information. The created item definition information is information for designating the position and item name of an area that is a character recognition target. The item definition information is used by, for example, OCR software.

§2 Operation example

Next, an operation example of the information processing apparatus 1 according to the present embodiment will be described with reference to FIG. 3. FIG. 3 shows an example of a processing procedure of the information processing apparatus 1 according to the present embodiment. In FIG. 3, "step" is abbreviated as "S".

<Start>
First, for example, a program stored in the storage unit 11 is loaded into the RAM or the like of the control unit 12 in accordance with a user operation. The program loaded into the RAM or the like of the control unit 12 is then executed by the processor of the control unit 12. In this way, the information processing apparatus 1 starts processing.

<Step 101>
Next, the control unit 12 acquires the image data used for the above processing (step 101). The acquired image data may be, for example, data captured by the scanner 2 shown in FIG. 2. The acquired image data may also be data stored in the storage unit 11. Such image data may be acquired via a network, or from a non-volatile portable recording medium such as a memory card.

  FIG. 4 shows an example of image data acquired at this time. The image data is, for example, data obtained by digitizing a paper medium such as a form or a medical record. As shown in FIG. 4, the first areas (50a, 50b) and the second areas (60a, 60b) are designated over fields, characters, and the like described in the form or medical record. The first areas (50a, 50b) and the second areas (60a, 60b) are expressed so as to be distinguishable from the fields, characters, and the like described in the form or medical record.

  For example, in order to distinguish them clearly from the fields and characters described in the form or medical record, the first areas (50a, 50b) and the second areas (60a, 60b) may be expressed in a color different from the color of those fields and characters. If they are expressed in this way, an OCR engine that detects and reads the different color can extract, from everything drawn in the image data, only the area designation expressions related to the first areas (50a, 50b) and the second areas (60a, 60b). For example, assuming that the fields and characters described in the form or medical record are black, the OCR engine detects and reads colors other than black, whereby the first areas (50a, 50b) and the second areas (60a, 60b) are extracted.
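
  The color-based extraction described above could be sketched as follows. This is a minimal illustration in Python that assumes the designation marks are drawn in a single known color (red is used here purely as an example) and that OpenCV 4 and NumPy are available; the patent itself does not prescribe any particular library, color, or implementation.

```python
import cv2
import numpy as np

def extract_marked_regions(image_path,
                           lower_hsv=(0, 80, 80), upper_hsv=(10, 255, 255)):
    """Return bounding boxes (x, y, w, h) of areas drawn in the marker color."""
    bgr = cv2.imread(image_path)
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    # Keep only pixels inside the assumed marker color range, so black form
    # ruling and printed characters are excluded from the mask.
    mask = cv2.inRange(hsv, np.array(lower_hsv), np.array(upper_hsv))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Tiny contours are treated as noise and ignored.
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 100]
```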

  However, the first areas (50a, 50b) and the second areas (60a, 60b) do not necessarily have to be expressed in a color different from the color of the fields and characters described in the form or medical record. For example, as long as the first areas (50a, 50b) and the second areas (60a, 60b) are expressed by an area designation expression that can be distinguished from the area designation expressions, such as fields, already described in the form or medical record, they may be expressed in the same color as those fields and characters.

<Step 102>
Next, as shown in FIG. 3, the control unit 12 recognizes the first region in the image data acquired in step 101 (step 102).

  In the image data shown in FIG. 4, a frame is used as the first area designation expression. In other words, in the image data shown in FIG. 4, the first regions (50a, 50b) are represented by frames. The control unit 12 recognizes the first area (50a, 50b) represented by the frame.

  For example, the control unit 12 extracts the area designation expressions related to the first areas and the second areas from everything drawn in the image data. This extraction is feasible because the first areas (50a, 50b) and the second areas (60a, 60b) are expressed so as to be distinguishable from the fields and characters described in the form or medical record. Subsequently, the control unit 12 specifies, from the extracted area designation expressions related to the first and second areas, those related to the first area designation expression. This identification is realized, for example, by pattern matching or the like. The control unit 12 then recognizes the specified areas as first areas. In this way, the control unit 12 recognizes the first areas (50a, 50b) represented by frames in the image data shown in FIG. 4.
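
  One way to separate the two area designation expressions is sketched below, given the marker-color mask from the previous sketch and one of its contours: a solid fill covers most of its bounding box, while a hollow frame covers only a thin border. This fill-ratio heuristic is only an assumed stand-in for the "pattern matching or the like" mentioned here, and the threshold value is illustrative.

```python
import cv2

def classify_region(mask, contour, fill_threshold=0.5):
    """Label a contour from the marker-color mask as a frame or a fill."""
    x, y, w, h = cv2.boundingRect(contour)
    covered = cv2.countNonZero(mask[y:y + h, x:x + w]) / float(w * h)
    # A fill (second area designation expression) covers most of its bounding
    # box; a frame (first area designation expression) covers only its border.
    return "second_area" if covered >= fill_threshold else "first_area"
```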

<Step 103>
Next, the control unit 12 acquires position information in the image data of the first area recognized in step 102 (step 103).

  The position information may be any information as long as it is information indicating a position in the image data. In the present embodiment, the position information is expressed in an xy coordinate system in which the upper left corner of the image data is the origin, the horizontal axis is the x axis, and the vertical axis is the y axis. However, the representation of the position information is not limited to the xy coordinate system. For example, the representation of the position information may be a polar coordinate system having an origin at a certain point in the image data (for example, the center of the image data).

  Further, the position information of the first area according to the present embodiment includes the position (coordinates) of the upper-left corner of the first area, its horizontal length, and its vertical length. The position information is exemplified in FIG. 10 described later. The control unit 12 specifies the position coordinates of the upper-left corner of the first area recognized in step 102, and also specifies the horizontal length and the vertical length of the recognized first area. In this way, the control unit 12 acquires the position information, in the image data, of the recognized first area.
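
  For illustration, the position information of an area can be represented as a small record holding exactly the quantities named here: the upper-left coordinates plus the horizontal and vertical lengths (the Left, Top, Width, and Height columns of FIG. 10). The class and field names below are assumptions made for the sketch, not terms from the patent.

```python
from dataclasses import dataclass

@dataclass
class RegionPosition:
    left: int    # x coordinate of the upper-left corner
    top: int     # y coordinate of the upper-left corner
    width: int   # horizontal length
    height: int  # vertical length

# The (x, y, w, h) boxes returned by a function such as extract_marked_regions()
# map directly onto this record:
#   positions = [RegionPosition(x, y, w, h) for (x, y, w, h) in boxes]
```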

<Step 104>
Next, the control unit 12 recognizes the second area in the image data acquired in Step 101 (Step 104).

In the image data shown in FIG. 4, a fill is used as the second area designation expression. In other words, in the image data shown in FIG. 4, the second areas (60a, 60b) are expressed by fills. The control unit 12 recognizes the second areas (60a, 60b) expressed by fills. The second areas are recognized by the same method as the first area recognition method in step 102.

<Step 105>
Next, the control unit 12 acquires position information, in the image data, of the second area recognized in step 104 (step 105). Note that step 105 may be omitted. In the present embodiment, the position information of the second area is acquired because it is used in the association in step 107 described later. The position information of the second area is acquired in the same manner as the position information of the first area in step 103.

<Step 106>
Next, the control unit 12 recognizes characters existing in the second area recognized in step 104, thereby acquiring character information of the characters existing in the second area (step 106).

  Character recognition may be performed by any method. In step 106, the control unit 12 recognizes the characters described in the second area, thereby acquiring character information of the characters described in the second area.
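
  As a hedged sketch of this step, the second area can be cropped out of the page image and passed to any OCR engine; Tesseract via pytesseract is used below purely as an example, since the embodiment states that character recognition may be performed by any method. The region argument is assumed to be a position record like the RegionPosition sketch above, and the "jpn" language code assumes Japanese language data is installed for Tesseract.

```python
import pytesseract

def read_item_name(bgr_image, region):
    """OCR the second area and return its text, to be used as an item name."""
    crop = bgr_image[region.top:region.top + region.height,
                     region.left:region.left + region.width]
    return pytesseract.image_to_string(crop, lang="jpn").strip()
```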

  Note that the character information is acquired as an item name for a first area that is a target of character recognition. If there is only one first area and one second area, only one combination of the first area and the second area is possible, so there is no need to specify the correspondence between the first area and the second area. That is, there is no need to specify for which first area the character information acquired from the second area in this step 106 is the item name. When the character information is acquired in this step 106, it is specified as the item name for the first area recognized in steps 102 and 103.

  On the other hand, when there are a plurality of first areas and a plurality of second areas, it is necessary to specify for which first area the character information acquired from each second area is the item name. In the present embodiment, the first areas and the second areas are associated with each other in step 107 described later, whereby it is specified for which first area the character information acquired from each second area is the item name.

  However, such association is not always necessary. For example, as illustrated in FIG. 5, suppose that the control unit 12 scans the image data sequentially from the top and executes the recognition of the first area in step 102 and the recognition of the second area in step 104, and that the control unit 12 repeats the processing of steps 102 to 106 every time it finds one first area and one second area. In this case, since exactly one first area and one second area are involved in each round of processing, the above-described association processing becomes unnecessary.

  For example, when the processing is executed in this way, in the example shown in FIG. 5 the character information acquired from the second area 60a is specified as the item name for the first area 50a, the character information acquired from the second area 60b is specified as the item name for the first area 50b, and the character information acquired from the second area 60c is specified as the item name for the first area 50c. In this processing, steps 102 to 103 and steps 104 to 106 can be interchanged depending on the order in which the first areas and the second areas are found.
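
  A minimal sketch of this scan-order variant follows. It assumes the recognized areas are available as position records with a top attribute (as in the RegionPosition sketch) and that, as in FIG. 5, the n-th first area encountered from the top of the page belongs with the n-th second area; both assumptions are made for the sketch, not requirements of the embodiment.

```python
def pair_in_scan_order(first_areas, second_areas):
    """Pair first and second areas in the order they appear from the top of the page."""
    firsts = sorted(first_areas, key=lambda r: r.top)
    seconds = sorted(second_areas, key=lambda r: r.top)
    # With a layout like FIG. 5, the n-th first area and the n-th second area go
    # together, so no separate association step (step 107) is needed.
    return list(zip(firsts, seconds))
```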

<Step 107>
Next, the control unit 12 associates the first areas with the second areas in order to specify the correspondence between the first areas recognized in step 102 and the second areas recognized in step 104 (step 107). This step 107 may be omitted, for example, when only one first area and one second area are involved in the association. As described above, this step 107 is processing for specifying for which first area the character information acquired from each second area is the item name.

  An example of the association processing by the control unit 12 will be described with reference to FIGS. 6 to 9.

  For example, the control unit 12 associates the first area with the second area closest to the first area in the image data. FIG. 6 shows an example of this processing. In the present embodiment, the position information of the first areas and the second areas is acquired in steps 103 and 105, and this position information includes the position coordinates of the upper-left corner of each area. The control unit 12 calculates the distance between a first area and a second area using these position coordinates, that is, the distance between the position coordinates of the upper-left corner of the first area and the position coordinates of the upper-left corner of the second area. The control unit 12 then associates the first area with the second area at the shortest distance from it.

  In the example shown in FIG. 6, the control unit 12 associates the first area 50a with the second area 60a closest to the first area 50a in the image data. Further, the first area 50b is associated with the second area 60b closest to the first area 50b in the image data.

  Note that the first area and the second area in the processing may be interchanged. That is, the control unit 12 may associate the second area with the first area that is closest to the second area in the image data.
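
  The nearest-neighbour association of FIG. 6 could look like the following sketch, again assuming position records with left and top attributes for the upper-left corners; the helper name is an assumption made for the sketch.

```python
import math

def associate_by_distance(first_areas, second_areas):
    """Pair each first area with the second area whose upper-left corner is closest."""
    pairs = []
    for f in first_areas:
        nearest = min(second_areas,
                      key=lambda s: math.hypot(f.left - s.left, f.top - s.top))
        pairs.append((f, nearest))
    return pairs
```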

  Further, for example, the control unit 12 may determine whether or not the positional relationship between the position of the first area and the position of the second area satisfies a predetermined condition, and may associate the first area and the second area that are determined to satisfy the predetermined condition.

  The predetermined condition constrains the positional relationship between a first area and a second area that are in a correspondence relationship.

  For example, the predetermined condition relates to the distance between a first area and a second area that are in a correspondence relationship. In this case, the control unit 12 determines that the predetermined condition is satisfied for a first area and a second area, among the first areas and second areas in the image data, whose distance from each other is within a threshold that can be set and changed by the user.

  For example, the predetermined condition relates to a relative positional relationship between a first area and a second area that are in a correspondence relationship. In this case, the control unit 12 determines that the predetermined condition is satisfied for a first area and a second area, among the first areas and second areas in the image data, that are in a specific relative positional relationship. Here, in the present embodiment, the relative positional relationship can be expressed as a difference vector between a vector indicating the upper-left corner of the first area and a vector indicating the upper-left corner of the second area, with the upper-left corner of the image data as the origin. A specific relative positional relationship can then be expressed as a condition vector that the difference vector should satisfy. For example, when the inner product of the difference vector and the condition vector falls within a range of values that can be set and changed by the user, the first area and the second area related to that difference vector are determined to be in the specific relative positional relationship.
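
  A sketch of this inner-product test follows. The direction of the difference vector, the condition vector, and the accepted range are all illustrative assumptions; the embodiment only requires that the inner product fall within a user-configurable range.

```python
def satisfies_relative_condition(first, second,
                                 condition_vec=(1.0, 0.0), lo=20.0, hi=400.0):
    """Check the relative-position condition between a first and a second area."""
    # Difference vector between the two upper-left corners (direction assumed here).
    dx = first.left - second.left
    dy = first.top - second.top
    inner = dx * condition_vec[0] + dy * condition_vec[1]
    return lo <= inner <= hi
```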

Further, for example, the predetermined condition relates to a horizontal arrangement of a first area and a second area that are in a correspondence relationship. In this case, the control unit 12 determines that the predetermined condition is satisfied for a first area and a second area that are aligned in the horizontal direction, from among first areas aligned in the vertical direction and second areas aligned in the vertical direction in the image data. FIG. 7 illustrates first areas and second areas that satisfy this condition. In the coordinates (x, y) in FIG. 7, x indicates the coordinate on the horizontal axis (x axis) and y indicates the coordinate on the vertical axis (y axis).

  Here, in the present embodiment, the first areas aligned in the vertical direction are first areas whose position coordinates (x coordinates) on the horizontal axis (x axis) at the upper-left corner fall within an error range defined by a threshold that can be set and changed by the user. For example, in FIG. 7 the x coordinate of the first area 50b is 68 and the x coordinate of the first area 50c is 70, and the x coordinate of the first area 50a falls within the same range. At this time, if the threshold is 5, for example, the first area 50a, the first area 50b, and the first area 50c are first areas aligned in the vertical direction.

  The same applies to the second areas. In the present embodiment, the second areas aligned in the vertical direction are second areas whose position coordinates (x coordinates) on the horizontal axis (x axis) at the upper-left corner fall within an error range defined by a threshold that can be set and changed by the user. For example, in FIG. 7 the x coordinate of the second area 60b is 21 and the x coordinate of the second area 60c is 19, and the x coordinate of the second area 60a falls within the same range. At this time, if the threshold is 5, for example, the second area 60a, the second area 60b, and the second area 60c are second areas aligned in the vertical direction.

  In this way, the control unit 12 identifies the first areas aligned in the vertical direction and the second areas aligned in the vertical direction. The control unit 12 then determines that the predetermined condition is satisfied for a first area and a second area that are positioned in the horizontal direction with respect to each other, among those vertically aligned first areas and second areas.

  Here, in the present embodiment, a first area and a second area being aligned in the horizontal direction refers to a state in which the difference between the position coordinate (y coordinate) on the vertical axis (y axis) at the upper-left corner of the first area and the position coordinate on the vertical axis at the upper-left corner of the second area is within a threshold that can be set and changed by the user.

  For example, in FIG. 7 the y coordinate of the first area 50b is 98 and the y coordinate of the first area 50c is 140, while the y coordinate of the second area 60b is 100 and the y coordinate of the second area 60c is 141; the y coordinates of the first area 50a and the second area 60a are likewise close to each other.

  At this time, for example, if the threshold value is 5, the control unit 12 determines that the first region 50a and the second region 60a are arranged in the horizontal direction and satisfy a predetermined condition. Further, the control unit 12 determines that the first region 50b and the second region 60b are arranged in the horizontal direction and satisfy a predetermined condition. Further, the control unit 12 determines that the first region 50c and the second region 60c are arranged in the horizontal direction and satisfy a predetermined condition. That is, the control unit 12 associates the first area 50a with the second area 60a. In addition, the control unit 12 associates the first area 50b with the second area 60b. Furthermore, the control unit 12 associates the first area 50c with the second area 60c.
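
  A sketch of this column-wise association follows. It assumes the first areas and the second areas have already been confirmed to form the vertical columns of FIG. 7 (x coordinates within the threshold), so only the row test on the y coordinates is shown; the threshold value of 5 is taken from the example above.

```python
def associate_by_horizontal_alignment(first_areas, second_areas, threshold=5):
    """Pair a first area and a second area whose upper-left y coordinates nearly match."""
    pairs = []
    for f in first_areas:
        for s in second_areas:
            # Same row: the y coordinates differ by no more than the threshold.
            if abs(f.top - s.top) <= threshold:
                pairs.append((f, s))
    return pairs
```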

  Further, for example, the predetermined condition relates to a vertical arrangement of a first area and a second area that are in a correspondence relationship. In this case, the control unit 12 determines that the predetermined condition is satisfied for a first area and a second area that are aligned in the vertical direction, from among first areas aligned in the horizontal direction and second areas aligned in the horizontal direction in the image data. FIG. 8 illustrates first areas and second areas that satisfy this condition. The coordinates (x, y) in FIG. 8 are read in the same way as the coordinates in FIG. 7.

  Here, whether the first areas are aligned in the horizontal direction and whether the second areas are aligned in the horizontal direction are determined in the same way as the determination, described above, of whether a first area and a second area are aligned in the horizontal direction. Likewise, whether a first area and a second area are aligned in the vertical direction is determined in the same way as the determination of whether the first areas are aligned in the vertical direction and whether the second areas are aligned in the vertical direction.

For example, if the threshold value is 5, the control unit 12 determines that the first region 50a and the second region 60a in FIG. 8 are arranged in the vertical direction and satisfy the predetermined condition. Further, the control unit 12 determines that the first region 50b and the second region 60b are arranged in the vertical direction and satisfy the predetermined condition. Further, the control unit 12 determines that the first region 50c and the second region 60c are arranged in the vertical direction and satisfy the predetermined condition. That is, the control unit 12 associates the first area 50a with the second area 60a, the first area 50b with the second area 60b, and the first area 50c with the second area 60c.

  For example, the control unit 12 recognizes a predetermined correspondence instruction expression indicating the correspondence between the first area and the second area, which exists in the image data. Then, the control unit 12 associates the first region with the second region based on the correspondence relationship indicated by the recognized correspondence relationship instruction expression.

  The correspondence relationship instruction expression indicates the correspondence between the first area and the second area. FIG. 9 illustrates the correspondence relationship instruction expression.

  For example, the correspondence relationship instruction expression is the arrow 70 shown in FIG. 9. In this case, the control unit 12 recognizes the arrow 70 existing in the image data, and acquires from the recognized arrow 70 vector information about the direction indicated by the arrow 70. Using the acquired vector information, the control unit 12 specifies the first area 50a and the second area 60a indicated by the arrow 70. As a result, the control unit 12 associates the identified first area 50a and second area 60a with each other.

  Further, for example, the correspondence relationship instruction expression is the line segment 71 shown in FIG. 9. In this case, the control unit 12 recognizes the line segment 71 existing in the image data and specifies the first area 50b and the second area 60b connected by the line segment 71. As a result, the control unit 12 associates the identified first area 50b and second area 60b with each other.

  Further, for example, the correspondence relationship instruction expression is the symbol 72a and the symbol 72b shown in FIG. 9. In this case, the control unit 12 recognizes the symbols 72a and 72b, which are identical symbols existing in the image data, and specifies the first area 50c and the second area 60c to which the identical symbols 72a and 72b are attached. As a result, the control unit 12 associates the identified first area 50c and second area 60c with each other.

  The control unit 12 associates the first area recognized in step 102 with the second area recognized in step 104 by the association method exemplified so far. The control unit 12 may associate the first region with the second region by combining a plurality of association methods exemplified so far.

<Step 108>
Next, the control unit 12 creates item definition information including the position information acquired in step 103 and the item name acquired in step 106 (step 108). FIG. 10 exemplifies the item definition information generated in step 108 as a result of the processing from step 102 to step 107 being executed on the image data shown in FIG. 4.

  As shown in FIG. 10, the first area 50a and the second area 60a are associated with each other. Further, the first area 50b and the second area 60b are associated with each other.

The x coordinate (Left), the y coordinate (Top), the horizontal length (Width), and the vertical length (Height) of the first area 50a are 120, 80, 320, and 30, respectively. Those of the first area 50b are 120, 120, 320, and 30, respectively. Further, the x coordinate, the y coordinate, the horizontal length, and the vertical length of the second area 60a are 20, 80, 90, and 30, respectively, and those of the second area 60b are 20, 120, 90, and 30, respectively.

FIG. 10 exemplifies item definition information acquired from the first area 50a and the second area 60a, and the first area 50b and the second area 60b. The “item name” field in the item definition information illustrated in FIG. 10 stores character information acquired from the second area. The “Left” field stores the x coordinate of the upper left corner of the first area. The “Top” field stores the y coordinate of the upper left corner of the first area. The “Width” field stores the length of the horizontal axis of the first area. The “Height” field stores the length of the vertical axis of the first area.

  Here, the row data (record) of the item definition information indicates information related to the first area and the second area that are in a correspondence relationship. That is, the record of the item definition information includes the position information of the area that is the object of character recognition and the item name for the area.
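
  A sketch of building such records follows: each record holds the item name read from the second area and the position information of the associated first area, matching the Item name, Left, Top, Width, and Height columns of FIG. 10. Writing the records out as CSV is an assumption made for the sketch; the embodiment does not fix a storage format.

```python
import csv

def write_item_definition(records, path="item_definition.csv"):
    """records: iterable of (item_name, first_area_position) pairs, where the
    position exposes left/top/width/height as in the RegionPosition sketch."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Item name", "Left", "Top", "Width", "Height"])
        for item_name, pos in records:
            # One row per associated first/second area pair, e.g. for the first
            # area 50a of FIG. 10: <item name>, 120, 80, 320, 30.
            writer.writerow([item_name, pos.left, pos.top, pos.width, pos.height])
```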

  Note that the OCR software or the like may acquire the position information of the area to be character-recognized and the item name for the area from the record of the item definition information. That is, the item definition information may be used in OCR software or the like to specify information related to a region that is a character recognition target.

  In addition, the control unit 12 may display the position information and item names related to the character recognition target areas obtained from the records of the item definition information, together with the image data from which they were obtained, on a display device connected to the information processing apparatus 1.

<End>
Finally, the control unit 12 stores the item definition information generated in step 108 in the storage unit 11, for example. Then, the information processing apparatus 1 ends the process according to this operation example.

<Others>
Note that the processing related to the recognition of the first region and the second region in steps 102 and 104 by the control unit 12 corresponds to the processing of the region recognition unit 31.

  The process related to the position information acquisition in step 103 performed by the control unit 12 corresponds to the process of the position information acquisition unit 32.

  The process related to the item name acquisition in step 106 by the control unit 12 corresponds to the process of the item name acquisition unit 33.

  The process related to the association in step 107 by the control unit 12 corresponds to the process of the association unit 34.

  The processing related to the creation of the item definition information in step 108 by the control unit 12 corresponds to the processing of the item definition information creation unit 35.

§3 Actions and effects according to the embodiment

As described above, the information processing apparatus 1 according to the present embodiment recognizes the first area and the second area in the image data (steps 102 and 104). Then, from the first area, position information for designating an area for character recognition is acquired (step 103). Further, the item name for the area that is the target of character recognition is acquired from the second area (step 106).

  Therefore, according to the information processing apparatus 1 according to the present embodiment, the user does not need to manually set an item name for an area that is a target of character recognition related to the acquired position information. Therefore, according to the information processing apparatus 1 according to the present embodiment, it is possible to improve the efficiency of creating definition information used for OCR software or the like.

  Further, in the information processing apparatus 1 according to the present embodiment, the position information for designating the area that is the target of character recognition is associated with the item name for the area that is the target of character recognition (step 107). This eliminates the need for the user to associate the acquired position information with the item name. Therefore, according to the information processing apparatus 1 according to the present embodiment, it is possible to improve the efficiency of creating definition information used for OCR software or the like.

§4 Supplement

Although the embodiment of the present invention has been described in detail above, the above description is merely an example of the present invention in all respects and is not intended to limit the scope thereof. It goes without saying that various improvements and modifications can be made without departing from the scope of the present invention.

  A person skilled in the art can implement an equivalent range from the description of the present embodiment, based on the description of the claims and common general technical knowledge. Unless otherwise noted, the terms used in this specification are used with the meanings commonly used in the relevant field. Thus, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. In the event of a conflict, the terms used herein are to be understood with the meanings set forth herein (including definitions).

DESCRIPTION OF SYMBOLS
1 Information processing apparatus
2 Scanner
11 Storage unit
12 Control unit
13 Bus
14 Input / output unit
31 Area recognition unit
32 Position information acquisition unit
33 Item name acquisition unit
34 Association unit
35 Item definition information creation unit
50, 50a, 50b, 50c First area
60, 60a, 60b, 60c Second area
70 Correspondence relationship instruction expression (arrow)
71 Correspondence relationship instruction expression (line segment)
72a, 72b Correspondence relationship instruction expression (symbol)

Claims (10)

  1. An information processing apparatus comprising:
    an area recognition unit that recognizes, among areas designated by predetermined expressions in image data, a first area designated by a first area designation expression and a second area designated by a second area designation expression different from the first area designation expression;
    a position information acquisition unit that acquires, in the image data, position information of the first area recognized by the area recognition unit as position information for designating an area to be a target of character recognition; and
    an item name acquisition unit that acquires character information, obtained by recognizing characters existing in the second area recognized by the area recognition unit, as an item name of the character recognition target area designated by the position information acquired by the position information acquisition unit.
  2. The information processing apparatus according to claim 1, further comprising an association unit that associates the first area with the second area,
    wherein the item name acquisition unit acquires the character information obtained from the second area as the item name of the character recognition target area designated by the position information acquired from the first area associated with that second area by the association unit.
  3.   The information processing apparatus according to claim 2, wherein the association unit associates the first region with the second region that is closest to the first region in image data.
  4.   The information processing apparatus according to claim 2, wherein the association unit determines whether or not a positional relationship between the position of the first area and the position of the second area satisfies a predetermined condition, and associates the first area and the second area that are determined to satisfy the predetermined condition.
  5.   The information processing apparatus according to claim 4, wherein the association unit determines that the predetermined condition is satisfied for one first area and one second area that are aligned in the horizontal direction, from among a plurality of first areas aligned in the vertical direction and a plurality of second areas aligned in the vertical direction in the image data.
  6.   The information processing apparatus according to claim 4, wherein the association unit determines that the predetermined condition is satisfied for one first area and one second area that are aligned in the vertical direction, from among a plurality of first areas aligned in the horizontal direction and a plurality of second areas aligned in the horizontal direction in the image data.
  7.   The information processing apparatus according to claim 2, wherein the association unit recognizes a predetermined correspondence instruction expression, existing in the image data, that indicates a correspondence between the first area and the second area, and associates the first area with the second area based on the recognized correspondence.
  8.   The information processing apparatus according to claim 1, further comprising an item definition information creation unit that creates item definition information including the position information acquired by the position information acquisition unit for designating the area to be a target of character recognition and the item name acquired by the item name acquisition unit for the character recognition target area designated by that position information.
  9. An information processing method in which a computer executes:
    recognizing, among areas designated by predetermined expressions in image data, a first area designated by a first area designation expression and a second area designated by a second area designation expression different from the first area designation expression;
    acquiring, in the image data, position information of the recognized first area as position information for designating an area to be a target of character recognition; and
    acquiring character information, obtained by recognizing characters existing in the recognized second area, as an item name of the character recognition target area designated by the acquired position information.
  10. A program for causing a computer to execute:
    recognizing, among areas designated by predetermined expressions in image data, a first area designated by a first area designation expression and a second area designated by a second area designation expression different from the first area designation expression;
    acquiring, in the image data, position information of the recognized first area as position information for designating an area to be a target of character recognition; and
    acquiring character information, obtained by recognizing characters existing in the recognized second area, as an item name of the character recognition target area designated by the acquired position information.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2011059362A JP2012194879A (en) 2011-03-17 2011-03-17 Information processing apparatus, information processing method and program

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2011059362A JP2012194879A (en) 2011-03-17 2011-03-17 Information processing apparatus, information processing method and program
CN2012100592429A CN102708365A (en) 2011-03-17 2012-03-02 Information processing apparatus to acquire character information
US13/410,930 US20120237131A1 (en) 2011-03-17 2012-03-02 Information processing apparatus to acquire character information

Publications (1)

Publication Number Publication Date
JP2012194879A true JP2012194879A (en) 2012-10-11

Family

ID=46828502

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2011059362A Withdrawn JP2012194879A (en) 2011-03-17 2011-03-17 Information processing apparatus, information processing method and program

Country Status (3)

Country Link
US (1) US20120237131A1 (en)
JP (1) JP2012194879A (en)
CN (1) CN102708365A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015138396A (en) * 2014-01-22 2015-07-30 富士ゼロックス株式会社 Image processor and image processing program

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017058732A (en) * 2015-09-14 2017-03-23 富士ゼロックス株式会社 Information processing device and program

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007279828A (en) * 2006-04-03 2007-10-25 Toshiba Corp Business form processor, business form format preparation device, business form, program for processing business form and program for preparing business form format
GB0622863D0 (en) * 2006-11-16 2006-12-27 Ibm Automated generation of form definitions from hard-copy forms

Also Published As

Publication number Publication date
US20120237131A1 (en) 2012-09-20
CN102708365A (en) 2012-10-03

Similar Documents

Publication Publication Date Title
JP3095709B2 (en) A method of generating a user interface form
JP2005004774A (en) Annotation process and system of digital ink for recognizing, anchoring, and reflowing annotation of digital ink
JP5465015B2 (en) Apparatus and method for digitizing documents
JP3996579B2 (en) Form processing system for identifying active areas of machine-readable forms
CN1842122B (en) Apparatus and method for processing annotation data
US6600834B1 (en) Handwriting information processing system with character segmentation user interface
US8107727B2 (en) Document processing apparatus, document processing method, and computer program product
US8780117B2 (en) Display control apparatus and display control method capable of rearranging changed objects
JPH0772861B2 (en) Program creating device
US6356655B1 (en) Apparatus and method of bitmap image processing, storage medium storing an image processing program
US20120124509A1 (en) Information processor, processing method and program
JPH10162150A (en) Page analysis system
JP2001005599A (en) Information processor and information processing method an d recording medium recording information processing program
US8532388B2 (en) Image processing apparatus, image processing method, and computer program
JP2002279433A (en) Method and device for retrieving character in video
JP2005173730A (en) Business form ocr program, method, and device
JP2967309B2 (en) Image processing apparatus
JP6007497B2 (en) Image projection apparatus, image projection control apparatus, and program
US6201894B1 (en) Method and apparatus for extracting ruled lines or region surrounding ruled lines
US20100171999A1 (en) Image processing apparatus, image processing method, and computer program thereof
JP3113827B2 (en) Recognition method and recognition apparatus of the rectangular object
US7926732B2 (en) OCR sheet-inputting device, OCR sheet, program for inputting an OCR sheet and program for drawing an OCR sheet form
JP2761467B2 (en) Image cut-out device and the character recognition device
US8619278B2 (en) Printed matter examination apparatus, printed matter examination method, and printed matter examination system
JP4461769B2 (en) Document retrieval / browsing technique and document retrieval / browsing device

Legal Events

Date Code Title Description
A300 Withdrawal of application because of no request for examination

Free format text: JAPANESE INTERMEDIATE CODE: A300

Effective date: 20140603