JP2005322082A

JP2005322082A - Document attribute input device and method

Info

Publication number: JP2005322082A
Application number: JP2004140380A
Authority: JP
Inventors: Shoichi Tateno; 昌一舘野
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2004-05-10
Filing date: 2004-05-10
Publication date: 2005-11-17

Abstract

PROBLEM TO BE SOLVED: To provide a document attribute input technique for easily extracting attributes from a document and entering them.
SOLUTION: If any of the words or phrases highlighted on the attribute input screen is a suitable attribute, the operator clicks that highlighted part with a pointing device 16. An event processing unit 17 detects this, extracts the text of the highlighted part (or the text of the link destination if the part is linked), and writes it to a buffer memory 18. When the corresponding field of an attribute input table is then clicked, the event processing unit 17 detects this and writes the text in the buffer memory 18 into that field. The contents of the buffer memory 18 are then cleared.
COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a document attribute input technique in which a document is presented to an operator who then enters the attributes of that document, and in particular to a technique that lets the operator easily identify and enter document attributes.

To use office space effectively, it has been proposed to store paper documents in a remote, relatively low-cost warehouse, or to discard them, and to digitize their contents for use. Digitizing paper documents reduces office costs and also makes information easier to share. To manage digitized documents, it is desirable to extract important items from each document and assign them to it. However, having an operator read each document, extract these important items (document attributes), and then enter them takes a great deal of time and cost.

As prior art related to the present invention, there is a document that displays an image of a copyrighted work together with its copyright attributes in order to manage copyright-related charges (Patent Document 1). However, that document says nothing about entering document attributes easily.
Patent Document 1: JP-A-10-79016

The present invention has been made in view of the above circumstances, and its object is to provide a document attribute input technique for easily extracting attributes from a document and entering them.

To achieve the above object, the present invention adopts the configurations set forth in the claims. Before the invention is described in detail, the claims are explained supplementarily here.

That is, according to one aspect of the present invention, to achieve the above object, a document attribute input device is provided with: document display means for displaying a document in a document display area with the words that are document attribute value candidates highlighted; attribute input form display means for displaying an attribute input form in an attribute input form display area; and attribute value transfer means for, in response to a click operation on a word highlighted in the document display area and a click operation on an input field in the attribute input form display area, transferring the clicked word into the clicked input field.

In this configuration, a document attribute can be entered simply by clicking a document attribute value candidate shown with emphasized display (also called highlighting) and clicking an input field of the attribute input form.

The document attribute input device may be configured as a single device in one housing, or as a combination of several independent devices in separate housings. For example, it may be configured as a combination of a server device and a client device.

Document attributes broadly include the characteristics of a document, such as its creation date, owner, manager, retention period, and document type. They need not relate to the content of the document; they may also relate to its physical form, such as the document size.
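As a minimal sketch, assuming a flat record with one optional string per attribute, the attribute set listed above could be held as follows. The class name `DocumentAttributes` and all field names are hypothetical; the source defines no schema.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DocumentAttributes:
    """Hypothetical record for the document attributes listed above.

    Field names are illustrative only; the source does not define a schema.
    """
    creation_date: Optional[str] = None      # e.g. "2004-05-10"
    owner: Optional[str] = None
    manager: Optional[str] = None
    retention_period: Optional[str] = None   # e.g. "10 years"
    document_type: Optional[str] = None      # e.g. "contract"
    # Attributes need not concern content: physical form such as paper size.
    document_size: Optional[str] = None      # e.g. "A4"

    def missing(self) -> list[str]:
        """Names of attributes that have not been entered yet, in field order."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]
```

A `missing()` list of this kind is one possible way to tell the operator which attributes still need a click before confirming.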

The attribute input form is a form for entering attributes and can take various forms, such as a table or input fields defined with HTML form tags.

A word can be highlighted with any character attribute that differs from its surroundings, such as the character color, the background color of the corresponding portion, underlining, or the typeface.

The click operation is typically a click with a pointing device such as a mouse, but it broadly includes any event-generating operation that specifies a pointed position, including the equivalent operations of other pointing devices. For example, it also includes an operation using the cursor keys and the Enter (confirm) key.
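The keyboard equivalent mentioned above can be sketched as a focus index that the cursor keys move over the highlighted phrases, with Enter producing the same result as a mouse click on the focused phrase. This is an illustrative assumption, not the patent's implementation; the class `HighlightNavigator` and the key names `"Left"`, `"Right"`, `"Enter"` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HighlightNavigator:
    """Hypothetical keyboard equivalent of clicking a highlighted phrase.

    Cursor keys move a focus index over the highlighted phrases (wrapping
    around); Enter returns the focused phrase, just as a click would.
    """
    phrases: list[str]
    focus: int = 0

    def key(self, name: str) -> Optional[str]:
        """Handle one key press; return the "clicked" phrase on Enter, else None."""
        if not self.phrases:
            return None
        if name == "Right":
            self.focus = (self.focus + 1) % len(self.phrases)
        elif name == "Left":
            self.focus = (self.focus - 1) % len(self.phrases)
        elif name == "Enter":
            return self.phrases[self.focus]  # same effect as a mouse click
        return None
```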

According to another aspect of the present invention, to achieve the above object, a document attribute input device is provided with: document display means for displaying a document in a document display area with the words that are attribute value candidates highlighted; attribute input form display means for displaying an attribute input form in an attribute input form display area; and attribute value transfer means for, in response to a click operation on a word highlighted in the document display area, transferring the clicked word to an input field of the attribute input form.

In this configuration, a document attribute can be entered simply by clicking a highlighted document attribute value candidate.

When the attribute input form has several input fields, the destination input field is determined uniquely from the order of operations, the content of the attribute, and the like.
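One way to make the destination unique is sketched below, assuming the clicked candidate carries a named-entity type. The first rule uses the attribute content (a type-to-field mapping) and the second falls back on operation order (the first empty field). The mapping `TYPE_TO_FIELD`, the type labels, and the function `choose_field` are all hypothetical.

```python
from typing import Optional

# Hypothetical mapping from a candidate's named-entity type to a form field.
TYPE_TO_FIELD = {
    "DATE": "creation_date",
    "PERSON": "owner",
    "ORGANIZATION": "organization",
}


def choose_field(form: dict[str, Optional[str]], entity_type: str) -> Optional[str]:
    """Pick the destination field for a clicked candidate.

    Rule 1 (attribute content): the field mapped to the entity type, if empty.
    Rule 2 (operation order): otherwise the first empty field in form order.
    Returns None when every field is already filled.
    """
    target = TYPE_TO_FIELD.get(entity_type)
    if target in form and form[target] is None:
        return target
    for name, value in form.items():  # dicts preserve insertion (form) order
        if value is None:
            return name
    return None
```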

In this configuration, fixed attribute value display means for displaying fixed (stock) attribute values in a fixed attribute value display area may further be provided, and when a click operation is performed on a fixed attribute value in that area, that fixed attribute value may be transferred to the input field. Fixed attribute values can thus be used as a supplementary aid to input.

Important phrase extraction means for extracting important phrases from the document may further be provided, and the phrases it extracts may be displayed as attribute value candidates. The important phrase extraction means may extract named entities as important phrases. Named entities are expressions such as person names, place names, organization names, currency amounts, and dates. The document display means may highlight each phrase with a display attribute that depends on the type of named entity.
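A minimal sketch of rule-based named-entity extraction with type-dependent highlighting follows. It assumes a tiny gazetteer plus regular expressions rather than a real extractor; the gazetteer entries, the patterns (era dates such as `H16/05/10`, amounts in yen), the colours, and the functions `extract_entities` and `highlight` are hypothetical illustrations.

```python
import html
import re

# Hypothetical extraction rules: a tiny gazetteer plus regular expressions.
# A real named-entity extractor would use lexical analysis and richer rules.
GAZETTEER = {"Tokyo": "LOCATION", "Japan Corporation": "ORGANIZATION"}
PATTERNS = [
    ("DATE", re.compile(r"\b[HS]\d{1,2}/\d{2}/\d{2}\b")),  # era dates, e.g. H16/05/10
    ("MONEY", re.compile(r"\b\d[\d,]*\s?yen\b")),          # e.g. 12,000 yen
]
# Display attribute per entity type (here, a background colour).
STYLE = {"DATE": "#ffe08a", "MONEY": "#c8f7c5",
         "LOCATION": "#cde4ff", "ORGANIZATION": "#f7c5e0"}


def extract_entities(text: str) -> list[tuple[int, int, str]]:
    """Return non-overlapping (start, end, type) spans, sorted by start."""
    spans = []
    for name, etype in GAZETTEER.items():
        for m in re.finditer(re.escape(name), text):
            spans.append((m.start(), m.end(), etype))
    for etype, pattern in PATTERNS:
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), etype))
    spans.sort()
    result, last_end = [], -1
    for start, end, etype in spans:
        if start >= last_end:  # drop spans overlapping an earlier one
            result.append((start, end, etype))
            last_end = end
    return result


def highlight(text: str) -> str:
    """Wrap each entity in a <span> whose style depends on its type."""
    out, pos = [], 0
    for start, end, etype in extract_entities(text):
        out.append(html.escape(text[pos:start]))
        out.append(f'<span class="ne" data-type="{etype}" '
                   f'style="background:{STYLE[etype]}">{html.escape(text[start:end])}</span>')
        pos = end
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

For `"Dated H16/05/10, Tokyo."`, the date and the place name end up in differently coloured spans, which corresponds to highlighting by entity type.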

The document that the document display means displays with attribute value candidates highlighted may be written in HTML, the HTML document may be obtained by character recognition of the original image data, and the document display means may switch between displaying the HTML document and the original image data. In this case, attributes can be obtained and entered easily when a paper document is digitized, and because the image data can also be displayed, the entries can be reliably verified against the electronic image of the original document.

Means for determining whether a value to be transferred is acceptable as an attribute value may also be provided. If the value is not acceptable, the transfer may be prohibited, the operator may be prompted during the work, or the corresponding field of the attribute input form may be highlighted after the work to draw attention to it.
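The three ways of handling an unacceptable value described above (prohibit, prompt, or flag the field afterwards) can be sketched as a policy switch around the transfer. The acceptability test and the operator prompt are passed in as callables; the names `Policy`, `transfer`, `is_acceptable`, and `confirm` are hypothetical.

```python
from enum import Enum
from typing import Callable, Optional


class Policy(Enum):
    REJECT = "reject"  # prohibit the transfer
    PROMPT = "prompt"  # ask the operator during the work
    FLAG = "flag"      # transfer now, highlight the field after the work


def transfer(form: dict[str, Optional[str]], flagged: set[str], name: str, value: str,
             is_acceptable: Callable[[str, str], bool], policy: Policy,
             confirm: Callable[[str], bool] = lambda v: False) -> bool:
    """Write value into form[name] subject to is_acceptable and the policy.

    Returns True if the value was written.
    """
    if is_acceptable(name, value):
        form[name] = value
        return True
    if policy is Policy.REJECT:
        return False
    if policy is Policy.PROMPT:
        if confirm(value):  # operator chose to keep the value anyway
            form[name] = value
            return True
        return False
    # Policy.FLAG: write the value but remember the field for later highlighting.
    form[name] = value
    flagged.add(name)
    return True
```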

The present invention is best suited to assigning attributes to documents when paper documents are digitized, but it is not limited to this and can of course also be applied to assigning attributes to documents originally prepared as electronic data.

The present invention can be implemented not only as an apparatus or system but also as a method. Part of the invention can of course be implemented as software, and a software product used to cause a computer to execute such software naturally also falls within the technical scope of the invention.

These and other aspects of the invention are set forth in the claims and are described in detail below using embodiments.

According to the present invention, document attributes can be entered with a simple operation.

Embodiments of the present invention are described below.

First, Embodiment 1 of the present invention is described. Embodiment 1 is implemented by, for example, one or more computers, and its configuration is shown as functional blocks. Each functional block is implemented by the computer's hardware or software resources. Of course, a similar configuration may also be implemented by combining individual circuit units.

FIG. 1 shows the document attribute input device of Embodiment 1. In this figure, the document attribute input device includes an HTML document display unit 11, an image document display unit 12, an attribute input table display unit 13, a fixed phrase table display unit 14, a display device 15, a pointing device 16, an event processing unit 17, a buffer memory 18, an HTML document storage unit 19, an image document storage unit 20, an attribute data storage unit 21, a fixed phrase storage unit 22, and the like.

The display device 15 consists of an ordinary display, such as a CRT or liquid crystal display, and a display controller, and displays character codes and images. The pointing device 16 is, for example, a mouse, and specifies coordinates on the display screen of the display device 15. The HTML document display unit 11 displays the HTML document stored in the HTML document storage unit 19; this is the document to which attributes are to be assigned. In this example, the HTML document is produced by performing character recognition on the original image document (image data) obtained by scanning a paper document, then extracting named entities and highlighting them with tags.

For example, in the display example of FIG. 2, the HTML document 151 is displayed in the left frame F1, with named entities such as "Ai Ueo" (person name), "H16/05/10" (date, Heisei 16 = 2004), "Tokyo" (place name), and "Japan Corporation" (organization name) highlighted.

The image document display unit 12 displays the original image document stored in the image document storage unit 20. The HTML document and the image document are displayed alternately. For example, in FIG. 2, a command bar C at the top provides various commands, and the operator switches between documents by clicking "Image document" or "HTML document".

In the example of FIG. 2, the attribute input table display unit 13 and the fixed phrase table display unit 14 display the attribute input table 152 and the fixed phrase table 153 in frames F2 and F3, respectively. The attribute data storage unit 21 stores the entered attribute data, and the fixed phrase storage unit 22 stores the fixed phrases.

Although FIG. 1 shows the HTML document display unit 11, the image document display unit 12, the attribute input table display unit 13, and the fixed phrase table display unit 14 separately, they can be configured as a single display unit when everything is displayed in a web browser as one HTML document, for example using HTML frames. Plug-in software may be used to display the image document.

The event processing unit 17 processes various events. In connection with this example, it detects a click on a highlighted part of the HTML document and writes the text of that highlighted part to the buffer memory 18. The buffer memory 18 is typically the clipboard. In addition, when a field of the attribute input table is clicked, the event processing unit 17 reads the text stored in the buffer memory 18 and enters it into that field.

Likewise, when a particular entry of the fixed phrase table is clicked, the event processing unit 17 writes its text to the buffer memory 18. When a field of the attribute input table is then clicked, it reads the text stored in the buffer memory 18 and enters it into that field.
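The behaviour of the event processing unit 17 and buffer memory 18 described above can be sketched as a small state machine: a source click (highlight or fixed phrase) fills the buffer, and a field click writes the buffer into the field and clears it. This is a minimal illustration under the assumption that the form is a dictionary of field names; the class `EventProcessor` and its method names are hypothetical, and the step labels refer to FIGS. 3 and 4.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EventProcessor:
    """Sketch of event processing unit 17 with buffer memory 18 (the clipboard)."""
    form: dict[str, Optional[str]]
    buffer: Optional[str] = None
    fixed_phrases: list[str] = field(default_factory=list)

    def click_highlight(self, text: str, link_text: Optional[str] = None) -> None:
        # S12: copy the highlighted text (or its link destination's text) to the buffer.
        self.buffer = link_text if link_text is not None else text

    def click_fixed_phrase(self, index: int) -> None:
        # S14: copy a phrase from the fixed phrase table to the buffer.
        self.buffer = self.fixed_phrases[index]

    def click_field(self, name: str) -> bool:
        # S13: write the buffer into the clicked field, then clear the buffer.
        if self.buffer is None or name not in self.form:
            return False
        self.form[name] = self.buffer
        self.buffer = None
        return True
```

Usage: after `click_highlight("Ai Ueo")`, a call to `click_field("owner")` stores the name and empties the buffer, so a second field click without a new source click does nothing.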

In this way, an attribute value can be entered easily by clicking a highlighted part of the HTML document or an entry of the fixed phrase table and then clicking a field of the attribute input table.

FIG. 3 outlines an example of the operation in this embodiment when a phrase highlighted in the HTML document is transferred to the corresponding field of the attribute input table. First, the document to which attributes are to be assigned is designated, and the HTML document (or image document), the attribute input table, and the fixed phrase table are displayed (S11). To enter attributes by clicking in the HTML document, the HTML document must be displayed; if the image document is currently displayed, the operator clicks the "HTML" button on the command bar to display the HTML document. Next, if any of the highlighted phrases is suitable as an attribute, the operator clicks that highlighted part with the pointing device 16. The event processing unit 17 detects this, extracts the text of the highlighted part (or the text of the link destination if it is linked), and writes it to the buffer memory 18 (S12). When the corresponding field of the attribute input table is then clicked, the event processing unit 17 detects this, writes the text in the buffer memory 18 into that field, and then clears the buffer memory 18 (S13).

FIG. 4 outlines an example of the operation in this embodiment when a fixed phrase in the fixed phrase table is transferred to the corresponding field of the attribute input table; parts corresponding to FIG. 3 carry the same reference signs. In FIG. 4, the document to which attributes are to be assigned is designated, and the HTML document (or image document), the attribute input table, and the fixed phrase table are displayed (S11). If the fixed phrase table contains a suitable attribute, the operator clicks that fixed phrase with the pointing device 16. The event processing unit 17 detects this, extracts the text of the fixed phrase (or the text of the link destination if it is linked), and writes it to the buffer memory 18 (S14). When the corresponding field of the attribute input table is then clicked, the event processing unit 17 detects this, writes the text in the buffer memory 18 into that field, and then clears the buffer memory 18 (S13).

When attribute input is complete, the operator presses the "Confirm" button to finalize the entries.

According to this embodiment, document attributes can be entered easily, simply by clicking an attribute candidate highlighted in the HTML document and then clicking the destination. Likewise, document attributes can be entered just as easily by clicking a fixed phrase and then clicking the destination.

In this embodiment, the transfer source, such as a highlighted attribute candidate, is clicked first and the destination in the attribute input table is clicked afterwards, but the order of the clicks may be reversed. Further, when there is only one destination field, or when the destination field can be determined uniquely by a predetermined rule, the transfer may be performed simply by clicking the transfer source.
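The two variations above, reversed click order and single-click transfer, can be sketched by remembering whichever side was clicked first and completing the transfer when the other side arrives. The optional `resolve` callable stands in for the unspecified "predetermined rule"; it and the class `PairingTransfer` are hypothetical.

```python
from typing import Callable, Optional


class PairingTransfer:
    """Transfer that accepts source and destination clicks in either order.

    If `resolve` (a hypothetical rule) can determine the destination from the
    source alone, or the form has only one field, one source click suffices.
    """

    def __init__(self, form: dict[str, Optional[str]],
                 resolve: Optional[Callable[[str], Optional[str]]] = None):
        self.form = form
        self.resolve = resolve
        self.pending_source: Optional[str] = None
        self.pending_field: Optional[str] = None

    def click_source(self, text: str) -> None:
        if self.pending_field is not None:  # destination was clicked first
            self.form[self.pending_field] = text
            self.pending_field = None
            return
        target = self.resolve(text) if self.resolve else None
        if target is None and len(self.form) == 1:  # only one destination field
            target = next(iter(self.form))
        if target is not None:
            self.form[target] = text
        else:
            self.pending_source = text  # wait for a field click

    def click_field(self, name: str) -> None:
        if self.pending_source is not None:  # source was clicked first
            self.form[name] = self.pending_source
            self.pending_source = None
        else:
            self.pending_field = name  # wait for a source click
```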

Further, as shown in FIG. 5, a character recognition unit 23, a named entity extraction unit 24, and an HTML document generation unit 25 may be provided to generate an HTML document from the original image document and store it in the HTML document storage unit 19. The character recognition unit 23 performs ordinary character recognition, analyzing an image and outputting character codes. The named entity extraction unit 24 extracts important expressions such as personal names, other names, place names, organization names, dates, and currency amounts; it typically includes a lexical analyzer and an extraction rule executor, but is not limited to this. The named entity extraction unit 24 can also identify the type of each named entity, so the highlight attribute can be chosen per type. The desired HTML document is then generated by marking up the extracted named entities with tags according to their types. Of course, the character recognition unit 23, the named entity extraction unit 24, and the HTML document generation unit 25 may instead be provided separately, so that HTML documents are created in advance and then used.
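The FIG. 5 pipeline can be sketched as three pluggable stages whose result is saved to the HTML document storage. Because real character recognition is out of scope here, the recognizer and extractor are passed in as callables (stubs in the usage test); the type aliases, the `ne-<type>` class naming, and the functions `generate_html` and `build_html_document` are assumptions for illustration. A style sheet keyed on the class name would then choose the highlight attribute per type.

```python
import html
from typing import Callable

# Hypothetical interfaces for the three units of FIG. 5.
Recognizer = Callable[[bytes], str]                      # unit 23: image -> text
Extractor = Callable[[str], list[tuple[int, int, str]]]  # unit 24: text -> (start, end, type)


def generate_html(text: str, entities: list[tuple[int, int, str]]) -> str:
    """Unit 25: mark up each (non-overlapping) entity with a tag whose class
    names its type, so a style sheet can choose the highlight per type."""
    parts, pos = [], 0
    for start, end, etype in sorted(entities):
        parts.append(html.escape(text[pos:start]))
        parts.append(f'<mark class="ne-{etype.lower()}">{html.escape(text[start:end])}</mark>')
        pos = end
    parts.append(html.escape(text[pos:]))
    return "<html><body><p>" + "".join(parts) + "</p></body></html>"


def build_html_document(doc_id: str, image: bytes, recognize: Recognizer,
                        extract: Extractor, html_store: dict[str, str]) -> str:
    """Run recognition, extraction and markup, then save (storage unit 19)."""
    text = recognize(image)
    page = generate_html(text, extract(text))
    html_store[doc_id] = page
    return page
```

Keeping the original image alongside `html_store` (not shown) would allow the HTML and image views to be switched as in FIG. 2.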

Furthermore, a rule table 26 describing values that are impossible as attribute values may be provided, and the event processing unit 17 may refer to it and prohibit the transfer. For example, when digitizing the documents of a relatively young company founded, say, in the 1970s, a date reading "56/01/01" with the year written as "56" must mean Showa 56 (1981), not 1956 in the Western calendar, so entering the year as "56", or as "1956" via an auto-complete function, is prohibited. Rules of this kind for identifying impossible attribute values also help catch character recognition errors.
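The "56/01/01" example can be sketched as a rule table of predicates that each reject impossible years, applied to every reading of a two-digit year. The sketch assumes a `YY/MM/DD` input, considers only the Western 19YY and Showa readings (Showa 1 = 1926, so Showa YY = 1925 + YY for YY from 1 to 64), and uses two illustrative rules; `make_rule_table` and `possible_years` are hypothetical names. A string that does not parse, such as an OCR error, yields no candidates.

```python
import re
from typing import Callable

# Hypothetical rule table 26: each rule returns True if a year is impossible.
Rule = Callable[[int], bool]


def make_rule_table(founded: int, current_year: int) -> list[Rule]:
    return [
        lambda y: y < founded,       # document predates the company
        lambda y: y > current_year,  # document dated in the future
    ]


def possible_years(raw: str, rules: list[Rule]) -> list[int]:
    """Readings of a 'YY/MM/DD' date whose year no rule rejects.

    A two-digit year is read either as 19YY (Western calendar) or as
    Showa YY (Showa 1 = 1926, so Showa YY = 1925 + YY, valid for 1-64).
    """
    m = re.fullmatch(r"(\d{2})/(\d{2})/(\d{2})", raw)
    if m is None:
        return []  # unparsable, e.g. an OCR error such as "5O/01/01"
    yy = int(m.group(1))
    readings = [1900 + yy]
    if 1 <= yy <= 64:
        readings.append(1925 + yy)
    return [y for y in readings if not any(rule(y) for rule in rules)]
```

With a founding year of 1975, `possible_years("56/01/01", ...)` leaves only 1981, so a transfer of "1956" would be refused.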

Next, Embodiment 2 of the present invention is described. In Embodiment 2, a document attribute input system is built from a server device and a client device. Of course, the server application and the client application may also run on the same computer.

FIG. 6 shows the overall document attribute input system of Embodiment 2. In this figure, the document attribute input system consists of a server device 100 and a client device 200. The server device 100 is typically a web server and includes an application server and a database system as appropriate. The server device 100 receives HTTP requests (POST, GET, and so on) from the client device 200 and sends a specified document to the client device 200, or passes the arguments contained in the request to a specified program and sends that program's return value to the client device 200. In this example, a program execution unit 110 executes an HTML document output program 111 (corresponding to the HTML document display unit 11 of FIG. 1), an image document output program 112 (corresponding to the image document display unit 12 of FIG. 1), an attribute input table output program 113 (corresponding to the attribute input table display unit 13 of FIG. 1), and a fixed phrase table output program 114 (corresponding to the fixed phrase table display unit 14 of FIG. 1).

The storage device 120 stores HTML documents, image documents, and the like.

The client device 200 runs browsing software 201, such as a web browser or plug-in software. In this example, the browsing software 201 receives and displays the HTML document for attribute input (the entire display of FIG. 2, including the attribute input table and the fixed phrase table). The HTML document contains a script written in a predetermined programming language such as JavaScript (trademark), and this script runs in the browsing software 201: the script execution unit 202 executes the event processing program 203 formed by the script. The event processing program 203 corresponds to the event processing unit 17 of FIG. 1.

FIG. 7 outlines the operation of this embodiment. In FIG. 7, the browsing software 201 of the client device 200 displays, for example, a list of documents to be processed, and the document to be processed is selected from this list (S21). The server device 100 retrieves the HTML document to be processed and sends it with frame F1 of FIG. 2 as the target (S22). The attribute input table and the fixed phrase table may also be sent as needed. On the client device 200, the operator clicks a highlighted part of the HTML document, and the text of the important phrase is written to the clipboard (S23). The highlight is tagged, for example, as shown in FIG. 8(A), and the script used at this point is as shown in FIG. 9.

Next, when a field of the attribute input table is clicked, the clipboard text is passed as an argument to the attribute table output program 113 of the server device 100 so that the field value is updated with that text (S24). The script executed by this click is as shown in FIG. 10. In this example as well, the clipboard is cleared. The attribute table output program 113 of the server device 100 generates an attribute input table with the text written into that field and sends it to the client device 200.
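The server-side half of step S24 can be sketched as a function that parses the request arguments, updates the named field, and regenerates the whole attribute input table as HTML. The query format `field=...&value=...`, the in-memory `attributes` dictionary, and the function name `attribute_table_output` are assumptions; the source does not show the argument format used by the scripts of FIGS. 9 and 10.

```python
import html
from urllib.parse import parse_qs


def attribute_table_output(query: str, attributes: dict[str, str]) -> str:
    """Hypothetical stand-in for attribute table output program 113.

    `query` is assumed to look like 'field=owner&value=Ai%20Ueo'. The named
    field is updated (unknown fields are ignored) and the whole attribute
    input table is regenerated as HTML for the client.
    """
    params = parse_qs(query)
    name = params.get("field", [""])[0]
    value = params.get("value", [""])[0]
    if name in attributes:
        attributes[name] = value
    rows = "".join(
        f'<tr><th>{html.escape(k)}</th><td><input name="{html.escape(k)}" '
        f'value="{html.escape(v)}"></td></tr>'
        for k, v in attributes.items()
    )
    return f"<table>{rows}</table>"
```

Regenerating the full table on each request keeps the server as the single source of truth for the entered values, at the cost of one round trip per transfer.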

In this embodiment as well, document attributes can be entered easily.

The present invention is not limited to the embodiments described above, and various modifications are possible without departing from its spirit. For example, in the above example the image document and the HTML document are displayed alternately as a pair, but both may be displayed simultaneously. The image document may also be omitted. Further, another markup language such as XML or XHTML may be used instead of HTML.

Brief Description of Drawings

FIG. 1 is a block diagram illustrating the configuration of Embodiment 1 of the present invention.
FIG. 2 illustrates an example of the attribute display screen of Embodiment 1.
FIG. 3 is a flowchart illustrating an example of the operation of Embodiment 1.
FIG. 4 is a flowchart illustrating another example of the operation of Embodiment 1.
FIG. 5 is a block diagram illustrating the configuration of a modification of Embodiment 1.
FIG. 6 illustrates the configuration of Embodiment 2 of the present invention.
FIG. 7 illustrates the operation of Embodiment 2.
FIG. 8 illustrates an example of the HTML document of Embodiment 2.
FIG. 9 illustrates copying from the transfer source in Embodiment 2.
FIG. 10 illustrates copying to the transfer destination in Embodiment 2.

Explanation of symbols

11 HTML document display unit
12 Image document display unit
13 Attribute input table display unit
14 Fixed phrase table display unit
15 Display device
16 Pointing device
17 Event processing unit
18 Buffer memory
19 HTML document storage unit
20 Image document storage unit
21 Attribute data storage unit
22 Fixed phrase storage unit
23 Character recognition unit
24 Named entity extraction unit
25 HTML document generation unit
26 Rule table
100 Server device
110 Program execution unit
111 HTML document output program
112 Image document output program
113 Attribute table output program
114 Fixed phrase table output program
120 Storage device
151 HTML document
152 Attribute input table
153 Fixed phrase table
200 Client device
201 Browsing software
202 Script execution unit
203 Event processing program

Claims

1. A document attribute input device comprising:
document display means for displaying a document in a document display area with attribute value candidate words highlighted;
attribute input form display means for displaying an attribute input form in an attribute input form display area; and
attribute value transcription means for transcribing a clicked word into a clicked input field in response to a click operation on a word highlighted in the document display area and a click operation on an input field in the attribute input form display area.

2. A document attribute input device comprising:
document display means for displaying a document in a document display area with attribute value candidate words highlighted;
attribute input form display means for displaying an attribute input form in an attribute input form display area; and
attribute value transcription means for transcribing a clicked word into an input field of the attribute input form in response to a click operation on a word highlighted in the document display area.

3. The document attribute input device according to claim 1, further comprising fixed attribute value display means for displaying fixed attribute values in a fixed attribute value display area, wherein, when a click operation is performed on a fixed attribute value in the fixed attribute value display area, that fixed attribute value is transcribed into the input field.

4. The document attribute input device according to claim 1, further comprising important phrase extraction means for extracting important phrases from the document, wherein the phrases extracted by the important phrase extraction means are displayed as attribute value candidate words.

5. The document attribute input device according to claim 4, wherein the important phrase extraction means extracts named entities as important phrases.

6. The document attribute input device according to claim 5, wherein the document display means highlights each phrase with a display attribute corresponding to the type of the named entity.

7. The document attribute input device according to claim 1, wherein the document displayed by the document display means with the attribute value candidate words highlighted is described in HTML and is obtained from original image data by character recognition, and the document display means switches the display between the HTML document and the original image data.

8. The document attribute input device according to claim 1, further comprising means for determining whether an attribute value to be transcribed is acceptable as an attribute value.

9. A document attribute input method comprising the steps of:
displaying, by document display means, a document in a document display area with attribute value candidate words highlighted;
displaying, by attribute input form display means, an attribute input form in an attribute input form display area; and
transcribing, by attribute value transcription means, a clicked word into a clicked input field in response to a click operation on a word highlighted in the document display area and a click operation on an input field in the attribute input form display area.

10. A document attribute input method comprising the steps of:
displaying, by document display means, a document in a document display area with attribute value candidate words highlighted;
displaying, by attribute input form display means, an attribute input form in an attribute input form display area; and
transcribing, by attribute value transcription means, a clicked word into an input field of the attribute input form in response to a click operation on a word highlighted in the document display area.

11. A computer program for inputting document attributes, the program causing a computer to execute the steps of:
displaying, by document display means, a document in a document display area with attribute value candidate words highlighted;
displaying, by attribute input form display means, an attribute input form in an attribute input form display area; and
transcribing, by attribute value transcription means, a clicked word into a clicked input field in response to a click operation on a word highlighted in the document display area and a click operation on an input field in the attribute input form display area.

12. A computer program for inputting document attributes, the program causing a computer to execute the steps of:
displaying, by document display means, a document in a document display area with attribute value candidate words highlighted;
displaying, by attribute input form display means, an attribute input form in an attribute input form display area; and
transcribing, by attribute value transcription means, a clicked word into an input field of the attribute input form in response to a click operation on a word highlighted in the document display area.
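The two-click form of the transcription means (first click on a highlighted word, second click on an input field) can be pictured as a small state machine. The following is a minimal Python sketch, not the patent's implementation. Following the abstract, it assumes the clicked text is held in a buffer (standing in for buffer memory 18) and that the buffer is cleared once its contents are written. The class name, method names, and field names such as `Author` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TwoClickTranscriber:
    """Hypothetical model of two-click attribute transcription.

    `form` maps input-field names to their current values (the attribute
    input form); `buffer` stands in for buffer memory 18.
    """
    form: Dict[str, str] = field(default_factory=dict)
    buffer: Optional[str] = None

    def on_word_click(self, text: str) -> None:
        # First click: hold the highlighted word's text in the buffer.
        self.buffer = text

    def on_field_click(self, field_name: str) -> bool:
        # Second click: copy the buffered text into the clicked field.
        # Nothing happens if no word was clicked first or the field is unknown.
        if self.buffer is None or field_name not in self.form:
            return False
        self.form[field_name] = self.buffer
        self.buffer = None  # clear the buffer after writing, as in the abstract
        return True


if __name__ == "__main__":
    t = TwoClickTranscriber(form={"Author": "", "Date": ""})
    t.on_word_click("Fuji Xerox")
    t.on_field_click("Author")
    print(t.form)  # {'Author': 'Fuji Xerox', 'Date': ''}
```

A click on a field with an empty buffer is ignored rather than clearing the field. This is a design assumption; the claims do not specify the behavior.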
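In the single-click form, only the highlighted word is clicked, and the claims do not say which input field receives it. For illustration only, the sketch below assumes the word goes to an explicitly active field if one is given, and otherwise to the first empty field in display order. `transcribe_on_click` and its parameters are hypothetical.

```python
from typing import Dict, List, Optional


def transcribe_on_click(
    form: Dict[str, str],
    order: List[str],
    word: str,
    active: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical single-click transcription.

    Writes `word` into the `active` field if one is given (and exists),
    otherwise into the first empty field in `order`. Returns the name of
    the field written, or None if no target field is available.
    """
    if active is not None:
        target = active if active in form else None
    else:
        target = next((name for name in order if not form.get(name)), None)
    if target is None:
        return None
    form[target] = word
    return target


if __name__ == "__main__":
    form = {"Title": "", "Author": ""}
    order = ["Title", "Author"]
    print(transcribe_on_click(form, order, "Quarterly report"))  # Title
    print(transcribe_on_click(form, order, "Fuji Xerox"))        # Author
```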
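Claims 4 to 6 extract named entities as important phrases and highlight each one with a display attribute that depends on its type. The reference-numeral list includes a named entity extraction unit and an HTML document generation unit. The sketch below is a rough approximation under stated assumptions. Two regular expressions stand in for a real named entity extractor, and the entity types (`DATE`, `MONEY`), colours, and CSS class names are hypothetical. Each entity is wrapped in an HTML `<span>` whose style reflects its type.

```python
import html
import re
from typing import List, Tuple

# Hypothetical stand-in for a named entity extractor: two regular
# expressions instead of a trained model, for illustration only.
ENTITY_PATTERNS = {
    "DATE": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "MONEY": re.compile(r"(?:USD|JPY) ?\d[\d,]*"),
}

# Hypothetical display attribute (background colour) for each entity type.
TYPE_STYLES = {"DATE": "background:#cfe8ff", "MONEY": "background:#ffe0b3"}


def extract_entities(text: str) -> List[Tuple[int, int, str]]:
    """Return non-overlapping (start, end, type) spans in text order."""
    spans = [
        (m.start(), m.end(), etype)
        for etype, pattern in ENTITY_PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    spans.sort()
    kept, last_end = [], 0
    for start, end, etype in spans:
        if start >= last_end:  # drop spans overlapping an earlier one
            kept.append((start, end, etype))
            last_end = end
    return kept


def highlight(text: str) -> str:
    """Wrap each entity in a <span> styled by its type; escape the rest."""
    parts, pos = [], 0
    for start, end, etype in extract_entities(text):
        parts.append(html.escape(text[pos:start]))
        parts.append(
            f'<span class="ne-{etype.lower()}" style="{TYPE_STYLES[etype]}">'
            f"{html.escape(text[start:end])}</span>"
        )
        pos = end
    parts.append(html.escape(text[pos:]))
    return "".join(parts)


if __name__ == "__main__":
    print(highlight("Signed on 2004-05-10 for JPY 1,200,000."))
```

In a browser-based setup like Embodiment 2, a click handler on these spans could then feed the transcription step. That wiring is not shown here.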
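Claim 8 adds means for determining whether a value to be transcribed is acceptable as an attribute value, but does not state the criteria. A minimal sketch follows, assuming per-field rules that are checked before writing. The field names (`Date`, `Amount`, `Title`), the rules themselves, and the helper names are hypothetical.

```python
from datetime import datetime
from typing import Callable, Dict


def _is_date(value: str) -> bool:
    # Accept only valid calendar dates written as YYYY-MM-DD.
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


# Hypothetical acceptance rules keyed by attribute (input field) name.
RULES: Dict[str, Callable[[str], bool]] = {
    "Date": _is_date,
    "Amount": lambda v: v.replace(",", "").isdigit(),
    "Title": lambda v: 0 < len(v) <= 100,
}


def is_acceptable(field_name: str, value: str) -> bool:
    """Return True if `value` is acceptable for `field_name`.

    Fields without a rule accept any non-blank value.
    """
    value = value.strip()
    rule = RULES.get(field_name)
    return rule(value) if rule else bool(value)


def transcribe_checked(form: Dict[str, str], field_name: str, value: str) -> bool:
    """Write `value` into the field only if it passes the check."""
    if not is_acceptable(field_name, value):
        return False
    form[field_name] = value.strip()
    return True
```

A rejected value leaves the form unchanged. An interactive version might also signal the rejection to the operator, which the claim leaves open.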