JP7157245B2

JP7157245B2 - File management device, file management method, and program

Info

Publication number: JP7157245B2
Application number: JP2021521729A
Authority: JP
Inventors: 玲子源野; 島▲崎▼克仁
Original assignee: PFU Ltd
Current assignee: PFU Ltd
Priority date: 2019-05-31
Filing date: 2019-05-31
Publication date: 2022-10-19
Anticipated expiration: 2039-05-31
Also published as: JPWO2020240820A1; US20220222209A1; WO2020240820A1; US11971852B2

Description

The present invention relates to a file management device, a file management method, and a program.

For example, Patent Document 1 discloses a document processing apparatus having classification rule registration means for registering a combination of a first rule and a second rule applied to the first rule as a classification rule for classifying documents, and classification rule integration means for integrating a plurality of different classification rules registered by the classification rule registration means by eliminating content that overlaps among those classification rules.

Patent Document 2 discloses a computer-executed classification rule creation support method including: a step of storing a new data item and the category of the new data item in a storage device; a step of extracting, from data stored in a correct-answer data storage unit that stores data items and their categories, feature patterns each including a condition that contains a feature element of the new data item stored in the storage device and a corresponding category, and storing the feature patterns in a feature pattern storage unit; and a grouping step of grouping the feature patterns stored in the feature pattern storage unit into a first set that matches the category of the new data item stored in the storage device and a second set that does not, and storing the grouping result in a group data storage unit.

Patent Document 3 discloses a document processing apparatus characterized by having: storage means; input means to which document image data representing a document is input; specifying means for performing layout analysis on the document image data input to the input means and specifying the layout of the document represented by that data; determination means for performing character analysis on the document image data input to the input means and determining the attribute of each entry item of the document represented by that data; generation means for specifying a hierarchical structure among the entry items based on the layout specified by the specifying means and the attribute of each entry item determined by the determination means, and generating rule data representing the hierarchical structure; and writing means for writing the rule data generated by the generation means to the storage means.

Patent Document 1: JP 2013-251610
Patent Document 2: JP 2007-052744
Patent Document 3: JP 2007-052615

An object of the present invention is to provide a file management device capable of appropriately classifying data files.

A file management device according to the present invention has: a common feature extraction unit that extracts, from a plurality of data files to which the same tag is assigned, features common to those data files; a rule storage unit that stores, as an assignment rule, the features extracted by the common feature extraction unit in association with the tag that was assigned to those data files; and a tagging unit that assigns a tag to a newly input data file based on the assignment rules stored in the rule storage unit.

Preferably, the tagging unit searches a newly input data file for the features registered as assignment rules in the rule storage unit and, when any of those features is found, assigns the tag associated with that feature to the newly input data file.

Preferably, when only part of the features registered as an assignment rule is found in a newly input data file, the tagging unit proposes the tag associated with those features to the user and assigns the tag according to the user's operation.

Preferably, the device further has a rule updating unit that, when the proposed tag is adopted by the user, updates the assignment rule so that the features of the newly input data file match the features of the assignment rule.

Preferably, the device further has a rule updating unit that, when the proposed tag is not adopted by the user, updates the assignment rule so that the features of the newly input data file do not match the features of the assignment rule.

Preferably, the common feature extraction unit extracts, as the features, at least one of a character string, a date, an image size, and the number of colors used in an image.

Preferably, the assignment rule stored in the rule storage unit includes a plurality of determination elements, and the rule updating unit selects, from among the features common to a plurality of data files, the features to be registered as determination elements of the assignment rule, based on uniqueness and on at least one of appearance frequency, recency, and appearance position.

A file management method according to the present invention has: a common feature extraction step of extracting, from a plurality of data files to which the same tag is assigned, features common to those data files; a rule storage step of storing, as an assignment rule, the features extracted in the common feature extraction step in association with the tag that was assigned to those data files; and a tagging step of assigning a tag to a newly input data file based on the assignment rules stored in the rule storage step.

A program according to the present invention causes a computer to execute: a common feature extraction step of extracting, from a plurality of data files to which the same tag is assigned, features common to those data files; a rule storage step of storing, as an assignment rule, the features extracted in the common feature extraction step in association with the tag that was assigned to those data files; and a tagging step of assigning a tag to a newly input data file based on the assignment rules stored in the rule storage step.

According to the present invention, data files can be appropriately classified.

FIG. 1 is a diagram illustrating the overall configuration of the file management system 1.
FIG. 2 is a diagram illustrating the hardware configuration of the file management device 5.
FIG. 3 is a diagram illustrating the functional configuration of the file management device 5.
FIG. 4(a) is a table explaining tagging rules, FIG. 4(b) is a table explaining the tagging rule for the tag "estimate", and FIG. 4(c) is a diagram illustrating features extracted by the common feature extraction unit 502.
FIG. 5(a) is a table illustrating keywords and keyword position information, FIG. 5(b) is a diagram illustrating dates and date position information, and FIG. 5(c) is a table illustrating image sizes and the number of colors used.
FIG. 6 is a flowchart explaining the tagging rule registration and update process (S10) performed by the file management device 5.
FIG. 7 is an example of a screen for a tagging operation by a user.
FIG. 8(a) is invocation example 1 of the tagging rule customization screen, and FIG. 8(b) is an example of the screen on which a user customizes a tagging rule.
FIG. 9(a) is invocation example 2 of the tagging rule customization screen, and FIG. 9(b) is invocation example 3 of the tagging rule customization screen.
FIG. 10 is a flowchart explaining the tagging and tag proposal process (S20) performed by the file management device 5.
FIG. 11 is a flowchart explaining the tagging rule update process (S30) performed according to the user's response to a tagging proposal.
FIG. 12 is a diagram illustrating the screen on which a user responds to a tagging proposal.
FIG. 13(a) is a diagram illustrating the tagging rule for "Invoice (2018)" and the condition candidates of document C, FIG. 13(b) is a table showing the scores of the condition candidate keywords, and FIG. 13(c) is an example of an updated tagging rule.
FIG. 14(a) is a diagram showing the tagging rule for "Invoice" and the features of document D, FIG. 14(b) is a table showing the scores of the condition candidate keywords, and FIG. 14(c) is an example of an updated tagging rule.
FIG. 15(a) is a diagram showing the tagging rule for "AA Company_Invoice" and the features of newly tagged document G, which is an invoice from BB Company, FIG. 15(b) is a diagram illustrating the tagging rule for "○○ Company_Invoice", FIG. 15(c) is a table showing the scores of the condition candidate keywords, and FIG. 15(d) is an example of updated tagging rule data.

BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
FIG. 1 is a diagram illustrating the overall configuration of a file management system 1.
As illustrated in FIG. 1, the file management system 1 includes a plurality of scanners 3a, 3b, and 3c, a file management device 5, and user terminals 7, which are connected to one another via a network 9. The scanners 3a, 3b, and 3c are collectively referred to as the scanner 3, and the user terminals 7a and 7b are collectively referred to as the user terminal 7.
The scanner 3 is an optical reading device and transmits acquired image data to the file management device 5.
The file management device 5 is a computer terminal and assigns, to the image data received from the scanner 3, tags that classify the image data. Specifically, the file management device 5 holds tagging rules, which are rules for tagging, and assigns a suitable tag to a data file based on the tagging rules and the features of the data file obtained by performing OCR processing on the image data. Further, the file management device 5 generates and updates tagging rules in response to user operations. Note that a tagging rule is an example of the assignment rule according to the present invention.
The user terminal 7 is a computer terminal operated by a user and displays a user interface provided by the file management device 5.

FIG. 2 is a diagram illustrating the hardware configuration of the file management device 5.
As illustrated in FIG. 2, the file management device 5 has a CPU 200, a memory 202, an HDD 204, a network interface 206 (network IF 206), a display device 208, and an input device 210, which are connected to one another via a bus 212.
The CPU 200 is, for example, a central processing unit.
The memory 202 is, for example, a volatile memory and functions as a main storage device.
The HDD 204 is, for example, a hard disk drive and, as a non-volatile recording device, stores computer programs (for example, the file management program 50 in FIG. 3) and other data files (for example, the tagging rule database 600 in FIG. 3).
The network IF 206 is an interface for wired or wireless communication and implements, for example, communication over the internal network 9.
The display device 208 is, for example, a liquid crystal display.
The input device 210 is, for example, a keyboard and a mouse.

FIG. 3 is a diagram illustrating the functional configuration of the file management device 5.
As illustrated in FIG. 3, a file management program 50 is installed on the file management device 5 of this example, and a tagging rule database 600 (tagging rule DB 600) is configured on it.
The file management program 50 has an acquisition unit 500, a common feature extraction unit 502, a matching unit 504, a score calculation unit 506, a tagging unit 508, and a rule updating unit 510.
Part or all of the file management program 50 may be implemented in hardware such as an ASIC, or may be implemented by partly relying on functions of an OS (operating system).
In the file management program 50, the acquisition unit 500 acquires image data read by the scanner 3.
The common feature extraction unit 502 extracts, from a plurality of data files to which the same tag is assigned, features common to those data files. Here, a data file is, for example, image data on which OCR (optical character recognition) processing has been performed. Specifically, the common feature extraction unit 502 performs OCR processing on the image data acquired by the acquisition unit 500 and extracts the features of the data file based on the OCR result. More specifically, the common feature extraction unit 502 extracts, as features, at least one of a character string written in the data file, a date, the image size of the data file, and the number of colors used in the image of the data file.

Here, the tagging rule DB 600 will be described.
The tagging rule DB 600 stores, as tagging rules, the features extracted by the common feature extraction unit 502 in association with the tags that were assigned to the corresponding data files. Specifically, the tagging rule DB 600 stores tag names together with the "condition candidates" and "conditions" associated with each tag name. A "condition" is a constraint for assigning the associated "tag name" to a data file, and is a common feature extracted from a plurality of data files to which the same tag was assigned. Likewise, a "condition candidate" is a common feature extracted from a plurality of data files, and the "conditions" are selected from among the "condition candidates". The tagging rule DB 600 is an example of the rule storage unit according to the present invention.
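The relationship between tag names, "conditions", and "condition candidates" described above can be pictured with a small data structure. The following is only a sketch under stated assumptions: the class names `Feature` and `TaggingRule`, the feature kinds, and the in-memory dictionary standing in for the tagging rule DB 600 are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Feature:
    """One extracted feature, e.g. kind="keyword", value="estimate" (hypothetical layout)."""
    kind: str   # assumed kinds: "keyword", "date", "image_size", "color_count"
    value: str


@dataclass
class TaggingRule:
    """A tag name with its active conditions and its stored condition candidates."""
    tag_name: str
    conditions: set = field(default_factory=set)
    condition_candidates: set = field(default_factory=set)

    def promote(self, feature: Feature) -> None:
        """Turn a stored condition candidate into an active condition."""
        if feature not in self.condition_candidates:
            raise ValueError("only a stored condition candidate can become a condition")
        self.condition_candidates.discard(feature)
        self.conditions.add(feature)


# Hypothetical in-memory stand-in for the tagging rule DB 600, keyed by tag name.
rule_db = {}
rule = TaggingRule("estimate")
rule.condition_candidates.update(
    {Feature("keyword", "estimate"), Feature("keyword", "total")}
)
rule.promote(Feature("keyword", "estimate"))
rule_db[rule.tag_name] = rule
```

Keeping unselected candidates alongside the active conditions mirrors the description that candidates are retained for later rule updates.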

The matching unit 504 matches a newly input data file against the features registered as tagging rules. A newly input data file is a data file obtained by performing OCR processing on the image data acquired by the acquisition unit 500. Specifically, the matching unit 504 determines the degree of match between the data file OCR-processed by the common feature extraction unit 502 and the features registered in the tagging rules stored in the tagging rule DB 600.
The score calculation unit 506 calculates, for each "condition candidate", scores that serve as determination factors, and selects "conditions" from among the "condition candidates" whose score is equal to or greater than a threshold. Specifically, the score calculation unit 506 calculates appearance frequency, recency, appearance position, and uniqueness scores for each "condition candidate", weights and sums those scores, and thereby calculates the superiority of the "condition candidate".

The tagging unit 508 assigns a tag to a newly input data file based on the tagging rules stored in the tagging rule DB 600.
Specifically, the tagging unit 508 searches the newly input data file for the features registered as tagging rules and, when any of those features is found, assigns the tag associated with that feature to the newly input data file.
More specifically, when only part of the features registered as a tagging rule is found in the newly input data file, the tagging unit 508 proposes the tag associated with those features to the user and assigns the tag according to the user's operation. Here, "part of the features is found in the newly input data file" means that the match rate between the features extracted by the common feature extraction unit 502 and the features of the tagging rule is 50% to 99%.
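The distinction between assigning a tag and merely proposing it can be sketched as a decision on the match rate. This is a minimal illustration, not the patented implementation: the text gives only the 50% to 99% band for proposals, so the way the match rate is computed here (share of the rule's features found in the file) and the treatment of 100% as automatic assignment are assumptions, and the function names are hypothetical.

```python
def match_rate(rule_features, file_features):
    """Percentage of the rule's registered features that appear in the new file.

    Assumption: the patent text does not define how the rate is computed.
    """
    if not rule_features:
        return 0.0
    found = sum(1 for feature in rule_features if feature in file_features)
    return 100.0 * found / len(rule_features)


def decide(rule_features, file_features):
    """Return "assign", "suggest", or "none" for one tagging rule."""
    rate = match_rate(rule_features, file_features)
    if rate >= 100.0:
        return "assign"    # assumed: a full match is tagged automatically
    if rate >= 50.0:
        return "suggest"   # 50% to 99%: propose the tag to the user
    return "none"


rule = {"invoice", "total", "2018", "AA Company"}
full = decide(rule, {"invoice", "total", "2018", "AA Company", "tax"})
partial = decide(rule, {"invoice", "total"})
weak = decide(rule, {"invoice"})
```

In this sketch a file containing two of the four registered features sits exactly on the lower edge of the proposal band.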

The rule updating unit 510 generates and updates tagging rules. Specifically, the rule updating unit 510 selects, from among the features common to a plurality of data files, the features to be registered as determination elements of a tagging rule, based on uniqueness and on at least one of appearance frequency, recency, and appearance position. More specifically, the rule updating unit 510 selects "conditions" from among the "condition candidates" whose total score calculated by the score calculation unit 506 is higher than a threshold, and updates the tagging rule.
Further, when the matching unit 504 determines that a newly input data file matches part of the features registered as a tagging rule and the user adopts the proposed tag, the rule updating unit 510 updates the tagging rule so that the newly input data file matches the features registered as the tagging rule.
Conversely, when the matching unit 504 determines that a newly input data file matches part of the features registered as a tagging rule and the user rejects the proposed tag, the rule updating unit 510 updates the tagging rule so that the newly input data file does not match the features registered as the tagging rule.

Next, the tagging rules stored in the tagging rule DB 600 will be described.
FIG. 4(a) is a table explaining tagging rules, FIG. 4(b) is a table explaining the tagging rule for the tag "estimate", and FIG. 4(c) is a diagram illustrating features extracted by the common feature extraction unit 502.
As illustrated in FIG. 4(a), a tagging rule has a "tag name" and "conditions" associated with the "tag name". The "tag name" is a name used to classify data files. A "condition" is a constraint for assigning the "tag name", and is a common feature extracted from data files to which the same tag was assigned. If a data file imported later satisfies the "conditions", the file management device 5 automatically assigns the "tag name" associated with those "conditions" to the imported data file.

Further, the "conditions" are selected from among the "condition candidates". A "condition candidate" is a feature that the common feature extraction unit 502 has extracted from data files. Specifically, as illustrated in FIG. 4(b), "condition candidates" and "conditions" are associated with the tag "estimate"; a "condition candidate" is a feature extracted from the data files that was not included in the tagging "conditions" but is saved as a candidate "condition" for use when the tagging rule is updated.
More specifically, the common feature extraction unit 502 extracts "condition candidates" for each item, as illustrated in FIG. 4(b), from the data file illustrated in FIG. 4(c). The score calculation unit 506 calculates a score representing the superiority of each condition candidate, and the rule updating unit 510 selects "conditions" from among the "condition candidates" whose total score is equal to or greater than the threshold.

FIG. 5 is a diagram illustrating condition candidates extracted from a data file.
As illustrated in FIG. 5, "condition candidates" are features extracted from the data file by the common feature extraction unit 502, namely character strings and the vertical and horizontal lengths of the image. Specifically, "condition candidates" are words (keywords) in the text of the data file, document date values, the vertical and horizontal length values of the image, and the like.
For keywords, as illustrated in FIG. 5(a), the common feature extraction unit 502 records, as features, the character string of each keyword in the data file and the position information of where that character string is written. Specifically, the common feature extraction unit 502 performs morphological analysis of the data file based on the OCR result and treats the resulting words as keyword condition candidates. Among the keyword condition candidates, the common feature extraction unit 502 adopts up to five whose scores satisfy a specific condition as keyword conditions.
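The selection of at most five keyword conditions can be sketched as follows. The text does not state what the "specific condition" on the score is, so this sketch assumes a simple minimum-score threshold and picks the highest-scoring candidates first; the record layout `KeywordCandidate` and the sample values are hypothetical, and the morphological analysis step is taken as already done.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeywordCandidate:
    """A keyword with the position where it was found (hypothetical layout)."""
    text: str
    x: int        # assumed horizontal position on the page
    y: int        # assumed vertical position on the page
    score: float


def select_keyword_conditions(candidates, threshold, limit=5):
    """Return up to `limit` candidates with score >= threshold, best first."""
    eligible = [c for c in candidates if c.score >= threshold]
    # Sort by descending score; break ties alphabetically for a stable result.
    eligible.sort(key=lambda c: (-c.score, c.text))
    return eligible[:limit]


candidates = [
    KeywordCandidate("estimate", 120, 40, 9.0),
    KeywordCandidate("total", 300, 700, 8.0),
    KeywordCandidate("tax", 300, 680, 7.5),
    KeywordCandidate("quantity", 200, 300, 7.0),
    KeywordCandidate("unit price", 260, 300, 6.5),
    KeywordCandidate("remarks", 100, 760, 6.0),
    KeywordCandidate("the", 90, 200, 1.0),
]
selected = [c.text for c in select_keyword_conditions(candidates, threshold=5.0)]
```

Here six candidates clear the assumed threshold, but only the five best become keyword conditions; the sixth stays a candidate.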

For the document date, as illustrated in FIG. 5(b), the common feature extraction unit 502 decomposes each date in the data file into year, month, and day elements and records, as features, the position information of where the date is written. When a date is written in the document, the common feature extraction unit 502 treats the year, month, day, and day-of-week elements as document date condition candidates. Among the document date condition candidates, the common feature extraction unit 502 adopts, for each element, at most one candidate whose score satisfies a specific condition as a document date condition.
For the image size, as illustrated in FIG. 5(c), the common feature extraction unit 502 treats the vertical and horizontal lengths of the data file as image size condition candidates and adopts, for each element, at most one candidate whose score satisfies a specific condition as an image size condition.
In addition to keywords, document date values, and the vertical and horizontal lengths of the image, the rule updating unit 510 may also treat "format", "attribute values of a business card or receipt (company name or address)", and "image color" as condition candidates and create tagging rules that use a match or similarity in these as conditions.
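The decomposition of a document date into separate condition candidates can be illustrated with the standard library. This is a sketch only: it assumes the date has already been recognized and parsed from the OCR text, and the function name and dictionary keys are hypothetical.

```python
from datetime import date

# Fixed English names so the result does not depend on the system locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def date_condition_candidates(d: date) -> dict:
    """Split a document date into year, month, day, and day-of-week elements."""
    return {
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "weekday": WEEKDAYS[d.weekday()],  # date.weekday(): Monday is 0
    }


elements = date_condition_candidates(date(2019, 5, 31))
```

Treating each element separately lets a rule match, for example, every document from a given year regardless of month and day.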

The tagging unit 508 assigns a tag to a data file that satisfies the "conditions". Specifically, the tagging unit 508 assigns the tag when the features of the data file match the keywords of the tagging rule, are similar to its document date, and are similar to its vertical and horizontal image lengths.
For example, a keyword condition is satisfied when a specific character string is written in the data file. A document date similarity condition is satisfied when the year, month, and day written in the data file have a certain feature. An image length similarity condition is satisfied when the vertical and horizontal sizes of the image have a certain feature.

Next, the method for calculating the scores of condition candidates will be described.
Each condition candidate has scores for appearance frequency, recency, appearance position, and uniqueness. Each score increases or decreases within the range of 0 to 10.
The appearance frequency score is calculated based on how many of all the tagged data files contain a given feature. The more often the feature appears, the higher the appearance frequency score. If the feature is common to all of the tagged data files, the score is 10.
The recency score is calculated based on whether a given feature applies to recently input data files. The initial value of the recency score is the maximum value (10). The recency score decreases when the feature does not apply to an added data file.
The appearance position score is calculated based on whether the feature is written at nearby positions in the data files. The appearance position score is the maximum value (10) when the positions are identical and decreases as the positions move farther apart.
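The three scores described above can be sketched as simple functions on a 0 to 10 scale. The text fixes only the end points (10 for a feature found in every tagged file, an initial recency of 10 that decreases on a miss, 10 for identical positions), so the linear formulas, the `penalty` step, and the `falloff` distance below are assumptions chosen for illustration.

```python
import math

MAX_SCORE = 10.0


def frequency_score(files_with_feature, total_tagged_files):
    """10 when every tagged file contains the feature; linear in between (assumed)."""
    if total_tagged_files <= 0:
        return 0.0
    return MAX_SCORE * files_with_feature / total_tagged_files


def recency_score(current, feature_in_new_file, penalty=1.0):
    """Start from 10 and subtract an assumed penalty when a new file lacks the feature."""
    if feature_in_new_file:
        return current
    return max(0.0, current - penalty)


def position_score(pos_a, pos_b, falloff=100.0):
    """10 for identical positions, falling linearly to 0 at an assumed distance."""
    distance = math.hypot(pos_a[0] - pos_b[0], pos_a[1] - pos_b[1])
    return max(0.0, MAX_SCORE * (1.0 - distance / falloff))
```

Clamping at zero keeps each score inside the 0 to 10 range the text specifies.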

The uniqueness score is calculated based on whether a feature is specific to the tagging rule. When a tagging proposal is made for a data file that matches a tagging rule and the user rejects the proposal, the score calculation unit 506 increases the uniqueness scores of the "conditions" and "condition candidates" that exist in the tagging rule but not in that data file.
The score calculation unit 506 calculates the appearance frequency, recency, appearance position, and uniqueness scores, weights and sums them, and thereby calculates the superiority of each "condition candidate". The score calculation unit 506 calculates the total score using the formula "total score = α × appearance frequency score + β × recency score + γ × appearance position score + δ × uniqueness score". The rule updating unit 510 selects "conditions" from among the condition candidates whose total score calculated by the score calculation unit 506 is higher than the threshold.
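The weighted sum and the threshold test can be written out directly. In this sketch the weights α, β, γ, and δ are passed in as a dictionary and the sample values are arbitrary, since the text does not give concrete weights or a concrete threshold; the function names are hypothetical. The comparison uses "higher than the threshold", as in the sentence above, although other passages say "equal to or greater than".

```python
SCORE_KEYS = ("frequency", "recency", "position", "uniqueness")


def total_score(scores, weights):
    """alpha*frequency + beta*recency + gamma*position + delta*uniqueness."""
    return sum(weights[key] * scores[key] for key in SCORE_KEYS)


def select_conditions(candidates, weights, threshold):
    """Return the names of candidates whose total score exceeds the threshold."""
    return [name for name, scores in candidates.items()
            if total_score(scores, weights) > threshold]


# Arbitrary example weights (alpha, beta, gamma, delta) and candidate scores.
weights = {"frequency": 0.25, "recency": 0.25, "position": 0.25, "uniqueness": 0.25}
candidates = {
    "invoice": {"frequency": 10, "recency": 10, "position": 10, "uniqueness": 0},
    "remarks": {"frequency": 2, "recency": 4, "position": 2, "uniqueness": 0},
}
chosen = select_conditions(candidates, weights, threshold=5.0)
```

With these sample values "invoice" totals 7.5 and "remarks" totals 2.0, so only "invoice" would be promoted to a condition.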

FIG. 6 is a flowchart explaining the tagging rule registration and update process (S10) performed by the file management device 5. FIG. 6 describes the registration and update of a tagging rule for the case where the user assigns the tag "A" to a data file.
As illustrated in FIG. 6, in step 100 (S100), the rule updating unit 510 detects that a tag has been assigned to a data file by a user's tagging operation. Specifically, as illustrated in FIG. 7, the user selects a tag or enters a new tag name on the tagging operation screen and performs a tagging operation on the data file (assignment of the tag "A"), and the rule updating unit 510 detects this tagging operation.
In step 105 (S105), the matching unit 504 searches the tagging rule DB 600 to determine whether a tagging rule for the tag "A" exists. If the tagging rule exists, the matching unit 504 proceeds to S135; if it does not exist, the matching unit 504 proceeds to S110.

In step 110 (S110), if the search by the matching unit 504 finds two or more data files to which the user has assigned the tag "A", the process proceeds to S115. If there is only one data file to which the user has assigned the tag "A", the rule update unit 510 ends the tagging rule registration and update process (S10). With only one data file carrying the tag "A", features common to data files with the same tag cannot be extracted, so no tagging rule is generated.
In step 115 (S115), the common feature extraction unit 502 extracts the features of a data file to which the tag "A" is assigned. Specifically, the common feature extraction unit 502 extracts at least one of a character string, a date, an image size, and the number of colors used in the image from the data file.

In step 120 (S120), if the common feature extraction unit 502 has extracted the features of all the data files to which the tag "A" is assigned, the tagging rule registration and update process (S10) proceeds to S125; if the features of some data files have not yet been extracted, the process returns to S115.
In step 125 (S125), the common feature extraction unit 502 extracts the features common to all the data files to which the tag "A" is assigned as "condition candidates".
In step 130 (S130), the score calculation unit 506 calculates the appearance frequency, recency, appearance position, and uniqueness scores of each "condition candidate", together with its total score. The rule update unit 510 selects, as "conditions", the "condition candidates" whose total score is high and whose individual scores are each at or above a threshold, generates a tagging rule for the tag "A", and registers it in the tagging rule DB 600.
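Steps S110 to S125 amount to intersecting the feature sets of every data file that carries the same tag. The sketch below illustrates that idea under stated assumptions: the dictionary layout of a data file, the `(kind, value)` feature encoding, and the function names `extract_features` and `common_condition_candidates` are hypothetical and do not come from the description.

```python
def extract_features(data_file):
    """Hypothetical extractor: encode keywords, date, and size as (kind, value) pairs."""
    features = {("keyword", k) for k in data_file["keywords"]}
    features.add(("date", data_file["date"]))
    features.add(("size", data_file["size"]))
    return features


def common_condition_candidates(tagged_files):
    """Return the features shared by all files with the tag, or None for fewer than two files."""
    if len(tagged_files) < 2:
        # S110: a single file has no "common" features, so no rule is generated.
        return None
    # S115/S120: extract the features of every file carrying the tag.
    feature_sets = [extract_features(f) for f in tagged_files]
    # S125: the condition candidates are the features present in all files.
    return set.intersection(*feature_sets)


# Hypothetical documents, not the ones shown in the figures.
doc_a = {"keywords": ["invoice", "amount", "Tokyo"], "date": "2018", "size": "A4"}
doc_b = {"keywords": ["invoice", "amount", "transfer"], "date": "2018", "size": "A4"}
common = common_condition_candidates([doc_a, doc_b])
print(sorted(common))
```

The resulting candidates would then be scored and filtered in S130 before a rule is registered.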

In step 135 (S135), when a tagging rule for the tag "A" exists, the matching unit 504 acquires that tagging rule.
In step 140 (S140), the common feature extraction unit 502 extracts the features of the data file to which the user has assigned the tag "A". Specifically, the common feature extraction unit 502 acquires at least one of a character string, a date, an image size, and the number of colors used in the image from the data file.
In step 145 (S145), the rule update unit 510 deletes, from the "conditions" of the tagging rule acquired by the matching unit 504 in S135, any condition that does not correspond to the features extracted by the common feature extraction unit 502 in S140. The score calculation unit 506 then adds the "condition candidates" of the tagging rule acquired in S135 and recalculates the score of each "condition candidate" acquired in S140. If the user has set fixed conditions through customization, the condition candidates set as fixed conditions are selected as "conditions" regardless of their scores. In addition, from among the other condition candidates, the rule update unit 510 selects as further "conditions" those whose total score is high and whose individual scores are each at or above a threshold.
In step 150 (S150), the rule update unit 510 updates the tagging rule for the tag "A" by replacing its "conditions" with the newly selected "conditions". Replacing the conditions in this way yields a tagging rule with a higher matching rate.
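One possible reading of S145 and S150 is sketched below. It is an illustration only: the function name `update_conditions`, the unweighted sum used as the total score, the two threshold parameters, and the example scores are assumptions, and the description does not say how a "high" total score is decided.

```python
def update_conditions(rule_conditions, fixed_conditions, file_features,
                      candidate_scores, total_threshold, score_threshold):
    """Return the new set of conditions for a rule after a user tagging operation.

    candidate_scores maps a candidate name to a tuple of
    (frequency, recency, position, uniqueness) scores.
    """
    # S145: existing conditions that the newly tagged file lacks are deleted.
    deleted = {c for c in rule_conditions if c not in file_features}
    # Fixed conditions are selected regardless of their scores.
    selected = set(fixed_conditions)
    for name, scores in candidate_scores.items():
        if name in selected or name in deleted:
            continue
        # Other candidates need a high total and every individual score
        # at or above the per-score threshold (assumed interpretation).
        if sum(scores) >= total_threshold and min(scores) >= score_threshold:
            selected.add(name)
    # S150: the caller replaces the rule's conditions with this set.
    return selected


new_conditions = update_conditions(
    rule_conditions={"invoice", "below", "Tokyo"},
    fixed_conditions={"invoice"},
    file_features={"invoice", "tax amount", "amount"},
    candidate_scores={
        "invoice": (10, 10, 10, 1),
        "below": (7, 9, 5, 1),
        "Tokyo": (7, 9, 5, 1),
        "tax amount": (8, 10, 9, 1),
        "amount": (3, 10, 2, 0),
    },
    total_threshold=25,
    score_threshold=1,
)
print(sorted(new_conditions))  # prints ['invoice', 'tax amount']
```

Handling fixed conditions before the score check mirrors the statement that they are retained regardless of score.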

Next, customization of tagging rules by the user will be described.
The user can call up the tagging rule customization screen at any time to check and customize a tagging rule. Specifically, as illustrated in FIG. 8(a), a "Check tagging rule" menu is displayed from the tag "delivery note", and when the user selects it, the tagging rule customization screen is displayed. As illustrated in FIG. 8(b), the user can then customize the rule, for example by adding or deleting keywords associated with the tag or by changing the date. The user can also fix a "condition" associated with the tag (a fixed condition). A "condition" set as a fixed condition is never removed when the rule is updated as a result of the user's tagging operations; it is always retained as a "condition".
As illustrated in FIGS. 9(a) and 9(b), the tagging rule customization screen can also be called up when a tag is proposed to the user and from the tag list returned by a tag search. Because the user can check the tagging rules and correct them as needed, it is also possible to create tagging rules that could not be obtained through updates by the file management device 5 alone.

FIG. 10 is a flowchart illustrating the tagging and tag proposal process (S20) performed by the file management device 5.
As illustrated in FIG. 10, in step 200 (S200), the acquisition unit 500 acquires the image data of a document scanned by the scanner 3. The common feature extraction unit 502 acquires a data file obtained by performing OCR processing on the image data acquired by the acquisition unit 500, and extracts at least one of a character string, a date, an image size, and the number of colors used in the image from the data file as its features.
In step 205 (S205), the matching unit 504 compares the features of the data file with a tagging rule stored in the tagging rule DB 600.
In step 210 (S210), if the matching unit 504 has compared the features of the data file with all the tagging rules, the process proceeds to S215; if not, the process returns to S205.
In step 215 (S215), the matching unit 504 selects the tagging rule that has the highest matching rate with the features of the data file.
In step 220 (S220), if the matching rate of the tagging rule selected in S215 is 100%, the process proceeds to S225; otherwise, the process proceeds to S235.

In step 225 (S225), the tagging unit 508 assigns the tag of the tagging rule with the 100% matching rate to the data file.
In step 230 (S230), the rule update unit 510 updates and registers the tagging rule. Specifically, from the conditions of the tagging rule selected in S215, it deletes those that do not correspond to the features extracted by the common feature extraction unit 502. It then selects "conditions" based on the score of each condition candidate, the fixed conditions, and the other condition candidates, replaces the "conditions" of the tagging rule with the newly selected "conditions", and registers the rule in the tagging rule DB 600.
In step 235 (S235), if the matching rate between the tagging rule selected in S215 and the features of the data file is 50% or more and less than 99% (similar), the tagging unit 508 proceeds to S240; if the matching rate is 49% or less, no tag is assigned and the process ends.
In step 240 (S240), the tagging unit 508 proposes assigning the tag of the tagging rule judged to be similar to the data file and asks the user to decide whether to assign it.
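The decision logic of S205 to S240 can be sketched as follows. The description does not define how the matching rate is computed, so the definition used here (the percentage of a rule's conditions found among the file's features) is an assumption, as are the names `match_rate` and `decide`. For simplicity the sketch treats every rate from 50% up to, but not including, 100% as "similar".

```python
def match_rate(rule_conditions, file_features):
    """Assumed definition: percentage of the rule's conditions present in the file."""
    if not rule_conditions:
        return 0.0
    hits = sum(1 for c in rule_conditions if c in file_features)
    return 100.0 * hits / len(rule_conditions)


def decide(rules, file_features):
    """Return (action, tag), where action is 'assign', 'propose', or 'none'."""
    if not rules:
        return ("none", None)
    # S205-S215: compare against every rule and keep the best match.
    best_tag, best_rate = max(
        ((tag, match_rate(conds, file_features)) for tag, conds in rules.items()),
        key=lambda pair: pair[1],
    )
    if best_rate >= 100.0:
        return ("assign", best_tag)   # S225: tag automatically
    if best_rate >= 50.0:
        return ("propose", best_tag)  # S240: ask the user
    return ("none", None)             # low match: no tagging


# Hypothetical rules and features.
rules = {
    "invoice": {"invoice", "amount", "transfer", "tax amount"},
    "delivery note": {"delivery note", "quantity"},
}
print(decide(rules, {"invoice", "amount", "transfer", "date"}))
# prints ('propose', 'invoice') because 3 of 4 conditions match (75%)
```

Returning the action separately from the tag lets the caller route an "assign" result to the update in S230 and a "propose" result to the user dialog of S300.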

FIG. 11 is a flowchart illustrating the tagging rule update process (S30) performed according to the user's response to a tagging proposal. FIG. 11 describes the case where the file management device 5 has proposed the tag "AAA" to the user.
As illustrated in FIG. 11, in step 300 (S300), the tagging unit 508 proposes assigning the tag "AAA" to the user, as illustrated in FIG. 12. Specifically, the tagging unit 508 presents three response operations: assign the tag "AAA", assign no tag, or assign a different tag.
In step 305 (S305), if the user judges the proposed tag "AAA" to be appropriate, the process proceeds to S310; otherwise, the process proceeds to S320.
In step 310 (S310), the tagging unit 508 assigns the tag "AAA" to the data file.
In step 315 (S315), the rule update unit 510 updates and registers the tagging rule for the tag "AAA". Specifically, the rule update unit 510 selects "conditions" such that the matching rate between the features of the data file and the tagging rule for the tag "AAA" becomes 100%, and replaces the "conditions" of the existing "AAA" tagging rule with them. Instead of replacing the "conditions", the rule update unit 510 may raise the matching rate by deleting part of a "condition" (that is, relaxing it, for example so that a character string satisfies the condition with a two-character match rather than a three-character match).

In step 320 (S320), if the user chooses to assign a tag other than "AAA" (here, the tag "BBB"), the process proceeds to S325; otherwise, the process proceeds to S340.
In step 325 (S325), the tagging unit 508 assigns the tag "BBB" to the data file.
In step 330 (S330), when the user has selected "assign the tag 'BBB'", the rule update unit 510 updates the tagging rule for the tag "AAA" so that the matching rate between the features of the data file and that rule becomes 49% or less. Specifically, the rule update unit 510 selects "conditions" that bring this matching rate to 49% or less and replaces the "conditions" of the existing "AAA" tagging rule with the selected "conditions". As a result, the features of the data file are no longer judged similar to the tagging rule for the tag "AAA". Instead of replacing the "conditions", the rule update unit 510 may lower the matching rate by adding a "condition" (strengthening the conditions).

In step 335 (S335), the rule update unit 510 updates the tagging rule for the tag "BBB" so that the matching rate between the features of the data file and that rule becomes 100%. Specifically, the rule update unit 510 selects "conditions" that bring this matching rate to 100% and replaces the "conditions" of the existing "BBB" tagging rule with the selected "conditions". As a result, the features of the data file are judged to match the tagging rule for the tag "BBB". Instead of replacing the "conditions", the rule update unit 510 may raise the matching rate by deleting part of a "condition" (relaxing the conditions).
In step 340 (S340), when the user has selected "do not assign the tag 'AAA'", the rule update unit 510 updates the tagging rule for the tag "AAA" so that the matching rate between the features of the data file and that rule becomes 49% or less. More specifically, the rule update unit 510 selects "conditions" that bring this matching rate to 49% or less and replaces the "conditions" of the existing "AAA" tagging rule with the selected "conditions". As a result, the features of the data file are no longer judged similar to the tagging rule for the tag "AAA". Instead of replacing the "conditions", the rule update unit 510 may lower the matching rate by adding a "condition" (strengthening the conditions).
In step 345 (S345), the rule update unit 510 registers the replaced "conditions" in the tagging rule DB 600 as the conditions of the tagging rule.
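The two directions of update in S30 (relaxing a rule until a file matches at 100%, and strengthening it until the file falls to 49% or less) can be illustrated as below. This is a sketch under assumptions: the matching rate is taken to be the percentage of a rule's conditions found in the file, and `relax_rule`, `strengthen_rule`, and the ordered list `extra_candidates` are hypothetical names, not elements of the description.

```python
def match_rate(rule_conditions, file_features):
    """Assumed definition: percentage of the rule's conditions present in the file."""
    if not rule_conditions:
        return 0.0
    hits = sum(1 for c in rule_conditions if c in file_features)
    return 100.0 * hits / len(rule_conditions)


def relax_rule(conditions, file_features):
    """Accepted tag (S315, S335): drop conditions the file lacks so the file matches fully."""
    return {c for c in conditions if c in file_features}


def strengthen_rule(conditions, file_features, extra_candidates):
    """Rejected tag (S330, S340): add conditions the file lacks to lower its matching rate.

    Stops as soon as the rate is 49% or less; if the candidates run out first,
    the partially strengthened rule is returned.
    """
    new_conditions = set(conditions)
    for cand in extra_candidates:
        if match_rate(new_conditions, file_features) <= 49.0:
            break
        if cand not in file_features:
            new_conditions.add(cand)
    return new_conditions


# Hypothetical data.
relaxed = relax_rule({"invoice", "amount", "Tokyo"}, {"invoice", "amount"})
print(match_rate(relaxed, {"invoice", "amount"}))  # prints 100.0

rejected_file = {"invoice", "payee", "amount", "BB Company"}
strengthened = strengthen_rule(
    {"invoice", "payee", "amount"},
    rejected_file,
    ["AA Company", "payment due date", "billed amount", "reference number"],
)
print(round(match_rate(strengthened, rejected_file), 1))  # prints 42.9
```

Under this assumed rate definition several extra conditions are needed to push a fully matching file below 50%; a weighted rate, which the description does not rule out, would change that.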

Next, an example of updating a tagging rule will be described for the case where the existing tag "invoice (2018)", already assigned to document A and document B, is assigned to a newly tagged document C.
FIG. 13(a) shows the tagging rule for "invoice (2018)" and the features of document C, listing the keywords, document dates, and sizes of documents A, B, and C. As illustrated in FIG. 13(a), unlike documents A and B, document C does not contain the character strings "below" and "Tokyo".
FIG. 13(b) is a table showing the scores of the condition candidate keywords.
FIG. 13(c) is an example of updating the tagging rule data.
The rule update unit 510 selects the keywords to adopt as "conditions" based on the scores of the "condition candidates" contained in document C. Specifically, as illustrated in FIG. 13(b), the scores calculated by the score calculation unit 506 change as follows: for "below" and "Tokyo", the appearance frequency score falls from "10" to "7" and the recency score falls from "10" to "9", while the appearance position scores of "invoice" and "tax amount" are increased. As a result, the total scores of "below" and "Tokyo" fall, and the total scores of "invoice" and "tax amount" rise. Therefore, as illustrated in FIG. 13(c), the rule update unit 510 updates the tagging rule generated from documents A and B into a rule that assigns the tag "invoice (2018)" to data files that have the keywords "invoice", "amount", "transfer", "tax amount", and "delivery date", the document date "2018", and the size "A4".

Next, an example of updating a tagging rule will be described for the case where a tagging rule with a 100% matching rate cannot be generated with only one condition. Specifically, the case where the user assigns the existing tag "invoice" to a newly tagged document D will be described.
FIG. 14(a) shows the tagging rule for "invoice" and the features of document D, listing the keywords and document dates of documents A, B, C, and D. As illustrated in FIG. 14(a), unlike documents A to C, document D does not contain the character string "invoice", and documents A to C share no data file features with the newly tagged document D. In other words, the existing tagging rule for the tag "invoice" (No. 1 in FIG. 14(c)) alone cannot assign the tag "invoice" to document D.
FIG. 14(b) is a table showing the scores of the condition candidate keywords.
FIG. 14(c) is an example of updating the tagging rule data.
As illustrated in FIGS. 14(b) and 14(c), the score calculation unit 506 recalculates the scores of the "condition candidate" keywords. In addition to the pre-update tagging rule (No. 1), the rule update unit 510 selects "payee" and "payment due date", which have high total scores, as "conditions" so that document D falls under the tag "invoice", and adds them as a tagging rule for that tag. The rule update unit 510 thus registers the conditions "No. 1" and "No. 2" as tagging rules for the tag "invoice". As a result, the tag "invoice" is assigned whenever a data file satisfies either condition "No. 1" or condition "No. 2".
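Registering "No. 1" and "No. 2" for the same tag makes the rule a disjunction of condition sets. A minimal sketch of that evaluation follows; the function name `rule_matches` and the contents assumed for condition set No. 1 are illustrative, since the figures are not reproduced in the text.

```python
def rule_matches(condition_sets, file_features):
    """Return True if every condition of at least one condition set is present in the file."""
    return any(conditions <= file_features for conditions in condition_sets)


# Assumed contents: No. 1 is the original keyword rule, No. 2 is the rule added for document D.
invoice_rule = [
    {"invoice"},                    # No. 1 (assumed)
    {"payee", "payment due date"},  # No. 2
]

doc_d = {"payee", "payment due date", "amount"}
print(rule_matches(invoice_rule, doc_d))      # prints True (No. 2 is satisfied)
print(rule_matches(invoice_rule, {"payee"}))  # prints False (neither set is fully present)
```

Keeping the condition sets separate, rather than merging them into one larger set, is what allows documents with no features in common to share a tag.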

Next, an example of updating a tagging rule will be described for the case where the user rejects a tagging proposal. Specifically, the case where the user rejects the proposal of the existing tag "AA Company_invoice" for a newly tagged document G, which is an invoice from BB Company, will be described.
FIG. 15(a) shows the tagging rule for "AA Company_invoice" and the features of the newly tagged document G, which is an invoice from BB Company; FIG. 15(b) illustrates the tagging rule for "XX Company_invoice"; FIG. 15(c) is a table showing the scores of the condition candidate keywords; and FIG. 15(d) is an example of updating the tagging rule data.
With the tagging rule for "XX Company_invoice" illustrated in FIG. 15(b), the tag "AA Company_invoice" would also be proposed for invoices from BB Company.
Therefore, as illustrated in FIG. 15(a), the score calculation unit 506 compares the features of document G, for which tagging was rejected, with the "condition candidates" of the tagging rule for the tag "AA Company_invoice". If those "condition candidates" include one that is not among the features of document G, the score calculation unit 506 increases the uniqueness score of that "condition candidate". Specifically, as illustrated in FIG. 15(c), the score calculation unit 506 gives a uniqueness score to "AA Company", a keyword condition candidate that is contained in documents E and F but not in document G. As a result of this recalculation, as illustrated in FIG. 15(d), the rule update unit 510 updates the tagging rule to one with the tag name "AA Company_invoice" and the keywords "AA Company", "invoice", "payee", "payment due date", and "billed amount". The tagging unit 508 therefore no longer proposes the tag "AA Company_invoice" for document G, the invoice from BB Company.
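The uniqueness update triggered by a rejected proposal can be sketched as below. The function name `boost_uniqueness`, the increment of 1, and the starting scores are assumptions made for illustration; the description states only that the uniqueness score of candidates absent from the rejected file is increased.

```python
def boost_uniqueness(uniqueness, rule_candidates, rejected_file_features, increment=1):
    """Raise the uniqueness score of each rule candidate that the rejected file lacks."""
    updated = dict(uniqueness)  # leave the caller's scores untouched
    for candidate in rule_candidates:
        if candidate not in rejected_file_features:
            updated[candidate] = updated.get(candidate, 0) + increment
    return updated


# Hypothetical data modeled on the AA Company / BB Company example.
scores = {"AA Company": 0, "invoice": 0, "payee": 0}
doc_g = {"BB Company", "invoice", "payee"}
print(boost_uniqueness(scores, ["AA Company", "invoice", "payee"], doc_g))
# prints {'AA Company': 1, 'invoice': 0, 'payee': 0}
```

Because uniqueness feeds into the total score, a candidate boosted this way becomes more likely to be selected as a "condition", which is how "AA Company" enters the updated rule.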

As described above, the file management device 5 can automatically assign tags to data files acquired from the scanner 3, without user intervention, based on the features of the data files and the tagging rules. The user can also review the tagging rules managed by the file management device 5 and correct them as necessary. Furthermore, because the tagging rules are updated based on the matching rate between each document to be tagged and the tagging rules, the rules become more accurate with continued use.

In the above embodiment, the file management device 5 assigns tags to the image data read by the scanner 3, but the invention is not limited to this. The scanner 3 may have the functions of the file management device 5, read the image data, and assign tags to the data files. Alternatively, the user terminal 7 may have the functions of the file management device 5 and assign tags to the data files.

1…File management system
3…Scanner
5…File management device
50…File management program
500…Acquisition unit
502…Common feature extraction unit
504…Matching unit
506…Score calculation unit
508…Tagging unit
510…Rule update unit
600…Tagging rule database

Claims

A file management device comprising:
a common feature extraction unit that extracts features common to a plurality of data files to which the same tag is assigned;
a rule storage unit that associates at least one feature extracted by the common feature extraction unit with the one tag assigned to these data files and stores them as an assignment rule; and
a tag assigning unit that assigns a tag to a newly input data file based on the assignment rule stored in the rule storage unit.

The tag assigning unit searches the newly input data file for the features registered as assignment rules in the rule storage unit and, if any feature is found, assigns the tag associated with this feature to the newly input data file.

The file management device according to claim 2, wherein, when part of a character string registered as a feature of an assignment rule is found in a newly input data file, the tag assigning unit proposes the tag associated with this feature to the user and assigns the tag according to the user's operation.

The file management device according to claim 3, further comprising a rule update unit that, when the proposed tag is adopted by the user, updates the assignment rule so that the features of the newly input data file match the features of the assignment rule.

The file management device according to claim 3, further comprising a rule update unit that, when the proposed tag is not adopted by the user, updates the assignment rule so that the features of the newly input data file do not match the features of the assignment rule.

The file management device according to claim 1, wherein the common feature extraction unit extracts at least one of a character string, a date, an image size, and the number of colors used in an image as the feature.

The file management device according to claim 4 or 5, wherein the assignment rule stored in the rule storage unit includes a plurality of features as determination elements for determining whether or not to assign a tag, and
the rule update unit selects the features to be registered as determination elements of the assignment rule, from the features common to a plurality of data files, based on at least one of appearance frequency, recency, and appearance position, and on uniqueness.

A file management method comprising:
a common feature extraction step in which a computer extracts features common to a plurality of data files to which the same tag is assigned;
a rule storage step in which the computer associates at least one feature extracted in the common feature extraction step with the one tag assigned to these data files and stores them as an assignment rule; and
a tagging step in which the computer assigns a tag to a newly input data file based on the assignment rule stored in the rule storage step.

A program for causing a computer to execute:
a common feature extraction step of extracting features common to a plurality of data files to which the same tag is assigned;
a rule storage step of associating at least one feature extracted in the common feature extraction step with the one tag assigned to these data files and storing them as an assignment rule; and
a tagging step of assigning a tag to a newly input data file based on the assignment rule stored in the rule storage step.