JP6969818B1 - Information processing equipment, information processing methods and information processing programs

Info

Publication number: JP6969818B1
Application number: JP2020114655A
Authority: JP
Inventors: 正三中島
Original assignee: 株式会社ダブルスタンダード
Priority date: 2020-07-02
Filing date: 2020-07-02
Publication date: 2021-11-24
Anticipated expiration: 2040-07-02
Also published as: JP2022013946A; JP2022012657A

Abstract

Problem to be solved: To provide a highly convenient information processing apparatus, information processing method, and information processing program. Solution: An information processing apparatus according to the present invention includes an acquisition unit that acquires information about a target person from two or more different information sources; a notation change unit that changes the notation of the information about the target person acquired by the acquisition unit to a predetermined notation; an integration unit that integrates the information about the target person acquired by the acquisition unit; and a calculation unit that calculates, based on the information about the target person integrated by the integration unit, the risk that the target person will become subject to monitoring. Selected drawing: FIG. 1

Description

The present invention relates to an information processing apparatus, an information processing method, and an information processing program.

Conventionally, financial institutions and the like sometimes check, based on identity verification documents, whether a customer is a person who poses a criminal risk. Such criminal risk checks include, for example, checking whether the person is likely to engage in money laundering and checking whether the person belongs to antisocial forces.

For example, Patent Document 1 proposes a money laundering determination support system that helps determine comprehensively, from a plurality of information sources, whether a detected suspicious transaction constitutes money laundering.

Japanese Unexamined Patent Publication No. 2010-225040

As described above, determining criminal risk requires a comprehensive assessment of information acquired from a plurality of information sources. However, because document formats and the like differ from one information source to another, it is difficult to process information acquired from a plurality of information sources mechanically, and considerable manual labor is required.

The present invention has been made in view of the above problem, and an object of the present invention is to provide a highly convenient information processing apparatus, information processing method, and information processing program.

To solve the above problem, an information processing apparatus of the present invention includes an acquisition unit that acquires information about a target person from two or more information sources; a notation change unit that changes the notation of the information about the target person acquired by the acquisition unit to a predetermined notation; an integration unit that integrates the information about the target person acquired by the acquisition unit; and a calculation unit that calculates, based on the information about the target person integrated by the integration unit, the risk that the target person will become subject to monitoring.

According to the present invention, it is possible to provide a highly convenient information processing apparatus, information processing method, and information processing program.

A diagram showing an example of the schematic configuration of the information processing system according to the embodiment.
A diagram showing an example of the hardware configuration of the server according to the embodiment.
A diagram showing an example of the databases stored in the storage device of the server according to the embodiment.
A diagram showing an example of the information stored in the databases.
A diagram showing an example of the information stored in the databases.
A diagram showing an example of the functional configuration of the server according to the embodiment.
A diagram showing an example of the hardware configuration and functional configuration of the user terminal according to the embodiment.
A flowchart showing an example of the risk calculation processing of the server according to the embodiment.
(a) is a diagram showing an example of a notation change by the notation change unit of the server according to the embodiment; (b) and (c) are diagrams showing examples of information integration by the integration unit of the server according to the embodiment.
A flowchart showing an example of the character recognition processing of the server according to the embodiment.
A diagram showing an example of character recognition by the recognition unit of the server according to the embodiment.
A diagram showing an example of the assignment of position information by the recognition unit of the server according to the embodiment.
A diagram showing an example of a search by the search unit of the server according to the embodiment.
A diagram showing an example of horizontal coupling by the coupling unit of the server according to the embodiment.
A diagram showing an example of vertical coupling by the coupling unit of the server according to the embodiment.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the following description, the "target person" is the person for whom the risk is calculated.
"Risk" is the risk that the target person falls under a monitored category, for example, foreign PEPs (persons who hold important positions in foreign governments and the like (such as foreign heads of state), persons who formerly held such positions, their family members, and corporations whose beneficial owners are such persons) or crimes such as money laundering. In the present embodiment, a high risk means a high likelihood of crime, and a low risk means a low likelihood of crime.

[Embodiment]
FIG. 1 is a diagram showing an example of the schematic configuration of an information processing system 1 according to the embodiment. The information processing system 1 has a configuration in which a server 2 and a user terminal 3 are connected via a network 4. The information processing system 1 may include any number of servers 2 and user terminals 3.

Servers of other systems (information sources), for example, are connected to the network 4, and the server 2 is configured so that it can access those servers via the network 4 and acquire the information stored on them (information about the target person, described later). Software such as crawlers and scrapers can be used to acquire the information from those servers. The network 4 may be configured from any type of communication network.

FIG. 2 is a diagram showing an example of the hardware configuration of the server 2 (information processing apparatus) according to the present embodiment. As shown in FIG. 2, the server 2 has a configuration in which a communication IF 200A, a storage device 200B, and a CPU 200C are connected via a bus.

The communication IF 200A is an interface for communicating with external terminals.

The storage device 200B is, for example, an HDD or a semiconductor storage device. The storage device 200B stores the information processing program and the various databases used by the server 2. In the present embodiment, the information processing program and the various databases are stored in the storage device 200B of the server 2, but they may instead be stored in an external storage device such as a USB memory, or on an external server connected via a network, and be configured so that they can be referenced or downloaded as needed.

FIG. 3 is a diagram showing an example of the databases stored in the storage device 200B of the server 2. As shown in FIG. 3, the storage device 200B stores a correction pattern database 1 (hereinafter also referred to as the correction pattern DB1), a classification database 2 (hereinafter also referred to as the classification DB2), an item master database 3 (hereinafter also referred to as the item master DB3), a notation change database 4 (hereinafter also referred to as the notation change DB4), a target person database 5 (hereinafter also referred to as the target person DB5), and a risk calculation database 6 (hereinafter also referred to as the risk calculation DB6).

(Correction pattern DB1)
The correction pattern DB1 stores a plurality of correction patterns for correcting the image data of a document. FIG. 4(a) is a diagram showing an example of the information stored in the correction pattern DB1. As shown in FIG. 4(a), each of the correction patterns is composed of a combination of one or more corrections. For example, correction pattern 1 combines corrections 1 and 3; correction pattern 2 combines corrections 1, 2, and 4; correction pattern 3 combines corrections 1, 2, and 3; correction pattern 4 combines corrections 1, 3, and 5; and correction pattern 5 combines corrections 1 and 4. The number of correction patterns is not limited to five; any number of three or more may be used.

Corrections 1 to 5 are, for example, perspective warp (keystone correction), brightness correction, contrast correction, Gaussian correction, and blur correction, respectively. The combinations of corrections shown in FIG. 4(a) are merely examples, and each correction pattern may be composed of any corrections. The corrections are also not limited to the five corrections 1 to 5.
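
The pattern table described for FIG. 4(a) can be pictured as a mapping from pattern numbers to correction numbers. The following is a minimal sketch under stated assumptions, not the implementation of the embodiment: the names `CORRECTION_PATTERNS`, `apply_pattern`, `correct_with_all_patterns`, and the stub corrections are hypothetical, the "image" is a stand-in list that merely logs which corrections were applied, and applying the corrections in ascending order is an assumption, since the text does not state an order.

```python
from typing import Callable, Dict, List, Tuple

Image = List[str]  # stand-in "image": a log of the corrections applied so far
Correction = Callable[[Image], Image]

# Pattern number -> correction numbers, as listed in the description of FIG. 4(a).
CORRECTION_PATTERNS: Dict[int, Tuple[int, ...]] = {
    1: (1, 3),
    2: (1, 2, 4),
    3: (1, 2, 3),
    4: (1, 3, 5),
    5: (1, 4),
}


def apply_pattern(image: Image, pattern_id: int,
                  corrections: Dict[int, Correction]) -> Image:
    """Apply every correction of one pattern (ascending order is an assumption)."""
    for correction_id in CORRECTION_PATTERNS[pattern_id]:
        image = corrections[correction_id](image)
    return image


def correct_with_all_patterns(image: Image,
                              corrections: Dict[int, Correction]) -> Dict[int, Image]:
    """Produce one corrected image per correction pattern."""
    return {pid: apply_pattern(image, pid, corrections)
            for pid in CORRECTION_PATTERNS}


def _stub(name: str) -> Correction:
    """Hypothetical stand-in for a real image operation; it only records its name."""
    return lambda image: image + [name]


STUB_CORRECTIONS: Dict[int, Correction] = {
    1: _stub("perspective_warp"),
    2: _stub("brightness"),
    3: _stub("contrast"),
    4: _stub("gaussian"),
    5: _stub("blur"),
}
```

In a real system the stubs would be replaced by actual image operations; the table-driven structure is what lets patterns be added or changed without touching the code that applies them.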

(Classification DB2)
The classification DB2 stores information for classifying documents. FIG. 4(b) is a diagram showing an example of the information stored in the classification DB2. As shown in FIG. 4(b), the classification DB2 stores, for each type of document, the pattern matching data (image data or feature point data, for example, the image data or feature point data of a seal) and keywords (KW) specific to that type, in association with the type. The pattern matching data and keywords shown in FIG. 4(b) are merely examples, and any pattern matching data and keywords may be used as the information for classifying documents. The classification unit 206, described later, refers to the classification DB2 and classifies the image data of a document based on whether the image data contains the pattern matching data or keywords.
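
The keyword half of this classification can be sketched as a simple lookup. This is an illustrative sketch only: the document types and keywords below are invented for the example (the contents of FIG. 4(b) are not reproduced here), `classify_by_keyword` is a hypothetical name, and the pattern matching on image data or feature points is left out entirely.

```python
from typing import Dict, List, Optional

# Hypothetical keyword table: document type -> keywords specific to that type.
CLASSIFICATION_KEYWORDS: Dict[str, List[str]] = {
    "driver_license": ["運転免許証"],
    "residence_certificate": ["住民票"],
}


def classify_by_keyword(recognized_text: str,
                        table: Dict[str, List[str]] = CLASSIFICATION_KEYWORDS
                        ) -> Optional[str]:
    """Return the first document type whose keyword occurs in the text, else None."""
    for document_type, keywords in table.items():
        if any(keyword in recognized_text for keyword in keywords):
            return document_type
    return None
```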

(Item master DB3)
The item master DB3 stores, for each type of document, information on the items to be acquired. FIG. 4(c) is a diagram showing an example of the information stored in the item master DB3. As shown in FIG. 4(c), the item master DB3 stores the items of information to be acquired in association with each type of document. Which items of information are acquired from a document is arbitrary.

(Notation change DB4)
The notation change DB4 stores information for changing the notation of the information about the target person acquired by the server 2 to a predetermined notation. Specifically, the notation change DB4 stores information for logic processing and information for first and second master processing.

The information for logic processing associates information about the target person whose notation is to be changed (hereinafter also referred to as the processing description target) with a processing rule for changing the notation of that information. FIG. 5(a) is a diagram showing an example of the information for logic processing stored in the notation change DB4, in which processing rules are associated with processing description targets. In the example shown in FIG. 5(a), when the processing description target has the form "x-digit number (including commas)" + "千円" (thousand yen), it is replaced with "x-digit number (including commas)" + ",000". For example, when the processing description target is "1,000千円" as shown in FIG. 5(b), applying the processing rule illustrated in FIG. 5(a) changes the notation to "1,000,000".
The examples shown in FIGS. 5(a) and 5(b) are merely examples, and the information for logic processing contains various patterns associating processing description targets with processing rules. For example, the information for logic processing may specify that, when the processing description target has the form "x-digit number (including commas)" + "百万円" (million yen), it is replaced with "x-digit number (including commas)" + ",000,000". Conversely, processing rules may be defined so that the notation "1,000,000" becomes "1,000千円" and the notation "1,000,000,000" becomes "1,000百万円".
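
The amount rules above lend themselves to a table of pattern and replacement pairs. The sketch below assumes that regular expressions are an acceptable way to express the rules; the names `LOGIC_RULES` and `apply_logic_rules` are hypothetical and the text does not say how the rules are actually encoded.

```python
import re

# (pattern, replacement) pairs modeled on the rule described for FIG. 5(a).
# The digit group accepts ASCII digits and commas, e.g. "1,000".
LOGIC_RULES = [
    (re.compile(r"([0-9][0-9,]*)千円"), r"\1,000"),        # thousand yen
    (re.compile(r"([0-9][0-9,]*)百万円"), r"\1,000,000"),  # million yen
]


def apply_logic_rules(text: str) -> str:
    """Rewrite amounts written with a 千円 or 百万円 unit into plain digits."""
    for pattern, replacement in LOGIC_RULES:
        text = pattern.sub(replacement, text)
    return text
```

Keeping the rules in a list means the reverse direction (plain digits back to a 千円 notation) could be added as further entries, as long as only one direction is active for a given target notation.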

An example will now be described in which an address, as information about the target person whose notation is to be changed (the processing description target), is associated with a processing rule for changing the notation of the address.
For example, when the block numbers of an address are written with hyphens, as in "東京都港区赤坂５−５−５", a processing rule may be provided that changes the first hyphen of the address to "丁目" (chome), giving "東京都港区赤坂５丁目５−５". Since the purpose is to unify the notation, a processing rule may instead be provided that changes "東京都港区赤坂５丁目５−５" to "東京都港区赤坂５−５−５". A processing rule may also be provided that changes "東京都港区赤坂５−５−５" to "東京都港区赤坂５丁目５番５号", that is, one that changes the first hyphen to "丁目" and the next hyphen to "番" and ends the address with "号". Conversely, the processing rule may change "東京都港区赤坂５丁目５番５号" to "東京都港区赤坂５−５−５".
In this way, the information for logic processing associates processing description targets with the processing rules for them, and by referring to the information for logic processing, the notation can be changed to a predetermined, unified notation.
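
Two of the address rules above can be sketched with regular expressions. This is a sketch under assumptions: the set of hyphen-like characters in `HYPHENS` is a guess at what source data might contain, the function names are hypothetical, and the rules only look at a trailing group of three numbers. Python's `\d` matches full-width digits such as "５" in `str` patterns, which is why no separate digit table is needed.

```python
import re

# Hyphen-like characters that may separate the numbers (an assumed set):
# hyphen-minus, hyphen, minus sign, full-width hyphen-minus.
HYPHENS = "[" + re.escape("\u002D\u2010\u2212\uFF0D") + "]"


def first_hyphen_to_chome(address: str) -> str:
    """Change only the first hyphen of a trailing n-n-n group to 丁目."""
    pattern = r"(\d+)" + HYPHENS + r"(?=\d+" + HYPHENS + r"\d+$)"
    return re.sub(pattern, r"\g<1>丁目", address, count=1)


def hyphens_to_chome_ban_go(address: str) -> str:
    """Rewrite a trailing n-n-n group as n丁目n番n号."""
    pattern = r"(\d+)" + HYPHENS + r"(\d+)" + HYPHENS + r"(\d+)$"
    return re.sub(pattern, r"\g<1>丁目\g<2>番\g<3>号", address)
```

Whichever target notation is chosen, only one of the opposing rules should be enabled at a time; otherwise the two would undo each other.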

The information for the first master processing associates information about the target person that is a processing description target with the content of overwrite processing of that information using dictionary data. FIG. 5(c) is a diagram showing an example of the information for the first master processing stored in the notation change DB4, in which the content of overwrite processing using dictionary data is associated with each processing description target. In the example shown in FIG. 5(c), when the characters in the "amount" item are "戦円", they are overwritten with "千円". For example, when the processing description target is "1,000戦円" as shown in FIG. 5(d), applying the processing content illustrated in FIG. 5(c) corrects the erroneous notation to "1,000千円". The examples shown in FIGS. 5(c) and 5(d) are merely examples, and the information for the first master processing contains various patterns of overwrite processing using dictionary data for processing description targets.
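
The dictionary overwrite for the "amount" item can be sketched as a per-field table of known misreadings. The table layout and the names `FIRST_MASTER` and `apply_first_master` are assumptions made for illustration; only the "戦円" to "千円" entry comes from the text.

```python
from typing import Dict

# Field name -> {erroneous string: correct string}.
FIRST_MASTER: Dict[str, Dict[str, str]] = {
    "amount": {"戦円": "千円"},
}


def apply_first_master(field: str, value: str,
                       master: Dict[str, Dict[str, str]] = FIRST_MASTER) -> str:
    """Overwrite known erroneous strings, but only within the named field."""
    for wrong, right in master.get(field, {}).items():
        value = value.replace(wrong, right)
    return value
```

Scoping each entry to a field keeps a correction such as "戦円" from being applied to items where the same characters might be legitimate.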

The notation change DB4 may also store, as information for the first master processing, dictionary data for converting external characters (gaiji). An external character is a character that is not registered in character input software such as an IME and that cannot be displayed even through conversion during text input. Because handling external characters requires an external character editor or the like, when the information about the target person contains an external character, it is preferable to change it to a character that character input software such as an IME can handle. An example of such a change is shown below.
Before conversion: "高"崎太郎 (here "高" stands for the variant form known as hashigo-daka, the "ladder" form of the character)
After conversion: 高崎太郎
In this example, the hashigo-daka variant is changed to the character "高" that is registered in character input software such as an IME.

In this way, the notation change DB4 may store a dictionary for converting external characters (gaiji), and when the information about the target person contains an external character, the external character may be changed to a character that character input software such as an IME can handle.
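
A character-level conversion dictionary of this kind can be sketched with a translation table. In the sketch below, the code point U+9AD9 (髙) stands in for the hashigo-daka variant; this is an assumption for illustration, since a true external character may have no standard code point at all, and the names `GAIJI_TABLE` and `replace_gaiji` are hypothetical.

```python
# Variant character -> character that ordinary input software can handle.
GAIJI_TABLE = str.maketrans({
    "\u9AD9": "\u9AD8",  # 髙 (hashigo-daka) -> 高
})


def replace_gaiji(text: str) -> str:
    """Replace each registered variant character with its standard form."""
    return text.translate(GAIJI_TABLE)
```

An explicit table is used because Unicode normalization does not fold this variant into the standard character.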

The notation change DB4 may also store, as information for the first master processing, a dictionary for correcting errors in addresses. This dictionary stores the official notation of addresses. Here, an official address is information that associates the prefecture names determined according to Japan's administrative divisions with the names of the cities, wards, villages, counties, and the like within each prefecture; using this information, errors in the city, ward, village, county, or the like that follows the prefecture name in an address can be corrected.
For example, when the following uncorrected address is acquired as information about the target person, the address is corrected as shown below using the official addresses stored in the dictionary, because Akasaka is in Minato Ward, not Shibuya Ward.
Address before correction: 東京都渋谷区赤坂３丁目３−３
Address after correction: 東京都港区赤坂３丁目３−３
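
The ward correction in this example can be sketched as a lookup from prefecture and town name to the official municipality. This is a deliberately small sketch under assumptions: the dictionary holds a single illustrative entry, the names `OFFICIAL_MUNICIPALITY` and `correct_municipality` are hypothetical, and real town names are not always unique within a prefecture, which a production dictionary would have to handle.

```python
from typing import Dict

# Prefecture -> {town name: official municipality}; one illustrative entry.
OFFICIAL_MUNICIPALITY: Dict[str, Dict[str, str]] = {
    "東京都": {"赤坂": "港区"},
}


def correct_municipality(address: str,
                         official: Dict[str, Dict[str, str]] = OFFICIAL_MUNICIPALITY
                         ) -> str:
    """Replace the text between prefecture and town with the official municipality."""
    for prefecture, towns in official.items():
        if not address.startswith(prefecture):
            continue
        rest = address[len(prefecture):]
        for town, municipality in towns.items():
            index = rest.find(town)
            # index > 0 means something stands between prefecture and town.
            if index > 0 and rest[:index] != municipality:
                return prefecture + municipality + rest[index:]
    return address
```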

As another example, when the following uncorrected address is acquired as information about the target person, the address is corrected as shown below using the official addresses stored in the dictionary, because "大字" (oaza) is not used in the administrative divisions.
Address before correction: 愛知県知多郡東浦町大字藤江字柳牛３４−２
Address after correction: 愛知県知多郡東浦町藤江柳牛３４−２
In this way, the information for the first master processing associates various erroneous notations with the correct notation for each, and by referring to the information for the first master processing, erroneous notations can be corrected to the correct notation.

The information for the second master processing is information for deleting unnecessary character data contained in the information about the target person. Specifically, the information for the second master processing consists of information about the target person and information on the characters contained in that information.
In this way, the information for the second master processing associates information about the target person with the characters that information contains; by referring to it, character data other than those characters can be recognized as unnecessary and deleted.
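
One way to read the second master processing is as a per-item whitelist of characters. The sketch below takes that reading as an assumption; the allowed character set for the "amount" item and the names `SECOND_MASTER` and `strip_unneeded` are invented for illustration.

```python
from typing import Dict, Set

# Field name -> characters that may legitimately appear in that field.
SECOND_MASTER: Dict[str, Set[str]] = {
    "amount": set("0123456789,千百万円"),
}


def strip_unneeded(field: str, value: str,
                   master: Dict[str, Set[str]] = SECOND_MASTER) -> str:
    """Delete every character that is not registered for the field."""
    allowed = master.get(field)
    if allowed is None:  # no entry for this field: leave the value untouched
        return value
    return "".join(ch for ch in value if ch in allowed)
```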

(Target person DB5)
The target person DB5 stores information about the target person in association with a target person ID. Specifically, the target person DB5 stores, in association with the target person ID, transaction information for the target person's bank account (hereinafter also simply referred to as account transaction information; the bank account is hereinafter also simply referred to as the account) and so-called blacklist information, such as National Police Agency data (for example, criminal history information and information on antisocial forces) and data from credit information agencies under the Installment Sales Act and the Money Lending Business Act (for example, information on non-payment from credit card companies and payment collection agencies (account transfer)).
When the target person is an individual, the name, address, contact information, personal account transaction information, and blacklist information are stored in association with the target person ID.
When the target person is a corporation, the corporation's location, the corporation's contact information, the name of the representative, the name of the standing proxy, the name of the beneficial owner, the name of the agent, the corporation's account transaction information, and blacklist information on the representative, standing proxy, beneficial owner, and agent are stored in association with the target person ID.
The information about the target person may also include age, gender, address, date of birth, registered domicile, SNS reference information, and the like. "SNS reference information" is the result of checking the reliability of the person's identity information against the content that the person under examination has posted on a given SNS.
The information about the target person stored in the target person DB5 is acquired by the acquisition unit 210, described later.

(Risk calculation DB6)
The risk calculation DB6 stores information for calculating, from the information about the target person, the risk that the target person will become subject to monitoring (hereinafter also simply referred to as the risk). Specifically, the risk calculation DB6 stores a plurality of risk calculation scores (hereinafter also referred to as risk scores), each set for a combination (condition) of items of the information about the target person stored in the target person DB5.

For example, the items used to calculate the risk of crimes such as money laundering include the following information.
(1-1) Transaction period
(1-2) Transaction amount
(1-3) Transaction frequency
(1-4) Criminal history
(1-5) Location (address in the case of an individual)
(1-6) Access history for the account
FIG. 5(e) shows an example of combinations of conditions and risk scores. In the example shown in FIG. 5(e), the risk score is 30 when the transaction amount over a one-month period is 5 million yen or more and there is a criminal history.
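
The pairing of conditions with risk scores can be sketched as a small rule table. This is a sketch under assumptions: the field names `monthly_amount_yen` and `has_criminal_history`, the class `RiskRule`, and the function `risk_score` are hypothetical, and summing the scores of all matching conditions is an assumption, since the text here does not state how several matching conditions are combined.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class RiskRule:
    description: str
    condition: Callable[[Dict], bool]
    score: int


# Single rule modeled on the FIG. 5(e) example: 5 million yen or more in one
# month and a criminal history gives a risk score of 30.
RULES: List[RiskRule] = [
    RiskRule(
        description="monthly amount >= 5,000,000 yen and criminal history",
        condition=lambda p: p["monthly_amount_yen"] >= 5_000_000
        and p["has_criminal_history"],
        score=30,
    ),
]


def risk_score(person: Dict, rules: List[RiskRule] = RULES) -> int:
    """Sum the scores of all matching rules (summation is an assumption)."""
    return sum(rule.score for rule in rules if rule.condition(person))
```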

Likewise, the items used to calculate the risk that the target person is a foreign PEP (a person who holds an important position in a foreign government or the like (such as a foreign head of state), a person who formerly held such a position, a family member of such a person, or a corporation whose beneficial owner is such a person) include the following information.
(2-1) List of foreign dignitaries
(2-2) Customer data
(2-3) Web information obtained by crawling
(2-4) Various other information about the target person
Based on the information in (2-1) to (2-4) above, the target person's name, location, country of origin, age, transaction history, deposit and withdrawal history, and the like are acquired, and the risk that the target person is a foreign PEP is calculated from the acquired information. For example, when the target person matches an entry in the list of foreign dignitaries, a very high risk of being a foreign PEP is calculated (a score of approximately 100%).

The CPU 200C controls the server 2 and includes a ROM (Read Only Memory) and a RAM (Random Access Memory), neither of which is shown.

As shown in FIG. 6, the server 2 has functions such as a receiving unit 201, a transmitting unit 202, a storage device control unit 203, a correction unit 204, a recognition unit 205, a classification unit 206, a determination unit 207, a search unit 208, a coupling unit 209, an acquisition unit 210, a notation change unit 211, an integration unit 212, and a calculation unit 213. The functions shown in FIG. 6 are realized by the CPU 200C executing the information processing program stored in the ROM (not shown) of the server 2.

The receiving unit 201 receives information transmitted from outside, for example, information about the target person.

The transmitting unit 202 transmits information to the outside.

The storage device control unit 203 controls the storage device 200B. Specifically, the storage device control unit 203 controls the storage device 200B to write and read information.

The correction unit 204 corrects the image data of a document using a plurality of correction patterns stored in the correction pattern DB1 (each correction pattern includes one or more different corrections) and generates a plurality of corrected images, one per correction pattern. Specifically, the correction unit 204 generates corrected image data 1 by correcting the image data with correction pattern 1, and likewise generates corrected image data 2, 3, 4, and 5 by correcting the image data with correction patterns 2, 3, 4, and 5, respectively.

The recognition unit 205 recognizes characters from the image data of the document. Here, the recognition unit 205 recognizes characters from each of the corrected images generated by the correction unit 204. It then selects, from the characters recognized across the corrected image data, the result that occurs most often and adopts it as the recognized characters. The recognition unit 205 also attaches position information to the recognized characters. The position information is expressed in XY coordinates with the upper left of the document as the zero point; facing the document, the horizontal direction is the X axis (positive to the right) and the vertical direction is the Y axis (positive downward). The coordinate values may be pixel counts or other numerical values, and the position on the document used as the zero point is arbitrary.

When recognizing characters from the image data of the document, the recognition unit 205 treats characters as belonging to one continuous word if their position information (coordinates) is within a predetermined distance in the horizontal direction (X axis) or the vertical direction (Y axis), for example, when the coordinates indicating the character positions overlap. If the position information of the characters is farther apart than the predetermined distance in the horizontal or vertical direction (for example, when the coordinates indicating the character positions do not overlap), the recognition unit 205 treats them not as one continuous word but as separate characters or as characters of different words. Coordinates "overlap" in the following sense. When the word "言葉" (word) is written horizontally (along the X axis) on the document, the right-edge coordinate of the character "言" lies to the right of the left-edge coordinate of the character "葉"; in other words, the left edge of "葉" lies to the left of the right edge of "言". Likewise, when "言葉" is written vertically (along the Y axis), the lower-edge coordinate of "言" lies below the upper-edge coordinate of "葉"; in other words, the upper edge of "葉" lies above the lower edge of "言" in the vertical direction (Y axis).
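The grouping rule above can be sketched as follows for horizontally written text. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the `Char` record, the function name `group_horizontal`, and the `max_gap` parameter (the "predetermined distance", here in pixels) are all hypothetical, and the sketch handles a single line of text only.

```python
from dataclasses import dataclass


@dataclass
class Char:
    word: str    # recognized character
    top: int     # Y coordinate of the upper edge
    left: int    # X coordinate of the left edge
    bottom: int  # Y coordinate of the lower edge
    right: int   # X coordinate of the right edge


def group_horizontal(chars, max_gap=0):
    """Join characters into words when the horizontal gap to the
    previous character is at most max_gap (assumed threshold).
    A gap of zero or less means the boxes touch or overlap."""
    words = []
    current = []
    for ch in sorted(chars, key=lambda c: c.left):
        if current and ch.left - current[-1].right <= max_gap:
            current.append(ch)
        else:
            if current:
                words.append("".join(c.word for c in current))
            current = [ch]
    if current:
        words.append("".join(c.word for c in current))
    return words
```

With `max_gap=0`, two characters whose boxes overlap or touch are joined, and a character that starts farther to the right begins a new word.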

The classification unit 206 refers to the classification DB2 and classifies the image data of the document. Specifically, it classifies the document from which the image data originated according to whether pattern-matching data or keywords prepared for each document type are present. The classification unit 206 then attaches the classification result (document type information) to the image data of the document.

The determination unit 207 refers to the item master DB3, which stores the items to be acquired for each document type, and determines for each item whether the item is present among the characters recognized by the recognition unit 205.

If there is an item that the determination unit 207 has not determined to be present, the search unit 208 searches the characters recognized by the recognition unit 205 for each character that makes up the item. Here, the search unit 208 takes one of those characters as a starting point and searches whether the other characters making up the item exist within a predetermined range of it.

The joining unit 209 joins, as data, the characters found by the search unit 208 on the image data of the document so that they can be recognized as an item.

The acquisition unit 210 acquires information about the target person from servers of other systems and from the user terminal 3, which are connected via the network 4. The acquisition unit 210 acquires this information from various information sources using various methods.
"Various information sources" include, for example, National Police Agency data and data from credit information agencies under the Installment Sales Act and the Money Lending Business Act.
"Various methods" include, for example, crawling (a technique for acquiring information from websites), OCR (Optical Character Recognition), and API (Application Programming Interface) integration with other systems, as well as information entered through the user terminal 3.

When the information about the target person is image data, the acquisition unit 210 acquires, according to the search result of the search unit 208, the characters corresponding to each item as information about the target person. Specifically, the acquisition unit 210 acquires, for each item, the characters corresponding to that item. More specifically, it acquires as the characters corresponding to an item the characters on a first side of the item (the right side in this embodiment), up to the next item or up to a line break. If no characters (other than those making up the item) exist within a predetermined range on the first side of the item, the acquisition unit 210 instead acquires, as information about the target person, the characters on a second side different from the first side (the lower side in this embodiment), up to the next item or up to a line break.
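The right-then-below lookup described above might look like the following sketch. The `Token` record, the `is_item` flag, and the `x_range` and `y_range` limits (standing in for the "predetermined range") are assumptions introduced for illustration; a line break is approximated by requiring vertical overlap with the item for the right-side search.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    top: int
    left: int
    bottom: int
    right: int
    is_item: bool = False  # True for item labels such as "氏名"


def _collect(candidates):
    """Concatenate token texts until the next item label is reached."""
    picked = []
    for t in candidates:
        if t.is_item:
            break
        picked.append(t.text)
    return "".join(picked)


def value_for_item(item, tokens, x_range=300, y_range=60):
    """Return the text to the right of the item (first side); if none is
    found within x_range, return the text below it (second side)."""
    right_side = sorted(
        (t for t in tokens
         if t is not item
         and 0 <= t.left - item.right <= x_range
         and t.top < item.bottom and t.bottom > item.top),
        key=lambda t: t.left)
    value = _collect(right_side)
    if value:
        return value
    below = sorted(
        (t for t in tokens
         if t is not item
         and 0 <= t.top - item.bottom <= y_range
         and t.left < item.right and t.right > item.left),
        key=lambda t: t.top)
    return _collect(below)
```

The function returns an empty string when neither side yields any characters, which a caller could treat as "value not found".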

The notation changing unit 211 changes the notation of the information about the target person acquired by the acquisition unit 210 to a predetermined notation. Specifically, it refers to the notation change DB4 and makes this change based on the logic-processing information and the first and second master-processing information stored there.

The integration unit 212 integrates the information about the target person whose notation has been changed to the predetermined notation by the notation changing unit 211. Specifically, the integration unit 212 collates this information item by item and extracts the items that do not overlap as information about the target person. For items that overlap, the integration unit 212 selects the most frequent entry as the information for that item.

The calculation unit 213 calculates the risk of the target person based on the information integrated by the integration unit 212. Specifically, the calculation unit 213 combines the information for each item of the target person's information stored in the target person DB5, determines whether the combination satisfies a condition stored in the risk calculation DB6, and, if it does, takes the risk score associated with that condition as the risk of the target person.
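As a rough illustration of this condition lookup, the sketch below models the risk calculation DB6 as a list of rules, each pairing a set of required item values with a score. The rule format, the field names, and the choice to return the highest matching score (and 0 when nothing matches) are assumptions; the description only states that the score associated with a satisfied condition becomes the risk.

```python
def calculate_risk(subject, rules):
    """Return the highest score among the rules whose conditions are all
    satisfied by the subject's items; return 0 if no rule matches."""
    matched = [
        rule["score"]
        for rule in rules
        if all(subject.get(item) == value
               for item, value in rule["conditions"].items())
    ]
    return max(matched, default=0)


# Hypothetical rules; real conditions would come from the risk calculation DB6.
EXAMPLE_RULES = [
    {"conditions": {"on_foreign_dignitary_list": True}, "score": 100},
    {"conditions": {"country": "X", "large_transfers": True}, "score": 60},
]
```

For instance, `calculate_risk({"on_foreign_dignitary_list": True}, EXAMPLE_RULES)` evaluates to 100, in line with the foreign PEP example in which a list match yields a score of approximately 100%.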

(User terminal 3)
FIG. 7 is a diagram showing an example of the hardware configuration and functional configuration of the user terminal 3 according to the embodiment. FIG. 7(a) shows an example of the hardware configuration of the user terminal 3, and FIG. 7(b) shows an example of its functional configuration. The user terminal 3 is, for example, a PC (Personal Computer) or a mobile terminal (for example, a tablet terminal). As shown in FIG. 7(a), the user terminal 3 includes a communication IF 300A, a storage device 300B, an input device 300C, a display device 300D, a CPU 300E, and the like.

The communication IF 300A is an interface for communicating with other devices (the server 2 in the embodiment).

The storage device 300B is, for example, an HDD (Hard Disk Drive) or a semiconductor storage device (SSD (Solid State Drive)). The storage device 300B stores an identifier (ID) of the user terminal 3, an information processing program, and the like. The identifier may be newly assigned to the user terminal 3 by the server 2, or an IP (Internet Protocol) address, a MAC (Media Access Control) address, or the like may be used.

The input device 300C is, for example, a keyboard or a touch panel. By operating the input device 300C, the user can enter information needed to use the information processing system 1 (for example, information about the target person, including image data).

The display device 300D is, for example, a liquid crystal monitor or an organic EL monitor. It displays the screens needed to use the information processing system 1 (for example, a screen for entering information about the target person, including image data, and a screen presenting the target person's risk calculated by the server 2).

The CPU 300E controls the user terminal 3 and includes a ROM and a RAM, neither of which is shown.

As shown in FIG. 7(b), the user terminal 3 has functions such as a receiving unit 301, a transmitting unit 302, a storage device control unit 303, an operation receiving unit 304, and a display device control unit 305. The functions shown in FIG. 7(b) are realized by the CPU 300E executing the information processing program stored in the storage device 300B.

The receiving unit 301 receives information transmitted from the server 2.

The transmitting unit 302 attaches the identifier to information entered through the input device 300C and transmits it to the server 2. Because the identifier is attached to the information transmitted from the user terminal 3, the server 2 can tell which user terminal 3 sent the information it received.

The storage device control unit 303 controls the storage device 300B. Specifically, it controls the storage device 300B to write and read information.

The operation receiving unit 304 receives input operations performed on the input device 300C, for example, operations for entering information about the target person (including operations for entering image data).

The display device control unit 305 controls the display device 300D. Specifically, it controls the display device 300D to display the screens needed to use the information processing system 1 according to the embodiment (for example, a screen for entering information about the target person, including image data, and a screen presenting the target person's risk calculated by the server 2).

(Information processing method)
FIG. 8 is a flowchart showing an example of the risk calculation process of the server according to the embodiment.

(Step S101)
The acquisition unit 210 of the server 2 acquires information about the target person from two or more information sources.

(Step S102)
The acquisition unit 210 of the server 2 determines whether the acquired information about the target person is image data or text data. If it is image data (YES), the server 2 executes the process of step S103. If it is not image data (NO), that is, if it is text data, the server 2 executes the process of step S104.

(Step S103)
The server 2 executes the character recognition process. The details of the character recognition process are described later.

(Step S104)
The notation changing unit 211 of the server 2 refers to the notation change DB4 and, based on the logic-processing information and the first and second master-processing information stored there, changes the notation of the information about the target person acquired by the acquisition unit 210 to the predetermined notation.

(Step S105)
The integration unit 212 of the server 2 integrates the information about the target person whose notation has been changed to the predetermined notation by the notation changing unit 211. The details of the operation of the integration unit 212 are described later with reference to FIG. 9.

(Step S106)
The calculation unit 213 calculates the risk of the target person based on the information about the target person integrated by the integration unit 212. Since the operation of the calculation unit 213 has already been described in detail, the description is not repeated here.

(Step S107)
The transmitting unit 202 outputs the risk calculated by the calculation unit 213 together with information on the target person for whom the risk was calculated (for example, a personal name or a corporate name). The output risk and target person information are received by the receiving unit 301 of the user terminal 3 and displayed on the display device 300D by the display device control unit 305 of the user terminal 3. When outputting the risk and the target person information, the server 2 may, if the risk is equal to or greater than a predetermined value, output them together with an indication that the target person is subject to monitoring, and, if the risk is less than the predetermined value, output them together with an indication that the target person is not subject to monitoring. Alternatively, the risk and the target person information may be output only when the risk is equal to or greater than the predetermined value, with or without an indication that the target person is subject to monitoring.
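The output variations in step S107 can be summarized in a short sketch. The function name `build_output`, the dictionary layout, the default threshold of 70, and the `only_if_monitored` switch are hypothetical; the description specifies only that a predetermined value separates monitored from unmonitored target persons.

```python
def build_output(name, risk, threshold=70, only_if_monitored=False):
    """Build the payload sent to the user terminal.

    monitored is True when the risk is at or above the threshold.
    With only_if_monitored=True, nothing is output (None) for target
    persons whose risk is below the threshold."""
    monitored = risk >= threshold
    if only_if_monitored and not monitored:
        return None
    return {"name": name, "risk": risk, "monitored": monitored}
```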

FIG. 9(a) is a diagram showing an example of a notation change by the notation changing unit 211 of the server according to the embodiment. FIG. 9(a) shows an example in which the notation changing unit 211 changes the notation of a monetary amount to the predetermined notation. As shown in FIG. 9(a), the notation changing unit 211 corrects erroneous characters based on the first master-processing information stored in the notation change DB4 (in the example of FIG. 9(a), the character "戦" is corrected to "千", the character for thousand). The notation changing unit 211 also deletes unnecessary characters based on the second master-processing information stored in the notation change DB4 (in the example of FIG. 9(a), the characters "＊税別", meaning "excluding tax", are deleted). The notation changing unit 211 then unifies the units based on the logic-processing information in the notation change DB4.
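The three stages of the notation change (error correction, deletion of unnecessary characters, unit unification) could be sketched as below. The table contents are limited to the two examples named in the text; the table names, the input string, and the decision to unify amounts into plain yen are assumptions, since the figure itself is not reproduced here.

```python
import re

# First master (assumed shape): erroneous character -> corrected character.
TYPO_MASTER = {"戦": "千"}
# Second master (assumed shape): strings to delete.
NOISE_MASTER = ["＊税別"]
# Logic processing (assumed): multipliers used to unify units into yen.
UNIT_MULTIPLIER = {"千": 1_000, "万": 10_000}


def normalize_amount(text):
    """Correct known errors, strip unnecessary strings, then express
    the amount in plain yen. Unparseable input is returned as cleaned."""
    for wrong, right in TYPO_MASTER.items():
        text = text.replace(wrong, right)
    for noise in NOISE_MASTER:
        text = text.replace(noise, "")
    match = re.fullmatch(r"(\d+)([千万]?)円", text)
    if match is None:
        return text
    value = int(match.group(1)) * UNIT_MULTIPLIER.get(match.group(2), 1)
    return f"{value}円"
```

Under these assumptions, an input such as "5戦円＊税別" passes through "5千円＊税別" and "5千円" before being unified to "5000円".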

FIGS. 9(b) and 9(c) are diagrams showing an example of information integration by the integration unit 212 of the server 2 according to the embodiment. FIG. 9(b) shows the information about the target person before integration by the integration unit 212, and FIG. 9(c) shows it after integration. As shown in FIGS. 9(b) and 9(c), the integration unit 212 integrates, item by item, the information about the target person whose notation has been changed to the predetermined notation by the notation changing unit 211. Specifically, the integration unit 212 takes the information for each item from that normalized information and, for items that appear more than once, selects the most frequent entry as the information for that item, thereby integrating the information about the target person acquired from the various information sources.
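A minimal sketch of this item-by-item integration follows, assuming each information source yields one dictionary of item names to normalized values. The function name `integrate` and the record format are hypothetical. Tie handling is not specified in the description; here a tie falls to the value seen first, which is simply how `Counter.most_common` orders equal counts.

```python
from collections import Counter


def integrate(records):
    """Merge records from several sources into one record.

    Items that appear in only one source are kept as they are; for
    items that appear in several sources, the most frequent value wins."""
    values_by_item = {}
    for record in records:
        for item, value in record.items():
            values_by_item.setdefault(item, []).append(value)
    return {
        item: Counter(values).most_common(1)[0][0]
        for item, values in values_by_item.items()
    }
```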

(Information processing method)
FIG. 10 is a flowchart showing an example of the character recognition process of the server according to the embodiment.

(Step S201)
The correction unit 204 of the server 2 refers to the correction pattern DB1 and corrects the image data of the document. Specifically, the correction unit 204 corrects the image data of the document using the plurality of correction patterns stored in the correction pattern DB1 (each correction pattern includes one or more different corrections) and generates a plurality of corrected images, one per correction pattern.

(Step S202)
The recognition unit 205 of the server 2 recognizes characters from the image data of the document. Specifically, the recognition unit 205 recognizes characters from each of the corrected images generated by the correction unit 204. It then selects, from the characters recognized across the corrected image data, the result that occurs most often and adopts it as the recognized characters.

FIG. 11 is a diagram showing an example of character recognition by the recognition unit 205. As shown in FIG. 11, the recognition unit 205 recognizes characters from each of the corrected images generated by the correction unit 204. In the example shown in FIG. 11, the recognition result for correction patterns 1, 3, and 5 is "山田太郎" (Taro Yamada). For correction pattern 2, the recognition result is "山田大郎", which differs in one character. For correction pattern 4, the recognition result is "unrecognizable", that is, the characters could not be recognized. The recognition unit 205 selects the result that occurs most often among the characters recognized from the corrected image data, "山田太郎" in the example of FIG. 11, and adopts it as the recognized characters. If no single result occurs most often (for example, when the determination results for correction patterns 1 to 5 are 2, 2, 2, 2, and 1, respectively), the process of step S202 may be performed again, the correction patterns may be changed before step S202 is performed again, or an error may be reported to indicate that the characters could not be read.
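The selection in FIG. 11 amounts to a majority vote over the per-pattern recognition results. The sketch below is one way to express it, assuming an unrecognizable result is represented as `None` and that a tie for first place is reported by returning `None`, leaving the caller to retry, change the correction patterns, or raise an error. The function name `vote` is hypothetical.

```python
from collections import Counter


def vote(results):
    """Return the most frequent recognition result.

    results holds one string per correction pattern, or None where
    recognition failed. Returns None when nothing was recognized or
    when no single result has the highest count."""
    counts = Counter(r for r in results if r is not None)
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no single most frequent result
    return ranked[0][0]
```

Applied to the example, `vote(["山田太郎", "山田大郎", "山田太郎", None, "山田太郎"])` selects "山田太郎", which three of the five patterns produced.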

(Step S203)
The recognition unit 205 of the server 2 attaches position information to the recognized characters. FIG. 12 is a diagram showing an example of the position information attached by the recognition unit 205 (the broken lines, arrows, and the labels Top, Left, Bottom, and Right in the figure are drawn for explanation and do not exist in the actual image data). FIG. 12(a) shows an example of the image data of a document to be recognized, and FIG. 12(b) shows an example of the position information attached to the characters recognized from FIG. 12(a). As shown in FIG. 12, the recognition unit 205 attaches to each recognized character position information expressed in XY coordinates with the upper left of the document as the zero point. In the example shown in FIG. 12(b), Top is the upper edge of the character, Left is its left edge, Bottom is its lower edge, Right is its right edge, and Word is the recognized character. As described above, in this embodiment the position information is expressed in XY coordinates with the upper left of the document as the zero point, and pixel counts are used as the coordinate values.

In this way, the recognition unit 205 recognizes all the characters included in the image data and attaches to each recognized character position information expressed in XY coordinates with the upper left of the document as the zero point. In the example shown in FIG. 12, the upper edge (Top), left edge (Left), lower edge (Bottom), and right edge (Right) of a character lie slightly away from the actual character. This is because these edges are determined according to the font size of the recognized character. In this embodiment, the position of a character is given by its upper edge (Top), left edge (Left), lower edge (Bottom), and right edge (Right), but it may instead be given by the X and Y coordinates of the character's upper-left and lower-right corners, or of its upper-right and lower-left corners.

(Step S204)
The classification unit 206 refers to the classification DB2 and classifies the image data of the document. Specifically, the classification unit 206 refers to the classification DB2 and determines, for each document type, whether the prepared pattern-matching data or keywords are present in the characters recognized by the recognition unit 205. If the recognized characters contain any one of the prepared pattern-matching data or keywords, the classification unit 206 classifies the image data of the document into the type corresponding to that pattern-matching data or keyword. The classification unit 206 also attaches the classification result (document type information) to the image data of the document.
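The keyword side of this classification can be sketched as a simple lookup. The mapping from document type to keywords stands in for the classification DB2, and its contents, the type names, and the function name `classify` are illustrative assumptions; pattern-matching data is not modeled here.

```python
def classify(recognized_text, keywords_by_type):
    """Return the first document type for which any keyword occurs in
    the recognized text, or None if no type matches."""
    for doc_type, keywords in keywords_by_type.items():
        if any(keyword in recognized_text for keyword in keywords):
            return doc_type
    return None


# Hypothetical stand-in for the classification DB2.
EXAMPLE_KEYWORDS = {
    "driver_license": ["運転免許証"],
    "health_insurance_card": ["健康保険"],
}
```

Because a single keyword is enough to decide the type, the order of the entries matters when keywords from two types appear in the same document; a real system would need a rule for that case.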

(Step S205)
The determination unit 207 refers to the item master DB3 and determines, for each item, whether the item corresponding to the document type assigned by the classification unit 206 is present. If there is an item that the determination unit 207 has not determined to be present (YES), the server 2 proceeds to the process of step S206. If there is no such item (NO), the server 2 proceeds to the process of step S208.

(Step S206)
The search unit 208 searches the characters recognized by the recognition unit 205 for each character making up an item that the determination unit 207 judged not to be present. Here, the search unit 208 takes one of those characters as a starting point and searches whether the other characters making up the item exist within a predetermined range of it.

FIG. 13 is a diagram showing an example of a search by the search unit 208 (the broken lines, arrows, and the labels Top, Left, Bottom, and Right in the figure are drawn for explanation and do not exist in the actual image data). FIG. 13(a) shows an example of a horizontal (X coordinate) search by the search unit 208. As shown in FIG. 13(a), when the characters of the item "氏名" (name) are placed at least a predetermined interval apart in the horizontal direction (X-axis direction), the characters "氏" and "名" are each read on their own, so the item cannot be recognized as "氏名" even though it is present on the document. The search unit 208 therefore takes "氏", one of the characters making up the item "氏名", as a starting point and searches whether "名", the other character making up the item, exists within a predetermined range of it. More specifically, the search unit 208 searches whether the character "名" follows along the X axis within the Y-coordinate range of the character "氏".

Whether characters are arranged side by side in the horizontal direction (X-axis direction) may be judged using, as a reference, the upper end (the position of "Top" in FIG. 13(a)) or the lower end (the position of "Bottom" in FIG. 13(a)) of the characters recognized by the recognition unit 205. Specifically, if the difference between the Y-coordinate values (the number of pixels from the zero point) of the upper ends Top or the lower ends Bottom of characters separated by a predetermined interval in the horizontal direction ("氏" and "名" in the example shown in FIG. 13(a)) is within a predetermined range (for example, ±20 pixels), the characters may be determined to be arranged side by side in the horizontal direction. Characters that make up one item can normally be expected to share the same font and size, which is why a difference within the predetermined range is sufficient to determine that the characters are arranged side by side in the horizontal direction.
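As a non-limiting illustration, the alignment test described above can be sketched in Python. The `CharBox` structure, the function name `is_same_row`, and the sample coordinates are assumptions introduced only for this sketch; the ±20 pixel tolerance is the example value given in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharBox:
    """One recognized character with its pixel bounding box (hypothetical structure)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def is_same_row(a: CharBox, b: CharBox, tolerance: int = 20) -> bool:
    """Return True when the Top or the Bottom Y values differ by at most `tolerance` pixels."""
    return abs(a.top - b.top) <= tolerance or abs(a.bottom - b.bottom) <= tolerance


# Hypothetical coordinates for the characters of the item "氏名".
shi = CharBox("氏", left=100, top=50, right=130, bottom=80)
mei = CharBox("名", left=300, top=55, right=330, bottom=86)
far_mei = CharBox("名", left=300, top=400, right=330, bottom=430)

print(is_same_row(shi, mei))      # True: the tops differ by 5 pixels
print(is_same_row(shi, far_mei))  # False: tops and bottoms both differ by 350 pixels
```

Comparing only the upper or lower edge keeps the test cheap, and it relies on the stated expectation that the characters of one item share a font and size.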

FIG. 13(b) is a diagram showing an example of a vertical (Y-coordinate) search by the search unit 208. As shown in FIG. 13(b), when the characters of the item "記号" (symbol) are arranged in the vertical direction (Y-axis direction), the character "記" and the character "号" are each read on their own, so the item cannot be recognized as "記号" even though it exists on the document. Therefore, the search unit 208 searches whether "号", the other character constituting the item "記号", exists within a predetermined range starting from "記", one of the characters constituting the item. More specifically, the search unit 208 searches whether the character "号" follows along the Y axis within the X-coordinate range of the character "記".

Whether characters are arranged one above the other in the vertical direction (Y-axis direction) may be judged using, as a reference, the left end (the position of "Left" in FIG. 13(b)) or the right end (the position of "Right" in FIG. 13(b)) of the characters recognized by the recognition unit 205. Specifically, if the difference between the X-coordinate values (the number of pixels from the zero point) of the left ends L or the right ends R of characters separated by a predetermined interval in the vertical direction ("記" and "号" in the example shown in FIG. 13(b)) is within a predetermined range (for example, ±20 pixels), the characters may be determined to be arranged in the vertical direction. Characters that make up one item can normally be expected to share the same font and size, which is why a difference within the predetermined range is sufficient to determine that the characters are arranged in the vertical direction.

As described above, the search unit 208 searches, starting from one of the characters, whether the other characters constituting the item exist in the horizontal direction (X-axis direction) and the vertical direction (Y-axis direction). Specifically, it first searches whether the other characters constituting the item follow along the X axis within the Y-coordinate range of the first character of the item. If no other character constituting the item is found there, the search unit 208 then searches whether the other characters follow along the Y axis within the X-coordinate range of the first character.
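The two-stage search can be sketched as follows, purely as an illustration. The `Box` tuple, the function `find_next_char`, the `max_gap` value standing in for the "predetermined range", and the use of the Top and Left edges with a ±20 pixel tolerance are all assumptions of this sketch and are not taken from the embodiment.

```python
from typing import NamedTuple, Optional, Sequence


class Box(NamedTuple):
    """A recognized character and its pixel bounding box (hypothetical structure)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def find_next_char(first: Box, target: str, boxes: Sequence[Box],
                   max_gap: int = 400, tolerance: int = 20) -> Optional[Box]:
    """Look for `target` to the right of `first`; if absent, look below it."""
    # Horizontal pass: same row (Top within tolerance), to the right, within max_gap.
    for b in boxes:
        if (b.text == target and abs(b.top - first.top) <= tolerance
                and 0 <= b.left - first.right <= max_gap):
            return b
    # Vertical pass: same column (Left within tolerance), below, within max_gap.
    for b in boxes:
        if (b.text == target and abs(b.left - first.left) <= tolerance
                and 0 <= b.top - first.bottom <= max_gap):
            return b
    return None


# Hypothetical layout: "氏" and "名" on one row, "記" above "号" in one column.
boxes = [
    Box("氏", 100, 50, 130, 80),
    Box("名", 300, 55, 330, 86),
    Box("記", 40, 100, 70, 130),
    Box("号", 42, 220, 72, 250),
]
found_mei = find_next_char(boxes[0], "名", boxes)  # found by the horizontal pass
found_go = find_next_char(boxes[2], "号", boxes)   # found by the vertical pass
```

Bounding the gap in both passes reflects the stated aim of not picking up an unrelated character located far from the starting character.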

(Step S207)
The joining unit 209 joins, at the data level, the characters on the image data of the document that were found by the search unit 208, so that they can be recognized as one item. FIG. 14 is a diagram showing an example of joining characters in the horizontal direction by the joining unit 209 (the broken lines in the figure are shown for explanation and do not exist in the actual image data). FIG. 14(a) shows an example of the image data of the characters before joining. FIG. 14(b) shows an example of the position information assigned to each of the characters "氏" and "名" recognized by the recognition unit 205. FIG. 14(c) shows an example of the image data of the characters after joining by the joining unit 209. FIG. 14(d) shows an example of the position information assigned to the joined characters "氏名". As shown in FIGS. 14(c) and 14(d), the joining unit 209 uses the position information of the left end (Left) of "氏" as the position information of the left end (Left) of "氏名", and the position information of the right end (Right) of "名" as the position information of the right end (Right) of "氏名", thereby joining the characters "氏" and "名" at the data level so that they can be recognized as the single item "氏名".

FIG. 15 is a diagram showing an example of joining characters in the vertical direction by the joining unit 209 (the broken lines in the figure are shown for explanation and do not exist in the actual image data). FIG. 15(a) shows an example of the image data of the characters before joining. FIG. 15(b) shows an example of the position information assigned to each of the characters "記" and "号" recognized by the recognition unit 205. FIG. 15(c) shows an example of the image data of the characters after joining by the joining unit 209. FIG. 15(d) shows an example of the position information assigned to the joined characters "記号". As shown in FIGS. 15(c) and 15(d), the joining unit 209 uses the position information of the upper end (Top) of "記" as the position information of the upper end (Top) of "記号", and the position information of the lower end (Bottom) of "号" as the position information of the lower end (Bottom) of "記号", thereby joining the characters "記" and "号" at the data level so that they can be recognized as the single item "記号".
In this way, the joining unit 209 joins the characters found by the search unit 208 so that they can be handled as a single piece of information.
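As an illustrative sketch only, the joining of position information may be expressed as taking the outer edges of the two character boxes. The `Box` tuple and `join_boxes` are hypothetical names. The description specifies only the Left and Right edges for the horizontal case and the Top and Bottom edges for the vertical case; taking the minimum and maximum of the remaining edges is an assumption of this sketch.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """A recognized text fragment and its pixel bounding box (hypothetical structure)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def join_boxes(first: Box, second: Box) -> Box:
    """Join two character boxes into one item box covering both."""
    return Box(
        text=first.text + second.text,
        left=min(first.left, second.left),
        top=min(first.top, second.top),
        right=max(first.right, second.right),
        bottom=max(first.bottom, second.bottom),
    )


# Horizontal case: Left comes from "氏", Right comes from "名".
name_item = join_boxes(Box("氏", 100, 50, 130, 80), Box("名", 300, 55, 330, 86))
# Vertical case: Top comes from "記", Bottom comes from "号".
symbol_item = join_boxes(Box("記", 40, 100, 70, 130), Box("号", 42, 220, 72, 250))
```

With these sample coordinates, `name_item` spans Left 100 to Right 330 and `symbol_item` spans Top 100 to Bottom 250, which matches the edge assignments described for FIGS. 14 and 15.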

(Step S208)
The acquisition unit 210 acquires the characters corresponding to each item. Specifically, the acquisition unit 210 acquires, as the characters corresponding to an item, the characters that exist on the first side of the item (the right side in this embodiment) up to the next item or a line break (the characters "山田太郎" (Taro Yamada) in the example shown in FIG. 14, and the characters "201375" in the example shown in FIG. 15). When no characters (other than the characters constituting the item) exist within a predetermined range on the first side of the item (the right side in this embodiment, corresponding to horizontal writing), the acquisition unit 210 acquires, as the characters corresponding to the item, the characters that exist on a second side different from the first side (the lower side in this embodiment, corresponding to vertical writing) up to the next item or a line break.
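The acquisition rule can be illustrated with the following sketch, which is not the implementation of the embodiment. The `Box` tuple, `acquire_value`, the `max_gap` value standing in for the "predetermined range", the row and column tolerance, and all sample coordinates and values are assumptions. A line break is approximated here by requiring candidates to lie on the same row (or in the same column) as the item.

```python
from typing import Iterable, List, NamedTuple, Set


class Box(NamedTuple):
    """A recognized text fragment and its pixel bounding box (hypothetical structure)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def _collect(candidates: Iterable[Box], item_names: Set[str]) -> str:
    """Concatenate fragment texts in order, stopping at the next item label."""
    parts: List[str] = []
    for b in candidates:
        if b.text in item_names:
            break
        parts.append(b.text)
    return "".join(parts)


def acquire_value(item: Box, boxes: Iterable[Box], item_names: Set[str],
                  max_gap: int = 400, tolerance: int = 20) -> str:
    """Take the text to the right of `item`; if there is none, take the text below it."""
    boxes = list(boxes)
    right = sorted((b for b in boxes
                    if abs(b.top - item.top) <= tolerance
                    and 0 <= b.left - item.right <= max_gap),
                   key=lambda b: b.left)
    value = _collect(right, item_names)
    if value:
        return value
    below = sorted((b for b in boxes
                    if abs(b.left - item.left) <= tolerance
                    and 0 <= b.top - item.bottom <= max_gap),
                   key=lambda b: b.top)
    return _collect(below, item_names)


labels = {"氏名", "住所", "記号"}
row = [Box("氏名", 100, 50, 330, 86), Box("山田太郎", 360, 52, 520, 84),
       Box("住所", 560, 50, 620, 84), Box("東京都", 650, 52, 740, 84)]
column = [Box("記号", 40, 100, 72, 250), Box("12345", 42, 270, 72, 420)]

right_value = acquire_value(row[0], row, labels)        # stops at "住所"
below_value = acquire_value(column[0], column, labels)  # nothing to the right, so looks below
```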

The acquisition unit 210 may also decide, according to the type classified by the classification unit 206, whether to acquire the characters on the first side of the item (the right side in this embodiment) or the characters on the second side of the item (the lower side in this embodiment, corresponding to vertical writing) as the characters corresponding to the item. In this case, whether each type of document is written vertically or horizontally is stored in the classification database DB2 in association with the document type, and the acquisition unit 210 refers to the classification database DB2 and makes this decision according to the type classified by the classification unit 206.

As described above, the server 2 according to the embodiment includes: the acquisition unit 210, which acquires information about a target person from two or more information sources; the notation changing unit 211, which changes the notation of the information about the target person acquired by the acquisition unit 210 to a predetermined notation; the integration unit 212, which integrates the information about the target person whose notation has been changed to the predetermined notation by the notation changing unit 211; and the calculation unit 213, which calculates the risk of the target person becoming a monitoring target based on the information integrated by the integration unit 212. The risk of the target person can therefore be calculated from a large amount of information, which improves convenience.

The notation changing unit 211 of the server 2 according to the embodiment refers to information stored in the notation change database DB4 that associates information about the target person with a processing rule for changing the notation of that information (information for logic processing) and, based on the processing rule, changes the notation of the information about the target person acquired by the acquisition unit 210 to the predetermined notation. Information about the target person acquired from various information sources can therefore be integrated accurately.

The notation changing unit 211 of the server 2 according to the embodiment also refers to information stored in the notation change database DB4 that associates information about the target person with the content of overwrite processing using dictionary data (information for first master processing) and, based on that processing content, changes the notation of the information about the target person acquired by the acquisition unit 210 to the predetermined notation. Information about the target person acquired from various information sources can therefore be integrated more accurately.
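A minimal sketch of the two kinds of notation change (rule-based processing followed by dictionary overwriting) is given below for illustration. The tables `RULES` and `DICTIONARY` are hypothetical stand-ins for the content of the notation change database DB4, and the specific rules (Unicode NFKC normalization, removal of hyphens and whitespace) and dictionary entries are assumptions; the description does not specify them.

```python
import unicodedata
from typing import Callable, Dict, List

Rule = Callable[[str], str]

# Hypothetical processing rules per item (stand-in for the "logic processing" information).
RULES: Dict[str, List[Rule]] = {
    "phone": [lambda s: unicodedata.normalize("NFKC", s),  # full-width digits to ASCII
              lambda s: s.replace("-", "")],               # drop hyphens
    "name": [lambda s: unicodedata.normalize("NFKC", s),
             lambda s: "".join(s.split())],                # drop all whitespace
}

# Hypothetical dictionary data per item (stand-in for the "first master processing" information).
DICTIONARY: Dict[str, Dict[str, str]] = {
    "prefecture": {"東京": "東京都", "Tokyo": "東京都"},
}


def change_notation(item: str, value: str) -> str:
    """Apply the item's rules in order, then overwrite from the dictionary if an entry exists."""
    for rule in RULES.get(item, []):
        value = rule(value)
    return DICTIONARY.get(item, {}).get(value, value)


phone = change_notation("phone", "０３-１２３４-５６７８")
name = change_notation("name", "山田　太郎")
prefecture = change_notation("prefecture", "Tokyo")
```

Bringing every source to one notation before integration is what allows values from different sources to be compared as equal strings.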

The integration unit 212 of the server 2 according to the embodiment collates, item by item, the information about the target person whose notation has been changed to the predetermined notation by the notation changing unit 211 and, for an item that appears in more than one source, selects the entry that occurs most often as the information for that item. This improves accuracy when integrating information about the target person acquired from various information sources.
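The per-item selection of the most frequent entry can be sketched as a simple majority vote. The function name `integrate` and the sample records are hypothetical. The description does not state how ties are resolved; in this sketch a tie goes to the value encountered first, which is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable


def integrate(records: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """For each item, keep the value that appears most often across all sources."""
    counts: Dict[str, Counter] = {}
    for record in records:
        for item, value in record.items():
            counts.setdefault(item, Counter())[value] += 1
    # most_common(1) returns the top (value, count) pair for the item.
    return {item: counter.most_common(1)[0][0] for item, counter in counts.items()}


# Hypothetical records from three information sources, already in the predetermined notation.
sources = [
    {"name": "山田太郎", "address": "東京都港区"},
    {"name": "山田太郎", "address": "東京都湊区"},
    {"name": "山田太朗", "address": "東京都港区", "phone": "0312345678"},
]
merged = integrate(sources)
```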

The calculation unit 213 of the server 2 according to the embodiment calculates the risk of the target person becoming a monitoring target based on risk scores that are stored in the risk calculation database DB6 and set for combinations of the items of information about the target person. Because the risk rate can be changed for each item, the calculation of the risk of the target person becoming a monitoring target can be changed flexibly, which improves convenience.
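One possible reading of the combination-based risk score is sketched below. The table `RISK_TABLE` is a hypothetical stand-in for the risk calculation database DB6, the items and scores in it are invented purely for illustration, and summing the scores of all matching combinations is an assumption; the description does not state how multiple matching combinations are aggregated.

```python
from typing import Dict, FrozenSet, Tuple

Combination = FrozenSet[Tuple[str, str]]

# Hypothetical scores keyed by a combination of (item, value) pairs.
RISK_TABLE: Dict[Combination, int] = {
    frozenset({("residence", "overseas")}): 10,
    frozenset({("residence", "overseas"), ("transaction_volume", "high")}): 30,
}


def calculate_risk(person: Dict[str, str],
                   table: Dict[Combination, int] = RISK_TABLE) -> int:
    """Sum the scores of every combination fully contained in the person's information."""
    facts = set(person.items())
    return sum(score for combination, score in table.items() if combination <= facts)


low = calculate_risk({"residence": "overseas"})
high = calculate_risk({"residence": "overseas", "transaction_volume": "high"})
none = calculate_risk({"residence": "domestic"})
```

Keeping the scores in a table rather than in code mirrors the stated benefit that the calculation can be adjusted by editing the stored values.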

The server 2 according to the embodiment also includes: the recognition unit 205, which recognizes characters from an image of a document; the determination unit 207, which refers to information on the items to be acquired from the document and determines whether each item exists among the characters recognized by the recognition unit 205; the search unit 208, which, when there is an item that the determination unit 207 has not determined to exist, searches the characters recognized by the recognition unit 205 for each character constituting that item; the joining unit 209, which processes the characters found by the search unit 208 so that they can be recognized as an item; and the acquisition unit 210, which acquires the characters corresponding to each item as information about the target person. The document can therefore be read effectively, and the character recognition rate for the document improves.

The search unit 208 of the server 2 according to the present embodiment searches whether the other characters constituting the item exist within a predetermined range starting from one of the characters. Because the search is limited to the predetermined range, a character located far away is not mistakenly recognized as a character constituting the item. This reduces the risk of acquiring the wrong information for an item.

The acquisition unit 210 of the server 2 according to the present embodiment acquires the characters existing on the first side of an item as the characters corresponding to that item. This reduces the risk of acquiring the wrong information for an item.

When no characters exist within the predetermined range on the first side of an item, the acquisition unit 210 of the server 2 according to the present embodiment acquires the characters existing on the second side, which differs from the first side, as the characters corresponding to that item. The information corresponding to an item can therefore be acquired more effectively.

The server 2 according to the present embodiment also includes the correction unit 204, which corrects the image of a document using a plurality of correction patterns and generates a plurality of corrected images, one for each correction pattern. The recognition unit 205 recognizes characters from each of the corrected images generated by the correction unit 204 and selects, from among the characters recognized from the plurality of corrected images, the result that occurs most often. This reduces at least one of the probability of misreading a character and the probability of failing to read a character, and improves the accuracy rate of character recognition.

In the present embodiment, each correction pattern contains one or more different corrections. Because one or more different corrections are combined in this way, at least one of the probability of misreading a character and the probability of failing to read a character can be reduced further, and the accuracy rate of character recognition improves further.
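The selection of the most frequent recognition result across correction patterns can be sketched as follows. No real image processing or OCR library is used: images are represented by plain strings, and the corrections and the `fake_ocr` function are stand-ins invented for this sketch. Only the voting step corresponds to the description.

```python
from collections import Counter
from typing import Callable, Sequence

Correction = Callable[[str], str]


def apply_pattern(image: str, pattern: Sequence[Correction]) -> str:
    """Apply every correction of one correction pattern, in order."""
    for correction in pattern:
        image = correction(image)
    return image


def recognize_by_vote(image: str, patterns: Sequence[Sequence[Correction]],
                      ocr: Callable[[str], str]) -> str:
    """Run OCR on each corrected image and return the most frequent result."""
    results = [ocr(apply_pattern(image, pattern)) for pattern in patterns]
    return Counter(results).most_common(1)[0][0]


# Stand-in corrections: each one just tags the "image" string.
contrast = lambda img: img + "+contrast"
sharpen = lambda img: img + "+sharpen"
binarize = lambda img: img + "+binarize"
patterns = [[contrast], [sharpen], [contrast, binarize]]

# Stand-in OCR: a lookup table in which one pattern misreads "太" as "大".
FAKE_RESULTS = {
    "scan+contrast": "山田太郎",
    "scan+sharpen": "山田大郎",
    "scan+contrast+binarize": "山田太郎",
}
fake_ocr = lambda img: FAKE_RESULTS[img]

text = recognize_by_vote("scan", patterns, fake_ocr)
```

In this sample, two of the three patterns agree, so the single misreading is outvoted.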

[Modification 1 of the embodiment]
In the above embodiment, the classification unit 206 refers to the classification database DB2 and determines, for each document type, whether the prepared pattern-matching data or keywords exist in the characters recognized by the recognition unit 205. When the characters recognized by the recognition unit 205 contain any one of the prepared pattern-matching data or keywords, the classification unit 206 classifies the image data of the document into the type corresponding to that pattern-matching data or keyword.

However, the classification unit 206 may instead refer to the classification database DB2, determine for each document type whether the prepared pattern-matching data or keywords exist in the characters recognized by the recognition unit 205, and classify the image data of the document into the type for which the recognized characters contain the largest number of the prepared pattern-matching data and keywords.

In addition, pattern-matching data (image data or feature point data, for example the image data or feature point data of a seal) and keywords (KW) that must not be contained may be stored in the classification database DB2 for each document type. When the characters recognized by the recognition unit 205 contain such prohibited pattern-matching data or keywords, the image data of the document may be excluded from classification into the corresponding type.
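The keyword-count classification with prohibited keywords can be sketched as follows, for illustration only. The table `DOCUMENT_TYPES` is a hypothetical stand-in for the classification database DB2, and the document types and keywords in it are invented. Matching against image or feature point data is omitted; only keyword matching on the recognized text is shown.

```python
from typing import Dict, List, Optional

# Hypothetical content: required keywords and prohibited keywords per document type.
DOCUMENT_TYPES: Dict[str, Dict[str, List[str]]] = {
    "drivers_license": {"keywords": ["運転免許証", "公安委員会"], "forbidden": ["保険者"]},
    "health_insurance_card": {"keywords": ["健康保険", "保険者", "記号"], "forbidden": []},
}


def classify(text: str,
             types: Dict[str, Dict[str, List[str]]] = DOCUMENT_TYPES) -> Optional[str]:
    """Return the type with the most keyword hits, skipping types with a prohibited keyword."""
    best_type: Optional[str] = None
    best_hits = 0
    for name, spec in types.items():
        if any(keyword in text for keyword in spec["forbidden"]):
            continue  # a prohibited keyword rules this type out
        hits = sum(keyword in text for keyword in spec["keywords"])
        if hits > best_hits:
            best_type, best_hits = name, hits
    return best_type


insurance = classify("健康保険被保険者証 記号 201375 保険者番号")
license_ = classify("運転免許証 東京都公安委員会")
unknown = classify("領収書")
```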

[Modification 2 of the embodiment]
In the above embodiment, the item master database DB3 stores the items of information to be acquired in association with each document type. A unified item name may additionally be stored in association with the name of each item of information to be acquired for each document type. Depending on the document type, the name of an item may differ even though its content is the same. For example, an item labeled "氏名" (full name) on one type of document may be labeled "名前" (name) on another. Likewise, an item labeled "住所" (address) on one type of document may be labeled "住まい" (residence) on another.

In such a case, the item master database DB3 may store a unified item name (for example, "住所") in association with the name of each item of information to be acquired for each document type (for example, "住所" or "住まい"), and the unified item information may be attached to the characters corresponding to the item. With this configuration, item names that differ by document type can be managed under one name, which improves the convenience of using the data, for example in searches and name-based record matching.
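The unification of item names can be sketched as a lookup keyed by document type and item label. The table `ITEM_MASTER` is a hypothetical stand-in for the item master database DB3, and the document type names `type_a` and `type_b` are invented; the labels are the examples given in the description.

```python
from typing import Dict, Tuple

# (document type, label on that document) -> unified item name
ITEM_MASTER: Dict[Tuple[str, str], str] = {
    ("type_a", "氏名"): "氏名",
    ("type_b", "名前"): "氏名",
    ("type_a", "住所"): "住所",
    ("type_b", "住まい"): "住所",
}


def unify_items(document_type: str, extracted: Dict[str, str]) -> Dict[str, str]:
    """Rename each extracted item to its unified name; unknown labels are kept as they are."""
    return {ITEM_MASTER.get((document_type, label), label): value
            for label, value in extracted.items()}


unified = unify_items("type_b", {"名前": "山田太郎", "住まい": "東京都港区"})
```

After unification, records from both document types can be searched or matched on the same keys.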

1 Information processing system
2 Server (information processing apparatus)
200A Communication IF
200B Storage device
200C CPU
201 Receiving unit
202 Transmitting unit
203 Storage device control unit
204 Correction unit
205 Recognition unit
206 Classification unit
207 Determination unit
208 Search unit
209 Joining unit
210 Acquisition unit
211 Notation changing unit
212 Integration unit
213 Calculation unit
3 User terminal
300A Communication IF
300B Storage device
300C Input device
300D Display device
300E CPU
301 Receiving unit
302 Transmitting unit
303 Storage device control unit
304 Operation receiving unit
305 Display device control unit
4 Network
DB1 Correction pattern database
DB2 Classification database
DB3 Item master database
DB4 Notation change database
DB5 Target person database
DB6 Risk calculation database

Claims

An information processing apparatus comprising:
an acquisition unit that acquires information about a target person from two or more information sources;
a notation changing unit that changes the notation of the information about the target person acquired by the acquisition unit to a predetermined notation;
an integration unit that integrates the information about the target person whose notation has been changed to the predetermined notation by the notation changing unit; and
a calculation unit that calculates the risk of the target person becoming a monitoring target based on the information about the target person integrated by the integration unit,
wherein the integration unit collates, item by item, the information about the target person whose notation has been changed to the predetermined notation by the notation changing unit and, for an overlapping item, selects the most frequent entry as the information corresponding to that item.

The information processing apparatus according to claim 1,
wherein the notation changing unit refers to information that associates information about the target person with a processing rule for changing the notation of that information and, based on the processing rule, changes the notation of the information about the target person acquired by the acquisition unit to the predetermined notation.

The information processing apparatus according to claim 1 or 2,
wherein the notation changing unit refers to information that associates information about the target person with the content of overwrite processing of that information using dictionary data and, based on the processing content, changes the notation of the information about the target person acquired by the acquisition unit to the predetermined notation.

The information processing apparatus according to any one of claims 1 to 3,
wherein the calculation unit calculates the risk of the target person becoming a monitoring target based on a risk score set for each combination of the information about the target person.

The information processing apparatus according to any one of claims 1 to 4, further comprising:
a recognition unit that recognizes characters from image data;
a determination unit that refers to information on items to be acquired from a document and determines whether each item exists among the characters recognized by the recognition unit; and
a search unit that, when there is an item that the determination unit has not determined to exist, searches the characters recognized by the recognition unit for each character constituting that item,
wherein the acquisition unit acquires, according to the search result of the search unit, the characters corresponding to each item as the information about the target person.

The information processing apparatus according to claim 5,
wherein the search unit searches whether the other characters constituting the item exist within a predetermined range starting from one of the characters.

An information processing method comprising:
a step in which an acquisition unit acquires information about a target person from two or more information sources;
a step in which a notation changing unit changes the notation of the information about the target person acquired by the acquisition unit to a predetermined notation;
a step in which an integration unit integrates the information about the target person acquired by the acquisition unit; and
a step in which a calculation unit calculates the risk of the target person becoming a monitoring target based on the information about the target person integrated by the integration unit,
wherein the integration unit collates, item by item, the information about the target person whose notation has been changed to the predetermined notation by the notation changing unit and, for an overlapping item, selects the most frequent entry as the information corresponding to that item.

An information processing program that causes a computer to function as:
an acquisition unit that acquires information about a target person from two or more information sources;
a notation changing unit that changes the notation of the information about the target person acquired by the acquisition unit to a predetermined notation;
an integration unit that integrates the information about the target person acquired by the acquisition unit; and
a calculation unit that calculates the risk of the target person becoming a monitoring target based on the information about the target person integrated by the integration unit,
wherein the integration unit collates, item by item, the information about the target person whose notation has been changed to the predetermined notation by the notation changing unit and, for an overlapping item, selects the most frequent entry as the information corresponding to that item.