JP2022152134A

JP2022152134A - Image data edition program, image data edition method, and image data edition device

Info

Publication number: JP2022152134A
Application number: JP2021054791A
Authority: JP
Inventors: 義之原田; Yoshiyuki Harada; 成示豆田; Seiji Mameda; 浩二和田; Koji Wada; 貴美子浅野; Kimiko Asano; 靖郡司; Yasushi Gunji; 裕史時津; Yasushi Tokitsu
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2021-03-29
Filing date: 2021-03-29
Publication date: 2022-10-12

Abstract

To provide an image data editing program that accurately masks a character string corresponding to a specific item included in image data. SOLUTION: On acquiring image data ((3) in Fig. 4), a server 10 acquires the character strings included in the image data, refers to a character string pattern DB in which information has been registered in advance ((2) in Fig. 4), and identifies, among the acquired character strings, an item name to be masked (for example, "My Number"). Referring to the character string pattern DB, the server identifies, among the character strings included in the image data, one character string that satisfies the feature of the character string corresponding to the identified item name. Using that string as a reference, the server then identifies character strings that are aligned with it vertically and horizontally and that satisfy the same feature. Based on the result, the server masks the image data ((4) in Fig. 4). SELECTED DRAWING: Figure 4

Description

The present invention relates to an image data editing program, an image data editing method, and an image data editing apparatus.

For example, when a bank or similar organization creates a manual for a system it uses, it may use images obtained by taking screenshots of actual input screens and the like. Because such images contain confidential information such as customer information, that information has conventionally had to be masked by hand.

Techniques for concealing the portion of an image that should be kept confidential are also known, for example from Patent Documents 1 and 2.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2005-18220
Patent Document 2: Japanese Unexamined Patent Application Publication No. 2008-204226

However, masking confidential information by hand is laborious, and some information may be overlooked.

Patent Document 2, for example, can hide a character string located in a predetermined direction (for example, below) from a specific item in a table, but cannot hide a character string corresponding to that item if the string is not located in the predetermined direction from the item.

In one aspect, an object of the present invention is to provide an image data editing program, an image data editing method, and an image data editing apparatus capable of accurately masking a character string corresponding to a specific item included in image data.

In one aspect, an image data editing program causes a computer to execute a process comprising: acquiring character strings included in image data; extracting an item name to be masked from the acquired character strings by referring to a storage unit that stores item names to be masked in association with features of the character strings corresponding to those item names; referring to the storage unit to identify, among the character strings included in the image data, a first character string that satisfies the feature of the character string corresponding to the extracted item name; identifying, using the identified first character string as a reference, a second character string that is aligned with it in the vertical and horizontal directions and that satisfies the feature of the character string corresponding to the extracted item name; and masking the first character string and the second character string in the image data.

A character string corresponding to a specific item included in image data can be masked accurately.

FIG. 1 is a diagram schematically showing the configuration of an information processing system according to an embodiment.
FIG. 2(a) is a diagram showing the hardware configuration of a server, and FIG. 2(b) is a diagram showing the hardware configuration of an administrator terminal and a user terminal.
FIG. 3 is a functional block diagram of the server.
FIG. 4 is a diagram outlining the processing of the administrator terminal, the server, and the user terminal.
FIG. 5(a) is a diagram showing a character string pattern DB, and FIGS. 5(b) and 5(c) are diagrams showing examples of images stored in the check rule field of the character string pattern DB.
FIG. 6 is a flowchart showing the processing of a registration unit.
FIG. 7 is a flowchart showing the masking of image data by the server.
FIG. 8 is a diagram showing an example of captured image data.
FIG. 9 is a diagram showing information on all the character strings included in the image data of FIG. 8.
FIGS. 10(a) and 10(b) are diagrams (part 1) showing lists obtained by the processing of FIG. 7.
FIGS. 11(a) and 11(b) are diagrams (part 2) showing lists obtained by the processing of FIG. 7.
FIGS. 12(a) and 12(b) are diagrams (part 3) showing lists obtained by the processing of FIG. 7.
FIG. 13 is a diagram showing the first to fourth quadrants with respect to the origin of an item name.
FIG. 14 is a diagram for explaining step S42 of FIG. 7.
FIGS. 15(a) to 15(e) are diagrams showing information on mask targets.
FIG. 16 is a diagram showing an example of masked image data.
FIGS. 17(a) and 17(b) are diagrams for explaining another example of a table included in image data.

An embodiment of an information processing system will be described in detail below with reference to FIGS. 1 to 17.

FIG. 1 schematically shows the configuration of an information processing system 100 according to an embodiment. As shown in FIG. 1, the information processing system 100 includes a server 10 serving as an image data editing apparatus, an administrator terminal 60, and a plurality of user terminals 70, each of which is connected to a network 80 such as the Internet.

The server 10 acquires image data transmitted from a user terminal 70 and identifies, in the acquired image data, character strings to be masked (concealed). The server 10 then automatically masks the identified character strings and returns the masked image data to the user terminal 70.

FIG. 2(a) shows the hardware configuration of the server 10. As shown in FIG. 2(a), the server 10 includes a CPU (Central Processing Unit) 90, a ROM (Read Only Memory) 92, a RAM (Random Access Memory) 94, a storage unit 96 (for example, an SSD (Solid State Drive) or an HDD (Hard Disk Drive)), a network interface 97, a portable storage medium drive 99, and the like. These components of the server 10 are connected to a bus 98. In the server 10, the CPU 90 executes a program (including the image data editing program) stored in the ROM 92 or the HDD 96, or a program (including the image data editing program) read from a portable storage medium 91 by the portable storage medium drive 99, thereby implementing the functions of the units shown in FIG. 3. FIG. 3 also shows an image data storage unit 42 and a character string pattern DB 40 serving as the storage unit, both held in the storage unit 96 or the like of the server 10. The functions of the units in FIG. 3 may instead be implemented by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). The units in FIG. 3 are described in detail later.

The administrator terminal 60 is a terminal such as a PC (Personal Computer) used by an administrator and has the hardware configuration shown in FIG. 2(b). Specifically, as shown in FIG. 2(b), the administrator terminal 60 includes a CPU 190, a ROM 192, a RAM 194, a storage unit 196, a network interface 197, a display unit 193, an input unit 195, a portable storage medium drive 199, and the like. These components of the administrator terminal 60 are connected to a bus 198. The portable storage medium drive 199 reads programs and data stored in a portable storage medium 191. The display unit 193 includes a liquid crystal display or the like, and the input unit 195 includes a keyboard, a mouse, a touch panel, or the like.

The administrator using the administrator terminal 60 enters, via the input unit 195, information on the character strings to be masked in image data. On acquiring the information entered by the administrator, the server 10 manages it and uses it for mask processing.

The user terminal 70 is a terminal such as a PC (Personal Computer) used by a user (for example, a user who creates a manual for a system used in a bank) and, like the administrator terminal 60, has the hardware configuration shown in FIG. 2(b).

The user of a user terminal 70 enters, via the input unit 195, an instruction to capture the screen displayed on the display unit 193 (for example, by pressing the "Print Screen" key on the keyboard). When the instruction is entered, the user terminal 70 captures the screen displayed on the display unit 193 and transmits the generated image data to the server 10. When the server 10 sends back the masked image data, the user terminal 70 acquires it.

FIG. 4 outlines the processing of the administrator terminal 60, the server 10, and the user terminal 70 in this embodiment. When the administrator enters information on a character string to be masked into the administrator terminal 60, the administrator terminal 60 (1) transmits that information to the server 10. On receiving it, the server 10 (2) registers the received information on the character string to be masked. Steps (1) and (2) are executed each time the administrator enters such information. Meanwhile, when the user enters a screen capture instruction at a user terminal 70, the user terminal 70 (3) captures the screen and transmits the captured image data to the server 10. On receiving the image data, the server 10 (4) masks it and then (5) transmits the masked image data to the user terminal 70. For example, when image data such as that shown in FIG. 8 is captured at the user terminal 70, the server 10 masks it as shown in FIG. 16 and transmits the masked image data to the user terminal 70.

Next, the functions of the server 10 will be described with reference to FIG. 3. When the CPU 90 executes a program, the server 10 functions as a registration unit 20, an image data receiving unit 24, an OCR processing unit 26 serving as a character string acquisition unit, a confidential information analysis unit 28, a mask processing unit 30, and a mask image transmission unit 32.

The registration unit 20 acquires the information on the character strings to be masked that the administrator entered at the administrator terminal 60 and stores it in the character string pattern DB 40. The character string pattern DB 40 holds the information needed to identify the character strings to be masked in image data. Specifically, as shown in FIG. 5(a), the character string pattern DB 40 has the fields "item name", "attribute", "value", and "check rule". The "item name" field stores the item name of a character string to be masked (for example, "My Number" or "gender"). The "attribute" and "value" fields store the features of the character string corresponding to the item name. For example, for the item name "My Number", "%n12" is stored as the attribute; "%n12" denotes a 12-digit number. For the item name "gender", the specific character strings "男" and "男性" (both "male") and "女" and "女性" (both "female") are stored as values.
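
To make the roles of the "attribute" and "value" fields concrete, the following is a minimal Python sketch of one row of a pattern DB like FIG. 5(a) and a feature test against it. The class name `PatternEntry`, the function names, and the assumption that "%n<k>" means "exactly k digits" (as the "%n12" example suggests) are illustrative, not taken from the publication; the "check rule" field is only carried along here and would be applied as a separate step.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PatternEntry:
    """One row of a pattern DB like FIG. 5(a); field names are hypothetical."""
    item_name: str
    attribute: Optional[str] = None   # e.g. "%n12" = a 12-digit number (assumed syntax)
    values: Tuple[str, ...] = ()      # e.g. ("男", "女", "男性", "女性")
    check_rule: Optional[str] = None  # name of a further check, applied separately


def attribute_regex(attribute: str) -> "re.Pattern[str]":
    """Translate a '%n<k>' attribute into a regex matching exactly k digits."""
    m = re.fullmatch(r"%n(\d+)", attribute)
    if m is None:
        raise ValueError(f"unsupported attribute: {attribute!r}")
    # Note: for str patterns, \d also matches full-width digits such as "１".
    return re.compile(rf"\d{{{int(m.group(1))}}}")


def satisfies_feature(entry: PatternEntry, text: str) -> bool:
    """True if text equals one of the listed values or fully matches the attribute."""
    if entry.values and text in entry.values:
        return True
    if entry.attribute:
        return attribute_regex(entry.attribute).fullmatch(text) is not None
    return False


# Hypothetical rows mirroring the two examples in FIG. 5(a).
MY_NUMBER = PatternEntry("My Number", attribute="%n12", check_rule="my_number_check")
GENDER = PatternEntry("gender", values=("男", "女", "男性", "女性"))
```

With these rows, `satisfies_feature(MY_NUMBER, "123456789012")` is true and `satisfies_feature(GENDER, "女性")` is true, while short numbers or unrelated words are rejected.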

The "check rule" field stores a detailed check rule when there is a condition that cannot be expressed by the attribute or value. For example, for the item name "My Number", a "My Number check" that verifies whether a 12-digit number is actually a My Number is stored. For the My Number check, the method defined by the Ministry of Internal Affairs and Communications, in which the 12th digit of the string serves as a check digit, can be used (Article 5 of the Ministerial Ordinance on Notification Cards and Individual Number Cards and on the Provision of Specific Personal Information through the Information Provision Network System under the Act on the Use of Numbers to Identify a Specific Individual in Administrative Procedures, Ministry of Internal Affairs and Communications Ordinance No. 85 of 2014). Likewise, for the item name "driver's license number", a "license number check" is stored in the "check rule" field. For this check, the method defined by the National Police Agency, in which the 11th digit of the string serves as a check digit, can be used (see National Police Agency notice 丁運発第105号 of September 10, 1981, "On the format and content of driver's license numbers").
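
As an illustration of a "My Number check", here is a minimal Python sketch of the check-digit formula for the Individual Number as we understand it from the ordinance cited above: with P_n the n-th digit counted from the right among the first 11 digits and weights Q_n = n + 1 (n ≤ 6) or n - 5 (n ≥ 7), the check digit is 11 - (Σ P_n·Q_n mod 11), or 0 when that remainder is 0 or 1. Verify the weights against the ordinance before relying on this; the function name is hypothetical, and NFKC normalization is added on the assumption that OCR may return full-width digits. The driver's license check is not sketched.

```python
import unicodedata


def my_number_check(text: str) -> bool:
    """Return True if text is a 12-digit string whose last digit is a valid check digit."""
    # Map full-width digits (e.g. "１２３") to ASCII before validating.
    text = unicodedata.normalize("NFKC", text)
    if len(text) != 12 or not text.isascii() or not text.isdigit():
        return False
    body = [int(c) for c in text[:11]]
    total = 0
    for n in range(1, 12):               # n = 1..11, counted from the right
        p = body[11 - n]                 # P_n
        q = n + 1 if n <= 6 else n - 5   # Q_n
        total += p * q
    remainder = total % 11
    expected = 0 if remainder <= 1 else 11 - remainder
    return int(text[11]) == expected
```

For example, `123456789018` passes this check (the weighted sum is 212, remainder 3, check digit 8), whereas `123456789012` does not.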

As described above, in this embodiment the character string pattern DB 40 of FIG. 5(a) stores each item name to be masked in association with the features of the character string corresponding to that item name.

As exception handling, when a certain area is to be added to the mask area or excluded from it, an image for identifying that area is stored in the "check rule" field. Specific examples of such images are described later.

The image data receiving unit 24 acquires image data captured at a user terminal 70 and stores it in the image data storage unit 42, which temporarily holds the image data to be masked by the server 10.

The OCR processing unit 26 acquires the image data stored in the image data storage unit 42 and obtains the character strings included in it by OCR (Optical Character Recognition) processing.

Referring to the character string pattern DB 40 (FIG. 5(a)), the confidential information analysis unit 28 identifies, among the character strings acquired by the OCR processing unit 26, the character strings that match an item name stored in the character string pattern DB 40. Using each identified character string as a reference, the confidential information analysis unit 28 then extracts the character strings having the features corresponding to that item name (character strings that should, or preferably should, be masked).

The mask processing unit 30 reads the image data from the image data storage unit 42 and masks the character strings to be masked that were extracted by the confidential information analysis unit 28.
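
The publication does not say how a mask is drawn. One plausible reading is a solid fill over each target string's bounding rectangle as reported by OCR; the dependency-free Python sketch below does exactly that on a row-major grayscale pixel grid. The function name, the `(x, y, width, height)` box layout, and the fill value are assumptions for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); origin at the image's upper-left


def mask_regions(pixels: List[List[int]], boxes: List[Box], fill: int = 0) -> List[List[int]]:
    """Return a copy of a row-major grayscale image with each box filled with `fill`."""
    out = [row[:] for row in pixels]        # leave the input image untouched
    height = len(out)
    width = len(out[0]) if height else 0
    for x, y, w, h in boxes:
        # Clip each rectangle to the image bounds before filling it.
        for row in range(max(0, y), min(height, y + h)):
            for col in range(max(0, x), min(width, x + w)):
                out[row][col] = fill
    return out
```

With a real image library the same idea is a single call per box, for example Pillow's `ImageDraw.Draw(image).rectangle((x, y, x + w - 1, y + h - 1), fill=...)`.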

The mask image transmission unit 32 transmits the image data masked by the mask processing unit 30 to the user terminal 70.

(Processing of the server 10)
Next, the processing of the server 10 will be described in detail following the flowcharts of FIGS. 6 and 7, with reference to other drawings as appropriate.

(Processing of the registration unit 20)
FIG. 6 is a flowchart showing the processing of the registration unit 20. The processing of FIG. 6 corresponds to steps (1) and (2) of FIG. 4.

When the processing of FIG. 6 starts, the registration unit 20 first waits, in step S10, until it receives an item name to be masked together with its character string pattern information (attribute, value, check rule). When the administrator enters this information at the administrator terminal 60, the process proceeds to step S12.

In step S12, the registration unit 20 stores the item name in the character string pattern DB 40 in association with the character string pattern. For example, when the administrator enters character string pattern information (feature information) for the item names "My Number" and "gender", the information is stored as shown in FIG. 5(a). The registration unit 20 then returns to step S10 and repeats the above processing.

(Masking of image data)
FIG. 7 is a flowchart showing the masking of image data by the server 10. The processing of FIG. 7 corresponds to steps (3) to (5) of FIG. 4 and starts when captured image data is transmitted from a user terminal 70 (that is, when the image data receiving unit 24 acquires the image data). Here, it is assumed that the image data receiving unit 24 has acquired image data such as that shown in FIG. 8 and stored it in the image data storage unit 42. FIG. 8 shows a coordinate system whose origin is the upper-left corner of the image data; the actual image data does not include this coordinate system.

When the processing of FIG. 7 starts, first, in step S30, the OCR processing unit 26 performs OCR on the image data and acquires each character string together with its position and size. Taking the upper-left corner of a character string as its origin, the position and size information consists of the horizontal coordinate of the origin, the vertical coordinate of the origin, and the width and height of the rectangle obtained by regarding the whole string as a rectangle. FIG. 9 shows the information on all the character strings included in the image data of FIG. 8. For example, for the string "My Number", the information is: string = "My Number", horizontal coordinate of the origin = 150, vertical coordinate of the origin = 50, rectangle width = 75, and rectangle height = 25. The information acquired as in FIG. 9 is compiled into a list as shown in FIG. 10(a) and sent to the confidential information analysis unit 28.

Next, in step S32, the confidential information analysis unit 28 refers to the character string pattern DB 40 (FIG. 5(a)) and identifies the character strings that match an item name stored there. Here, the strings "My Number" and "gender" are identified from the list of FIG. 10(a), as shown in FIG. 10(b).
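
Steps S30 and S32 can be pictured with the Python sketch below: a record holding each OCR string with its rectangle as in FIG. 9, and a filter that keeps the strings matching a registered item name. Exact string equality is an assumption (the publication does not say how tolerant the match is), and the class and function names are hypothetical. Only the "My Number" coordinates (150, 50, 75, 25) come from the text; the other records in the usage example are made up.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class OcrString:
    """One OCR result as in FIG. 9: the text plus its bounding rectangle."""
    text: str
    x: int       # horizontal coordinate of the string's upper-left corner
    y: int       # vertical coordinate; grows downward from the image's upper-left
    width: int
    height: int


def find_item_names(strings: Iterable[OcrString], item_names: Iterable[str]) -> List[OcrString]:
    """Step S32 sketch: keep OCR strings that exactly equal a registered item name."""
    names = set(item_names)
    return [s for s in strings if s.text in names]


ocr_list = [
    OcrString("My Number", 150, 50, 75, 25),       # values from the description
    OcrString("123456789018", 150, 100, 120, 25),  # hypothetical
    OcrString("gender", 400, 50, 50, 25),          # hypothetical
]
```

Here `find_item_names(ocr_list, ["My Number", "gender"])` returns the first and third records, corresponding to the list of FIG. 10(b).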

Next, in step S34, the confidential information analysis unit 28 refers to the character string pattern DB 40 and extracts the character strings having the features corresponding to the identified item names.

Here, the confidential information analysis unit 28 first processes the item name "My Number". It identifies, in the list of FIG. 10(a), the character strings that are 12-digit numbers and pass the My Number check. Suppose, for example, that the seven strings (A) to (G) shown in FIG. 11(a) are identified as 12-digit numbers, and that of these seven, the strings that pass the My Number check are (A) "123456789012", (C) "234567890123", and (G) "345678901234".

Even when a character string passes the My Number check, the confidential information analysis unit 28 excludes it if it lies in an area excluded from masking. In the example of FIG. 5(a), the excluded area is specified by "employee number column image.PNG", an image such as that shown in FIG. 5(b). In this case, the confidential information analysis unit 28 does not mask 12-digit strings located in the column below the item name "employee number". In the example of FIG. 8, the strings "999999999999" and "8888888888888" are in that area, so they are excluded even if they pass the My Number check. As a result, the list shown in FIG. 11(b) is obtained as the list of character strings having the features corresponding to the item name "My Number".
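
The exclusion step can be sketched in Python once the excluded header (here, the "employee number" cell matched by the reference image of FIG. 5(b)) has been located in the capture, for example by multi-scale template matching, which is not shown. The sketch assumes that "the column below" a header means boxes that overlap the header horizontally and start below its bottom edge; that criterion, the function names, and all coordinates in the usage example are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); y grows downward


def in_column_below(box: Box, header: Box) -> bool:
    """True if box overlaps header horizontally and starts at or below its bottom edge."""
    x, y, w, h = box
    hx, hy, hw, hh = header
    overlaps_horizontally = x < hx + hw and hx < x + w
    return overlaps_horizontally and y >= hy + hh


def drop_excluded(candidates: List[Tuple[str, Box]],
                  excluded_headers: List[Box]) -> List[Tuple[str, Box]]:
    """Remove candidates lying in the column below any excluded header."""
    return [
        (text, box) for text, box in candidates
        if not any(in_column_below(box, hdr) for hdr in excluded_headers)
    ]


employee_header = (500, 50, 100, 25)               # hypothetical located header cell
candidates = [
    ("123456789018", (150, 100, 120, 25)),         # outside the excluded column
    ("999999999999", (500, 100, 120, 25)),         # under "employee number"
]
```

Calling `drop_excluded(candidates, [employee_header])` keeps only the first candidate, mirroring how the employee-number values drop out between FIG. 11(a) and FIG. 11(b).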

In this embodiment, as described above, an image such as "employee number column image.PNG" (FIG. 5(b)) is used to define the area excluded from masking. As a result, the portion to be excluded from the mask area can be identified accurately regardless of the size (magnification) at which the user captures the table.

The confidential information analysis unit 28 also processes the item name "gender". In this case, it identifies, in the list of FIG. 10(a), the character strings that are one of "男", "女", "男性", or "女性" (see (H) and (I) in FIG. 12(a)). As a result, the list shown in FIG. 12(b) is obtained as the list of character strings having the features corresponding to the item name "gender".

Next, in step S36, the confidential information analysis unit 28 determines whether any character string having the features corresponding to an item name has been extracted. If the determination in step S36 is negative, that is, if no character string was identified in step S34, the process proceeds to step S50, where the confidential information analysis unit 28 notifies the user terminal 70 that there was nothing to mask. From this notification, the user can confirm that the captured image contains no information to be concealed and can therefore use it to create a manual or the like with confidence. The entire processing of FIG. 7 then ends.

If, on the other hand, the determination in step S36 is affirmative, that is, if character strings were identified in step S34 as shown in FIGS. 11(b) and 12(b), the process proceeds to step S38, where the confidential information analysis unit 28 selects one item name corresponding to the character strings extracted in step S34. Here, the item name "My Number" is assumed to be selected.

Next, in step S40, the confidential information analysis unit 28 identifies, among the extracted character strings lying outside the area to the upper left of the selected item name, the one closest to the position of the selected item name. To do so, it refers to the coordinates of the item name "My Number" in FIG. 10(b) and the coordinates of the character strings in the list of FIG. 11(b). As shown in FIG. 13, it then identifies, among the strings located in the first, third, and fourth quadrants as seen from the origin of the item name "My Number" (that is, excluding the second quadrant), the character string (first character string) whose origin is closest to that of the item name. The second quadrant is excluded because a character string corresponding to the selected item name "My Number" is unlikely to be located there, and excluding it simplifies the processing. Here, the character string "123456789012" is assumed to be identified from the list of FIG. 11(b).
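
Step S40 is sketched in Python below. Because the image origin is at the upper-left corner and y grows downward, the second quadrant of FIG. 13 (upper left of the item name) corresponds to x < ix and y < iy. Two details are assumptions rather than statements of the publication: candidates lying exactly on an axis are kept, and "closest" is measured by Euclidean distance between rectangle origins. The function name and the coordinates in the usage example are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[int, int]  # (x, y) origin of a string's rectangle; y grows downward


def nearest_outside_upper_left(item_origin: Point,
                               candidates: List[Tuple[str, Point]]) -> Optional[Tuple[str, Point]]:
    """Nearest candidate whose origin is not strictly up-and-left of the item name."""
    ix, iy = item_origin
    allowed = [
        (text, (x, y)) for text, (x, y) in candidates
        if not (x < ix and y < iy)   # skip the second quadrant (upper left in image coords)
    ]
    if not allowed:
        return None
    return min(allowed, key=lambda c: math.hypot(c[1][0] - ix, c[1][1] - iy))


item = (150, 50)                       # origin of "My Number" from FIG. 9
cands = [("A", (140, 45)),             # very close, but upper left: skipped
         ("B", (300, 50)),             # distance 150
         ("C", (150, 150))]            # distance 100
```

In this example the call returns candidate "C": "A" is nearer but lies in the excluded second quadrant.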

Next, in step S42, the confidential information analysis unit 28 identifies, among the extracted character strings (FIG. 11(b)), the character strings (second character strings) aligned vertically or horizontally with the identified character string (first character string). Specifically, as shown in FIG. 14, it searches vertically from the string "123456789012" and identifies the string "234567890123". It also searches horizontally from "123456789012" and from "234567890123" (see the dashed arrows in FIG. 14), but finds no new character string in this case. Accordingly, in this embodiment, the confidential information analysis unit 28 generates mask target area information (information indicating the character strings to be masked) as shown in FIGS. 15(a) and 15(b) and sends it to the mask processing unit 30. If, for example, a horizontal search from "123456789012" or "234567890123" had found a new character string, the confidential information analysis unit 28 would then search vertically and horizontally starting from that newly identified string.
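
The search in step S42, including restarting from every newly found string, behaves like a breadth-first search over the extracted candidates. The Python sketch below makes that concrete. The alignment test it uses is an assumption, since the publication does not define "aligned": two strings share a column when their x-ranges overlap and share a row when their y-ranges overlap. Candidates are referenced by index so that duplicate texts cannot collide. Names and coordinates are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def _overlap(a0: int, a_len: int, b0: int, b_len: int) -> bool:
    """True if the half-open ranges [a0, a0+a_len) and [b0, b0+b_len) intersect."""
    return a0 < b0 + b_len and b0 < a0 + a_len


def aligned_strings(first: int, candidates: List[Tuple[str, Box]]) -> List[int]:
    """Indices reachable from `first` via shared columns or rows, in discovery order."""
    found = {first}
    order = [first]
    queue = deque([first])
    while queue:
        x, y, w, h = candidates[queue.popleft()][1]
        for j, (_, (ox, oy, ow, oh)) in enumerate(candidates):
            if j in found:
                continue
            # Same column (x-ranges overlap) or same row (y-ranges overlap).
            if _overlap(x, w, ox, ow) or _overlap(y, h, oy, oh):
                found.add(j)
                order.append(j)
                queue.append(j)   # a newly found string becomes a new starting point
    return order


cands = [("123456789012", (150, 100, 120, 25)),
         ("234567890123", (150, 150, 120, 25)),
         ("345678901234", (600, 400, 120, 25))]
```

Starting from index 0, the search finds index 1 in the same column and never reaches index 2, so `aligned_strings(0, cands)` is `[0, 1]`, matching the two targets of FIGS. 15(a) and 15(b). A string added in the same row as index 1 would be found through it, illustrating the restart rule.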

Next, in step S44, the secret information analysis unit 28 executes mask area addition processing to identify any additional character strings to be masked. The secret information analysis unit 28 refers to the character string pattern DB 40 of FIG. 5(a) and determines whether an "add mask area" check rule exists for the selected item name. If it does, the secret information analysis unit 28 refers to the image associated with that rule (here, "備考欄画像.PNG", a remarks-column image), which is assumed to be the image shown in FIG. 5(c). The secret information analysis unit 28 then determines whether a portion identical to "備考欄画像.PNG" exists in the image data of FIG. 8. In FIG. 8 such a portion exists, so a 12-digit number located in the column below that portion and satisfying the My Number check condition becomes a mask target. In the example of FIG. 8, "345678901234" becomes a mask target area. The secret information analysis unit 28 therefore generates mask target area information as shown in FIG. 15(c) and transmits it to the mask processing unit 30.
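The column selection in step S44 might look like the following, assuming the position of the reference image (`備考欄画像.PNG`) has already been found and is given as an `(x, y, w, h)` region. "Below the region" and "same column" are interpreted here as simple bounding-box tests; the helper names, the 12-digit feature, and the coordinates are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class TextBox:
    """OCR result with a top-left origin (x, y); y grows downward (assumption)."""
    text: str
    x: int
    y: int
    w: int
    h: int


Region = Tuple[int, int, int, int]  # (x, y, w, h) of the matched reference image


def strings_below(region: Region, boxes: List[TextBox],
                  feature: Callable[[str], bool]) -> List[TextBox]:
    """Strings in the column under `region` that satisfy `feature`."""
    rx, ry, rw, rh = region
    picked = []
    for b in boxes:
        below = b.y >= ry + rh                          # starts under the matched header
        same_column = b.x < rx + rw and rx < b.x + b.w  # horizontal extents overlap
        if below and same_column and feature(b.text):
            picked.append(b)
    return picked


def is_my_number(text: str) -> bool:
    return re.fullmatch(r"\d{12}", text) is not None  # hypothetical feature


remarks_header = (400, 50, 120, 20)  # where the reference image was found (assumed)
boxes = [
    TextBox("123456789012", 100, 80, 90, 20),  # another column: ignored
    TextBox("345678901234", 410, 80, 90, 20),  # under the remarks header: masked
    TextBox("要確認", 410, 110, 50, 20),        # under the header but not 12 digits
]
print([b.text for b in strings_below(remarks_header, boxes, is_my_number)])  # ['345678901234']
```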

In this embodiment, as described above, an image such as "備考欄画像.PNG" shown in FIG. 5(c) is used to define the area to be added to the mask area. This allows the mask area to be identified accurately regardless of the size (magnification) at which the user captured the table.
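One way to look for the reference image at different capture sizes is to try it at several magnifications. The sketch below is a deliberate simplification: it enlarges the template by integer factors with nearest-neighbour scaling and requires an exact pixel match on a grayscale image stored as a list of rows. The source does not say how the matching is done; real screenshots would need a tolerant measure such as normalized cross-correlation (for example OpenCV's `cv2.matchTemplate`) evaluated over a range of scales.

```python
from typing import List, Optional, Sequence, Tuple

Grid = List[List[int]]


def upscale(tpl: Grid, k: int) -> Grid:
    """Nearest-neighbour enlargement: every pixel becomes a k x k block."""
    return [[v for v in row for _ in range(k)] for row in tpl for _ in range(k)]


def find_exact(image: Grid, tpl: Grid) -> Optional[Tuple[int, int, int, int]]:
    """Row-major exhaustive search for an exact copy of `tpl` in `image`."""
    h, w = len(tpl), len(tpl[0])
    for y in range(len(image) - h + 1):
        for x in range(len(image[0]) - w + 1):
            if all(image[y + i][x:x + w] == tpl[i] for i in range(h)):
                return (x, y, w, h)
    return None


def find_multiscale(image: Grid, tpl: Grid,
                    scales: Sequence[int] = (1, 2, 3)) -> Optional[Tuple[int, int, int, int]]:
    """Try the reference image at several magnifications; return (x, y, w, h)."""
    for k in scales:
        scaled = upscale(tpl, k)
        if len(scaled) <= len(image) and len(scaled[0]) <= len(image[0]):
            hit = find_exact(image, scaled)
            if hit is not None:
                return hit
    return None


# A 3x3 stand-in for the reference image, pasted into a blank screen at 2x.
tpl = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
screen = [[0] * 10 for _ in range(10)]
for i, row in enumerate(upscale(tpl, 2)):
    screen[2 + i][3:3 + len(row)] = row
print(find_multiscale(screen, tpl))  # (3, 2, 6, 6)
```

The returned region can then serve as the "matched header" whose column is searched for strings to add to the mask.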

Next, in step S46, the secret information analysis unit 28 determines whether all item names have been selected. If not, the secret information analysis unit 28 returns to step S38.

Back in step S38, the secret information analysis unit 28 selects one item name corresponding to the character strings extracted in step S34. Here, it is assumed that the item name "Gender" is selected.

Next, in step S40, the secret information analysis unit 28 identifies, from among the extracted character strings (FIG. 12(b)), the one closest to the position of the selected item name "Gender", excluding the area to the upper left of that item name's origin. In this case, the character string "Male" located in the upper part of FIG. 8 is identified from the list in FIG. 12(b).

Next, in step S42, the secret information analysis unit 28 uses the character string identified in step S40 as a reference and identifies, from among the extracted character strings (FIG. 12(b)), character strings aligned with it vertically or horizontally. As a result, the character string "Male" located in the lower part of FIG. 8 is identified. In the present embodiment, the secret information analysis unit 28 generates mask target area information as shown in FIGS. 15(d) and 15(e) and transmits it to the mask processing unit 30.

Next, in step S44, the secret information analysis unit 28 would execute the mask area addition processing to identify additional character strings. However, as shown in FIG. 5(a), no "add mask area" check rule exists for the selected item name "Gender", so this processing is skipped and the process proceeds to step S46.
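The rule lookup that decides whether step S44 runs can be modelled with a small in-memory table standing in for the character string pattern DB 40. The `PatternRule` fields, the 12-digit feature for "マイナンバー" (My Number), and the {男性, 女性} feature for "性別" (Gender) are assumptions for illustration; FIG. 5(a) itself is not reproduced here.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class PatternRule:
    """One row of a hypothetical character string pattern DB."""
    item_name: str
    feature: Callable[[str], bool]        # feature of the strings to be masked
    add_mask_image: Optional[str] = None  # "add mask area" check rule, if any


PATTERN_DB: Dict[str, PatternRule] = {
    "マイナンバー": PatternRule(
        "マイナンバー",
        lambda s: re.fullmatch(r"\d{12}", s) is not None,  # assumed feature
        add_mask_image="備考欄画像.PNG",
    ),
    "性別": PatternRule("性別", lambda s: s in {"男性", "女性"}),  # assumed feature
}


def extract_item_names(ocr_texts: Iterable[str],
                       db: Dict[str, PatternRule] = PATTERN_DB) -> List[str]:
    """Keep OCR strings that are registered item names, in first-occurrence order."""
    return list(dict.fromkeys(t for t in ocr_texts if t in db))


texts = ["氏名", "マイナンバー", "123456789012", "性別", "男性", "男性"]
for name in extract_item_names(texts):
    rule = PATTERN_DB[name]
    # Mask area addition applies only when an "add mask area" rule is registered.
    print(name, "add mask area:", rule.add_mask_image)
```

For "性別" the printed rule is `None`, mirroring the skip of step S44 described above.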

In step S46, the secret information analysis unit 28 determines whether all item names have been selected. If the determination is affirmative, the secret information analysis unit 28 proceeds to step S48.

In step S48, the mask processing unit 30 masks the positions of the identified character strings in the image data. Specifically, based on the mask target area information shown in FIGS. 15(a) to 15(e) received from the secret information analysis unit 28, the mask processing unit 30 masks the image data of FIG. 8 as shown in FIG. 16. The masked image transmission unit 32 then transmits the masked image data (FIG. 16) to the user terminal 70. This eliminates the need for the user to mask the image data manually. Moreover, because the image data is masked accurately in this embodiment, leakage of information that should be kept confidential can be prevented. This completes the processing of FIG. 7.
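The black-rectangle masking of step S48 reduces to painting each mask target area. The sketch below uses a grayscale image held as a list of rows and `(x, y, w, h)` boxes, both hypothetical representations; a real implementation would operate on an image library's pixel buffer.

```python
from typing import List, Sequence, Tuple

Grid = List[List[int]]
Box = Tuple[int, int, int, int]  # (x, y, w, h) of one mask target area


def apply_masks(image: Grid, boxes: Sequence[Box], fill: int = 0) -> Grid:
    """Return a copy of `image` with every box painted `fill` (0 = black)."""
    out = [row[:] for row in image]
    height, width = len(out), len(out[0])
    for x, y, w, h in boxes:
        # Clip each rectangle to the image so partially visible boxes are still masked.
        for yy in range(max(0, y), min(height, y + h)):
            for xx in range(max(0, x), min(width, x + w)):
                out[yy][xx] = fill
    return out


screen = [[255] * 5 for _ in range(4)]  # tiny white "screenshot"
masked = apply_masks(screen, [(1, 1, 2, 2), (4, 3, 10, 10)])
for row in masked:
    print(row)
```

Returning a copy keeps the original capture intact, which is convenient if the unmasked image must never leave the server.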

In this embodiment, the character strings to be masked can be identified accurately even when a table such as that shown in FIG. 17(a) (a table in which the item names of the third and subsequent columns are omitted) is captured. In this case, the item name "My Number" is identified (see (1)), and the closest character string (first character string) among the character strings having the feature corresponding to that item name is identified (see (2)). Then, by searching vertically and horizontally from the string identified in (2), character strings with the same feature (second character strings) are identified (see (3) to (5)). In this processing, the search first proceeds vertically from the string in (2). Once the vertical search is complete, horizontal searches are performed from the string in (2) and from the string in (3) found by the vertical search. The strings in (4) and (5), newly found by the horizontal search, are in turn searched both vertically and horizontally. If a string that has already been found is found again, no further search is started from it.
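The search order described above behaves like a breadth-first traversal with a "found" set. The sketch below is one hedged reading of it: each newly found string is searched vertically and then horizontally, and a string already in the set is never expanded again. Its visiting order differs slightly from the text, but because every found string is eventually searched in both directions, it should reach the same set of strings. The alignment tests, the 12-digit feature, and the 2 x 2 layout are invented and only loosely resemble FIG. 17(a).

```python
import re
from collections import deque
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class TextBox:
    """OCR result with a top-left origin (x, y); y grows downward (assumption)."""
    text: str
    x: int
    y: int
    w: int
    h: int


def _overlap(a0, a1, b0, b1):
    return a0 < b1 and b0 < a1


def same_column(a, b):  # vertical neighbours: horizontal extents overlap
    return _overlap(a.x, a.x + a.w, b.x, b.x + b.w)


def same_row(a, b):  # horizontal neighbours: vertical extents overlap
    return _overlap(a.y, a.y + a.h, b.y, b.y + b.h)


def collect_targets(first: TextBox, boxes: List[TextBox],
                    feature: Callable[[str], bool]) -> List[TextBox]:
    """Grow the mask set from `first`; already-found strings are not re-searched."""
    matching = [b for b in boxes if feature(b.text)]
    found, order = {first}, [first]
    queue = deque([first])
    while queue:
        current = queue.popleft()
        for aligned in (same_column, same_row):  # vertical first, then horizontal
            for b in matching:
                if b not in found and aligned(current, b):
                    found.add(b)
                    order.append(b)
                    queue.append(b)  # newly found strings are searched in turn
    return order


def is_my_number(text):
    return re.fullmatch(r"\d{12}", text) is not None  # hypothetical feature


s2 = TextBox("123456789012", 100, 80, 90, 20)
s3 = TextBox("234567890123", 100, 110, 90, 20)
s4 = TextBox("345678901234", 250, 80, 90, 20)
s5 = TextBox("456789012345", 250, 110, 90, 20)
other = TextBox("男性", 400, 80, 30, 20)
print([b.text for b in collect_targets(s2, [s2, s3, s4, s5, other], is_my_number)])
```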

Likewise, the character strings to be masked can be identified accurately even when a table such as that shown in FIG. 17(b) (a table in which the My Number strings appear above the item name "My Number") is captured. In this case, (1) the item name "My Number" is identified, and (2) the closest character string (first character string) among those having the feature corresponding to the item name is identified. Then, (3) to (5), character strings with the same feature (second character strings) are identified by searching vertically and horizontally from that closest string. In this way, masking can be performed accurately for image data capturing tables in various formats, just as for the image data of FIG. 8.

As is clear from the above description, in the present embodiment the secret information analysis unit 28 implements the functions of an extraction unit that extracts a predetermined item name from the image data, and of first and second identification units that identify, in the image data, character strings satisfying the feature of the character strings corresponding to the item name.

As described above in detail, according to the present embodiment, the OCR processing unit 26 acquires the character strings included in the image data (S30). The secret information analysis unit 28 refers to the character string pattern DB 40 and identifies, among the acquired character strings, the item name to be masked (S32). It also refers to the character string pattern DB 40 and identifies, among the character strings included in the image data, a character string that satisfies the feature of the character strings corresponding to the identified item name (S34, S40). Using the identified string as a reference, it then identifies character strings that are aligned vertically and horizontally with it and satisfy the same feature (S42). The mask processing unit 30 then masks the image data based on the processing results of the secret information analysis unit 28 (S48).
In other words, starting from a character string (first character string) that has the feature of a string that should be masked, the secret information analysis unit 28 searches the positions where strings with the same feature are likely to be placed and identifies such strings (second character strings). This makes it possible to identify the character strings to be masked accurately. The server 10 can therefore automatically generate image data in which the character strings to be masked have been masked accurately, and provide it to the user terminal 70. Because the server 10 masks the image data automatically, the user no longer needs to mask it by hand. One conceivable alternative is to mask a fixed number of characters located in a fixed direction (to the right of or below) the item name. With such a method, however, it is difficult to mask image data containing tables like those in FIGS. 17(a) and 17(b) appropriately. The present embodiment, in contrast, can mask such image data appropriately.
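The contrast with a fixed-direction method can be illustrated on a layout in which the values sit above the item name, as in the FIG. 17(b) situation. Both functions below are hypothetical simplifications (exact coordinate equality for the baseline, an assumed 12-digit feature, invented coordinates), not the embodiment's code.

```python
import math
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBox:
    """OCR result with a top-left origin (x, y); y grows downward (assumption)."""
    text: str
    x: int
    y: int
    w: int
    h: int


def is_my_number(text):
    return re.fullmatch(r"\d{12}", text) is not None  # hypothetical feature


def fixed_direction_baseline(item, boxes):
    """Hypothetical baseline: only look directly right of, or directly below, the item."""
    right = [b for b in boxes if b.y == item.y and b.x > item.x]
    below = [b for b in boxes if b.x == item.x and b.y > item.y]
    return [b for b in right + below if is_my_number(b.text)]


def nearest_outside_second_quadrant(item, boxes):
    """Nearest feature-matching string, skipping only the upper-left quadrant."""
    ok = [b for b in boxes
          if is_my_number(b.text) and not (b.x < item.x and b.y < item.y)]
    return min(ok, key=lambda b: math.hypot(b.x - item.x, b.y - item.y), default=None)


# Values placed above the item name (coordinates invented for the example).
item = TextBox("マイナンバー", 100, 200, 80, 20)
boxes = [TextBox("123456789012", 100, 170, 90, 20),
         TextBox("234567890123", 100, 140, 90, 20)]
print(fixed_direction_baseline(item, boxes))              # []
print(nearest_outside_second_quadrant(item, boxes).text)  # 123456789012
```

Strings directly above the item name share its x coordinate, so they fall in the first quadrant rather than the excluded second quadrant and remain eligible.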

Further, in this embodiment, in step S40 the secret information analysis unit 28 identifies one character string having the feature corresponding to the item name within the range excluding the upper-left side (second quadrant) of the selected item name. Excluding the second quadrant in this way simplifies the processing.

Also, in this embodiment, an image used to add a mask area (for example, 備考欄画像.PNG) can be stored in the "check rule" field of FIG. 5(a). Storing an image that identifies the mask area in this way allows the area to be added to the mask area to be identified accurately regardless of the size (magnification) at which the table is captured.

In the above embodiment, in step S40 the secret information analysis unit 28 identifies, among the character strings having the feature corresponding to the item name, the one closest to the item name; however, this is not limiting. For example, the secret information analysis unit 28 may instead select one character string at random from among the character strings having that feature.
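A pluggable selection step covering both the nearest-string rule and the random alternative might look like this. Candidates are reduced to their origin coordinates, and the `pick_first` name and `strategy` parameter are hypothetical. Because the vertical/horizontal alignment relation used in step S42 is symmetric, a search started from any string of the same aligned group should reach the same group, which is presumably why the starting string can be chosen more freely; a random pick from outside the table would, however, lead the search elsewhere.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # origin of a candidate string (hypothetical representation)


def pick_first(item: Point, candidates: List[Point], strategy: str = "nearest",
               rng: Optional[random.Random] = None) -> Optional[Point]:
    """Choose the first character string among feature-matching candidates."""
    if not candidates:
        return None
    if strategy == "nearest":
        return min(candidates, key=lambda p: math.dist(p, item))
    if strategy == "random":
        return (rng or random.Random()).choice(candidates)
    raise ValueError(f"unknown strategy: {strategy}")


cands = [(5, 5), (1, 1), (3, 0)]
print(pick_first((0, 0), cands))                                # (1, 1)
print(pick_first((0, 0), cands, "random", random.Random(0)) in cands)  # True
```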

Also, in the above embodiment, in step S40 the secret information analysis unit 28 identifies the character string closest to the position of the item name within the range other than the second quadrant relative to the item name's origin, but this is not limiting. For example, the second quadrant may instead be excluded from the range from which the secret information analysis unit 28 extracts character strings in step S34, or it may be excluded when the secret information analysis unit 28 searches vertically and horizontally in step S42. Alternatively, each of steps S34, S40, and S42 may be performed without excluding the second quadrant.

In the above embodiment, the mask processing unit 30 masks each character string with a black rectangle, but this is not limiting. For example, each character may be masked individually so that the number of characters in the string remains recognizable. Mosaic processing, or replacement with arbitrary characters (for example, letters of the alphabet or "×"), may also be used.
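The alternative masking styles can be sketched as follows: per-character cells that keep the character count visible, a block-average mosaic, and text-level replacement with a fixed symbol. Equal-width characters, the `gap` and `block` parameters, and the grayscale list-of-rows image are assumptions for the example; the mosaic also assumes the region lies inside the image.

```python
from typing import List, Tuple

Grid = List[List[int]]


def char_cells(x: int, y: int, w: int, h: int, n_chars: int,
               gap: int = 1) -> List[Tuple[int, int, int, int]]:
    """Split one string's box into n equal cells (assumes equal-width glyphs)."""
    cell = w / n_chars
    return [(x + round(i * cell), y, max(1, round(cell) - gap), h) for i in range(n_chars)]


def mosaic(image: Grid, x: int, y: int, w: int, h: int, block: int = 2) -> Grid:
    """Replace each block x block tile inside the region by its integer mean."""
    out = [row[:] for row in image]
    for by in range(y, y + h, block):
        for bx in range(x, x + w, block):
            ys = range(by, min(by + block, y + h))
            xs = range(bx, min(bx + block, x + w))
            mean = sum(image[j][i] for j in ys for i in xs) // (len(ys) * len(xs))
            for j in ys:
                for i in xs:
                    out[j][i] = mean
    return out


def replace_text(text: str, symbol: str = "×") -> str:
    """Text-level replacement that keeps the character count visible."""
    return symbol * len(text)


print(char_cells(0, 0, 12, 5, 3))             # [(0, 0, 3, 5), (4, 0, 3, 5), (8, 0, 3, 5)]
print(mosaic([[0, 4], [8, 12]], 0, 0, 2, 2))  # [[6, 6], [6, 6]]
print(replace_text("123456789012"))           # ××××××××××××
```

Each cell from `char_cells` could then be painted like an ordinary black mask rectangle, leaving thin gaps that reveal the character count.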

The processing functions described above can be implemented by a computer. In that case, a program describing the processing content of the functions that the processing device should have is provided, and executing that program on a computer implements those functions on the computer. The program describing the processing content can be recorded on a computer-readable storage medium (excluding carrier waves).

When the program is distributed, it is sold, for example, in the form of a portable storage medium on which it is recorded, such as a DVD (Digital Versatile Disc) or a CD-ROM (Compact Disc Read Only Memory). The program can also be stored in a storage device of a server computer and transferred from the server computer to other computers over a network.

A computer that executes the program stores, for example, the program recorded on a portable storage medium or transferred from a server computer in its own storage device. The computer then reads the program from its own storage device and executes processing according to it. The computer can also read the program directly from the portable storage medium and execute processing according to it, or execute processing according to the received program each time a program is transferred from the server computer.

The embodiments described above are examples of preferred implementations of the present invention. However, the present invention is not limited to them, and various modifications can be made without departing from the gist of the present invention.

The following appendices are further disclosed with respect to the above description of the embodiments.
(Appendix 1) An image data editing program that causes a computer to execute a process comprising:
acquiring character strings included in image data;
extracting an item name to be masked from the acquired character strings by referring to a storage unit that stores item names to be masked in association with features of the character strings corresponding to those item names;
identifying, by referring to the storage unit, a first character string that satisfies the feature of the character strings corresponding to the extracted item name to be masked, from among the character strings included in the image data;
identifying, using the identified first character string as a reference, a second character string that is aligned in the vertical and horizontal directions and satisfies the feature of the character strings corresponding to the extracted item name to be masked; and
masking the first character string and the second character string in the image data.
(Appendix 2) The image data editing program according to Appendix 1, wherein identifying the first character string comprises identifying one first character string from among character strings located in a range excluding the upper-left side of the extracted item name to be masked.
(Appendix 3) The image data editing program according to Appendix 1 or 2, wherein identifying the first character string comprises identifying the first character string located closest to the extracted item name to be masked.
(Appendix 4) The image data editing program according to any one of Appendices 1 to 3, wherein the storage unit stores an image indicating an area to be masked in association with a feature of the character strings corresponding to that area, and
the masking masks a character string in the image data that is located in the area to be masked stored in the storage unit and has the feature of the character strings corresponding to that area.
(Appendix 5) An image data editing method in which a computer executes a process comprising:
acquiring character strings included in image data;
extracting an item name to be masked from the acquired character strings by referring to a storage unit that stores item names to be masked in association with features of the character strings corresponding to those item names;
identifying, by referring to the storage unit, a first character string that satisfies the feature of the character strings corresponding to the extracted item name to be masked, from among the character strings included in the image data;
identifying, using the identified first character string as a reference, a second character string that is aligned in the vertical and horizontal directions and satisfies the feature of the character strings corresponding to the extracted item name to be masked; and
masking the first character string and the second character string in the image data.
(Appendix 6) An image data editing device comprising:
a character string acquisition unit that acquires character strings included in image data;
a storage unit that stores item names to be masked in association with features of the character strings corresponding to those item names;
an extraction unit that refers to the storage unit and extracts an item name to be masked from the acquired character strings;
a first identification unit that refers to the storage unit and identifies, from among the character strings included in the image data, a first character string that satisfies the feature of the character strings corresponding to the extracted item name to be masked;
a second identification unit that identifies, using the identified first character string as a reference, a second character string that is aligned in the vertical and horizontal directions and satisfies the feature of the character strings corresponding to the extracted item name to be masked; and
a mask processing unit that masks the first character string and the second character string in the image data.
(Appendix 7) The image data editing device according to Appendix 6, wherein the first identification unit identifies one first character string from among character strings located in a range excluding the upper-left side of the extracted item name to be masked.
(Appendix 8) The image data editing device according to Appendix 6 or 7, wherein the first identification unit identifies the first character string located closest to the extracted item name to be masked.
(Appendix 9) The image data editing device according to any one of Appendices 6 to 8, wherein the storage unit stores an image indicating an area to be masked in association with a feature of the character strings corresponding to that area, and
the mask processing unit masks a character string in the image data that is located in the area to be masked stored in the storage unit and has the feature of the character strings corresponding to that area.

10 server (image data editing device)
26 OCR processing unit (character string acquisition unit)
28 secret information analysis unit (extraction unit, first identification unit, second identification unit)
30 mask processing unit
40 character string pattern DB (storage unit)

Claims

1. An image data editing program that causes a computer to execute a process comprising:
acquiring character strings included in image data;
extracting an item name to be masked from the acquired character strings by referring to a storage unit that stores item names to be masked in association with features of the character strings corresponding to those item names;
identifying, by referring to the storage unit, a first character string that satisfies the feature of the character strings corresponding to the extracted item name to be masked, from among the character strings included in the image data;
identifying, using the identified first character string as a reference, a second character string that is aligned in the vertical and horizontal directions and satisfies the feature of the character strings corresponding to the extracted item name to be masked; and
masking the first character string and the second character string in the image data.

2. The image data editing program according to claim 1, wherein identifying the first character string comprises identifying one first character string from among character strings located in a range excluding the upper-left side of the extracted item name to be masked.

3. The image data editing program according to claim 1, wherein identifying the first character string comprises identifying the first character string located closest to the extracted item name to be masked.

4. The image data editing program according to any one of claims 1 to 3, wherein the storage unit stores an image indicating an area to be masked in association with a feature of the character strings corresponding to that area, and
the masking masks a character string in the image data that is located in the area to be masked stored in the storage unit and has the feature of the character strings corresponding to that area.

5. An image data editing method in which a computer executes a process comprising:
acquiring character strings included in image data;
extracting an item name to be masked from the acquired character strings by referring to a storage unit that stores item names to be masked in association with features of the character strings corresponding to those item names;
identifying, by referring to the storage unit, a first character string that satisfies the feature of the character strings corresponding to the extracted item name to be masked, from among the character strings included in the image data;
identifying, using the identified first character string as a reference, a second character string that is aligned in the vertical and horizontal directions and satisfies the feature of the character strings corresponding to the extracted item name to be masked; and
masking the first character string and the second character string in the image data.

6. An image data editing device comprising:
a character string acquisition unit that acquires character strings included in image data;
a storage unit that stores item names to be masked in association with features of the character strings corresponding to those item names;
an extraction unit that refers to the storage unit and extracts an item name to be masked from the acquired character strings;
a first identification unit that refers to the storage unit and identifies, from among the character strings included in the image data, a first character string that satisfies the feature of the character strings corresponding to the extracted item name to be masked;
a second identification unit that identifies, using the identified first character string as a reference, a second character string that is aligned in the vertical and horizontal directions and satisfies the feature of the character strings corresponding to the extracted item name to be masked; and
a mask processing unit that masks the first character string and the second character string in the image data.