JP7206488B2 - Information processing device, client terminal, control method, and program

Info

Publication number: JP7206488B2
Application number: JP2019039307A
Authority: JP
Inventors: 透藤城
Original assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc
Current assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc
Priority date: 2019-03-05
Filing date: 2019-03-05
Publication date: 2023-01-18
Anticipated expiration: 2038-06-29
Also published as: JP2020004375A

Description

The present invention relates to e-mail security technology, and in particular to security technology for received junk mail.

In recent years, with the development of networks, electronic mail (hereinafter simply "mail" where appropriate) has come into wide use.

Along with this, there are many cases in which recipients receive junk mail such as unwanted advertisements or harassment. These include spam mail sent in bulk to an unspecified large number of recipients and suspicious mail with malware attached.

Such mail is not only a nuisance to recipients but also gives rise to various threats, such as malware infection and redirection to phishing sites.

Such mail is sometimes sent by reusing most of a template (including the body text, attached files, URLs, and so on) while varying the recipient information and the sender information.

As a result, mails containing exactly the same, or almost the same, body text, attached files, and URLs are sent from different sender mail addresses (such mails are referred to as "similar mails").

In addition, the template portion is updated periodically, and mail sent by spambots in particular often has these characteristics.

In normal mail operation, similar mails rarely arrive from a variety of senders; this happens only in specific situations, such as blank mails or cases where the receiving side specifies the content to be sent.

Therefore, to protect against various threats, it is necessary to identify similar mails that have been sent from senders and may cause threats.

As a method of making such an identification, a method has been disclosed that determines whether the same mail has been received, using the sender information and the message ID of an already received mail and of a newly received mail (see, for example, Patent Document 1).

Patent Document 1: Japanese Patent No. 5326785

However, the method described in Patent Document 1 is intended to prevent duplicate mails from being received when a mailing list and a personal list included in that mailing list are both set as destinations; when such mail is received, it is displayed to the recipient so that it does not appear in duplicate.

Consequently, mail that may cause a threat can still be displayed, and problems can arise such as the recipient's terminal being compromised by the threat when such mail is displayed.

Accordingly, an object of the present invention is to provide a mechanism capable of identifying e-mails that may cause a threat and generating learning data for discriminating junk mail on the basis of the identified e-mails.

To solve the above problem, the present invention is a program for causing a computer to function as: receiving means for receiving e-mails; identifying means for identifying, from among the e-mails received by the receiving means, identical or similar e-mails sent from different senders; and generating means for generating learning data for discriminating junk mail, using information on the identified e-mails as correct-answer data.

According to the present invention, it is possible to identify e-mails that may cause a threat and to generate learning data for discriminating junk mail on the basis of the identified e-mails.

FIG. 1 is a configuration diagram showing an example of the schematic configuration of the information processing system.
FIG. 2 is a block diagram showing an example of the hardware configuration of the mail server and the client terminal.
FIG. 3 is a block diagram showing an example of the software configuration.
FIG. 4 is a flowchart showing an example of the process for determining junk mail.
FIG. 5 is a configuration diagram showing an example of the configuration of the determination information setting file.
FIG. 6 is a configuration diagram showing an example of the configuration of the shared information setting file.
FIG. 7 is a configuration diagram showing an example of the configuration of the mail information table.
FIG. 8 is a diagram showing an example of processing when the process for determining junk mail is used in combination with existing technology.
FIG. 9 is a flowchart showing an example of the process for reusing junk mail.
FIG. 10 is a configuration diagram showing an example of the configuration of the list screen.
FIG. 11 is a diagram showing an example of the configuration of the detail screen.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a configuration diagram showing an example of the configuration of an information processing system according to an embodiment of the present invention.

As shown in FIG. 1, the information processing system 100 according to this embodiment includes a mail server 101, client terminals 102 (at least one), and a LAN 103, and is connected to an external mail server 105 via a wide area network 104.

The mail server 101 is an information processing apparatus used to send and receive e-mail, and has functions such as managing e-mail addresses and storing e-mails sent to those addresses.

It also performs processing (described in detail later) for determining whether e-mails sent from the external mail server 105 are junk mail.

The client terminal 102 is a terminal device operated by a user who exchanges e-mail using a mail address managed by the mail server 101.

The client terminal 102 is also a terminal device that presents to the user the various content provided by the external mail server 105.

Furthermore, the client terminal 102 can, via the LAN 103, view and edit the settings and data relating to the present invention that are stored in the mail server 101.

The external mail server 105 is a device that provides various content to users. It may be installed by a service provider, an individual user, or the like, or it may be installed as a mail server owned by an external user.

Note that an information processing apparatus may instead be provided between the mail server 101 and the LAN 103, and that apparatus may determine whether e-mails sent from the external mail server 105 are junk mail.

FIG. 2 is a block diagram showing an example of the hardware configuration of the mail server 101 and the client terminal 102 in the embodiment of the present invention.

As shown in FIG. 2, in the mail server 101 and the client terminal 102, a CPU (Central Processing Unit) 201, a ROM (Read Only Memory) 202, a RAM (Random Access Memory) 203, an input controller 205, a video controller 206, a memory controller 207, and a communication I/F controller 208 are connected via a system bus 204.

The CPU 201 comprehensively controls the devices and controllers connected to the system bus 204.

The ROM 202 or the external memory 211 holds the BIOS (Basic Input/Output System) and OS (Operating System), which are control programs executed by the CPU 201, as well as the computer-readable, executable programs that implement this information processing method and the various data they require (including data tables).

The RAM 203 functions as the main memory, work area, and the like of the CPU 201. When executing processing, the CPU 201 loads the necessary programs and the like from the ROM 202 or the external memory 211 into the RAM 203 and executes the loaded programs to realize various operations.

The input controller 205 controls input from input devices such as a keyboard 209 and a pointing device such as a mouse (not shown). When the input device is a touch panel, the user can give various instructions by pressing (touching with a finger or the like) an icon, cursor, or button displayed on the touch panel.

The touch panel may be one capable of detecting positions touched by multiple fingers, such as a multi-touch screen.

The video controller 206 controls display on an external output device such as the display 210. Displays here include the display of a notebook computer that is integrated with the main body. The external output device is not limited to a display and may be, for example, a projector. A device capable of accepting the touch operations described above also serves as an input device.

The video controller 206 can control a video memory (VRAM) used for display control. Part of the RAM 203 can be used as the video memory area, or a separate dedicated video memory can be provided.

The memory controller 207 controls access to the external memory 211. Usable external memory includes an external storage device (hard disk) that stores the boot program, various applications, font data, user files, edit files, various data, and the like; a flexible disk (FD); or a CompactFlash (registered trademark) memory connected to a PCMCIA card slot via an adapter.

The communication I/F controller 208 connects to and communicates with external devices via a network and executes communication control processing on the network. For example, it supports communication using TCP/IP, communication over telephone lines such as ISDN, and communication over 3G mobile phone lines.

The CPU 201 enables display on the display 210 by, for example, rendering (rasterizing) outline fonts into a display information area in the RAM 203. The CPU 201 also enables the user to give instructions with a mouse cursor or the like (not shown) on the display 210.

Next, an example of the functional configuration of the various devices in the embodiment of the present invention is described with reference to FIG. 3. Each function is explained together with the flowcharts and other material described later.

The mail server 101 includes a mail receiving unit 300, an extraction unit 302, a storage unit 304, a determination unit 306, a processing unit 308, a sharing unit 310, and a reuse unit 312.

The mail receiving unit 300 receives e-mails sent from the external mail server 105. The extraction unit 302 acquires, according to predetermined conditions, information on the body, attached files, URLs, and the like from the e-mails received by the mail receiving unit 300.

The storage unit 304 stores the e-mail information acquired by the extraction unit 302 in a table, and stores the definition information used for junk mail determination in a setting file.

The determination unit 306 calculates the similarity, in terms of body, attached files, URLs, and the like, between an e-mail received by the mail receiving unit 300 and the e-mails stored by the storage unit 304. It determines whether the e-mail is junk mail according to whether the calculated similarity is at or above the threshold set in the setting file and the number of distinct senders is at or above the threshold set in the setting file.

The processing unit 308 processes e-mails that the determination unit 306 has determined to be junk mail so that secondary damage and the like do not occur. For the same reason, the sharing unit 310 shares information on such e-mails with other systems and users.

The reuse unit 312 feeds information on e-mails determined to be junk mail into existing technologies (anti-spam software and antivirus software) to improve the accuracy of security as a whole.

Next, the junk mail determination process executed by the mail server 101 in the embodiment of the present invention is described with reference to the flowchart shown in FIG. 4. This process is executed by the CPU 201 of the mail server 101 reading out a predetermined control program.

In step S400, the mail receiving unit 300 receives an e-mail sent from the external mail server 105.

In step S402, the extraction unit 302 extracts information on the e-mail received in step S400.

The e-mail information to be extracted includes, as information on the sender: the sender's mail address; a hash value of the sender's mail address (a value calculated by an arbitrary hash function chosen in advance, where the same hash function is used throughout a series of processing; the same applies below); an arbitrary identifier for the sender's mail address; domain information of the sender's mail server; the IP address of the sender's mail server; and MTA (Mail Transfer Agent) route information.
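The sender-related extraction in step S402 can be sketched roughly as follows. This is a minimal illustration using Python's standard `email` package, not the patent's implementation: the function name `extract_sender_info`, the choice of SHA-256 as the "arbitrary hash function", the sample message, and the use of `Received` headers as a stand-in for MTA route information are all assumptions made for the example.

```python
import hashlib
from email import message_from_string
from email.utils import parseaddr

# Hypothetical raw message used only for illustration.
RAW = (
    "Received: from mta1.example.net (mta1.example.net [192.0.2.10])\r\n"
    "From: Alice <alice@example.com>\r\n"
    "To: bob@example.org\r\n"
    "Subject: hello\r\n"
    "\r\n"
    "body text\r\n"
)


def extract_sender_info(raw: str) -> dict:
    """Collect sender-related fields from a raw message (illustrative only)."""
    msg = message_from_string(raw)
    _, address = parseaddr(msg.get("From", ""))
    address = address.lower()
    return {
        "address": address,
        # Assumption: SHA-256 plays the role of the predetermined hash function.
        "address_hash": hashlib.sha256(address.encode("utf-8")).hexdigest(),
        # Domain part of the sender address (not a verified server domain).
        "address_domain": address.rpartition("@")[2],
        # Received headers approximate the MTA route information.
        "route": msg.get_all("Received", []),
    }


info = extract_sender_info(RAW)
```

Whatever hash function is chosen, it has to stay fixed across the whole process, as the text requires, so that hashes stored earlier remain comparable with newly computed ones.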

Information on the body is also extracted. This includes: a hash value of the body; a fuzzy hash value (a value calculated by an arbitrary fuzzy hash function chosen in advance, where the same fuzzy hash function is used throughout a series of processing; the same applies below); an arbitrary identifier (for example, a number assigned to identify each body, with duplicate bodies given the same number); features; the entire body; a hash value for each element; and per-element information (hash value, fuzzy hash value, arbitrary identifier, features, and so on). Elements of the body are, for example, individual paragraphs or blocks of a fixed size.

Per-element information is useful because some junk mail builds its body by randomly swapping some elements or by combining several elements.

For example, there are mails whose whole text is assembled by randomly selecting and combining one entry from a list of sentences for the first half and one from a list of sentences for the second half, and mails in which only the closing signature is randomly generated.

Extracting information for each element therefore makes it possible to identify junk mail more accurately.
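To illustrate why per-element information helps, the sketch below hashes each paragraph separately and measures how many paragraph hashes two bodies share. It is only an example under stated assumptions: paragraphs are taken to be separated by blank lines, SHA-256 stands in for the unspecified hash function, and the names `paragraph_hashes` and `shared_fraction` are hypothetical.

```python
import hashlib


def paragraph_hashes(body: str) -> list[str]:
    """Hash each non-empty paragraph (blank-line separated) of a body."""
    paragraphs = [p.strip() for p in body.replace("\r\n", "\n").split("\n\n")]
    return [hashlib.sha256(p.encode("utf-8")).hexdigest() for p in paragraphs if p]


def shared_fraction(a: str, b: str) -> float:
    """Fraction of distinct paragraph hashes that two bodies have in common."""
    ha, hb = set(paragraph_hashes(a)), set(paragraph_hashes(b))
    if not ha or not hb:
        return 0.0
    return len(ha & hb) / len(ha | hb)


# Two bodies that differ only in a randomly generated signature.
mail_1 = "You have won a prize.\n\nClick the link below.\n\nRegards, Tom"
mail_2 = "You have won a prize.\n\nClick the link below.\n\nBest wishes, Ann"

overlap = shared_fraction(mail_1, mail_2)
```

A hash of the entire body would treat these two mails as unrelated, while the per-paragraph comparison still shows that two of the three paragraphs are reused.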

Information on attached files is also extracted. This includes: a hash value of the attached file; a fuzzy hash value of the attached file; an arbitrary identifier (for example, a number assigned to identify each attached file, with duplicate attached files given the same number); behavior information (an operation log obtained by running the file in a sandbox or automated analysis environment; for an EXCEL (registered trademark) file with macros, for example, the actions performed by the macros, destination URLs, file read/write logs, and so on; likewise for an executable file, a log of the actions performed when it is run, or destination URLs, file reads and writes, registry reads and writes, and commands to the PC such as shutdown or restart); the attached file itself (hash values and fuzzy hash values are basically sufficient, but having the file itself allows similarity to be calculated by various methods); metadata (for example, file name, file type, file size, creation date and time, author name, product name, version information, Exif, and detection names from antivirus software); a hash value for each element; and per-element information (hash value, fuzzy hash value, arbitrary identifier, and behavior information (when there are multiple attached files, the behavior information of each attached file)).

Elements of an attached file are, for example, blocks of a fixed size, individual sections of the file, the resource portion of the file, or, for an Office file, the macro portion alone.

Per-element information is useful because some junk mail carries attached files created by randomly swapping some elements or by combining several elements.

For example, there are attached files in which only the resource portion has been rewritten, and attached files that reuse only the Office macro portion while the content is rewritten each time.

As described above, extracting information for each element therefore makes it possible to identify junk mail more accurately.

Information on URLs and the like is also extracted. This includes: a hash value; a fuzzy hash value; an arbitrary identifier (for example, a number assigned to identify each URL, with duplicate URLs given the same number); behavior information (an operation log obtained by accessing the URL in a sandbox or automated analysis environment, such as the redirect destination when the URL is accessed or the files that are downloaded); the entire URL; metadata (for a domain, the IP address; GeoLocation (location information of the IP address); and detection information from antivirus software or the like); a hash value for each element; and per-element information (hash value, fuzzy hash value, arbitrary identifier).

Elements of a URL are, for example, the domain portion, the FQDN, the path portion of the URL, and the query portion of the URL.

Per-element information is useful because some junk mail builds its URLs by randomly swapping some elements or by combining several elements.

For example, there are URLs in which the final directory portion of the path has been randomly rewritten and URLs in which the query portion has been randomly replaced.
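Splitting a URL into elements and hashing each one separately might look like the sketch below, which uses `urllib.parse.urlsplit` from the Python standard library. The helper names and the use of SHA-256 are assumptions, and the registrable "domain portion" is left out because deriving it reliably needs a public suffix list, which the standard library does not provide.

```python
import hashlib
from urllib.parse import urlsplit


def url_elements(url: str) -> dict:
    """Split a URL into FQDN, path, and query elements."""
    parts = urlsplit(url)
    return {
        "fqdn": parts.hostname or "",
        "path": parts.path,
        "query": parts.query,
    }


def element_hashes(url: str) -> dict:
    """Hash each URL element separately (SHA-256 is an assumed choice)."""
    return {
        name: hashlib.sha256(value.encode("utf-8")).hexdigest()
        for name, value in url_elements(url).items()
    }


# Two URLs that differ only in a randomized query string.
u1 = "http://files.example.com/docs/a1b2/invoice.html?id=123"
u2 = "http://files.example.com/docs/a1b2/invoice.html?id=987"
h1, h2 = element_hashes(u1), element_hashes(u2)
```

Here the FQDN and path hashes match while the query hashes differ, so the two URLs can still be related even though a hash of the entire URL would not match.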

In addition, the addresses set in TO, CC, and BCC as destination mail addresses, the subject, and the like may also be extracted as e-mail information.

The arbitrary identifier for each item of information need not come from a hash function. It may be anything that distinguishes, or roughly distinguishes, the data, such as an identifier obtained by numbering each sender mail address, body, attached file, URL, or element of these one by one.

Each item of information may also be processed before use. For body information, for example, line feed codes or URL portions may be removed. For attached-file information, the file format may be converted, the file may be rendered as an image, or an executable file may be unpacked.
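The body preprocessing mentioned above (removing line feed codes and URL portions) can be sketched as follows. The regular expression used to find URLs and the function names are assumptions for illustration; a real deployment would likely need a more careful URL matcher.

```python
import hashlib
import re

# Assumption: a simple pattern that treats any http(s) run of non-space
# characters as a URL.
URL_PATTERN = re.compile(r"https?://\S+")


def normalize_body(body: str) -> str:
    """Remove URL portions and line feed codes from a mail body."""
    without_urls = URL_PATTERN.sub("", body)
    return without_urls.replace("\r", "").replace("\n", "")


def body_hash(body: str) -> str:
    """Hash the normalized body (SHA-256 is an assumed choice)."""
    return hashlib.sha256(normalize_body(body).encode("utf-8")).hexdigest()


# Same template, different line endings and a different embedded URL.
a = "Please confirm your account:\r\nhttp://a.example/x1\r\nThank you."
b = "Please confirm your account:\nhttp://b.example/y2\nThank you."
same_after_normalizing = body_hash(a) == body_hash(b)
```

After normalization, two mails that differ only in line endings and in the embedded URL produce the same hash, which is the effect the processing step is aiming for.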

In step S404, the storage unit 304 stores each item of information extracted in step S402 in the mail information table 700 (see FIG. 7).

FIG. 7 shows the configuration of the mail information table 700. The mail information table 700 stores the e-mail information extracted in step S402 and includes items such as the body hash 702 and body fuzzy hash 704, which are information on the body; the attached file hash 706 and attached file fuzzy hash 708, which are information on the attached file; the URL hash 710, which is information on the URL; and the sender mail address hash 712.

The body hash 702 stores the hash value of the body, and the body fuzzy hash 704 stores the fuzzy hash value of the body.

The attached file hash 706 stores the hash value of the attached file, and the attached file fuzzy hash 708 stores the fuzzy hash value of the attached file.

The URL hash 710 stores the hash value of the URL, and the sender address hash 712 stores the hash value of the sender's mail address.

The storage format need not be a table and may be a file. The mail information table 700 may be one built within the information processing system 100 or one built in an external system. It is also possible to exchange data with the database of an external system.
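One row of the mail information table 700 could be modeled as in the sketch below. The field names mirror items 702 to 712, but the class name, the helper `make_record`, and the use of SHA-256 are assumptions; the fuzzy hash fields are left empty because computing them would need a fuzzy hashing library that is outside the Python standard library.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class MailInfoRecord:
    body_hash: str                         # item 702
    body_fuzzy_hash: Optional[str]         # item 704 (not computed here)
    attachment_hash: Optional[str]         # item 706
    attachment_fuzzy_hash: Optional[str]   # item 708 (not computed here)
    url_hash: Optional[str]                # item 710
    sender_address_hash: str               # item 712


def make_record(body: str, sender: str,
                url: Optional[str] = None,
                attachment: Optional[bytes] = None) -> MailInfoRecord:
    """Build one record from already extracted mail information."""
    return MailInfoRecord(
        body_hash=sha256_hex(body.encode("utf-8")),
        body_fuzzy_hash=None,
        attachment_hash=sha256_hex(attachment) if attachment is not None else None,
        attachment_fuzzy_hash=None,
        url_hash=sha256_hex(url.encode("utf-8")) if url is not None else None,
        sender_address_hash=sha256_hex(sender.lower().encode("utf-8")),
    )


rec = make_record("hello", "a@example.com", url="http://x.example/")
```

Storing only hashes keeps the table compact, at the cost of not being able to recompute similarity with a different method later, which is the trade-off the text notes for attached files.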

Because similar mails are updated frequently, a sufficient effect can be expected even if old data stored in the mail information table 700 is deleted.

As for the timing of deletion, data may, for example, be deleted periodically; data may be deleted once a fixed time has elapsed since its last access (the time at which the body, attached file, URL hash value, fuzzy hash value, identifier, or the like was last referenced, for example on a screen; if no similar mail has arrived, this is the time of reception, that is, the time the data was stored in the table, and if similar mails have arrived, it is the time the most recent one was received); or data may be deleted once a fixed time has elapsed since it was first stored.

This reduces the load on resources and speeds up the junk mail determination process.
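The two time-based deletion policies described above can be sketched as a single filter. The entry layout, the function name `purge`, and the use of plain numeric timestamps are assumptions made to keep the example small.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredEntry:
    body_hash: str
    first_stored: float   # time the entry was first stored
    last_access: float    # time a matching mail was last received


def purge(entries: list[StoredEntry], now: float,
          max_idle: Optional[float] = None,
          max_age: Optional[float] = None) -> list[StoredEntry]:
    """Keep only entries that are neither idle too long nor too old."""
    kept = []
    for entry in entries:
        if max_idle is not None and now - entry.last_access > max_idle:
            continue  # too long since the last similar mail
        if max_age is not None and now - entry.first_stored > max_age:
            continue  # too long since the entry was first stored
        kept.append(entry)
    return kept


entries = [
    StoredEntry("a", first_stored=0, last_access=0),
    StoredEntry("b", first_stored=0, last_access=90),
    StoredEntry("c", first_stored=80, last_access=95),
]
by_idle = purge(entries, now=100, max_idle=30)
by_age = purge(entries, now=100, max_age=50)
```

Running such a filter on a schedule would correspond to the periodic deletion option; which of the two limits to apply is a configuration choice the text leaves open.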

In step S406, the determination unit 306 determines that the e-mail stored in step S404 is a similar mail when, compared with an e-mail already stored in the mail information table 700, the similarity of the body is high and the similarity of the sender information is low (the senders differ).

Alternatively, it determines how many e-mails with the same body have been received in total from different sender mail addresses.

In this step, the determination is made according to whether the similarity of the body is at or above the similarity threshold set in the body settings 504 of the determination information setting file 500 shown in FIG. 5.

The determination is also made according to whether the total number of e-mails that have the same body and were sent from different sender mail addresses is higher than the threshold for the number of similar mails set in the overall settings 502 of the determination information setting file 500.
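The two-part check in step S406 (body similarity at or above a threshold, and more distinct senders than a count threshold) can be sketched as follows. This is one possible reading, not the patent's algorithm: `difflib.SequenceMatcher` is swapped in for the unspecified fuzzy-hash comparison, similarity is expressed on a 0 to 100 scale, the new mail's own sender is counted, and all names and thresholds are hypothetical.

```python
from difflib import SequenceMatcher


def body_similarity(a: str, b: str) -> float:
    """Similarity of two bodies on a 0-100 scale (stand-in for fuzzy hashing)."""
    return SequenceMatcher(None, a, b).ratio() * 100


def is_similar_mail(new_body: str, new_sender: str,
                    stored: list[tuple[str, str]],
                    similarity_threshold: float,
                    count_threshold: int) -> bool:
    """True if enough distinct senders have sent a sufficiently similar body."""
    senders = {new_sender}
    for body, sender in stored:
        if body_similarity(new_body, body) >= similarity_threshold:
            senders.add(sender)
    return len(senders) > count_threshold


# (body, sender) pairs standing in for the mail information table.
stored = [
    ("Win a prize now", "a@x.example"),
    ("Win a prize now", "b@y.example"),
    ("Meeting at 10", "c@z.example"),
]
flagged = is_similar_mail("Win a prize now", "d@w.example", stored, 90, 2)
```

Counting distinct senders rather than raw message count means that repeated mail from one address, such as a legitimate newsletter, does not by itself trigger the determination.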

The determination information setting file 500 shown in FIG. 5 is a setting file that stores the information used for each determination in the present invention.

The determination information setting file 500 includes the overall settings 502, the body settings 504, the attached file settings 506, the URL settings 508, and so on.

The determination information setting file 500 may be read in full at the start of this process, or only the settings needed at each step may be read.

The overall settings 502 store settings for processing in general, the body settings 504 store settings for body processing, the attached file settings 506 store settings for attached-file processing, and the URL settings 508 store settings for URL processing.
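The source does not state the file format of the determination information setting file 500, so the sketch below simply assumes JSON with one section per settings group. Every key name and value is hypothetical; the `sections` argument illustrates the option of reading only the settings a given step needs.

```python
import json
from typing import Optional

# Hypothetical contents; the real file layout is not given in the source.
EXAMPLE = """
{
  "overall": {"similar_mail_count_threshold": 3},
  "body": {"enabled": true, "similarity_threshold": 90, "min_size": 50},
  "attachment": {"enabled": true, "similarity_threshold": 80},
  "url": {"enabled": false, "similarity_threshold": 100}
}
"""


def load_settings(text: str, sections: Optional[list[str]] = None) -> dict:
    """Parse the settings; optionally return only the named sections."""
    data = json.loads(text)
    if sections is None:
        return data
    return {name: data[name] for name in sections}


all_settings = load_settings(EXAMPLE)
body_only = load_settings(EXAMPLE, ["body"])
```

The sections correspond to the overall settings 502, body settings 504, attached file settings 506, and URL settings 508.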

他の例としては、本文に関するファジーハッシュ値の類似度が、本文に関する設定５０４に設定された類似度の閾値以上となる電子メールであって、全体に関する設定５０２に設定された類似メールの件数の閾値よりも、異なる送信元のメールサーバのＩＰアドレスから送信された電子メールの合計件数の方が高いか否かによって判定してもよい。 As another example, the similarity of the fuzzy hash value for the text is equal to or greater than the similarity threshold set in the setting 504 for the text, and the number of similar emails set in the setting 502 for the whole is It may be determined whether or not the total number of e-mails sent from different source mail server IP addresses is higher than the threshold.

もしくは、本文が同一の電子メールであって、全体に関する設定５０２に設定された類似メールの件数の閾値よりも、異なる送信元のメールアドレスから送信された電子メールの合計件数よりも高く、かつ、本文の類似度が、本文に関する設定５０４に設定された類似度の閾値以上の電子メールであり、全体に関する設定５０２に設定された類似メールの件数の閾値よりも、電子メールの受信件数が高くなる、の組合せで判定するなどでもよい。 Alternatively, it is higher than the threshold for the number of similar e-mails set in the setting 502 related to the whole, and higher than the total number of e-mails sent from different source e-mail addresses, and An e-mail whose text has a degree of similarity greater than or equal to the threshold of similarity set in the setting 504 regarding the text, and the number of received e-mails is higher than the threshold of the number of similar e-mails set in the setting 502 regarding the whole. , may be used for determination.

尚、さらに、本文を前述した要素ごとに分割した情報を用いて、類似判定を行うことも可能である。この時、本文に関する設定５０４において、要素ごとに設定を行ってもよい。 Furthermore, it is also possible to perform similarity determination using information obtained by dividing the text for each element described above. At this time, in the setting 504 relating to the text, settings may be made for each element.

設定には、類似度の閾値や本文に類似する迷惑メール検知に利用するかどうかなどがあり、状況に応じて要素ごとに設定を追加する。 The settings include, for example, a similarity threshold and whether the text is used for detecting similar junk mail; settings are added for each element as the situation requires.

判定情報設定ファイル５００の「本文の大きさ（例えば、文字数やバイト数等）」に設定された値以上の場合のみ類似メールの判定を行うなどとしてもよい。 Similar emails may be determined only when the size of the text (for example, the number of characters, the number of bytes, etc.) of the determination information setting file 500 is greater than or equal to the value set.

ステップＳ４０８では、判定部３０６は、ステップＳ４０４で記憶した電子メールが既にメール情報テーブル７００に記憶されている電子メールと、添付ファイルに関する類似度が高く、送信元に関する情報の類似度が低い（送信元が異なる）場合、類似メールとして判定する。 In step S408, the determination unit 306 determines that the e-mail stored in step S404 is a similar mail when, compared with an e-mail already stored in the mail information table 700, the similarity regarding the attached file is high and the similarity regarding the sender information is low (that is, the senders differ).

本ステップでは、図５に示す判定情報設定ファイル５００の添付ファイルに関する設定５０６に設定された類似度の閾値よりも、添付ファイルに関する類似度が高いか否かによって判定を行う。 In this step, it is determined whether or not the similarity of the attached file is higher than the similarity threshold set in the setting 506 of the attached file of the determination information setting file 500 shown in FIG.

例えば、添付ファイルに関するファジーハッシュ値の類似度が、添付ファイルに関する設定５０６に設定された類似度の閾値以上の電子メールであって、全体に関する設定５０２に設定された類似メールの件数の閾値よりも、異なる送信元アドレスから送信されたメールの合計件数が高くなるか否かによって判定する。 For example, the determination considers e-mails whose fuzzy hash similarity for the attached file is equal to or greater than the similarity threshold set in the attachment file settings 506, and checks whether the total number of such mails sent from different source addresses exceeds the threshold for the number of similar e-mails set in the overall settings 502.

もしくは、サンドボックス等の保護された領域で添付ファイルを実行した結果、動作ログ（ファイルやレジストリの読み書き、ネットワーク通信、実行したＡＰＩ等）の類似度が、添付ファイルに関する設定５０６に設定された類似度の閾値以上となる電子メールであって、全体に関する設定５０２に設定された類似メールの件数の閾値よりも、異なる送信元のメールアドレスから送信された電子メールの合計件数が高いか否かによって判定するなどでもよい。 Alternatively, the attached file may be executed in a protected area such as a sandbox. The determination then considers e-mails for which the similarity of the resulting operation log (file and registry reads and writes, network communication, APIs called, and so on) is equal to or greater than the similarity threshold set in the attachment file settings 506, and checks whether the total number of such e-mails sent from different source e-mail addresses exceeds the threshold for the number of similar e-mails set in the overall settings 502.

さらに、添付ファイルを前述した要素ごとに分割した情報を用いて、類似判定を行うことも可能である。 Furthermore, it is possible to perform similarity determination using information obtained by dividing the attached file for each element described above.

この時、添付ファイルに関する設定５０６において、要素ごとに設定を行ってもよい。 At this time, in the settings 506 regarding attached files, settings may be made for each element.

設定には、類似度の閾値や添付ファイルを類似する迷惑メール検知に利用するかどうかなどがあり、状況に応じて要素ごとに設定を追加する。 The settings include, for example, a similarity threshold and whether attached files are used for detecting similar junk mail; settings are added for each element as the situation requires.

ステップＳ４１０では、判定部３０６は、ステップＳ４０４で記憶した電子メールが既にメール情報テーブル７００に記憶されている電子メールと、ＵＲＬに関する類似度が高く、送信元に関する情報の類似度が低い（送信元が異なる）場合、類似メールとして判定する。 In step S410, the determination unit 306 determines that the e-mail stored in step S404 is a similar mail when, compared with an e-mail already stored in the mail information table 700, the similarity regarding the URL is high and the similarity regarding the sender information is low (that is, the senders differ).

本ステップでは、図５に示す判定情報設定ファイル５００のＵＲＬに関する設定５０８に設定された類似度の閾値よりも、ＵＲＬに関する類似度が高いか否かによって判定を行う。 In this step, it is determined whether or not the similarity of the URL is higher than the similarity threshold set in the setting 508 of the URL of the determination information setting file 500 shown in FIG.

例えば、メールに記載されたＵＲＬのＦＱＤＮが同一のメールであって、全体に関する設定５０２に設定された類似メールの件数の閾値よりも、異なる送信元のメールアドレスから送信された電子メールの合計件数が高いか否かによって判定する。 For example, the determination considers mails in which the FQDN of a URL written in the mail is identical, and checks whether the total number of such e-mails sent from different source e-mail addresses exceeds the threshold for the number of similar e-mails set in the overall settings 502.
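
The FQDN-based check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `fqdns` and `senders_sharing_fqdn`, the URL regular expression, and the `(sender, body)` record layout are all assumptions made for this example.

```python
import re
from urllib.parse import urlsplit

# Hypothetical, deliberately simple URL pattern for illustration only.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def fqdns(body: str) -> set[str]:
    """Return the lower-cased host names of all URLs found in a mail body."""
    hosts = set()
    for url in URL_RE.findall(body):
        host = urlsplit(url).hostname  # hostname is already lower-cased
        if host:
            hosts.add(host)
    return hosts


def senders_sharing_fqdn(new_sender, new_body, stored):
    """Collect distinct senders whose mails contain a URL with the same FQDN.

    `stored` is an iterable of (sender, body) pairs standing in for the
    mail information table 700. The new mail's sender is included.
    """
    target = fqdns(new_body)
    senders = {new_sender}
    for sender, body in stored:
        if target & fqdns(body):
            senders.add(sender)
    return senders
```

Comparing `len(senders_sharing_fqdn(...))` with the similar-mail count threshold from the overall settings 502 would then give the decision for this step.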

もしくは、サンドボックス等の保護された領域でＵＲＬにアクセスした後のレスポンスデータの類似度が、ＵＲＬに関する設定５０８に設定された類似度以上となる電子メールであって、全体に関する設定５０２に設定された類似メールの件数の閾値よりも、異なる送信元のメールアドレスから送信された電子メールの合計件数が高いか否かによって判定するなどでもよい。 Alternatively, the URL may be accessed from a protected area such as a sandbox. The determination then considers e-mails for which the similarity of the response data is equal to or greater than the similarity set in the URL settings 508, and checks whether the total number of such e-mails sent from different source e-mail addresses exceeds the threshold for the number of similar e-mails set in the overall settings 502.

さらに、ＵＲＬを要素ごとに分割した情報を用いて、類似判定を行うことも可能である。 Furthermore, it is possible to perform similarity determination using information obtained by dividing the URL into elements.

この時、ＵＲＬに関する設定５０８において、要素ごとに設定を行ってもよい。 At this time, in the URL-related setting 508, setting may be made for each element.

設定には、類似度の閾値やＵＲＬが類似する迷惑メール検知に利用するかどうかなどがあり、状況に応じて要素ごとに設定を追加する。 The settings include, for example, a similarity threshold and whether URLs are used for detecting similar junk mail; settings are added for each element as the situation requires.

ステップＳ４１２では、判定部３０６は、ステップＳ４０６において求めた件数やステップＳ４０８において求めた件数、あるいは、ステップＳ４１０において求めた件数が一定以上の場合、その類似メールを迷惑メールと判定し、ステップＳ４１４へ処理を進め、迷惑メールとして判定しない場合、ステップＳ４１６へ処理を進める。 In step S412, if the number of cases obtained in step S406, the number of cases obtained in step S408, or the number of cases obtained in step S410 exceeds a certain value, the determination unit 306 determines that the similar e-mail is junk e-mail, and proceeds to step S414. If the process proceeds and the mail is not determined as junk mail, the process proceeds to step S416.

また、ステップＳ４０６の件数、ステップＳ４０８の件数、及びステップＳ４１０の件数の組合せで判断してもよい。 Alternatively, the determination may be made based on a combination of the number of cases in step S406, the number of cases in step S408, and the number of cases in step S410.

また、完全一致の場合と類似の場合で、閾値を変えたり、組合せを変えたりしてもよい。例えば、本文が同一の電子メールを異なる送信元のメールアドレスから３件以上受信した場合、その本文の内容と同一の内容の電子メールを全て迷惑メールとして扱う。 Also, the threshold value or the combination may be changed between the case of exact match and the case of similarity. For example, if three or more e-mails with the same text are received from different sender e-mail addresses, all the e-mails with the same content as the text are treated as junk mail.

または、本文が８０％以上類似していて、添付ファイルに関するファジーハッシュ値も９０％以上類似している電子メールが、異なる送信元のメールアドレスから３件以上受信した場合、当該電子メール（その電子メールと本文が８０％以上類似し、添付ファイルと９０％以上類似しているメール）を迷惑メールとして扱うことにしてもよい。 Alternatively, if three or more e-mails with texts that are 80% or more similar in text and 90% or more similar fuzzy hash values for attached files are received from different sender e-mail addresses, the e-mail (that e-mail An email whose text is 80% or more similar to the email and 90% or more similar to the attached file) may be treated as spam.
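
The threshold rule in the example above (text at least 80% similar, attachment at least 90% similar, three or more distinct sender addresses) can be sketched as below. This is only an illustration under stated assumptions: `difflib.SequenceMatcher` is used as a stand-in for the fuzzy hash comparison, the `Mail` record and the `is_junk` function are hypothetical names, and the new mail itself is assumed to count toward the three senders.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Mail:
    sender: str
    body: str
    attachment: bytes = b""


def similarity(a, b) -> float:
    """Stand-in similarity in [0, 1]; a fuzzy hash comparison would go here."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def is_junk(new: Mail, stored: list[Mail],
            body_threshold: float = 0.8,
            attachment_threshold: float = 0.9,
            count_threshold: int = 3) -> bool:
    """Return True when enough distinct senders sent a similar mail."""
    senders = {new.sender}  # assumption: the new mail counts as one sender
    for old in stored:
        if (similarity(new.body, old.body) >= body_threshold
                and similarity(new.attachment, old.attachment) >= attachment_threshold):
            senders.add(old.sender)
    return len(senders) >= count_threshold
```

Setting both thresholds to 1.0 would approximate the exact-match case, so the same routine could be called twice with different count thresholds for exact and similar matches.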

ステップＳ４１４では、処理部３０８は、ステップＳ４１２において迷惑メールとして判定した電子メールに対する任意の処理を行う。 In step S414, the processing unit 308 performs arbitrary processing on the e-mail determined as junk mail in step S412.

迷惑メールに対する任意の処理として、例えば、対象電子メールの削除、対象電子メールへの警告文の追加、添付ファイルやＵＲＬの無害化、仮想環境上での実行などがある。 Arbitrary processing for unsolicited e-mail includes, for example, deletion of target e-mail, addition of warning text to target e-mail, detoxification of attached files and URLs, and execution on a virtual environment.

また、任意の処理は、完全一致で検知した場合や類似として検知した場合など、検知の仕方によって異なる処理を行ってもよい。例えば、完全一致で検知した場合は、電子メールを削除し、類似として検知した場合は、警告文を追加するなどがある。 In addition, the arbitrary processing may be performed differently depending on how the detection is performed, such as when detecting a perfect match or when detecting a similarity. For example, if an exact match is detected, the e-mail is deleted, and if similarity is detected, a warning message is added.

さらに、組合せを利用してもよい。例えば、本文の類似度が高い電子メールは一定未満の件数だが、添付ファイルの類似度が高い電子メールは一定以上の件数が存在する場合は、削除せずに添付ファイルを無害化するなどの処理でもよい。このように検知条件によって、処理を変更することで精度や利便性が向上する。 Combinations may also be used. For example, when fewer than a certain number of e-mails have high text similarity but at least a certain number have high attachment similarity, the e-mail may be kept and its attached file rendered harmless instead of being deleted. Changing the processing according to the detection conditions in this way improves accuracy and convenience.
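
One way to express the condition-dependent handling described above is a small dispatch function. The sketch below is an assumption-laden illustration: the action names, the single shared count threshold, and the function `choose_action` do not appear in the source and only mirror the examples given in the text.

```python
def choose_action(exact_match: bool, body_hits: int, attachment_hits: int,
                  count_threshold: int = 3) -> str:
    """Pick a handling action from how the mail was detected.

    body_hits / attachment_hits are the numbers of similar mails found
    by the text check and by the attachment check respectively.
    """
    if exact_match and body_hits >= count_threshold:
        return "delete"                # exact match: delete the mail
    if body_hits < count_threshold <= attachment_hits:
        return "sanitize_attachment"   # keep the mail, neutralize the file
    if body_hits >= count_threshold:
        return "add_warning"           # similar only: add a warning text
    return "deliver"                   # not judged to be junk mail
```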

ステップＳ４１６では、共有部３１０は、ステップＳ４０４において記憶した各情報をユーザと共有する。 In step S416, sharing unit 310 shares each piece of information stored in step S404 with the user.

共有先としては、例えば、他のユーザや経路上のサーバ、拠点内の別のサーバ、拠点間での共有、あるいは、他の企業や組織との共有、さらに開発元との共有などがある。 Examples of sharing destinations include other users, a server on a route, another server within a base, sharing between bases, sharing with other companies or organizations, and sharing with a developer.

経路上のサーバと各情報を共有する方法としては、例えば、外部メールサーバ１０５からメールサーバＡから、メールサーバＢあるいはメールサーバＣに電子メールを分岐して送信するような構成を備えている場合、メールサーバＢ、Ｃで本システムを通常利用する。 As a method of sharing the information with a server on the route, consider, for example, a configuration in which e-mail from the external mail server 105 passes through mail server A and is then branched to either mail server B or mail server C. In this case, mail servers B and C use this system in the normal way.

メールサーバＢにおいて迷惑メールと判定した電子メールに関する情報をメールサーバＡに共有する。 Information about an e-mail judged to be junk mail by mail server B is shared with mail server A.

メールサーバＡでは、負荷を減らすために代表的な情報のみ抽出し、メールサーバＢから共有された情報のみ記憶し、類似メール判定を行う。これによりメールサーバＣでも迷惑メールを判定できる。 The mail server A extracts only typical information in order to reduce the load, stores only the information shared from the mail server B, and performs similar mail determination. Thus, even the mail server C can determine junk mail.

共有する情報としては、送信元に関する情報や本文に関する情報、添付ファイルやＵＲＬに関する情報などを共有対象の情報ごとに、共有する先の範囲を設定することができる。 As the information to be shared, the range of destinations to be shared can be set for each information to be shared, such as information on the sender, information on the text, information on the attached file and URL.

図６のように共有情報設定ファイル６００によって設定を調整してもよい。図６に示す共有情報設定ファイル６００は、情報共有の設定を行うための設定ファイルである。 The setting may be adjusted by a shared information setting file 600 as shown in FIG. A shared information setting file 600 shown in FIG. 6 is a setting file for setting information sharing.

共有情報設定ファイル６００は、共有の設定６０２、送信元に関する設定６０４、本文に関する設定６０６、添付ファイルに関する設定６０８、及びＵＲＬに関する設定６１０等を含んで構成されている。 The shared information setting file 600 includes sharing settings 602, sender settings 604, text settings 606, attachment file settings 608, URL settings 610, and the like.

全体の共有の設定６０２は、全般的な共有設定を記憶するものであり、送信元に関する設定６０４は、送信元の共有に関する設定を記憶するものであり、本文に関する設定６０６は、本文の共有に関する設定を記憶するものであり、添付ファイルに関する設定６０８は、添付ファイルの共有に関する設定を記憶するものであり、ＵＲＬに関する設定６１０は、ＵＲＬの共有に関する設定を記憶するものである。 The overall sharing settings 602 store general sharing settings, the sender settings 604 store settings relating to sharing of the sender, and the body settings 606 store body sharing settings. Settings related to attachments 608 store settings related to sharing of attached files, and settings related to URL 610 store settings related to sharing of URLs.

例えば、開発元に共有する場合は、本文に関する設定６０６により、本文のハッシュ値、及び送信元に関する設定６０４により、送信元のメールアドレスのハッシュ値の一部（例えば、先頭２文字）及びメールアドレスのドメイン情報のみ共有する。 For example, when sharing with the developer, only the hash value of the text is shared according to the text settings 606, and only a part of the hash value of the sender's e-mail address (for example, the first two characters) and the domain of the e-mail address are shared according to the sender settings 604.

これにより開発元へ共有することの懸念が少なくできる効果がある。（ここで、メールアドレスのハッシュ値は、レインボーテーブル（平文とハッシュ値のセットのテーブル）などにより元に戻せる可能性があるが、ハッシュ値の一部であれば衝突が発生するためもとの値を推測することがより困難になる。 This reduces concerns about sharing information with the developer. (A hash value of an e-mail address can potentially be reversed using a rainbow table (a table of plaintext and hash value pairs) or the like, but when only part of the hash value is shared, collisions occur, which makes it more difficult to infer the original value.

また、一定以上の送信元が異なる類似メールの件数がカウントできればいいため衝突が発生しても影響は少ない）。 Also, since it is enough to count the number of similar emails with different senders above a certain level, even if a collision occurs, the impact is small.)
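
The partial-hash sharing described above can be sketched as follows. The source does not name a hash algorithm; SHA-256 is an assumption here, as are the function name `shareable_sender` and the lower-casing of the address before hashing.

```python
import hashlib


def shareable_sender(address: str, prefix_len: int = 2) -> tuple[str, str]:
    """Return (hash prefix, domain) for a sender address.

    Only the first `prefix_len` hex characters of the digest are kept,
    so many different addresses collide on the same prefix and the
    original address is harder to recover from the shared value.
    """
    normalized = address.strip().lower()
    domain = normalized.rpartition("@")[2]
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return digest[:prefix_len], domain
```

Because the receiver only needs to count roughly how many distinct senders sent similar mail, occasional prefix collisions within one domain cost little accuracy, which matches the reasoning in the text.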

または、拠点間の別のメールサーバと共有する場合は、本文に関する設定６０６により、本文に関するハッシュ値及びファジーハッシュ値、添付ファイルに関する設定６０８により、添付ファイルに関するハッシュ値及びファジーハッシュ値、ＵＲＬに関する設定６１０により、ＦＱＤＮ、送信元に関する設定６０４により、送信元のメールアドレス及びメールアドレスのハッシュ値全体の情報を共有するなどとしてもよい。 Alternatively, when sharing with another mail server at a different site, the following may be shared: the hash value and fuzzy hash value of the text according to the text settings 606, the hash value and fuzzy hash value of the attached file according to the attachment file settings 608, the FQDN according to the URL settings 610, and the sender's e-mail address together with the full hash value of that address according to the sender settings 604.

また、各クライアント端末１０２に本発明を実装している場合などでは、類似メールが一定以上溜まらない可能性があるため、経路上のメールサーバ等に共有することで、サンプル数が増え、判定精度が向上する効果が期待できる。 In addition, when the present invention is implemented on each client terminal 102, for example, similar mails may not accumulate beyond a certain number on any single terminal. Sharing the information with a mail server or the like on the route increases the number of samples and can be expected to improve determination accuracy.

また、他企業、他組織または開発元と共有することで、セキュリティ企業がＷｅｂ等で注意喚起をする以上の効果が期待できる。 In addition, by sharing with other companies, other organizations, or developers, it is possible to expect a greater effect than security companies alerting users on the Web or the like.

尚、ステップＳ４０６の処理を行う前に、ステップＳ４０６以降の処理を一時的に待機する待機時間（例えば、短めに設定した場合は、１０分、長めに設定した場合は、1時間程度）を設けることも可能である。 Before performing the processing of step S406, a waiting time (for example, 10 minutes if set short, about 1 hour if set long) is provided to temporarily wait for the processing after step S406. is also possible.

類似する迷惑メールは、最初の一件目は類似メールが存在しないため必ず送信される。待機時間を設けることで、一定以上の件数が溜まるまでの時間的猶予ができ、最初の一件目を含めて検知することができる。また、類似する迷惑メールは短期間に送られてくるため、短い遅延時間でも効果が期待できる。 The first of a series of similar junk mails is always delivered, because no similar mail exists yet when it arrives. Providing a waiting time allows a certain number of mails to accumulate, so that the series can be detected including the first mail. Because similar junk mails arrive within a short period, even a short delay can be expected to be effective.

また、この待機時間は、送信元情報ごとに設定を変更してもよい。このような迷惑メールは、初めて受信する送信元から送信されることが多い。例えば、初めて受信する送信元に関してのみ待機時間を設けるなどがある。これにより、全ての電子メールを遅延させなくても、遅延時間を設ける効果が期待できる。 The waiting time may also be set differently for each piece of sender information. Such junk mail is often sent from a sender from which mail is being received for the first time. For example, the waiting time may be applied only to such first-time senders. The benefit of a delay can then be expected without delaying every e-mail.
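
A minimal sketch of the per-sender waiting time described above follows. The function name `release_time`, the `known_senders` set (standing in for senders already present in the mail information table 700), and the 10-minute default are assumptions chosen to mirror the text.

```python
from datetime import datetime, timedelta


def release_time(sender: str, received_at: datetime, known_senders: set[str],
                 hold: timedelta = timedelta(minutes=10)) -> datetime:
    """Return the earliest time at which the similar-mail checks should run.

    Mail from a sender seen before is processed immediately; mail from a
    first-time sender is held so that similar mails can accumulate.
    """
    if sender in known_senders:
        return received_at
    return received_at + hold
```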

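また、ステップＳ４１２において各類似メールの判定を行った結果に基づいて、迷惑メールの判定を行っているが、判定精度を上げるために、受信実績や送信実績を類似メールの判定に利用してもよい。 In step S412, junk mail is determined based on the results of the similar-mail determinations. To raise the determination accuracy, the reception record and the transmission record may also be used in the similar-mail determination.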

類似する迷惑メールの多くは、初めて受信する送信元であることが多い。また、同じ送信元が利用される場合でも、一方的に受信しているのみで送信実績がない場合もある。 Similar junk mails mostly come from senders from which mail is being received for the first time. Even when the same sender is reused, mail may only have been received from that sender, with no record of mail ever having been sent to it.

利用法としては、例えば、受信実績がない、つまり、初めて受信する送信元（メール情報テーブル７００に存在しない送信元）のみ各類似メールの判定を行う。 As one usage, for example, the similar-mail determinations are performed only for senders with no reception record, that is, senders from which mail is being received for the first time (senders that do not exist in the mail information table 700).

あるいは、受信実績はあるものの送信実績のない送信元のみ各類似メールの判定を行う。この場合、送信実績として、送信した電子メールの送信元に関する情報を含んだ電子メールに関する情報を記憶しておく必要がある。 Alternatively, each similar mail is judged only from a sender who has a record of reception but has no record of transmission. In this case, it is necessary to store information on e-mails including information on the senders of sent e-mails as the transmission record.

また、受信実績や送信実績を本文に関する類似メール判定、添付ファイルに関する類似メール判定、ＵＲＬに関する類似メール判定のそれぞれで利用してもよい。 Also, the reception record and the transmission record may be used for each of the similar mail determination regarding the text, the similar mail determination regarding the attached file, and the similar mail determination regarding the URL.

例えば、受信実績がある電子メールに対して、本文は類似メール判定を行うが、添付ファイルやＵＲＬの類似メール判定を行わず、受信実績がない電子メールに対しては、本文、添付ファイル、ＵＲＬの類似メール判定を行うことなどとすることも可能である。 For example, for an e-mail from a sender with a reception record, the similar-mail determination may be performed on the text but not on attached files or URLs, while for an e-mail from a sender with no reception record, the similar-mail determination may be performed on the text, attached files, and URLs.

これにより、受信実績がある電子メールの添付ファイルやＵＲＬは、危険性が少ないが、なりすましメールのように、受信実績がないにも関わらず、本文をコピーして、危険な添付ファイルやＵＲＬがある電子メールを検知することも可能である。 Attached files and URLs in e-mails from senders with a reception record carry little risk. With this approach it is still possible to detect e-mails, such as spoofed mail, that come from a sender with no reception record, copy a legitimate text, and carry a dangerous attached file or URL.

設定には、類似する電子メールが何件以上あった場合に迷惑メールとして判定するかの閾値や送信、受信実績を利用するかなどがあり、状況に応じて設定を追加する。 The settings include a threshold value for judging the number of similar e-mails as junk e-mails, whether to use transmission and reception records, and other settings depending on the situation.

これによって、スパムボットに代表されるような同一のテンプレートを用いて、送信元、送信先を変えながら送信される迷惑メール（類似メール）を自動で検出し、受信を抑制することができる。 This makes it possible to automatically detect unsolicited e-mails (similar e-mails) that are sent to different senders and destinations using the same template, as represented by spambots, and suppress their reception.

また、事前にデータを用意して機械学習する必要がなく、頻繁に更新される迷惑メールに対して、迅速に対応でき、頻繁に更新される迷惑メールに対して、都度ユーザが設定を追加する必要がなく、煩わしさを軽減できる。 In addition, there is no need to prepare data in advance for machine learning, so frequently updated junk mail can be handled quickly. The user also does not need to add settings each time the junk mail changes, which reduces the burden on the user.

また、受信側のみで対策を行っているため、送信側のメールサーバの仕様等に影響されずに対策を行うことが可能である。 In addition, since countermeasures are taken only on the receiving side, they can be applied without being affected by the specifications of the mail server on the sending side.

［変形例］ [Modification]

図８は、既存の技術と組合せることによって、迷惑メールと判定した電子メールに対して、セキュリティ精度を高めるシステムの構成図の一例である。 FIG. 8 is an example of a configuration diagram of a system that enhances security accuracy for e-mails determined to be junk e-mails by combining with existing technology.

この既存の技術との組合せの構成図は、メール受信処理８００、既存技術による処理８０２、前述した迷惑メール判定処理８０４、及びメール転送処理８０６からなる。 The block diagram of this combination with existing technology consists of mail reception processing 800, processing 802 by existing technology, junk mail judgment processing 804 and mail forwarding processing 806 described above.

メール受信処理８００は、外部メールサーバ１０５から電子メールを受け取る処理を行い、既存の技術による処理８０２は、スパムメールやマルウェア付き電子メールを、既存の技術であるスパム対策ソフトやアンチウイルスソフトにより検出する処理である。 Mail reception processing 800 performs processing for receiving e-mails from the external mail server 105, and processing 802 using existing technology detects spam e-mails and e-mails containing malware using anti-spam software and anti-virus software, which are existing technologies. It is a process to

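迷惑メール判定処理８０４は、実施形態で説明した迷惑メールを判定する処理であり、メール転送処理８０６は、電子メールに対しての判定処理等が完了した後に、次のメールサーバ１０１やクライアント端末１０２等に当該電子メールを転送する処理である。 The junk mail determination processing 804 is the processing for determining junk mail described in the embodiment. The mail forwarding processing 806 forwards the e-mail to the next mail server 101, a client terminal 102, or the like after the determination processing for the e-mail has been completed.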

迷惑メール判定処理８０４は、既存の技術よりも先に処理を行ってもよいし、また、同一のメールサーバ１０１で処理を行わなくてもよい。 The junk mail determination processing 804 may be performed before the processing by the existing technology, and the two need not be performed on the same mail server 101.

迷惑メール判定処理８０４は既存の技術と競合しない処理であるため、組み合わせて利用することが可能であり、組合せて利用することで、迷惑メールの検知率が向上する。 Since the unsolicited e-mail determination process 804 is a process that does not compete with existing techniques, it can be used in combination, and by using it in combination, the unsolicited e-mail detection rate is improved.

次に図９のフローチャートを用いて、本発明の実施形態における迷惑メールを再利用する処理について説明する。尚、本処理は、メールサーバ１０１のＣＰＵ２０１が所定の制御プログラムを読み出して実行される。 Next, processing for reusing unsolicited junk e-mails according to the embodiment of the present invention will be described with reference to the flowchart of FIG. This processing is executed by the CPU 201 of the mail server 101 reading out a predetermined control program.

本発明では、自動で新規の迷惑メールを検知することが可能であるため、検出した電子メールのデータは、既存のフィルタリングやスパム対策ソフト、アンチウイルスソフトなどに流用することができる。 In the present invention, it is possible to automatically detect new unsolicited e-mails, so the detected e-mail data can be used for existing filtering, anti-spam software, anti-virus software, and the like.

また、機械学習の学習データやセキュリティ訓練などで使用する教材としても流用することができる。これにより、本発明以外の技術の性能を向上させる効果がある。 It can also be used as learning data for machine learning and teaching materials used in security training. This has the effect of improving the performance of techniques other than the present invention.

ステップＳ９００では、再利用部３１２は、迷惑メールとして判定した電子メールの送信元情報に関するデータを再利用する。 In step S900, the reuse unit 312 reuses the data regarding the sender information of the e-mail determined as spam.

送信元情報は、他の迷惑メールの送信にも利用される可能性があるため、迷惑メールの送信元情報を、電子メールの送受信に係るフィルタリングやブラックリスト設定に登録するなどにより、迷惑メールの抑制が可能になる。 Since the sender information may also be used to send other spam emails, it is important to prevent spam emails by registering the sender information of spam emails in filtering or blacklist settings related to sending and receiving emails. suppression becomes possible.

送信元情報には、メールアドレス、メールサーバのＩＰアドレス、あるいは、配信経路のメールサーバの情報などが利用できる。 A mail address, an IP address of a mail server, information on a mail server on a delivery route, or the like can be used as the source information.

ステップＳ９０２では、再利用部３１２は、迷惑メールとして判定した電子メールの送信先情報に関するデータを再利用する。 In step S902, the reuse unit 312 reuses the data regarding the destination information of the e-mail determined as spam.

迷惑メールには、日常的には利用しない送信先情報の組合せが利用されることがある。例えば、ＴＯに製品Ａのサポート窓口と企業の採用の窓口、ＣＣにＷｅｂサイトに関する問い合わせ窓口と個人宛等のように関連性の少ないメールアドレスを複数指定している場合がある。 Unsolicited mail may use a combination of destination information that is not used on a daily basis. For example, there are cases in which a plurality of e-mail addresses with little relevance are specified, such as a support contact for product A and a company hiring contact for TO, and an inquiry contact for a website and an individual address for CC.

このような送信先情報の組合せは、送信先のリストとして使用され、再度同様の送信先情報の組合せに対して迷惑メールが送信される可能性がある。 Such a combination of destination information is used as a list of destinations, and there is a possibility that unsolicited mail may be sent to the same combination of destination information again.

そのため、この送信先情報の組合せを再利用することが可能である。例えば、この送信先情報の組合せに対して送信される電子メールに警告文を追加するなどの利用ができる。 Therefore, it is possible to reuse this combination of destination information. For example, it can be used to add a warning text to the e-mail sent for this combination of destination information.

また、送信実績及び受信実績を利用して、本発明により検出した送信先情報の組合せが、実績のない組合せであった場合に、その送信先情報の組合せを電子メールの送受信に係るフィルタリングやブラックリストに登録することなどもできる。 In addition, by using the transmission record and the reception record, if the combination of destination information detected by the present invention is a combination that has no track record, the combination of destination information is filtered or blacklisted for sending and receiving e-mails. You can also register on the list.

ステップＳ９０４では、再利用部３１２は、迷惑メールとして判定した電子メールの本文情報に関するデータを再利用する。 In step S904, the reuse unit 312 reuses the data regarding the text information of the e-mail determined as junk mail.

迷惑メールは、図４のステップＳ４１６においてデータの共有を行っているが、既存のフィルタリングやスパム対策ソフトにも迷惑メールとして判定した電子メールの本文情報をインプットすることにより、このような本文情報を含む新たな電子メールを特定することが可能となる。 Spam e-mail data is shared in step S416 of FIG. It becomes possible to specify new e-mails that contain.

ステップＳ９０６では、再利用部３１２は、迷惑メールとして判定した電子メールの添付ファイルの情報に関するデータを再利用する。 In step S906, the reuse unit 312 reuses the data related to the attached file information of the e-mail determined as spam.

迷惑メールは、図４のステップＳ４１６においてデータの共有を行っているが、既存のフィルタリング、スパム対策ソフト、または、アンチウイルスソフトなどにも迷惑メールとして判定した電子メールの添付ファイルの情報をインプットすることにより、このような添付ファイルの情報を含む新たな電子メールや通信を特定することが可能となる。 For spam mail, data is shared in step S416 of FIG. 4, but the information of the attached file of the email determined as spam is also input to existing filtering, anti-spam software, or anti-virus software. This makes it possible to identify new e-mails and communications that include such attached file information.

これにより、メール経由以外にＷｅｂ経由で当該ファイルが利用された場合でも検知することができる。 As a result, it is possible to detect even if the file is used via the Web instead of via e-mail.

ステップＳ９０８では、再利用部３１２は、迷惑メールとして判定した電子メールのＵＲＬに関するデータを再利用する。 In step S908, the reuse unit 312 reuses the data regarding the URL of the e-mail determined as junk mail.

本発明で検出した迷惑メールは、図４のステップＳ４１６においてデータの共有を行っているが、既存のフィルタリング、スパム対策ソフト、アンチウイルスソフト、ＵＴＭ（ＵｎｉｆｉｅｄＴｈｒｅａｔＭａｎａｇｅｍｅｎｔ）などにも迷惑メールとして判定したＵＲＬに関する情報をインプットすることにより、このようなＵＲＬに関する情報を含む新たな電子メールや通信を特定することが可能となる。 Spam emails detected by the present invention are shared as data in step S416 of FIG. By inputting information about URLs, it becomes possible to identify new e-mails and communications that contain information about such URLs.

ステップＳ９１０では、再利用部３１２は、迷惑メールとして判定した電子メールの本文、添付ファイル、ＵＲＬ等などからシグネチャを作成する。 In step S910, the reuse unit 312 creates a signature from the text, attached file, URL, etc. of the e-mail determined as spam.

シグネチャの例として、ハッシュ値または、データの一部を何か所か抽出し、パターンマッチングを行う。例えば、本文ならば２、５、７行から１０文字ずつ抽出し、抽出した文字列をシグネチャとする。そして、抽出した文字列全てを含む電子メールを迷惑メールとして判定する。 As an example of a signature, a hash value or a part of data is extracted from several places and pattern matching is performed. For example, in the case of the text, 10 characters are extracted from each of lines 2, 5, and 7, and the extracted character strings are used as signatures. Then, an e-mail containing all of the extracted character strings is determined as an unsolicited e-mail.

あるいは、ファイルならば、ｎバイト～ｎ＋１００バイト、ｍバイト～ｍ＋１００バイトを抽出しシグネチャとする。そして、抽出したバイト列を全て含むファイルが添付された電子メールを迷惑メールとして判定する。 Alternatively, in the case of a file, n bytes to n+100 bytes and m bytes to m+100 bytes are extracted as signatures. Then, an e-mail attached with a file containing all of the extracted byte strings is determined as an unsolicited e-mail.
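
The signature examples above (ten characters from lines 2, 5, and 7 of a text, or 100-byte slices of a file) can be sketched as follows. The function names and the choice of offsets are illustrative assumptions; the source only specifies that a mail is matched when it contains every extracted fragment.

```python
def make_text_signature(body: str, line_numbers=(2, 5, 7), width: int = 10) -> list[str]:
    """Extract the first `width` characters of the given 1-based lines."""
    lines = body.splitlines()
    fragments = []
    for n in line_numbers:
        if n <= len(lines) and lines[n - 1][:width]:
            fragments.append(lines[n - 1][:width])
    return fragments


def make_byte_signature(data: bytes, offsets, width: int = 100) -> list[bytes]:
    """Extract `width`-byte slices starting at each offset (the n and m in the text)."""
    return [data[o:o + width] for o in offsets if data[o:o + width]]


def matches(signature, content) -> bool:
    """True when every fragment of a non-empty signature occurs in the content."""
    return bool(signature) and all(fragment in content for fragment in signature)
```

Because `matches` only tests containment, a mail whose greeting line or other per-recipient part differs still matches as long as the sampled fragments are unchanged.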

この時、迷惑メールの類似度が高いものから重複部分を抽出してもよい。これにより、ランダム要素を除外することで、判定精度を向上させることが可能となる。 At this time, the overlapping portions may be extracted from junk mails that have a high degree of similarity to one another. Excluding random elements in this way makes it possible to improve determination accuracy.

また、受信時だけでなく、メール送信時にも利用できる。迷惑メールは、スパムボットなどにより送信されることが多いため、送信時にシグネチャで検出した場合は、クライアント端末１０２がスパムボットに感染し、踏み台になっている可能性がある。 In addition, it can be used not only when receiving mail, but also when sending mail. Spam emails are often sent by spambots and the like, so if the signature is detected at the time of sending, the client terminal 102 may be infected by the spambots and become a stepping stone.

スパムボットに感染したクライアント端末１０２は、同一のテンプレートを利用して迷惑メールを送信していることから、同じスパムボットに感染したクライアント端末１０２が複数ある場合、各クライアント端末１０２から同じ内容の電子メールが送信される。 Client terminals 102 infected with spambot use the same template to send unsolicited spam emails. Email is sent.

メールサーバ１０１では、この同じ内容の電子メールが様々なところから受信されることからこのような迷惑メールを検知することが可能となる。 The mail server 101 can detect such unsolicited junk e-mails because e-mails with the same contents are received from various places.

このため、送信時に検出することで、加害者になる可能性を減らすことができ、メールにより感染拡大するような場合は、被害の拡大も抑制できる。 For this reason, by detecting at the time of transmission, it is possible to reduce the possibility of becoming a perpetrator, and in the case where infection spreads due to e-mail, it is possible to suppress the spread of damage.

さらに、踏み台になっているクライアント端末１０２を特定または自動検知することも可能である。 Furthermore, it is also possible to specify or automatically detect the client terminal 102 that is a stepping stone.

自動検知の方法例として、検知した電子メールのＲｅｃｅｉｖｅｄヘッダーから最初に利用した拠点内のメールサーバに対して、検知した電子メールを送信したクライアント端末１０２のＩＰアドレスを問い合わせる。 As an example of an automatic detection method, the mail server within the site that first handled the detected e-mail is identified from its Received headers, and that server is asked for the IP address of the client terminal 102 that sent the detected e-mail.

最初に利用されたメールサーバでは、電子メールの送信ログからクライアント端末のＩＰアドレスを抽出して、問い合わせに返答する。 The mail server used first extracts the IP address of the client terminal from the email transmission log and responds to the inquiry.

これにより、自動で踏み台になっているクライアント端末１０２を自動検知することが可能となる。 This makes it possible to automatically detect the client terminal 102 that is being used as a stepping stone.
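
Identifying the first mail server from the Received headers might look like the sketch below. It relies on the convention that each relay prepends its own Received header, so the last one in the message is the earliest hop. The function name `first_hop_server` and the simple `by <host>` regular expression are assumptions; real Received headers vary and the header can be forged before the first trusted hop. Querying that server's transmission log for the client IP address is outside this sketch.

```python
import re
from email import message_from_string


def first_hop_server(raw_message: str):
    """Return the host named in the 'by' clause of the earliest Received header."""
    received = message_from_string(raw_message).get_all("Received") or []
    if not received:
        return None
    # Relays prepend their header, so the last header is the first hop.
    match = re.search(r"\bby\s+([^\s;]+)", received[-1])
    return match.group(1) if match else None
```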

シグネチャまたは検出したデータは、機械学習の学習データやセキュリティ訓練などで使用する教材としても流用することができる。 Signatures or detected data can also be used as learning data for machine learning or as teaching materials for security training.

機械学習では、学習前に学習用のデータを作成する必要があるが、本発明では自動で迷惑メールを検出できるため、検出したデータをそのまま機械学習の学習データとしても利用することもできる。 In machine learning, it is necessary to create learning data before learning, but since the present invention can automatically detect spam, the detected data can be used as it is as learning data for machine learning.

ステップＳ９１２では、再利用部３１２は、迷惑メールを分類及び分析を行う。 In step S912, the reuse unit 312 classifies and analyzes the spam mail.

分類の方法例として、迷惑メールのｂｏｄｙ０、ｂｏｄｙ１・・・・ａｔｔａｃｈｍｅｎｔ０、ａｔｔａｃｈｍｅｎｔ１、・・・・というようにパートごとに分割し、パートごとにハッシュ値を算出し、各パートのハッシュ値が同一の迷惑メールを抽出し、Ｆｒｏｍアドレスのユニーク数をカウントし閾値(例えば、３件)以上あった迷惑メールを全て抽出し、ハッシュ値ごとにまとめて分類する。尚、さらに、Ｃｏｎｔｅｎｔ－Ｔｙｐｅごとに分類しても良い。 As an example classification method, each junk mail is divided into parts (body0, body1, ..., attachment0, attachment1, ...), and a hash value is calculated for each part. Junk mails that share an identical part hash value are extracted, the number of unique From addresses among them is counted, and all junk mails for which that number is at or above a threshold (for example, three) are extracted and grouped by hash value. The mails may further be classified by Content-Type.
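
The part-hash classification described above can be sketched as follows. The input layout (a From address plus a dictionary of part name to bytes), the use of SHA-256, and the function name `classify` are assumptions for illustration; MIME parsing and Content-Type grouping are omitted.

```python
import hashlib
from collections import defaultdict


def classify(mails, threshold: int = 3):
    """Group junk mails by identical part hash.

    `mails` is an iterable of (from_address, {part_name: payload_bytes}).
    Returns {(part_name, hex_digest): [mail indices]} for every part hash
    that was sent from at least `threshold` unique From addresses.
    """
    senders = defaultdict(set)
    members = defaultdict(list)
    for index, (from_address, parts) in enumerate(mails):
        for name, payload in parts.items():
            key = (name, hashlib.sha256(payload).hexdigest())
            senders[key].add(from_address.lower())
            members[key].append(index)
    return {key: members[key] for key in members if len(senders[key]) >= threshold}
```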

類似性の高い迷惑メールをまとめることで、図１０及び図１１に示す画面の表示内容のように迷惑メールを分類して表示することができる。 By grouping together highly similar unsolicited emails, it is possible to classify and display the unsolicited emails like the display contents of the screens shown in FIGS. 10 and 11 .

また、各画面は、それぞれ設定により表示する内容を変更することができる。またこの画面構成は一例であり、他の画面構成でもよい。 In addition, the contents to be displayed on each screen can be changed by setting. Also, this screen configuration is an example, and other screen configurations may be used.

これにより、多く受信される迷惑メールや最近の傾向がわかり、注意喚起等を行いやすくできる。 As a result, it is possible to understand the spam mails that are frequently received and the recent trends, and to make it easier to call attention.

また、分類を行うことで研究等に利用するうえでの労力を削減することができる。 In addition, classification can reduce the labor required for use in research or the like.

さらに、分析にも利用できる。例えば、検出したメールの送信時間等を利用して、攻撃者の傾向や、迷惑メールの傾向を分析することができる。共通する項目などから攻撃者や攻撃者グループの追跡や特定にも利用できる。 Furthermore, it can be used for analysis. For example, it is possible to analyze the trends of attackers and the trends of spam emails by using the transmission times of detected emails. It can also be used to track and identify attackers and attacker groups based on common items.

また、定期的に送信される迷惑メールを事前予測することも可能である。例えば、「〇月の請求書」、「〇月の発注書」など、日時や季節、イベントごとに送信される迷惑メールは、事前予測し、送られてきた時点で警告文を挿入することなども可能である。 It is also possible to predict in advance spam mails that are periodically sent. For example, anticipate spam emails sent by date, season, event, such as "Invoice of XX month" or "Purchase order of XX month", and insert a warning message when it is sent. is also possible.

図１０には、迷惑メールと判定したメールを分類してリスト表示するリスト画面１０００の構成を示す構成図の一例である。 FIG. 10 is an example of a configuration diagram showing the configuration of a list screen 1000 for classifying and listing emails determined to be junk emails.

リスト画面１０００は、表示設定１００２、分類１００４、及びリスト表示部１００６等を含んで構成されている。 The list screen 1000 includes a display setting 1002, a category 1004, a list display section 1006, and the like.

表示設定１００２は、リスト画面１０００に表示する条件を設定するが、表示設定１００２は、類似項目１００８、類似度１０１０、期間１０１２等を含んで構成されており、他の設定を追加してもよい。 The display setting 1002 sets the conditions for displaying on the list screen 1000. The display setting 1002 includes a similar item 1008, a degree of similarity 1010, a period 1012, etc., and other settings may be added. .

For the similar item 1008, one or more of the following can be selected: e-mails with similar body text, e-mails with similar attached files, e-mails with similar URLs, and the like.

Among the similar e-mails that match the items selected in the similar item 1008, the similarity 1010 restricts the display to e-mails whose similarity is equal to or greater than the specified value, and the period 1012 restricts the display to e-mails received during the specified period.

The categories 1004 show the units into which the e-mails have been classified, and the list display section 1006 lists, by category 1004, the similar e-mails that satisfy the conditions specified for the items set in the display settings 1002.
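The filtering performed for the list display section 1006 can be sketched as follows. The `SimilarMail` record, the item names "body", "attachment", and "url", and the assumption that similarity scores lie in the range 0 to 1 are all hypothetical choices for this example and are not taken from the specification.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class SimilarMail:
    subject: str
    received: date
    category: str
    # Assumed layout: item name ("body", "attachment", "url") -> score in [0, 1].
    similarity: dict


def build_list(mails, items, min_similarity, start, end):
    """Group mails by category, keeping those that meet the display settings."""
    grouped = defaultdict(list)
    for m in mails:
        in_period = start <= m.received <= end
        # A mail qualifies if any selected item reaches the threshold.
        similar_enough = any(m.similarity.get(i, 0.0) >= min_similarity for i in items)
        if in_period and similar_enough:
            grouped[m.category].append(m)
    return dict(grouped)


if __name__ == "__main__":
    mails = [
        SimilarMail("a", date(2019, 3, 1), "A", {"body": 0.9}),
        SimilarMail("b", date(2019, 3, 2), "A", {"body": 0.5, "url": 0.95}),
        SimilarMail("c", date(2019, 2, 1), "B", {"body": 0.99}),
    ]
    shown = build_list(mails, ["body"], 0.8, date(2019, 3, 1), date(2019, 3, 31))
    for category, members in shown.items():
        print(category, [m.subject for m in members])
```

Here a mail is listed when at least one selected item meets the threshold; requiring all selected items to meet it would be an equally plausible reading, since the text does not say which applies.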

FIG. 11 shows an example of the configuration of a detail screen 1100 that classifies e-mails determined to be junk mail and displays detailed information about them.

The detail screen 1100 includes display settings 1102, categories 1104, a detailed information display section 1106, and the like.

The display settings 1102 set the conditions for what is displayed on the detail screen 1100. They include a similar item 1108, a similarity 1110, a period 1112, and the like, and other settings may be added.

For the similar item 1108, one or more of the following can be selected: e-mails with similar body text, e-mails with similar attached files, e-mails with similar URLs, and the like.

Among the similar e-mails that match the items selected in the similar item 1108, the similarity 1110 restricts the display to e-mails whose similarity is equal to or greater than the specified value, and the period 1112 restricts the display to e-mails received during the specified period.

The categories 1104 show the units into which the e-mails have been classified, and the detailed information display section 1106 displays, by category 1104, the similar e-mails that satisfy the conditions specified for the items set in the display settings 1102.

On the detail screen 1100, the detailed information display section 1106 may split the display into one unit per category. For example, it may display only category A and then, when a Next button (not shown) is pressed, display only category B.

The detailed information display section 1106 may also display the detailed information of a single e-mail at a time. For example, it may display the detailed information of one e-mail belonging to category A and, when the Next button (not shown) is pressed, display the detailed information of another e-mail belonging to category A if one exists, or of an e-mail belonging to category B if no other e-mail belongs to category A.
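The one-e-mail-at-a-time behavior of the Next button can be sketched as a simple traversal. The function name, the dictionary layout, and the alphabetical ordering of categories are assumptions for illustration only; the specification says only that category B follows once category A is exhausted.

```python
def detail_sequence(mails_by_category):
    """Yield (category, mail) pairs in the order the Next button would show them.

    All e-mails of one category are shown before moving to the next category.
    Categories are visited in sorted order, which is an assumption.
    """
    for category in sorted(mails_by_category):
        for mail in mails_by_category[category]:
            yield category, mail


if __name__ == "__main__":
    # Hypothetical classified e-mails, identified here by short labels.
    classified = {"B": ["b1"], "A": ["a1", "a2"]}
    viewer = detail_sequence(classified)
    print(next(viewer))  # first e-mail shown
    print(next(viewer))  # shown after pressing Next once
    print(next(viewer))  # shown after pressing Next again
```

Using a generator keeps the current position implicit, so each press of the Next button corresponds to one call to `next()`.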

Selecting a category 1004 may display the e-mails belonging to the selected category on the detail screen 1100, and selecting a record displayed in the list display section 1006 may display the detailed information of that record's e-mail on the detail screen 1100.

In the embodiment of the present invention, the process of determining junk mail has been described as being executed on the mail server 101, but it may instead be executed on the client terminal 102.

Alternatively, the process of determining junk mail may be executed by an information processing device in a cloud environment. In this case, it may be executed as an external program with respect to the mail server 101 (or the mailbox).

Although the embodiments have been described above, the present invention can be embodied as, for example, a system, a device, a method, a program, or a recording medium. Specifically, it may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device.

The program of the present invention is a program that enables a computer to execute the processing methods of the flowcharts shown in FIGS. 4 and 9, and the storage medium of the present invention stores a program that enables a computer to execute the processing methods of FIGS. 4 and 9. The program of the present invention may be a separate program for each processing method of each device in FIGS. 4 and 9.

As described above, it goes without saying that the object of the present invention is also achieved by supplying a system or device with a recording medium on which a program realizing the functions of the above-described embodiments is recorded, and having the computer (or CPU or MPU) of that system or device read and execute the program stored on the recording medium.

In this case, the program itself, read from the recording medium, realizes the novel functions of the present invention, and the recording medium on which the program is recorded constitutes the present invention.

Recording media that can be used to supply the program include, for example, flexible disks, hard disks, optical disks, magneto-optical disks, CD-ROMs, CD-Rs, DVD-ROMs, magnetic tapes, non-volatile memory cards, ROMs, EEPROMs, and silicon disks.

It also goes without saying that the invention covers not only the case where the functions of the above-described embodiments are realized by the computer executing the program it has read, but also the case where an OS (operating system) or the like running on the computer performs part or all of the actual processing based on the instructions of the program, and the functions of the above-described embodiments are realized by that processing.

Furthermore, it goes without saying that the invention covers the case where the program read from the recording medium is written to a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on that board or unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiments are realized by that processing.

The present invention may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device. It goes without saying that the present invention is also applicable where it is achieved by supplying a program to a system or device. In this case, the system or device can enjoy the effects of the present invention by reading a recording medium that stores a program for achieving the present invention.

Furthermore, the system or device can enjoy the effects of the present invention by using a communication program to download and read the program for achieving the present invention from a server, database, or the like on a network. All configurations obtained by combining the above-described embodiments and their modifications are also included in the present invention.

100 Information processing system
101 Mail server
102 Client terminal
103 LAN
104 Wide area network
105 External mail server

Claims

A program for causing a computer to function as:
receiving means for receiving e-mails;
identifying means for identifying, from among the e-mails received by the receiving means, identical or similar e-mails sent from different senders; and
generating means for generating learning data for discriminating junk mail, using information related to the identified e-mails as correct-answer data.

2. The program according to claim 1, wherein the generating means generates the learning data for discriminating junk mail based on information relating to at least one of the body text, an attached file, and a URL of the identified e-mails.

3. The program according to claim 1, wherein the data generated by said generating means is data used in machine learning.

4. The program according to any one of claims 1 to 3, wherein the data generated by said generating means is data for determining whether a received e-mail is junk mail.

An information processing device comprising:
receiving means for receiving e-mails;
identifying means for identifying, from among the e-mails received by the receiving means, identical or similar e-mails sent from different senders; and
generating means for generating learning data for discriminating junk mail, using information related to the identified e-mails as correct-answer data.

A control method for an information processing device, comprising:
a receiving step in which receiving means receives e-mails;
an identifying step in which identifying means identifies, from among the e-mails received in the receiving step, identical or similar e-mails sent from different senders; and
a generating step in which generating means generates learning data for discriminating junk mail, using information related to the identified e-mails as correct-answer data.