JP2011227850A

JP2011227850A - E-mail classification device, e-mail management server, e-mail classification method and e-mail classification program

Info

Publication number: JP2011227850A
Application number: JP2010099506A
Authority: JP
Inventors: Yukiko Sawatani; 雪子澤谷; Masaru Miyake; 優三宅
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2010-04-23
Filing date: 2010-04-23
Publication date: 2011-11-10

Abstract

PROBLEM TO BE SOLVED: To provide an e-mail classification device, an e-mail management server, an e-mail classification method and an e-mail classification program that reduce the processing load for removing spam mail and reduce the operation load on the user. SOLUTION: An e-mail classification device comprises: a mail receiving part 11 that receives e-mail; a feature vector acquisition part 12 that acquires a feature vector showing the features of a received e-mail, based on data showing the ratio of junk mail among e-mails received in the past that fall into the same classification on items of the header information; a rule creation part 14 that creates a classification rule for classifying e-mail, using as learning data the classification information on whether an e-mail is normal mail or junk mail together with the corresponding feature vector; and a classification part 13 that, when a new e-mail is received, refers to the feature vector acquired from that e-mail and classifies it as normal mail or junk mail based on the classification rule.

Description

The present invention relates to an e-mail classification device, an e-mail classification method, and an e-mail classification program for determining whether a received e-mail is junk mail or normal mail.

In recent years, the development of networks has made it easy to send and receive electronic mail (hereinafter simply "mail"), and the number of junk mail messages (spam mail) that recipients do not want has increased accordingly. Here, "spam mail" means mail sent indiscriminately and in bulk, disregarding the recipient's wishes and without prior request or consent.

Such spam mail can be a threat to individuals and organizations alike: it can cause virus infection through attachments and the like, lower recipients' productivity and efficiency as unwanted mail increases, raise the load on servers and networks as traffic increases, and leak personal or confidential information by luring recipients to fraudulent sites.

Various measures have therefore been taken to eliminate such spam mail. For example, there are filters for mail arriving at a mail account on a PC (Personal Computer). Such filters sort mail by sender information (mail address, host information, etc.) and parse the mail body (see, for example, Non-Patent Document 1 and Non-Patent Document 2).

However, applying such a filter to a mobile terminal such as a mobile phone, which is less powerful than a PC, is impractical because the processing load is large. In addition, spam sent to mobile terminals often uses mail addresses leaked from a specific site, so the senders of spam to any one individual are limited. For this reason, filter rules are set on the server that manages the sending and receiving of mail (for example, a server of the company (carrier) that provides the mobile phone communication service) (see, for example, Non-Patent Document 3). Communication service carriers may also accept junk mail reports from users and update the filter rules (see, for example, Non-Patent Document 4).

However, setting filter rules is cumbersome, because the user must register the domains of the mail addresses to be rejected. Filter rules also have little effect when the sender's mail address is spoofed or the mail is sent through many servers. Furthermore, when conditions such as "reject mail from anything other than mobile phones" or "reject mail containing a URL" are set, normal mail that matches the rejection rule can no longer be received.

In addition, junk mail can be reported to a carrier only by report mail sent from a mobile phone of that carrier, and reporting requires an operation by the user, so convenience has been a problem.

The present inventors therefore proposed an e-mail classification technique that reduces the processing load for eliminating spam mail and reduces the user's operation load (specification of Japanese Patent Application No. 2009-242287).

Non-Patent Document 1: SpamAssassin, [October 7, 2009], Internet <http://www.svn.apache.org/repos/asf/spamassassin/branches/3.2/README>
Non-Patent Document 2: TransWARE, [October 7, 2009], Internet <http://www.transware.co.jp/product/ah/filter.html>
Non-Patent Document 3: "Reception/rejection settings", [October 7, 2009], Internet <http://www.nttdocomo.co.jp/info/spam_mail/measure/domain/>
Non-Patent Document 4: "If you have received junk mail", [October 7, 2009], Internet <http://www.nttdocomo.co.jp/info/spam_mail/if/index01.html>

However, to eliminate spam mail efficiently, it is desirable to reduce the processing load and the user's operation load still further.

An object of the present invention is to provide an e-mail classification device, an e-mail management server, an e-mail classification method, and an e-mail classification program that reduce the processing load for eliminating spam mail and reduce the user's operation load.

The present invention provides the following solutions.

(1) An e-mail classification device comprising: a receiving unit that receives e-mail; an acquisition unit that acquires, based on header information of the e-mail received by the receiving unit, a feature vector indicating features of the e-mail; a creation unit that, when classification information indicating whether the e-mail is normal mail or junk mail is accepted, creates a classification rule for classifying e-mail as normal mail or junk mail, using the classification information and the corresponding feature vector as learning data; and a classification unit that, when the receiving unit newly receives an e-mail, refers to the feature vector acquired from that e-mail by the acquisition unit and classifies the e-mail as normal mail or junk mail based on the classification rule created by the creation unit, wherein the acquisition unit acquires the feature vector based on at least one of: data indicating the ratio of junk mail among e-mails received in the past that fall into the same classification based on any one of the sender's name, the sender's mail address, the reply-to mail address, and the subject contained in the header information; and data indicating whether an e-mail in which any one of those items is identical has been received in the past.

With this configuration, the e-mail classification device defines feature quantities from statistical information based on items of the mail header information, acquires a feature vector whose elements are these feature quantities, and creates a classification rule using the classification information and the feature vectors as learning data. The e-mail classification device can thus classify each mail as normal mail or spam mail according to the feature vector acquired from it.

The e-mail classification device therefore expresses the regularities of header information peculiar to spam mail as a feature vector and identifies spam mail without analyzing the mail body, which lowers the processing load compared with analyzing the body. Moreover, because the device classifies mail automatically, the user's operation load is reduced.
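The two kinds of header statistics named in (1), a junk-mail ratio among past mail in the same class and a "seen before" flag, can be sketched as simple counters. This is only an illustration under assumptions: the class name `HeaderStats`, the key format, and the default ratio of 0.5 for a class with no history are hypothetical and do not come from the specification.

```python
from collections import defaultdict


class HeaderStats:
    """Counts of past mail per header-derived key (hypothetical helper)."""

    def __init__(self):
        self.total = defaultdict(int)  # key -> number of past mails
        self.spam = defaultdict(int)   # key -> number of past junk mails

    def record(self, key, is_spam):
        """Register one past mail whose header item maps to `key`."""
        self.total[key] += 1
        if is_spam:
            self.spam[key] += 1

    def spam_ratio(self, key, default=0.5):
        """Ratio of junk mail among past mail with the same key.

        `default` is an assumed value for keys with no history.
        """
        n = self.total.get(key, 0)
        return self.spam.get(key, 0) / n if n else default

    def seen_before(self, key):
        """True if any past mail had exactly this key."""
        return self.total.get(key, 0) > 0
```

A key here could be, for example, `("from_tld", "com")` for a ratio feature or `("subject", "Hello")` for a seen-before feature; only header fields are consulted, never the body.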

(2) The e-mail classification device according to (1), wherein the acquisition unit acquires the feature vector having at least one of the following as an element. Items (a) to (l) are each data indicating the ratio of junk mail among e-mails received in the past that satisfy the stated condition; items (m) to (p) are each data indicating whether an e-mail satisfying the stated condition has been received in the past.
(a) the top-level domain of the sender's mail address is the same;
(b) the top-level domain of the reply-to mail address is the same;
(c) at least part of the account part of the sender's mail address is the same;
(d) at least part of the account part of the reply-to mail address is the same;
(e) the character string length of the account part of the sender's mail address falls into the same class;
(f) the character string length of the host part of the sender's mail address falls into the same class;
(g) the depth of the domain hierarchy in the host part of the sender's mail address falls into the same class;
(h) the character string length of the account part of the reply-to mail address falls into the same class;
(i) the character string length of the host part of the reply-to mail address falls into the same class;
(j) the depth of the domain hierarchy in the host part of the reply-to mail address falls into the same class;
(k) the character string length of the sender's name falls into the same class;
(l) the character string length of the subject falls into the same class;
(m) the sender's name is the same;
(n) the sender's mail address is the same;
(o) the reply-to mail address is the same;
(p) the subject is the same.

With this configuration, the e-mail classification device acquires a feature vector consisting of 16 kinds of data, which are statistical information based on the sender's name, the sender's mail address, the reply-to mail address, and the subject. That is, the device does not analyze the mail body; it generates classification rules by learning from statistical information on mail received in the past, and can thereby extract the characteristics of spam mail on mobile terminals. The device can thus identify spam mail automatically and with high accuracy while reducing the processing load.
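As a rough illustration of how the grouping keys behind the address-based items in (2) might be derived, the sketch below splits a From or Reply-To header value into its display name, account part, and host part using Python's standard `email.utils.parseaddr`. The function names and the length-class width of 5 characters are assumptions made for the example; the specification does not state how string lengths are grouped into classes.

```python
from email.utils import parseaddr


def length_class(text, width=5):
    """Bucket a string length into a class (bucket width is an assumption)."""
    return len(text) // width


def address_keys(header_value):
    """Derive grouping keys from a From/Reply-To header value."""
    name, addr = parseaddr(header_value)
    account, _, host = addr.partition("@")
    labels = host.split(".") if host else []
    return {
        "name": name,
        "address": addr,
        "tld": labels[-1].lower() if labels else "",  # top-level domain
        "account_len_class": length_class(account),
        "host_len_class": length_class(host),
        "domain_depth": len(labels),  # depth of the domain hierarchy
    }


keys = address_keys("Alice <alice99@mail.example.co.jp>")
```

Each resulting key can then be looked up in per-key counts of past mail to obtain a ratio or seen-before element of the feature vector.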

(3) The e-mail classification device according to (1) or (2), wherein the creation unit determines, for each feature vector, a variable value indicating the likelihood that an e-mail from which that feature vector was acquired is junk mail, and sets, as the classification rule, a threshold that divides the variable values into those corresponding to normal mail and those corresponding to junk mail.

With this configuration, the e-mail classification device determines, for each feature vector, a variable value indicating the likelihood of spam mail, and sets a threshold for classifying these variable values. The device can therefore obtain the variable value for a received mail by acquiring its feature vector, and can easily classify the mail as normal mail or spam mail by comparing that value with the threshold.
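One simple way to picture the variable value and threshold in (3) is a lookup table from feature vectors to scores. In the sketch below the score is assumed to be the fraction of junk mail among learning examples that share the same feature vector; this scoring rule, the threshold of 0.5, and the treatment of unknown vectors are illustrative assumptions, not the method the specification prescribes.

```python
from collections import defaultdict


def build_scores(learning_data):
    """Map each feature vector to an assumed junk-mail score.

    learning_data: iterable of (feature_vector_tuple, is_spam) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # vector -> [junk count, total]
    for vector, is_spam in learning_data:
        counts[vector][0] += int(is_spam)
        counts[vector][1] += 1
    return {vec: spam / total for vec, (spam, total) in counts.items()}


def classify(vector, scores, threshold=0.5, unknown_score=0.0):
    """Compare the vector's score with a single threshold."""
    score = scores.get(vector, unknown_score)
    return "spam" if score >= threshold else "normal"


data = [((1, 0), True), ((1, 0), True), ((1, 0), False), ((0, 1), False)]
scores = build_scores(data)
```

Because classification reduces to one dictionary lookup and one comparison, it stays cheap enough for a low-powered terminal.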

(4) The e-mail classification device according to (3), wherein the creation unit sets a first threshold for classifying the variable values as corresponding to normal mail and a second threshold for classifying the variable values as corresponding to junk mail, and the classification unit classifies a newly received e-mail as normal mail, junk mail, or otherwise as held mail.

With this configuration, the e-mail classification device sets the first threshold for identifying normal mail and the second threshold for identifying spam mail separately. The device can therefore identify the mails that are highly likely to be normal mail or spam mail, and suppresses misclassification by classifying the rest as held mail.
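The two-threshold scheme in (4) can be sketched as a three-way decision. The threshold values 0.2 and 0.8 and the convention that a higher variable value means "more likely junk mail" are assumptions chosen for the example.

```python
def classify_three_way(score, normal_max=0.2, spam_min=0.8):
    """Classify a variable value as normal, spam, or hold.

    normal_max: first threshold (values at or below it are normal mail).
    spam_min:   second threshold (values at or above it are junk mail).
    Both defaults are illustrative assumptions.
    """
    if normal_max >= spam_min:
        raise ValueError("first threshold must be below second threshold")
    if score <= normal_max:
        return "normal"
    if score >= spam_min:
        return "spam"
    return "hold"  # neither clearly normal nor clearly junk
```

Widening the gap between the two thresholds sends more mail to the hold class and lowers the chance of a wrong automatic decision, at the cost of more mail left for the user to judge.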

(5) The e-mail classification device according to (3) or (4), wherein, when an input that changes the classification result of an e-mail classified by the classification unit is accepted, the creation unit adjusts the variable value corresponding to that e-mail or adjusts the threshold.

With this configuration, when the user changes an automatic classification result, the e-mail classification device can adjust the variable value of the feature vector or the threshold, and thereby adjust the learned classification rule. The device can therefore relearn from the change input and improve its classification accuracy.

(6) The e-mail classification device according to any one of (3) to (5), wherein the creation unit accepts, as the learning data, the classification information weighted according to the variable value.

With this configuration, the e-mail classification device can weight the learning data according to the variable value indicating the likelihood of spam mail, so highly reliable classification information is given priority and an improvement in classification accuracy can be expected.

(7) The e-mail classification device according to any one of (1) to (6), wherein the creation unit re-creates the classification rule when triggered by a predetermined event.

With this configuration, the e-mail classification device re-creates the classification rule when triggered by a predetermined event, for example at fixed intervals or when the processing load is low. The device can therefore update the classification rule using new mail as learning data.

(8) The e-mail classification device according to any one of (1) to (7), wherein the creation unit creates the classification rule based on e-mails received by the receiving unit during a predetermined period up to the present, and does not refer to e-mails received before that period.

With this configuration, the e-mail classification device learns from mail received during a predetermined period up to the present, so older mail received before that period is excluded and the classification rule is created from recent information. The device can therefore create a highly accurate classification rule that reflects the characteristics of recent spam mail.
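Restricting the learning data to a recent period, as in (8), amounts to a time-window filter. In this sketch the record layout `(received_at, feature_vector, label)` and the 30-day default window are assumptions; the specification only speaks of a "predetermined period".

```python
from datetime import datetime, timedelta


def recent_learning_data(records, now, window=timedelta(days=30)):
    """Keep only examples received within `window` before `now`.

    records: iterable of (received_at, feature_vector, label) tuples.
    Returns (feature_vector, label) pairs for the retained examples.
    """
    cutoff = now - window
    return [
        (vector, label)
        for received_at, vector, label in records
        if cutoff <= received_at <= now
    ]


now = datetime(2010, 4, 23)
records = [
    (datetime(2010, 4, 20), (1, 0), "spam"),    # inside the window
    (datetime(2010, 1, 5), (0, 1), "normal"),   # too old, dropped
]
recent = recent_learning_data(records, now)
```

Rebuilding the rule from `recent` on each retraining trigger keeps the statistics aligned with current spam patterns and bounds the amount of data the terminal has to process.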

(9) The e-mail classification device according to any one of (1) to (8), further comprising a notification unit that notifies a server managing reception of the e-mail of information on e-mails classified as junk mail by the classification unit.

With this configuration, the e-mail classification device notifies the server that manages mail reception (for example, a server of the carrier providing the mobile phone communication service) of information on the mails it has classified as spam mail. The device can therefore report spam mail information to the server automatically and have the server update its filter rules.

(10) An e-mail management server that is connected to a terminal and manages e-mail addressed to the terminal, comprising: a receiving unit that receives the e-mail; a transfer unit that transfers header information of the e-mail received by the receiving unit to the terminal and, in response to a request from the terminal, transfers the body of the e-mail to the terminal; an acquisition unit that acquires, based on the header information of the e-mail received by the receiving unit, a feature vector indicating features of the e-mail; a creation unit that, when classification information indicating whether the e-mail is normal mail or junk mail is accepted, creates a classification rule for classifying e-mail as normal mail or junk mail, using the classification information and the corresponding feature vector as learning data; and a transmission unit that transmits the classification rule created by the creation unit to the terminal at a predetermined timing, wherein the acquisition unit acquires the feature vector based on at least one of: data indicating the ratio of junk mail among e-mails received in the past that fall into the same classification based on any one of the sender's name, the sender's mail address, the reply-to mail address, and the subject contained in the header information; and data indicating whether an e-mail in which any one of those items is identical has been received in the past.

With this configuration, the e-mail management server defines feature quantities from statistical information based on items of the mail header information, acquires a feature vector whose elements are these feature quantities, and creates a classification rule using the classification information and the feature vectors as learning data. The terminal to which the e-mail is addressed can thus classify each mail as normal mail or spam mail according to the feature vector acquired from it.

The e-mail management server therefore expresses the regularities of header information peculiar to spam mail as a feature vector, and the terminal identifies spam mail without analyzing the mail body, which lowers the processing load compared with analyzing the body. Moreover, because the terminal classifies mail automatically, the user's operation load is reduced.

(11) The e-mail management server according to (10), wherein the creation unit accepts classification information indicating that the e-mail is normal mail when the body of the e-mail is transferred to the terminal within a predetermined period after the e-mail is received, and accepts classification information indicating that the e-mail is junk mail when the body of the e-mail is not transferred to the terminal within that period.

With this configuration, the e-mail management server can learn the classification rule using, as classification information, whether the terminal downloaded the mail body. The judgment made by the terminal's user thus serves as teacher data, which improves the accuracy of the classification rule.
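The implicit labeling rule of (11) can be sketched as a small function on the server side. The function name, the three-day grace period, and the choice to return `None` while the period is still running are assumptions for illustration; the specification only refers to a "predetermined period".

```python
from datetime import datetime, timedelta


def infer_label(received_at, body_fetched_at, now, grace=timedelta(days=3)):
    """Infer classification information from body-download behavior.

    received_at:     when the server received the mail.
    body_fetched_at: when the terminal requested the body, or None.
    Returns "normal", "spam", or None if the period has not yet elapsed.
    """
    deadline = received_at + grace
    if body_fetched_at is not None and body_fetched_at <= deadline:
        return "normal"  # the user chose to download the body in time
    if now >= deadline:
        return "spam"    # period elapsed without a download
    return None          # too early to decide


received = datetime(2010, 4, 20, 9, 0)
label = infer_label(received, datetime(2010, 4, 20, 12, 0), datetime(2010, 4, 21))
```

The returned label, paired with the feature vector of the same mail, forms one learning example without any explicit report from the user.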

(12) An e-mail classification method in which a computer executes: a receiving step of receiving e-mail; an acquisition step of acquiring, based on header information of the e-mail received in the receiving step, a feature vector indicating features of the e-mail; a creation step of, when classification information indicating whether the e-mail is normal mail or junk mail is accepted, creating a classification rule for classifying e-mail as normal mail or junk mail, using the classification information and the corresponding feature vector as learning data; and a classification step of, when an e-mail is newly received in the receiving step, referring to the feature vector acquired from that e-mail in the acquisition step and classifying the e-mail as normal mail or junk mail based on the classification rule created in the creation step, wherein in the acquisition step the feature vector is acquired based on at least one of: data indicating the ratio of junk mail among e-mails received in the past that fall into the same classification based on any one of the sender's name, the sender's mail address, the reply-to mail address, and the subject contained in the header information; and data indicating whether an e-mail in which any one of those items is identical has been received in the past.

With this configuration, the same effects as in (1) can be expected when a computer executes the e-mail classification method.

(13) An e-mail classification program that causes a computer to execute: a receiving step of receiving e-mail; an acquisition step of acquiring, based on header information of the e-mail received in the receiving step, a feature vector indicating features of the e-mail; a creation step of, when classification information indicating whether the e-mail is normal mail or junk mail is accepted, creating a classification rule for classifying e-mail as normal mail or junk mail, using the classification information and the corresponding feature vector as learning data; and a classification step of, when an e-mail is newly received in the receiving step, referring to the feature vector acquired from that e-mail in the acquisition step and classifying the e-mail as normal mail or junk mail based on the classification rule created in the creation step, wherein in the acquisition step the program causes the feature vector to be acquired based on at least one of: data indicating the ratio of junk mail among e-mails received in the past that fall into the same classification based on any one of the sender's name, the sender's mail address, the reply-to mail address, and the subject contained in the header information; and data indicating whether an e-mail in which any one of those items is identical has been received in the past.

With this configuration, the same effects as in (1) can be expected by causing a computer to execute the e-mail classification program.

According to the present invention, the processing load for eliminating spam mail can be reduced, and the user's operation load can be reduced.

A diagram showing the functional configuration of the mobile terminal according to the first embodiment.
A diagram showing a first example of the classification rule according to the first embodiment.
A diagram showing a second example of the classification rule according to the first embodiment.
A flowchart showing the flow of processing that accompanies mail reception according to the first embodiment.
A flowchart showing the flow of processing for creating the classification rule according to the first embodiment.
A diagram showing the functional configuration of the management server according to the second embodiment.

<First Embodiment>
A first embodiment, which is one example of an embodiment of the present invention, is described below. The mobile terminal 1 (e-mail classification device) according to this embodiment is a device that, in the series of processes for sending and receiving mail, determines whether a mail is junk mail (hereinafter, spam mail) at the point where the header information has been received, that is, before the body is received. The mobile terminal 1 is a terminal that supports the wireless communication service of a given carrier, such as a mobile phone or a PHS.

FIG. 1 is a diagram showing the functional configuration of the mobile terminal 1 according to this embodiment.
The mobile terminal 1 includes a control unit 10, a storage unit 20, a communication unit 30, an input unit 40, and a display unit 50.

The control unit 10 controls the mobile terminal 1 as a whole. By reading and executing the various programs stored in the storage unit 20 as needed, it works together with the hardware described above to realize the various functions of this embodiment. The control unit 10 may be a CPU (Central Processing Unit). The functions of the units included in the control unit 10 are described later.

記憶部２０は、ハードウェア群を携帯端末１として機能させるための各種プログラム、本実施形態の各種機能を制御部１０に実行させるプログラム、及び各種データベース等を記憶する。なお、記憶部２０が備える各種データベースは後述する。 The storage unit 20 stores various programs for causing the hardware group to function as the mobile terminal 1, programs for causing the control unit 10 to execute various functions of the present embodiment, various databases, and the like. Various databases included in the storage unit 20 will be described later.

通信部３０は、所定の周波数帯（例えば、２ＧＨｚ帯や８００ＭＨｚ帯等）で外部装置（例えば、基地局を介してメールの送受信を管理するサーバ）と通信を行う。そして、通信部３０は、アンテナより受信した信号を復調処理し、処理後の信号を制御部１０に供給し、また、制御部１０から供給された信号を変調処理し、アンテナから外部装置に送信する。 The communication unit 30 communicates with an external device (for example, a server that manages transmission / reception of mail via a base station) in a predetermined frequency band (for example, 2 GHz band, 800 MHz band, etc.). The communication unit 30 demodulates the signal received from the antenna, supplies the processed signal to the control unit 10, modulates the signal supplied from the control unit 10, and transmits the signal from the antenna to the external device. To do.

入力部４０は、携帯端末１に対するユーザからの指示入力を受け付けるインタフェース装置である。入力部４０は、例えばキー操作部やタッチパネルにより構成される。 The input unit 40 is an interface device that receives an instruction input from the user to the mobile terminal 1. The input unit 40 is configured by, for example, a key operation unit or a touch panel.

表示部５０は、ユーザにデータの入力を受け付ける画面を表示したり、携帯端末１による処理結果の画面を表示したりするものである。ユーザは、表示部５０に表示された画面により、受信メールを確認する。表示部５０は、液晶ディスプレイや有機ＥＬディスプレイであってよい。 The display unit 50 displays a screen for accepting data input to the user, or displays a screen of a processing result by the mobile terminal 1. The user confirms the received mail on the screen displayed on the display unit 50. The display unit 50 may be a liquid crystal display or an organic EL display.

The control unit 10 described above includes a mail reception unit 11 (reception unit), a feature vector acquisition unit 12 (acquisition unit), a classification unit 13, a rule creation unit 14 (creation unit), a data registration unit 15, and a notification unit 16. The storage unit 20 includes a rule DB (database) 21 and a mail DB 22.

The mail reception unit 11 receives, via the communication unit 30, mail addressed to the mail address of the user of the mobile terminal 1.

The feature vector acquisition unit 12 acquires a feature vector indicating the features of a mail based on the header information of the mail received by the mail reception unit 11. The header information includes the sender's name, the sender's mail address, the reply-to mail address, and the subject. The feature vector acquisition unit 12 acquires the feature vector based on at least one of the following: data indicating the proportion of spam mail among previously received mails that fall into the same classification with respect to some item of the header information, and data indicating whether a mail with an identical value for some item has been received in the past. Specifically, the feature vector is a vector whose elements are at least one of the following 16 types of data, (a) to (p).

(a) Data indicating the proportion of spam mail among previously received mails whose sender mail address has the same top-level domain.

For example, when the top-level domain of the sender's mail address matches that of previously received mails, this element is real-valued data from 0 to 1 indicating the proportion of those matching mails that were judged in the past to be spam mail.

If the mails with a matching top-level domain amount to less than a predetermined proportion (p1%), the element value is defined as a fixed value (for example, 0). This prevents classification results for mail from rarely seen senders from causing large swings in the classification rule and thereby lowering classification accuracy.

(b) Data indicating the proportion of spam mail among previously received mails whose reply-to mail address has the same top-level domain.

For example, when the top-level domain of the reply-to mail address matches that of previously received mails, this element is real-valued data from 0 to 1 indicating the proportion of those matching mails that were judged in the past to be spam mail.

If the mails with a matching top-level domain amount to less than a predetermined proportion (p2%), the element value is defined as a fixed value (for example, 0). This prevents classification results for mail with rarely seen reply-to addresses from causing large swings in the classification rule and thereby lowering classification accuracy.

(c) Data indicating the proportion of spam mail among previously received mails for which at least part of the account part of the sender's mail address is the same.

For example, when at least part of the account part of the sender's mail address matches that of previously received mails, this element is real-valued data from 0 to 1 indicating the proportion of those matching mails that were judged in the past to be spam mail.

Here, the account part is the character string that precedes the "@" mark in a mail address. The feature vector acquisition unit 12 may judge a match or mismatch with previously received mail by the leading character string obtained when the account part is split at a predetermined delimiter (for example, a hyphen, dot, pipe, or underscore). As a result, mail addresses that are likely to belong to the same sender, such as "mail-75@example.com" and "mail-05@example.com", can be treated as identical for statistical processing.

If the matching mails amount to less than a predetermined proportion (p3%), the element value is defined as a fixed value (for example, 0). This prevents classification results for mail from rarely seen senders from causing large swings in the classification rule and thereby lowering classification accuracy.

(d) Data indicating the proportion of spam mail among previously received mails for which at least part of the account part of the reply-to mail address is the same.

For example, when at least part of the account part of the reply-to mail address matches that of previously received mails, this element is real-valued data from 0 to 1 indicating the proportion of those matching mails that were judged in the past to be spam mail.

As with element (c), the feature vector acquisition unit 12 may judge a match or mismatch with previously received mail by the leading character string obtained when the account part is split at a predetermined delimiter (for example, a hyphen, dot, pipe, or underscore).

If the matching mails amount to less than a predetermined proportion (p4%), the element value is defined as a fixed value (for example, 0). This prevents classification results for mail with rarely seen reply-to addresses from causing large swings in the classification rule and thereby lowering classification accuracy.

(e) Data indicating the proportion of spam mail among previously received mails for which the character string length of the account part of the sender's mail address falls into the same classification.

For example, when the character string length is greater than a predetermined threshold (i1), this element is real-valued data from 0 to 1 indicating the proportion, among previously received mails whose character string length likewise exceeds the threshold (i1), that were judged in the past to be spam mail. When the character string length is less than or equal to the threshold (i1), it is real-valued data from 0 to 1 indicating the proportion, among previously received mails whose character string length is likewise less than or equal to the threshold (i1), that were judged in the past to be spam mail.

(f) Data indicating the proportion of spam mail among previously received mails for which the character string length of the host part of the sender's mail address falls into the same classification.

Here, the host part is the character string that follows the "@" mark in a mail address.
For example, when the character string length is greater than a predetermined threshold (i2), this element is real-valued data from 0 to 1 indicating the proportion, among previously received mails whose character string length likewise exceeds the threshold (i2), that were judged in the past to be spam mail. When the character string length is less than or equal to the threshold (i2), it is real-valued data from 0 to 1 indicating the proportion, among previously received mails whose character string length is likewise less than or equal to the threshold (i2), that were judged in the past to be spam mail.

(g) Data indicating the proportion of spam mail among previously received mails for which the depth of the domain hierarchy in the host part of the sender's mail address falls into the same classification.

Here, the depth of the domain hierarchy is the number of parts into which the host part is divided by dots ("."). For example, it is 2 for "test@example.com".
For example, when the depth of the domain hierarchy is greater than a predetermined threshold (i3), this element is real-valued data from 0 to 1 indicating the proportion, among previously received mails whose domain hierarchy depth likewise exceeds the threshold (i3), that were judged in the past to be spam mail. When the depth of the domain hierarchy is less than or equal to the threshold (i3), it is real-valued data from 0 to 1 indicating the proportion, among previously received mails whose domain hierarchy depth is likewise less than or equal to the threshold (i3), that were judged in the past to be spam mail.

(h) Data indicating the proportion of spam mail among previously received mails for which the character string length of the account part of the reply-to mail address falls into the same classification.

For example, when the character string length is greater than a predetermined threshold (i4), this element is real-valued data from 0 to 1 indicating the proportion, among previously received mails whose character string length likewise exceeds the threshold (i4), that were judged in the past to be spam mail. When the character string length is less than or equal to the threshold (i4), it is real-valued data from 0 to 1 indicating the proportion, among previously received mails whose character string length is likewise less than or equal to the threshold (i4), that were judged in the past to be spam mail.

(i) Data indicating the proportion of spam mail among previously received mails for which the character string length of the host part of the reply-to mail address falls into the same classification.

For example, when the character string length is greater than a predetermined threshold (i5), this element is real-valued data from 0 to 1 indicating the proportion, among previously received mails whose character string length likewise exceeds the threshold (i5), that were judged in the past to be spam mail. When the character string length is less than or equal to the threshold (i5), it is real-valued data from 0 to 1 indicating the proportion, among previously received mails whose character string length is likewise less than or equal to the threshold (i5), that were judged in the past to be spam mail.

(j) Data indicating the proportion of spam mail among previously received mails for which the depth of the domain hierarchy in the host part of the reply-to mail address falls into the same classification.

For example, when the depth of the domain hierarchy is greater than a predetermined threshold (i6), this element is real-valued data from 0 to 1 indicating the proportion, among previously received mails whose domain hierarchy depth likewise exceeds the threshold (i6), that were judged in the past to be spam mail. When the depth of the domain hierarchy is less than or equal to the threshold (i6), it is real-valued data from 0 to 1 indicating the proportion, among previously received mails whose domain hierarchy depth is likewise less than or equal to the threshold (i6), that were judged in the past to be spam mail.

(k) Data indicating the proportion of spam mail among previously received mails for which the character string length of the sender's name falls into the same classification.

For example, when the character string length is greater than a predetermined threshold (i7), this element is real-valued data from 0 to 1 indicating the proportion, among previously received mails whose character string length likewise exceeds the threshold (i7), that were judged in the past to be spam mail. When the character string length is less than or equal to the threshold (i7), it is real-valued data from 0 to 1 indicating the proportion, among previously received mails whose character string length is likewise less than or equal to the threshold (i7), that were judged in the past to be spam mail.

(l) Data indicating the proportion of spam mail among previously received mails for which the character string length of the subject falls into the same classification.

For example, when the character string length is greater than a predetermined threshold (i8), this element is real-valued data from 0 to 1 indicating the proportion, among previously received mails whose character string length likewise exceeds the threshold (i8), that were judged in the past to be spam mail. When the character string length is less than or equal to the threshold (i8), it is real-valued data from 0 to 1 indicating the proportion, among previously received mails whose character string length is likewise less than or equal to the threshold (i8), that were judged in the past to be spam mail.

For elements (e) to (l), each of the thresholds (i1 to i8) may consist of a plurality of values, in which case an element value is set for each classification delimited by those thresholds. When the tendency of the spam proportion varies greatly with the character string length or the depth of the domain hierarchy, the feature vector acquisition unit 12 can thus differentiate the element value for each classification that shows a different tendency, which suppresses misjudgment and improves mail classification accuracy.

(m) Data indicating whether a mail with the same sender name has been received in the past.
(n) Data indicating whether a mail with the same sender mail address has been received in the past.
(o) Data indicating whether a mail with the same reply-to mail address has been received in the past.
(p) Data indicating whether a mail with the same subject has been received in the past.
Elements (m) to (p) are binary data taking the value "YES" or "NO".
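As a rough illustration of how some of the elements above might be computed, the following Python sketch derives elements (a), (c), (e), (n), and (p) from a small history of classified mail. The names `PastMail`, `spam_ratio`, `bucket`, and `features` are hypothetical, as are the threshold of 10 characters and the minimum share of 5%. The sketch also assumes that the "predetermined proportion" is measured against all mails in the history, which the text does not state explicitly.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class PastMail:
    sender: str    # sender's mail address
    subject: str
    is_spam: bool  # past classification result


def top_level_domain(address: str) -> str:
    """Return the text after the last dot, e.g. 'com'."""
    return address.rsplit(".", 1)[-1].lower()


def account_prefix(address: str) -> str:
    """Leading part of the account, split at hyphen, dot, underscore or pipe."""
    account = address.split("@", 1)[0]
    return re.split(r"[-._|]", account, maxsplit=1)[0].lower()


def domain_depth(address: str) -> int:
    """Number of dot-separated parts in the host part."""
    return len(address.split("@", 1)[-1].split("."))


def bucket(value: int, thresholds: Sequence[int]) -> int:
    """Index of the class a value falls into (count of thresholds exceeded)."""
    return sum(value > t for t in thresholds)


def spam_ratio(history: List[PastMail], matches: Callable[[PastMail], bool],
               min_share: float = 0.0, fixed: float = 0.0) -> float:
    """Spam proportion among matching mails, or a fixed value if too few match."""
    matched = [m for m in history if matches(m)]
    # Assumption: the share is measured against the whole history.
    if not matched or len(matched) / len(history) < min_share:
        return fixed
    return sum(m.is_spam for m in matched) / len(matched)


def features(sender: str, subject: str, history: List[PastMail],
             len_thresholds: Sequence[int] = (10,),
             min_share: float = 0.05) -> Dict[str, object]:
    tld = top_level_domain(sender)
    prefix = account_prefix(sender)
    acct_len = lambda addr: len(addr.split("@", 1)[0])
    cls = bucket(acct_len(sender), len_thresholds)
    return {
        "a": spam_ratio(history, lambda m: top_level_domain(m.sender) == tld, min_share),
        "c": spam_ratio(history, lambda m: account_prefix(m.sender) == prefix, min_share),
        "e": spam_ratio(history, lambda m: bucket(acct_len(m.sender), len_thresholds) == cls),
        "n": any(m.sender == sender for m in history),
        "p": any(m.subject == subject for m in history),
    }
```

The remaining elements would follow the same pattern with the reply-to address, the host part, the sender's name, or the subject substituted for the item being compared.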

By combining elements (a) to (p), the feature vector acquisition unit 12 can express the features of spam mail as a specific feature vector. Mailing-list mail, which simple filter rules tend to misjudge as spam, and regularly subscribed e-mail newsletters yield feature vectors that differ from those of spam mail, so mail can be classified accurately.

When referring to previously received mail for comparison, the feature vector acquisition unit 12 may limit the comparison to mail received by the mail reception unit 11 within a predetermined period (for example, the one year up to the present). This reduces the influence of changes in the characteristics of spam mail, so the mobile terminal 1 can detect new spam mail accurately.
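The restriction to a recent period can be sketched as a simple filter over timestamped history. The function name `recent_mails`, the representation of history as `(received_at, mail)` pairs, and the use of 365 days for "one year" are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def recent_mails(history: List[Tuple[datetime, T]], now: datetime,
                 window: timedelta = timedelta(days=365)) -> List[Tuple[datetime, T]]:
    """Keep only the mails received within `window` up to `now`."""
    cutoff = now - window
    return [(received_at, mail) for received_at, mail in history
            if cutoff <= received_at <= now]
```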

The classification unit 13 refers to the feature vector acquired by the feature vector acquisition unit 12 and, based on the classification rule described later (rule DB 21), classifies the mail as normal mail or spam mail.

In the initial state, where no classification rule has yet been created, the classification unit 13 classifies the previously received mails according to a provisional rule. Specifically, for example, mail whose sender is registered in the address book is treated as normal mail, and all other mail is treated as spam mail. Alternatively, if the mail has already been classified by user operation input, that classification may be followed.
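A minimal sketch of such a provisional rule follows. The function name `provisional_label` and the label strings are hypothetical, and giving the user's own classification priority over the address-book check is one possible reading of the text.

```python
from typing import Optional, Set


def provisional_label(sender: str, address_book: Set[str],
                      user_label: Optional[str] = None) -> str:
    """Label a mail before any learned classification rule exists."""
    if user_label is not None:
        # A classification already entered by the user is followed as-is.
        return user_label
    return "normal" if sender in address_book else "spam"
```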

When the rule creation unit 14 accepts classification information indicating whether a received mail is normal mail or spam mail, it uses this classification information and the corresponding feature vector as learning data to create a classification rule for classifying mail as normal mail or spam mail. The classification information is the result of classification by the classification unit 13, and among these results it is preferably data of high certainty, that is, data that is highly likely to be normal mail or highly likely to be spam mail. The classification rule is desirably created by a method with a light processing load. For example, a learning algorithm such as the k-NN method, a binary decision tree, or an SVM (Support Vector Machine) can be used.
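Of the algorithms named, the k-NN method is the simplest to sketch. The example below scores a feature vector by the fraction of spam among its k nearest training vectors, which yields a value between 0 and 1. The function name `knn_spam_score`, the Euclidean distance, k = 3, and the encoding of YES/NO elements as 1.0/0.0 are all assumptions. The source does not specify how the score is derived.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], bool]  # (feature vector, is_spam)


def knn_spam_score(x: Sequence[float], training: List[Sample], k: int = 3) -> float:
    """Fraction of spam among the k training vectors nearest to x."""
    if not training:
        raise ValueError("training data must not be empty")
    nearest = sorted(training, key=lambda item: math.dist(x, item[0]))[:k]
    return sum(1 for _, is_spam in nearest if is_spam) / len(nearest)
```

A k-NN scorer needs no separate training pass, which fits the stated preference for a light processing load, although each query scans the whole training set.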

The rule creation unit 14 re-creates the classification rule at a predetermined trigger, for example at a fixed interval or when the processing load of the mobile terminal 1 drops. To maintain classification accuracy even when the characteristics of spam mail change, the rule creation unit 14 creates the classification rule based on mail received by the mail reception unit 11 within a predetermined period up to the present (for example, one year). By not referring to mail received before this period, the rule creation unit 14 can judge new spam mail accurately without being influenced by the characteristics of old spam mail. In addition, because the amount of target data decreases, the processing load on the mobile terminal 1 is reduced.

Examples of the classification rule that is created are described below.
FIG. 2 is a diagram showing a first example of the classification rule according to the present embodiment.

Using a predetermined learning algorithm, the rule creation unit 14 determines, for each feature vector, a variable value Ps (0 ≦ Ps ≦ 1) indicating the likelihood that the mail is spam mail. The closer Ps is to 1, the more likely the mail is spam mail; the closer Ps is to 0, the more likely it is normal mail. The classification information described above may be this Ps. Alternatively, a value weighted more heavily the closer Ps is to 0 or 1 may be used as the classification information.

In the example of FIG. 2, an ideal classification rule causes the Ps values for the feature vectors acquired from received mails to concentrate near 0 or 1. The vertical axis shows the cumulative distribution of the actual numbers of normal mails and spam mails. Normal mail (solid line) reaches 100% near Ps = 0, whereas spam mail (dashed line) stays at 0% until near Ps = 1.

Here, the rule creation unit 14 sets a threshold Psth that divides Ps values into those corresponding to normal mail and those corresponding to spam mail. In the case of FIG. 2, the rule creation unit 14 sets Psth to 0.5, and mail with Ps less than Psth is classified as normal mail (folder 1) while mail with Ps greater than or equal to Psth is classified as spam mail (folder 2).

In an ideal state, mail can be classified reliably in this way. In practice, however, Ps takes a variety of values, so it is difficult to classify mail reliably with a single threshold Psth.
FIG. 3 is a diagram showing a second example of the classification rule according to the present embodiment.

In the example of FIG. 3, normal mail (solid line) is also distributed at Ps values away from 0, and spam mail is also distributed at Ps values away from 1. The rule creation unit 14 therefore sets a first threshold (Psth1 = 0.3) and a second threshold (Psth2 = 0.7). In this case, the classification unit 13 classifies a received mail as normal mail (folder 1) if Ps is less than Psth1, and as spam mail (folder 3) if Ps is greater than or equal to Psth2. If Ps is greater than or equal to Psth1 and less than Psth2, the classification unit 13 classifies the received mail as pending mail (folder 2).
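The two-threshold rule can be written directly as a small function. The function name `classify` and the label strings are hypothetical. The default thresholds are the values given for FIG. 3, and setting both thresholds to the same value reproduces the single-threshold rule of FIG. 2.

```python
def classify(ps: float, psth1: float = 0.3, psth2: float = 0.7) -> str:
    """Map a spam likelihood Ps to 'normal', 'pending', or 'spam'."""
    if not 0.0 <= ps <= 1.0:
        raise ValueError("Ps must lie in [0, 1]")
    if ps < psth1:
        return "normal"   # folder 1
    if ps >= psth2:
        return "spam"     # folder 3
    return "pending"      # folder 2: Psth1 <= Ps < Psth2
```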

Returning to FIG. 1, the data registration unit 15 stores the mail classified by the classification unit 13 in the mail DB 22, sorted into folders (normal mail, pending mail, spam mail). At this time, the data registration unit 15 stores the feature vector and the classification information (Ps or a weighted value of Ps) in association with the mail.

The data registration unit 15 also changes the classification result and updates the mail DB 22 in response to a predetermined operation input from the input unit 40. Specifically, it updates the mail DB 22 by accepting operations such as moving mail classified as normal mail to the spam mail folder, or moving mail classified as pending mail to the normal mail or spam mail folder.

Furthermore, when the data registration unit 15 accepts such a change to a classification result, it provides this change information to the rule creation unit 14. The rule creation unit 14 updates the classification rule in the rule DB 21 based on this change information.

Specifically, the rule creation unit 14 adjusts the variable value Ps corresponding to the reclassified mail, the threshold Psth (Psth1, Psth2), or both. That is, for example, Ps is adjusted downward when spam mail or pending mail is changed to normal mail, and upward when normal mail or pending mail is changed to spam mail. Likewise, Psth1 is adjusted upward when pending mail is changed to normal mail, and Psth2 is adjusted downward when pending mail is changed to spam mail.
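The direction of each adjustment is given in the text, but the amount is not. The sketch below uses hypothetical fixed step sizes and keeps Psth1 from exceeding Psth2, both of which are assumptions. The names `Thresholds`, `adjust_thresholds`, and `adjust_ps` are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    psth1: float = 0.3
    psth2: float = 0.7


def adjust_thresholds(th: Thresholds, old: str, new: str,
                      step: float = 0.05) -> Thresholds:
    """Shift a threshold after the user moves a pending mail."""
    psth1, psth2 = th.psth1, th.psth2
    if old == "pending" and new == "normal":
        psth1 = min(psth1 + step, psth2)   # raise Psth1, never above Psth2
    elif old == "pending" and new == "spam":
        psth2 = max(psth2 - step, psth1)   # lower Psth2, never below Psth1
    return Thresholds(psth1, psth2)


def adjust_ps(ps: float, new: str, step: float = 0.1) -> float:
    """Move Ps toward the label chosen by the user, clamped to [0, 1]."""
    if new == "normal":
        return max(0.0, ps - step)
    if new == "spam":
        return min(1.0, ps + step)
    return ps
```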

The notification unit 16 notifies the server that manages reception of the mail (the server of the carrier providing the communication service for the mobile terminal 1) of information on mail classified as spam mail by the classification unit 13. This notification may be a mail sent to a predetermined address. The server that accepts the notification can update its filter rules for incoming mail and suppress the delivery of spam mail to the mobile terminal 1.

The notification unit 16 may send the notification automatically at the time the mail is classified, but is not limited to this. To prevent automatic notification based on an incorrect classification, the notification unit 16 may send the notification only after accepting a confirmation input from the user. Alternatively, the notification unit 16 may send no automatic notification and instead notify in response to a request from the user.

FIG. 4 is a flowchart showing the flow of processing performed by the control unit 10 upon mail reception in the mobile terminal 1 according to the present embodiment.

In step S1 (reception step), the control unit 10 (mail reception unit 11) receives mail addressed to the user of the mobile terminal 1.

In step S2 (acquisition step), the control unit 10 (feature vector acquisition unit 12) acquires a feature vector based on the header information of the mail received in step S1 and on the header information and classification information (whether normal mail or spam mail) of previously received mail.

In step S3 (classification step), the control unit 10 (classification unit 13) checks the feature vector acquired in step S2 against the classification rule stored in the rule DB 21 and classifies the received mail as normal mail, spam mail, or pending mail.

In step S4, the control unit 10 (data registration unit 15) sorts the mail classified in step S3 into a folder and stores it in the mail DB 22 together with the classification information.

図５は、本実施形態に係る携帯端末１の制御部１０（ルール作成部１４）が分類ルールを作成する処理（作成ステップ）の流れを示すフローチャートである。 FIG. 5 is a flowchart showing a flow of processing (creation step) in which the control unit 10 (rule creation unit 14) of the mobile terminal 1 according to the present embodiment creates a classification rule.

In step S11, the rule creation unit 14 determines whether it is time to create a classification rule. Specifically, it detects that a predetermined period has elapsed or that the processing load of the mobile terminal 1 has fallen to or below a predetermined level, and judges this to be the creation timing. If the determination is YES, the rule creation unit 14 proceeds to step S12; if NO, it proceeds to step S17.

In step S12, the rule creation unit 14 acquires, from the mail DB 22, the classification information of the classified mail and the feature vector of each mail as learning data.

In step S13, the rule creation unit 14 learns the classification rule with a predetermined algorithm based on the learning data acquired in step S12. Specifically, it first assigns a variable value Ps to each feature vector.

In step S14, the rule creation unit 14 determines a threshold Psth for classifying the Ps assigned in step S13 as normal mail or spam mail. As described above, two thresholds (Psth1 and Psth2) may be determined.

In step S15, the rule creation unit 14 stores the classification rule created in steps S13 and S14 in the rule DB 21.

In step S16, the rule creation unit 14 determines whether to end creation of the classification rule. Specifically, the rule creation unit 14 judges that creation should end when it accepts a predetermined operation input indicating that mail classification or classification-rule updates are no longer needed. If the determination is YES, the rule creation unit 14 ends the process; if NO, it returns to step S11 and continues creating the classification rule at the predetermined timing.

In step S17, the rule creation unit 14 determines whether a classification result has been changed by an operation input from the user. If the determination is YES, the rule creation unit 14 proceeds to step S18; if NO, it proceeds to step S16.

In step S18, because the classification result produced by the classification rule stored in the rule DB 21 was not appropriate, the rule creation unit 14 adjusts the variable value Ps of the mail whose classification was changed, or the threshold Psth. The rule creation unit 14 then proceeds to step S15 and updates the classification rule.
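Steps S12, S13, and S18 might be sketched as follows. This section leaves the learning algorithm as "a predetermined algorithm", so the sketch assumes one simple possibility: Ps is the fraction of spam among the learning mails that share a feature vector. The correction rule in `apply_user_correction`, its `step` parameter, and the starting value 0.5 for unseen vectors are likewise assumptions made for illustration.

```python
from collections import defaultdict


def learn_ps(learning_data):
    """Steps S12-S13 (assumed form): Ps = spam count / total count per vector.

    learning_data is an iterable of (feature_vector, is_spam) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # vector -> [spam count, total count]
    for vector, is_spam in learning_data:
        counts[vector][0] += int(is_spam)
        counts[vector][1] += 1
    return {v: spam / total for v, (spam, total) in counts.items()}


def apply_user_correction(ps_by_vector, vector, is_spam, step=0.5):
    """Step S18 (assumed form): move Ps toward the label the user chose."""
    target = 1.0 if is_spam else 0.0
    current = ps_by_vector.get(vector, 0.5)
    ps_by_vector[vector] = current + step * (target - current)
    return ps_by_vector[vector]
```

Adjusting the thresholds Psth1 and Psth2 instead of Ps would be the other option the flowchart allows; it is omitted here to keep the sketch short.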

As described above, according to the present embodiment, the mobile terminal 1 easily acquires a feature vector from several kinds of data that can be readily extracted from the header information, and automatically classifies mail by comparing the variable value Ps assigned to each feature vector with the threshold Psth (Psth1, Psth2). The mobile terminal 1 can therefore reduce the processing load of creating classification rules and classifying mail, and can also reduce the operation load on the user. Furthermore, the mobile terminal 1 does not analyze the mail body; instead, it generates a classification rule that expresses, as feature vectors, the regularities in header information that are characteristic of spam mail received at the mobile terminal 1. It can therefore determine spam mail automatically and with high accuracy while keeping the processing load low.

Here, the existing method used a feature vector whose elements are the following 17 types of data.
(a') Data indicating whether the sender's mail address is registered in the address book.
(b') Data indicating whether a mail with the same sender name and a different sender mail address has been received in the past.
(c') Data indicating whether a mail with the same sender mail address and a different sender name has been received in the past.
(d') Data indicating whether a mail with the same sender name and a different reply-to mail address has been received in the past.
(e') Data indicating whether a mail with the same reply-to mail address and a different sender name has been received in the past.
(f') Data indicating whether a mail with the same sender mail address and a different reply-to mail address has been received in the past.
(g') Data indicating whether a mail with the same reply-to mail address and a different sender mail address has been received in the past.
(h') Data indicating whether a mail with the same sender name and a different subject has been received in the past.
(i') Data indicating whether a mail with the same subject and a different sender name has been received in the past.
(j') Data indicating whether a mail with the same sender mail address and a different subject has been received in the past.
(k') Data indicating whether a mail with the same subject and a different sender mail address has been received in the past.
(l') Data indicating whether a mail with the same subject and a different reply-to mail address has been received in the past.
(m') Data indicating whether a mail with the same reply-to mail address and a different subject has been received in the past.
(n') Data indicating when the sender's mail address last appeared.
(o') Data indicating when the sender's name last appeared.
(p') Data indicating when the reply-to mail address last appeared.
(q') Data indicating when the subject last appeared.
With this existing method, when the address book has n entries and there are N past mails, the maximum number of comparisons is n for (a'), 2N for each of (b') to (m'), and N for each of (n') to (q'), so a total of n + 12 × 2N + 4 × N comparisons was required.

In contrast, in the present embodiment, a feature vector can be created with at most N comparisons for each of the elements (a) to (p), that is, 16 × N comparisons in total. The mobile terminal 1 can therefore further reduce the processing load.
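To make the difference concrete, the two worst-case comparison counts can be evaluated directly. The function names below are hypothetical, and the sample values of n and N are chosen only for illustration.

```python
def existing_method_comparisons(n, N):
    """Worst case for the existing 17-element feature vector.

    (a'): n comparisons against the address book.
    (b') to (m'): 12 elements at 2N comparisons each.
    (n') to (q'): 4 elements at N comparisons each.
    """
    return n + 12 * 2 * N + 4 * N


def present_embodiment_comparisons(N):
    """Worst case for the 16 elements (a) to (p), at N comparisons each."""
    return 16 * N
```

With an address book of n = 100 entries and N = 1000 past mails, the existing method needs up to 28,100 comparisons, against 16,000 for the present embodiment.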

The elements (a') to (q') above may also be added to the elements (a) to (p) of the present embodiment to form the feature vector. In this case, the feature vectors are subdivided further, so an improvement in classification accuracy can be expected.

When an automatic classification result is changed by the user, the mobile terminal 1 can adjust the variable value Ps of each feature vector or the threshold Psth and thereby adjust the learned classification rule, so it can relearn from the user's change input and improve classification accuracy. Furthermore, because the mobile terminal 1 can weight the learning data according to the variable value Ps, classification information with higher certainty is given priority, and an improvement in classification accuracy can be expected.

The mobile terminal 1 also re-creates the classification rule at a predetermined trigger, for example at a fixed interval or when the processing load drops, so it can update the classification rule using new mail as learning data. Furthermore, because the mobile terminal 1 learns from mail received during a predetermined period up to the present, older mail received before that period is excluded and the classification rule is created from recent information. The mobile terminal 1 can therefore create an accurate classification rule that reflects the characteristics of recent spam mail.
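Restricting the learning data to a recent period might look like the following sketch. The 30-day window, the function name, and the tuple layout of each stored mail are assumptions for illustration; this section only speaks of "a predetermined period".

```python
from datetime import datetime, timedelta


def recent_learning_data(mails, now, window=timedelta(days=30)):
    """Keep only mails received within `window` before `now`.

    mails is an iterable of (received_at, feature_vector, is_spam) tuples.
    Returns (feature_vector, is_spam) pairs usable as learning data.
    """
    cutoff = now - window
    return [(vector, is_spam)
            for received_at, vector, is_spam in mails
            if received_at >= cutoff]
```

Calling this at each rule re-creation trigger would leave older mail out of the learning data without deleting it from the mail store.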

The mobile terminal 1 also notifies the server that manages mail reception of information on mail classified as spam mail, either automatically or on request, so that the server can update its filter rules and suppress reception of spam mail.

Second Embodiment
Hereinafter, a second embodiment, which is one example of an embodiment of the present invention, will be described. The management server 2 (e-mail management server) according to the present embodiment is connected to the mobile terminal 1a and manages mail addressed to the mobile terminal 1a. It is, for example, a server of a carrier that provides a predetermined wireless communication service such as a mobile phone or PHS service. The management server 2 generates a classification rule with which the mobile terminal 1a determines whether received mail is spam mail, and provides the rule to the mobile terminal 1a.
In the following comparison with the configuration of the mobile terminal 1 of the first embodiment, similar components are given the same reference sign, or the same sign with "a" appended, and their description is omitted or simplified.

FIG. 6 is a diagram showing the functional configuration of the management server 2 according to the present embodiment.
The management server 2 includes a control unit 10a, a storage unit 20a, a communication unit 30a, an input unit 40a, and a display unit 50a.

The control unit 10a corresponds to the control unit 10 of the first embodiment and controls the management server 2 as a whole. By reading and executing the various programs stored in the storage unit 20a as appropriate, it cooperates with the hardware described above to realize the various functions of the present embodiment.

The storage unit 20a corresponds to the storage unit 20 of the first embodiment and stores various programs for causing the hardware group to function as the management server 2, programs for causing the control unit 10a to execute the various functions of the present embodiment, various databases, and the like.

The communication unit 30a corresponds to the communication unit 30 of the first embodiment and communicates with a plurality of mobile terminals 1a via a predetermined network 3 and a base station 4.

The input unit 40a corresponds to the input unit 40 of the first embodiment and is an interface device that accepts instruction inputs to the management server 2 from a user (administrator).

The display unit 50a corresponds to the display unit 50 of the first embodiment. It displays screens that accept data input from the user (administrator) and screens that show the results of processing by the management server 2.

The control unit 10a includes a mail receiving unit 11a (receiving unit), a feature vector acquisition unit 12a (acquisition unit), a rule creation unit 14a (creation unit), a data registration unit 15a, a mail transfer unit 17 (transfer unit), and a rule transmission unit 18 (transmission unit).
The storage unit 20a includes a rule DB 21a and a mail DB 22a.

The mail receiving unit 11a corresponds to the mail receiving unit 11 of the first embodiment. Via the communication unit 30a, the mail receiving unit 11a receives mail addressed to the mail address of the user of any of the mobile terminals 1a and stores it in the mail DB 22a.

The feature vector acquisition unit 12a corresponds to the feature vector acquisition unit 12 of the first embodiment. As in the first embodiment, it acquires a feature vector indicating the features of a mail based on the header information of the mail received by the mail receiving unit 11a.

The rule creation unit 14a corresponds to the rule creation unit 14 of the first embodiment. When it receives classification information indicating whether a received mail is normal mail or spam mail, it uses this classification information and the corresponding feature vector as learning data to create, as in the first embodiment, a classification rule for classifying whether a mail is normal mail or spam mail, and stores the rule in the rule DB 21a.

Here, when the body of a mail is transferred to the destination mobile terminal 1a within a predetermined period after the mail is received, the rule creation unit 14a receives from the mail DB 22a classification information indicating that the mail is normal mail. When the body of the mail is not transferred to the mobile terminal 1a, the rule creation unit 14a receives from the mail DB 22a classification information indicating that the mail is spam mail.
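The transfer-based labeling might be sketched as follows. The seven-day period, the function name, and the choice to return `None` while the period is still running are assumptions for illustration; this section only states that a body transfer within the predetermined period means normal mail and no transfer means spam mail.

```python
from datetime import datetime, timedelta


def label_from_transfer(received_at, body_transferred_at, now,
                        period=timedelta(days=7)):
    """Derive classification information from whether the body was transferred.

    body_transferred_at is None if the terminal never requested the body.
    """
    deadline = received_at + period
    if body_transferred_at is not None and body_transferred_at <= deadline:
        return "normal"   # the user asked for the body within the period
    if now >= deadline:
        return "spam"     # the period ended without a body transfer
    return None           # assumed: undecided while the period is running
```

Under this sketch, a body requested only after the deadline is still labeled spam, which is one possible reading of the predetermined period.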

The data registration unit 15a stores, in association with each mail, the feature vector acquired by the feature vector acquisition unit 12a and the classification information (Ps, or a weighted value of Ps) that depends on the processing status of the mail transfer unit 17.

The mail transfer unit 17 transfers to the mobile terminal 1a the header information of mail that has been received by the mail receiving unit 11a and stored in the mail DB 22a, and transfers the body of the mail to the mobile terminal 1a in response to a request from that mobile terminal 1a. The transfer result is notified to the data registration unit 15a, and classification information reflecting whether the body was transferred is stored in the mail DB 22a.

The rule transmission unit 18 transmits the classification rule created by the rule creation unit 14a to the mobile terminal 1a at a predetermined timing. The predetermined timing may be the time at which a request is received from the mobile terminal 1a; the mobile terminal 1a periodically downloads the classification rule from the management server 2 and updates its own copy.

To classify received mail with the downloaded classification rule, the mobile terminal 1a may include the feature vector acquisition unit 12 of the first embodiment, which acquires the feature vector of a mail. Alternatively, the mail transfer unit 17 may transfer the feature vector acquired by the feature vector acquisition unit 12a to the mobile terminal 1a together with the header information of the mail.

The management server 2 may also include the classification unit 13 of the first embodiment. In this case, when the result of classification by the classification unit 13 differs from the classification information derived from the transfer result of the mail transfer unit 17, the data registration unit 15a provides this change information to the rule creation unit 14a. Based on the change information, the rule creation unit 14a updates the classification rule in the rule DB 21a, as in the first embodiment.

Embodiments of the present invention have been described above, but the present invention is not limited to these embodiments. The effects described in the embodiments are merely a list of the most favorable effects arising from the present invention, and the effects of the present invention are not limited to those described in the embodiments.

The variable value Ps described above indicates the possibility that a mail is spam mail, and it can also be used as an index of the importance of a received mail. That is, the smaller Ps is (the closer to 0), the more important or urgent the mobile terminal 1 (or 1a) may judge the received mail to be, and it may sort the mail into a storage folder or attach a flag accordingly and present the mail to the user.

The embodiments above describe the case where the present invention is applied to mail received by the mobile terminal 1 (or 1a), but the e-mail classification device and the e-mail management server are not limited to this. The present invention is also applicable to other communication terminals, such as PCs that send and receive mail over the Internet, and to servers that manage the sending and receiving of such mail.

In the embodiments above, the feature vector acquisition unit 12 (or 12a) acquires a feature vector whose elements are data indicating the ratio of spam mail among previously received e-mails that fall into the same classification based on an item included in the header information. The statistics that serve as these elements may be calculated in advance for each classification and stored together with the classification rule. This reduces the load of the feature vector acquisition processing when the mobile terminal 1 (or 1a) classifies newly received mail.
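Precomputing the per-classification statistics might be sketched as follows, using the top-level domain of the sender's mail address as the classification key. The function names and the input layout are hypothetical; the same pattern would apply to the other header-based classifications.

```python
from collections import defaultdict


def top_level_domain(address):
    """Return the text after the last dot of a mail address, lowercased."""
    return address.rsplit(".", 1)[-1].lower()


def precompute_spam_ratio(past_mails, key_fn):
    """Build a lookup table: classification key -> ratio of spam mail.

    past_mails is an iterable of (sender_address, is_spam) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [spam count, total count]
    for address, is_spam in past_mails:
        key = key_fn(address)
        counts[key][0] += int(is_spam)
        counts[key][1] += 1
    return {key: spam / total for key, (spam, total) in counts.items()}
```

When a new mail arrives, its feature vector element is then a single dictionary lookup on `top_level_domain(sender)` rather than a scan over all past mail.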

1 Mobile terminal (e-mail classification device)
1a Mobile terminal
2 Management server (e-mail management server)
10, 10a Control unit
11, 11a Mail receiving unit (receiving unit)
12, 12a Feature vector acquisition unit (acquisition unit)
13 Classification unit
14, 14a Rule creation unit (creation unit)
15, 15a Data registration unit
16 Notification unit
17 Mail transfer unit (transfer unit)
18 Rule transmission unit (transmission unit)
20, 20a Storage unit
21, 21a Rule DB
22, 22a Mail DB
30, 30a Communication unit
40, 40a Input unit
50, 50a Display unit

Claims

An e-mail classification device comprising:
a receiving unit that receives e-mail;
an acquisition unit that acquires, based on header information of the e-mail received by the receiving unit, a feature vector indicating features of the e-mail;
a creation unit that, when classification information indicating whether the e-mail is normal mail or junk mail is received, creates a classification rule for classifying whether an e-mail is normal mail or junk mail, using the classification information and the corresponding feature vector as learning data; and
a classification unit that, when a new e-mail is received by the receiving unit, refers to the feature vector acquired from the e-mail by the acquisition unit and classifies whether the e-mail is normal mail or junk mail based on the classification rule created by the creation unit,
wherein the acquisition unit acquires the feature vector based on at least one of: data indicating the ratio of junk mail among previously received e-mails that have the same classification based on any of the items of sender name, sender e-mail address, reply-to e-mail address, or subject included in the header information; and data indicating whether an e-mail in which any of those items is the same has been received in the past.

The e-mail classification device according to claim 1, wherein the acquisition unit acquires, as the feature vector, a feature vector having at least one of the following as an element:
(a) data indicating the ratio of junk mail among previously received e-mails in which the top-level domain of the sender's e-mail address is the same;
(b) data indicating the ratio of junk mail among previously received e-mails in which the top-level domain of the reply-to e-mail address is the same;
(c) data indicating the ratio of junk mail among previously received e-mails in which at least a part of the account part of the sender's e-mail address is the same;
(d) data indicating the ratio of junk mail among previously received e-mails in which at least a part of the account part of the reply-to e-mail address is the same;
(e) data indicating the ratio of junk mail among previously received e-mails in which the character string length of the account part of the sender's e-mail address falls into the same classification;
(f) data indicating the ratio of junk mail among previously received e-mails in which the character string length of the host part of the sender's e-mail address falls into the same classification;
(g) data indicating the ratio of junk mail among previously received e-mails in which the depth of the domain hierarchy in the host part of the sender's e-mail address falls into the same classification;
(h) data indicating the ratio of junk mail among previously received e-mails in which the character string length of the account part of the reply-to e-mail address falls into the same classification;
(i) data indicating the ratio of junk mail among previously received e-mails in which the character string length of the host part of the reply-to e-mail address falls into the same classification;
(j) data indicating the ratio of junk mail among previously received e-mails in which the depth of the domain hierarchy in the host part of the reply-to e-mail address falls into the same classification;
(k) data indicating the ratio of junk mail among previously received e-mails in which the character string length of the sender's name falls into the same classification;
(l) data indicating the ratio of junk mail among previously received e-mails in which the character string length of the subject falls into the same classification;
(m) data indicating whether an e-mail having the same sender name has been received in the past;
(n) data indicating whether an e-mail having the same sender e-mail address has been received in the past;
(o) data indicating whether an e-mail having the same reply-to e-mail address has been received in the past;
(p) data indicating whether an e-mail having the same subject has been received in the past.

The e-mail classification device according to claim 1 or 2, wherein the creation unit determines, for each feature vector, a variable value indicating the possibility that the e-mail from which the feature vector was acquired is junk mail, and sets, as the classification rule, a threshold for classifying the variable value as corresponding to normal mail or as corresponding to junk mail.

The e-mail classification device according to claim 3, wherein the creation unit sets a first threshold for classifying the variable value as corresponding to normal mail and a second threshold for classifying the variable value as corresponding to junk mail, and
the classification unit classifies a newly received e-mail as normal mail, as junk mail, or otherwise as pending mail.

The e-mail classification device according to claim 4, wherein, when an input that changes the classification result is received for an e-mail classified by the classification unit, the creation unit adjusts the variable value corresponding to that e-mail or the threshold.

The e-mail classification device according to claim 3, wherein the creation unit receives, as the learning data, the classification information weighted according to the variable value.

The e-mail classification device according to claim 1, wherein the creation unit re-creates the classification rule at a predetermined trigger.

The e-mail classification device according to any one of claims 1 to 7, wherein the creation unit creates the classification rule based on e-mails received by the receiving unit during a predetermined period up to the present and does not refer to e-mails received before the predetermined period.

The e-mail classification device according to claim 1, further comprising a notification unit that notifies a server that manages reception of the e-mail of information on e-mail classified as junk mail by the classification unit.

An e-mail management server that is connected to a terminal and manages e-mail addressed to the terminal, comprising:
a receiving unit that receives the e-mail;
a transfer unit that transfers the header information of the e-mail received by the receiving unit to the terminal and, in response to a request from the terminal, transfers the body of the e-mail to the terminal;
an acquisition unit that acquires, based on the header information of the e-mail received by the receiving unit, a feature vector indicating features of the e-mail;
a creation unit that, when classification information indicating whether the e-mail is normal mail or junk mail is received, creates a classification rule for classifying whether an e-mail is normal mail or junk mail, using the classification information and the corresponding feature vector as learning data; and
a transmission unit that transmits the classification rule created by the creation unit to the terminal at a predetermined timing,
wherein the acquisition unit acquires the feature vector based on at least one of: data indicating the ratio of junk mail among previously received e-mails that have the same classification based on any of the items of sender name, sender e-mail address, reply-to e-mail address, or subject included in the header information; and data indicating whether an e-mail in which any of those items is the same has been received in the past.

The e-mail management server according to claim 10, wherein the creation unit receives the classification information indicating that the e-mail is normal mail when the body of the e-mail is transferred to the terminal within a predetermined period after the e-mail is received, and receives the classification information indicating that the e-mail is junk mail when the body of the e-mail is not transferred to the terminal.

An e-mail classification method in which a computer executes:
a receiving step of receiving e-mail;
an acquisition step of acquiring, based on header information of the e-mail received in the receiving step, a feature vector indicating features of the e-mail;
a creation step of, when classification information indicating whether the e-mail is normal mail or junk mail is received, creating a classification rule for classifying whether an e-mail is normal mail or junk mail, using the classification information and the corresponding feature vector as learning data; and
a classification step of, when a new e-mail is received in the receiving step, referring to the feature vector acquired from the e-mail in the acquisition step and classifying whether the e-mail is normal mail or junk mail based on the classification rule created in the creation step,
wherein, in the acquisition step, the feature vector is acquired based on at least one of: data indicating the ratio of junk mail among previously received e-mails that have the same classification based on any of the items of sender name, sender e-mail address, reply-to e-mail address, or subject included in the header information; and data indicating whether an e-mail in which any of those items is the same has been received in the past.

An e-mail classification program for causing a computer to execute:
a receiving step of receiving e-mail;
an acquisition step of acquiring, based on header information of the e-mail received in the receiving step, a feature vector indicating features of the e-mail;
a creation step of, when classification information indicating whether the e-mail is normal mail or junk mail is received, creating a classification rule for classifying whether an e-mail is normal mail or junk mail, using the classification information and the corresponding feature vector as learning data; and
a classification step of, when a new e-mail is received in the receiving step, referring to the feature vector acquired from the e-mail in the acquisition step and classifying whether the e-mail is normal mail or junk mail based on the classification rule created in the creation step,
wherein, in the acquisition step, the feature vector is acquired based on at least one of: data indicating the ratio of junk mail among previously received e-mails that have the same classification based on any of the items of sender name, sender e-mail address, reply-to e-mail address, or subject included in the header information; and data indicating whether an e-mail in which any of those items is the same has been received in the past.