JP2008519532A

JP2008519532A - Message profiling system and method

Info

Publication number: JP2008519532A
Application number: JP2007540073A
Authority: JP
Inventors: Paul Judge; Guru Rajan; Dmitri Alperovitch; Matt Moyer
Original assignee: Secure Computing Corporation
Priority date: 2004-11-05
Filing date: 2005-11-04
Publication date: 2008-06-05
Anticipated expiration: 2025-11-04
Also published as: JP4839318B2

Abstract

Methods and systems for operation upon one or more data processors that process communications to determine whether a communication should be delivered to a recipient, based on a classification of the message or on characteristics of the message sender. One embodiment of the invention provides a method, operating on one or more data processors, for assigning a reputation to a messaging entity. The method comprises receiving data that identifies one or more characteristics related to a messaging entity's communication, and determining a reputation score based upon the received identification data. The determined reputation score is indicative of the reputation of the messaging entity and is used in deciding what action is to be taken with respect to a communication associated with the messaging entity.

Description

This document relates generally to systems and methods for processing communications, and more particularly to systems and methods for filtering communications.

In the anti-spam industry, spammers use various creative means to evade detection by spam filters. Available anti-spam systems include fail-open systems, in which all incoming messages are filtered for spam. However, these systems can be inefficient and inaccurate in correctly classifying messages as legitimate or spam.

In accordance with the teachings disclosed herein, methods and systems are provided for operation upon one or more data processors to assign a reputation to a messaging entity. For example, a method and system can include receiving data that identifies one or more characteristics related to a messaging entity's communication, and determining a reputation score based upon the received identification data. The determined reputation score is indicative of the reputation of the messaging entity and is used in deciding what action is to be taken with respect to a communication associated with the messaging entity.

As another example, systems and methods are provided for filtering transmissions using the reputation score of the transmission's sender. A system and method can include identifying at least one characteristic about a transmission from a sender; performing a real-time query to a reputation system that includes the transmission characteristic; receiving a score representing the reputation related to the transmission; and performing an action on the transmission from the sender that corresponds to the range in which the sender's reputation score falls.

As another example, systems and methods are provided for filtering groups of transmissions using the reputation scores of the transmissions' senders. For example, a system and method can include grouping multiple transmissions together based on content similarity or on similarity in the behavior of the transmission senders; identifying at least one characteristic about each transmission in the grouping; querying a reputation system and receiving a score representing the reputation of each sender; and classifying the group of transmissions based on the percentage of reputable and non-reputable senders in the group.
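The group-level classification described here can be sketched as follows. This is a minimal illustration, not the method of the publication: the function name, the score cutoffs, and the majority fraction are all hypothetical, and the sketch assumes that higher scores mean a less reputable sender.

```python
def classify_group(sender_scores, bad_cutoff=25.0, good_cutoff=-25.0, majority=0.5):
    """Classify a group of similar transmissions from its senders' scores.

    Assumption: higher scores mean a less reputable sender. The cutoffs and
    the majority fraction are illustrative values, not taken from the source.
    """
    if not sender_scores:
        return "unknown"
    total = len(sender_scores)
    # Count senders that fall in the non-reputable and reputable score ranges.
    bad = sum(1 for score in sender_scores if score >= bad_cutoff)
    good = sum(1 for score in sender_scores if score <= good_cutoff)
    if bad / total > majority:
        return "non-reputable"
    if good / total > majority:
        return "reputable"
    return "mixed"
```

A group is labeled only when more than the chosen fraction of its senders falls in one range; otherwise it stays "mixed" and could be passed to other filtering techniques.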

As another example, systems and methods are provided for tuning and training a filtering system using the reputation scores of the senders of transmissions in a set of trainable transmissions. For example, a method can include identifying at least one characteristic about a transmission from a sender; querying a reputation system and receiving a score representing the sender's reputation; classifying the transmission into one of multiple categories based on the range in which the sender's reputation score falls; and passing the transmission and its classification category to a trainer of another filtering system, to be used to optimize that filtering system.

As another example, systems and methods are provided that operate on one or more data processors to classify communications from messaging entities. For example, a system and method can include receiving a communication from a messaging entity; using a plurality of message classification techniques to classify the communication; and combining the message classification outputs to generate a message profile score. The message profile score is used in deciding what action is to be taken with respect to the communication associated with the messaging entity.

As another example, such systems and methods can include identifying at least one characteristic about a transmission from a sender; querying a reputation system and receiving a score representing the sender's reputation; classifying the transmission into one of multiple categories based on the range in which the sender's reputation score falls; and passing the transmission and its classification category to a trainer of another filtering system, to be used to optimize that filtering system.

As another example, such systems and methods can include receiving a communication from a messaging entity; using a plurality of message classification techniques to classify the communication; and combining the message classification outputs to generate a message profile score. The message profile score is used in deciding what action is to be taken with respect to the communication associated with the messaging entity.

In accordance with the teachings disclosed herein, methods and systems are provided for operation upon one or more data processors that classify communications from messaging entities. For example, a system and method can include a plurality of message classification techniques, each configured to classify a communication received from a messaging entity. The system and method can further include message profiling logic configured to combine the message classification outputs in order to generate a message profile score, which is used in deciding what action is to be taken with respect to the communication associated with the messaging entity.

As another example, a method and system can include receiving a communication that was sent from a messaging entity. A plurality of message classification techniques is used to classify the communication. Each message classification technique is associated with a confidence value, which is used in generating a message classification output from that technique. The message classification outputs are combined in order to generate a message profile score, which is used in deciding what action is to be taken with respect to the communication associated with the messaging entity.
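One way to picture the combination step is a confidence-weighted average of the individual technique outputs. The text does not specify how the outputs are combined, so the sketch below is only an assumption: the function name is hypothetical, and each technique is assumed to report a raw output and a confidence value, both in the range 0 to 1.

```python
def message_profile_score(outputs):
    """Combine (raw_output, confidence) pairs into a single profile score.

    Assumption: each technique's raw output is scaled by its confidence
    value, and the scaled outputs are normalized by the total confidence.
    """
    total_confidence = sum(confidence for _, confidence in outputs)
    if total_confidence == 0:
        return 0.0  # no technique contributed any confidence
    weighted = sum(raw * confidence for raw, confidence in outputs)
    return weighted / total_confidence
```

With this choice, a technique with a low confidence value moves the profile score less than one with a high confidence value, which is the role the confidence values play in the description.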

As another example, a system and method can utilize a plurality of message classification techniques that are configured to classify a communication received from a messaging entity. Message profiling logic can be configured to combine the message classification outputs in order to generate a message profile score. The message profile score is used in deciding what action is to be taken with respect to the communication associated with the messaging entity.

As another example, a system and method can be used for tuning message classification parameters for use by one or more message classification techniques. A plurality of input data is received (such as through input logic or processing instructions) that is, or is representative of, a plurality of communications. A tuner program is used to tune the message classification parameters associated with the message classification techniques. A communication is received from a messaging entity. The tuned message classification parameters are used by the plurality of message classification techniques to classify the communication. Message classification outputs from the plurality of message classification techniques are combined in order to generate a message profile score, which is used in deciding what action is to be taken with respect to the communication associated with the messaging entity.

(Detailed Description)
FIG. 1 depicts at 30 a system for handling transmissions received over a network 40. The transmissions can be many different types of communications, such as electronic mail (e-mail) messages sent from one or more messaging entities 50. The system 30 assigns a classification to a messaging entity (e.g., messaging entity 52), and an action is taken with respect to the messaging entity's communication based upon the classification assigned to that entity.

The system 30 uses a filtering system 60 and a reputation system 70 to help process communications from the messaging entities 50. The filtering system 60 uses the reputation system 70 to help determine what filtering action (if any) should be taken upon the messaging entities' communications. For example, a communication may be determined to be from a reputable source, and thus the communication should not be filtered.

The filtering system 60 identifies at 62 one or more message characteristics associated with a received communication and provides that identification information to the reputation system 70. The reputation system 70 evaluates the reputation by calculating the probabilities that the identified message characteristics exhibit certain qualities. An overall reputation score is determined based upon the calculated probabilities and is provided to the filtering system 60.

The filtering system 60 examines the reputation score at 64 in order to determine what action should be taken for the sender's communication (such as whether the transmission should be delivered to the communication's designated recipient located within a message receiving system 80). The filtering system 60 could decide how a communication should be handled based in whole or in part upon the reputation score provided by the reputation system 70. As an illustration, a communication may be determined to be from a non-reputable sender, and thus the communication should be handled as spam (e.g., deleted, quarantined, etc.).

Reputation systems may be configured in many different ways in order to assist a filtering system. For example, a reputation system 70 can be located externally or internally relative to the filtering system 60, depending upon the situation at hand. As another example, FIG. 2 depicts a reputation system 70 that has been configured to calculate reputation scores based upon such message characteristic identification information as sender identity, as shown at 82. It should be understood that other message characteristics can be used instead of or in addition to sender identity. Moreover, transmissions may be from many different types of messaging entities, such as a domain name, IP address, phone number, or individual electronic address or username representing an organization, computer, or individual user that transmits electronic messages. For example, generated classifications of reputable and non-reputable can be based upon the tendency of an IP address to send unwanted transmissions or legitimate communications.

The system's configuration 90 is also shown in FIG. 2 and can be established by identifying a set of binary, testable criteria 92 that appear to be strong discriminators between good and bad senders. P(NR|C_i) can be defined as the probability that a sender is non-reputable, given that it conforms to quality/criterion C_i, and P(R|C_i) can be defined as the probability that a sender is reputable, given that it conforms to quality/criterion C_i.

For each quality/criterion C_i, periodic (e.g., daily, weekly, monthly, etc.) sampling exercises can be performed to recalculate P(NR|C_i). A sampling exercise may include selecting a random sample set S of N senders for which quality/criterion C_i is known to be true. The senders in the sample are then sorted into one of the following sets: reputable (R), non-reputable (NR), or unknown (U). N_R is the number of senders in the sample that are reputable senders, N_NR is the number of senders that are non-reputable senders, and so on. P(NR|C_i) and P(R|C_i) are then estimated from these counts [formulas not reproduced in this text]. For this purpose, N = 30 was determined to be a large enough sample size to achieve an accurate estimate of P(NR|C_i) and P(R|C_i) for each quality/criterion C_i.
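The sampling exercise can be sketched in a few lines. Because the estimation formulas are not reproduced in this text, the sketch assumes the simplest reading: each probability is the fraction of the N sampled senders that fell into the corresponding set, with senders of unknown reputation still counted in N. The function name and label strings are hypothetical.

```python
from collections import Counter

def estimate_conditionals(labels):
    """Estimate P(NR|C_i) and P(R|C_i) from one labeled sample.

    labels holds one entry per sampled sender that satisfies criterion C_i:
    "R" (reputable), "NR" (non-reputable), or "U" (unknown).
    Assumption: each estimate is a plain sample fraction over all N senders.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("sample must not be empty")
    counts = Counter(labels)
    return counts["NR"] / n, counts["R"] / n
```

For a sample of 30 senders with 18 labeled non-reputable, 6 reputable, and 6 unknown, this reading gives estimates of 0.6 and 0.2. Excluding the unknown senders from the denominator would be an equally plausible variant.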

After P(NR|C_i) and P(R|C_i) have been calculated for all criteria, the computed probabilities are used to calculate an aggregate non-reputable probability 94, P_NR, and an aggregate reputable sender probability 96, P_R, for each sender in the reputation space [formulas not reproduced in this text]. In experimentation, these formulas appeared to behave very well for a wide range of input criteria combinations, and in practice their behavior appears to be similar to that of the formula for correctly computing naive joint conditional probabilities of "non-reputable" and "reputable" behavior for the input criteria.
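Since the aggregate formulas themselves are not reproduced here, the sketch below does not implement them. It shows only a naive combination of the kind the paragraph compares them to: the per-criterion probabilities are treated as independent and merged with a noisy-OR rule. The function name is hypothetical, and the rule is a stand-in chosen for illustration.

```python
import math

def naive_aggregate(conditionals):
    """Noisy-OR stand-in for an aggregate probability.

    conditionals holds P(.|C_i) for each criterion that applies to a sender.
    Assumption: the criteria are independent, so the aggregate is one minus
    the probability that no criterion indicates the outcome.
    """
    return 1.0 - math.prod(1.0 - p for p in conditionals)
```

Under this rule two criteria with probability 0.5 each combine to 0.75, and a sender to which no criterion applies gets an aggregate of 0.0.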

After P_NR and P_R have been calculated for each sender, a reputation score is calculated for that sender using a reputation function [function not reproduced in this text]. It should be understood that different functions can act as a reputation score determinator 98 and can be expressed in many different forms in addition to a functional expression. As an illustration, FIG. 3 depicts at 100 a tabular form for determining reputation scores. The table shows the reputation scores produced by the reputation function, based on values of P_NR and P_R as they each vary between 0.0 and 1.0. For example, as shown at 110, a reputation score of 53 is obtained for the combination of P_NR = 0.9 and P_R = 0.2. This reputation score is a relatively strong indicator that the sender should not be considered reputable. A reputation score of 0 is obtained if P_NR and P_R are the same (e.g., the reputation score is 0 if P_NR = 0.7 and P_R = 0.7, as shown at 120). A reputation score can have a negative value to indicate that a sender is relatively reputable, as determined when P_R is greater than P_NR. For example, if P_NR = 0.5 and P_R = 0.8, as shown at 130, then the reputation score is -12.

Reputation scores can be shown graphically, as depicted in FIG. 4 at 150. Graph 150 was produced by the reputation function, based on values of P_NR and P_R. FIG. 4 illustrates reputation score determinations in the context of spam, in that the terms P_NR and P_R are used, respectively, as the probability of hamminess (non-spam likelihood) and the probability of spamminess, as the probabilities each vary between 0.0 and 1.0.

As shown in these examples, reputation scores can be numeric reputations that are assigned to messaging entities based on characteristics of a communication (e.g., messaging entity characteristics) and/or a messaging entity's behavior. Numeric reputations can fluctuate along a continuous spectrum between reputable and non-reputable classifications. However, reputations may also be non-numeric, such as textual categories or multiple levels of textual categories.

FIG. 5 depicts an operational scenario in which a reputation system is used by a filtering system to generate reputation scores. In this operational scenario, a reputation score is computed for a particular sender (e.g., IP address, domain name, phone number, address, etc.) from a set of input data. With reference to FIG. 5, the data needed to calculate the non-reputable and reputable probabilities for a sender is gathered at step 200. The data is then aggregated at step 210 and used in probability calculations at step 220. This includes determining, for a sender, non-reputable probabilities and reputable probabilities for various selected criteria. An aggregate non-reputable probability and an aggregate reputable probability are then calculated for each sender.

After an aggregate non-reputable probability and an aggregate reputable probability have been calculated for each sender, a reputation score is calculated at 230 for that sender using a reputation function. At step 240, the sender's reputation score is distributed locally and/or to one or more systems to evaluate a communication associated with the sender. As an illustration, reputation scores can be distributed to a filtering system. With the reputation score, the filtering system can choose to take an action on the transmission based on the range in which the sender's reputation score falls. For non-reputable senders, a filtering system can choose to drop the transmission (e.g., silently), save it in a quarantine area, or flag the transmission as suspicious. In addition, a filtering system can choose to apply such actions to all future transmissions from this sender for a specified period of time, without requiring new lookup queries to be made to the reputation system. For reputable senders, a filtering system can similarly apply actions to the transmissions that allow them to bypass all or certain filtering techniques that cause significant processing, network, or storage overhead for the filtering system.
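The range-based action and the "no new lookup for a specified period" behavior can be sketched as a small cache in front of the reputation lookup. Everything named here is hypothetical: the class, the action labels, the score cutoffs, and the cache lifetime are illustrative choices, and the lookup callable stands in for a query to a reputation system.

```python
import time

class ReputationFilter:
    """Maps a sender's reputation score to an action and caches the decision."""

    def __init__(self, lookup, ttl_seconds=3600.0, bad_cutoff=25.0,
                 good_cutoff=-25.0, clock=time.monotonic):
        self._lookup = lookup      # callable: sender -> reputation score
        self._ttl = ttl_seconds    # how long a decision is reused
        self._bad = bad_cutoff     # assumed: scores at or above are non-reputable
        self._good = good_cutoff   # assumed: scores at or below are reputable
        self._clock = clock
        self._cache = {}           # sender -> (action, expiry time)

    def action_for(self, sender):
        now = self._clock()
        cached = self._cache.get(sender)
        if cached is not None and cached[1] > now:
            return cached[0]       # reuse the decision without a new lookup
        score = self._lookup(sender)
        if score >= self._bad:
            action = "quarantine"
        elif score <= self._good:
            action = "bypass_filters"
        else:
            action = "full_filtering"
        self._cache[sender] = (action, now + self._ttl)
        return action
```

Injecting the clock keeps the expiry logic testable. A deployment would also need to bound the cache size, which this sketch omits.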

It should be understood that, as with the other processing flows described herein, the processing and the order of the processing may be altered, modified, and/or augmented and still achieve the desired outcome. For example, an optional addition to the step of extracting unique identifying information about the sender of the transmission would be to use sender authentication techniques to authenticate certain parts of the transmission (such as the purported sending domain name in the header of the message) against unforgeable information about the sender (such as the IP address from which the transmission originated). This process can allow the filtering system to perform lookups on the reputation system by querying for information that could potentially be forged had it not been authenticated, such as a domain name or e-mail address. If such a domain or address has a positive reputation, the transmission can be delivered directly to the recipient's system, bypassing all or some filtering techniques. If it has a negative reputation, the filtering system can choose to drop the transmission, save it in a quarantine area, or flag it as suspicious.

Many different types of sender authentication techniques can be used, such as the Sender Policy Framework (SPF) technique. SPF is a protocol by which domain owners publish DNS records that indicate which IP addresses are allowed to send mail on behalf of a given domain. As other non-limiting examples, SenderID or DomainKeys can be used as sender authentication techniques.
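To make the SPF idea concrete, the sketch below checks a sending IP address against the `ip4:` and `ip6:` mechanisms of an SPF record that has already been retrieved. It is a deliberately simplified illustration: real SPF evaluation also involves DNS lookups and mechanisms such as `include`, `a`, `mx`, and `all`, none of which are handled here. The function name and the example record are hypothetical.

```python
import ipaddress

def ip_allowed_by_spf(record, sender_ip):
    """Return True if an ip4:/ip6: mechanism with a pass qualifier matches.

    Simplification: only ip4 and ip6 mechanisms are evaluated; every other
    term (include, a, mx, all, modifiers) is ignored, so a False result
    means only that no pass was found.
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return False
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"                      # default qualifier is pass
        if term[0] in "+-~?":
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        if name.lower() in ("ip4", "ip6") and value:
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return qualifier == "+"      # first matching mechanism decides
    return False
```

For example, with the record `v=spf1 ip4:192.0.2.0/24 -all`, the address 192.0.2.10 is accepted and 198.51.100.1 is not. A filtering system could use such a result to decide whether the purported sending domain is trustworthy enough to use as a reputation lookup key.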

As another example, many different types of criteria may be used in processing a sender's communication. FIG. 6 depicts the use of non-reputable criteria 300 and reputable criteria 310 in determining reputation scores.

The non-reputable criteria 300 and reputable criteria 310 help to distinguish non-reputable senders from reputable senders. The set of criteria can change often without significantly affecting the reputation scores produced using this scoring technique. As an illustration within the context of spam identification, the following is a list of spamminess criteria that could be used in the reputation scoring of a message sender. The list is not intended to be exhaustive, and it can be adapted to include other criteria or to remove criteria based upon observed behavior.
1. Mean Spam Score: A sender is declared "non-reputable" if the mean spam profiler score of the transmissions that it sends exceeds some threshold, W.
2. RDNS Lookup Failure: A sender is declared "non-reputable" if reverse domain name system (RDNS) queries for its IP addresses fail.
3. RBL Membership: A sender is declared "non-reputable" if it is included in a real-time blackhole list (RBL). (Note: multiple RBLs may be used. Each RBL can constitute a separate testing criterion.)
4. Mail Volume: A sender is declared "non-reputable" if its average (mean or median) transmission volume exceeds a threshold, X, where X is measured in transmissions over a period of time (e.g., a day, a week, or a month). (Note: multiple average volumes over multiple time periods may be used, and each average volume can constitute a separate testing criterion.)
5. Mail Burstiness/Sending History: A sender is declared "non-reputable" if its average (mean or median) transmission traffic pattern burstiness (defined by the number of active sending sub-periods within a larger time period, e.g., the number of active sending hours in a day or the number of active sending days in a month) is less than some threshold, Y, where Y is measured in sub-periods per period. (Note: multiple average burstiness measures over multiple time periods may be used, and each average burstiness measure can constitute a separate testing criterion.)
6. Mail Breadth: A sender is declared "non-reputable" if its average (mean or median) transmission traffic breadth (defined by the percentage of systems that receive transmissions from the same sender during a period of time, e.g., a day, a week, or a month) exceeds some threshold, Z. (Note: multiple average breadths over multiple time periods may be used, and each average breadth measure can constitute a separate testing criterion.)
7. Malware Activity: A sender is declared "non-reputable" if it is known to have delivered one or more pieces of malware code (e.g., viruses, spyware, intrusion code) during a measurement period.
8. Type of Address: A sender is declared "non-reputable" if it is known to be dynamically assigned by an Internet service provider (ISP) to dial-up or broadband Dynamic Host Configuration Protocol (DHCP) clients.
9. CIDR Block Spamminess: A sender is declared "non-reputable" if its IP addresses are known to exist within Classless Inter-Domain Routing (CIDR) blocks that contain predominantly "non-reputable" IP addresses.
10. Human Feedback: A sender is declared "non-reputable" if it is reported to have sent undesirable transmissions by people analyzing the content and other characteristics of those transmissions.
11. SpamTrap Feedback: A sender is declared "non-reputable" if it sends transmissions to accounts that have been declared as spamtraps and as such are not supposed to receive any legitimate transmissions.
12 Bounceback Feedback: If a sender sends a bounceback transmission, or if the transmission is sent to an account that does not exist in the destination system, the sender is said to be “unreputable”. Declared.
13. Legislation / Standards Conformance: When a sender does not comply with transmission behavior laws, rules, and established standards in either the transmission sender and / or recipient country of operation Sender is declared "not well-received".
14 Continuity of Operation: If a sender is not operated at a position that transmits longer than a certain threshold Z, the sender is declared "not well-received".
15. Responsiveness to Recipient Demands: For the legitimate demands of the recipient to terminate the relationship with the sender so that the sender does not receive any further transmission from the sender. If the sender does not respond in a reasonable time frame, the sender is declared “not well-received”.

The following is a list of "reputable" criteria that can be used in determining the "reputability" of a sender. The list is not intended to be exhaustive and may be adapted to include other criteria or to remove criteria based on observed behavior.
1. Mean Spam Score: A sender is declared "reputable" if the mean spam profiler score of the transmissions that it sends falls below some threshold W.
2. Human Feedback: A sender is declared "reputable" if it is reported to send only legitimate transmissions by people who analyze the transmission flows from that sender in conjunction with the reputation of the organization that owns those sending stations.

After the reputation grade of each sender in the universe of senders has been calculated, the reputation classification can be made available via a communication protocol that can be interpreted by the queriers that make use of the reputation system (e.g., DNS, HTTP, etc.). As shown in FIG. 7, when a query 350 is issued for a sender, the reputation system can respond with a return value 360 that includes the reputation score of that sender, as well as any other relevant additional information that the querier can use in making the final judgment on the acceptability of the sender's transmission (e.g., the age of the reputation score, the input data that determined the score, etc.).

An example of a communication protocol that can be used is a Domain Name System (DNS) server, which can respond with a return value in the form of an IP address: 172.x.y.z. The IP address can be encoded from the reputation value rep using the formula:

$$\mathrm{IP} = 172.x.y.z, \qquad x = \frac{\mathrm{rep} - |\mathrm{rep}|}{2 \times \mathrm{rep}}, \quad y = |\mathrm{rep}| \ \mathrm{div}\ 256, \quad z = |\mathrm{rep}| \bmod 256$$

The reputation of the queried sender can be deciphered from the return value as follows:

$$\mathrm{rep} = (-1)^{2-x} \times (256y + z)$$
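The encoding scheme can be illustrated with a minimal Python sketch. The function names `encode_reputation` and `decode_reputation` are hypothetical and not part of the described system; the sketch assumes that the low-order bit of x carries the sign, that y and z carry the high and low bytes of the absolute value, and that a reputation of zero is encoded with x = 0.

```python
def encode_reputation(rep: int) -> str:
    """Pack a reputation value into a 172.x.y.z address (illustrative only)."""
    if not -65535 <= rep <= 65535:
        raise ValueError("reputation must lie in [-65535, 65535]")
    x = 1 if rep < 0 else 0          # sign flag: 0 = positive, 1 = negative
    y, z = divmod(abs(rep), 256)     # high byte and low byte of |rep|
    return f"172.{x}.{y}.{z}"


def decode_reputation(ip: str) -> int:
    """Recover the reputation using rep = (-1)^(2 - x) * (256*y + z)."""
    _, x, y, z = (int(part) for part in ip.split("."))
    sign_bit = x & 1                 # assumption: the upper 7 bits of x are reserved
    return (-1) ** (2 - sign_bit) * (256 * y + z)
```

For instance, a reputation of -300 would be returned as 172.1.1.44 under these assumptions, since 300 = 256 × 1 + 44.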

Therefore, when x = 0 the returned reputation is a positive number, and when x = 1 the returned reputation is a negative number. The absolute value of the reputation is determined by the values of y and z. This encoding scheme enables the server to return, via the DNS protocol, reputation values within the range [−65535, 65535]. It also leaves seven (7) bits unused, namely the seven high-order bits of x. These bits can be reserved for extensions to the reputation system. (For example, the age of a reputation score may be communicated back to the querier.)

FIG. 8 depicts at 430 a system for handling transmissions received over a network 440. The transmissions can be many different types of communications, such as electronic mail (e-mail) messages sent from one or more messaging entities 450. The system 430 uses a filtering system 460 to help process the communications from the messaging entities 450. The filtering system 460 examines characteristics associated with the communications from the messaging entities 450, and based upon the examination, an action is taken with respect to the communications. For example, a communication may be determined to be legitimate, in which case it is not filtered out by the filtering system 460 and is instead provided to a receiving system 70 for delivery to the intended recipient.

To increase the accuracy of classifying messages properly (e.g., as spam or legitimate), the filtering system 460 can be configured with a message profiler program 500 as shown in FIG. 9. The message profiler 500 uses multiple message classification techniques or filters 510 to classify messages, as shown in FIG. 9. Exemplary message classification techniques or filters 510 that the message profiler 500 can use include:
• Reverse DNS (RDNS): a classification technique that performs a reverse domain name service (DNS) lookup, based on the IP address of the message sender, to check (1) whether a domain exists in the DNS system for that IP address, and (2) if such a domain exists, whether the domain matches the domain from which the sender claims to be sending the message.
• Real-time Black-hole Lists (RBLs): a classification technique that performs one or more real-time black-hole list (RBL) queries, based on the IP address of the message sender, to check whether the IP address has been identified by any RBL as an IP address that is likely to send unwanted messages.
• Reputation Server: a classification technique that performs one or more reputation server queries, based on the IP address of the message sender and/or the sender's domain name and other message sender characteristics, to receive a score that describes the sender's reputation.
• Signature/Fingerprinting-based Analysis (e.g., Statistical Lookup Service (SLS)): a classification technique that computes a hash of the message and queries a centralized statistical lookup service (SLS) to determine how frequently the computed message hash has been seen in recent mail flow.
• Message Header Analysis Classification Technique: as examples, this technique can include System Defined Header Analysis (SDHA), User Defined Header Analysis (UDHA), etc.
• System Defined Header Analysis (SDHA): a set of classification techniques that examine a message and identify whether the message's headers exhibit certain system-defined characteristics that tend to identify likely senders of unwanted messages.
• User Defined Header Analysis (UDHA): a set of classification techniques that examine a message and identify whether the message's headers exhibit certain user-defined characteristics that tend to identify likely senders of unwanted messages.
• Sender Authentication: a set of classification techniques that perform lookups to determine (1) whether the sender's claimed domain has published a record of mail servers that are authorized to send mail for that domain, and (2) if such a record has been published, whether the record authorizes the sender's IP address to send mail on behalf of the claimed domain. Examples of commonly used Sender Authentication techniques include Sender Policy Framework (SPF) and Sender ID.
• Bayesian Filtering: a statistical classification technique that computes an estimate of the joint conditional probability that a message falls into a specific category, based on the set of textual tokens (words) in the message.
• Content Filtering: a classification technique that searches the contents of a message for words that have been associated with certain message categories.
• Clustering Classification: a classification technique based on measuring similarity among features, in which communications are clustered into groups such as desired, undesired (e.g., spam), etc. The clustering is performed such that intra-group similarities are high and inter-group similarities are low.
The list is not intended to be exhaustive and can be adapted to include other techniques as they are discovered. Some of the entries in the list constitute a single technique, while others constitute a combined set of many similar or closely related techniques. Where multiple techniques are described collectively, the message profiler 500 permits each technique to have its own confidence value.
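As an illustration of the signature/fingerprinting-based technique in the list above, the following Python sketch hashes a message body and reports how often that hash has been seen. The class `LocalFingerprintStore`, the use of SHA-256, and the `bulk_count` parameter are all assumptions made for the example; the text describes a centralized statistical lookup service rather than an in-memory counter, and does not name a hash function.

```python
import hashlib
from collections import Counter


class LocalFingerprintStore:
    """In-memory stand-in for a centralized lookup service (hypothetical)."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    @staticmethod
    def fingerprint(body: str) -> str:
        # Assumption: the fingerprint is a SHA-256 digest of the raw body text.
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

    def record(self, body: str) -> None:
        """Register one sighting of a message in the recent mail flow."""
        self._counts[self.fingerprint(body)] += 1

    def frequency(self, body: str) -> int:
        """Return how many times this exact message has been seen."""
        return self._counts[self.fingerprint(body)]

    def classify(self, body: str, bulk_count: int = 100) -> float:
        """Map the sighting count to a classification value in [0, 1]."""
        return min(1.0, self.frequency(body) / bulk_count)
```

A value near 1 would indicate a message seen in bulk, which a profiler could then weight by the confidence value assigned to this technique.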

The message profiler 500 classifies messages using a threshold-based technique. Each of the classification techniques 510 used by the message profiler 500 has an associated confidence value 520. When a message arrives for profiling, the message profiler 500 iterates through the classification techniques and allows each technique to attempt to classify the message. The result of each classification is a decimal value in the range [0, 1]. After iterating through each classification technique, the message profiler 500 computes a score for the message using the following formula:

$$\mathrm{Score} = \sum_{i=1}^{N} SV_i \times C_i$$

where $SV_i$ is the confidence value associated with classification technique $i$, and $C_i$ is the classification value in [0, 1] produced by classification technique $i$.

For classification techniques with non-linear scoring functions, the following formula can be used:

$$\mathrm{Score} = \sum_{i=1}^{N} \left( SV_{1i} \times C_i + SV_{2i} \times C_i^{2} \right)$$

where $SV_{1i}$ and $SV_{2i}$ are the confidence values associated with classification technique $i$, and $C_i$ is the classification value in [0, 1] produced by classification technique $i$.
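The two scoring rules can be sketched in Python as follows. The function names are hypothetical, and the quadratic form used for the non-linear case (one confidence value on the classification value and a second on its square) is an assumption about how the two confidence values per technique are applied.

```python
from typing import Sequence


def linear_score(confidences: Sequence[float], outputs: Sequence[float]) -> float:
    """Sum of SV_i * C_i over all classification techniques."""
    return sum(sv * c for sv, c in zip(confidences, outputs))


def nonlinear_score(
    sv1: Sequence[float], sv2: Sequence[float], outputs: Sequence[float]
) -> float:
    """Sum of SV_1i * C_i + SV_2i * C_i**2 (assumed quadratic form)."""
    return sum(a * c + b * c * c for a, b, c in zip(sv1, sv2, outputs))
```

With confidence values (10, 20, 5) and classification values (1.0, 0.5, 0.0), the linear score is 10 + 10 + 0 = 20.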

If the message score exceeds some specific threshold T, as determined at 520, then the message is declared to belong to the first defined category. If the message score is at or below the threshold, it is declared to belong to the opposite category. The system can then take an appropriate action based on the threshold reached by the message score, such as quarantining the message, dropping the message (i.e., deleting the message without delivery, as shown at 530), rewriting the subject of the message to contain some specific string (e.g., "SUSPECTED SPAM"), passing the message to an encryption engine for secure delivery, etc. The system can also allow multiple thresholds to be specified, with a different action or set of actions applied at each threshold, signifying the increased confidence of the message profiler 500 in the result of the classification.
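A multi-threshold policy of this kind can be sketched as a lookup over (threshold, action) pairs. The function name, the action labels, and the numeric thresholds below are illustrative assumptions, not values taken from the described system.

```python
from typing import Sequence, Tuple


def choose_action(
    score: float,
    thresholds: Sequence[Tuple[float, str]],
    default: str = "deliver",
) -> str:
    """Return the action tied to the highest threshold the score exceeds."""
    action = default
    for limit, name in sorted(thresholds):
        if score > limit:        # a threshold must be exceeded, not merely met
            action = name
    return action


# Hypothetical policy: stronger actions at higher scores.
POLICY = [(50.0, "tag_subject"), (75.0, "quarantine"), (90.0, "drop")]
```

Under this hypothetical policy a score of 80 would lead to quarantine, while a score of exactly 50 would still be delivered because the lowest threshold is not exceeded.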

The effectiveness and accuracy of a message profiler 500 depend on several factors, such as the set of confidence values 520 ($SV_i$, or $SV_{1i}$/$SV_{2i}$) associated with the classification techniques 510. A tunable message classification configuration can be used to generate an optimized set of values along with an associated set of thresholds and actions. The configuration can be regenerated periodically to keep the message profiler 500 updated with the latest protection against the frequent changes in the score distributions of classification techniques operating on constantly changing message flow patterns. As such, a message profiler configuration includes a vector

$(SV_1, SV_2, \ldots, SV_N)$

which represents the confidence values of all N classification techniques.

As shown in FIG. 10, a message classification tuner program 600 can be configured to tune the message profiler 500 by performing a probabilistic search through the vector space of all possible vectors and identifying a vector that maximizes the filtering accuracy of the profiler for a pre-selected threshold. The tuner 600 can use different approaches to do this, such as a heuristic approach 610.

FIG. 11 illustrates a tuner that uses a heuristic approach known as a genetic algorithm to perform the vector space search. The concepts that underpin genetic algorithms come from the theory of evolution, in which genotypes (expressed through chromosomes) compete with one another through their phenotypes (expressed as biological organisms). Over time, biological evolution produces highly adapted and complex organisms that are capable of surviving in the environment for which they evolved. Similarly, a genetic algorithm searches through a vector space consisting of candidate solutions to a problem, where each candidate solution is expressed as a vector. Over many simulated generations of candidate solutions, the genetic algorithm gradually evolves towards solutions that are increasingly well adapted to the problem.

The ability of a genetic algorithm to evolve good solutions to a problem over time depends on the existence of an accurate mechanism for evaluating the relative fitness level of a candidate solution compared to other candidate solutions. Thus, the genetic algorithm 650 is designed with a fitness function 660 that accurately models the fitness of candidate solutions in the actual problem domain.
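A generic genetic search over confidence-value vectors might look like the following Python sketch. It is not the tuner described in the figures: the function name `evolve`, the truncation selection, single-point crossover, Gaussian mutation, and the restriction of each value to [0, 1] are all assumptions chosen for brevity. The caller supplies the fitness function, and lower fitness is treated as better.

```python
import random
from typing import Callable, List

Vector = List[float]


def evolve(
    fitness: Callable[[Vector], float],
    n_genes: int,
    generations: int = 30,
    pop_size: int = 20,
    mutation_rate: float = 0.1,
    seed: int = 0,
) -> Vector:
    """Search for a low-fitness vector with a simple genetic algorithm."""
    rng = random.Random(seed)
    # Initial population: random candidate vectors with genes in [0, 1).
    pop = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]      # keep the fitter half unchanged
        children: List[Vector] = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)   # pick two distinct parents
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = a[:cut] + b[cut:]         # single-point crossover
            child = [
                min(1.0, max(0.0, g + rng.gauss(0.0, 0.1)))
                if rng.random() < mutation_rate
                else g
                for g in child
            ]                                 # occasional clamped mutation
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)
```

Because the fitter half of each generation is carried over unchanged, the best fitness found never gets worse from one generation to the next.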

The following is a fitness function 660 that can be used for optimization of the message profiler 500:

$$\mathrm{Fitness} = \frac{\sum_i \left| S_{CAT1\_MISTAKE_i} - T \right|}{N_{CAT1}} + C \times \frac{\sum_i \left| S_{CAT2\_MISTAKE_i} - T + 1 \right|}{N_{CAT2}}$$

The terms in the function are defined as follows:
$N_{CAT1}$ = number of message vectors from the overall data set that belong to the first category
$N_{CAT2}$ = number of message vectors from the overall data set that belong to the second category
$C$ = constant multiplier for misclassified messages from the second category
$S_{CAT1\_MISTAKE_i}$ = message profiler score of message vector $i$ from the first message category that has been misclassified as belonging to the other category
$S_{CAT2\_MISTAKE_i}$ = message profiler score of message vector $i$ from the second message category that has been misclassified as belonging to the other category
$T$ = message profiler numeric threshold above which a message is considered to belong to the first category

The function expresses the cost associated with the mistakes that a configuration made in attempting to correctly classify the message vectors in a set of pre-classified data. Accordingly, a lower fitness value is considered better for the purposes of the genetic algorithm. The first term in the function expresses the cost associated with messages from the first category that have been misclassified as belonging to the second category (e.g., unwanted messages that have been classified as legitimate, also known as false negatives). The second term expresses the cost associated with messages from the second category that have been misclassified as belonging to the first category (e.g., legitimate messages that have been classified as unwanted, also known as false positives). Each summation represents the total number of points by which the configuration was wrong when attempting to classify message vectors. Intuitively, each term is essentially an expression of both the average frequency of classification errors and the average magnitude of classification errors. Note that the second term is multiplied by a constant C. This constant (which can be set to a value of 20) represents the relative cost of misclassifying a message from one category compared with misclassifying a message from the opposite category.

Setting C to 20 indicates that a classification mistake on a message from the second category is 20 times more costly than a mistake on a message from the first category. For example, if the message profiler 500 is used to classify wanted and unwanted mail, the first category can represent unwanted mail (e.g., spam) and the second category can represent legitimate messages. The above function would then deem the misclassification of a legitimate message (a false positive) to be 20 times as costly as the misclassification of an unwanted message (a false negative). This reflects the real-world view in the anti-spam community that false positives carry a much higher risk than false negatives. If the message profiler 500 is used for policy compliance-related classification, a false positive is a message that contains sensitive information but is not labeled as such by the message profiler 500 and is therefore allowed to evade the policies that an organization may have chosen to apply to that particular category.
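The cost described above can be sketched in Python. The exact functional form is an assumption made for this example: each term is taken to be the summed absolute distance of the misclassified scores from the threshold T, divided by the size of the category, with a "+ 1" offset inside the second term and the constant C applied to it. The function name and argument layout are hypothetical.

```python
from typing import Sequence


def fitness(
    scores_cat1: Sequence[float],
    scores_cat2: Sequence[float],
    threshold: float,
    c: float = 20.0,
) -> float:
    """Cost of a configuration on pre-classified data (lower is better).

    scores_cat1: profiler scores of messages known to be in category 1
                 (expected to score above the threshold).
    scores_cat2: profiler scores of messages known to be in category 2
                 (expected to score at or below the threshold).
    Both sequences are assumed to be non-empty.
    """
    # Category-1 messages that failed to exceed the threshold (false negatives).
    cat1_mistakes = [s for s in scores_cat1 if s <= threshold]
    # Category-2 messages that exceeded the threshold (false positives).
    cat2_mistakes = [s for s in scores_cat2 if s > threshold]
    term1 = sum(abs(s - threshold) for s in cat1_mistakes) / len(scores_cat1)
    # Assumed form: "+ 1" offset on the second term, weighted by C.
    term2 = sum(abs(s - threshold + 1) for s in cat2_mistakes) / len(scores_cat2)
    return term1 + c * term2
```

With T = 50, category-1 scores (90, 40, 60) and category-2 scores (10, 55), the first term is 10/3 and the second is 20 × 6/2 = 60, so a single false positive dominates the cost.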

FIG. 12 depicts an operational scenario in which a message profiler can be used. With reference to FIG. 12, the operational scenario includes receiving, at step 710, a communication that was sent over a network from a messaging entity. A plurality of message classification techniques is then used at 710 to classify the communication. Each message classification technique is associated with a confidence value, which is used in generating a message classification output from that technique. The output of each classification can be a numeric value, a textual value, or a categorical value. The message classification outputs are combined at step 720 in order to generate a message profiler score at step 730. The message profiler score is used at step 740 to decide what action is to be taken with respect to the communication associated with the messaging entity.

As with the other processing flows described herein, it should be understood that the steps and the order of the steps may be altered, modified, and/or augmented and still achieve the desired outcome. For example, a message profiler may be configured for an operational scenario that recognizes that no single technique is capable of adequately classifying a message into two distinct categories (e.g., distinguishing between wanted (legitimate) and unwanted (spam, phishing, viruses, etc.) message communications, or determining whether a message complies with a specific organization policy, law, or regulation). In this operational scenario, such a configured message profiler can be designed to:
1. Provide a framework for combining the results of many message classification techniques into an aggregate classification (e.g., "unwanted" or "legitimate", "HIPAA compliant", "GLBA violation", "HR policy violation", etc.) without specifying a priori which classification techniques will be used.
2. Decouple the importance of each classification technique (expressed as its contribution towards the aggregate classification) from its classification logic, so that a technique's level of importance may be adjusted to reflect changes in its accuracy over time.
3. Provide a mechanism through which to describe the relative importance of each classification technique within the framework and the correlation of their individual accuracies, so that the framework can be adjusted to use this information to achieve very accurate rates in aggregate classification.
4. Provide a mechanism through which to discover the relative importance of each classification technique within the framework, so that the framework can be tuned for maximum classification accuracy in a given environment.
Still further, a message profiler may be configured to operate in other operational scenarios. For example, FIG. 13 depicts a message profiler that has been adapted to operate with adaptive message blocking and whitelisting. With reference to FIG. 13, in addition to the classification of individual messages, the aggregated results of the message profiler program 500 are also used, at 820, to classify message senders based on the distribution of the message profiler scores that their messages are receiving. If the average score of the messages received from a particular sender (e.g., an IP address) during a specified timeframe (e.g., hour, day, week) exceeds a specified threshold $T_U$ and the score distribution has a standard deviation smaller than $ST_U$, then the sender can be classified as "non-reputable" (and this information is stored in data store 840). Process 800 can then use the data from data store 840 to determine that all messages and connections originating from such a sender can be dropped at 810 without processing for the next X hours. Correspondingly, if the average score is at or below threshold $T_L$ with a standard deviation smaller than $ST_L$, the sender can be considered legitimate (and this information is stored in data store 830), and process 800 can allow messages from that sender to bypass certain filtering techniques (e.g., the filtering of the message profiler 500) that can cause significant processing, network, or storage overhead for the filtering system 460.
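The sender classification rule can be sketched in Python with the standard `statistics` module. The function name, the returned labels, and the choice of the population standard deviation are assumptions for illustration; the text specifies only that the mean score and the standard deviation are compared against upper and lower thresholds.

```python
import statistics
from typing import Sequence


def classify_sender(
    scores: Sequence[float],
    t_upper: float,
    st_upper: float,
    t_lower: float,
    st_lower: float,
) -> str:
    """Classify a sender from the profiler scores of its recent messages."""
    mean = statistics.mean(scores)
    spread = statistics.pstdev(scores)   # assumption: population std. deviation
    if mean > t_upper and spread < st_upper:
        return "non-reputable"           # consistently high scores: block
    if mean <= t_lower and spread < st_lower:
        return "legitimate"              # consistently low scores: whitelist
    return "unclassified"                # mixed behavior: keep filtering
```

A sender whose messages scored 90, 92, 88 and 91 against an upper threshold of 80 would be classified as non-reputable, whereas a sender with scores of 10 and 90 has too wide a spread to be placed in either class.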

A message profiler may also be used in connection with adaptive training of endo- and exo-filtering systems. Using the sender classification systems and methods described herein, a message profiler can be used for training the various filtering techniques that are used within the profile, as well as others that lie completely outside of it. Such techniques may include Bayesian filtering, Support Vector Machines (SVM), and other statistical content filtering techniques, as well as signature-based techniques (e.g., Statistical Lookup Service (SLS) and message clustering-type techniques). The training strategies for such techniques may use sets of classified legitimate and unwanted messages, which can be provided by the message profiler based on sender reputations assigned from the aggregate scores of messages from those senders. Messages from senders classified as non-reputable can be provided to the filtering system trainer as unwanted, and wanted messages can be taken from the stream sent by legitimate senders.

As described above, the message profiler 500 may use a reputation-based approach as one classification technique. FIG. 14 depicts at 900 a reputation system that can be used by the filtering system 460 in handling transmissions received over the network 440 from messaging entities 450. More specifically, the filtering system 460 uses the reputation system 900 to help determine, at least in part, what filtering action (if any) should be taken on a messaging entity's communications. For example, a communication may be determined to come from a reputable source, and as a result the communication is not filtered.

The filtering system 460 identifies, at 950, the sender of a received communication and provides that identification information to the reputation system 900. The reputation system 900 evaluates the reputation of the queried sender's identity by calculating probabilities that the messaging entity exhibits certain characteristics. An overall reputation score is determined based on the calculated probabilities and is provided to the filtering system 460. A reputation score can be numeric, textual, or categorical in value.

The filtering system 460 determines, at 952, what action should be taken for the sender's communication. The filtering system 460 can use the reputation score from the reputation system 900 as a message classification filter whose output is multiplied by its respectively tuned confidence value and then aggregated with the results of the other message classification filters.
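The multiply-and-aggregate step can be sketched as below, with the reputation score treated as one more filter output. The function name `message_profile_score` is hypothetical, and plain summation is assumed as the aggregation.

```python
def message_profile_score(filter_outputs, confidence_values):
    """Sum each classification filter output times its tuned confidence value.

    filter_outputs: one output per classification filter (the reputation
        score can be one of them).
    confidence_values: the tuned confidence value for each filter, same order.
    """
    if len(filter_outputs) != len(confidence_values):
        raise ValueError("each filter output needs exactly one confidence value")
    return sum(sv * c for sv, c in zip(confidence_values, filter_outputs))
```

Keeping the confidence values separate from the filter outputs lets a tuner adjust the weights without touching the individual classifiers.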

A reputation system can be configured in many different ways to assist a filtering system. For example, FIG. 15 depicts the reputation system 900 configured to calculate reputation scores. The system's configuration 1000 can be established by identifying a set of binary, testable criteria 1002 that appear to be strong discriminators between good and bad senders. P(NR|C_i) can be defined as the probability that a sender is non-reputable, given that it conforms to quality/criterion C_i, and P(R|C_i) can be defined as the probability that a sender is reputable, given that it conforms to quality/criterion C_i.

For each quality/criterion C_i, periodic (e.g., daily, weekly, monthly) sampling exercises can be performed to recalculate P(NR|C_i). A sampling exercise may include selecting a random sample set S of N senders for which quality/criterion C_i is known to be true. The senders in the sample are then sorted into one of the following sets: reputable (R), non-reputable (NR), or unknown (U). N_R is the number of senders in the sample that are reputable senders, N_NR is the number of senders that are non-reputable senders, and so on. P(NR|C_i) and P(R|C_i) are then estimated using the formulas:

$$P(NR \mid C_i) = \frac{N_{NR}}{N}, \qquad P(R \mid C_i) = \frac{N_R}{N}$$

For this purpose, N = 30 was determined to be a large enough sample size to achieve an accurate estimate of P(NR|C_i) and P(R|C_i) for each quality/criterion C_i.
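The sampling exercise can be sketched as follows, assuming the estimates are the relative frequencies N_NR/N and N_R/N over the sample. The function name `estimate_conditionals` and the label encoding ("R", "NR", "U") are illustrative choices, not part of the disclosure.

```python
from collections import Counter


def estimate_conditionals(sample_labels):
    """Estimate (P(NR|C_i), P(R|C_i)) from one labeled sample of senders.

    sample_labels: one label per sampled sender known to satisfy C_i,
        each "R" (reputable), "NR" (non-reputable) or "U" (unknown).
    """
    n = len(sample_labels)
    if n == 0:
        raise ValueError("sample must contain at least one sender")
    counts = Counter(sample_labels)
    return counts["NR"] / n, counts["R"] / n
```

Because senders labeled unknown count toward N but toward neither numerator, the two estimates need not sum to 1.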

After P(NR|C_i) and P(R|C_i) have been calculated for all criteria, the computed probabilities are used to calculate, for each sender in the reputation space, an aggregate non-reputable probability 1004, P_NR, and an aggregate reputable probability 1006, P_R. These probabilities can be calculated using the formulas:

[formulas for P_NR and P_R not reproduced in this text]

In experiments, the above formulas behaved very well for a wide range of combinations of input criteria, and in practice their behavior appears to be similar to that of the formula for correctly computing the naive joint conditional probabilities of "non-reputable" and "reputable" behavior for the input criteria.
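The aggregation formulas themselves do not survive in this text, so the sketch below uses a stand-in. It combines the per-criterion probabilities with a noisy-OR rule, 1 - prod(1 - p_i), which is an assumption chosen only to illustrate how several conditional probabilities can be folded into one aggregate value. It should not be read as the formula of the disclosure, and the function name `combine_noisy_or` is hypothetical.

```python
from math import prod


def combine_noisy_or(conditionals):
    """Combine per-criterion probabilities as 1 - prod(1 - p_i).

    conditionals: P(NR|C_i) values (or P(R|C_i) values) for the criteria
        that a given sender exhibits.
    Returns 0.0 when no criterion applies, since the empty product is 1.
    """
    values = list(conditionals)
    for p in values:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return 1.0 - prod(1.0 - p for p in values)
```

Calling it once with the P(NR|C_i) values and once with the P(R|C_i) values yields the two aggregates that the reputation function consumes.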

After P_NR and P_R have been calculated for each sender, a reputation score is calculated for that sender using the following reputation function:

[reputation function not reproduced in this text]

It should be understood that different functions can act as the reputation score determinator 1008 and that, in addition to a functional expression, they can be expressed in many different forms. As an illustration, FIG. 16 depicts at 1100 a tabular form for determining reputation scores. The table shows the reputation scores produced by the above function, based on the values of P_NR and P_R as each varies between 0.0 and 1.0. For example, as shown at 1110, a reputation score of 53 is obtained for the combination P_NR = 0.9 and P_R = 0.2. This score is a relatively strong indication that the sender should not be considered reputable. A reputation score of 0 is obtained when P_NR and P_R are the same (e.g., the score is 0 when P_NR = 0.7 and P_R = 0.7, as shown at 1120). A reputation score can take a negative value, indicating that a sender is relatively reputable, when P_R is greater than P_NR. For example, as shown at 1130, if P_NR = 0.5 and P_R = 0.8, the reputation score is -12.
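The reputation function is not reproduced in this text, so the sketch below is illustrative only. The polynomial form and the coefficients C1 to C7 are assumptions, chosen because they reproduce the three table entries quoted above (53, 0, and -12) after rounding. The function name `reputation_score` is hypothetical.

```python
# Illustrative coefficients (assumed; not taken from the text above).
C1, C2, C3, C4, C5, C6, C7 = 86.50, -193.45, -35.19, 581.09, 234.81, -233.18, 0.51


def reputation_score(p_nr, p_r):
    """Map the aggregate probabilities (P_NR, P_R) to a signed score."""
    symmetric = (
        C1
        + C2 * (p_nr + p_r)
        + C3 * (p_nr**2 + p_r**2)
        + C4 * p_nr * p_r
        + C5 * (p_nr**3 + p_r**3)
        + C6 * (p_nr * p_r**2 + p_nr**2 * p_r)
    )
    diff = p_nr - p_r  # zero when P_NR equals P_R, which makes the score zero
    return symmetric * (diff**3 + C7 * diff)
```

Rounding `reputation_score(0.9, 0.2)` gives 53 and rounding `reputation_score(0.5, 0.8)` gives -12, matching the entries at 1110 and 1130; equal inputs give exactly 0, matching 1120.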

Many different types of criteria may be used in a reputation system's processing of a sender's communication, such as using non-reputable criteria and reputable criteria to determine reputation scores. Examples of such criteria are disclosed in U.S. Provisional Patent Application No. 60/625,507, entitled "CLASSIFICATION OF MESSAGING ENTITIES," filed on Nov. 5, 2004.

The systems and methods disclosed herein are presented only by way of example and are not meant to limit the scope of the invention. Other variations of the systems and methods described above will be apparent to those skilled in the art and as such are considered to be within the scope of the invention. For example, a system and method can be configured to handle many different types of communications, such as legitimate messages, unwanted communications, or communications that violate a pre-selected policy. As an illustration, an unwanted communication could include a spam or virus communication, and a pre-selected policy could include a corporate communication policy, a messaging policy, a legislative or regulatory policy, or an international communication policy.

As another example of the wide scope and variations of the systems and methods disclosed herein, the systems and methods may be implemented on various types of computer architectures, such as different types of networked environments. As an illustration, FIG. 17 depicts a server access architecture within which the disclosed systems and methods may be used (e.g., as shown at 1330 in FIG. 17). The architecture in this example includes a corporation's local network 1290 and a variety of computer systems residing within the local network 1290. These systems can include application servers 1220 (e.g., Web servers and e-mail servers), user workstations running local clients 1230 (e.g., e-mail readers and Web browsers), and data storage devices 1210 (e.g., databases and network-connected disks). These systems communicate with each other via a local communication network such as Ethernet 1250. A firewall system 1240 resides between the local communication network and the Internet 1260. A host of external servers 1270 and external clients 1280 are connected to the Internet 1260. It should be understood that the present disclosure can use any variety of network (including, but not limited to, the Internet, wireless networks, wide area networks, local area networks, and combinations thereof) to facilitate communication among components.

Local clients 1230 can access the application servers 1220 and the shared data storage 1210 via the local communication network. External clients 1280 can access external application servers 1270 via the Internet 1260. Where a local server 1220 or a local client 1230 requires access to an external server 1270, or where an external client 1280 or an external server 1270 requires access to a local server 1220, electronic communications in the appropriate protocol for a given application server flow through "always open" ports of the firewall system 1240.

The system 1330 described herein may be located in a hardware device, or on one or more servers, connected to the local communication network such as Ethernet 1250, and may be logically interposed between the firewall system 1240 and the local servers 1220 and local clients 1230. Application-related electronic communications that enter or leave the local communication network through the firewall system 1240 are routed to the system 1330.

In the example of FIG. 17, the system 1330 may be configured as part of a threat management system to store and process reputation data about a large number of senders. This can help the threat management system make better-informed decisions about whether to allow or block electronic mail (e-mail).

The system 1330 may be used to handle many different types of e-mail and the variety of protocols that are used for e-mail transmission, delivery, and processing, including SMTP and POP3. These protocols refer, respectively, to a standard for communicating e-mail messages between servers and a standard for server-client communication related to e-mail messages. Each protocol is defined in a particular RFC (Request for Comments) promulgated by the IETF (Internet Engineering Task Force). The SMTP protocol is defined in RFC 821, and the POP3 protocol is defined in RFC 1939.

Since the inception of these standards, various needs have evolved in the field of e-mail, leading to the development of further standards, including enhancements and additional protocols. For example, various enhancements to the SMTP standard have resulted in extended SMTP. Examples of extensions may be found in (1) RFC 1869, which defines a framework for extending the SMTP service by defining a means whereby a server SMTP can inform a client SMTP of the service extensions it supports, and (2) RFC 1891, which defines an extension to the SMTP service that allows an SMTP client to specify (a) that delivery status notifications (DSNs) should be generated under certain conditions, (b) whether such notifications should return the contents of the message, and (c) additional information, to be returned with a DSN, that allows the sender to identify both the recipient(s) for which the DSN was issued and the transaction in which the original message was sent.
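To make the RFC 1869 mechanism concrete, the sketch below parses a server's reply to the EHLO command into the set of extensions it advertises, so a client can check for DSN support before using it. The function name `parse_ehlo_reply` and the sample host name in the usage note are hypothetical. The reply layout (a greeting line followed by one `250-KEYWORD [params]` line per extension, the last one using `250 `) follows RFC 1869.

```python
def parse_ehlo_reply(reply):
    """Return {KEYWORD: [params]} for the extensions in a 250 EHLO reply.

    reply: the server's multi-line response text. The first line is the
        greeting; every later "250-" or "250 " line names one extension.
    """
    extensions = {}
    lines = [ln for ln in reply.splitlines() if ln.strip()]
    for line in lines[1:]:  # skip the greeting line
        if not line.startswith("250"):
            continue
        parts = line[4:].split()  # drop the "250-" or "250 " prefix
        if not parts:
            continue
        extensions[parts[0].upper()] = parts[1:]
    return extensions
```

For a reply such as `250-mail.example.com Hello`, `250-SIZE 10240000`, `250-DSN`, `250 8BITMIME`, the client would find `"DSN"` among the returned keys and could then request delivery status notifications.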

In addition, the IMAP protocol has evolved as an alternative to POP3 that supports more advanced interactions between e-mail servers and clients. This protocol is described in RFC 2060.

Other communication mechanisms are also widely used over networks. These communication mechanisms include, but are not limited to, Voice over IP (VoIP) and instant messaging. VoIP is used in IP telephony to provide a set of facilities for managing the delivery of voice information using the Internet Protocol (IP). Instant messaging is a type of communication involving a client that connects to an instant messaging service, which delivers communications (e.g., conversations) in real time.

As the Internet has become more widely used, it has also created new troubles for users. In particular, the amount of spam received by individual users has increased dramatically in the recent past. Spam, as used in this specification, refers to any communication whose receipt is either unsolicited or not desired by its recipient. A system and method can be configured as disclosed herein to address these types of unsolicited or undesired communications. This can be helpful because e-mail spamming consumes corporate resources and impacts productivity.

The systems and methods disclosed herein may use data signals conveyed via networks (e.g., a local area network, a wide area network, the Internet), fiber optic media, carrier waves, wireless networks, and the like for communication with one or more data processing devices. The data signals can carry any or all of the data disclosed herein that is provided to or from a device.

Additionally, the methods and systems described herein may be implemented on many different types of processing devices by program code comprising program instructions that are executable by one or more processors. The software program instructions may include source code, object code, machine code, or any other stored data that is operable to cause a processing system to perform the methods described herein.

The systems' and methods' data (e.g., associations, mappings) may be stored and implemented in one or more different types of computer-implemented ways, such as different types of storage devices and programming constructs (e.g., data stores, RAM, ROM, flash memory, flat files, databases, programming data structures, programming variables, and IF-THEN or similar statement constructs). It is noted that data structures describe formats for use in organizing and storing data in databases, programs, memory, or other computer-readable media for use by a computer program.

The systems and methods may be provided on many different types of computer-readable media, including computer storage mechanisms (e.g., CD-ROM, diskette, RAM, flash memory, a computer's hard drive) that contain instructions for use in execution by a processor to perform the methods' operations and implement the systems described herein.

The computer components, software modules, functions, and data structures described herein may be connected directly or indirectly to each other in order to allow the flow of data needed for their operations. It is also noted that a software instruction or module can be implemented, for example, as a subroutine unit of code, as a software function unit of code, as an object (as in an object-oriented paradigm), as an applet, in a computer script language, or as another type of computer code or firmware. The software components and/or functionality may be located on a single device or distributed across multiple devices, depending on the situation at hand.

It should be understood that, as used in the description herein and throughout the claims that follow, the meaning of "a," "an," and "the" includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of "in" includes "in" and "on" unless the context clearly dictates otherwise. Finally, as used in the description herein and throughout the claims that follow, the meanings of "and" and "or" include both the conjunctive and the disjunctive and may be used interchangeably unless the context clearly dictates otherwise; the phrase "exclusive or" may be used to indicate a situation where only the disjunctive meaning applies.

FIG. 1 is a block diagram depicting a system for handling transmissions received over a network.
FIG. 2 is a block diagram depicting a reputation system that is configured to determine reputation scores.
FIG. 3 is a table depicting reputation scores at various calculated probability values.
FIG. 4 is a graph depicting reputation scores at various calculated probability values.
FIG. 5 is a flowchart depicting an operational scenario for generating reputation scores.
FIG. 6 is a block diagram depicting the use of non-reputable criteria and reputable criteria to determine reputation scores.
FIG. 7 is a block diagram depicting a reputation system configured to respond with a return value that includes a sender's reputation score.
FIG. 8 is a block diagram depicting a system for handling transmissions received over a network.
FIG. 9 is a block diagram depicting a filtering system having a message profiler program.
FIG. 10 is a block diagram depicting a message classification tuner program.
FIG. 11 is a block diagram depicting the use of a genetic algorithm as a message classification tuner program.
FIG. 12 is a flowchart depicting an operational scenario in which a message profiler is used.
FIG. 13 is a block diagram depicting a message profiler adapted to operate with adaptive message blocking and whitelisting.
FIG. 14 is a block diagram depicting a reputation system for handling transmissions received over a network.
FIG. 15 is a block diagram depicting a reputation system that is configured to determine reputation scores.
FIG. 16 is a table depicting reputation scores at various calculated probability values.
FIG. 17 is a block diagram depicting a server access architecture.

Claims

1. A method operating on one or more data processors for assigning a reputation to a messaging entity, the method comprising:
receiving data that identifies one or more characteristics related to a messaging entity's communication; and
determining a reputation score based upon the received identification data;
wherein the determined reputation score is indicative of the reputation of the messaging entity; and
wherein the determined reputation score is used in deciding what action is to be taken with respect to a communication associated with the messaging entity.

2. The method of claim 1, wherein the determined reputation score is distributed to one or more computer systems used for transmission filtering.

3. The method of claim 1, wherein the determined reputation score is locally distributed to programs used for transmission filtering.

4. The method of claim 1, further comprising:
determining reputation-indicative probabilities based upon the received identification data;
wherein a reputation-indicative probability indicates the reputation of the messaging entity based upon the extent to which the identified one or more communication characteristics exhibit or conform to one or more reputation-related criteria; and
wherein determining the reputation score includes determining the reputation score based upon an aggregation of the determined probabilities.

5. The method of claim 1, further comprising:
identifying a set of criteria for use in discriminating between a reputable classification and a non-reputable classification, the criteria including non-reputable criteria and reputable criteria;
using statistical sampling to estimate a conditional probability that a messaging entity exhibits each criterion; and
computing a reputation for each messaging entity;
wherein the computing step comprises:
calculating the probability that a messaging entity deserves a reputable reputation by computing an estimate of the joint conditional probability that the messaging entity is reputable, given the set of criteria that the messaging entity exhibits or conforms to, from the individual conditional probabilities that a messaging entity exhibiting or conforming to each such criterion is actually a reputable messaging entity;
calculating the probability that the messaging entity deserves a negative reputation by computing an estimate of the joint conditional probability that the messaging entity is non-reputable, given the set of criteria that the messaging entity exhibits or conforms to, from the individual conditional probabilities that a messaging entity exhibiting or conforming to each such criterion is actually a non-reputable messaging entity; and
computing the reputation of the messaging entity by applying a function to the calculated probabilities.

6. The method of claim 5, wherein the reputation of each messaging entity is encoded in the form of a 32-bit, dotted-decimal IP address, the method further comprising:
creating a domain name server (DNS) zone that includes the reputations of all messaging entities in a universe of messaging entities; and
distributing the reputations of messaging entities, via the DNS protocol, to one or more computer systems that make use of the reputations in their operations.

7. The method of claim 5, wherein the set of criteria are metrics selected from the group:
mean spam profiler score; reverse domain name server lookup failure; membership on one or more real-time blacklists (RBLs); e-mail volume; e-mail burstiness; e-mail breadth; geographic location; malware activity; address type; a classless inter-domain routing (CIDR) block comprising a number of Internet protocol addresses identified as sending spam; rate of user complaints; rate of honeypot detections; rate of undeliverable transmissions; identified conformance with laws, regulations, and established standards of transmission behavior; continuity of operation; and responsiveness to recipient demands.

8. The method of claim 5, wherein the function used to encode the reputation of the messaging entity as a 32-bit, dotted-decimal IP address is:

[function not reproduced in this text]

9. The method of claim 1, further comprising:
determining reputation-indicative probabilities based upon the received identification data;
wherein a reputation-indicative probability indicates the reputation of the messaging entity based upon the extent to which the identified one or more communication characteristics exhibit or conform to one or more reputation-related criteria;
wherein determining the reputation score includes determining the reputation score based upon an aggregation of the determined probabilities;
wherein the reputation score is determined by applying the aggregation of the determined probabilities to a function; and
wherein the function is a function of each of the probabilities that the messaging entity exhibits a reputation-related criterion.

10. A method of performing transmission filtering utilizing reputation scores of transmission senders, the method comprising:
identifying at least one characteristic of a transmission from a sender;
performing a real-time query to a reputation system, the query including the transmission characteristic;
receiving a score representing the reputation related to the transmission; and
performing, on the transmission from the sender, an action corresponding to the score range of the sender's reputation.

11. The method of claim 10, wherein the action includes at least one of the following actions:
rejecting all further transmissions from the sender for a preset period of time or number of transmissions;
silently dropping all further transmissions from the sender for a preset period of time or number of transmissions;
quarantining all further transmissions from the sender for a preset period of time or number of transmissions; and
bypassing certain filtering tests for all further transmissions from the sender for a preset period of time or number of transmissions.

12. The method of claim 10, wherein identifying the at least one characteristic includes extracting unique identifying information about the transmission, authenticating unique identifying information about the transmission, or a combination thereof.

13. A method of filtering groups of transmissions utilizing reputation scores of transmission senders, the method comprising:
grouping multiple transmissions together based on content similarities or similarities in transmission sender behavior;
identifying at least one characteristic of each transmission in the grouping;
performing a query to a reputation system and receiving a score representing the reputation of each sender; and
classifying the group of transmissions based on the percentage of reputable and non-reputable senders in the group.

14. The method of claim 13, wherein identifying the at least one characteristic includes extracting unique identifying information about the transmission, authenticating unique identifying information about the transmission, or a combination thereof.

15. A method of tuning and training filtering systems utilizing reputation scores of the senders of transmissions in sets of trainable transmissions, the method comprising:
identifying at least one characteristic of transmissions from senders;
performing queries to a reputation system and receiving scores representing the reputations of the senders;
classifying the transmissions into multiple categories based on the range into which each sender's reputation score falls; and
passing the transmissions and their classification categories to a trainer of another filtering system, to be used for optimization of that filtering system.

16. The method of claim 15, wherein identifying the at least one characteristic includes extracting unique identifying information about the transmission, authenticating unique identifying information about the transmission, or a combination thereof.

A method operating on one or more data processors to classify communications from a messaging entity, comprising:
Receiving communications from a messaging entity;
Using multiple message classification techniques to classify the communication;
Combining the message classification outputs to generate a message profile score;
Wherein the message profile score is used to determine what action is taken for the communication associated with the messaging entity.

18. The method of claim 17, wherein the communication is an email message, a VoIP communication, or an instant message communication.

18. The method of claim 17, wherein the communication is a legitimate email message, spam, a virus, or a communication that violates corporate policy.

18. The method of claim 17, wherein the message classification techniques include at least two techniques selected from the group consisting of:
A reverse DNS (RDNS) classification technique, a real-time black-hole list (RBL) classification technique, a reputation server classification technique, a signature-based classification technique, a fingerprint-based classification technique, a message header analysis classification technique, a sender authentication set classification technique, a Bayesian filtering statistical classification technique, a clustering classification technique, and a content filtering classification technique.

The method of claim 17, wherein each message classification technique is associated with a confidence value used to generate a message classification output from the message classification technique.

The method of claim 21, wherein a filter value from each of the classification techniques is multiplied by its associated confidence value to generate the message classification output.

23. The method of claim 22, wherein the message profile score is

$$\text{Score} = \sum_{i} SV_i \times C_i$$

where $SV_i$ is the confidence value associated with classification technique $i$ and $C_i$ is the output generated by classification technique $i$.
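The confidence-weighted combination described in claims 21 and 22, in which each technique's filter value is multiplied by its confidence value and the products are combined into one score, can be sketched in a few lines. The technique names, the numeric values, and the threshold used to pick an action are illustrative assumptions.

```python
from typing import Mapping


def message_profile_score(
    outputs: Mapping[str, float],     # C_i: output of each classification technique
    confidence: Mapping[str, float],  # SV_i: confidence value of each technique
) -> float:
    """Sum each technique's output multiplied by its confidence value."""
    return sum(confidence[name] * value for name, value in outputs.items())


def choose_action(score: float, threshold: float) -> str:
    """Assumed policy: quarantine when the score exceeds the threshold."""
    return "quarantine" if score > threshold else "deliver"


outputs = {"rdns": 1.0, "rbl": 0.0, "bayes": 0.5}
confidence = {"rdns": 2.0, "rbl": 3.0, "bayes": 4.0}
score = message_profile_score(outputs, confidence)  # 2.0 + 0.0 + 2.0 = 4.0
```

With these example values the score is 4.0, so `choose_action(score, threshold=3.0)` returns `"quarantine"`.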

23. The method of claim 22, wherein the message profile score is

where $SV_{1i}$ and $SV_{2i}$ are confidence values associated with classification technique $i$ and $C_i$ is the output generated by classification technique $i$.

18. The method of claim 17, wherein at least one of the message classification techniques includes a reputation scoring technique; the reputation scoring technique assigns a reputation-indicative probability to a messaging entity; and the reputation-indicative probability indicates the reputability of the messaging entity based on the extent to which the identified one or more communication characteristics exhibit or conform to one or more reputation-related criteria.

A system operating on one or more data processors to classify communications from a messaging entity, the system comprising:
A plurality of message classification techniques, wherein the plurality of message classification techniques are configured to classify communications received from a messaging entity; and
Message profiling logic configured to combine the outputs of the message classification techniques to generate a message profile score;
Wherein the message profile score is used to determine what action should be taken for the communication associated with the messaging entity.

27. The system of claim 26, wherein the communication is a legitimate message, or an unwanted communication, or a communication that violates a preselected policy;
Wherein the unwanted communication includes a spam or virus communication; and
Wherein the preselected policy comprises a corporate communications policy, a messaging policy, a law or regulation policy, or an international communications policy.

27. The system of claim 26, wherein each message classification technique is associated with a confidence value used to generate a message classification output from the message classification technique,
The filter value from each of the classification techniques is multiplied by its associated confidence value to produce the message classification output.

30. The system of claim 28, wherein a tuner program is used to adjust the confidence value associated with the message classification technique.

30. The system of claim 29, wherein the tuner program uses a heuristic approach to adjust the confidence value.

30. The system of claim 29, wherein the tuner program uses a genetic algorithm to adjust the confidence value;
The genetic algorithm is designed with a fitness function that models the fitness of candidate solutions in the actual problem domain,
The fitness function represents a cost associated with an error made in attempting to correctly classify a message vector in a set of pre-classified data.
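A tuner of this kind can be sketched as a small genetic algorithm over the confidence values. Everything concrete here is an assumption: the cost is a plain count of misclassified message vectors, the threshold is fixed at 1.0, and the selection, crossover, and mutation settings are arbitrary defaults rather than anything specified in the claims.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], bool]  # (classification outputs, is_unwanted)


def cost(weights: Sequence[float], data: Sequence[Sample], threshold: float = 1.0) -> int:
    """Count pre-classified samples that the weighted score gets wrong."""
    errors = 0
    for outputs, is_unwanted in data:
        score = sum(w * c for w, c in zip(weights, outputs))
        if (score > threshold) != is_unwanted:
            errors += 1
    return errors


def tune(data: Sequence[Sample], n_weights: int, generations: int = 40,
         pop_size: int = 20, seed: int = 0) -> List[float]:
    """Evolve a vector of confidence values that minimizes the cost."""
    rng = random.Random(seed)
    pop = [[rng.uniform(0.0, 2.0) for _ in range(n_weights)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: cost(w, data))
        survivors = pop[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_weights) if n_weights > 1 else 0
            child = a[:cut] + b[cut:]     # one-point crossover
            i = rng.randrange(n_weights)  # mutate one confidence value
            child[i] = max(0.0, child[i] + rng.gauss(0.0, 0.2))
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda w: cost(w, data))
```

Because the better half of each generation is carried over unchanged, the best cost in the population never gets worse from one generation to the next.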

32. The system of claim 31, wherein the system is configured to operate with adaptive message blocking and white listing.

A method operating on one or more data processors for adjusting message classification parameters used by one or more message classification techniques, comprising:
Receiving multiple communications or multiple input data representing multiple communications;
Using a tuner program to adjust the message classification parameters associated with the message classification techniques;
Wherein a communication is received from a messaging entity;
Wherein the adjusted message classification parameters are used by the plurality of message classification techniques to classify the received communication;
Wherein message classification outputs from the plurality of message classification techniques are combined to generate a message profile score;
Wherein the message profile score is used to determine what action should be taken for the communication associated with the messaging entity.

34. The method of claim 33, wherein the message classification parameter includes a confidence value.

35. The method of claim 34, wherein the tuner program uses a heuristic approach to adjust the confidence value.

36. The method of claim 35, wherein the tuner program uses a genetic algorithm to adjust the confidence value.

37. The method of claim 36, wherein the genetic algorithm is designed with a fitness function that models the fitness of candidate solutions in the domain in question.
The fitness function represents a cost associated with an error made in attempting to correctly classify a message vector in a set of pre-classified data.

37. The method of claim 36, wherein the fitness function is

And
where $N_{CAT1}$ is the number of message vectors from the entire data set belonging to the first category,
where $N_{CAT2}$ is the number of message vectors from the entire data set belonging to the second category,
where $C$ is a constant multiplier for misclassified messages from the second category,
where $S_{CAT1\_MISTAKE_i}$ is the message profiler score for message vector $i$ from the first message category that was misclassified as belonging to the other category,
where $S_{CAT2\_MISTAKE_i}$ is the message profiler score for message vector $i$ from the second message category that was misclassified as belonging to the other category, and
where $T$ is the message profiler numerical threshold above which a message is considered to belong to the first category.
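The equation for this fitness function is not reproduced in this text, so the following is only a guess at one cost that is consistent with the symbols defined above, not the claimed formula. It assumes the cost averages, per category, how far each misclassified score lies from the threshold T, and scales the second category's term by C.

```python
from typing import Sequence


def fitness(
    cat1_scores: Sequence[float],  # profiler scores of vectors truly in category 1
    cat2_scores: Sequence[float],  # profiler scores of vectors truly in category 2
    threshold: float,              # T: scores above it are treated as category 1
    c: float,                      # C: multiplier for category 2 mistakes
) -> float:
    """Assumed cost (lower is better); not the formula from the claim."""
    # Category 1 vectors are misclassified when they fail to exceed T.
    cat1_mistakes = [s for s in cat1_scores if s <= threshold]
    # Category 2 vectors are misclassified when they exceed T.
    cat2_mistakes = [s for s in cat2_scores if s > threshold]
    cost1 = (sum(abs(s - threshold) for s in cat1_mistakes) / len(cat1_scores)
             if cat1_scores else 0.0)
    cost2 = (sum(abs(s - threshold) for s in cat2_mistakes) / len(cat2_scores)
             if cat2_scores else 0.0)
    return cost1 + c * cost2


# Category 1 mistakes: 8 and 9 (distance 3 over 4 vectors = 0.75).
# Category 2 mistakes: 11 and 14 (distance 5 over 5 vectors = 1.0).
example = fitness([12, 8, 9, 15], [2, 11, 14, 3, 1], threshold=10, c=20)
```

With these example values the cost is 0.75 + 20 × 1.0 = 20.75. A large C makes mistakes on the second category far more expensive than mistakes on the first.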

40. The method of claim 36, wherein the method is configured to operate with adaptive message blocking and white listing.