JP3270483B2

JP3270483B2 - System and method for protecting sensitive information in a database and enabling targeted advertising in a communication network

Info

Publication number: JP3270483B2
Application number: JP50320297A
Authority: JP
Inventors: Gifford, Warren Stanton; Griffeth, Nancy Davis; Katz, James Everett
Original assignee: Telcordia Technologies, Inc.
Priority date: 1995-06-12
Filing date: 1996-06-10
Publication date: 2002-04-02
Anticipated expiration: 2016-06-10
Also published as: JP2000513463A

Description

FIELD OF THE INVENTION
The present invention relates to a system and method for maintaining the confidentiality of certain information in a database. In an embodiment of the invention, the database illustratively contains demographic information about the customers of a communication network. The system and method let advertisers target specific customers whose demographics match an advertiser-specified profile and advertise to them over the communication network. Specifically, the method and system process the demographic database so that advertisers cannot infer a customer's private (non-public) information beyond a controllable level of uncertainty. As a result, an advertiser cannot infer particular confidential information belonging to a particular customer.

BACKGROUND OF THE INVENTION
The present invention relates to the delivery of information in any kind of information infrastructure. Here, the invention is described using a communication-network type of information infrastructure that can deliver video programming.

In a typical network that delivers advertisements and other video programming, such as a conventional cable television network, advertisements are delivered indiscriminately to a large number of customers. This is a disadvantage for customers, because some of them receive advertisements they have no interest in. It is also a disadvantage for advertisers, because they must pay to deliver each advertisement to a large audience. That audience includes both the customers they want to reach and customers who have no interest in the advertisement.

In a preferred advertising strategy, the advertiser targets a selected group of customers who are likely to be interested in the advertisement and delivers the advertisement only to that group. Until recently, such targeted advertisements were not possible with broadcast communications. The networks that carried the advertisements could not deliver them to particular customers only.

Recent advances in communication networks, however, have made selective delivery of broadcast advertisements possible. FIG. 1 shows an example of such an improved conventional communication network 10. In this example, communication network 10 may be any of various networks, such as a telephone network, a computer network, a LAN (local area network), a WAN (wide area network), or a cable television network. As shown, network 10 interconnects sources 21 and 22, such as advertisers, with destinations 31, 32, 33 and 34, such as customers.

Communication network 10 can transport video, audio and other data from a source (e.g., source 21) to only the identified destinations among destinations 31-34 (e.g., destinations 31 and 33). For example, video, audio and data can be transmitted as a bitstream organized into packets. Each packet includes a header containing at least one ID that identifies a destination 31, 32, 33 and/or 34 and is unique throughout network 10 (e.g., the IDs of destinations 31 and 33). These IDs are called network IDs. Communication network 10 routes each packet only to the destinations specified by the network addresses in its header, here destinations 31 and 33.
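
The selective delivery just described can be illustrated with a minimal sketch. It assumes that network IDs are plain integers equal to the reference numerals of FIG. 1. The names `Packet` and `route` are hypothetical, and a real network routes hop by hop rather than from one central list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # Header: one or more network IDs, each unique throughout the network.
    destination_ids: frozenset
    payload: bytes


def route(packet, attached_destinations):
    """Return the destinations that receive the packet.

    Only destinations whose network ID appears in the header get a copy;
    every other attached destination is skipped.
    """
    return sorted(d for d in attached_destinations if d in packet.destination_ids)


# An advertisement from source 21 addressed to destinations 31 and 33 only.
ad = Packet(destination_ids=frozenset({31, 33}), payload=b"advertisement from source 21")
print(route(ad, [31, 32, 33, 34]))  # [31, 33]
```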

To implement a targeted advertising strategy, the advertiser must be able to determine which customers an advertisement should target. In one advantageous approach, demographic data about the customers is organized in a database. A database is defined as a collection of data items, organized according to a data model and accessed by queries. Here, the invention is described using the relational database model. A relational database, or relation, can be organized as a two-dimensional table containing rows and columns of information. Each column of a relation corresponds to a particular attribute and has a domain containing that attribute's data values. Each row of the relation contains one value from each attribute and is called a record or tuple.

FIG. 2 shows an example of a conventional relational database Y. Relation Y in FIG. 2 contains data about a certain population. Relation Y has six attributes, or columns, 2-1, 2-2, 2-3, 2-4, 2-5 and 2-6. Each stores one kind of data value for the population: name, age, weight, height, social security number, and telephone extension, respectively. The database also contains twelve records, or tuples, 3-1, 3-2, 3-3, ..., 3-12. Each tuple has one data value from each attribute. For example, tuple 3-10 has the following attribute values:
- name: "lee"
- age: 40
- weight: 171
- height: 180
- social security number: 999-98-7654
- telephone extension: 0123
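
One tuple of relation Y can be modeled in a few lines. The sketch below is hypothetical: the field names are chosen for illustration, and only tuple 3-10 is reproduced because the text gives no values for the other eleven rows of FIG. 2. The telephone extension is stored as text so that its leading zero survives.

```python
from typing import NamedTuple


class Person(NamedTuple):
    # One field per column 2-1 ... 2-6 of relation Y (names are illustrative).
    name: str
    age: int
    weight: int
    height: int
    ssn: str
    extension: str  # text, so "0123" keeps its leading zero


# Tuple 3-10, the only row whose values are given in the text.
tuple_3_10 = Person("lee", 40, 171, 180, "999-98-7654", "0123")

# A relation is a collection of such tuples; the other rows of FIG. 2 are omitted.
relation_Y = [tuple_3_10]

print(tuple_3_10.age)  # 40
```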

To identify the target customers for an advertisement, a profile containing a query is executed against the database. The query identifies the tuples in the database that match the criteria of interest. A query usually contains a predicate that specifies those criteria. For example, consider the following query executed against relation Y:
Select from A where Y.Age < 15 OR Y.Age > 50
It contains the predicate "where Y.Age < 15 OR Y.Age > 50", which specifies that only tuples with an age attribute value less than 15 or greater than 50 are to be identified. An advertiser can therefore create a profile that is executed against the relational database to identify the target customer audience.
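
A profile predicate of this kind can be run against a small relation using Python's standard `sqlite3` module. This is a sketch only. The row "lee"/40 comes from tuple 3-10, while the other two rows and the column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Y (Name TEXT, Age INTEGER)")
# "lee"/40 is tuple 3-10; the other two rows are hypothetical.
conn.executemany(
    "INSERT INTO Y VALUES (?, ?)",
    [("lee", 40), ("child_example", 12), ("senior_example", 67)],
)

# The profile's predicate: WHERE Y.Age < 15 OR Y.Age > 50
rows = conn.execute(
    "SELECT Name FROM Y WHERE Y.Age < 15 OR Y.Age > 50 ORDER BY Name"
).fetchall()
matches = [name for (name,) in rows]
print(matches)  # ['child_example', 'senior_example']
```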

A problem with implementing such a targeted advertising scheme is that customers are reluctant to disclose, without restriction, the demographic data needed to build the relational database. Specifically, customers are concerned about three things:
(1) the direct release of raw information about individual customers;
(2) the inference of unreleased information about individual customers from information about the identities of the customers who match a certain profile; and
(3) the inference of unreleased information about a particular customer from knowledge of a series of profiles, together with the number of customers who receive, or will receive, the advertisements corresponding to those profiles.

The first two privacy risks can be eliminated by modifying the communication network. The modification follows the approach of Hardt-Kornacki & Yacobi, "Securing End-User Privacy During Information Filtering," Proc. of the Conf. on High Perf. Info. Filtering, 1991, which protects the anonymity of customers who search for videos. Such an improved network is shown in FIG. 3.

As shown, communication network 50 interconnects sources (advertisers) 61, 62 and destinations (customers) 71, 72, 73, 74, just like network 10 of FIG. 1. However, it also includes a filter station 80 and a name translator station 90, both connected to communication network 50. Illustratively, filter station 80 has a memory 82 that holds a database of customer demographic data. Filter station 80 also has a processor 84 that can execute queries against the demographic database stored in memory 82. Each source, such as source 62, has a server 64 and a memory 66.

Server 64 of source 62 transmits one or more profiles to processor 84 of filter station 80. Each profile contains a query that identifies a particular target audience. Processor 84 executes each profile query against the relational database stored in memory 82 and retrieves the alias assigned to each customer the query identifies. Processor 84 then returns the aliases for each profile to server 64 of source 62, which can store them in memory 66 for future use.
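
The filter station's role can be sketched as follows. The design is assumed rather than taken from the source. The class `FilterStation`, its method names, and the use of `secrets.token_hex` to mint random aliases are hypothetical. The point illustrated is that the advertiser submits a predicate and receives only aliases, never customer keys or demographic records.

```python
import secrets


class FilterStation:
    """Hypothetical sketch of filter station 80: holds demographics, returns aliases."""

    def __init__(self, customers):
        # customers: customer key -> demographic record (a dict)
        self._customers = customers
        # One random alias per customer; the advertiser never sees the key itself.
        self.alias_of = {key: secrets.token_hex(8) for key in customers}

    def run_profile(self, predicate):
        """Execute a profile query and return only the matching customers' aliases."""
        return {
            self.alias_of[key]
            for key, record in self._customers.items()
            if predicate(record)
        }


# Hypothetical ages for customers 71-74 of FIG. 3.
station = FilterStation({71: {"age": 12}, 72: {"age": 35}, 73: {"age": 61}, 74: {"age": 44}})
aliases = station.run_profile(lambda r: r["age"] < 15 or r["age"] > 50)
print(len(aliases))  # 2
```

Even in this sketch, the advertiser learns the size of the returned alias set. That leak is the starting point for the inference problem discussed below.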

When advertiser source 62 wishes to send an advertisement to target customer destinations, for example destinations 72 and 74, server 64 sends the advertisement and the aliases into network 50. Network 50 transmits the advertisement and the aliases to processor 92 of name translator station 90. Processor 92 then converts the aliases into the corresponding network addresses, using, for example, information stored in memory 94. Finally, processor 92 of name translator station 90 uses those network addresses to transmit the advertisement to customer destinations 72 and 74.
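
The translation step can be sketched in the same spirit. In this hypothetical sketch, `NameTranslatorStation` and its `deliver` method are invented names, the alias table stands in for the contents of memory 94, and the `send` callback stands in for handing traffic to network 50. The alias strings and addresses are invented.

```python
class NameTranslatorStation:
    """Hypothetical sketch of name translator station 90."""

    def __init__(self, table):
        # table: alias -> network address (stands in for memory 94)
        self._table = table

    def deliver(self, advertisement, aliases, send):
        """Translate each alias and pass the ad to the network for that address."""
        for alias in sorted(aliases):
            send(self._table[alias], advertisement)


delivered = []
# Hypothetical aliases for destinations 72 and 74.
translator = NameTranslatorStation({"a1f3": "addr-72", "9c0e": "addr-74"})
translator.deliver(
    "ad from source 62",
    {"a1f3", "9c0e"},
    lambda addr, ad: delivered.append((addr, ad)),
)
print(sorted(addr for addr, _ in delivered))  # ['addr-72', 'addr-74']
```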

In this improved communication system, each party knows only the following:
- A customer destination, such as destination 72, knows its own demographic information.
- An advertiser source, such as source 62, knows its advertisement, its profiles, and how many customers will receive the advertisement. It receives only the aliases of individual customers 71-74. It therefore does not possess the demographic information and is given no information identifying the customers (such as network addresses).
- Filter station 80 holds the demographic database and receives the profiles submitted by the advertiser.
- Name translator station 90 holds only the translations of aliases into network addresses and receives the aliases and advertisements.
- Network 50 receives only the advertisements and the destination network addresses.

Despite this protection, the advertiser still obtains part of the result of executing each profile query against the demographic database, such as the number of customers matching each profile. This alone can be enough to infer a customer's private information. For example, suppose the advertiser knows the identities of 100 customers who live in ZIP code 07090 and collect stamps. Suppose further that the advertiser submits a profile targeting all customers who live in ZIP code 07090, collect stamps, and earn $50,000-$100,000 a year. If 100 aliases are returned, the advertiser has inferred the salary range of all 100 stamp collectors.
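
The stamp-collector inference can be reproduced with hypothetical data. The advertiser never sees a salary; comparing the returned alias count with the size of a group it already knows is enough. The customer records and the helper `infer_from_count` below are illustrative assumptions.

```python
# Hypothetical database: 100 stamp collectors in ZIP 07090, all earning
# $50,000-$100,000, plus 20 non-collectors in the same ZIP code.
customers = (
    [{"zip": "07090", "hobby": "stamps", "salary": 50_000 + 500 * i} for i in range(100)]
    + [{"zip": "07090", "hobby": "golf", "salary": 30_000}] * 20
)


def profile(c):
    # The advertiser's profile: ZIP 07090, collects stamps, earns $50k-$100k.
    return (c["zip"] == "07090" and c["hobby"] == "stamps"
            and 50_000 <= c["salary"] <= 100_000)


def infer_from_count(known_group_size, returned_count):
    """What the advertiser learns about a group it already knows, from one count."""
    if returned_count == known_group_size:
        return "every member satisfies the extra condition"
    if returned_count == 0:
        return "no member satisfies the extra condition"
    return f"{returned_count} of {known_group_size} members satisfy it"


returned = sum(1 for c in customers if profile(c))  # number of aliases returned
print(infer_from_count(100, returned))  # every member satisfies the extra condition
```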

This threat is called a "tracker attack," because the query results lead to the inference of private information. More generally, a "tracker" is a special case of a linear system that solves the following equation:

$$HX = Q \qquad (1)$$

In this equation, H is a matrix recording which tuples satisfy which queries. Each column j represents a different tuple and each row i represents a different query. Each element $h_{ij}$ equals 1 if the j-th tuple satisfies the predicate $C_i$ of the i-th query, and 0 otherwise. C is the vector of predicates $C_i$ used in the queries. X is the vector representing the (unknown) tuples that satisfy the predicates C; equation (1) is solved for X. Q is the vector of counts or other results returned by the queries. Its element $q_i$ is the sum of the attribute values of the tuples retrieved by the i-th query (or another result returned by the i-th query).
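
Equation (1) shows why aggregate answers alone can leak individual values. In the sketch below, the three queries, their overlapping query sets, and the salaries are all hypothetical. X is interpreted as the unknown private attribute value of each tuple, so each $q_i$ is a sum over that query's tuples. Because this H is invertible, solving the system recovers every individual value exactly.

```python
import numpy as np

# Rows = queries, columns = tuples; h_ij = 1 if tuple j satisfies query i's predicate.
H = np.array([
    [1, 1, 1],  # query 1 retrieves tuples 1, 2, 3
    [1, 1, 0],  # query 2 retrieves tuples 1, 2
    [0, 1, 1],  # query 3 retrieves tuples 2, 3
])
# Hypothetical private salaries of the three tuples (unknown to the attacker).
secret_X = np.array([60_000, 45_000, 80_000])
# Q holds the sums returned for each query; this is all the attacker sees.
Q = H @ secret_X

# H is invertible (det = 1), so the attacker can solve HX = Q for X.
X = np.linalg.solve(H, Q)
print(np.round(X).astype(int).tolist())  # [60000, 45000, 80000]
```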

Several prior-art solutions have been proposed to protect statistical relational databases from tracker attacks. Dobkin, Jones & Lipton, "Secure Database Protection Against User Inference," ACM Trans. on Database Sys., vol. 4, no. 1, Mar. 1979, pp. 97-106, proposes limiting the overlap between query sets to prevent this kind of attack. In other words, it refuses to answer multiple similar query sets. However, this kind of control is difficult to implement. A record of every previously answered query set must be kept and compared against each newly submitted query.

A "cell-suppression" technique has also been proposed. In this technique, statistics or other query results that could reveal confidential information are never released. However, cell suppression works best for queries that produce two- and three-dimensional tables. It does not work well for the arbitrary queries needed to implement targeted advertising.
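
An overlap restriction of this general kind can be sketched as follows. This is a simplified illustration, not Dobkin, Jones & Lipton's exact formulation: the class `OverlapControl` and the parameter `max_overlap` are hypothetical. The ever-growing `history` list shows the bookkeeping burden the text describes.

```python
class OverlapControl:
    """Refuse any query whose result set overlaps an earlier answered set too much."""

    def __init__(self, max_overlap):
        self.max_overlap = max_overlap
        self.history = []  # every previously answered query set must be kept

    def answer(self, query_set):
        query_set = frozenset(query_set)
        for previous in self.history:
            if len(query_set & previous) > self.max_overlap:
                return None  # refused
        self.history.append(query_set)
        return len(query_set)  # the count released to the requester


control = OverlapControl(max_overlap=1)
print(control.answer({"A", "B", "C"}))  # 3
print(control.answer({"A", "B"}))       # None: a tracker-style difference query is refused
print(control.answer({"C", "D"}))       # 2
```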

A random-noise technique has also been proposed, in which a random number is subtracted from the result returned by a query. This solution is inadequate for targeted advertising, because the results given to the advertiser are inherently inaccurate.

In an alternative approach, proposed in Warner, "Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias," 60 J. of the Am. Stat. Assoc., pp. 63-69 (1965), individuals may enter incorrect values into the relational database a certain percentage of the time. The problem with this approach is that the advertiser then targets the advertisement at the wrong audience that same percentage of the time.

Denning, "Secure Statistical Database Under Random Sample Queries," ACM Trans. on Database Sys., vol. 5, no. 3, Sept. 1980, pp. 291-315, discloses a noise technique in which a query is applied only to a random subset of the tuples in the relational database rather than to all of them.

Beyond the individual drawbacks noted above, one or more of these noise-adding techniques may be defeated by various noise-removal techniques.
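
The drawback of randomized response for targeting can be seen in a short simulation. This sketch uses the textbook symmetric variant, in which each person reports the true value with probability p and the flipped value otherwise. The population and p = 0.75 are hypothetical. The aggregate proportion can still be estimated well, but about a quarter of the individual records are wrong, and those are exactly the records an advertiser would target on.

```python
import random


def randomized_responses(true_values, p, rng):
    """Each individual reports the true bit with probability p, else the flipped bit."""
    return [v if rng.random() < p else 1 - v for v in true_values]


def estimate_proportion(reports, p):
    """Unbiased estimate of the true share of 1s (requires p != 0.5)."""
    lam = sum(reports) / len(reports)
    return (lam - (1 - p)) / (2 * p - 1)


rng = random.Random(0)
truth = [1] * 30_000 + [0] * 70_000  # hypothetical population: 30% have the trait
reports = randomized_responses(truth, p=0.75, rng=rng)
wrong = sum(r != t for r, t in zip(reports, truth))

print(round(estimate_proportion(reports, 0.75), 2))  # close to 0.3
print(wrong / len(truth))  # roughly 0.25 of individual records are wrong
```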

Yu & Chin, "A Study on the Protection of Statistical Database," Proc. ACM SIGMOD Int'l Conf. on the Mgmt. of Data, pp. 169-181 (1977), and Chin & Ozsoyoglu, "Security in Partitioned Dynamic Statistical Database," Proc. IEEE COMPSAC Conf., pp. 594-601 (1979), disclose techniques for dividing a relational database into disjoint partitions.

All of the above techniques were developed mainly for statistical relational databases and lack the properties needed for targeted advertising. Specifically, they can neither identify exactly the tuples that satisfy a query nor return an exact count (or other query result) for the tuples retrieved. Both properties, however, are important in targeted advertising:
- First, the advertisement must reach exactly all the customers whose demographic data match the submitted profile.
- Second, the count of identified customers must be exact, both for billing the advertiser and for determining whether the profile identified the desired number of customers to receive the advertisement.

It is therefore an object of the present invention to overcome the problems of the prior art. Another object of the invention is to provide a targeted advertising method that preserves the privacy of customers' confidential information. Specifically, an object of the invention is to reduce an advertiser's ability to infer confidential information about customers from the results of one or more profile queries executed against a demographic relational database.

SUMMARY OF THE INVENTION
These and other objects are achieved by the present invention. One embodiment of the invention keeps the information in a database confidential when the database is used in a communication system environment. As in conventional communication systems, this embodiment has advertisers, customers, a filter station, and a name translator station interconnected by a communication network. Illustratively, the filter station maintains a demographic database of information about the customers. However, the invention works with databases storing any kind of information, and with both relational and non-relational databases.

To obtain a target audience for an advertisement, an advertiser can submit one or more profiles containing queries to the filter station. The filter station executes the profile queries against the demographic database and identifies the tuples of customers who match the target-audience profile. To preserve customer anonymity, the filter station sends the advertiser the aliases of the customers identified by the profile, not their identities.

When the advertiser wishes to deliver an advertisement to the target customer audience, it sends the advertisement and the aliases over the communication network to the name translator station. The name translator station then uses its translation table to convert the received aliases into the customers' network addresses. It then sends the advertisement to the customers over the communication network.

As in conventional communication networks, the communication network of this embodiment restricts advertisers' access to the demographic relational database. It discloses aliases to advertisers rather than customers' actual network addresses. This prevents (1) the disclosure of raw database information to advertisers and (2) the inference of confidential information from customer identities.

Unlike conventional communication systems, however, the invention also reduces an advertiser's ability to infer confidential information from the results the filter station returns for the advertiser's profile queries. That is, the invention protects against tracker attacks and other kinds of confidentiality leaks. These include an advertiser's attempts to infer confidential information about customers in the database from, for example, nothing more than the number of aliases returned in response to a profile query.

To provide this protection, the invention divides attributes into two classes: public attributes, which receive no confidentiality protection, and private attributes, which do. To prevent advertisers from inferring private attribute values, the database is then processed to reduce high correlations between public and private attribute values. A vector of one or more identified public attribute values is said to be highly correlated with a private attribute value if both of the following hold:

(1) The vector of identified public attribute values identifies a group of database tuples whose public attribute values match that vector.

(2) The level of uncertainty about the private attribute values of the identified group is below a predetermined threshold level.

Put another way, suppose a particular vector of public attribute values matches only a small number of private attribute values. Then knowing the public attribute values reduces the uncertainty about the private attribute values. In the worst case, a vector of public attribute values matches only one private attribute value. Anyone who knows the public attribute vector can then determine, with high certainty, the actual private attribute values of the tuple group it identifies. Illustratively, a public attribute vector has an abnormally high correlation if the group it identifies contains fewer distinct private attribute values than a predetermined threshold number. Hereafter, a public attribute value with an abnormally high correlation to one or more private attribute values is called a "highly correlative public attribute value."
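
The distinct-value test just described can be sketched as follows. The function name `highly_correlative_vectors`, the attribute names, the data, and the threshold of 2 are all hypothetical. The function groups tuples by their public-attribute vector and flags every group with fewer distinct private values than the threshold.

```python
from collections import defaultdict


def highly_correlative_vectors(tuples, public, private, threshold):
    """Return the public-attribute vectors whose group has fewer than
    `threshold` distinct values of the private attribute."""
    groups = defaultdict(set)
    for t in tuples:
        key = tuple(t[a] for a in public)
        groups[key].add(t[private])
    return {key for key, values in groups.items() if len(values) < threshold}


# Hypothetical data: ZIP code and hobby are public, salary range is private.
rows = [
    {"zip": "07090", "hobby": "stamps", "salary": "50-100k"},
    {"zip": "07090", "hobby": "stamps", "salary": "50-100k"},
    {"zip": "07090", "hobby": "golf", "salary": "<50k"},
    {"zip": "07090", "hobby": "golf", "salary": "50-100k"},
    {"zip": "07090", "hobby": "golf", "salary": ">100k"},
]
print(highly_correlative_vectors(rows, ["zip", "hobby"], "salary", threshold=2))
# {('07090', 'stamps')}
```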

In one embodiment of the invention, tuples containing public attribute values that are highly correlated with private attribute values are processed in one of two ways. Either their public attributes are camouflaged, or the tuples are removed from identification in the database. A tuple is "camouflaged" by combining a particular public attribute value of the tuple, one that is highly correlated with one or more particular private attribute values of the tuple, with other public attribute values of the tuple, thereby reducing that correlation.
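
One possible reading of camouflage is sketched below. It is an assumption, not necessarily the patent's own procedure: the highly correlative value is merged with other values of the same public attribute into one combined value. After the merge, a profile can no longer single out the original small group, and the merged group spans more distinct private values. The helper names and data are hypothetical.

```python
from collections import defaultdict


def private_values_by_group(rows, public, private):
    """Map each public-attribute vector to the set of private values in its group."""
    groups = defaultdict(set)
    for r in rows:
        groups[tuple(r[a] for a in public)].add(r[private])
    return groups


def camouflage(rows, attribute, values_to_merge):
    """Replace every value in `values_to_merge` with one combined value.

    Hypothetical interpretation: the highly correlative value is merged with
    other values of the same public attribute.
    """
    merged = "|".join(sorted(values_to_merge))
    return [{**r, attribute: merged} if r[attribute] in values_to_merge else r
            for r in rows]


rows = [
    {"zip": "07090", "hobby": "stamps", "salary": "50-100k"},
    {"zip": "07090", "hobby": "stamps", "salary": "50-100k"},
    {"zip": "07090", "hobby": "golf", "salary": "<50k"},
    {"zip": "07090", "hobby": "golf", "salary": "50-100k"},
    {"zip": "07090", "hobby": "golf", "salary": ">100k"},
]
before = private_values_by_group(rows, ["zip", "hobby"], "salary")
after = private_values_by_group(
    camouflage(rows, "hobby", {"stamps", "golf"}), ["zip", "hobby"], "salary"
)
print(len(before[("07090", "stamps")]))      # 1
print(len(after[("07090", "golf|stamps")]))  # 3
```

The trade-off is coarser targeting: a profile can now select only "golf or stamps" in ZIP 07090, not stamp collectors alone.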

Thus, in the method and system of the invention, attributes are classified as private or public. The correlation between public and private attributes is reduced by camouflaging highly correlative public attribute values. This introduces an adjustable level of uncertainty into any attempt to infer private information from the results of queries executed against the demographic relational database.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows an example of a conventional communication network.

FIG. 2 shows an example of a conventional demographic relational database.

FIG. 3 shows a conventional communication network that protects the privacy of customer network addresses.

FIG. 4 shows a communication network according to an embodiment of the present invention that protects private customer information while preserving customer anonymity.

FIG. 5 is a flowchart illustrating a method according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION
As noted above, the invention can protect the confidentiality of virtually any kind of data. The data may be in relational or non-relational databases and in various environments, including communication networks. For simplicity, the invention is described below using a communication network environment and a relational database containing demographic information. In the embodiment described below, the advertiser executes queries against a relational demographic database to identify the target audience for an advertisement. This is an illustrative example only. The invention also applies to other applications in which queries are made to achieve other goals.

FIG. 4 shows an example of a communication network 100 according to the invention. As shown in FIG. 4, advertisers 121, 122, customers 131, 132, 133, 134, and a name translator station 140 are each connected to communication network 100. In addition, there is a filter station 150 adapted according to the invention. A processor 155 and a memory 160 are connected to filter station 150.

Like the processor 84 and memory 82 of the conventional filter station 80 (FIG. 3), the processor 155 and memory 160 perform various functions that prevent raw data from being disclosed to the advertisers 121-122. The processor 155 and memory 160 also prevent the advertisers 121-122 from inferring private information from a customer's ID (for example, from the customer's network address). The processor 155 can receive demographic information from the customers 131-134, build a demographic relational database from it, and store that database in the memory 160. The processor 155 can also receive, from one of the advertisers 121-122 (for example, advertiser 122), a profile containing a query to be executed against the relational database. Upon receiving the profile, the processor 155 identifies the tuples of the relational database that match the profile and sends aliases of the corresponding IDs to the advertiser 122.

The processor 155 and memory 160 of the filter station 150 also process the demographic relational database so as to reduce the advertisers' ability to infer private information from the results that the filter station returns in response to their profile queries. The following description applies generally to any result returned in response to a profile query, but for illustration it is assumed that the advertiser uses the number of aliases returned to infer private information.

In essence, the processor 155 and memory 160 partition the database into public attributes, which need no confidentiality protection, and private attributes, which are protected. Note that some information in the demographic relational database may already be considered public, or may be considered not worth protecting. Consider, for example, a frequent flyer database containing attributes such as ZIP code, telephone number, occupation, dietary restrictions and income level. An individual customer's telephone number may be widely published in a telephone directory. A customer's occupation, although not widely published, may be regarded as non-confidential or non-personal. Information such as dietary restrictions and income level, on the other hand, can be assumed to be personal and confidential. After the database has been partitioned, the correlation between public and private attributes is diluted in two ways: some highly correlated public attribute values are camouflaged, and tuples that contain highly correlated public attribute values that are difficult to camouflage are removed.

The processor 155 may also partition out of the database an ID attribute that uniquely identifies each tuple, such as a network address or a social security number. Such an ID can be the subject of a profile query only if the profile query is not executed against private attributes, or if the profile query is used only to update the corresponding tuple in the database.

Illustratively, the public attributes are further divided into important public attributes and non-important public attributes. Advertisers are allowed to specify values of important public attributes with greater certainty than values of non-important public attributes. Illustratively, the advertiser can specify which attributes are to be treated as important. The embodiment described below is an example in which the public attributes are divided into important and non-important public attributes.

In the following description, the vector $A$ denotes the public attributes of a specified set or group of tuples, and its components $\langle A_1, \ldots, A_n \rangle$ denote the individual public attributes. The vector $A'$ denotes the important public attributes of a specified set or group of tuples, and its components $\langle A'_1, \ldots, A'_m \rangle$ denote the individual important public attributes. The vector $A''$ denotes the non-important public attributes of a specified set or group of tuples, and its components $\langle A''_1, \ldots, A''_t \rangle$ denote the individual non-important public attributes. The vector $P$ denotes the private attributes of a specified set or group of tuples, and its components $\langle P_1, \ldots, P_q \rangle$ denote the individual private attributes. The vector $K = \langle k_1, \ldots, k_q \rangle$ denotes the uncertainty thresholds for the private attributes $P$. Illustratively, each scalar component $k_i$ of $K$ is a threshold count: the minimum number of distinct values of $P_i$ that a group of tuples must contain. Each uncertainty threshold $k_i$ may be fixed, or it may be adjusted dynamically by the processor 155 to tune the level of confidentiality protection. The vectors $V$, $V'$, $V''$ and $U$ denote specific vectors of scalar attribute values, such as $\langle v_1, \ldots, v_n \rangle$ or $\langle v'_1, \ldots, v'_m \rangle$, taken by a single tuple for the public attributes $A$, $A'$ or $A''$. Here, $A_1 = v_1, \ldots, A_n = v_n$ refers to a single tuple (that is, a row of the relational database) in which each public attribute, for example $A_1$, takes the corresponding discrete scalar value, for example $v_1$.

FIG. 5 is a flowchart of a process performed by the processor 155 and memory 160 to protect the confidentiality of the demographic information from inference by the advertisers 121-122. In a first step 202, the processor 155 partitions the attributes of the database into public attributes $A_1, \ldots, A_n$, which contain non-confidential information, and private attributes $P_1, \ldots, P_q$, which contain confidential information. For example, suppose the attributes are age, height, religious belief and salary. The age and height attributes might be designated public attributes, and the religious belief and salary attributes private attributes.
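As a minimal sketch of step 202, the Python below assumes, for illustration only, that each tuple is a dict keyed by attribute name and that the public and private designations are supplied as two lists. `partition_attributes` is a hypothetical name, not part of the described system.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one tuple (row) as a dict of attribute -> value.
Row = Dict[str, object]


def partition_attributes(
    row: Row, public: List[str], private: List[str]
) -> Tuple[Row, Row]:
    """Split one tuple into its public part (A_1..A_n) and private part (P_1..P_q)."""
    unclassified = set(row) - set(public) - set(private)
    if unclassified:
        # In this sketch every attribute must be designated public or private.
        raise ValueError(f"unclassified attributes: {sorted(unclassified)}")
    public_part = {a: row[a] for a in public}
    private_part = {a: row[a] for a in private}
    return public_part, private_part


if __name__ == "__main__":
    row = {"age": 35, "height": "6'", "religion": "X", "salary": "top 5%"}
    pub, priv = partition_attributes(row, ["age", "height"], ["religion", "salary"])
    print(pub)   # {'age': 35, 'height': "6'"}
    print(priv)  # {'religion': 'X', 'salary': 'top 5%'}
```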

Next, in steps 204-226, the processor 155 removes high correlations between the public and private attributes of the tuples in the database. Stated another way, consider a specific vector $V$ of attribute values, for example $A_1 = v_1, A_2 = v_2, \ldots, A_n = v_n$. This vector $V$ identifies the group of tuples whose values for the public attributes $A_1, \ldots, A_n$ match $V$. The database is processed so that, in the group of tuples identified by any such vector $V$, the $i$-th private attribute $P_i$ has at least the threshold level of uncertainty $k_i$, that is, at least $k_i$ distinct values. Consider, for example, a database with only the public attributes age and occupation and only the private attribute salary range. Suppose that this database contains an age and occupation vector (for example, ⟨age: 35, occupation: doctor⟩) that has relatively few distinct salary values (for example, salary: top 5%). In this processing, certain attribute values are combined in order to "camouflage" tuples whose private attribute values could otherwise easily be inferred. Other tuples that cannot be camouflaged are removed.

(As described in more detail below, a "removed" tuple may be handled in any of several ways. For example, removed tuples may be excluded from query execution so that they do not receive targeted advertising. Alternatively, "removed" tuples may be excluded from neither query execution nor targeted advertising; in that case, however, the processor 155 must take steps to ensure that the confidentiality of the private attribute values of such removed tuples is not compromised by query execution.)

In steps 204-210, the processor 155 partitions the database into a "safe" set $F$ of tuples and an "unsafe" set $R$ of tuples. In step 204, the processor forms each possible vector $V'$ of important public attribute values. Each vector $V'$ contains one attribute value $v'_1, \ldots, v'_j, \ldots, v'_m$ for each important public attribute $A'_1, \ldots, A'_j, \ldots, A'_m$. For example, ⟨age = 53, occupation = doctor⟩, ⟨age = 35, occupation = doctor⟩, ⟨age = 35, occupation = minister⟩, and so on, are distinct vectors that can be formed over a database having the important public attributes age and occupation and the private attribute salary. A group of tuples corresponds to each of these vectors $V'$; that is, each tuple in a particular group has the same important attribute values as the vector $V'$ to which the group corresponds. For example, the vector ⟨age = 35, occupation = minister⟩ might identify the following tuples:

age=35, occupation=minister, salary=70%
age=35, occupation=minister, salary=70%
age=35, occupation=minister, salary=65%
age=35, occupation=minister, salary=35%
age=35, occupation=minister, salary=40%
age=35, occupation=minister, salary=40%
age=35, occupation=minister, salary=15%
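Step 204 amounts to grouping the tuples by their vector $V'$ of important public attribute values. A minimal sketch, assuming dict rows and salaries stored as integer percentages; `group_by_important` and `make_rows` are hypothetical helpers:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]


def group_by_important(rows: List[Row], important: List[str]) -> Dict[Tuple, List[Row]]:
    """Step 204 sketch: one group of tuples per distinct vector V' of important values."""
    groups: Dict[Tuple, List[Row]] = defaultdict(list)
    for row in rows:
        v_prime = tuple(row[a] for a in important)  # this tuple's vector V'
        groups[v_prime].append(row)
    return dict(groups)


def make_rows(age: int, occupation: str, salaries: List[int]) -> List[Row]:
    """Hypothetical helper that expands an example listing into dict rows."""
    return [{"age": age, "occupation": occupation, "salary": s} for s in salaries]


if __name__ == "__main__":
    rows = make_rows(35, "minister", [70, 70, 65, 35, 40, 40, 15])
    rows += make_rows(35, "doctor", [5, 5, 10, 10, 5, 10, 5, 15, 5, 5, 15])
    for v_prime, members in group_by_important(rows, ["age", "occupation"]).items():
        print(v_prime, len(members))
    # (35, 'minister') 7
    # (35, 'doctor') 11
```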

In step 206, for each group formed in this way, the processor 155 compares the number of distinct values of each $i$-th private attribute $P_i$ in the group with the corresponding uncertainty threshold $k_i$. If, for every private attribute $P_i$, the group contains at least $k_i$ distinct values, the processor 155 adds the group of tuples to set $F$ in step 208. Otherwise, the processor 155 adds the group of tuples to set $R$ in step 210. For example, in the age, occupation and salary example above, suppose that $k_i$ is set to 4. The group for ⟨age = 35, occupation = minister⟩ contains five distinct values of the private attribute salary, namely 70%, 65%, 40%, 35% and 15%, so it is added to set $F$. On the other hand, the vector ⟨age = 35, occupation = doctor⟩ identifies another group of tuples, as follows:

age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=10%
age=35, occupation=doctor, salary=10%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=10%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=15%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=15%

This group has only three distinct salary values, namely 5%, 10% and 15%. Accordingly, the processor 155 adds these tuples to set $R$.
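Steps 204-210 can be sketched as a distinct-count test applied to each group. The sketch assumes the thresholds $K$ are given as a dict from private attribute name to $k_i$; the function names are hypothetical. With $k_i = 4$ it routes the minister group to $F$ and the doctor group to $R$, as in the example above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]


def meets_thresholds(group: List[Row], k: Dict[str, int]) -> bool:
    """True if, for every private attribute P_i, the group has at least k_i distinct values."""
    return all(len({row[p] for row in group}) >= k_i for p, k_i in k.items())


def split_safe_unsafe(
    rows: List[Row], important: List[str], k: Dict[str, int]
) -> Tuple[List[Row], List[Row]]:
    """Steps 204-210 sketch: group by V', then route each group to set F or set R."""
    groups: Dict[Tuple, List[Row]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in important)].append(row)
    safe_f: List[Row] = []
    unsafe_r: List[Row] = []
    for group in groups.values():
        if meets_thresholds(group, k):
            safe_f.extend(group)    # step 208
        else:
            unsafe_r.extend(group)  # step 210
    return safe_f, unsafe_r


if __name__ == "__main__":
    minister = [{"age": 35, "occupation": "minister", "salary": s}
                for s in [70, 70, 65, 35, 40, 40, 15]]
    doctor = [{"age": 35, "occupation": "doctor", "salary": s}
              for s in [5, 5, 10, 10, 5, 10, 5, 15, 5, 5, 15]]
    f, r = split_safe_unsafe(minister + doctor, ["age", "occupation"], {"salary": 4})
    print(len(f), len(r))  # 7 11
```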

Next, in steps 212-222, the processor 155 combines selected important public attribute values. In step 212, the processor 155 selects an important public attribute $A'_j$. Illustratively, the processor 155 selects the important public attributes one at a time, in descending order of the number of distinct values each takes in the entire database. The processor 155 then executes steps 214-226 using the selected important public attribute $A'_j$. In step 214, the processor 155 identifies each distinct value $v'_j$ of the selected important public attribute $A'_j$ that occurs in set $R$. In step 216, the processor 155 identifies every tuple, in either set $R$ or set $F$, whose value of $A'_j$ is one of the values $v'_j$ identified in set $R$. For example, suppose that age is selected as attribute $A'_j$. Then age = 35 is a public attribute value contained in the tuples of set $R$ having the public attribute values ⟨age = 35, occupation = doctor⟩. age = 35 is also contained in the tuples of set $F$ having the public attribute values ⟨age = 35, occupation = minister⟩. Thus, the following tuples in sets $R$ and $F$ are identified:

age=35, occupation=minister, salary=70%
age=35, occupation=minister, salary=70%
age=35, occupation=minister, salary=65%
age=35, occupation=minister, salary=35%
age=35, occupation=minister, salary=40%
age=35, occupation=minister, salary=40%
age=35, occupation=minister, salary=15%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=10%
age=35, occupation=doctor, salary=10%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=10%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=15%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=5%
age=35, occupation=doctor, salary=15%

Next, in step 218, the processor 155 identifies each distinct vector $V''$ in the identified tuples of sets $F$ and $R$. Here, a vector $V''$ contains the values $v''_1, \ldots, v''_{j-1}, v''_{j+1}, \ldots, v''_m$ of the important public attributes $A'_1, \ldots, A'_{j-1}, A'_{j+1}, \ldots, A'_m$, that is, of every important public attribute except $A'_j$. The identified tuples of sets $R$ and $F$ form one group for each distinct vector $V''$; that is, each tuple in a particular group has the attribute values of the particular vector $V''$ to which the group corresponds. The processor 155 identifies these groups in step 218.
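Steps 212-218 can be sketched as below, reusing the minister/doctor tuples above. `order_by_distinct_count` encodes one reading of the selection order in step 212 (attributes with more distinct values first) and is an assumption, as are the other function names and the two extra age = 40 and age = 45 rows added to show that tuples of $F$ whose $A'_j$ value does not occur in $R$ are not selected.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]


def order_by_distinct_count(rows: List[Row], important: List[str]) -> List[str]:
    """Step 212 (one reading): attributes with more distinct values are selected first."""
    return sorted(important, key=lambda a: len({row[a] for row in rows}), reverse=True)


def rows_sharing_values(
    safe_f: List[Row], unsafe_r: List[Row], attr: str
) -> Tuple[Set[object], List[Row]]:
    """Steps 214-216 for the selected important attribute A'_j (named attr)."""
    # Step 214: each distinct value v'_j of A'_j that occurs in set R.
    values_in_r = {row[attr] for row in unsafe_r}
    # Step 216: every tuple of R or F whose A'_j value is one of those values.
    selected = [row for row in unsafe_r + safe_f if row[attr] in values_in_r]
    return values_in_r, selected


def group_by_remaining(
    rows: List[Row], important: List[str], attr: str
) -> Dict[Tuple, List[Row]]:
    """Step 218: group rows by V'', the important values other than A'_j."""
    others = [a for a in important if a != attr]
    groups: Dict[Tuple, List[Row]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in others)].append(row)
    return dict(groups)


if __name__ == "__main__":
    safe_f = [{"age": 35, "occupation": "minister", "salary": s}
              for s in [70, 70, 65, 35, 40, 40, 15]]
    # Two hypothetical extra rows in F whose ages do not occur in R.
    safe_f += [{"age": 40, "occupation": "minister", "salary": 50},
               {"age": 45, "occupation": "minister", "salary": 60}]
    unsafe_r = [{"age": 35, "occupation": "doctor", "salary": s}
                for s in [5, 5, 10, 10, 5, 10, 5, 15, 5, 5, 15]]
    important = ["occupation", "age"]
    attr = order_by_distinct_count(safe_f + unsafe_r, important)[0]
    print(attr)  # age
    values, selected = rows_sharing_values(safe_f, unsafe_r, attr)
    print(values, len(selected))  # {35} 18
    for v2, members in group_by_remaining(selected, important, attr).items():
        print(v2, len(members))
    # ('doctor',) 11
    # ('minister',) 7
```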

For example, suppose that the public attributes are age, weight and height and that the private attribute is salary. The values $v'_j = 35$ and $v'_j = 53$ identify the following tuples:

age=35, weight=150, height=6', salary=5%
age=53, weight=150, height=6', salary=10%
age=35, weight=160, height=6', salary=10%
age=53, weight=160, height=5.5', salary=15%
age=35, weight=150, height=5.5', salary=5%
age=53, weight=150, height=5.5', salary=10%
age=35, weight=150, height=5.5', salary=15%
age=53, weight=160, height=6', salary=20%

The vectors $V''$ are ⟨weight = 150, height = 6'⟩, ⟨weight = 160, height = 6'⟩, ⟨weight = 150, height = 5.5'⟩ and ⟨weight = 160, height = 5.5'⟩. The identified groups are as follows:

weight=150, height=6':
  age=35, weight=150, height=6', salary=5%
  age=53, weight=150, height=6', salary=10%
weight=160, height=6':
  age=35, weight=160, height=6', salary=10%
  age=53, weight=160, height=6', salary=20%
weight=160, height=5.5':
  age=53, weight=160, height=5.5', salary=15%
weight=150, height=5.5':
  age=35, weight=150, height=5.5', salary=5%
  age=53, weight=150, height=5.5', salary=10%
  age=35, weight=150, height=5.5', salary=15%

Next, in step 220, for each group in which every $i$-th private attribute $P_i$ has at least $k_i$ distinct values, the processor 155 combines all values of the important public attribute $A'_j$ in that group. Illustratively, each value $v'_j$ can be combined only once. For example, suppose that the threshold for salary is $k = 3$. Then only the group corresponding to the vector $V''$ = ⟨weight = 150, height = 5.5'⟩ satisfies the uncertainty threshold. The age attribute values are therefore combined to produce the following tuples:

age={35,53}, weight=150, height=5.5', salary=5%
age={35,53}, weight=150, height=5.5', salary=10%
age={35,53}, weight=150, height=5.5', salary=15%
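Step 220 can be sketched as follows, assuming the same dict rows and representing a combined value as a Python `frozenset`. The function name is hypothetical, and the illustrative rule that each value $v'_j$ is combined only once is not enforced here. The usage applies it to the ⟨weight = 150, height = 5.5'⟩ group with a salary threshold of 3.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


def combine_attribute(group: List[Row], attr: str, k: Dict[str, int]) -> Optional[List[Row]]:
    """Step 220: if a V'' group meets every threshold k_i, merge its A'_j values into one set."""
    if not all(len({row[p] for row in group}) >= k_i for p, k_i in k.items()):
        return None  # threshold not met: nothing is combined for this group
    combined = frozenset(row[attr] for row in group)
    return [{**row, attr: combined} for row in group]


if __name__ == "__main__":
    group = [
        {"age": 35, "weight": 150, "height": "5.5'", "salary": 5},
        {"age": 53, "weight": 150, "height": "5.5'", "salary": 10},
        {"age": 35, "weight": 150, "height": "5.5'", "salary": 15},
    ]
    for row in combine_attribute(group, "age", {"salary": 3}):
        print(sorted(row["age"]), row["salary"])
    # [35, 53] 5
    # [35, 53] 10
    # [35, 53] 15
    print(combine_attribute(group[:2], "age", {"salary": 3}))  # None
```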

In step 222, the processor 155 replaces each combination with a representative public attribute value. Continuing the example, if the representative value is the first public attribute value in the combination, namely age = 35, the following tuples are obtained:

age=35, weight=150, height=5.5', salary=5%
age=35, weight=150, height=5.5', salary=10%
age=35, weight=150, height=5.5', salary=15%

In step 224, the processor 155 identifies each distinct vector $V$ of values of the important public attributes $A'$ in set $F$. In step 226, the processor 155 identifies each vector $U$ of non-important public attribute values, that is, values $u_1, \ldots, u_t$ such that $A''_1 = u_1, A''_2 = u_2, \ldots, A''_t = u_t$, that appears together with each distinct vector $V$ of important public attribute values. Also in step 226, the processor 155 combines, for each distinct vector $V$, the vectors $U$ of non-important public attribute values that appear with that vector $V$.
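Steps 224-226 can be sketched as a second grouping, this time over the safe set $F$. Following the combined tuples in the example that follows, this sketch replaces each non-important attribute by the set of values it takes within the group for each vector $V$; the `frozenset` representation and the function name are assumptions. The usage runs it on the sex = M, age = 35 tuples of that example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]


def combine_non_important(
    safe_f: List[Row], important: List[str], non_important: List[str]
) -> List[Row]:
    """Steps 224-226 sketch: merge non-important values within each important-value group."""
    groups: Dict[Tuple, List[Row]] = defaultdict(list)
    for row in safe_f:
        groups[tuple(row[a] for a in important)].append(row)  # step 224: vectors V
    combined_rows: List[Row] = []
    for group in groups.values():
        # Step 226: for each A''_l, the set of values co-occurring with this V.
        merged = {a: frozenset(row[a] for row in group) for a in non_important}
        combined_rows.extend({**row, **merged} for row in group)
    return combined_rows


if __name__ == "__main__":
    male_35 = [
        {"sex": "M", "age": 35, "weight": w, "height": h, "salary": s}
        for w, h, s in [(180, "6'", 10), (175, "5'", 15), (180, "6'", 25), (180, "6'", 15),
                        (175, "6'", 15), (180, "5'", 10), (175, "5'", 10)]
    ]
    combined = combine_non_important(male_35, ["sex", "age"], ["weight", "height"])
    print(sorted(combined[0]["weight"]), sorted(combined[0]["height"]))  # [175, 180] ["5'", "6'"]
```

Private attribute values are left untouched; only the non-important public attributes are widened.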

For example, suppose that set $F$ has the important attributes sex and age, the non-important attributes height and weight, and the private attribute salary, and that before step 226 set $F$ contains the following tuples:

sex=M, age=35, weight=180, height=6', salary=10%
sex=M, age=35, weight=175, height=5', salary=15%
sex=M, age=35, weight=180, height=6', salary=25%
sex=M, age=35, weight=180, height=6', salary=15%
sex=M, age=35, weight=175, height=6', salary=15%
sex=M, age=35, weight=180, height=5', salary=10%
sex=M, age=35, weight=175, height=5', salary=10%
sex=F, age=35, weight=120, height=6', salary=10%
sex=F, age=35, weight=120, height=6', salary=15%
sex=F, age=35, weight=120, height=5', salary=25%
sex=F, age=30, weight=110, height=5', salary=10%
sex=F, age=30, weight=110, height=5', salary=15%
sex=F, age=30, weight=120, height=6', salary=15%
sex=F, age=30, weight=110, height=5', salary=25%

The distinct vectors $V$ of important public attribute values are ⟨sex = F, age = 35⟩, ⟨sex = F, age = 30⟩ and ⟨sex = M, age = 35⟩. The vectors $U$ that appear with $V$ = ⟨sex = F, age = 35⟩ are ⟨weight = 120, height = 6'⟩ and ⟨weight = 120, height = 5'⟩. The vectors $U$ that appear with $V$ = ⟨sex = F, age = 30⟩ are ⟨weight = 110, height = 5'⟩ and ⟨weight = 120, height = 6'⟩. The vectors $U$ that appear with $V$ = ⟨sex = M, age = 35⟩ are ⟨weight = 180, height = 6'⟩, ⟨weight = 175, height = 6'⟩, ⟨weight = 175, height = 5'⟩ and ⟨weight = 180, height = 5'⟩. The combined tuples are as follows:

sex=M, age=35, weight={180,175}, height={6',5'}, salary=10%
sex=M, age=35, weight={180,175}, height={6',5'}, salary=15%
sex=M, age=35, weight={180,175}, height={6',5'}, salary=25%
sex=M, age=35, weight={180,175}, height={6',5'}, salary=15%
sex=M, age=35, weight={180,175}, height={6',5'}, salary=15%
sex=M, age=35, weight={180,175}, height={6',5'}, salary=10%
sex=M, age=35, weight={180,175}, height={6',5'}, salary=10%
sex=F, age=35, weight={120,110}, height={6',5'}, salary=10%
sex=F, age=35, weight={120,110}, height={6',5'}, salary=15%
sex=F, age=35, weight={120,110}, height={6',5'}, salary=25%
sex=F, age=30, weight={120,110}, height={6',5'}, salary=10%
sex=F, age=30, weight={120,110}, height={6',5'}, salary=15%
sex=F, age=30, weight={120,110}, height={6',5'}, salary=15%
sex=F, age=30, weight={120,110}, height={6',5'}, salary=25%

Note that in the above process the public attributes are divided into important and non-important public attributes, and that only the important public attributes are examined to determine whether camouflage is needed. Non-important public attributes are only combined, as described for step 226. As noted above, the advertiser illustratively identifies which of the public attributes $A$ are important public attributes $A'$ and which are non-important public attributes $A''$. This distinction matters because it determines which public attributes are examined for the need for camouflage and which are merely combined in step 226.

After performing steps 202-224, the processor 155 stores the tuples of set $F$ as a new demographic relational database. Illustratively, the processor 155 discards the tuples of set $R$; that is, queries are not executed against these tuples. Queries can then be executed against this new demographic relational database. However, advertisers need to know that combined values exist, and must refer to the combined public attribute values when formulating profile queries.

Alternatively, instead of building a new demographic relational database, the processor 155 keeps in the memory 160 a record of how the attribute values are partitioned. Consider the database described above in connection with step 224. An example of the partitions obtained by performing steps 202-224 is as follows:

(1) For sex=F, age=35, the tuples are:
sex=F, age=35, weight={120,110}, height={6',5'}, salary=10%
sex=F, age=35, weight={120,110}, height={6',5'}, salary=15%
sex=F, age=35, weight={120,110}, height={6',5'}, salary=25%

(2) For sex=F, age=30, the tuples are:
sex=F, age=30, weight={120,110}, height={6',5'}, salary=10%
sex=F, age=30, weight={120,110}, height={6',5'}, salary=15%
sex=F, age=30, weight={120,110}, height={6',5'}, salary=15%
sex=F, age=30, weight={120,110}, height={6',5'}, salary=25%

Processor 155 maintains a record that includes an indication of these partitions.

However, if such a record is maintained instead, processor 155 must perform some post-processing to ensure that no profile query violates a partition. A query that identifies all of the tuples in a partition does not violate that partition, whereas a query that attempts to identify only some of the tuples in a partition does. More precisely, suppose the database contains two tuples represented by the row vectors $T_1 = \langle A_1 = v_1, \ldots, A_k = v_k, \ldots, A_m = v_m \rangle$ and $T_2 = \langle A_1 = u_1, \ldots, A_k = u_k, \ldots, A_m = u_m \rangle$, and that $T_1$ and $T_2$ lie in the same partition; that is, for the important attributes $A_1, \ldots, A_k$ we have $v_1 = u_1, v_2 = u_2, \ldots, v_k = u_k$. A query whose criteria refer to both public and private attributes violates the partition if it is satisfied by tuple $T_1$ but not by tuple $T_2$. To determine whether a profile query violates a partition, processor 155 can execute the profile query against the demographic relational database and then compare the tuples identified by the query with the tuples that were not identified, to determine whether there exist an identified tuple $T_1$ and an unidentified tuple $T_2$ whose corresponding attribute values lie in the same partition as described above.
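The violation test can be sketched as follows, assuming tuples are dicts, the important attributes are passed in a hypothetical `important` parameter, and a profile query is modeled as a Python predicate (all illustrative choices). Rather than comparing every pair of tuples, the sketch records which partitions contain at least one identified tuple and which contain at least one unidentified tuple; a partition in both sets holds a $T_1$ and a $T_2$ as defined above.

```python
from typing import Any, Callable, Dict, List, Sequence

Row = Dict[str, Any]

def violates_partition(db: List[Row], important: Sequence[str],
                       query: Callable[[Row], bool]) -> bool:
    """True if some partition has a tuple T1 satisfying the query and a tuple T2 that does not."""
    hit_keys, miss_keys = set(), set()
    for row in db:
        key = tuple(row[a] for a in important)  # partition = equal important-attribute values
        (hit_keys if query(row) else miss_keys).add(key)
    # A partition is violated when it contains both identified and unidentified tuples.
    return bool(hit_keys & miss_keys)

db = [
    {"sex": "F", "age": 35, "salary": "10%"},
    {"sex": "F", "age": 35, "salary": "15%"},
    {"sex": "F", "age": 30, "salary": "25%"},
]
important = ("sex", "age")

print(violates_partition(db, important, lambda r: r["age"] == 35))  # False
print(violates_partition(db, important,
                         lambda r: r["age"] == 35 and r["salary"] == "10%"))  # True
```

The first query selects the whole (F, 35) partition and is accepted; the second singles out one tuple of that partition through a private attribute and is flagged.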

If a profile query violates a partition, processor 155 may simply reject the profile query. Alternatively, processor 155 may remove the partition violation by modifying the set of identified tuples so that it also includes the tuples $T_2$ not originally identified by the query. However, if such a modification is made, processor 155 must inform the advertiser of the modification and its content. Illustratively, processor 155 does this by describing the contents of the partitions for the attributes specified in the advertiser's query; for example, processor 155 may send the advertiser a message describing the modification.
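A minimal sketch of the second remedy, under the same assumptions (dict tuples, predicate queries, a hypothetical `important` parameter): the result set is widened to every tuple of any partition the query touched, and a notice describing the affected partitions is produced for the advertiser. The message format is invented for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Any]

def expand_to_partitions(db: List[Row], important: Sequence[str],
                         query: Callable[[Row], bool]) -> Tuple[List[int], Optional[str]]:
    """Widen a query's result to whole partitions; return indices and a notice (or None)."""
    keys = [tuple(row[a] for a in important) for row in db]
    touched = {k for k, row in zip(keys, db) if query(row)}
    identified = [i for i, k in enumerate(keys) if k in touched]
    added = [i for i in identified if not query(db[i])]  # the T2 tuples pulled in
    if not added:
        return identified, None
    # Describe each affected partition by its important-attribute values.
    parts = "; ".join(", ".join(f"{a}={v}" for a, v in zip(important, k))
                      for k in sorted(touched, key=repr))
    return identified, f"Result widened by {len(added)} tuple(s) to whole partitions: {parts}"

db = [
    {"sex": "F", "age": 35, "salary": "10%"},
    {"sex": "F", "age": 35, "salary": "15%"},
    {"sex": "F", "age": 30, "salary": "25%"},
]
ids, notice = expand_to_partitions(db, ("sex", "age"),
                                   lambda r: r["age"] == 35 and r["salary"] == "10%")
print(ids)     # [0, 1]
print(notice)  # Result widened by 1 tuple(s) to whole partitions: sex=F, age=35
```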

In summary, a system and method are disclosed for protecting a database against inference of sensitive attribute values in the database. A memory stores the database, and a processor processes it. Using the processor, the database is electronically divided into public attributes, containing non-sensitive attribute values, and private attributes, containing private attribute values. The processor then electronically processes the private attribute values to weaken high correlations between public and private attribute values. Specifically, the processor partitions the tuples of the database into a secure set and a non-secure set, such that each non-secure tuple is a member of a group that satisfies the following:

(1) The group is identified by a vector of attribute values (that is, each tuple in the group has public attribute values matching the vector).

(2) For at least one value of a private attribute, the group has an uncertainty level below the uncertainty threshold level.
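The secure/non-secure split can be sketched as follows. The uncertainty measure is an assumption borrowed from claim 3: a group is treated as too certain when some private attribute takes fewer than `threshold` distinct values within it. Tuples are dicts, and `split_secure` is a hypothetical name.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple

def split_secure(db: List[Dict[str, Any]], public: Sequence[str], private: Sequence[str],
                 threshold: int) -> Tuple[List[int], List[int]]:
    """Partition tuple indices into (secure, non_secure).

    Tuples are grouped by their vector of public attribute values; a group is non-secure
    if any private attribute has fewer than `threshold` distinct values in it (assumed
    uncertainty measure).
    """
    groups: Dict[Tuple[Any, ...], List[int]] = defaultdict(list)
    for i, row in enumerate(db):
        groups[tuple(row[a] for a in public)].append(i)
    secure: List[int] = []
    non_secure: List[int] = []
    for members in groups.values():
        too_certain = any(len({db[i][p] for i in members}) < threshold for p in private)
        (non_secure if too_certain else secure).extend(members)
    return sorted(secure), sorted(non_secure)

rows = [
    {"sex": "F", "age": 35, "weight": 120, "salary": "10%"},
    {"sex": "F", "age": 35, "weight": 120, "salary": "15%"},
    {"sex": "F", "age": 35, "weight": 110, "salary": "25%"},
]
print(split_secure(rows, ("sex", "age", "weight"), ("salary",), threshold=2))  # ([0, 1], [2])
```

Here the lone 110 lb tuple reveals its salary exactly, so it lands in the non-secure set.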

The processor then selectively combines public attribute values of each non-secure tuple so as to camouflage the tuple against inference of its private attribute values beyond the uncertainty threshold level, or else removes the tuple from the database. This is accomplished as follows.

(1) Identify all tuples that contain a particular value of a selected public attribute, where that particular value occurs in at least one tuple having a highly correlated public attribute value.

(2) Identify the discrete vectors of values for the public attributes other than the selected public attribute and, for each vector, the group of tuples whose public attribute values match it.

(3) For each group, combine the values of the selected public attribute if, for each private attribute value in the group, the uncertainty threshold level is met.

(4) Remove any non-secure tuples that cannot be camouflaged by such combining.
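Steps (1) through (4) can be sketched as below, under stated assumptions: tuples are dicts, step (1) is left to the caller, who passes the selected attribute and the set of its values to merge (including the highly correlated one), a combined value is stored as a frozenset, and uncertainty is measured by distinct private values as in claim 3. `combine_and_prune` and its helpers are hypothetical names.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Set

Row = Dict[str, Any]

def groups_by(db: List[Row], attrs: Sequence[str]) -> Dict[tuple, List[int]]:
    g: Dict[tuple, List[int]] = defaultdict(list)
    for i, row in enumerate(db):
        g[tuple(row[a] for a in attrs)].append(i)
    return g

def uncertain_enough(db: List[Row], members: List[int], private: Sequence[str], threshold: int) -> bool:
    # Assumed measure: every private attribute shows at least `threshold` distinct values.
    return all(len({db[i][p] for i in members}) >= threshold for p in private)

def combine_and_prune(db: List[Row], public: Sequence[str], private: Sequence[str],
                      selected: str, values: Set[Any], threshold: int) -> List[Row]:
    others = [a for a in public if a != selected]
    combined = frozenset(values)
    out = [dict(row) for row in db]
    # Steps (2)-(3): group by the other public attributes and combine the selected
    # attribute's values in a group only if the merged tuples meet the threshold.
    for members in groups_by(out, others).values():
        targets = [i for i in members if out[i][selected] in values]
        if targets and uncertain_enough(out, targets, private, threshold):
            for i in targets:
                out[i][selected] = combined
    # Step (4): drop tuples whose public-attribute group is still too certain.
    keep: List[int] = []
    for members in groups_by(out, public).values():
        if uncertain_enough(out, members, private, threshold):
            keep.extend(members)
    return [out[i] for i in sorted(keep)]

rows = [
    {"sex": "F", "age": 35, "weight": 120, "salary": "10%"},
    {"sex": "F", "age": 35, "weight": 110, "salary": "25%"},
    {"sex": "F", "age": 30, "weight": 120, "salary": "15%"},
]
for r in combine_and_prune(rows, ("sex", "age", "weight"), ("salary",), "weight", {120, 110}, 2):
    print(r)
```

The two age-35 tuples are camouflaged under ⟨weight = 120, 110⟩, while the age-30 tuple cannot be, so it is removed.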

Finally, the above description is merely illustrative of the present invention. Those skilled in the art may make various modifications without departing from the spirit and scope of the invention.

Continuation of the front page
(72) Inventor: Griffeth, Nancy Davis, 264 West Dudley Avenue, Westfield, New Jersey 07090, United States
(72) Inventor: Katz, James Everett, 39 Center Avenue, Morristown, New Jersey 07960, United States
(58) Fields searched (Int. Cl.⁷, DB name): G06F 17/30; JICST file (JOIS)

Claims

(57) [Claims]

1. A method for protecting a database against inference of confidential attribute values in the database, the method comprising the steps of: using a processor, electronically dividing the database into public attributes containing public attribute values and private attributes containing private attribute values; and, using the processor, electronically processing the confidential attribute values to reduce high correlations between the public attribute values and the private attribute values.

2. The method of claim 1, wherein said processing step further comprises the step of electronically partitioning the tuples of said database into a secure set and a non-secure set using said processor.

3. The method of claim 2, wherein a tuple is partitioned into the non-secure set if there exists, for the corresponding public attributes, a vector of attribute values that identifies a group of tuples having that vector of attribute values, and the uncertainty level for at least one value of a private attribute of the group is less than an uncertainty threshold level; and wherein the uncertainty level for the at least one value of the private attribute of the group is less than the uncertainty threshold level if the number of distinct values of that private attribute contained in the group is less than a threshold number.

4. The method of claim 2, wherein the public attribute values are further divided into important public attribute values and non-important public attribute values, and a tuple is partitioned into said non-secure set if there exists a vector of attribute values identifying a group of tuples having that vector of attribute values and the uncertainty level for at least one value of the private attribute of the group is less than an uncertainty threshold level.

5. The method of claim 2, wherein the step of partitioning the tuples into a secure set and a non-secure set comprises the steps of: using the processor, electronically forming vectors of distinct public attribute values for the public attributes; and, using the processor, for each group of tuples identified by a vector of public attribute values, electronically partitioning the tuples of the group into the secure set if the uncertainty threshold level is met for the private attribute values in the group, and otherwise electronically partitioning the tuples of the group into the non-secure set.

6. The method of claim 1, wherein the processing step further comprises the step of, using the processor, electronically combining a plurality of public attribute values of a tuple to prevent the private attribute value of the tuple from being inferred beyond an uncertainty threshold level, the combining step further comprising the steps of: using the processor, electronically identifying all tuples containing a particular value of a selected public attribute, the particular value being contained in at least one tuple having a highly correlated public attribute value; using the processor, electronically identifying discrete vectors of values for the public attributes other than the selected public attribute, and electronically identifying, for each discrete vector, a group of tuples, each tuple of the identified group having that discrete vector as its values for the public attributes other than the selected public attribute; and, if the at least one uncertainty threshold level for each private attribute value is met in the group corresponding to a discrete vector, electronically combining the values of the selected public attribute of the group corresponding to that discrete vector.

7. The method of claim 1, further comprising, after the partitioning and processing steps, the steps of: using the processor, electronically receiving a profile query from an advertiser; using the processor, executing the profile query against the database; and, using the processor, electronically transmitting to the advertiser an ID corresponding to the profile query and aliases of the tuples identified by the profile query.

8. A system for protecting a database against inference of confidential attribute values in the database, comprising: a memory for storing the database; and a processor that electronically divides the database into public attributes containing public attribute values and private attributes containing private attribute values, and electronically processes the private attribute values to reduce high correlations between the public attribute values and the private attribute values.

9. A communication system comprising: a filter station comprising a memory for storing a database and a processor that electronically divides the database into public attributes containing public attribute values and private attributes containing private attribute values, and electronically processes said private attribute values to reduce high correlations between the public attribute values and the private attribute values; and an advertiser that sends a profile query to said processor of said filter station.

10. The communication system of claim 9, wherein said processor electronically partitions and processes said database, electronically executes said profile query against said database, and electronically transmits to the advertiser an ID of the profile query and aliases of the tuples identified by the profile query, the communication system further comprising: a name translator station, wherein said processor electronically constructs a table for translating the tuple aliases into network addresses of the tuples and electronically transmits the profile query ID and the table to the name translator station; a plurality of customers, each having a network address for receiving advertisements; and a communication network interconnecting the advertiser, the processor of the filter station, the name translator station, and the plurality of customers; wherein the advertiser transmits an advertisement, the tuple aliases, and the profile query ID into the communication network, and the name translator station receives the advertisement, the tuple aliases, and the profile query ID from the communication network, uses the table to translate the tuple aliases into the network addresses, and transmits the advertisement to the plurality of customers through the communication network using the network addresses of the tuples.