CA2345805A1 - Enterprise privacy system and method - Google Patents

Enterprise privacy system and method Download PDF

Info

Publication number
CA2345805A1
CA2345805A1 CA002345805A CA2345805A CA2345805A1 CA 2345805 A1 CA2345805 A1 CA 2345805A1 CA 002345805 A CA002345805 A CA 002345805A CA 2345805 A CA2345805 A CA 2345805A CA 2345805 A1 CA2345805 A1 CA 2345805A1
Authority
CA
Canada
Prior art keywords
data
privacy
database
risk
policy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002345805A
Other languages
French (fr)
Inventor
Hooman Katirai
George Favvas
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZERO-KNOWLEDGE SYSTEMS Inc
Original Assignee
ZERO-KNOWLEDGE SYSTEMS Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZERO-KNOWLEDGE SYSTEMS Inc filed Critical ZERO-KNOWLEDGE SYSTEMS Inc
Priority to CA002345805A priority Critical patent/CA2345805A1/en
Publication of CA2345805A1 publication Critical patent/CA2345805A1/en
Abandoned legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/958Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/972Access to data in other repository systems, e.g. legacy data or dynamic Web page generation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Storage Device Security (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Description

ENTERPRISE PRIVACY SYSTEM AND METHOD
A system for implementing a privacy policy on a database comprises the following components:
De-identification layer: The de-identification layer provides means by which data, or groupings of data, that can be used to identify an individual are exposed and assigned a risk factor. If the risk factor exceeds the threshold for a given situation, various scenarios can be modeled with the goal of obtaining a satisfactory resolution.
DB analysis tool: While the presence of some types of fields can definitively allow linkage to an individual's identity, the ability to link a given data set to a unique individual is not necessarily binary. For example, a 9-digit zip code and date of birth together have a high probability of yielding someone's identity, whereas a 9-digit zip code and only a year of birth yield a lower probability.
A tool is needed which, for a given database structure, will assign a risk factor to fields or field combinations.
Solution: The DB analysis tool uses mathematical algorithms and industry specific knowledge to examine an existing legacy database and identify fields or groups of fields that constitute PII.
Based on toolkits developed in cooperation with industry-specific experts, the privacy engineer can map fields in the customer's database to known datatypes.
Then, the DB analysis tool associates a quantitative number, called a privacy risk factor (PRF), with each individual field or group of fields. This PRF is a number between zero and one that indicates the level of probability that a given field or combination of fields can uniquely identify an individual. A PRF of zero indicates no privacy risk while a PRF of one indicates a high privacy risk.
Depending on the particular customer or industry, different risk thresholds may be set.
Example: Suppose we have a pizza delivery database with the following three fields: name, postal code and telephone number. The output from the DB analysis tool might look like this:

Fields                                     Privacy Risk Factor (PRF)
{Name}                                     0.91
{Postal Code}                              0.03
{Telephone Number}                         0.50
{Name, Postal Code}                        0.98
{Name, Telephone Number}                   0.97
{Postal Code, Telephone Number}            1.00
{Name, Postal Code, Telephone Number}      1.00

* refers to the average number of people that can be found using these fields as search criteria

Privacy Resolution Tool: Used to create a privacy policy to mitigate the PII identified by the DB analysis tool.
Solution: This tool is used by a privacy engineer to create a data de-identification policy to address problems found in the previous step. Creating this policy is a complex process where each decision affects subsequent decisions. Using proprietary algorithms, this tool helps the privacy engineers leverage the maximum amount of business information from the database while also satisfying privacy concerns.
The database is said to be "free" of PII when the Correlation Risk Factor (CRF) for every field or combination of fields is below some given threshold. In our example we have defined the CRF as the inverse of the Expected Bin Size (EBS), a factor which is defined in the glossary. Supposing our minimum satisfactory threshold is 0.2, we would continue for several iterations until we create a privacy policy that would certify our database to be "free" from PII. We illustrate each iteration for the pizza delivery example introduced in the DB analysis tool section.
Iteration 1:

FIELDS PRESENTED TO USER                   PRF Before   User Action   PRF After
{Name}                                     0.91         1-Way Hash    0.03
{Postal Code}                              0.03                       0.03
{Telephone Number}                         0.50                       0.50
{Name, Postal Code}                        0.98                       0.03
{Name, Telephone Number}                   0.97                       0.50
{Postal Code, Telephone Number}            0.99                       0.99
{Name, Postal Code, Telephone Number}      1.00                       1.00

Note: H[X] denotes the hash of field X.

Iteration 2:
FIELDS PRESENTED TO USER                   PRF Before   User Action   PRF After
{H[Name]}                                  0.03                       0.03
{Postal Code}                              0.03                       0.03
{Telephone Number}                         0.50         1-Way Hash    0.03
{H[Name], Postal Code}                     0.03                       0.03
{H[Name], Telephone Number}                0.50                       0.03
{Postal Code, Telephone Number}            0.99                       0.03
{H[Name], Postal Code, Telephone Number}   1.00                       0.03

Note: H[X] denotes the hash of field X.
Since all fields have a PRF below 0.2 we do not proceed with further iterations.
The final privacy policy of the database is:
Name -> H[Name]
Postal Code -> (leave intact)
Telephone Number -> H[Telephone Number]
De-Linking Engine: Implements the privacy policy created by the privacy resolution tool.
Solution: This tool implements the privacy policy defined by the Privacy Engineer using the privacy resolution tool. Unlike the privacy resolution tool, which only creates a policy, this tool actually makes changes to the database. It calls upon the Encryption, Minimization, Aggregation, and Interest Vectoring engines as required by the privacy policy. For example, it may triply encrypt an e-mail address for use with the blinded communication system described later in this document.
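The patent gives no code for the de-linking engine; the following is a minimal Python sketch of the idea, assuming a policy expressed as a field-to-operation mapping. The function names and sample values are illustrative, not taken from the source.

```python
import hashlib

def one_way_hash(value: str) -> str:
    """Irreversibly transform a value into a fixed identifier (1-way hash)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]

def minimize_date_to_year(value: str) -> str:
    """Generalize a MM/DD/YYYY date of birth to just the year."""
    return value.strip().split("/")[-1]

# Hypothetical privacy policy produced by the privacy resolution tool:
# each field name maps to the operation the de-linking engine must apply.
POLICY = {
    "Name": one_way_hash,
    "Telephone Number": one_way_hash,
    "Postal Code": lambda v: v,             # leave intact
    "Date of Birth": minimize_date_to_year, # minimize
}

def de_link(record: dict) -> dict:
    """Apply the privacy policy to one database record."""
    return {field: POLICY.get(field, lambda v: v)(value)
            for field, value in record.items()}

if __name__ == "__main__":
    record = {"Name": "John Smith", "Telephone Number": "505-555-0123",
              "Postal Code": "L4B 3H7", "Date of Birth": "12/21/1971"}
    print(de_link(record))
```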
Example: The following illustrates the effect of the de-linking engine on a record in the pizza delivery database, before and after the de-linking engine is run. For demonstrative purposes we have added a "Date of Birth" field to the database.
Field Name          Contents of Field (Before)   Operation    Contents of Field (After)
Name                "John Smith"                 1-Way Hash   12sh1#djASJ
Telephone Number    "505-555-1_4"                1-Way Hash   72dsfi32233
Postal Code         "L4B 3H7"                    Do Nothing   "L4B 3H7"
Date of Birth       "12/21/1971"                 Minimize     1971

Full text analysis tool: To protect a company from privacy concerns when sharing unrestricted text that is not stored as a record in a database.
Solution: This system removes PII by locating and replacing personally-identifying information in unrestricted text documents using techniques that extend beyond simple search-and-replace procedures. This minimizes risk and maintains confidentiality when files such as doctors' notes need to be shared with 3rd parties who do not require the subject's identity.
The system employs pattern recognition techniques, including detection algorithms that employ templates and specialized knowledge of what constitutes a name, address, phone number and so forth, to automatically detect and remove PII. It must also be noted that the success of this system is domain dependent and that preliminary investigation must be conducted before its successful delivery can be promised to a client.
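As a rough illustration of the template-driven detection described above, the sketch below redacts phone numbers and dates with regular expressions and replaces names drawn from a known-name list. The patterns and the name list are illustrative assumptions; a real deployment would rely on the domain-specific detectors the text describes.

```python
import re

# Illustrative templates; real detectors would be domain-tuned.
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")

def scrub(text: str, known_names: list[str]) -> str:
    """Replace personally identifying strings in free text with placeholders."""
    text = PHONE_RE.sub("[PHONE]", text)
    text = DATE_RE.sub("[DATE]", text)
    for name in known_names:
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    return text

if __name__ == "__main__":
    note = "Patient John Smith (DOB 12/21/1971) called from 505-555-0123."
    print(scrub(note, known_names=["John Smith"]))
```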
Privacy enhancing layer: The various engines which make up this layer transform data into a form which represents a lower privacy risk. They can be run either in batch mode, or on the fly as new datasets are being created.
Data minimization engine: To maintain important information in a DB field while keeping the user anonymous.
Solution: This engine is used to remove unneeded information from the fields of a record by converting the fields to a more general or less specific form. For example, a market researcher may employ minimization to convert the date of birth into a year of birth. Industry specific minimization routines could allow a full blood analysis to be reduced to a simple blood type.
Sample data - before minimization
Date of birth   Zip Code   Income    Car
5/5/1973        90210      $90,000   Lexus
?/2/1968                   $40,000   None
11/12/1975                 $65,000   Pathfinder

Sample data - after minimization
Year of birth   State   Income               Car category
1973            CA      $75,000 - $100,000   Luxury
1968            UT      $35,000 - $60,000    None
1975            NY      $60,000 - $75,000    SUV
Interest vectoring engine: Allows transactional records to be mined for one specific individual without knowledge of the actual transactions.
Solution: This engine uses industry specific modules to convert a series of items into a set of perceived user interests. For example, a clickstream could be converted into a vector representing a user's perceived interest in sports, entertainment, and news, based on the frequency with which the user visits web sites in those categories.
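A minimal sketch of interest vectoring, assuming each visited site has already been classified into a category; the site-to-category table is an invented example standing in for an industry specific module.

```python
from collections import Counter

# Illustrative mapping from visited sites to interest categories.
SITE_CATEGORY = {
    "espn.com": "sports",
    "cnn.com": "news",
    "imdb.com": "entertainment",
}

def interest_vector(clickstream: list[str]) -> dict[str, float]:
    """Convert raw page visits into normalized per-category interest weights."""
    counts = Counter(SITE_CATEGORY.get(site, "other") for site in clickstream)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

if __name__ == "__main__":
    visits = ["espn.com", "espn.com", "cnn.com", "imdb.com"]
    print(interest_vector(visits))  # e.g. {'sports': 0.5, 'news': 0.25, ...}
```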
Data aggregation engine: Allows aggregate data to be gleaned from records when the raw data from the records isn't needed.
Solution: This engine converts a set of records to aggregate statistics and measures based on those records.
Industry specific modules allow specialized aggregation functions to be computed relevant to a given industry.
Sample data - before data aggregation
Patient name     Age   Sex   Disease
John Doe         65    M     Parkinson's
Peter Smith            M     Cancer
Erica Peterson   19    F     Depression
Jane Doe         27    F     Cancer
Mark Rogers      32    M     Depression

Sample data - after data aggregation
Age       18-30 = 2    31-50 = 2        51+ = 1
Sex       M = 3        F = 2
Disease   Cancer = 2   Depression = 2   Parkinson's = 1
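A minimal sketch of the aggregation step shown above: individual patient records are reduced to counts per age band, sex, and disease. The banding scheme mirrors the example table; the records are the sample rows with the illegible age omitted.

```python
from collections import Counter

def age_band(age: int) -> str:
    """Bucket an exact age into the bands used in the example."""
    if age <= 30:
        return "18-30"
    if age <= 50:
        return "31-50"
    return "51+"

def aggregate(records: list[dict]) -> dict:
    """Replace raw records with aggregate counts."""
    return {
        "Age": Counter(age_band(r["Age"]) for r in records),
        "Sex": Counter(r["Sex"] for r in records),
        "Disease": Counter(r["Disease"] for r in records),
    }

if __name__ == "__main__":
    patients = [
        {"Name": "John Doe", "Age": 65, "Sex": "M", "Disease": "Parkinson's"},
        {"Name": "Erica Peterson", "Age": 19, "Sex": "F", "Disease": "Depression"},
        {"Name": "Jane Doe", "Age": 27, "Sex": "F", "Disease": "Cancer"},
        {"Name": "Mark Rogers", "Age": 32, "Sex": "M", "Disease": "Depression"},
    ]
    print(aggregate(patients))
```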
Encryption engine: Used when an identifier is suitable in place of an identifier linked to personal identity, or when access to information needs to be restricted and only released with the consent of several parties.
Solution: The encryption engine can perform a number of actions:
i) One-way hash functions: Allow information to be converted in an irreversible transformation from a human readable form to a unique identifier. For example, the names of users can be encoded using a 1-way hash function in a marketing database, thereby transforming each name into a unique code. This would allow marketers to profile databases without knowing the names of the people in the database.
ii) Two-way encryption: This process is used whenever sensitive information needs to be converted to a different form. Encryption is a reversible process, therefore it is only used if the actual information may be needed at a later time. For example, a marketer could encrypt the e-mail addresses of people in their database before sharing their user profiles with third parties, to ensure that the 3rd parties do not e-mail their customers without their consent.
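A minimal sketch of both actions: a one-way hash (i) via SHA-256 and reversible two-way encryption (ii) via the `cryptography` package's Fernet recipe. The key handling shown here is illustrative only; the source does not prescribe specific algorithms.

```python
import hashlib
from cryptography.fernet import Fernet  # pip install cryptography

def hash_identifier(name: str) -> str:
    """i) One-way hash: irreversible, but stable, so profiles can still be joined."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()

def encrypt_email(email: str, key: bytes) -> bytes:
    """ii) Two-way encryption: reversible by whoever holds the key."""
    return Fernet(key).encrypt(email.encode("utf-8"))

def decrypt_email(token: bytes, key: bytes) -> str:
    """Recover the original value when it is legitimately needed later."""
    return Fernet(key).decrypt(token).decode("utf-8")

if __name__ == "__main__":
    print(hash_identifier("John Smith"))
    key = Fernet.generate_key()          # key stays with the data owner
    token = encrypt_email("user@example.com", key)
    print(decrypt_email(token, key))
```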
Data access layer: This is the layer where previously encrypted data is decrypted in a controlled fashion, in order to unlock its value.
Identity verification engine: A means of verifying the identity of an individual so that they can subsequently authenticate their identity to the holder of their data. Such verification is needed in cases where the data collection occurred either offline or without user consent.
Solution: The identity verification engine uses known contact or personal data to verify the identity of the user. Depending on the particular customer requirements and level of verification certainty required, various scenarios are possible:
• E-mail: A validation token is sent to a known e-mail address belonging to the user.
• Telephone: The user is called at a given phone number on file and is given a validation token.
• Snail mail: A validation token is physically mailed to a known address on file.
• Third party database checks: The user is challenged by being asked to supply personal information which is contained in online databases such as credit reports. The queries should be such that it would be difficult for a person other than the person associated with the data to possess the information. This process can occur either online or offline.
Regardless of the verification method used, this one-time process results in a unique credential being issued to the user. This credential is what they subsequently use to authenticate in order to view their own data. This credential may take one or more of the following forms:
• A PIN number
• A username/password combination
• An X.509 digital certificate downloaded to the user's browser
• A Brands or other type of credential stored locally within a Freedom client

Blinded two-way communication engine: The company needs a means of communicating with a consumer whose data record they hold, without knowing that consumer's identity.
Solution: Any PII that could be used to contact an individual (name, e-mail, address, phone) is multiply encrypted using keys belonging to different entities.
When a company wishes to contact a specific individual, they forward the text of the message to be sent, along with the encrypted contact info, to a sanitization facility.
There, the blinded data passes through a chain of servers belonging to the customer, designated party and an audit partner. The chain of servers serves to distribute trust such that no one entity can link the user's identity with other information in the database.
A marketing company wishes to send targeted ads to individuals. The individual's e-mail address is encrypted first with a key belonging to an audit partner, then with a designated party's key, then with the customer's key.
When the customer wishes to send e-mail to that individual, they send the message and the encrypted e-mail address to the sanitization facility, where:
1. The company:
a) encrypts the message with the audit partner's public key
b) decrypts one layer of encryption on the individual's e-mail address
c) forwards both the above to a designated server
2. Designated server:
a) does not touch the message contents
b) decrypts one layer of encryption on the individual's e-mail address
c) forwards both to the audit partner
3. The audit partner:
a) decrypts the final layer of encryption to reveal the individual's e-mail address
b) decrypts the contents of the message
c) forwards the message to the individual in question, on behalf of the customer
The following table describes who amongst the three parties has access to which data:
                Sees personal data   Sees profile data   Sees message contents
Customer        No                   Yes                 Yes
Designate       No                   No                  No
Audit Partner   Yes                  No                  Yes

Variations on the above scenario include:
• using the Freedom network to route e-mail where a greater degree of privacy and a lesser degree of auditability is perhaps required;
• using a similar process for the blinded addressing of physical mail.
By reversing the process, the user can privately reply to the company. This is accomplished by making the from address in the e-mail {the encrypted email address}@private-mail-gateway.pwc.com. If each party reverses the process the user can securely send messages back. If we don't do this, the company could easily find their real email addresses by sending them a note and asking them to reply back. Further, this solves our problem of how to opt out. We no longer need to know the real user's address to opt in or out.
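A minimal sketch of the layered ("multiply encrypted") contact address, with Fernet keys standing in for each party's key (the source describes public keys; symmetric keys are used here purely for illustration). Encryption is applied audit-partner-first and peeled customer-first, so no single party both reads the address and sees the profile.

```python
from cryptography.fernet import Fernet  # pip install cryptography

# Illustrative per-party keys (in practice each party holds only its own key).
audit_key, designate_key, customer_key = (Fernet.generate_key() for _ in range(3))

def blind(email: str) -> bytes:
    """Encrypt with the audit partner's key first, then designate's, then customer's."""
    blob = email.encode("utf-8")
    for key in (audit_key, designate_key, customer_key):
        blob = Fernet(key).encrypt(blob)
    return blob

def peel(blob: bytes, key: bytes) -> bytes:
    """Each party removes exactly one layer before forwarding."""
    return Fernet(key).decrypt(blob)

if __name__ == "__main__":
    blinded = blind("consumer@example.com")
    step1 = peel(blinded, customer_key)   # customer removes its layer
    step2 = peel(step1, designate_key)    # designated server removes its layer
    revealed = peel(step2, audit_key)     # audit partner reveals the address
    print(revealed.decode("utf-8"))
```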
Secure profile access engine: A means is required by which individuals can access their own profile data.
Solution: A "personal portal," powered by the secure profile access engine, is contained in the sanitization facility. When a user authenticates, his or her personal data is decrypted by a chain of several servers and presented to the user. The personal portal could be cobranded.
Third party data access engine: Similar to the above, but we provide a means whereby authorized entities can decrypt predetermined sets of data under specific circumstances.

The data access engine, which also resides within the sanitization facility, brings together the multiple keys required in order to decrypt data.

PDD Plans
• Framework:
- User defines high-level privacy policy
- Privacy architect uses MPS tools to define a low-level privacy policy
- MPS data access components enforce the low-level privacy policy

High-Level Privacy Policy
• A corporation drafts a "High-Level Privacy Policy" (HLPP)
- This HLPP is a statement of claims
- Examples of claims may include:
• Our systems are compliant with HIPAA.
• We do not share PII with 3rd parties.
• European citizens' data will be treated according to the EU guidelines and safe-harbor legislation.
• PII will only be visible to internal employees, specifically those in our credit department.
• All accesses will be logged.

The Low-Level Privacy Policy
• Using the system tools, the privacy architect creates a low-level privacy policy that will enforce the high-level privacy policy.
• The low-level privacy policy defines the transformations that will be performed on specific fields depending on the role players accessing the database.
• These transformations may include encryption, minimization, hashing, deletion (following a certain date), etc.
Examples of Privacy Policies
• An example of a High-Level Privacy Policy:
- No PII shall be shared with third parties
- PII will be used by internal parties.
• The corresponding Low-Level Policies:

Role: 3rd Parties                          Role: Employees
Date of Birth: Minimize to year of birth   Date of Birth: Leave Intact
9-Digit Zip: Minimize to 7-digit Zip       9-Digit Zip: Leave Intact
Gender: Leave Intact                       Gender: Leave Intact
E-mail: Leave Intact                       E-mail: Encrypt
Credit Card #: Encrypt
SSN: not available                         SSN: delete after 3 months
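The low-level policy above can be captured as plain data. A minimal sketch follows; the role and field entries mirror the example table as reconstructed, while the dictionary structure and transformation identifiers are assumptions.

```python
# Hypothetical encoding of the corresponding low-level policies:
# role -> field -> transformation to apply before that role sees the data.
LOW_LEVEL_POLICY = {
    "3rd Parties": {
        "Date of Birth": "minimize_to_year",
        "9-Digit Zip": "minimize_to_7_digit",
        "Gender": "leave_intact",
        "E-mail": "leave_intact",
        "Credit Card #": "encrypt",
        "SSN": "not_available",
    },
    "Employees": {
        "Date of Birth": "leave_intact",
        "9-Digit Zip": "leave_intact",
        "Gender": "leave_intact",
        "E-mail": "encrypt",
        "SSN": "delete_after_3_months",
    },
}

def transformation_for(role: str, field: str) -> str:
    """Look up which transformation the low-level policy requires."""
    return LOW_LEVEL_POLICY.get(role, {}).get(field, "not_available")

if __name__ == "__main__":
    print(transformation_for("3rd Parties", "Date of Birth"))  # minimize_to_year
```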
Database Privatization Components
The outline indicates the various steps taken to privatize a database, numbered chronologically: Data Preparation (Consolidation, Tagging, Normalization), Risk Analysis, Policy Compliance, and Data Access.

Data Preparation: Consolidation
• Data concerning customers is typically not found in a single database table; rather it is found:
- Scattered across different tables within the same database
- In different information systems such as call center systems, web systems, and offline databases.
• In order to perform a privacy analysis, we need to know everything that the company knows about the customer.
• This stage requires a tool that gathers data related to customers from diverse sources and provides the appearance of a single consolidated record containing all information known about a customer.
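A minimal sketch of consolidation, assuming each source system can be queried for rows keyed by a shared customer id; the source names and fields are invented for illustration, and the merge simply presents one combined record per customer.

```python
from collections import defaultdict

def consolidate(*sources: list[dict]) -> dict[str, dict]:
    """Merge rows from several systems into one record per customer id."""
    combined: dict[str, dict] = defaultdict(dict)
    for source in sources:
        for row in source:
            combined[row["customer_id"]].update(row)
    return dict(combined)

if __name__ == "__main__":
    crm = [{"customer_id": "c1", "name": "John Smith", "email": "js@example.com"}]
    call_center = [{"customer_id": "c1", "last_call": "2001-04-30"}]
    web = [{"customer_id": "c1", "clicks": 42}]
    print(consolidate(crm, call_center, web))
```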

Data Preparation: Tagging
There are no standard field names in databases, nor do databases include any metadata concerning privacy attributes of any field. This information is required before any risk analysis can be performed.
This stage requires a tool which associates each field with descriptive attributes including:
- The field's classification in a common ontology
- The legal jurisdiction(s) to which the field is subject
- The conditions under which the field was obtained (with consent, without consent, not sure, and so forth)
- The weights and measures used in the field
- The business value of the field; one of {low, medium, high}
This tool could allow for manual classification of the data according to an ontology, and then semi-automatically associate the field with recommended tags.
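A minimal sketch of the per-field tags the text lists; the attribute names mirror the bullets above, while the dataclass itself and the sample values are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class FieldTag:
    """Descriptive privacy attributes attached to one database field."""
    ontology_class: str       # classification in a common ontology
    jurisdictions: list[str]  # legal jurisdiction(s) to which the field is subject
    consent: str              # "with consent", "without consent", "not sure", ...
    units: str                # weights and measures used in the field
    business_value: str       # one of {"low", "medium", "high"}

if __name__ == "__main__":
    dob_tag = FieldTag(ontology_class="date_of_birth",
                       jurisdictions=["US"],
                       consent="with consent",
                       units="MM/DD/YYYY",
                       business_value="high")
    print(dob_tag)
```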
Data Preparation: Normalization
Different databases may use different standards to represent information. In order to estimate privacy risk, all measurements must be converted to a common scale.
• This stage requires a tool that converts information to a common representation. This is a prerequisite to risk analysis; e.g. if you are an 8 ft tall U.S. citizen (high privacy risk), and there exists a database with the heights of all U.S. citizens, your height must be converted from meters to feet to estimate the privacy risk associated with your height.
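A minimal sketch of normalization using the height example above; the conversion table and field names are illustrative, and the only logic is conversion to a common unit before risk analysis.

```python
# Convert all height measurements to a common unit (feet) before risk analysis.
TO_FEET = {"m": 3.28084, "cm": 0.0328084, "ft": 1.0, "in": 1.0 / 12.0}

def normalize_height(value: float, unit: str) -> float:
    """Express a height in feet regardless of the unit it was recorded in."""
    return value * TO_FEET[unit]

if __name__ == "__main__":
    print(normalize_height(2.44, "m"))   # ~8 ft: a rare, high-risk value
    print(normalize_height(96, "in"))    # also 8 ft, now directly comparable
```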

Risk Analysis
The construction of a low-level privacy policy requires an understanding of the statistical risk of not complying with the declarations made in the high-level privacy policy.
This stage requires a tool that measures the probability of non-compliance with the high-level privacy policy. For example, it might measure the risk of:
• System non-compliance with HIPAA.
• Re-identification of 'supposedly anonymous' records.
• Etc.
Policy Compliance
Once the risks of non-compliance with a HLPP are understood, it is possible to determine what data transformations would reduce privacy risk while maximizing business knowledge.
This stage requires a tool that allows a privacy architect to iteratively create a low-level privacy policy so as to meet the HLPP while maximizing business information. This policy determines:
- Which data items will continue to exist
- What data transformations will be applied to the database permanently
- How long a piece of data is retained (i.e. expiry date)
- The role players who have access to different pieces of data (i.e. "credit department employees," "marketers," "finance department")
- The access privileges of each role player which include the privacy enhancing real-time data transformations that must be applied to the data before it is delivered to the role player.
- The logging policy for each field.

Data Access
Once a low-level Privacy Policy has been established, it must be enforced.
Data Access is carried out via a set of components that serve as a privacy-enforcing proxy to the database.
The database is set up in such a way as to make it impossible to access the database except through the proxy (similar to an Internet firewall).
Data Access (cont'd)
• Data Access components can be broken down conceptually into three distinct layers:
- The application layer: This represents the client/server architecture through which the data will be accessed.
- The security layer: which provides services such as authentication, access-control, non-repudiation, logging, and time-synchronization.
- The privacy enhancing layer: which applies data transformations to both incoming and outgoing data. The data transformations that will be applied to each data flow are wholly determined by the role-player accessing the data and the low-level privacy policy corresponding to that role.
- These 3 sub-layers are illustrated in the next slide.

Data Access Layer Continued
• Incoming and outgoing information is transformed in real-time by the privacy enhancing layer. The privacy enhancing layer is completely controlled by the low-level privacy policy. For example, the date of birth may be transformed into an age range such as (35-50 years of age) if a particular role such as "online marketer"
accesses the data, while another role such as "Credit department" may see the actual date of birth.
• Access is controlled by the security layer (which includes services such as authentication, access control, logging, non-repudiation, and time synchronization).
• The application layer simply represents the client/server mechanism by which the database is accessed.
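A minimal sketch of the privacy enhancing layer applying role-specific real-time transformations to outgoing rows, using the date-of-birth example from the text. The policy structure, function names, and the reference year are assumptions.

```python
import hashlib

def to_age_range(dob_year: int, current_year: int = 2001) -> str:
    """Real-time transformation: date of birth -> coarse age range."""
    age = current_year - dob_year
    return "35-50" if 35 <= age <= 50 else ("under 35" if age < 35 else "over 50")

# Per-role outgoing transformations, controlled by the low-level privacy policy.
OUTGOING = {
    "online marketer": {"dob_year": to_age_range,
                        "name": lambda v: hashlib.sha256(v.encode()).hexdigest()[:12]},
    "credit department": {},  # sees the actual values
}

def proxy_fetch(row: dict, role: str) -> dict:
    """What a given role actually receives when it queries the database."""
    transforms = OUTGOING.get(role, {})
    return {k: transforms.get(k, lambda v: v)(v) for k, v in row.items()}

if __name__ == "__main__":
    row = {"name": "John Smith", "dob_year": 1961, "balance": 120.50}
    print(proxy_fetch(row, "online marketer"))
    print(proxy_fetch(row, "credit department"))
```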
Industry Specific Knowledge
• Industry specific toolkits bring industry-specific knowledge, e.g.:
- Consolidation: Field detectors could help discover DNA or prescription data in the database.
- Tagging: a medical toolkit may include additions to the ontology such as "DNA" and "Blood Type".
- Normalization: Units of measure such as CCs are introduced.
- Risk Analysis: Algorithms enable the software to measure the risk of re-identification associated with dental fields.
- Policy Compliance: A data handling algorithm could be written to encrypt a record, and distribute keys to the hospital and physician.

What do we mean by Privacy?
• First let us define two terms: anonymous and de-identified.
- Anonymous records are simply those records that lack elements such as SSN, first and last name, address, and telephone number, elements that are commonly believed to constitute PII.
- De-identified records are those that cannot be uniquely traced to the individuals they are associated with.
For example, the fields {date of birth, 9-digit zip, and gender} can in combination uniquely identify 86% of all U.S. citizens. Such records are anonymous but not de-identified.
What do we mean by privacy? (cont'd)
• Our notion of privacy is a probabilistic one. Before explicitly defining our measure, we first appeal to some intuitive notions of privacy risk.
Example 1:
- Suppose for example that we have a record as follows: {City = Oje, Gender = female, Occupation = mathematician}
- Suppose Oje is a fishing village with a population of 400 people. There is probably only one female mathematician who lives in this village. Thus the occupation, gender and city of this person are sufficient to uniquely identify this individual, and this information may be considered high risk.
On the other hand, suppose we have a record with the following characteristics: {City = NYC, Gender = male, Occupation = lawyer}.
- There are probably several thousand people matching this profile. Thus this information is insufficient to uniquely identify this individual.
Thus this information may be considered low risk.
Latanya Sweeney calls the number of people who match a given characteristic the "bin size."

The Virtual Attack Database
In order to determine your bin size you must determine your universe (in the probability sense of the word). I.e., is it all American citizens? Is it all North Americans? Is it everyone in the world?
To estimate bin size, we introduce the notion of a virtual attack database. This database is the consolidation of all possible databases that an attacker could use to re-identify a supposedly "anonymized record."
Virtual Attack Database (cont'd)
For example, the attack database for a record containing {date of birth, zip code} would probably be the U.S. census database if the record was associated with a US-only retail chain, since the census database can be expected to contain the date of birth and zip code of all American citizens. If no virtual attack database can be defined for a field then it has no privacy risk.
For example, if I know you like to grow your own food and there is no database with this information, then that piece of information has no privacy risk.

The Correlation Risk Factor
• Privacy risk can be measured according to the probability of re-identification. We introduce the Correlation Risk Factor (CRF) as a means of measuring this risk.
• The CRF is:
- A quantitative, objective notion of privacy
- That ranges between 0 and 1
- And represents the probability of re-identification.
The CRF is calculated as 1/BinSize(X), where X is the set of data elements contained in the record.
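A minimal sketch of the CRF computation: bin size is the number of records in a virtual attack database that match the given quasi-identifier values, and CRF = 1/BinSize. The toy attack-database rows below are invented for illustration and reuse the Oje/NYC example.

```python
def bin_size(attack_db: list[dict], quasi_identifiers: dict) -> int:
    """Count how many people in the virtual attack database match these values."""
    return sum(all(row.get(k) == v for k, v in quasi_identifiers.items())
               for row in attack_db)

def crf(attack_db: list[dict], quasi_identifiers: dict) -> float:
    """Correlation Risk Factor = 1 / BinSize (1.0 means a unique match)."""
    size = bin_size(attack_db, quasi_identifiers)
    return 1.0 / size if size else 0.0  # no attack database match -> no risk

if __name__ == "__main__":
    # Toy attack database standing in for e.g. a census extract.
    census = [
        {"city": "Oje", "gender": "F", "occupation": "mathematician"},
        {"city": "NYC", "gender": "M", "occupation": "lawyer"},
        {"city": "NYC", "gender": "M", "occupation": "lawyer"},
        {"city": "NYC", "gender": "M", "occupation": "lawyer"},
    ]
    print(crf(census, {"city": "Oje", "gender": "F"}))  # 1.0  -> high risk
    print(crf(census, {"city": "NYC", "gender": "M"}))  # ~0.33 -> lower risk
```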
What is the acceptable CRF for an application?
• There is no standard; however:
- The Social Security Administration is known to only release data with a bin size of 8-1?.
- The EU defines PII using the words "disproportionate effort"; thus we must be able to convince a judge that re-identifying an individual requires more effort than the data is worth to the attacker.
• We expect the public to require a higher threshold for sensitive health information and a lower threshold for advertisers.
• Modified notions of the CRF can take into account the geographic distance that must be traveled by an attacker to re-identify an individual; this may be required in some applications to meet the European standard of "disproportionate effort" when bin sizes are low (<1000).
- For example, consider SEOUL. The attack DB is the calculator's DB. A bin size of 20 may seem adequate for this application, but what if all the individuals live in NYC? An attacker could presumably visit all 20 in a matter of days. It is considerably harder to visit 20 individuals if they are spread across the U.S.

Illustration of the Data Preparation and Policy Compliance Layers
(Labels recovered from the figure:)
- Record industry-specific fields and normalize data
- Industry-specific ontologies
- Normalization algorithms
- Risk factors for fields that require analysis outside of the DB
- Algorithms to detect legislative compliance
- Industry specific: minimization procedures, key escrow arrangements, data handling procedures
- Unapproved Low-Level Privacy Policy -> Final Low-Level Privacy Policy

Claims

CA002345805A 2001-05-03 2001-05-03 Enterprise privacy system and method Abandoned CA2345805A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA002345805A CA2345805A1 (en) 2001-05-03 2001-05-03 Enterprise privacy system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CA002345805A CA2345805A1 (en) 2001-05-03 2001-05-03 Enterprise privacy system and method

Publications (1)

Publication Number Publication Date
CA2345805A1 true CA2345805A1 (en) 2002-11-03

Family

ID=4168934

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002345805A Abandoned CA2345805A1 (en) 2001-05-03 2001-05-03 Enterprise privacy system and method

Country Status (1)

Country Link
CA (1) CA2345805A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090228990A1 (en) * 2006-07-03 2009-09-10 Weifeng Chen System and method for privacy protection using identifiability risk assessment
US20120291144A1 (en) * 2006-07-03 2012-11-15 International Business Machines Corporation System and method for privacy protection using identifiability risk assessment
US8332959B2 (en) * 2006-07-03 2012-12-11 International Business Machines Corporation System and method for privacy protection using identifiability risk assessment
US8429758B2 (en) * 2006-07-03 2013-04-23 International Business Machines Corporation System and method for privacy protection using identifiability risk assessment
WO2009127771A1 (en) * 2008-04-16 2009-10-22 Nokia Corporation Privacy management of data
US8397304B2 (en) 2008-04-16 2013-03-12 Nokia Corporation Privacy management of data

Similar Documents

Publication Publication Date Title
US11790117B2 (en) Systems and methods for enforcing privacy-respectful, trusted communications
CN111149332B (en) System and method for implementing centralized privacy control in decentralized systems
US20220050921A1 (en) Systems and methods for functionally separating heterogeneous data for analytics, artificial intelligence, and machine learning in global data ecosystems
US8042193B1 (en) Systems and methods for controlling data access by use of a universal anonymous identifier
US9361481B2 (en) Systems and methods for contextualized data protection
US9087216B2 (en) Dynamic de-identification and anonymity
US20180307859A1 (en) Systems and methods for enforcing centralized privacy controls in de-centralized systems
US9087215B2 (en) Dynamic de-identification and anonymity
AU2009332566B2 (en) Double blinded privacy-safe distributed data mining protocol
US8595857B2 (en) Persona-based identity management system
US20100223349A1 (en) System, method and apparatus for message targeting and filtering
US11170130B1 (en) Apparatus, systems and methods for storing user profile data on a distributed database for anonymous verification
EP3811265A1 (en) Systems and methods for enforcing privacy-respectful, trusted communications
Rai Ephemeral pseudonym based de-identification system to reduce impact of inference attacks in healthcare information system
Youm An overview of de-identification techniques and their standardization directions
Hicks et al. Vams: Verifiable auditing of access to confidential data
CA2345805A1 (en) Enterprise privacy system and method
Guesdon et al. Securizing data linkage in french public statistics
Oh et al. Data De-identification Framework.
KR102308528B1 (en) Electronic voting system and Electronic voting method
Boyens Privacy trade-offs in Web-based services
Ali et al. SAFEGUARDING PRIVACY AND ACCOMPLISHING DATA TRUTHFULNESS IN DATA MARKETS
CN117436107A (en) Logistics information encryption method and system
WO2008069825A2 (en) System and method of providing unique personal identifiers for use in the anonymous and secure exchange of data
Anitha et al. Enhanced Batch Generation based Multilevel Trust Privacy Preserving in Data Mining

Legal Events

Date Code Title Description
FZDE Discontinued

Effective date: 20040310