WO2003001402A2

WO2003001402A2 - System for implementing privacy policies written using a markup language

Info

Publication number: WO2003001402A2
Application number: PCT/CA2002/000983
Authority: WO
Inventors: Alexis Smirnov; Philippe Boucher; Corey Velan; Roger Mcfarlane
Original assignee: Zero Knowledge Systems Inc.
Priority date: 2001-06-26
Filing date: 2002-06-26
Publication date: 2003-01-03
Also published as: CA2351696A1; AU2002312692A1; WO2003001402A3

Abstract

A method for the distribution of privacy policies and their interpretation throughout an enterprise, comprising encoding the privacy principles and data handling practices of an enterprise; and representing the encoded information as an xml document, whereby the document is distributed throughout the enterprise.

Description

SYSTEM FOR IMPLEMENTING PRIVACY POLICIES WRITTEN USING A MARKUP LANGUAGE

The present invention relates to a system and method for managing privacy policies, and more particularly to a system and method for creating, distributing, and using a structured privacy policy in an enterprise.

BACKGROUND OF THE INVENTION

Privacy has become a pressing operational issue for businesses, and many have already begun re-engineering their information systems and data-handling practices to deal with the issue effectively and efficiently.

Corporate privacy programs and infrastructures can be said to evolve over five stages, as outlined in Table 1.

It is thus desirable for an enterprise privacy management system to fulfil the following goals. Firstly, privacy policies must be digital. Since plain text cannot be read and understood by enterprise data applications, privacy policies should be expressed in a machine-readable form. Once digital and machine-readable, policies can be easily catalogued, updated, modified, and referenced for audit and assessment purposes. XML (Extensible Markup Language) has quickly emerged as the universal format for data interchange and is therefore the most suitable format.

Secondly, data-handling practices must also be digital. Today, most companies struggle with ways to best track and understand their data-handling practices. The sheer magnitude of this task makes the need for digital models even more apparent. To evaluate its own compliance with stated policies, a company must ask itself a series of questions: Do any of our current business activities violate the company's privacy policy? Will any planned or proposed activities violate policy? If a new policy is to be introduced, which departments and programs will be impacted? If a new regulation is passed, which policies will need to be modified? Which practices? Policies and practices must be modeled together for true gap analysis or identification of potential conflicts to be possible.

Thirdly, privacy tools must incorporate privacy intelligence. The automation of privacy enforcement will raise the stakes significantly for authors of policy, since the policy that will be created will be consumed automatically by mission-critical applications. Before a digital policy can be pressed into service, several issues must be resolved: Are all of the policies consistent with each other? Do they overlap or conflict with one another? Have the desired (and required) business practices been tested against policy prior to "going live" with the policy? Are the policies consistent with relevant external regulations, contractual obligations, and industry guidelines? It is important to note that privacy introduces a set of concepts like customer notification, customer permission, and purpose of data use that have not yet been addressed by other types of "policy" tools, such as network access control. Effective tools to create digital privacy policy can only be developed by marrying both technical and privacy policy expertise.

SUMMARY OF THE INVENTION

In accordance with this invention there is provided a method for the distribution of privacy policies and their interpretation throughout an enterprise, comprising:

(a) encoding the privacy principles and data handling practices of an enterprise; and

(b) representing the encoded information as an XML document, whereby the document is distributed throughout the enterprise.

In accordance with another aspect of the invention there is provided a system for generating the xml document.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other features of the preferred embodiments of the invention will become more apparent in the following detailed description in which reference is made to the appended drawings wherein:

Figure 1 is a schematic diagram of the components of an enterprise privacy management (EPM) system according to the present invention; and

Figures 2-6 show UML static diagrams for PRML.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

In the following description like numerals refer to like structures in the drawings.

Referring to figure 1, there is shown a schematic diagram of the components of an enterprise privacy management (EPM) system 100 according to an embodiment of the present invention. The EPM system 100 comprises: a console 110, which provides a platform for managing enterprise-wide personal information (PI) resources; an EPM server; customer-facing systems such as audit, preference and specialized applications; "back-end" systems such as transaction processing, billing, ERP and manufacturing; "front-office" applications such as a customer relationship manager (CRM); a "web office", such as web services or partner web sites; and one or more clients.

The EPM system 100 includes a language for defining data exchange, called the Privacy Rights Markup Language (PRML), which provides a standardized mechanism for the components to communicate with each other.

Each of the components will be discussed in detail below.

The console consists of an inference engine, a consistency engine, collaboration tools, a user interface, and a set of tools for handling structured information, on which all of the other portions rely. It also includes a variety of other components, such as an installer and a help file. The console is written in a mixture of C (using Microsoft libraries) and HTML.

The EPM server provides support for integrity, collaboration, discovery and distribution. The EPM system responds to information queries called "requests". The EPM server consists of a web server, a request service, request forms, a request repository, and discovery agents. The web server has two roles. First, it provides basic HTTP protocol support to the request service; all communication between clients and the request service is via HTTP using SOAP (Simple Object Access Protocol). Second, it hosts web forms, servlets and scripts that provide a UI for fulfilling requests.

In use, a client sends a request for specific information, such as details about what data is contained in a particular database, to the request service in the EPM server. The request service processes the requests sent by the client and directs each request to its intended recipient. The request is stored in the server-side repository. When the recipient completes the request, the results are also stored in the repository. The result is forwarded to the client, where it is integrated with the client's data set. The request service may also direct the request to another user if so desired. Similarly, the request may be directed to a discovery service, in which case the discovery service runs the process on some target system such as a database, web server or directory server. Once again, the completed request result is stored in the repository. The discovery service can also expose its interface to request recipients.

As may be seen, the request service is the server-side core of the console. This component listens for client calls via HTTP and responds accordingly. The main communications between a client and the server are (a) the client sending a new request to the service, and (b) the client enumerating all requests that match certain criteria (e.g. "give me all uncompleted requests"). Discovery services are J2EE-based applications; each service includes its own web-based UI and its own discovery and persistence logic.
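The request flow described above (submit, store, complete, enumerate) can be sketched with an in-memory stand-in for the request service and its repository. This is a minimal Python sketch under stated assumptions: `Request`, `RequestService`, `submit`, `complete` and `enumerate` are hypothetical names, the repository is a plain dictionary rather than the server-side store, and the SOAP/HTTP transport is omitted.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, Dict, List, Optional


@dataclass
class Request:
    """A request record as kept in the repository (hypothetical shape)."""
    rid: int
    query: str
    recipient: str  # a user or a discovery service
    result: Optional[str] = None

    @property
    def completed(self) -> bool:
        return self.result is not None


class RequestService:
    """In-memory stand-in for the request service and its repository."""

    def __init__(self) -> None:
        self._ids = count(1)
        self._repository: Dict[int, Request] = {}

    def submit(self, query: str, recipient: str) -> int:
        # (a) A client sends a new request; it is stored, then routed.
        request = Request(next(self._ids), query, recipient)
        self._repository[request.rid] = request
        return request.rid

    def complete(self, rid: int, result: str) -> None:
        # The recipient stores its result in the repository.
        self._repository[rid].result = result

    def enumerate(self, criteria: Callable[[Request], bool]) -> List[Request]:
        # (b) A client lists all requests matching some criteria.
        return [r for r in self._repository.values() if criteria(r)]


# Usage: one request answered by a person, one still pending at a discovery service.
service = RequestService()
a = service.submit("columns of the CUSTOMER table", "dba")
b = service.submit("scan the LDAP directory", "discovery-service")
service.complete(a, "name, address, phone")
pending = service.enumerate(lambda r: not r.completed)  # "all uncompleted requests"
```

Keeping completion state derived from the stored result means the repository is the single source of truth, which mirrors the text's point that both requests and results live in the repository.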

As mentioned above PRML provides a standardized mechanism for the components to communicate with each other. The console and the EPM server may use a variety of protocols to communicate, including SOAP and a version control protocol, such as CVS (concurrent versioning system). SOAP is a lightweight protocol for exchange of information in a decentralized, distributed environment. It is an XML based protocol that consists of three parts: an envelope that defines a framework for describing what is in a message and how to process it, a set of encoding rules for expressing instances of application-defined datatypes, and a convention for representing remote procedure calls and responses. CVS provides support for document version management activity. Those activities include putting files into a repository, getting files, making changes to them, and committing those changes to one or more branches. All of these facilities are available to one or more users on one or more hosts. It also offers management interfaces that allow examination of the history and content of file creation, modification, and deletion; comparisons between arbitrary file versions by date, author, or version; security and access control around each of these facilities; and management facilities for the import and export of files into different repositories. The EPM server distributes information about how to implement a privacy policy to a variety of systems (back-end, front-office, web-office) through a variety of mechanisms (directory, web server), both push and pull based, using the PRML markup language.
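The envelope part of SOAP described above can be illustrated with Python's standard `xml.etree.ElementTree`. This is a minimal sketch, assuming SOAP 1.1 and a hypothetical payload element `get-policy`; SOAP encoding rules, headers, faults and the HTTP transport are deliberately left out.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ET.register_namespace("soap", SOAP_ENV)


def wrap_in_soap(payload: ET.Element) -> bytes:
    """Place an XML payload (e.g. a PRML fragment) inside a SOAP Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    body.append(payload)
    return ET.tostring(envelope, encoding="utf-8")


def unwrap_soap(message: bytes) -> ET.Element:
    """Return the first element inside the SOAP Body."""
    envelope = ET.fromstring(message)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("SOAP message has no body content")
    return body[0]


# Usage: a hypothetical pull request for a named policy.
request = ET.Element("get-policy", {"name": "marketing"})
message = wrap_in_soap(request)
received = unwrap_soap(message)
```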

The preferred pull mechanism is SOAP; the preferred push mechanisms are HTTP POST and pushing to a directory, such as LDAP. Software (libraries) on these other systems uses the information in these policies to make decisions of various sorts, especially of the form "allow or deny this action." A closely related decision is "this rule would allow/deny this action," which reports the outcome without enforcing it. This supports installation modes, audits, upgrades, etc.

These libraries are written in C, C++, Java, or Perl, depending on the calling code. They take the information contained within the policy documents and construct an interface suitable for the calling program to use. For example, if the calling code is C, a library might offer a function PrivacyCheck(*data, *actor, action, recipient, purpose). If the code is object-oriented, the library might construct an object that collects such data.

The process of implementing a PRML-driven privacy policy consists of the creation of a policy, its distribution, and its use.

I. Creation. The creation of a privacy policy involves, in essence, deciding what can and cannot be allowed, for privacy reasons, within an enterprise and between the enterprise and its partners and suppliers. This usually proceeds via a manual process of inventorying data, processes, activities, conditions, etc. These data are correlated and analyzed, and a policy is produced. New software aims to make this task easier. Such software can also output structured information, such as a PRML policy, for distribution of the policy information it has gathered. It is also possible to create such policy files with a standard text, SGML or XML editing program. The creation phase ends when a properly crafted XML file has been created.

II. Distribution. Once a policy has been created, it must be distributed to the software that the organization wishes it to control. Any of the standard methods for distributing XML may be used. Software must be installed on the receiving host to ensure that the XML can be processed. That software takes the XML and provides it in a useful form to the data processing software, at points chosen during the systems integration phase. Those points will be function calls, object invocations, assertion tests, or other methods appropriate for the caller. The caller passes a predetermined set of information for decision making, such as (actor, data, recipient, action, purpose). These data are tested against the policies encoded in the XML, and a result is returned; it may be as simple as yes/no, or more complex. More complex responses might include "This purpose would be acceptable" or "This subset of data is the cause of the refusal."
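The test described in the Distribution step, checking an (actor, data, recipient, action, purpose) tuple against XML-encoded rules and returning either a plain answer or a richer explanation, can be sketched as follows. The `<policy>`/`<allow>` markup is a deliberately simplified, hypothetical encoding rather than the PRML DTD, and `privacy_check` and `Decision` are illustrative names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical, simplified policy encoding; real PRML documents are richer.
POLICY_XML = """<policy>
  <allow actor="billing-clerk" data="address" recipient="insurer"
         action="disclose" purpose="billing"/>
  <allow actor="physician" data="medical-record" recipient="physician"
         action="view" purpose="providing-care"/>
</policy>"""

FIELDS = ("actor", "data", "recipient", "action", "purpose")


@dataclass
class Decision:
    allowed: bool
    reason: str


def privacy_check(policy: ET.Element, **request: str) -> Decision:
    """Test a request against every <allow> rule and explain the outcome."""
    for rule in policy.iter("allow"):
        if all(rule.get(f) == request[f] for f in FIELDS):
            return Decision(True, "permitted by an explicit rule")
    # Richer answer: which purposes would have been acceptable for this request?
    alternatives = sorted(
        rule.get("purpose") for rule in policy.iter("allow")
        if all(rule.get(f) == request[f] for f in FIELDS if f != "purpose")
    )
    if alternatives:
        return Decision(False, "acceptable purposes: " + ", ".join(alternatives))
    return Decision(False, "no rule permits this request")


policy = ET.fromstring(POLICY_XML)
decision = privacy_check(policy, actor="billing-clerk", data="address",
                         recipient="insurer", action="disclose", purpose="marketing")
```

Here `decision` is a refusal whose reason names "billing" as the acceptable purpose, one concrete form of the "This purpose would be acceptable" response mentioned above.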

III. Use. Once a policy has been delivered, it may be used in one of several ways. There should be an advisory mode, for use in the development and testing phases, in which calls are advisory rather than enforcing. This can be implemented by a configuration file or by an extra argument to calls. A simple configuration file may make it easier to change everything at once, whereas a more complex configuration file might support the use of symbolic arguments in the process calls. However, the ultimate use of the language is to control the flow and use of personal data. This is done either proactively, by preventing disallowed uses, or retroactively, via audit trails and the like.
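The advisory and enforcing modes can be sketched as a wrapper around a policy decision function: in advisory mode a denial is only recorded ("this rule would deny this action"), while in enforcing mode it stops the caller. This is a minimal sketch, assuming the mode is read once (for example from a configuration file) and passed in; `Mode`, `make_check`, `PolicyViolation` and the `decide` callback are hypothetical names, and the logger stands in for an audit trail.

```python
import logging
from enum import Enum
from typing import Callable


class Mode(Enum):
    ADVISORY = "advisory"    # report what the policy would do; never block
    ENFORCING = "enforcing"  # block actions the policy denies


class PolicyViolation(Exception):
    """Raised in enforcing mode when the policy denies an action."""


def make_check(mode: Mode, decide: Callable[..., bool],
               audit: logging.Logger) -> Callable[..., bool]:
    """Wrap a policy decision function so that its effect depends on the mode."""

    def check(**request: str) -> bool:
        if decide(**request):
            return True
        if mode is Mode.ADVISORY:
            # "This rule would deny this action": record it and carry on.
            audit.warning("policy would deny %s", request)
            return True
        audit.warning("policy denied %s", request)
        raise PolicyViolation(f"denied: {request}")

    return check
```

Switching every call site between modes then means changing one configuration value, which matches the observation that a simple configuration file makes it easy to change everything at once.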

Example of use:

A programmer working for a hospital has created a marketing application that will send email on behalf of the hospital and its affiliates for a variety of purposes. He has implemented privacy via PRML. The data goes through a number of phases; we illustrate those that interact with privacy settings.

We assume that there are only a few rules in the XML. For ease of readability, we present them in a format closer to English; a competent programmer will easily see the correspondence.

"1. The hospital may send email to customers if opt-in"

"2. No actor may take any action for any purpose"

"3. Statement 1 has higher precedence than statement 2"

"4. Opt-in information is stored in http://CRM.example.org/privacy/opt-in/recipient-id"

These rules are accompanied by sufficient backing information for them to be interpretable.

Gathering a list of names. The program is provided with a list of names. At this stage, the programmer assembles a set of parameters with which to decide whether the program should proceed. The programmer calls a function PrivacyCheck with the arguments (client (hospital), purpose (marketing), action (send-email), recipient (data-subject)). The privacy library parses these arguments and checks them against an internal model of the privacy policy. There are several ways to create such a model, including truth tables, nested conditionals, etc., each with its own benefits. The privacy library passes the arguments through the nested conditionals and returns either an ok or a not-ok message. In this particular instance, the rules will (internally to the library) evaluate as "ok by statement 1" and "not ok by statement 2"; statement 3 then resolves the conflict, since statement 1 has precedence. The library therefore returns the result of statement 1, which is that the action is ok with an opt-in. The programmer then ensures that the program calls the PrivacyCheck function again on a per-user basis.
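The conflict resolution in this walkthrough (statement 1 allows, statement 2 denies, statement 3 gives statement 1 precedence) can be sketched with a list of rules and a precedence field. This is a minimal sketch under stated assumptions: `Rule`, `privacy_check` and the `*` wildcard are hypothetical, the recipient argument and the "to customers" restriction are omitted, and the "opt-in" condition is returned as a string for the caller to check per recipient, as the text describes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

ANY = "*"  # wildcard matching every value


@dataclass(frozen=True)
class Rule:
    name: str
    actor: str
    action: str
    purpose: str
    effect: str                      # "allow" or "deny"
    condition: Optional[str] = None  # checked later, per recipient
    precedence: int = 0              # higher value wins a conflict

    def matches(self, actor: str, action: str, purpose: str) -> bool:
        pairs = ((self.actor, actor), (self.action, action), (self.purpose, purpose))
        return all(pattern in (ANY, value) for pattern, value in pairs)


# Statements 1 and 2; statement 3 is encoded as statement 1's higher precedence.
RULES = [
    Rule("statement 1", "hospital", "send-email", ANY, "allow", "opt-in", precedence=1),
    Rule("statement 2", ANY, ANY, ANY, "deny"),
]


def privacy_check(actor: str, purpose: str, action: str) -> Tuple[str, Optional[str], str]:
    """Return (effect, condition, deciding rule) for the highest-precedence match."""
    matching = [rule for rule in RULES if rule.matches(actor, action, purpose)]
    if not matching:
        return ("deny", None, "no matching rule")
    winner = max(matching, key=lambda rule: rule.precedence)
    return (winner.effect, winner.condition, winner.name)
```

For the hospital's marketing email, both statements match and statement 1 wins, so the result is an allow conditional on opt-in; any other actor falls through to the blanket denial of statement 2.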

Choosing whom to email. For each name on the list, the programmer invokes PrivacyCheck(hospital, marketing, send-email, recipient-identifier(n)). The libraries then collect information about the opt-in status of the recipient with identifier recipient-identifier from the URL given in rule 4, using standard HTTP methods. (Any other method of finding this information is equivalent; we use a web example for readability.) There may be a privacycheck_prep call that causes the application to gather and cache some of this data for speed.

• PRML authoring tool

The PRML authoring tool is a basic utility which facilitates the creation of PRML policies. It will allow a user to describe her organization's privacy and data handling practices and render them as a set of PRML documents which can be passed to the PRML compiler or to PRML aware software components which can then act on the policy.

• PRML compiler

The PRML compiler provides complex analysis of a PRML policy. It can be used to compute all implied statements within the policy, fully describe a role, identify how specific data items can be manipulated and by whom, etc. The compiler is used to make a policy completely explicit so that a PRML aware component does not need to do extensive computation in order to apply that policy to its functions.

The PRML will now be described in detail below under the following headings.

Introduction

1.1. Goals and Capabilities

1.1.1. Rights management

1.1.2. Reporting Accountability

1.1.3. Rights interpretation

1.1.4. Document extension

1.2. Examples

1.3. Terminology and Documentation Conventions

2. Technical Overview

2.1. UML Usage

2.2. UML to XML Mapping

2.3. PRML Document Structure

2.4. PRML within Zero-Knowledge Privacy Platform

2.5. PRML Authoring Tools

3. Object Dictionary

4. Privacy Declarations

5. Data references

6. Base declarations

6.1. Owner Access

6.2. Notice of policy amendments

Introduction

One of the key goals of a privacy relationship management system is to give the enterprise the ability to easily create and maintain privacy policies around sensitive data. A robust implementation of privacy practices within the organization requires all applications and tools that work with sensitive data to adhere to a comprehensive set of privacy policies. The Privacy Rights Markup Language (PRML) allows the formalization of privacy policies as they relate to data and business processes.

The standard offers a comprehensive set of constructs in order to represent privacy policies in full compliance with the Fair Information Practices.

Optionally, privacy declarations can be linked with procedures to be executed by the runtime environment. The link with the runtime environment allows the privacy policy to specify constraints or actions to be evaluated dynamically. The privacy declarations of PRML are therefore not only a means of validating access to data by a certain user; a declaration can also be used to declare what happens when it takes effect.
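A declaration linked to runtime procedures can be sketched as a record with two optional hooks: a constraint evaluated dynamically at check time, and an action run when the declaration takes effect. This is a minimal Python sketch; `Declaration`, `constraint`, `on_effect` and the context dictionary are hypothetical names, and the "patient is admitted" constraint is an invented illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

Context = Dict[str, Any]  # runtime facts supplied by the caller


@dataclass
class Declaration:
    role: str
    operation: str
    purpose: str
    data: str
    # Optional runtime hooks; the names and signatures are illustrative only.
    constraint: Optional[Callable[[Context], bool]] = None
    on_effect: Optional[Callable[[Context], None]] = None

    def applies(self, role: str, operation: str, purpose: str, data: str,
                ctx: Context) -> bool:
        if (role, operation, purpose, data) != (self.role, self.operation,
                                                self.purpose, self.data):
            return False
        if self.constraint is not None and not self.constraint(ctx):
            return False  # constraint evaluated dynamically at call time
        if self.on_effect is not None:
            self.on_effect(ctx)  # e.g. write an audit entry or notify the owner
        return True


# Usage: physicians may view records for care only while the patient is admitted.
audit_log = []
care = Declaration(
    "physician", "view", "providing-care", "medical-record",
    constraint=lambda ctx: ctx.get("admitted", False),
    on_effect=lambda ctx: audit_log.append(ctx["patient"]),
)
```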

In order to simplify the formalization of privacy policies, a framework of generic PRML objects and declarations is specified. The PRML declaration framework can be used to accelerate the creation of a new PRML policy. It can also be used as a set of guidelines to help develop a strong new privacy policy.

1.1. Goals and Capabilities

1.1.1. Rights management

The language allows an organization to formalize its privacy policies.

PRML enables an application to create declarations that may be offered to the PII owner for the purpose of giving consent. The language shall also allow the specification of policies around altering the privacy policies themselves. For example, a PRML document may specify that a notice must follow any change to the privacy policy, and that the notice must be sent to all individuals who agreed to the previous privacy policy.

1.1.2. Reporting Accountability

PRML should be able to express the information needed to record what operation was performed, by whom, and why.

1.1.3. Rights interpretation

Objects such as operation, purpose and role are organized in hierarchies, which are defined in the object dictionary. A single declaration may therefore be expanded into a set of declarations.

PRML shall contain sufficient detail to allow expansion of high-level declarations into a set of low-level declarations. Consider the following example.

Suppose a PRML document defines a role hierarchy in which the role 'doctor' has two child roles, 'general-practitioner' and 'er-doctor'. A rule stating that a doctor can update a patient's record can be expanded into two declarations: 'general practitioner can update patient's record' and 'ER doctor can update patient's record'.

1.1.4. Document extension

A PRML document might not contain the full set of declarations or objects, so a mechanism for document extension shall be provided.

1.2. Examples

An example of a personal record is a medical record containing a patient's name, address and medical condition.

An example of an operation on a personal record is "view", "update" or "delete".

An example of a purpose of operation is "providing care" or "targeted marketing".

An example of a role is "practicing physician" or "data-mining company".

A declaration is a way of saying "I allow my physician to view and update my medical record for the purpose of providing care. I also allow the hospital administrator to see my address for the purpose of billing."

1.3. Terminology and Documentation Conventions

The terminology used for identifying language constructs comes in part from the domain of Fair Information Practices. Terms such as 'dataschema' and the data schema syntax are borrowed from P3P.

2. Technical Overview

2.1. Unified Modeling Language (UML) Usage

The objects and attributes of a PRML policy document are described informally in this specification with Unified Modeling Language (UML) static object model diagrams. The UML object diagrams capture the information and relationships which are then represented in XML format according to the PRML Document Type Definition (DTD) files. UML class diagrams capture the object types (classes), their attributes, the attribute types, and relationships between classes.

Inheritance relationships show how one object class (subclass) extends another object class (superclass) to contain both the data of the superclass and additional attributes. For instance, PRML makes extensive use of the concept of mixin classes. A mixin class is one whose functionality is orthogonal to that of any other class, so that its attributes and properties can simply be added to a derived class to give it a well-defined facet of functionality. For example, almost all PRML constructs are instances of the Identifiable mixin. Also, PRML allows operations, purposes, and roles each to form their own hierarchy of extension. The object model represents this by having each of them inherit from an ExtendsSingle or ExtendsMultiple base.

Associations show how an object of one class references or contains other objects (of the same or of a different class). Associations have cardinality and navigation characteristics. Cardinality defines how many objects at one end of the association are associated with how many objects at the other end. A cardinality of 1 denotes a mandatory association with exactly 1 other object. A cardinality of n..m denotes that an object is associated with at least n objects and at most m objects.
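The cardinality notation can be made concrete with a small checker that parses a specification and tests a count against it. This is a minimal sketch; `parse_cardinality` and `satisfies` are hypothetical helpers, and treating `*` as an unbounded upper limit follows the usual UML convention rather than anything stated in this specification.

```python
from typing import Optional, Tuple


def parse_cardinality(spec: str) -> Tuple[int, Optional[int]]:
    """Parse "1", "0..1", "2..5" or "1..*" into (lower, upper); None means unbounded."""
    if ".." not in spec:
        exact = int(spec)
        return exact, exact
    low, high = spec.split("..", 1)
    return int(low), (None if high == "*" else int(high))


def satisfies(count: int, spec: str) -> bool:
    """Check how many associated objects an instance has against its cardinality."""
    low, high = parse_cardinality(spec)
    return count >= low and (high is None or count <= high)
```

With this reading, `satisfies(0, "1")` is false (the mandatory association is missing), while `satisfies(3, "1..*")` is true.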

Associations also indicate navigation direction. Note that this information reflects the expression syntax of the language and is not necessarily indicative of the navigability of such relationships in the run-time environment in which a parsed and processed PRML document might be used. For instance, one can express in the language that a policy declaration is associated with a particular role, but not that a role is associated with a particular declaration. This asymmetry of expression exists both for economy of expression and to avoid redundancy. In this particular example, a PRML compiler or processing engine, in building the run-time model of the policy, can construct a bi-directional relationship; it does not need to be expressed directly in the language, as it can be inferred automatically by the tools.

2.2. UML to XML Mapping

PRML is an XML application. In the future, it will be formally defined by an XML Schema. Currently, the XML representation is defined in XML DTD files, so some validation and data type knowledge that could be expressed in an XML Schema is lost in the DTD representation. The XML representation is generated from the UML drawings according to a set of rules. These rules are based on those defined in the Customer Profile Exchange (CPExchange) specification and are described in the remainder of this section.

Firstly, a set of primitive data types is defined to indicate how #PCDATA values should be constrained to match the XML Schema data types. Some of these are the built-in datatypes defined by the XML Schema Datatypes standard; others are new XML Schema-derived data types defined by PRML. The intent of the constraints imposed by each data type is documented in this specification or, in many cases, by reference to other standards. Because an XML 1.0 DTD cannot express the data type constraint, the data type is merely represented with a parameter entity reference. For example:

<!-- Primitive Types: they match the XML Schema Data Types -->

<!ENTITY % timeInstant "#PCDATA">

Where warranted, a class may be represented by two parameter ENTITY definitions in the DTDs. One ENTITY expresses the content of the class (if any), while the other expresses programmatic attributes of the class (if any). Subclass entities include the superclass entities. Data and relationships that are core to the language concepts are expressed as the content of the relevant class and are represented by element ENTITY definitions. XML attributes, on the other hand, are used to express metadata about the construct, or instructions to the tools that must process the construct. Where a class has member values, they are defined following the ENTITY definitions for the contents of that class. For example:

<!-- Identifiable Mix-in Class -->

<!ENTITY % Identifiable "oid">

<!ELEMENT oid (%key;)>

<!-- ExternalReference-Attrs
     (describes classes with meta-data telling the tool to import
     data from an external resource) -->

<!ENTITY % ExternalReference-Attrs " external-ref CDATA #IMPLIED">

<!ENTITY % Role-Set " role*">

<!ENTITY % Role-Set-Attrs " %ExternalReference-Attrs; ...">

<!ELEMENT role-set (%Role-Set;)>

<!ATTLIST role-set %Role-Set-Attrs;>

<!ENTITY % Role " %Identifiable;, ...">

<!ELEMENT role (%Role;)>

2.3. PRML Document Structure

PRML (Privacy Rights Modeling Language) is a language that describes the relationship between:

- personal record

- operation

- purpose of operation

This relationship is called a declaration. Declarations are used to express the privacy rights of owners and of other actors involved in handling PII. If more than one declaration is applicable to a particular relationship, the operation is allowed if at least one of the declarations allows it.

In other words, declarations are OR-ed together.
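Because declarations are OR-ed together, the access decision reduces to an `any()` over the declarations. This is a minimal sketch, assuming each declaration is modelled as a hypothetical predicate over a request dictionary; the two example declarations encode the physician and administrator statements from section 1.2. With no declaration permitting the request (or none at all), the result is a denial.

```python
from typing import Callable, Dict, Iterable

Request = Dict[str, str]
Declaration = Callable[[Request], bool]  # True if the declaration permits the request


def allowed(request: Request, declarations: Iterable[Declaration]) -> bool:
    """Declarations are OR-ed: one permitting declaration is enough."""
    return any(declaration(request) for declaration in declarations)


# The declarations from section 1.2, encoded here as predicates.
declarations = [
    lambda r: r["role"] == "physician" and r["operation"] in ("view", "update")
    and r["data"] == "medical-record" and r["purpose"] == "providing-care",
    lambda r: r["role"] == "hospital-administrator" and r["operation"] == "view"
    and r["data"] == "address" and r["purpose"] == "billing",
]
```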

A typical PRML document is composed of three parts:

Object dictionary.

The object dictionary defines the objects referenced by declarations. The dictionary is divided into sets; every set contains a collection of objects of the same type (e.g. operations-set). A single object can be referenced by multiple declarations.

Data schema.

The data schema section defines the data dictionary, describing the existing data environment (database structure). The elements of the data schema are referenced to create data elements for declarations. See section 5.

Declarations set.

The declaration set includes the collection of declarations. Declarations refer to objects found in the dictionary in order to specify the relations between them.

2.4. PRML within a Privacy Platform

PRML is used to describe privacy policies for the informed release of information to authorized parties. This markup language will interact with a number of components within a privacy platform. Refer to the corresponding design documents for details on the architecture of the components mentioned in this section.

2.5. PRML Authoring Tools

This is a standalone component that allows a CPO or other privacy rights administrator to easily define a PRML policy. The tool generates a set of PRML documents, which can then be loaded into the PRML compiler and other tools. Ideally, it consists of a GUI that manages the various PRML components that can be created, the data schema, and the links between them. An authoring tool can also be as simple as an XML editor working with the PRML DTD.

2.6. PRML Compiler

The PRML compiler takes a PRML policy and assorted files and expands them into a set of privacy rights metadata. This information enumerates all possible rules that can be applied to data given the various roles, purposes, and declarations. The metadata is then further converted into information that the legacy database can use to implement the privacy policy, in the case where the PRM is implemented by the legacy database system itself. It can also be converted into data used by a standalone PRM, in the case where the PRM is a separate component contacted by the legacy database system.
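The compiler's expansion step can be sketched as enumerating, for each high-level declaration, every combination of the declared role and purpose with their sub-roles and sub-purposes, producing a flat table that a PRM can consult with a simple lookup. This is a minimal sketch under stated assumptions: the hierarchies are small hypothetical trees (the role tree mirrors the 'doctor' example in section 1.1.3, the purpose tree is invented), the declared role and purpose are kept alongside their descendants, and `compile_policy` and `prm_allows` are illustrative names rather than the compiler's actual interface.

```python
from itertools import product
from typing import Dict, Iterable, List, Set, Tuple

Tree = Dict[str, List[str]]          # parent -> direct children
Rule = Tuple[str, str, str, str]     # (role, operation, purpose, data)

# Hypothetical hierarchies; the role tree mirrors the 'doctor' example in 1.1.3.
ROLES: Tree = {"doctor": ["general-practitioner", "er-doctor"]}
PURPOSES: Tree = {"providing-care": ["diagnosis", "treatment"]}


def descendants(node: str, tree: Tree) -> Set[str]:
    """The node itself plus everything below it in its hierarchy."""
    found = {node}
    for child in tree.get(node, []):
        found |= descendants(child, tree)
    return found


def compile_policy(declarations: Iterable[Rule], roles: Tree, purposes: Tree) -> Set[Rule]:
    """Enumerate every explicit rule implied by the high-level declarations."""
    table: Set[Rule] = set()
    for role, operation, purpose, data in declarations:
        for r, p in product(descendants(role, roles), descendants(purpose, purposes)):
            table.add((r, operation, p, data))
    return table


def prm_allows(table: Set[Rule], role: str, operation: str, purpose: str, data: str) -> bool:
    """A PRM consuming the compiled table needs only a set lookup."""
    return (role, operation, purpose, data) in table


table = compile_policy([("doctor", "update", "providing-care", "patient-record")],
                       ROLES, PURPOSES)
```

Doing the expansion once at compile time is what lets a PRML-aware component avoid extensive computation at decision time: one declaration here becomes nine explicit rules (three roles by three purposes).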

2.7. PRML Conversion Tools

The conversion tools allow a set of PRML components to be expressed in different representation formats. Two immediate tools which can be built around the PRML compiler are:

PRML2P3P: This tool expresses the PRML policy as a set of P3P files. Some information will be lost, since PRML can express a wider range of concepts than P3P.

PRML2natlang: When properly designed, PRML files can be processed to generate a natural language description of the policy. This tool takes a PRML file and creates this description. The above tools are based on XSLT templates. PRML's structure makes it possible to create other XSLT templates that convert a PRML document into a document in another format.
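The idea behind PRML2natlang, mapping each declaration to a sentence template, can be sketched in a few lines. The tools above are described as XSLT templates; because Python's standard library has no XSLT processor, this sketch performs the same kind of mapping in plain Python instead. The `<declarations>`/`<declaration>` markup and its attribute names are hypothetical, not the PRML DTD.

```python
import xml.etree.ElementTree as ET

# Illustrative markup only; these element and attribute names are not the PRML DTD.
POLICY = """<declarations>
  <declaration role="physician" operation="view and update"
               data="medical record" purpose="providing care"/>
  <declaration role="hospital administrator" operation="see"
               data="address" purpose="billing"/>
</declarations>"""

TEMPLATE = "The {role} may {operation} the {data} for the purpose of {purpose}."


def to_natural_language(xml_text: str) -> list:
    """Render one English sentence per declaration, as an XSLT template might."""
    root = ET.fromstring(xml_text)
    return [TEMPLATE.format(**decl.attrib) for decl in root.iter("declaration")]


sentences = to_natural_language(POLICY)
```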

2.8. Privacy Rights Manager (PRM)

This component uses the data generated by the PRML compiler to decide whether or not information is released in response to a query. This can be implemented in a number of ways and is not addressed in this document.

Relationship Management

Relationship management requires that long-term relationships between users, owners, and specific roles be identified and kept up to date. This can be a fairly complex problem, and it depends on an application or entity being able to keep track of this information accurately. An example of this is the PERSONAL-PHYSICIAN role. Every doctor is a personal physician and every patient has a personal physician; however, the relationship management system must be able to link a specific patient to a specific doctor for this role in order to properly apply the privacy rules that refer to this role.

2.9. Consent Management

Consent management requires a new data path, which allows information owners to consent to specific declarations stated in the PRML privacy policy.

2.10. Authentication System

The authentication system database must be augmented with the roles, purposes, and operations, which can be assigned to specific users of the application.

3. Object Dictionary

This section describes the contents of the object dictionary section of a PRML file.

The purpose of the object dictionary is to define all objects that make up declarations. The dictionary includes collections for:

- roles

- operations

- purposes

- data elements

- constraints

Every collection may refer to an external PRML file. Roles, operations and purposes form corresponding ontologies. An object within an ontology extends another object higher in the ontology. For example, the operation 'send email' extends the operation 'read email address'.
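Following the "extends" links upward gives the chain of objects an operation belongs to, which a policy engine could use to decide whether a rule about 'read email address' also covers 'send email'. This is a minimal sketch, assuming that extending an object means an operation also counts as its parent (so a rule on the parent applies to the child), and that each object extends at most one other (the ExtendsSingle case); `OPERATIONS`, `lineage` and `implies` are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical operation ontology: oid -> oid of the object it extends (None = root).
OPERATIONS: Dict[str, Optional[str]] = {
    "read-email-address": None,
    "send-email": "read-email-address",
}


def lineage(oid: str, ontology: Dict[str, Optional[str]]) -> List[str]:
    """Return oid followed by every object it extends, nearest first."""
    chain: List[str] = []
    current: Optional[str] = oid
    while current is not None:
        if current not in ontology:
            raise KeyError(f"unknown oid {current!r}")
        if current in chain:
            raise ValueError(f"cycle in ontology at {current!r}")
        chain.append(current)
        current = ontology[current]
    return chain


def implies(oid: str, other: str, ontology: Dict[str, Optional[str]]) -> bool:
    """True if `oid` is `other` or extends it, directly or indirectly."""
    return other in lineage(oid, ontology)
```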

Every object in the object dictionary has an object ID (oid). The oid is used to reference the object from a declaration. It is also used to specify the extended object, thereby creating the ontology of objects.

The ID should be unique within the system. A PRML document may import whole or parts of object dictionary from a different file. This allows for creation of multiple sets of declarations based on the same object dictionary.

The static diagram is shown in figure 2.

4. Privacy Declarations

A privacy declaration creates a relationship between objects from different collections in the dictionary. Every declaration must specify one object from each collection. The static diagram is shown in figure 3.
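The rule that a declaration names one object from each collection suggests a simple well-formedness check that an authoring tool or compiler might run. This is a minimal sketch, taking the sentence literally (so constraints are required too) and using a hypothetical dictionary of oids per collection; `DICTIONARY` and `validate` are illustrative names.

```python
from typing import Dict, List, Set

# Hypothetical object dictionary: collection name -> oids defined in it.
DICTIONARY: Dict[str, Set[str]] = {
    "role": {"physician", "hospital-administrator"},
    "operation": {"view", "update"},
    "purpose": {"providing-care", "billing"},
    "data": {"medical-record", "address"},
    "constraint": {"none", "patient-admitted"},
}


def validate(declaration: Dict[str, str], dictionary: Dict[str, Set[str]]) -> List[str]:
    """Return the problems found; an empty list means the declaration is well formed."""
    problems = []
    for collection, oids in dictionary.items():
        ref = declaration.get(collection)
        if ref is None:
            problems.append(f"missing {collection}")
        elif ref not in oids:
            problems.append(f"unknown {collection} oid {ref!r}")
    return problems
```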

5. Data references

5.1. PRML Data Definition

A UML static structure diagram of a document is shown in figure 4, a declaration in figure 5 and a dictionary in figure 6.

PRML data definitions consist of the following types of elements:

data-set: This is a set of data items to which a particular PRML declaration applies. Data-sets contain one or more data items. Each <data-set> element must have an oid, which can be referred to within a declaration using a <data-set-id> element.

data: This is a reference to a specific data record type. Each one refers to a local or remote data-def.

data-def: A data-def optionally links a data record name to a structure definition that describes the record. If there is no link, the data record type exists but its description is unavailable or unused by the PRML policy.

data-struct: A data-struct describes the columns that make up a data record. Each data-struct can optionally point to other local or remote data-structs to further refine the description of the record.

A PRML declaration identifies the record types to which it applies by specifying a <data-set-id> element, which refers to a <data-set>. This allows multiple declarations to refer to the same set of data records. A <data-set> element can include the import=URI attribute, which indicates that the specified record types are described in a <data-schema> element of the referenced document. Data-schemas should always be defined in a separate file, so this attribute should always be present. If it is not present, the PRML compiler assumes that the PRML document itself contains a <data-schema> that describes the <data> items. There can be only one <data-set-id> per declaration.
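The lookup from a declaration's <data-set-id> to its <data-set>, including the fallback when no import attribute is given, can be sketched with Python's standard xml.etree.ElementTree. The <prml> root element, the function name, and carrying the data-set oid as an `oid` attribute are assumptions made for this illustration; the source only says each <data-set> must have an oid.

```python
import xml.etree.ElementTree as ET

# Hypothetical policy fragment; the oid-as-attribute form is an assumption.
POLICY = """<prml>
  <declaration><data-set-id>DS0001</data-set-id></declaration>
  <data-set oid="DS0001" import="data-def.xml">
    <data><name>#medical-history</name></data>
    <data><name>#insurance-coverage</name></data>
  </data-set>
</prml>"""


def resolve_data_set(root, declaration):
    """Find the <data-set> a declaration points to; report its schema location and data names."""
    target = declaration.findtext("data-set-id").strip()
    for data_set in root.iter("data-set"):
        if data_set.get("oid") == target:
            # Without an import attribute, the schema is expected in this same document.
            schema_location = data_set.get("import", "<this document>")
            names = [d.findtext("name") for d in data_set.findall("data")]
            return schema_location, names
    raise KeyError(f"no <data-set> with oid {target!r}")


root = ET.fromstring(POLICY)
print(resolve_data_set(root, root.find("declaration")))
# ('data-def.xml', ['#medical-history', '#insurance-coverage'])
```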

Each <data-set> contains one or more <data> elements. Each <data> element must contain a <name> element which refers to a <data-def> or <data-struct> within the <data-schema>.

The <name> element, as applied to the data definition, has a special use beyond its normal one in PRML: it is used to link the data definitions and data structures together. Data definitions and structures are named according to a namespace convention that separates parent objects with periods ("."). There are two reasons for this: it allows the names to map to a database system namespace, and it allows an object to identify its children. This convention also allows data-schemas to refer to other data-schema documents. Examples:

vehicle.model

vehicle.year

vehicle.manufacturer.location

vehicle.manufacturer.company
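Because parents are separated from children by periods, an object's children can be found purely from the names. A minimal sketch, using the example names above and a hypothetical `children` helper:

```python
def children(names, parent):
    """Return the direct children of `parent` among period-separated names."""
    prefix = parent + "."
    result = set()
    for name in names:
        if name.startswith(prefix):
            # keep only the next path segment below the parent
            result.add(name[len(prefix):].split(".", 1)[0])
    return sorted(result)


NAMES = ["vehicle.model", "vehicle.year",
         "vehicle.manufacturer.location", "vehicle.manufacturer.company"]
print(children(NAMES, "vehicle"))               # ['manufacturer', 'model', 'year']
print(children(NAMES, "vehicle.manufacturer"))  # ['company', 'location']
```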

When making a reference to a <data-def> or <data-struct> contained in a document, you must use the URI convention of placing a hash ('#') character in front of the name. This character does not appear in the referenced <name> element itself.
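Since references follow the URI fragment convention, Python's standard `urllib.parse.urldefrag` can split a reference into the document part and the name. The `split_reference` wrapper is a hypothetical helper; an empty document part is taken to mean the document currently being read.

```python
from urllib.parse import urldefrag


def split_reference(ref):
    """Split a data reference into (document URI, name); '' means the current document."""
    document, name = urldefrag(ref)
    if not name:
        raise ValueError(f"reference {ref!r} has no '#name' part")
    return document, name


print(split_reference("#medical-history"))
# ('', 'medical-history')
print(split_reference("http://someplace.com/schema#diagnosis"))
# ('http://someplace.com/schema', 'diagnosis')
```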

The <data-def> elements list all of the record types that can exist under a particular schema. Each of these can optionally have its structure described through links to <data-struct> elements.

The <data-struct> elements describe the structure of the various types of data record. Note that different data record types (as identified by the various <data-def> elements) can have the same structure simply by pointing to the same <data-struct> root. Each <data-struct> can optionally point to a local or remote <data-struct> that further defines the structure. The <data-def> and <data-struct> elements do not contain real data; they only describe the structure of the data records to which the PRML policies apply. In most cases it will not be necessary to describe a data record beyond the name that is needed to identify it in the database.

5.2. Examples

This example shows how the various data reference and definition elements are put together to allow a PRML policy file to refer to data records. The following might be included inside a PRML declaration to identify the record types to which it applies. In this case, the records involved are "medical-history" and "insurance-coverage". These will be described in the <data-schema> section of the file "data-def.xml".

<declaration>
  ...
  <data-set-id>DS0001</data-set-id>
</declaration>

<data-set import="data-def.xml">
  <data><name>#medical-history</name></data>
  <data><name>#insurance-coverage</name></data>
</data-set>

The "data-def.xml" file contains a <data-schema> section as follows:

<data-schema>
  <data-def>
    <name>insurance-coverage</name>
  </data-def>
  <data-def>
    <name>medical-history</name>
    <description>Lists known conditions and diagnoses</description>
    <data-struct-ref>#med-cond</data-struct-ref>
  </data-def>
  <data-struct>
    <name>med-cond.condition</name>
    <description>A chronic or recurring illness or condition</description>
  </data-struct>
  <data-struct>
    <name>med-cond.incident</name>
    <description>A one-time illness or injury</description>
  </data-struct>
  <data-struct>
    <name>med-cond.doctor-notes</name>
    <data-struct-ref>http://someplace.com/schema#diagnosis</data-struct-ref>
  </data-struct>
</data-schema>

This schema defines two types of records, "insurance-coverage" and "medical-history". Since "insurance-coverage" does not have a <data-struct-ref> element, it is not further described and its structure is unknown for the purposes of the PRML policy. The "medical-history" definition, however, points to the "med-cond" data structures, which describe the structure of a "medical-history" record: all <data-struct>s whose <name> elements begin with the prefix "med-cond" belong to this record. In the case of "med-cond.doctor-notes", an additional description is available, but it must be obtained from the file "schema" stored on the site "someplace.com". The "schema" file must contain a <data-schema> that has one or more <data-struct>s with the prefix "diagnosis". An example of what this file might contain:

<data-schema>
  <data-struct>
    <name>diagnosis.doctor</name>
    <description>Identity of doctor making diagnosis</description>
  </data-struct>
  <data-struct>
    <name>diagnosis.opinion</name>
    <description>The doctor's diagnosis</description>
  </data-struct>
  <data-struct>
    <name>diagnosis.treatment</name>
    <description>The doctor's suggested treatment</description>
  </data-struct>
</data-schema>

When taken together, the <declaration> in the original PRML policy file applies to two record types, "medical-history" and "insurance-coverage". The "insurance-coverage" record type is not further described; however, the "medical-history" record type has the following structure, defined through two data-schemas:

medical-history.condition

medical-history.incident

medical-history.doctor-notes.doctor

medical-history.doctor-notes.opinion

medical-history.doctor-notes.treatment

Any of these names or prefixes can be referenced by a <data> element in the <data-set> of a <declaration>. The above declaration could therefore also reference items such as:

<data><name>medical-history.doctor-notes</name></data>

or

<data><name>medical-history.incident</name></data>
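The resolution walked through in this example (data-def, then local "med-cond" structs, then the remote "diagnosis" structs) can be sketched in Python. This is an illustration under stated assumptions, not the PRML compiler: remote files are replaced by a hypothetical in-memory `SCHEMAS` table instead of being fetched, only leaf paths are listed, and a <data> name is treated as valid when it equals a leaf path or is a period-separated prefix of one.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Hypothetical in-memory stand-ins for "data-def.xml" and the remote schema file.
SCHEMAS = {
    "data-def.xml": """<data-schema>
  <data-def><name>insurance-coverage</name></data-def>
  <data-def><name>medical-history</name>
    <data-struct-ref>#med-cond</data-struct-ref></data-def>
  <data-struct><name>med-cond.condition</name></data-struct>
  <data-struct><name>med-cond.incident</name></data-struct>
  <data-struct><name>med-cond.doctor-notes</name>
    <data-struct-ref>http://someplace.com/schema#diagnosis</data-struct-ref></data-struct>
</data-schema>""",
    "http://someplace.com/schema": """<data-schema>
  <data-struct><name>diagnosis.doctor</name></data-struct>
  <data-struct><name>diagnosis.opinion</name></data-struct>
  <data-struct><name>diagnosis.treatment</name></data-struct>
</data-schema>""",
}


def expand(ref, base_path, current_doc, schemas):
    """Expand a '#prefix' or 'uri#prefix' struct reference into leaf paths under base_path."""
    document, prefix = urldefrag(ref)
    document = document or current_doc  # '#name' refers to the current document
    root = ET.fromstring(schemas[document])
    paths = []
    for struct in root.findall("data-struct"):
        name = struct.findtext("name")
        if not name.startswith(prefix + "."):
            continue
        path = base_path + "." + name[len(prefix) + 1:]
        nested = struct.findtext("data-struct-ref")
        if nested:
            paths.extend(expand(nested.strip(), path, document, schemas))
        else:
            paths.append(path)
    return paths


def flatten(document, schemas):
    """Map each data-def in `document` to its known leaf paths (empty if undescribed)."""
    root = ET.fromstring(schemas[document])
    result = {}
    for data_def in root.findall("data-def"):
        name = data_def.findtext("name")
        ref = data_def.findtext("data-struct-ref")
        result[name] = expand(ref.strip(), name, document, schemas) if ref else []
    return result


def is_valid_reference(name, paths):
    """A <data> name is valid if it is a known path or a period-separated prefix of one."""
    return any(p == name or p.startswith(name + ".") for p in paths)


structure = flatten("data-def.xml", SCHEMAS)
for path in structure["medical-history"]:
    print(path)
```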

5.3. Converting a PRML data-schema to P3P

The PRML data reference and definition mechanism is strongly influenced by the one used by P3P. The following guidelines are provided to indicate the relationship and to assist in conversion from one to the other.
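The conversion guidelines in this section can be sketched with Python's standard xml.etree.ElementTree module. This is a minimal illustration under stated assumptions, not a PRML compiler: the function names are hypothetical, the P3P namespace is omitted, CATEGORIES are not generated because their handling is still an open issue, and a <data-set> without an import attribute is assumed to map to base="".

```python
import xml.etree.ElementTree as ET


def prml_schema_to_p3p(prml_xml):
    """Convert a PRML <data-schema> into a P3P <DATASCHEMA> (sketch of the mapping rules)."""
    source = ET.fromstring(prml_xml)
    target = ET.Element("DATASCHEMA")
    for tag, p3p_tag in (("data-def", "DATA-DEF"), ("data-struct", "DATA-STRUCT")):
        for element in source.findall(tag):
            # <name> becomes the "name" attribute, transferred as is
            out = ET.SubElement(target, p3p_tag, name=element.findtext("name"))
            ref = element.findtext("data-struct-ref")
            if ref:
                out.set("structref", ref.strip())
            description = element.findtext("description")
            if description:
                ET.SubElement(out, "LONG-DESCRIPTION").text = description
            # no short-description attribute: PRML has nothing to fill it from
    return target


def prml_data_set_to_p3p(prml_xml):
    """Convert a PRML <data-set> into a P3P <DATA-GROUP>."""
    source = ET.fromstring(prml_xml)
    group = ET.Element("DATA-GROUP")
    # "import" maps to "base"; "" means the schema is embedded in the P3P file itself
    group.set("base", source.get("import", ""))
    for data in source.findall("data"):
        # <name> becomes "ref"; optional is set explicitly to "no"
        ET.SubElement(group, "DATA", ref=data.findtext("name"), optional="no")
    return group


group = prml_data_set_to_p3p(
    '<data-set import="data-def.xml"><data><name>#medical-history</name></data></data-set>')
print(ET.tostring(group, encoding="unicode"))
```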

• PRML data definitions provide a name and an optional description. PRML has no equivalent of P3P's "short-description" attribute, so short descriptions are never generated when converting to a P3P data schema.

• P3P defines an "optional" attribute for its DATA element, while PRML does not. This attribute indicates whether or not a visitor to a site can withhold the specified piece of data; if it is not specified, it defaults to "no". When converting from PRML to P3P, this value should be explicitly set to "no": since PRML deals with releasing data rather than collecting it, a visitor to the site should be obliged to provide it. This point should, however, be examined further.

• PRML does not define data categories. P3P attaches categories to DATA, DATA-DEF and/or DATA-STRUCT elements to provide a hint regarding the intended use of the data, and these must be specified somewhere inside a P3P data schema. How to do this from PRML is still an open issue, but one approach may be to use P3P's extension mechanism and assign the following to each DATA-DEF:

<CATEGORIES><other-category>PRML Data Schema</other-category></CATEGORIES>

• The <data-set> element maps directly to DATA-GROUP.

• <data-set> can specify an "import" attribute, which also maps directly to "base". It is assumed that the PRML data-schema will always be in a separate file; in this case, the link to that file is identified through a "base" attribute specified for the <DATA-GROUP> element. If the PRML data-schema is exported to the P3P file itself, the "base" attribute value must be set to the empty string ("").

• When converting PRML <data> to P3P <DATA>, the <name> element must be converted to the "ref" attribute.

• The <data-def> element maps to P3P's <DATA-DEF>. The <name> element becomes the "name" attribute and is transferred as is. The same is done for the <data-struct-ref> element, which becomes the "structref" attribute. There is no equivalent to the "short-description" attribute; since it is optional in P3P, the conversion process does not specify it.

• The PRML <data-struct> elements map to P3P's <DATA-STRUCT> and are treated the same way as <data-def>.

• Within PRML data definitions, instances of <description> elements become <LONG-DESCRIPTION> when transferred to P3P data schemas.

6. Base declarations

A certain number of declarations must be present in any privacy policy that is to adhere to Fair Information Practices. This section defines such declarations for the general case.

A language specification without usage guidelines is difficult to apply. The base declarations, along with the base objects, create a framework for developing richer and customized declarations. The intended use of the declarations in this section is to provide a starting point from which the privacy office and integrators can create a specific corporate privacy policy.

6.1. Owner Access

The PII owner shall be able to access its personal data.

The PII owner shall be able to view the access log.

6.2. Notice of policy amendments

When a declaration is amended, all individuals that have consented to this declaration must be notified.
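Consent management (section 2.9) and this amendment rule fit together naturally: the consent records are exactly what is needed to find whom to notify. The sketch below uses a hypothetical `ConsentRegistry` and a caller-supplied notification callback; notifying only owners whose consent is still in force (not those who withdrew it) is an assumption.

```python
from collections import defaultdict


class ConsentRegistry:
    """Hypothetical consent store: which owners consented to which declaration oids."""

    def __init__(self, notify):
        self._consents = defaultdict(set)  # declaration oid -> owner ids
        self._notify = notify              # callback(owner_id, declaration_oid)

    def consent(self, owner_id, declaration_oid):
        self._consents[declaration_oid].add(owner_id)

    def withdraw(self, owner_id, declaration_oid):
        self._consents[declaration_oid].discard(owner_id)

    def amend(self, declaration_oid):
        """Record an amendment and notify every owner currently consenting to it."""
        notified = sorted(self._consents.get(declaration_oid, set()))
        for owner_id in notified:
            self._notify(owner_id, declaration_oid)
        return notified


sent = []
consents = ConsentRegistry(notify=lambda owner, oid: sent.append((owner, oid)))
consents.consent("alice", "DECL-7")
consents.consent("bob", "DECL-7")
consents.withdraw("bob", "DECL-7")
consents.amend("DECL-7")
print(sent)  # [('alice', 'DECL-7')]
```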

7.3. PRML Document Examples

The following examples are based on hypothetical but non-trivial privacy policies. Note that every privacy policy and its corresponding PRML document should be considered fragments of a comprehensive set of policies. (Note: the XML documents themselves are included in the development CVS tree.)

7.3.1. Basic declarations

As specified in section 6, every privacy policy should include some basic declarations relating to the fair information practices.

7.3.2. Events and properties

The following statement is encoded in the PRML document below:

This e-mail address may be used for correspondence regarding transaction number 1234 only, and is to be purged when transaction number 1234 is complete. In no case may this information be retained after date D.
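The event (transaction completion) and property (hard deadline D) in this statement can be read as two checks: one for use, one for purging. The `RetentionRule` class, its method names, and the choice that use is still allowed on date D itself are hypothetical; this does not reproduce the PRML encoding, which lives in the separate XML documents.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RetentionRule:
    """Hypothetical encoding: usable only for one transaction, purged when that
    transaction completes, and never retained after a hard deadline (date D)."""
    transaction_id: str
    deadline: date

    def may_use(self, purpose_transaction, today, completed):
        return (purpose_transaction == self.transaction_id
                and self.transaction_id not in completed
                and today <= self.deadline)

    def must_purge(self, today, completed):
        # purge on the completion event, or once the deadline has passed
        return self.transaction_id in completed or today > self.deadline


rule = RetentionRule("1234", date(2002, 12, 31))  # illustrative value for date D
print(rule.may_use("1234", date(2002, 6, 1), completed=set()))  # True
```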

7.3.3. More events and properties

The following statement is encoded in the PRML document below:

This e-mail address may be used for correspondence regarding transaction number 1234, or for product recalls or other reports of serious safety or security issues regarding product X as purchased in transaction number 1234. The address is to be purged when product X is declared obsolete.

7.3.4. Extending purpose object

The following statement is encoded in the PRML document below:

This postal address may be used by corporation X to advertise products falling under SIC code blah.

7.3.5. Multiple declarations, data groups

The following statement is encoded in the PRML document below:

This name, patient room number, diagnosis code, physician's notes, and attached medical imaging may be provided to licensed health care professionals at hospital X for the purposes of treating the named patient. Authorization is not granted for access to the patient's billing information.

This diagnosis code, physician's diagnostic note, and list of provided treatments may be used by designated claims adjusters for companies in group foo, for evaluation of medical insurance claim number 69, provided that no PII is provided to the adjuster in a way that can be linked to this diagnosis code.

This name, address, and authorized claim amount may be provided to designated check issuers for companies in group foo, provided that no medical diagnostic information is disclosed to the check issuer.

Information on claims paid is to be purged on date D.

7.3.6. Transformation setting for write operation

The following statement is encoded in the PRML document below:

This biometric information (which is to be stored only in hashed form) may be used by authentication service X for the purpose of validating access to Web sites certified by privacy auditor Y.
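A "store only in hashed form" transformation on write can be sketched with Python's standard hashlib and hmac modules. The function names, the salted SHA-256 choice and the 16-byte salt are assumptions; note that plain hashing only supports exact matches, so it presumes a stable biometric template rather than raw captures that vary between readings.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_on_write(biometric_template: bytes, salt: Optional[bytes] = None) -> Tuple[bytes, str]:
    """Transformation applied before persisting: keep only a salted SHA-256 digest."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.sha256(salt + biometric_template).hexdigest()
    return salt, digest


def verify(candidate: bytes, salt: bytes, stored_digest: str) -> bool:
    """Compare a fresh template against the stored digest in constant time."""
    _, digest = hash_on_write(candidate, salt)
    return hmac.compare_digest(digest, stored_digest)


salt, stored = hash_on_write(b"example-template")
print(verify(b"example-template", salt, stored))  # True
```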

7.3.7. More transformation settings

The following statement is encoded in the PRML document below:

This survey response may be used for political advocacy when statistically aggregated with all other responses to this survey question.
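An aggregation transformation can be sketched as releasing only answer shares computed over all responses to the question, dropping respondent identities. The function name and the optional `minimum_count` threshold are hypothetical additions for illustration.

```python
from collections import Counter


def aggregate_for_advocacy(responses_by_respondent, minimum_count=1):
    """Release only statistics over all responses to one question, never individual answers."""
    counts = Counter(responses_by_respondent.values())  # respondent ids are discarded here
    total = sum(counts.values())
    if total < minimum_count:
        raise ValueError("not enough responses to release an aggregate")
    return {answer: count / total for answer, count in counts.items()}


print(aggregate_for_advocacy({"r1": "yes", "r2": "no", "r3": "yes", "r4": "yes"}))
# {'yes': 0.75, 'no': 0.25}
```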

7.3.8. Some more transformation settings

The following statement is encoded in the PRML document below:

This survey response may be used for political advocacy when statistically aggregated with all other responses to this survey question.

7.4. Relationship to Other Standards

7.4.1. P3P

This section is to cover the relationship with P3P, especially the DATA-GROUP syntax.

7.4.2. CPExchange

PRML uses the same approach as CPExchange to designing an XML language: the language is generated from a data model defined by a UML static diagram, and a UML-to-XML mapping methodology is used to generate the DTD.

7.4.3. Datatypes

The following primitive and complex datatypes are used to constrain the #PCDATA content of elements and attributes. The primitive datatypes are defined in the W3C XML Schema: Datatypes standard. In some cases the W3C XML Schema standard references another standard for the exact syntax of the datatype representation.

Although the invention has been described with reference to certain specific embodiments, various modifications thereof will be apparent to those skilled in the art without departing from the spirit and scope of the invention as outlined in the claims appended hereto.

Claims

THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:

1. A method for the distribution of privacy policies and their interpretation throughout an enterprise, comprising:

(a) encoding the privacy principles and data handling practices of said enterprise; and

(b) representing said encoded information as an xml document, whereby said document is distributed throughout said enterprise.