WO2023047114A1 - Process for embedding a digital watermark in tokenised data - Google Patents

Process for embedding a digital watermark in tokenised data

Info

Publication number
WO2023047114A1
Authority
WO
WIPO (PCT)
Prior art keywords
watermark
tokens
data
token
hash
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/GB2022/052401
Other languages
English (en)
French (fr)
Inventor
Paul Mellor
Sasi Kumar MURAKONDA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Privitar Ltd
Original Assignee
Privitar Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Privitar Ltd
Priority to CA3231917A (CA3231917A1)
Priority to AU2022353195A (AU2022353195A1)
Priority to EP22793782.8A (EP4405837A1)
Priority to US18/693,056 (US20250005115A1)
Priority to JP2024517449A (JP2024535885A)
Publication of WO2023047114A1
Anticipated expiration
Legal status: Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/16 Program or content traceability, e.g. by watermarking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/321 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials involving a third party or a trusted authority
    • H04L9/3213 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials involving a third party or a trusted authority using tickets or tokens, e.g. Kerberos
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3236 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions
    • H04L9/3242 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions involving keyed hash functions, e.g. message authentication codes [MACs], CBC-MAC or HMAC

Definitions

  • WO2017093736A1 discloses a process of altering an original data set by combining data anonymization and digital watermarking.
  • the anonymisation of the original data set can be achieved using a tokenisation technique, where tokenised values are generated with a regular expression.
  • the regular expression must be known at the extraction time of the watermark.
  • the tokenisation technique used includes a central vault which can cause problems for customers who have high throughput needs, or a requirement to consistently tokenise values in remote locations.
  • Figure 12 shows a diagram illustrating the output with the final token ordinal.
  • Figure 14 shows a diagram illustrating the process of extracting a watermark.
  • Figure 16 shows a diagram illustrating the process of extracting a watermark on a set of parallel hash arrays.
  • Figure 19 shows a diagram illustrating the extraction token count requirements as the number of data releases grows.
  • Figure 23 shows a plot of the normalized computation time for watermark embedding.
  • Figure 24 shows a plot of the normalized computation time for watermark extraction.
  • Figure 25 shows the ultimate outcome of a watermark extraction performed with a confidence level of 95%, as the number of tokens processed in the extraction increases.
  • Figure 28 shows a diagram plotting false positive occurrence percentage for the same experiments.
  • Figure 30 shows results of an experiment with two data release watermarks mixed together.
  • Input space - a space of all possible inputs from a set of original data that might need to be tokenised.
  • the input space may be described using a regular expression. For example, when tokenising credit card numbers, a simple input space definition might be “[0-9]{16}” - 16 decimal digits (this example ignores the complication that not all prefixes are valid, and the Luhn digit check, etc.).
  • Data release - generally refers to any release of tokenised data to a particular recipient for a particular purpose.
  • Each data release is therefore associated with its own digital watermark.
  • the digital watermark may be a number or other ‘ID’ which is stored in a watermark registry alongside metadata.
  • Metadata may include for example the one or more recipients allowed to receive the data release, the purpose or intended use of the data release, how long the one or more recipients are legally allowed to retain the data, with whom they are allowed to share the data.
  • The previous watermarking technique allows the generation of these tokens to be controlled so that a pattern is embedded within them.
  • This pattern can be varied for each data release and allows a unique identifier for the release to be embedded within and across the data itself.
  • This identifier can be used as a pointer to an arbitrary store of metadata about the data release - the intended recipient and purpose of the release, its lineage including the privacy treatments that have been applied to it, the date by which the data must be deleted, etc.
  • This embedded pattern is probabilistic and is extractable from a sample of the generated tokens rather than being reliant on any individual tokens, so that it is still extractable from a sufficiently large subset of a data release.
  • Each set of individual Hash Array keys is generated using a scheme like HKDF (a simple key derivation function (KDF) based on the HMAC message authentication code) that allows expansion of a single master key into many different derived keys (and the ability to efficiently obtain a specific key by providing the ‘ID’ of the key in the input key material).
  • Hash Array Key Rolling: if each new Hash Array has its own key, any single Hash Array key is only in use for as long as the data releases within it are open and active (though old key versions must be kept around for as long as we wish to be able to extract watermarks generated using them).
  • the number of unique watermarks that can be embedded and the fraction of tokens rejected depend on the configuration parameters of the algorithm, which are:
  • the data set that we are attempting to extract a watermark from may not be a clean collection of tokens with no watermark tokens (it may have been doctored through the addition of new synthetic rows; it may be a combination of outputs from several data releases; or it may be that the assumptions made about the data shape when assigning the watermark inputs were not perfectly correct). Since the watermark is embedded using a secret key, it is not possible to craft noise that will be overrepresented in any particular bin without access to this key (either directly or indirectly through the watermark extraction function), which we assume is not available to anyone trying to erase a watermark.
  • Figure 18 shows three histograms of token counts within each bin for the case of a ‘pure’ watermark (18A), a watermark with noise (18B), and two mixed watermarks.
  • the p-value is defined as the probability of obtaining results at least as extreme as the observed results when the null hypothesis is true. In our case, this is the probability of getting the observed number of hashes (or fewer) in the bin when the data doesn’t contain a watermark corresponding to the bin.
  • the computed p-value must be less than or equal to the (Holm-Bonferroni corrected) significance level, thus the minimum number of tokens is the point where:
  • Getting an explicit expression for n that satisfies the above expression may be challenging, so an estimation function may instead perform a brute-force search over n and Ho to find the number of tokens that satisfies the above inequality.
  • Figure 21 shows a diagram illustrating extraction token count requirements as the number of data releases grows.
  • Figure 22 shows a plot of the tokens required to extract the watermark at 99.9% confidence for the first Hash Array instance (supporting up to 256 data releases) as a function of the percentage of input noise.
  • Embedding a watermark requires no state to be stored in memory and so is unaffected.
  • Figure 25 shows the ultimate outcome of a watermark extraction performed with a confidence level of 95%, as the number of tokens processed in the extraction increases.
  • Each data point is the average of 10,000 experiments and shows the split of outcomes across three mutually exclusive possibilities: no results returned; only the correct data release returned; the correct data release and an incorrect data release returned (note that there is a theoretical fourth outcome - only incorrect data releases returned - but this never occurred).
  • Figure 28 shows the false positive occurrence percentage for the same experiments (here a false positive is recorded whenever at least one erroneous data release was returned, regardless of whether the correct data release was also returned).
  • the false positive rate is bounded by the supplied confidence level (and it does not depend on the level of noise).
  • the false positive rate is not shown in the graphs above but is bounded at 5% as expected.
  • Watermark tokens are assigned or determined dynamically at run time to avoid having to brute force search the entire token space.
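
The derivation of per-Hash-Array keys from a single master key, as described in the Definitions above, can be sketched with HKDF as specified in RFC 5869, using only Python's standard `hmac` and `hashlib` modules. This is a minimal illustration under stated assumptions, not the implementation disclosed in the application: the function names, the use of SHA-256, the 32-byte key length, and the layout of the `info` label that carries the key 'ID' are all hypothetical.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, master_key: bytes) -> bytes:
    """HKDF-Extract (RFC 5869): condense input key material into a pseudorandom key."""
    if not salt:
        # RFC 5869 default salt: a string of HashLen zero bytes.
        salt = bytes(hashlib.sha256().digest_size)
    return hmac.new(salt, master_key, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869): stretch the pseudorandom key into `length` bytes."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output is too long for HKDF")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        # T(i) = HMAC(PRK, T(i-1) || info || i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_hash_array_key(master_key: bytes, hash_array_id: int, key_id: int) -> bytes:
    """Derive one 32-byte key; the label format below is an assumption for illustration."""
    prk = hkdf_extract(b"", master_key)
    info = f"hash-array/{hash_array_id}/key/{key_id}".encode()
    return hkdf_expand(prk, info, 32)


# Example: any specific key can be re-derived on demand from the master key and its ID.
master = bytes(range(32))  # placeholder master key, not a real secret
key_a = derive_hash_array_key(master, hash_array_id=0, key_id=0)
key_b = derive_hash_array_key(master, hash_array_id=1, key_id=0)
```

Because each derived key depends only on the master key and its label, no per-key state needs to be stored, and a key for an older Hash Array can be recomputed for as long as the master key version is retained.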

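The p-value definition and the brute-force search for the minimum token count described in the Definitions above can be sketched as follows. The model here is an assumption made for illustration: under the null hypothesis each token's hash is taken to fall in a given bin with probability 1 / `n_bins`, so the bin count is binomial; a 'pure' watermark is assumed to leave zero tokens in its bin; and the strictest Holm-Bonferroni threshold (`alpha` divided by the number of hypotheses) is used. The function names and parameters are hypothetical.

```python
from math import comb


def lower_tail_p_value(observed: int, n_tokens: int, n_bins: int) -> float:
    """P(X <= observed) for X ~ Binomial(n_tokens, 1 / n_bins).

    This is the probability of seeing the observed number of hashes (or fewer)
    in a bin when the data carries no watermark for that bin.
    """
    p = 1.0 / n_bins
    return sum(
        comb(n_tokens, k) * p**k * (1.0 - p) ** (n_tokens - k)
        for k in range(observed + 1)
    )


def min_tokens_for_extraction(
    n_bins: int,
    alpha: float,
    n_hypotheses: int,
    observed: int = 0,
    max_tokens: int = 100_000,
) -> int:
    """Brute-force search for the smallest n whose p-value meets the corrected level."""
    threshold = alpha / n_hypotheses  # first (strictest) Holm-Bonferroni step
    for n in range(1, max_tokens + 1):
        if lower_tail_p_value(observed, n, n_bins) <= threshold:
            return n
    raise ValueError("no token count up to max_tokens satisfies the inequality")


# Example: 256 bins, 99.9% confidence, one hypothesis per bin (assumed values).
required = min_tokens_for_extraction(n_bins=256, alpha=0.001, n_hypotheses=256)
```

A linear scan is used rather than a closed-form solution because, once `observed` is non-zero, the inequality has no convenient explicit expression for n.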
Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Technology Law (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Power Engineering (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Image Processing (AREA)
PCT/GB2022/052401 2021-09-22 2022-09-22 Process for embedding a digital watermark in tokenised data Ceased WO2023047114A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
CA3231917A CA3231917A1 (en) 2021-09-22 2022-09-22 Process for embedding a digital watermark in tokenised data
AU2022353195A AU2022353195A1 (en) 2021-09-22 2022-09-22 Process for embedding a digital watermark in tokenised data
EP22793782.8A EP4405837A1 (en) 2021-09-22 2022-09-22 Process for embedding a digital watermark in tokenised data
US18/693,056 US20250005115A1 (en) 2021-09-22 2022-09-22 Process for embedding a digital watermark in tokenised data
JP2024517449A JP2024535885A (ja) 2021-09-22 2022-09-22 Process for embedding a digital watermark in tokenised data

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB202113485 2021-09-22
GB2113485.3 2021-09-22

Publications (1)

Publication Number Publication Date
WO2023047114A1 (en) 2023-03-30

Family

ID=83995444

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2022/052401 Ceased WO2023047114A1 (en) 2021-09-22 2022-09-22 Process for embedding a digital watermark in tokenised data

Country Status (6)

Country Link
US (1) US20250005115A1
EP (1) EP4405837A1
JP (1) JP2024535885A
AU (1) AU2022353195A1
CA (1) CA3231917A1
WO (1) WO2023047114A1

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
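CN118282779B (zh) * 2024-05-31 2024-07-26 杭州海康威视数字技术股份有限公司 Neural-network-based security defence method and device for encrypted multimedia data
CN120374346B (zh) * 2025-06-26 2025-08-26 南京信息工程大学 Screen-shooting-resistant robust watermarking method based on generative adversarial networks and multiple tokens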

Citations (2)

Publication number Priority date Publication date Assignee Title
WO2017093736A1 (en) 2015-12-01 2017-06-08 Privitar Limited Digital watermarking without significant information loss in anonymized datasets
US20200327252A1 (en) * 2016-04-29 2020-10-15 Privitar Limited Computer-implemented privacy engineering system and method

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
TWI272547B (en) * 2005-07-28 2007-02-01 Academia Sinica Asymmetric watermarking
RU2008135353A (ru) * 2006-01-30 2010-03-10 Koninklijke Philips Electronics N.V. (NL) Searching for a watermark in a data signal
EP2991028B1 (en) * 2014-08-29 2019-12-11 Thomson Licensing Method for watermarking a three-dimensional object and method for obtaining a payload from a threedimensional object
FI129030B (en) * 2020-04-09 2021-05-31 Veikkaus Oy Electronic depleting pool lottery
US20240211552A1 (en) * 2021-06-26 2024-06-27 Zhong Li System and Methods for Asset Management
CN119026095B (zh) * 2024-08-16 2025-09-26 山东大学 Copyright protection and tracing method based on physical unclonable function watermarking and blockchain

Non-Patent Citations (1)

Title
SOLTANI PANAH AREZOU ET AL: "On the Properties of Non-Media Digital Watermarking: A Review of State of the Art Techniques", IEEE ACCESS, vol. 4, 19 May 2016 (2016-05-19), pages 2670 - 2704, XP011614031, DOI: 10.1109/ACCESS.2016.2570812 *

Also Published As

Publication number Publication date
AU2022353195A1 (en) 2024-04-04
EP4405837A1 (en) 2024-07-31
AU2022353195A2 (en) 2024-05-09
CA3231917A1 (en) 2023-03-30
US20250005115A1 (en) 2025-01-02
JP2024535885A (ja) 2024-10-02

Similar Documents

Publication Publication Date Title
US20250328691A1 (en) Digital watermarking without significant information loss in anonymized datasets
Mandal et al. Symmetric key image encryption using chaotic Rossler system
US20250005115A1 (en) Process for embedding a digital watermark in tokenised data
Chang et al. Hiding secret points amidst chaff
US7730037B2 (en) Fragile watermarks
CN119150329B (zh) Data encryption method and system for solid-state drives
JP2019508832A (ja) Salting text and fingerprinting in database tables, text files, and data feeds
EP4673851A1 (en) Process for embedding a digital watermark within generated content
Breitinger et al. Security and implementation analysis of the similarity digest sdhash
WO2021115589A1 (en) Devices and methods for applying and extracting a digital watermark to a database
Hadian Dehkordi et al. Changeable essential threshold secret image sharing scheme with verifiability using bloom filter
CN118590587A (zh) High-capacity reversible data hiding method for encrypted images based on recursive MSB plane prediction
Esponda Hiding a needle in a haystack using negative databases
US20110123023A1 (en) Apparatus for video encryption by randomized block shuffling and method thereof
CN115834792A (zh) Artificial-intelligence-based video data processing method and system
GB2611640A (en) Watermarking of genomic sequencing data
Zhang et al. HOPE-L: A Lossless Database Watermarking Method in Homomorphic Encryption Domain
CN114124469A (zh) Data processing method, apparatus, and device
Yang et al. An efficient PIR construction using trusted hardware
KR101895848B1 (ko) Document security method
CN116992495B (zh) Office file encrypted storage method, system, storage medium, and electronic device
Ker Information hiding
JPWO2023047114A5
Parameswaran Learning With Errors Parameter Analysis
CN117909551A (zh) Encrypted data retrieval method, data encryption method, and database management system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22793782

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 3231917

Country of ref document: CA

ENP Entry into the national phase

Ref document number: 2024517449

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 18693056

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: 2022353195

Country of ref document: AU

Ref document number: AU2022353195

Country of ref document: AU

ENP Entry into the national phase

Ref document number: 2022353195

Country of ref document: AU

Date of ref document: 20220922

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2022793782

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2022793782

Country of ref document: EP

Effective date: 20240422