WO2023047114A1 - Process for embedding a digital watermark in tokenised data - Google Patents
Process for embedding a digital watermark in tokenised data
- Publication number
- WO2023047114A1 (application PCT/GB2022/052401)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- watermark
- tokens
- data
- token
- hash
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/10—Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
- G06F21/16—Program or content traceability, e.g. by watermarking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/32—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
- H04L9/321—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials involving a third party or a trusted authority
- H04L9/3213—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials involving a third party or a trusted authority using tickets or tokens, e.g. Kerberos
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/32—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
- H04L9/3236—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions
- H04L9/3242—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions involving keyed hash functions, e.g. message authentication codes [MACs], CBC-MAC or HMAC
Definitions
- WO2017093736A1 discloses a process of altering an original data set by combining data anonymization and digital watermarking.
- The anonymisation of the original data set can be achieved using a tokenisation technique in which tokenised values are generated with a regular expression.
- The regular expression must be known when the watermark is extracted.
- The tokenisation technique relies on a central vault, which can cause problems for customers who have high throughput needs or who must tokenise values consistently in remote locations.
- Figure 12 shows a diagram illustrating the output with the final token ordinal.
- Figure 14 shows a diagram illustrating the process of extracting a watermark.
- Figure 16 shows a diagram illustrating the process of extracting a watermark on a set of parallel Hash Arrays.
- Figure 19 shows a diagram illustrating the extraction token count requirements as the number of data releases grows.
- Figure 23 shows a plot of the normalized computation time for watermark embedding.
- Figure 24 shows a plot of the normalized computation time for watermark extraction.
- Figure 25 shows the ultimate outcome of a watermark extraction performed with a confidence level of 95%, as the number of tokens processed in the extraction increases.
- Figure 28 shows a diagram plotting false positive occurrence percentage for the same experiments.
- Figure 30 shows results of an experiment with two data release watermarks mixed together.
- Input space - a space of all possible inputs from a set of original data that might need to be tokenised.
- The input space may be described using a regular expression. For example, when tokenising credit card numbers, a simple input space definition might be “[0-9]{16}”, that is, 16 decimal digits (this example ignores the complication that not all prefixes are valid, the Luhn digit check, etc.).
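As a minimal sketch of the credit card example, assuming Python's standard `re` module, the input space can be checked and sized as follows. The helper name `in_input_space` is hypothetical, and, like the example in the text, the sketch ignores invalid prefixes and the Luhn check.

```python
import re

# Input space from the example: exactly 16 decimal digits.
INPUT_SPACE = re.compile(r"[0-9]{16}")

# Each of the 16 positions has 10 possible digits.
INPUT_SPACE_SIZE = 10 ** 16


def in_input_space(value: str) -> bool:
    """Return True when the whole string lies inside the input space."""
    return INPUT_SPACE.fullmatch(value) is not None


print(in_input_space("4111111111111111"))     # a 16-digit string
print(in_input_space("4111-1111-1111-1111"))  # contains separators
```

Using `fullmatch` rather than `search` matters here: a value only belongs to the input space if the entire string matches the expression.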
- Data release - generally refers to any release of tokenised data to a particular recipient for a particular purpose.
- Each data release is therefore associated with its own digital watermark.
- The digital watermark may be a number or other ‘ID’ which is stored in a watermark registry alongside metadata.
- Metadata may include, for example, the one or more recipients allowed to receive the data release, the purpose or intended use of the data release, how long the recipients are legally allowed to retain the data, and with whom they are allowed to share the data.
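The relationship between a data release, its watermark ‘ID’, and the registry metadata can be pictured with a small in-memory structure. This is an illustrative sketch only: the class names, field names, and the choice of sequential integer IDs are assumptions made for the example and are not taken from the patent.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Tuple


@dataclass(frozen=True)
class ReleaseMetadata:
    """Hypothetical metadata record stored alongside a watermark ID."""
    recipients: Tuple[str, ...]
    purpose: str
    retain_until: date
    may_share_with: Tuple[str, ...] = ()


class WatermarkRegistry:
    """Hypothetical registry mapping each watermark ID to its metadata."""

    def __init__(self) -> None:
        self._entries: Dict[int, ReleaseMetadata] = {}

    def register(self, metadata: ReleaseMetadata) -> int:
        # Assumption: IDs are handed out sequentially, one per data release.
        watermark_id = len(self._entries)
        self._entries[watermark_id] = metadata
        return watermark_id

    def lookup(self, watermark_id: int) -> ReleaseMetadata:
        return self._entries[watermark_id]


registry = WatermarkRegistry()
release_id = registry.register(
    ReleaseMetadata(
        recipients=("analytics-team",),
        purpose="fraud model training",
        retain_until=date(2025, 12, 31),
    )
)
print(release_id, registry.lookup(release_id).purpose)
```

Once a watermark ID has been extracted from a data set, a lookup of this kind is what would turn the ID back into the recipient, purpose, and retention terms of the release.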
- The previous watermarking technique allows the generation of these tokens to be controlled so that a pattern is embedded within them.
- This pattern can be varied for each data release and allows a unique identifier for the release to be embedded within and across the data itself.
- This identifier can be used as a pointer to an arbitrary store of metadata about the data release - the intended recipient and purpose of the release, its lineage including the privacy treatments that have been applied to it, the date by which the data must be deleted, etc.
- This embedded pattern is probabilistic and is extractable from a sample of the generated tokens rather than being reliant on any individual tokens, so that it is still extractable from a sufficiently large subset of a data release.
- Each set of individual Hash Array keys is generated using a scheme such as HKDF (a simple key derivation function, KDF, based on the HMAC message authentication code) that allows a single master key to be expanded into many different derived keys, with a specific key obtained efficiently by providing the ‘ID’ of the key in the input key material.
- KDF: key derivation function.
- HMAC: hash-based message authentication code.
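The key expansion described above can be sketched with the HKDF construction from RFC 5869, built here from Python's standard `hmac` and `hashlib` modules. Several details are assumptions made for illustration: SHA-256 as the hash, an empty salt, the hypothetical function name `hash_array_key`, and the decision to pass the key ‘ID’ through HKDF's `info` parameter (the text only says the ID is supplied as part of the input).

```python
import hashlib
import hmac

HASH = hashlib.sha256
HASH_LEN = HASH().digest_size  # 32 bytes for SHA-256


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract: condense the input key material into a pseudorandom key."""
    # RFC 5869: an absent salt defaults to HASH_LEN zero bytes.
    return hmac.new(salt or bytes(HASH_LEN), ikm, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand: stretch the pseudorandom key into `length` output bytes."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long for HKDF")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        # T(i) = HMAC(PRK, T(i-1) || info || i)
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]


def hash_array_key(master_key: bytes, hash_array_id: int, length: int = 32) -> bytes:
    """Derive the key for one Hash Array from the master key and its ID."""
    prk = hkdf_extract(b"", master_key)
    # Assumption: the ID is bound to the derived key via the `info` field.
    info = b"hash-array-key:" + hash_array_id.to_bytes(8, "big")
    return hkdf_expand(prk, info, length)


master = bytes(range(32))  # placeholder master key, for illustration only
print(hash_array_key(master, 0).hex())
print(hash_array_key(master, 1).hex())
```

Because each derived key depends only on the master key and the ID, any specific key can be recomputed on demand without storing the others.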
- Hash Array key rolling: if each new Hash Array has its own key, any single Hash Array key is only in use for as long as the data releases within it are open and active (though old key versions must be kept for as long as we wish to be able to extract watermarks generated using them).
- the number of unique watermarks that can be embedded and the fraction of tokens rejected depend on the configuration parameters of the algorithm, which are:
- The data set from which we are attempting to extract a watermark may not be a clean collection of tokens with no watermark tokens: it may have been doctored through the addition of new synthetic rows; it may be a combination of outputs from several data releases; or the assumptions made about the data shape when assigning the watermark inputs may not have been perfectly correct. Since the watermark is embedded using a secret key, it is not possible to craft noise that will be overrepresented in any particular bin without access to this key (either directly or indirectly through the watermark extraction function), which we assume is not available to anyone trying to erase a watermark.
- Figure 18 shows three histograms of token counts within each bin for the case of a ‘pure’ watermark (18A), a watermark with noise (18B), and two mixed watermarks.
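A per-bin histogram of the kind plotted in Figure 18 could be produced by hashing each token with the secret key and counting how many tokens land in each bin. The sketch below is an illustration under stated assumptions, not the patented procedure: it assumes HMAC-SHA256 as the keyed hash and a simple modulo reduction into bins, and `token_bin`, `bin_histogram`, and the sample values are hypothetical.

```python
import hashlib
import hmac
from collections import Counter
from typing import Iterable, List


def token_bin(key: bytes, token: str, num_bins: int) -> int:
    """Map a token to a bin using a keyed hash (assumed: HMAC-SHA256)."""
    digest = hmac.new(key, token.encode("utf-8"), hashlib.sha256).digest()
    # 64 bits of the digest reduced modulo num_bins; bias is negligible
    # for the small bin counts used here.
    return int.from_bytes(digest[:8], "big") % num_bins


def bin_histogram(key: bytes, tokens: Iterable[str], num_bins: int) -> List[int]:
    """Count how many tokens fall into each bin."""
    counts = Counter(token_bin(key, t, num_bins) for t in tokens)
    return [counts.get(b, 0) for b in range(num_bins)]


key = b"\x01" * 32  # placeholder secret key
sample_tokens = [f"token-{i}" for i in range(2000)]
print(bin_histogram(key, sample_tokens, 16))
```

Without the key, an observer cannot tell which bin a token belongs to, which is why noise added by a third party spreads roughly evenly across the bins.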
- the p-value is defined as the probability of obtaining results at least as extreme as the observed results when the null hypothesis is true. In our case, this is the probability of getting the observed number of hashes (or fewer) in the bin when the data doesn’t contain a watermark corresponding to the bin.
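The p-value described above can be sketched as a lower-tail binomial probability. The sketch assumes that, under the null hypothesis, every token falls into the bin independently with probability 1/`num_bins`; that modelling choice, the function name `bin_p_value`, and the example numbers are assumptions for illustration.

```python
def bin_p_value(observed: int, total: int, num_bins: int) -> float:
    """P(X <= observed) for X ~ Binomial(total, 1 / num_bins).

    This is the chance of seeing `observed` or fewer hashes in the bin
    when the data carries no watermark corresponding to that bin.
    """
    if num_bins < 2:
        raise ValueError("at least two bins are needed")
    p = 1.0 / num_bins
    term = (1.0 - p) ** total          # P(X = 0)
    cumulative = term
    for k in range(min(observed, total)):
        # P(X = k + 1) from P(X = k), avoiding large binomial coefficients.
        term *= (total - k) / (k + 1) * p / (1.0 - p)
        cumulative += term
    return min(cumulative, 1.0)


# Example: 500 tokens, 16 bins, and only 3 tokens in the bin under test.
print(bin_p_value(3, 500, 16))
```

A small value means the bin is emptier than chance would explain, which is evidence for the watermark associated with that bin.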
- The computed p-value must be less than or equal to the (Holm-Bonferroni corrected) significance level, thus the minimum number of tokens is the point where:
- Getting an explicit expression for n that satisfies the above expression may be challenging, so an estimation function may instead perform a brute-force search over n and Ho to find the number of tokens that satisfies the above inequality.
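A brute-force search of this kind can be sketched as follows. The inequality used here is not the one from the patent, which is not reproduced in this text; it is a stand-in built from stated assumptions: a binomial null with bin probability 1/`num_bins`, a worst-case corrected significance level of `alpha / num_bins`, and a watermark bin that receives only its expected share of uniformly spread noise. All names and parameters are hypothetical.

```python
from typing import Optional


def lower_tail(observed: int, total: int, p: float) -> float:
    """P(X <= observed) for X ~ Binomial(total, p), computed term by term."""
    term = (1.0 - p) ** total
    cumulative = term
    for k in range(min(observed, total)):
        term *= (total - k) / (k + 1) * p / (1.0 - p)
        cumulative += term
    return min(cumulative, 1.0)


def min_tokens(num_bins: int, alpha: float, noise: float = 0.0,
               max_tokens: int = 1_000_000) -> Optional[int]:
    """Smallest n whose p-value meets the corrected significance level.

    Assumptions (illustrative only): the strictest corrected level is
    alpha / num_bins, and a fraction `noise` of the n tokens is spread
    uniformly over the bins, so the watermark bin holds about
    noise * n / num_bins tokens.
    """
    threshold = alpha / num_bins
    p = 1.0 / num_bins
    for n in range(1, max_tokens + 1):
        observed = int(noise * n / num_bins)
        if lower_tail(observed, n, p) <= threshold:
            return n
    return None  # no n within the search limit satisfies the inequality


print(min_tokens(num_bins=16, alpha=0.05))
print(min_tokens(num_bins=16, alpha=0.05, noise=0.5))
```

The search is linear in n, which is affordable because each evaluation only sums as many binomial terms as there are tokens expected in the bin.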
- Figure 21 shows a diagram illustrating extraction token count requirements as the number of data releases grows.
- Figure 22 shows a plot of the tokens required to extract the watermark at 99.9% confidence for the first Hash Array instance (supporting up to 256 data releases) as a function of the percentage of input noise.
- Embedding a watermark requires no state to be stored in memory and so is unaffected.
- Figure 25 shows the ultimate outcome of a watermark extraction performed with a confidence level of 95%, as the number of tokens processed in the extraction increases.
- Each data point is the average of 10,000 experiments and shows the split of outcomes across three mutually exclusive possibilities: no results returned; only the correct data release returned; the correct data release and an incorrect data release returned (note that there is a theoretical fourth outcome - only incorrect data releases returned - but this never occurred).
- Figure 28 shows the false positive occurrence percentage for the same experiments (here a false positive is recorded whenever at least one erroneous data release was returned, regardless of whether the correct data release was also returned).
- The false positive rate is bounded by the supplied confidence level and does not depend on the level of noise.
- the false positive rate is not shown in the graphs above but is bounded at 5% as expected.
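The bound on false positives follows from the Holm-Bonferroni correction mentioned earlier, which controls the chance of reporting any erroneous data release across all the bins tested. A minimal sketch of the standard step-down procedure is given below; the function name, the dictionary of per-release p-values, and the example figures are hypothetical.

```python
from typing import Dict, List


def holm_bonferroni(p_values: Dict[str, float], alpha: float) -> List[str]:
    """Return the hypotheses rejected by the Holm-Bonferroni procedure.

    p-values are visited from smallest to largest; the k-th smallest
    (k starting at 1) is compared with alpha / (m - k + 1), and the
    procedure stops at the first p-value that fails its threshold.
    """
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    rejected = []
    for rank, (name, p) in enumerate(ordered):  # rank is k - 1
        if p > alpha / (m - rank):
            break
        rejected.append(name)
    return rejected


# Hypothetical p-values for three candidate data releases at alpha = 0.05.
candidates = {"release-A": 0.0004, "release-B": 0.03, "release-C": 0.6}
print(holm_bonferroni(candidates, alpha=0.05))
```

With a 95% confidence level (`alpha = 0.05`), this procedure keeps the probability of returning at least one incorrect release at or below 5%, whatever the number of candidate releases.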
- Watermark tokens are assigned or determined dynamically at run time to avoid having to brute force search the entire token space.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Technology Law (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Power Engineering (AREA)
- Editing Of Facsimile Originals (AREA)
- Image Processing (AREA)
Priority Applications (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CA3231917A CA3231917A1 (en) | 2021-09-22 | 2022-09-22 | Process for embedding a digital watermark in tokenised data |
| AU2022353195A AU2022353195A1 (en) | 2021-09-22 | 2022-09-22 | Process for embedding a digital watermark in tokenised data |
| EP22793782.8A EP4405837A1 (en) | 2021-09-22 | 2022-09-22 | Process for embedding a digital watermark in tokenised data |
| US18/693,056 US20250005115A1 (en) | 2021-09-22 | 2022-09-22 | Process for embedding a digital watermark in tokenised data |
| JP2024517449A JP2024535885A (ja) | 2021-09-22 | 2022-09-22 | Process for embedding a digital watermark in tokenised data |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB202113485 | 2021-09-22 | | |
| GB2113485.3 | 2021-09-22 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023047114A1 (en) | 2023-03-30 |
Family
Family ID: 83995444
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/GB2022/052401 (Ceased), WO2023047114A1 (en) | 2021-09-22 | 2022-09-22 | Process for embedding a digital watermark in tokenised data |
Country Status (6)
| Country | Link |
|---|---|
| US (1) | US20250005115A1 |
| EP (1) | EP4405837A1 |
| JP (1) | JP2024535885A |
| AU (1) | AU2022353195A1 |
| CA (1) | CA3231917A1 |
| WO (1) | WO2023047114A1 |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
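| CN118282779B (zh) * | 2024-05-31 | 2024-07-26 | Hangzhou Hikvision Digital Technology Co., Ltd. | Neural-network-based security defence method and apparatus for encrypted multimedia data |
| CN120374346B (zh) * | 2025-06-26 | 2025-08-26 | Nanjing University of Information Science and Technology | Screen-shooting-resistant robust watermarking method based on generative adversarial networks and multiple tokens |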
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2017093736A1 (en) | 2015-12-01 | 2017-06-08 | Privitar Limited | Digital watermarking without significant information loss in anonymized datasets |
| US20200327252A1 (en) * | 2016-04-29 | 2020-10-15 | Privitar Limited | Computer-implemented privacy engineering system and method |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TWI272547B (en) * | 2005-07-28 | 2007-02-01 | Academia Sinica | Asymmetric watermarking |
| RU2008135353A (ru) * | 2006-01-30 | 2010-03-10 | Koninklijke Philips Electronics N.V. (NL) | Searching for a watermark in a data signal |
| EP2991028B1 (en) * | 2014-08-29 | 2019-12-11 | Thomson Licensing | Method for watermarking a three-dimensional object and method for obtaining a payload from a threedimensional object |
| FI129030B (en) * | 2020-04-09 | 2021-05-31 | Veikkaus Oy | Electronic depleting pool lottery |
| US20240211552A1 (en) * | 2021-06-26 | 2024-06-27 | Zhong Li | System and Methods for Asset Management |
| CN119026095B (zh) * | 2024-08-16 | 2025-09-26 | Shandong University | Copyright protection and traceability method based on physical unclonable function watermarks and blockchain |
-
2022
- 2022-09-22 WO PCT/GB2022/052401 patent/WO2023047114A1/en not_active Ceased
- 2022-09-22 AU AU2022353195A patent/AU2022353195A1/en active Pending
- 2022-09-22 JP JP2024517449A patent/JP2024535885A/ja active Pending
- 2022-09-22 US US18/693,056 patent/US20250005115A1/en active Pending
- 2022-09-22 CA CA3231917A patent/CA3231917A1/en active Pending
- 2022-09-22 EP EP22793782.8A patent/EP4405837A1/en active Pending
Non-Patent Citations (1)
| Title |
|---|
| SOLTANI PANAH AREZOU ET AL: "On the Properties of Non-Media Digital Watermarking: A Review of State of the Art Techniques", IEEE ACCESS, vol. 4, 19 May 2016 (2016-05-19), pages 2670 - 2704, XP011614031, DOI: 10.1109/ACCESS.2016.2570812 * |
Also Published As
| Publication number | Publication date |
|---|---|
| AU2022353195A1 (en) | 2024-04-04 |
| EP4405837A1 (en) | 2024-07-31 |
| AU2022353195A2 (en) | 2024-05-09 |
| CA3231917A1 (en) | 2023-03-30 |
| US20250005115A1 (en) | 2025-01-02 |
| JP2024535885A (ja) | 2024-10-02 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US20250328691A1 (en) | Digital watermarking without significant information loss in anonymized datasets | |
| Mandal et al. | Symmetric key image encryption using chaotic Rossler system | |
| US20250005115A1 (en) | Process for embedding a digital watermark in tokenised data | |
| Chang et al. | Hiding secret points amidst chaff | |
| US7730037B2 (en) | Fragile watermarks | |
| CN119150329B (zh) | Data encryption method and system for solid-state drives | |
| JP2019508832A (ja) | Salting text and fingerprinting in database tables, text files, and data feeds | |
| EP4673851A1 (en) | Process for embedding a digital watermark within generated content | |
| Breitinger et al. | Security and implementation analysis of the similarity digest sdhash | |
| WO2021115589A1 (en) | Devices and methods for applying and extracting a digital watermark to a database | |
| Hadian Dehkordi et al. | Changeable essential threshold secret image sharing scheme with verifiability using bloom filter | |
| CN118590587A (zh) | High-capacity reversible data hiding method for encrypted images based on recursive MSB plane prediction | |
| Esponda | Hiding a needle in a haystack using negative databases | |
| US20110123023A1 (en) | Apparatus for video encryption by randomized block shuffling and method thereof | |
| CN115834792A (zh) | Artificial-intelligence-based video data processing method and system | |
| GB2611640A (en) | Watermarking of genomic sequencing data | |
| Zhang et al. | HOPE-L: A Lossless Database Watermarking Method in Homomorphic Encryption Domain | |
| CN114124469A (zh) | Data processing method, apparatus, and device | |
| Yang et al. | An efficient PIR construction using trusted hardware | |
| KR101895848B1 (ko) | Document security method | |
| CN116992495B (zh) | Office file encrypted storage method, system, storage medium, and electronic device | |
| Ker | Information hiding | |
| JPWO2023047114A5 | | |
| Parameswaran | Learning With Errors Parameter Analysis | |
| CN117909551A (zh) | Encrypted data retrieval method, data encryption method, and database management system | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22793782; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 3231917; Country of ref document: CA |
| | ENP | Entry into the national phase | Ref document number: 2024517449; Country of ref document: JP; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 18693056; Country of ref document: US |
| | WWE | Wipo information: entry into national phase | Ref document number: 2022353195; Country of ref document: AU; Ref document number: AU2022353195; Country of ref document: AU |
| | ENP | Entry into the national phase | Ref document number: 2022353195; Country of ref document: AU; Date of ref document: 20220922; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 2022793782; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2022793782; Country of ref document: EP; Effective date: 20240422 |