EP3348017A1 - A method of protecting data using compression algorithms - Google Patents
A method of protecting data using compression algorithms
Info
- Publication number
- EP3348017A1 (application number EP16843763.0A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- output
- dictionary
- algorithm
- input
- compression
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/06—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3084—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
- H03M7/3088—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing the use of a dictionary, e.g. LZ78
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2209/00—Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
- H04L2209/30—Compression, e.g. Merkle-Damgard construction
Definitions
- This invention relates to the fields of data compression and data encryption and particularly to the field of the security of data transmitted in electronic form.
- Communication often takes the form of information embodied in electronic transmissions, including transmissions along metal wires, through glass fibres, by electromagnetic radiation through air, and in other ways.
- Electronic digital computers often facilitate the transmission and storage of such information, and the respective computing devices might include desktop computers, laptops, netbooks, tablets, smartphones and other devices.
- Encryption means transforming data into a form that hides its information content.
- An encryption alphabet, or cipher alphabet, is a set of codewords of one or more symbols each that could be used to comprise part or all of a cryptogram.
- Repetition, called redundancy, is identified and removed by compression algorithms.
- A compression algorithm, in any particular instance of its use, may or may not be used for the purpose of compression.
- One method of removing repetition from data is for subsequent instances of a given symbol group to be replaced by references, also called pointers or addresses, back to the first instance.
- Such pointer-reference pairs for a given symbol group value typically change within the output of a respective compression algorithm because of buffering and performance or other considerations, and one respective type of buffering is referred to as the sliding window.
- If the reference plus associated codes, such as a code to identify the reference as a reference, comprise fewer bits than the symbol group referenced, then compression has been achieved, but this is not guaranteed.
- The pointer and the symbol group pointed to might be contained within a separate data structure called an external codec dictionary.
- In this case the pointers in the compressed data reference symbol groups inside the external dictionary, not symbol groups within the compressed stream or file itself. Because such dictionaries are external to the compressed stream, it is possible that the compressed stream consists entirely of references.
- The pointer-reference pairs collectively comprise a codec dictionary, but one contained inside, or embodied within, the compressed stream or file.
- A pointer value may be repeated in a compressed stream but not refer to the same symbol group or symbol group value. For example, when input data is processed in blocks and the referencing system is reset for each block, a given pointer will refer to a symbol group within the current block. Pointers in different blocks may have the same value but refer to different symbol group values.
- Some external dictionaries, including the type described and illustrated in WO98/39723, are such that all symbol groups in the original data can be replaced by references to symbol groups within the dictionary. Since such external-dictionary methods completely transform the original, they can be thought of as performing complete encryption, and the external dictionary is the encryption key, or shared secret.
- Where an encrypted transmission comprises addresses of information contained inside a separate codec dictionary structure, the dictionary is the encryption key and is therefore not transmitted with the address stream.
- The codewords of the alphabet are the references (pointers, addresses) to the symbol groups also contained within the external dictionary.
- The codec dictionary is contained within the output of the compression step, that is, it is embodied within the compressed data.
- It comprises references to first instances of a symbol group along with the first instances themselves.
- The first instance remains and subsequent instances are replaced by pointers to the starting position of the first instance, say position 123, along with a count of the number of letters in the respective word, 8.
- One or two delimiter codes may be added to the pointer to identify it as a pointer.
- The pointer may indicate how many atomic symbols need to be passed in a backwards direction in order to arrive at the "I" of the first instance of "Internet", along with the integer 8.
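The backward-pointing reference described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular codec: the token shape `("ref", distance, length)`, the function names `encode` and `decode`, and the `min_len` threshold are all hypothetical, and the search is a deliberately simple brute-force scan rather than a sliding window.

```python
def encode(text, min_len=4):
    """Replace repeated symbol groups with ("ref", distance, length) tokens.

    Symbols that are not part of a long enough repeat stay as literals.
    """
    tokens = []
    i = 0
    while i < len(text):
        best_len, best_dist = 0, 0
        for start in range(i):  # every earlier position is a candidate
            length = 0
            while (start + length < i            # stay inside already-seen text
                   and i + length < len(text)
                   and text[start + length] == text[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - start
        if best_len >= min_len:
            # distance = symbols to pass backwards, length = symbols to copy
            tokens.append(("ref", best_dist, best_len))
            i += best_len
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def decode(tokens):
    """Rebuild the text by copying each referenced group from earlier output."""
    symbols = []
    for token in tokens:
        if isinstance(token, tuple):
            _, distance, length = token
            start = len(symbols) - distance
            symbols.extend(symbols[start:start + length])
        else:
            symbols.append(token)
    return "".join(symbols)


text = "Internet users like the Internet"
tokens = encode(text)
```

In this example the second "Internet" becomes a single token saying "go back 24 symbols and copy 8", while the first instance stays in the output as literals. Whether that token is actually shorter than the eight letters it replaces depends on how the distance, the length and any delimiter codes are serialised.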
- A third criticism of data compression as a form of encryption is that while compression algorithms typically remove redundancy, or repetition, they do not remove all redundancy, and that residual redundancy is a cryptographic weakness. For example, in some sliding window compression schemes, there may be repetition of pointer values within a given block of compressed data.
- Patterns, if they exist, are not inherited from the original data or data type. For example, they might be inherited without harm from structural elements of an external dictionary used as the encryption key. Furthermore, a cryptogram might inherit patterns from the original data but also different patterns from a second source, such as an external dictionary, such that a form of interference occurs between the two sets of patterns that makes decryption impracticable.
- The present invention is a method of protecting data in which the data is processed through a sequence of data compression algorithms, the output of any one except the last being the input of the next.
- These algorithms are of two basic types: those whose output embodies a codec dictionary, and those that use an external pre-existing codec dictionary and whose output does not embody a codec dictionary.
- The first purpose, which is the purpose of the first type, is to remove redundancy from the original data. Redundancy is also referred to as frequency patterns or patterns. Thus the first step may also be said to be one of changing, reducing or destroying patterns within the original data.
- One or more algorithms are of this first type.
- The second purpose, which is the purpose of the second type, is to provide a cipher alphabet for encrypting the output of the last algorithm in the sequence of the first type.
- One algorithm is of the second type. Algorithms of the first type would typically realise a standard compression method such as RLE, LZ77, LZ78 or a variant, in which the references and what they refer to (collectively, the codec dictionary) are contained within the output of the compression process and exist inside the compressed file or stream, as mentioned earlier.
- The second type employs a codec method such as that described in WO98/39723, in which the codec dictionary is not embodied in the output of the compression process, and is a separate data structure accessed by the compression algorithm during compression in order to obtain the references, or codewords, to be used to comprise the encrypted data.
- The output of this compression step contains addresses of, or references to, places inside the separate codec dictionary, which is the encryption key.
- The input to the first algorithm of the first type might already be compressed, and algorithms of the first type might be skipped.
- Algorithms of both the first and second types might be used whether or not the original data is already compressed.
- The same or similar codec algorithm might be used for both types of processing.
- The purpose of the first type is to remove redundancy, and thus the respective dictionaries have a mainly codec purpose.
- The dictionary provides a cipher alphabet that is used as an encryption key to encode the output of the last algorithm of the first type.
- The first such algorithm might employ LZ77 encoding, which processes byte-sized units of input.
- A second algorithm may use Huffman encoding, which employs bit-wise processing, and the output will contain one or more embedded dictionaries in the form of Huffman trees.
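The overall sequence (one or more first-type algorithms followed by a single second-type algorithm) might be sketched as follows. Everything here is an assumption made for illustration: `zlib.compress` (DEFLATE, which itself combines LZ77 with Huffman coding) and `bz2.compress` are arbitrary stand-ins for first-type algorithms, the external dictionary is reduced to a secret ordering of the 256 single-byte values, and the names `make_external_dictionary`, `protect` and `recover` are hypothetical. A seeded `random.Random` is used only to make the example repeatable and is not a suitable key generator.

```python
import bz2
import random
import zlib
from functools import reduce


def make_external_dictionary(seed):
    """Toy external dictionary: entries[address] is a single byte value."""
    entries = list(range(256))
    random.Random(seed).shuffle(entries)  # illustration only, not secure
    return entries


def protect(data, first_type_stages, entries):
    """Chain the first-type stages, then emit dictionary addresses."""
    # The output of each stage except the last is the input of the next.
    compressed = reduce(lambda acc, stage: stage(acc), first_type_stages, data)
    # Second type: replace every byte with its address in the external dictionary.
    address_of = {value: address for address, value in enumerate(entries)}
    return [address_of[byte] for byte in compressed]


def recover(references, inverse_stages, entries):
    """Look the addresses up again, then undo the first-type stages in reverse."""
    compressed = bytes(entries[ref] for ref in references)
    return reduce(lambda acc, stage: stage(acc), inverse_stages, compressed)


key = make_external_dictionary(2015)
data = b"Internet " * 50
references = protect(data, [zlib.compress, bz2.compress], key)
restored = recover(references, [bz2.decompress, zlib.decompress], key)
```

The output of `protect` contains only addresses into the dictionary, and `recover` needs the same dictionary to turn them back into the compressed bytes before decompression can start.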
- The output of the algorithm of the second type will be the same on different occasions of encrypting the same original data.
- An additional step is applied to yield cryptograms composed of different address values when encrypting the same original data on different occasions.
- Algorithms of the present invention may relate to each other in batch mode or in stream mode.
- In batch mode, an algorithm processes an input file, producing an output file; this output file is then the input of the next algorithm, if any.
- In stream mode, the next algorithm begins processing the output of the current algorithm before the current algorithm has finished processing its own input.
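The difference between the two modes can be illustrated with Python generators. This is a sketch under assumptions: `zlib` stands in for a first-type algorithm, `to_hex_stream` is a hypothetical placeholder for "the next algorithm", and the 64-byte chunk size is arbitrary.

```python
import zlib
from typing import Iterable, Iterator


def compress_batch(data: bytes) -> bytes:
    """Batch mode: consume the whole input, then return the whole output."""
    return zlib.compress(data)


def compress_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Stream mode: hand over output pieces as soon as they are produced."""
    compressor = zlib.compressobj()
    for chunk in chunks:
        piece = compressor.compress(chunk)
        if piece:
            yield piece
    yield compressor.flush()  # whatever is still buffered at end of input


def to_hex_stream(pieces: Iterable[bytes]) -> Iterator[str]:
    """Placeholder next algorithm: runs while the previous stage is still working."""
    for piece in pieces:
        yield piece.hex()


def chunked(data: bytes, size: int) -> Iterator[bytes]:
    """Split the input into fixed-size chunks to simulate arriving data."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


data = b"Internet " * 1000

# Batch: the second step starts only after the first has finished.
batch_hex = compress_batch(data).hex()

# Stream: the stages are chained, each pulling from the one before it.
stream_hex = "".join(to_hex_stream(compress_stream(chunked(data, 64))))
```

Because generators are lazy, `to_hex_stream` receives each compressed piece while `compress_stream` still has input left to read, which is the stream-mode relationship described above.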
- The present invention has general applicability in improving security and privacy for business, community and personal use of electronic communication.
- The present invention may be used in a variety of different ways whose primary utility may not be limited to or may not relate to privacy and security of information.
- The purpose and use of the present invention is therefore expressly not limited to the purpose and use exemplified in the embodiments described herein.
- FIG. 1 is a flow chart illustrating the step of creating a compressed stream embodying codec dictionaries developed from byte elements of the input data treated in blocks.
- FIG. 2 is a flow chart illustrating the step of creating a compressed stream embodying a codec dictionary developed from bit elements of the input data being the output data of FIG. 1.
- FIG. 3 is a flow chart illustrating the step of using an external codec dictionary as a cipher alphabet to encrypt the output of FIG. 2.
- FIG. 4 is a flow chart illustrating the function of modifying the selected symbols of the cipher alphabet of FIG. 3, which symbols are references to items contained within the external codec dictionary, for the purpose of yielding a different final symbol when encrypting the same original data on different occasions.
- In FIG. 2, the output of FIG. 1 is processed along the same lines as the original data is processed in FIG. 1, except that the atomic unit of input data is the bit and input blocks are not used.
- The output of FIG. 2 starts to be received 305.
- The next data element of this input 310 is one or more contiguous bytes of the output of FIG. 2.
- The value of this data element is looked up in the external codec dictionary and its dictionary reference identified 315.
- The lookup process may entail a loop starting with first selecting the next byte of input, finding a dictionary instance of the same value, appending the next byte of input to the lookup string, which is now two bytes long, looking up the dictionary again, and repeating this loop until a dictionary entry is not found, as indicated in WO98/39723.
- The algorithm emits as output or stores as output the found dictionary reference 320, which is the address within the dictionary of the dictionary instance of the selected input data element value, or of one of the dictionary instances of the selected input data element value in the case of a dictionary that contains more than one instance of the value of the selected input data element.
- The output of FIG. 3 is a sequence of references, or addresses, to places inside the external codec dictionary, which dictionary in the terminology of cryptology is the encryption key.
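The lookup loop described for FIG. 3 can be sketched as a greedy longest-match search. This is an illustration only and not the dictionary structure of WO98/39723: the dictionary is assumed to be a flat list of byte strings whose list index is the address, it is assumed to contain every single-byte value so that a lookup always succeeds, and the function names are hypothetical.

```python
def build_lookup(entries):
    """Map each symbol group to its address (its index in the dictionary)."""
    return {entry: address for address, entry in enumerate(entries)}


def encode_with_dictionary(data: bytes, lookup: dict) -> list:
    """Emit one dictionary reference per matched symbol group."""
    references = []
    i = 0
    while i < len(data):
        length = 1  # start with the next single byte of input
        # Keep appending the following byte while the longer string is still found.
        while i + length < len(data) and data[i:i + length + 1] in lookup:
            length += 1
        references.append(lookup[data[i:i + length]])
        i += length
    return references


def decode_with_dictionary(references, entries) -> bytes:
    """Reverse step: fetch each referenced symbol group from the dictionary."""
    return b"".join(entries[ref] for ref in references)


# Toy dictionary: all single bytes (addresses 0-255) plus two longer groups.
entries = [bytes([value]) for value in range(256)] + [b"ne", b"net"]
lookup = build_lookup(entries)

references = encode_with_dictionary(b"Internet", lookup)
```

Here the trailing "net" is matched as one group and emitted as address 257, while the earlier bytes fall back to single-byte entries. Because the loop stops at the first string that is not found, a longer entry is only reachable if its shorter prefixes are also in the dictionary.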
- The atomic unit of input processing is the dictionary reference.
- The next reference is received 405, then a function is applied to the reference value that uses a value unique to the current processing session 410. For example, the value may be the product or XOR of two random numbers, one generated during the current processing session and the other at dictionary creation time; this value is used to modify the reference in a reversible manner, and the reversing algorithm requires access to the two random numbers and the method of combination.
- The modified reference is stored or emitted as output 415. This loop continues 420, N until all references are processed 420, Y.
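A minimal sketch of the FIG. 4 step follows, assuming the XOR variant. The text does not specify how the combined value is applied to a reference, so XOR-ing it into each reference is an assumption chosen because it is trivially reversible; the function names and the 32-bit size of the random numbers are likewise hypothetical.

```python
import secrets


def combine(session_value: int, dictionary_value: int) -> int:
    """Combine the per-session and per-dictionary random numbers by XOR."""
    return session_value ^ dictionary_value


def modify(references, mask):
    """Apply the session-dependent value to every reference."""
    return [ref ^ mask for ref in references]


def restore(modified, mask):
    """XOR is its own inverse, so the same operation reverses the step."""
    return [ref ^ mask for ref in modified]


dictionary_value = secrets.randbits(32)  # fixed when the dictionary is created
session_a = secrets.randbits(32)         # fresh for each processing session

references = [73, 110, 116, 101, 114, 257]
mask_a = combine(session_a, dictionary_value)
cryptogram_a = modify(references, mask_a)
recovered = restore(cryptogram_a, mask_a)
```

Two sessions with different session values produce different address values for the same references, and the receiving side can only undo the step if it has both random numbers and knows they were combined by XOR.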
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
NZ15712067 | 2015-09-09 | ||
PCT/IB2016/055256 WO2017042676A1 (en) | 2015-09-09 | 2016-09-02 | A method of protecting data using compression algorithms |
Publications (2)
Publication Number | Publication Date |
---|---|
EP3348017A1 (en) | 2018-07-18 |
EP3348017A4 (en) | 2019-07-17 |
Family
ID=62527706
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP16843763.0A Withdrawn EP3348017A4 (en) | 2015-09-09 | 2016-09-02 | A method of protecting data using compression algorithms |
Country Status (1)
Country | Link |
---|---|
EP (1) | EP3348017A4 (en) |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5479512A (en) * | 1991-06-07 | 1995-12-26 | Security Dynamics Technologies, Inc. | Method and apparatus for performing concryption |
CN102970530B (en) * | 2012-10-23 | 2015-06-03 | 重庆大学 | Graphic interchange format (GIF) image encryption method based on compressed encoding |
US10417187B2 (en) * | 2013-06-03 | 2019-09-17 | Brown University | Secure compression |
Also Published As
Publication number | Publication date |
---|---|
EP3348017A4 (en) | 2019-07-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Sharma et al. | Data security using compression and cryptography techniques | |
US6122379A (en) | Method and apparatus for performing simultaneous data compression and encryption | |
CA2723319C (en) | A closed galois field cryptographic system | |
US20070028088A1 (en) | Polymorphic encryption method and system | |
US7868788B2 (en) | System and method for encoding data based on a compression technique with security features | |
Rahim | Combination of the Blowfish and Lempel-Ziv-Welch algorithms for text compression | |
WO1997010659A1 (en) | Method and device for compressing and ciphering data | |
US20190036543A1 (en) | A Method of Protecting Data Using Compression Algorithms | |
JP2005217842A (en) | Data compression method, data restoration method, and program thereof | |
US7003111B2 (en) | Method, system, and program, for encoding and decoding input data | |
Duda et al. | Lightweight compression with encryption based on asymmetric numeral systems | |
Pande et al. | Using chaotic maps for encrypting image and video content | |
Begum et al. | An efficient and secure compression technique for data protection using burrows-wheeler transform algorithm | |
Zhang et al. | Secure binary arithmetic coding based on digitalized modified logistic map and linear feedback shift register | |
Kodabagi et al. | Multilevel security and compression of text data using bit stuffing and huffman coding | |
Duan et al. | A secure arithmetic coding based on Markov model | |
Mukesh et al. | Enhancing AES algorithm with arithmetic coding | |
KR101048661B1 (en) | Method, apparatus and computer readable recording medium for compression and encryption operations on data | |
EP3348017A1 (en) | A method of protecting data using compression algorithms | |
Brindhashree et al. | Data security based on cryptography steganography combined with OTP algorithm and Huffman coding in the cloud environment | |
Gbashi | Text Compression & Encryption Method Based on RNA and MTF | |
Sagheer et al. | Ensure security of compressed data transmission | |
Zhou et al. | Joint security and performance enhancement for secure arithmetic coding | |
Huang et al. | A secure arithmetic coding algorithm based on integer implementation | |
Stanek | Attacking scrambled burrows-wheeler transform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
17P | Request for examination filed | Effective date: 20180409 |
AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
AX | Request for extension of the european patent | Extension state: BA ME |
DAV | Request for validation of the european patent (deleted) | ||
DAX | Request for extension of the european patent (deleted) | ||
A4 | Supplementary search report drawn up and despatched | Effective date: 20190617 |
RIC1 | Information provided on ipc code assigned before grant | Ipc: H03M 7/30 20060101ALI20190611BHEP Ipc: H04L 9/06 20060101AFI20190611BHEP |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS |
17Q | First examination report despatched | Effective date: 20210517 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
18D | Application deemed to be withdrawn | Effective date: 20211130 |