WO2020104858A1 - Data security and obfuscation using extremely large integers - Google Patents
- Publication number
- WO2020104858A1 (PCT/IB2019/051144)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- remainder
- input
- cell
- steps
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0894—Escrow, recovery or storing of secret information, e.g. secret key escrow or cryptographic key storage
Abstract
A computer-readable device comprising instructions to perform data protection and obfuscation by utilizing the capabilities of modern computing systems to perform mathematical operations on extremely large integers. The computing device can be used for partial encryption and protection of disk files and cloud data. The device comprises instructions to convert raw data to a large integer, followed by a series of mathematical transformations that break down the large integer into much simpler components. The components are integers that can expand to large numbers when combined with mathematical functions. A binary expression tree is created to represent the mathematical decomposition of the input. A subset of the post-order traversal of this expression tree is encrypted with a cipher and the result is recorded to disk. Large data is broken down into sequences of integers. The original data is recovered through a stack-based evaluation of the traversal information.
Description
Data Security and Obfuscation using Extremely Large Integers
PREAMBLE TO THE DESCRIPTION
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
Software tools for data security and obfuscation. Specifically, a method for protecting data using fast manipulations of extremely large integers.
BACKGROUND
Data obfuscation refers to a general transformation of data, whereas data encryption is a transformation of data with respect to a secret key.
The current state of the art in data security relies heavily on byte-based manipulation of data. Data transformation in symmetric encryption algorithms is achieved by repeated rounds of data substitution using lookup tables. Similarly, asymmetric encryption uses ordinary integers for modular arithmetic. This results in a very slow process for mangling and deciphering large amounts of data.
The common techniques for data obfuscation are masking, substitutions, nulling, data shuffling and mapping rules. These techniques work on individual fields of data without affecting the underlying structure and relationships between data items.
SUMMARY
A computing device that deals with a multitude of mechanisms for transforming raw data into non-decipherable forms using large integer arithmetic. Very large data is broken down into smaller segments, with each segment being treated as a large integer. Each large integer corresponding to a segment is transformed into a sum-of-products format using a deterministic function of the input. Such decompositions of large integers by application software need compact representations of large integers.
During the process of decomposing data, a binary expression tree is created for each segment of data. A post-order traversal of each expression tree is performed. By this stage, the data has assumed a form that is significantly different from its original structure. The next step is to secure the data by encrypting the initial portion of each segment with a key. The partially encrypted contents of consecutive segments are appended one after another to an output file. Such partial encryption prevents recalculation of the original value.
The reverse of the above process is used to reconstruct the original input: a password is used to decrypt the partially encrypted components, followed by evaluation and concatenation of the individual segments.
DESCRIPTION OF DIAGRAMS
Figure 1
This figure introduces a two-dimensional grid characterized by horizontal axis increasing from left to right and vertical axis increasing from top to bottom.
The horizontal axis is marked at powers-of-two positions and can extend indefinitely in positive and negative directions.
The vertical axis is also marked at powers-of-two positions and can extend indefinitely in positive and negative directions.
Each cell contains the product of its x and y coordinates.
Figure 2
This figure shows positive-slope lines originating from powers-of-two positions within the grid.
Any cell (x, y) that has both x and y as powers-of-two values with x > 1 and y > 1 can be the origin of a positive-slope line. Positive-slope lines have increasing y coordinates when moving from left to right.
Figure 3
This figure shows negative-slope lines originating from powers-of-two positions within the grid.
Any cell (x, y) that has both x and y as powers-of-two values with x > 1 and y > 1 can be the origin of a negative-slope line. Negative-slope lines have decreasing y coordinates when moving from left to right.
Figure 4
This figure shows the points of intersection of positive-slope and negative-slope lines. Since positive and negative-slope lines can extend indefinitely, there are many possibilities to form points-of-intersection.
The tuple <+v1, -h3> represents the negative-slope line originating from point (2, 8) and extending indefinitely in the positive direction.
The tuple <-v1, +h3> represents the negative-slope line originating from point (2, 8) and extending indefinitely in the negative direction.
The tuple <+v1, +h3> represents the positive-slope line originating from point (2, 8) and extending indefinitely in the positive direction.
The tuple <-v1, -h3> represents the positive-slope line originating from point (2, 8) and extending indefinitely in the negative direction.
Figure 5
This figure (shown in landscape) is a 256 x 256 plot of the binary grid depicting horizontals, verticals, diagonal lines, and points of intersections.
The region bounded by two vertical lines can be treated as a partition of the grid. So, all points with x coordinates such that 128 <= x < 256 form a partition. Similarly, restricting y values within a partition can give rise to blocks of points:
Block 0 are points within the partition 128 <= x < 256 where 0 < y <= 128.
Block 1 are points within the partition 128 <= x < 256 where 128 < y <= 256.
Block 2 are points within the partition 128 <= x < 256 where 256 < y <= 512.
Block N (negative) are points within the partition 128 <= x < 256 where y <= 0.
This indicates that it is possible to divide the region with positive y coordinates into rectangular blocks of points bounded by lines with power-of-two origins.
Figure 6
This figure shows block 0 and block 1 of partition 128 <= x < 256. Further, within block 0 there are clusters of points of intersection:
The first cluster is formed by the intersection of positive-slope lines from vertical line 128 with negative-slope lines.
The second cluster is formed by the intersection of positive-slope lines from vertical line 64 with negative-slope lines.
The third cluster is formed by the intersection of positive-slope lines from vertical line 32 with negative-slope lines.
The fourth cluster is formed by the intersection of positive-slope lines from vertical line 16 with negative-slope lines.
The fifth cluster is formed by the intersection of positive-slope lines from vertical line 8 with negative-slope lines.
The sixth cluster is formed by the intersection of positive-slope lines from vertical line 4 with negative-slope lines.
The seventh cluster is formed by the intersection of positive-slope lines from vertical line 2 with negative-slope lines.
The highest density of points-of-intersection for any partition lies in block 0. Further, the left half of block 0 is denser than its right half.
Figure 7
This figure shows the first few blocks of the partition 32 < x <= 64. It shows how diagonal lines get thinner while going deep down into a partition.
Figure 8
Flowcharts 8A, 8B, and 8C combined explain the various steps involved in XDivision algorithm.
Figure 9
This flowchart explains how to select the divisor and the point-of-intersection that upon multiplication most closely approximates the input value.
Figure 10
This flowchart explains the subroutine for computing the nearest point-of-intersection less than the input.
Figure 11
This flowchart explains a loopless technique to find diagonal lines passing in the immediate vicinity of a given cell with diagonals having identical x-origins.
Figure 12
This flowchart shows how the XEvaluation algorithm recovers the contents of a partially encrypted file.
DETAILED DESCRIPTION
A computer-readable device containing software that enables treatment of data as sequences of large binary numbers. Data can be transformed into different forms using mathematical operations on extremely large integers. This enables data transformation in a format-invariant manner.
XDivision is an algorithm that can decompose large integers using a variety of mathematical operations related to exponentials, divisions, Mersenne numbers, and large numbers identified as points-of-intersections in a grid.
XEvaluation is an algorithm to recover data by parsing numbers and operators for use with a stack-based evaluation (see Figure 12).
Thus, the input value may be transformed using any possible combination of mathematical operations like the ones shown in the expressions below:
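$FX_0(\text{input}) = \text{factor}_1 \cdot \text{number}_1 + \text{factor}_2 \cdot \text{number}_2 + \dots + \text{factor}_N \cdot \text{number}_N + \text{remainder}$
Or
$FX_1(\text{input}) = FX_0(\text{input}) + (2^m - 1) / \text{number} + k^j + \text{remainder}$, where m, k, and j are 32-bit numbers.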
Using a Binary Grid for XDivision
A binary two-dimensional grid as depicted in Figure 1 can be used to impose structure on raw data by using markers in two-dimensional space.
Diagonal lines in the grid (Figure 2 and Figure 3) can be represented by a pair of numbers, where each number is related to a power-of-two. It suffices to use four numbers, where each number is derived from a power-of-two, to represent the intersection of two diagonal lines. A point of intersection can be called a marker.
In Figure 4, a positive-slope line originates from point (16, 16) whereas a negative-slope diagonal originates from (16, 2) to intersect at the cell containing the value 207. The cell with the content 207 = 23 x 9 is a marker within the grid.
The value of a point of intersection, or a marker, is taken to be the value of its x-coordinate. The value of the cell containing 207 is 23. The cell (23, 9) can be represented by a tuple <4, 4, 4, 1> using just the exponents from the power-of-two values. This allows simplification of certain extremely large integers to quadruples of 32-bit numbers.
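The tuple-to-cell mapping can be checked with a few lines of integer arithmetic. The sketch below is a minimal illustration, not the patent's procedure: it assumes that the quadruple <a, b, c, d> names the origins (2^a, 2^b) and (2^c, 2^d) of two unit-slope diagonals, and that the first diagonal loses one unit of y per unit of x while the second gains one. That assignment, and the function name `marker_from_quadruple`, are assumptions chosen because they reproduce the (23, 9) cell of the example; which line is called positive or negative depends on the axis orientation of the figure.

```python
def marker_from_quadruple(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """Return the cell (x, y) where two unit-slope diagonals cross.

    Assumed reading of <a, b, c, d>: one diagonal starts at (2**a, 2**b) and
    loses one unit of y per unit of x; the other starts at (2**c, 2**d) and
    gains one unit of y per unit of x.
    """
    x1, y1 = 1 << a, 1 << b   # falling diagonal: x + y = x1 + y1
    x2, y2 = 1 << c, 1 << d   # rising diagonal:  y - x = y2 - x2
    s = x1 + y1
    t = y2 - x2
    if (s - t) % 2:
        raise ValueError("the diagonals cross between grid cells")
    return (s - t) // 2, (s + t) // 2


x, y = marker_from_quadruple(4, 4, 4, 1)
print((x, y), x * y)  # (23, 9) 207
```

Because only the four exponents are stored, the same function works unchanged when the exponents are large enough to make x and y thousands of bits long.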
A nested sequence of grids can be established by basing inner grids at marker positions within the preceding grids thereby giving rise to more complex schemes for data mangling.
The XDivision algorithm comprises selecting divisors and computing markers to generate a mathematical sum-of-products decomposition of data (see Figure 8A, Figure 8B, and Figure 8C).
A list of potential divisors is examined to find the divisor that most closely approximates the current input. An integer quotient is obtained by dividing the current remainder by the candidate divisor. Next, the point-of-intersection nearest to this integer quotient is computed. The x-value of the point-of-intersection is multiplied by the candidate divisor to get the first approximation. This process is repeated for other divisors in the list to arrive at the best approximation (see Figure 9). The remainder is updated by subtracting the best approximation. Repeating this process until the remainder becomes small leads to a sum-of-products: marker1 * divisor1 + marker2 * divisor2 + ... + remainder = input.
To speed up the checking of divisors, a reverse lookup data structure is utilized. A reverse lookup table contains pointers into a list of divisors; these pointers help in skipping over some divisors. Figure 5 and Figure 6 show large gaps between points-of-intersection. Divisors that always produce quotients lying within the same gap can be skipped over with the help of reverse indices.
Figure 7 shows even more places for data obfuscation: there are points-of-intersection in deeper blocks that have the same value as some markers in upper blocks, including the negative region. A data mangling algorithm can use a mapping scheme to replace equivalent marker points.
Alternatively, XDivision can construct divisors on the fly by selecting bits from random positions of the input value. If ‘r’ denotes the number of bits in some binary input x, ‘p’ the number of bits in the divisor, and ‘q’ the number of bits in the quotient, then | p + q - r | <= 1. This permits construction of divisors of different sizes.
Another way of utilizing a binary grid is to use distances between diagonal lines to break down a large integer into a sum of parts. A pair of horizontal and vertical lines can be made to guide a moving cell until it is directly over a marker. This is done by taking the cell as close to the horizontal and vertical lines as possible. The output file will only record the origins of the diagonal lines and the marker. Decoding is achieved through reversal of these steps.
Data obfuscation can also be done with Mersenne numbers, which are integers with all 1’s in their binary representation. A sufficiently large number can be constructed by dividing a Mersenne value by a 32-bit number. This number can be used in arithmetic operations along with the current remainder.
Exponential values provide yet another way for data mangling. This is done by constructing a number $k^j$ where the base ‘k’ is derived from the input and the exponent ‘j’ is determined using logarithmic and floor calculations.
The preferred embodiment of this device incorporates exponential numbers, Mersenne numbers, divisors constructed from input values, divisor lists, and references to points-of-intersection and diagonal lines in a grid.
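The bit-count relation | p + q - r | <= 1 is easy to confirm empirically with arbitrary-precision integers. The following sketch is only a check of that inequality, assuming the divisor is nonzero and no larger than the input; the helper name `bit_counts` and the random test sizes are illustrative choices, not taken from the text.

```python
import random


def bit_counts(x: int, divisor: int) -> tuple[int, int, int]:
    """Return (p, q, r): bit lengths of the divisor, the quotient, and x."""
    quotient = x // divisor
    return divisor.bit_length(), quotient.bit_length(), x.bit_length()


rng = random.Random(2019)  # fixed seed so the check is repeatable
for _ in range(1000):
    x = rng.getrandbits(512) | (1 << 511)           # force a 512-bit input
    d = rng.getrandbits(rng.randint(1, 256)) | 1    # nonzero divisor, at most 256 bits
    p, q, r = bit_counts(x, d)
    assert abs(p + q - r) <= 1

print(bit_counts(1000, 7))  # (3, 8, 10)
```

The inequality follows from 2^(p-1) <= divisor < 2^p and 2^(q-1) <= quotient < 2^q, which together bound x between 2^(p+q-2) and 2^(p+q).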
Calculation of the Nearest Marker
Consider the partition [$2^{50}$, $2^{51}$), which has the vertical line $2^{50}$ as its left boundary and the vertical line $2^{51}$ as its right boundary.
Block 0 is defined as all points for which $0 < y < 2^{50}$ and $2^{50} \le x < 2^{51}$.
The positive-slope lines entering block 0 have origins in the vertical lines $2^{50}, 2^{49}, 2^{48}, \dots, 2$. The regular expression “<[+-]?v[0-9]+, [+-]?h[0-9]+>” incorporates the vertical and horizontal origins of diagonal lines, with the numbers standing for the exponent value of the origin when expressed in a power-of-two form.
The set of negative-slope lines in block 0 is N = {<v50, -h1>, <v50, -h2>, ..., <v50, -h49>, <v50, -h50>, <-v51, h1>, <-v51, h2>, <-v51, h3>, <-v51, h4>, ..., <-v51, h48>, <-v51, h49>}.
The set of positive-slope lines in block 0 originating from the vertical $2^{50}$ is {<v50, h1>, <v50, h2>, <v50, h3>, ..., <v50, h49>}.
The set of positive-slope lines in block 0 originating from the vertical $2^{49}$ is {<v49, h1>, <v49, h2>, <v49, h3>, ..., <v49, h48>}.
The set of positive-slope lines in block 0 originating from the vertical $2^{2}$ is {<v2, h1>, <v2, h2>, <v2, h3>, ..., <v2, h48>}.
The set of positive-slope lines in block 0 originating from the vertical $2^{1}$ is {<v1, h1>, <v1, h2>, <v1, h3>, ..., <v1, h48>}.
For a given cell (x, y) in block 0, the nearest neighboring positive-slope lines can be determined by projecting the cell onto the nearest vertical on its left, in the direction parallel to positive lines. Then the y-intercept of this line is evaluated in comparison to the intercepts of diagonal lines:

| Positive line | y-intercept | Simplified y-intercept |
| --- | --- | --- |
| <v50, h1> | $0 + 2^{1}$ | $2$ |
| <v50, h2> | $0 + 2^{2}$ | $4$ |
| <v50, h3> | $0 + 2^{3}$ | $8$ |
| <v50, h49> | $0 + 2^{49}$ | $2^{49}$ |
| <v49, h1> | $2^{49} + 2^{1}$ | $(2^{50} - 2^{49}) + 2$ |
| <v49, h2> | $2^{49} + 2^{2}$ | $(2^{50} - 2^{49}) + 2^{2}$ |
| <v49, h3> | $2^{49} + 2^{3}$ | $(2^{50} - 2^{49}) + 2^{3}$ |
| <v49, h48> | $2^{49} + 2^{48}$ | $(2^{50} - 2^{49}) + 2^{48}$ |
| <v48, h1> | $2^{49} + 2^{48} + 2^{1}$ | $(2^{50} - 2^{48}) + 2^{1}$ |
| <v48, h2> | $2^{49} + 2^{48} + 2^{2}$ | $(2^{50} - 2^{48}) + 2^{2}$ |
| <v48, h3> | $2^{49} + 2^{48} + 2^{3}$ | $(2^{50} - 2^{48}) + 2^{3}$ |
| <v48, h47> | $2^{49} + 2^{48} + 2^{47}$ | $(2^{50} - 2^{48}) + 2^{47}$ |
| <v3, h1> | $(2^{49} + 2^{48} + 2^{47} + \dots + 2^{3}) + 2^{1}$ | $(2^{50} - 2^{3}) + 2^{1}$ |
| <v3, h2> | $(2^{49} + 2^{48} + 2^{47} + \dots + 2^{3}) + 2^{2}$ | $(2^{50} - 2^{3}) + 2^{2}$ |
| <v2, h1> | $(2^{49} + 2^{48} + 2^{47} + \dots + 2^{2}) + 2^{1}$ | $(2^{50} - 2^{2}) + 2$ |
| <v1, h1> | $2^{50}$ | $2^{50}$ |
Let yIntercept > 0 denote the y-intercept of a cell when projected onto its nearest left vertical. Define $y_0 = 2^{50} - \text{yIntercept}$ and $\text{rIndex} = \lceil \log_2(y_0) \rceil$. From the expressions above, $2^{\text{rIndex}}$ is the vertical. Define $y_1 = 2^{\text{rIndex}} - y_0$ to calculate $\text{fIndex} = \lfloor \log_2(y_1) \rfloor$ and $\text{cIndex} = \lceil \log_2(y_1) \rceil$. $2^{\text{fIndex}}$ is the lower bound and $2^{\text{cIndex}}$ is the upper bound line (see Figure 11).
The above technique gives the nearest positive-slope lines around a cell. Similarly, the nearest negative-slope lines around a cell are determined by taking projections onto the nearest vertical.
Figure 6 and Figure 7 indicate that the nearest marker less than an input value must be located deeper down block 0; the positive-slope lines are more numerous and closer together in the deeper region of block 0. Select the negative-slope line that is farthest down in block 0 as the line on which the nearest point-of-intersection is located. The cell on this negative-slope line that has the same x coordinate as the input is used as a reference point to compute the upper and lower bounding positive-slope lines. The point of intersection of the negative-slope line with the upper bounding positive-slope line gives the nearest marker (see Figure 10).
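As a worked instance of these definitions, suppose a cell projects onto the left vertical with a y-intercept of $2^{49} + 11$. The value 11 is an arbitrary illustrative choice; it places the projection between the tabulated intercepts of <v49, h3> and <v49, h4>, which are $2^{49} + 8$ and $2^{49} + 16$.

```latex
\begin{align*}
\text{yIntercept} &= 2^{49} + 11 \\
y_0 &= 2^{50} - \text{yIntercept} = 2^{49} - 11 \\
\text{rIndex} &= \lceil \log_2(2^{49} - 11) \rceil = 49 \\
y_1 &= 2^{\text{rIndex}} - y_0 = 2^{49} - (2^{49} - 11) = 11 \\
\text{fIndex} &= \lfloor \log_2 11 \rfloor = 3, \qquad
\text{cIndex} = \lceil \log_2 11 \rceil = 4
\end{align*}
```

The lower bounding line is therefore <v49, h3> and the upper bounding line is <v49, h4>, in agreement with the table of intercepts.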
XDivision Algorithm
Step 1: Load the list of divisors and reverse indices table from data files.
Step 2: Determine the size of the next data segment.
Step 3: Read raw data from the input file.
Step 4: If the data read can be represented by a large integer, then go to Step 6.
Step 5: Apply padding to convert raw data to an extremely large integer.
Step 6: Initialize the remainder to the large integer representing the data.
Step 7: Set the current approximation to 0.
Step 8: Initialize the expression tree for storing numbers, operators, and markers.
Step 9: If the remainder is small, go to Step 14.
Step 10: Find the marker and divisor that most closely match the remainder.
Step 11: Insert marker, divisor, and operator nodes into the expression tree.
Step 12: Set remainder = remainder - (point-of-intersection * divisor).
Step 13: Go to Step 9.
Step 14: Add remainder and operator nodes to the expression tree.
Step 15: Perform a post-order traversal of the expression tree.
Step 16: Encrypt a prefix of the traversal results with a known cipher.
Step 17: Reconfigure encrypted and obfuscated data into contiguous memory.
Step 18: Write the partially encrypted buffer to the output file.
Step 19: Repeat Steps 2 through 18 for each segment of the input file.
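Steps 8, 11, 14, and 15 can be illustrated with a small binary expression tree. The sketch below is one possible layout under stated assumptions: the `Node` class, the left-leaning tree shape, and the use of plain integers for markers (rather than quadruples of exponents) are all hypothetical simplifications, and the encryption of Step 16 is left out.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Node:
    value: Union[int, str]            # an integer operand or an operator symbol
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_sum_of_products(terms: list[tuple[int, int]], remainder: int) -> Node:
    """Build (m1*d1 + m2*d2 + ...) + remainder as a left-leaning binary tree."""
    root: Optional[Node] = None
    for marker, divisor in terms:
        product = Node("*", Node(marker), Node(divisor))
        root = product if root is None else Node("+", root, product)
    tail = Node(remainder)
    return tail if root is None else Node("+", root, tail)


def post_order(node: Optional[Node]) -> list:
    """Visit left subtree, right subtree, then the node itself."""
    if node is None:
        return []
    return post_order(node.left) + post_order(node.right) + [node.value]


tree = build_sum_of_products([(23, 40), (5, 12)], 7)
print(post_order(tree))  # [23, 40, '*', 5, 12, '*', '+', 7, '+']
```

The traversal is a postfix expression, which is why a stack is sufficient to evaluate it during recovery.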
SubAlgorithm for Determining Point-of-Intersection and Divisor
Step 1: Select the first divisor from the list of divisors.
Step 2: Calculate the integer quotient obtained by dividing the remainder by the divisor.
Step 3: Compute the nearest point-of-intersection that is less than the quotient.
Step 4: Multiply the point-of-intersection by the divisor to get a new approximation.
Step 5: If the new approximation is better than the existing one, then go to Step 7.
Step 6: Go to Step 8.
Step 7: Save the point-of-intersection, divisor, and approximation.
Step 8: Repeat Step 2 through Step 5 for the next divisor in the list.
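The selection loop, together with the outer loop that shrinks the remainder, can be sketched as follows. This is a simplified illustration rather than the patent's implementation: `nearest_marker_below` is a stand-in that returns the largest power of two not exceeding the quotient instead of a true grid point-of-intersection, "better" is taken to mean "closer to the remainder from below", and the divisor list and the `small` threshold are hypothetical values.

```python
def nearest_marker_below(quotient: int) -> int:
    """Stand-in for the grid lookup: the largest power of two <= quotient."""
    return 1 << (quotient.bit_length() - 1)


def best_term(remainder: int, divisors: list[int]):
    """Return (marker, divisor, approximation) closest to remainder, or None."""
    best = None
    for divisor in divisors:
        quotient = remainder // divisor
        if quotient == 0:
            continue  # this divisor is larger than the remainder
        marker = nearest_marker_below(quotient)
        approximation = marker * divisor
        if best is None or approximation > best[2]:
            best = (marker, divisor, approximation)
    return best


def decompose(value: int, divisors: list[int], small: int = 1 << 16):
    """Greedy sum-of-products: value == sum(marker * divisor) + remainder."""
    terms = []
    remainder = value
    while remainder >= small:
        best = best_term(remainder, divisors)
        if best is None:
            break
        marker, divisor, approximation = best
        terms.append((marker, divisor))
        remainder -= approximation
    return terms, remainder


value = int.from_bytes(b"extremely large integers", "big")
terms, remainder = decompose(value, [3, 5, 7, 11, 13])
assert sum(m * d for m, d in terms) + remainder == value
```

Since each approximation never exceeds the remainder, the remainder stays non-negative and strictly decreases, so the loop terminates.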
SubAlgorithm to Find the Nearest Point of Intersection
Step 1: Compute $\text{cIndex} = \lceil \log_2(x) \rceil$ for large integer x.
Step 2: Compute $\text{fIndex} = \lfloor \log_2(x) \rfloor$ for large integer x.
Step 3: Compute the origin of the nearest vertical on the right side as $2^{\text{cIndex}}$.
Step 4: Compute the origin of the nearest vertical on the left side as $2^{\text{fIndex}}$.
Step 5: Project the referential cell (x, x) onto the right vertical along negative lines.
Step 6: Compute y-intercept = $x - (2^{\text{cIndex}} - x)$ as the intercept from Step 5.
Step 7: Compute $\text{rIndex} = \lceil \log_2(\text{y-intercept}) \rceil$.
Step 8: Compute the negative-slope line as having the origin: ($2^{\text{cIndex}}$, $2^{\text{rIndex}}$).
Step 9: Compute the cell (x, y) located on the negative-slope line of Step 8.
Step 10: Project cell (x, y) onto the vertical on the left along positive-slope lines.
Step 11: Use the y-intercept with mathematical operations to compute positive lines.
Step 12: Select the positive-slope line having the lesser y-intercept.
Step 13: Compute the point-of-intersection of the positive and negative-slope lines.
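Steps 1 through 8 reduce to exact integer logarithms, which arbitrary-precision integers provide through bit lengths without resorting to floating point. The sketch below covers only those eight steps and stops before Step 9, since the later steps depend on grid details shown in the figures; the function names are illustrative assumptions.

```python
def ceil_log2(n: int) -> int:
    """Exact ceiling of log2(n) for n >= 1, using only integer operations."""
    if n < 1:
        raise ValueError("n must be positive")
    return (n - 1).bit_length()


def floor_log2(n: int) -> int:
    """Exact floor of log2(n) for n >= 1."""
    if n < 1:
        raise ValueError("n must be positive")
    return n.bit_length() - 1


def negative_line_origin(x: int) -> tuple[int, int]:
    """Return (cIndex, rIndex): exponents of the origin (2**cIndex, 2**rIndex)."""
    c_index = ceil_log2(x)                    # Step 1
    f_index = floor_log2(x)                   # Step 2 (left vertical, needed from Step 10 on)
    right_vertical = 1 << c_index             # Step 3
    left_vertical = 1 << f_index              # Step 4
    assert left_vertical <= x <= right_vertical
    y_intercept = x - (right_vertical - x)    # Steps 5 and 6
    r_index = ceil_log2(y_intercept)          # Step 7
    return c_index, r_index                   # Step 8


print(negative_line_origin(23))  # (5, 4)
```

For x = 23 the right vertical is 32, the y-intercept is 23 - 9 = 14, and the resulting origin is (32, 16). Floating-point logarithms lose precision for integers of thousands of bits, which is why the bit-length form is used here.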
SubAlgorithm to Find the Nearest Positive-Slope Lines
Step 1: Calculate the distance of cell (x, y) from the nearest vertical line on its left.
Step 2: Compute the y-intercept by subtracting the distance from the y value of cell (x, y).
Step 3: Define $y_0 = \text{x-origin-of-vertical} - \text{yIntercept}$.
Step 4: Define $\text{rIndex} = \lceil \log_2(y_0) \rceil$.
Step 5: $2^{\text{rIndex}}$ is the x-origin of the positive-slope line.
Step 6: Define $y_1 = 2^{\text{rIndex}} - y_0$.
Step 7: Compute $\text{fIndex} = \lfloor \log_2(y_1) \rfloor$ and $\text{cIndex} = \lceil \log_2(y_1) \rceil$.
Step 8: $2^{\text{fIndex}}$ and $2^{\text{cIndex}}$ are the y-origins of the positive-slope lines.
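A loopless version of these eight steps might look like the sketch below. It assumes the cell lies in block 0 of its partition, so that both the projected y-intercept and y0 are positive, and it raises an error when y1 is zero because the logarithm is then undefined; the function name and the error handling are assumptions added for illustration.

```python
def ceil_log2(n: int) -> int:
    """Exact ceiling of log2(n) for n >= 1."""
    return (n - 1).bit_length()


def floor_log2(n: int) -> int:
    """Exact floor of log2(n) for n >= 1."""
    return n.bit_length() - 1


def nearest_positive_lines(x: int, y: int) -> tuple[int, int, int]:
    """Return (rIndex, fIndex, cIndex) for the lines bounding cell (x, y)."""
    left_vertical = 1 << floor_log2(x)
    distance = x - left_vertical                 # Step 1
    y_intercept = y - distance                   # Step 2
    y0 = left_vertical - y_intercept             # Step 3
    if y_intercept <= 0 or y0 <= 0:
        raise ValueError("cell does not project into block 0")
    r_index = ceil_log2(y0)                      # Steps 4 and 5
    y1 = (1 << r_index) - y0                     # Step 6
    if y1 == 0:
        raise ValueError("projection lands exactly on a vertical's base intercept")
    return r_index, floor_log2(y1), ceil_log2(y1)  # Steps 7 and 8


x = (1 << 50) + 100
y = (1 << 49) + (1 << 48) + 140
print(nearest_positive_lines(x, y))  # (48, 5, 6)
```

In this example the projection has intercept 2^49 + 2^48 + 40, which lies between the intercepts of <v48, h5> and <v48, h6>, matching the returned exponents.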
Algorithm for Building a Reverse Indices Table
Step 1: Read divisors ‘listDivisors’ from a file.
Step 2: Let numDivisors be the count of divisors in listDivisors.
Step 3: Set rTable[0] = 0, rTable[1] = 0, rTable[2] = 1.
Step 4: Set tIndex = 3.
Step 5: If (tIndex > (numDivisors - 1)), then go to Step 12.
Step 6: If (tIndex is in listDivisors), then go to Step 9.
Step 7: rTable[tIndex] = rTable[tIndex - 1].
Step 8: Go to Step 10.
Step 9: rTable[tIndex] = rTable[tIndex - 1] + 1.
Step 10: Set tIndex = tIndex + 1.
Step 11: Go to Step 5.
Step 12: Return from subroutine.
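In structured form, the table is a running count of how many divisors are less than or equal to each index. The sketch below follows the steps with one labeled deviation: the steps bound the loop by numDivisors - 1, whereas this version takes the bound as a `limit` parameter that defaults to the largest divisor, which is an assumption made so that every divisor value has a table entry. Like Step 3, it also assumes that 2 is the smallest divisor.

```python
from typing import Optional


def build_reverse_indices(divisors: list[int], limit: Optional[int] = None) -> list[int]:
    """rTable[v] = number of divisors <= v, assuming 2 is the smallest divisor."""
    present = set(divisors)           # constant-time membership test for Step 6
    if limit is None:
        limit = max(divisors)         # assumption: index the table up to the largest divisor
    r_table = [0, 0, 1]               # Step 3
    for t_index in range(3, limit + 1):                   # Steps 4, 5, 10, 11
        step = 1 if t_index in present else 0             # Steps 6 to 9
        r_table.append(r_table[t_index - 1] + step)
    return r_table


print(build_reverse_indices([2, 3, 5, 7, 11]))
# [0, 0, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5]
```

With a sorted divisor list, `r_table[v] - 1` is then the position of the largest divisor not exceeding v, which is the kind of pointer that lets a search skip ahead in the list.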
Algorithm for Constructing a 32-bit Divisor
Step 1: Let numBits be the count of bits in the input.
Step 2: Set tResult = 0.
Step 3: Repeat Step 4 through Step 7 exactly 32 times.
Step 4: Get 4 bytes of random data and construct an integer ‘tValue’.
Step 5: Calculate bitPosition as ‘tValue mod numBits’.
Step 6: Extract the bit value at position bitPosition.
Step 7: Insert the bit value into the rightmost bit position of tResult.
Step 8: Calculate and return the absolute value of tResult.
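These steps translate almost directly into code. The sketch below is an illustration under stated assumptions: `os.urandom` is used as the source of random bytes, the four bytes are read in big-endian order, and "insert into the rightmost bit position" is read as shift-left-then-OR. The absolute value of Step 8 matters for signed 32-bit types; a Python integer built this way is never negative, so it is returned as-is.

```python
import os


def construct_divisor(value: int, random_bytes=os.urandom) -> int:
    """Build a 32-bit number from bits at random positions of `value`."""
    num_bits = value.bit_length()                 # Step 1
    if num_bits == 0:
        raise ValueError("input has no bits to sample")
    t_result = 0                                  # Step 2
    for _ in range(32):                           # Step 3
        t_value = int.from_bytes(random_bytes(4), "big")   # Step 4
        bit_position = t_value % num_bits         # Step 5
        bit = (value >> bit_position) & 1         # Step 6
        t_result = (t_result << 1) | bit          # Step 7
    return t_result                               # Step 8 (already non-negative)


divisor = construct_divisor(int.from_bytes(b"sample input data", "big"))
assert 0 <= divisor < (1 << 32)
```

A caller would need to handle the case where every sampled bit is zero, since a zero result cannot be used as a divisor.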
Algorithm to Obfuscate Data Using Diagonal Lines
Step 1: Convert raw data to a large integer by applying padding if necessary.
Step 2: Select horizontal and vertical axes as reference lines.
Step 3: If there is a marker along the x value, then go to Step 14.
Step 4: Set referential cell ‘rCell’ to (x, 0) for input value of x.
Step 5: Determine negative-slope lines in the immediate vicinity of ‘rCell’.
Step 6: Select the lower negative line, i.e. the line with the lower y-intercept.
Step 7: Update ‘rCell’ to position (x, y) on the lower negative line.
Step 8: Write the origins of the negative line to the output file.
Step 9: Determine negative lines around cell (0, y) where y is from ‘rCell’.
Step 10: Select the greater negative line, i.e. the line with the greater y-intercept.
Step 11: Update ‘rCell’ to position (x, y) on the greater negative line.
Step 12: Write the origins of the greater negative line to the output file.
Step 13: Go to Step 3.
Step 14: Write the quadruple representation of the marker to the output file.
Algorithm to Decode Data from References to Diagonal Lines
Step 1: Read encoded data from the input file.
Step 2: Interpret the first four numbers as origins of a point-of-intersection.
Step 3: Initialize the referential cell to the coordinates of the point-of-intersection.
Step 4: Update the referential cell’s y-coordinate to lie on the next diagonal line.
Step 5: Update the referential cell’s x-coordinate to lie on the next diagonal line.
Step 6: Repeat Step 4 and Step 5 until there is no more data to be processed.
Step 7: Convert the x origin of the referential cell to raw data by removing any padding.
XEvaluation Algorithm
Step 1: Read metadata and encrypted bytes of the current segment.
Step 2: Apply the password for conversion of bytes to plain data.
Step 3: Read obfuscated bytes into a separate memory buffer.
Step 4: Reconfigure encrypted and obfuscated bytes into a contiguous buffer.
Step 5: Initialize a push-down stack for postfix evaluation.
Step 6: Parse into numbers and operators while performing evaluation.
Step 7: Remove padding information from the result of evaluation.
Step 8: Append plain data to the output file.
Step 9: Repeat Step 1 through Step 8 for each segment of the input file.
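Steps 5 and 6 amount to a standard stack-based postfix evaluation. The sketch below assumes the traversal has already been decrypted and parsed into a list of Python integers and operator strings; the token format and the supported operator set are illustrative assumptions, and markers are taken to be already expanded to their integer values.

```python
import operator

# Assumed operator set; the sum-of-products form only needs "+" and "*".
OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate_postfix(tokens: list) -> int:
    """Evaluate a postfix token list with a push-down stack."""
    stack: list[int] = []
    for token in tokens:
        if isinstance(token, int):
            stack.append(token)              # operands are pushed
        else:
            right = stack.pop()              # operators pop two operands
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
    if len(stack) != 1:
        raise ValueError("malformed postfix sequence")
    return stack[0]


print(evaluate_postfix([23, 40, "*", 5, 12, "*", "+", 7, "+"]))  # 987
```

Because Python integers have arbitrary precision, the same loop recovers segments that are thousands of bits long.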
Algorithm for Obfuscating Data Using Mersenne Numbers
Step 1: Let ‘numBits’ be the count of bits in the input.
Step 2: Let ‘tDivisor’ be a 32-bit integer derived from the input.
Step 3: Construct a Mersenne number ‘MValue’ = $2^{\text{numBits} + 32 + 2} - 1$.
Step 4: Set ‘quotient’ = ‘MValue’ / ‘tDivisor’.
Step 5: Set ‘remainder’ = ‘quotient’ - ‘input’.
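The Mersenne steps and their inverse can be sketched as below. Two points are assumptions: the exponent of the Mersenne number is read as numBits + 32 + 2, since the formula in Step 3 is damaged in the source, and the 32-bit divisor is passed in as a parameter rather than derived from the input. The function names are hypothetical.

```python
def mersenne_obfuscate(value: int, t_divisor: int) -> tuple[int, int]:
    """Return (numBits, remainder) for the given input and 32-bit divisor."""
    num_bits = value.bit_length()                    # Step 1
    m_value = (1 << (num_bits + 32 + 2)) - 1         # Step 3 (assumed exponent)
    quotient = m_value // t_divisor                  # Step 4, integer division
    remainder = quotient - value                     # Step 5
    return num_bits, remainder


def mersenne_recover(num_bits: int, t_divisor: int, remainder: int) -> int:
    """Invert the steps: rebuild the quotient, then subtract the remainder."""
    m_value = (1 << (num_bits + 32 + 2)) - 1
    return m_value // t_divisor - remainder


value = int.from_bytes(b"obfuscate me", "big")
num_bits, remainder = mersenne_obfuscate(value, 0x9E3779B1)
assert mersenne_recover(num_bits, 0x9E3779B1, remainder) == value
```

With this exponent the quotient is at least 2^(numBits + 2) for any nonzero divisor below 2^32, so the remainder is always positive.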
Algorithm for Obfuscating Data Using Exponential Numbers
Step 1: Generate a 32-bit number ‘k’ derived from bit positions of the input.
Step 2: Set k = k mod 100.
Step 3: Set k = k + 10.
Step 4: If k is a power-of-two, then k = k - 1.
Step 5: Compute $j = \lfloor \log_k(x) \rfloor = \lfloor \log_2(x) / \log_2(k) \rfloor$.
Step 6: Set remainder = remainder - $k^j$.
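A sketch of these steps follows, assuming x in Step 5 is the current remainder. Because a floating-point quotient of logarithms can be off by one for very large integers, the exponent is first estimated from the bit length and then corrected with exact integer comparisons; that correction, and the function names, are additions made for illustration.

```python
import math


def choose_base(seed: int) -> int:
    """Steps 2 to 4: map a 32-bit seed to a base in [10, 109], avoiding powers of two."""
    k = seed % 100 + 10
    if (k & (k - 1)) == 0:      # k is a power of two (16, 32, or 64 in this range)
        k -= 1
    return k


def floor_log(x: int, k: int) -> int:
    """Exact floor(log_k(x)) for x >= 1 and k >= 2."""
    if x < 1:
        raise ValueError("x must be positive")
    j = int((x.bit_length() - 1) / math.log2(k))   # float estimate, never too high by much
    while k ** (j + 1) <= x:                        # correct an underestimate
        j += 1
    while k ** j > x:                               # correct an overestimate
        j -= 1
    return j


def exponential_step(remainder: int, seed: int) -> tuple[int, int, int]:
    """Return (k, j, new_remainder) with new_remainder = remainder - k**j."""
    k = choose_base(seed)
    j = floor_log(remainder, k)
    return k, j, remainder - k ** j


k, j, rest = exponential_step(1 << 300, 6)
assert (k, rest + k ** j) == (15, 1 << 300) and rest >= 0
```

Only the small numbers k and j need to be stored to reconstruct the subtracted term.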
Computing Platforms for XDivision Implementations
XDivision can be implemented on computing platforms having support for big integer arithmetic. Microsoft’s .NET platform offers the BigInteger class with a wide range of facilities for operating on large integers; Java has a BigInteger class, while Python’s built-in integer type has arbitrary precision. There are many open source libraries written in C and C++ for arbitrary precision arithmetic. Such software engines internally use advanced algorithms like the Fast Fourier Transform to speed up multiplication of large integers.
Intel’s Skylake X processors provide support for advanced vectorization AVX-512 instructions for more efficient large integer arithmetic. XDivision can also be deployed on Intel Core processors. Advanced integer arithmetic libraries enjoy support on a wide range of operating systems, from Windows 8.1 and Windows 10 to Mac OS and Linux.
For practical reasons, there are limits imposed by available memory on language libraries for large integers. Most systems restrict the number of bits in a large integer to the maximum value of a 32-bit index. This forces the XDivision algorithm to divide input into manageable chunks. Implementations will keep track of segment boundaries and adhere to a well-defined file format for recording to persistent storage.
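Chunking and the conversion between bytes and large integers can be sketched as follows. The padding scheme shown here, a single 0x01 byte placed in front of each segment so that leading zero bytes survive the round trip, is a hypothetical choice for illustration; the text does not specify the padding format or the segment size.

```python
def split_segments(data: bytes, segment_size: int) -> list[bytes]:
    """Cut the input into consecutive chunks of at most segment_size bytes."""
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]


def segment_to_int(segment: bytes) -> int:
    """Pad with a leading 0x01 byte (assumed scheme) and read as a big-endian integer."""
    return int.from_bytes(b"\x01" + segment, "big")


def int_to_segment(value: int) -> bytes:
    """Convert back to bytes and strip the padding byte."""
    raw = value.to_bytes((value.bit_length() + 7) // 8, "big")
    return raw[1:]


data = b"\x00\x00raw bytes with leading zeros"
segments = split_segments(data, 8)
restored = b"".join(int_to_segment(segment_to_int(s)) for s in segments)
assert restored == data
```

Without some form of padding, a segment that begins with zero bytes would map to the same integer as the shorter segment without them, and the original length could not be recovered.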
Claims
I Claim
Claim 1: A computer-readable storage device comprising of instructions that upon execution cause a computer to perform obfuscation, protection, and decod- ing of data using mathematical operations on extremely large integers.
Claim 2: The device as defined in claim 1 wherein said obfuscation and said protection of data using extremely large integers further comprises the steps of: determining the size of next segment of data from metadata; reading raw data from input file; if necessary, applying padding to convert raw data to a large in teger; decomposing the large integer into a mathematical expression; building a binary expression tree; traversing the binary tree and encrypting an initial por tion of traversal result with a cipher; creating a contiguous memory layout for storing encrypted and obfuscated portions of data segment; writing the partially encrypted segment to persistent storage; repeating the above steps for each data segment.
Claim 3: The device as defined in claim 2 wherein said decomposition of said large integer and said building of binary expression tree further comprises the steps of: imposing structure on the large integer using a grid containing horizon tal, vertical, diagonal lines, and their points of intersection; defining the value of point-of-intersection as the value of its x coordinate; loading a predefined set of divisors from a file and building a list of reverse indices; initializing a remainder to the large integer obtained from input; initializing an approximation to zero; initializing the expression tree for representing the decomposition; finding the point-of-intersection and the divisor that most closely matches the remainder; in serting the point-of-intersection, the divisor, and the operator nodes into the ex pression tree; updating the remainder by subtracting the approximation value; re peating the immediately preceding three steps of finding the marker and the divi sor combination, inserting the operands and the operators into the expression tree, and updating the remainder until the remainder becomes small; adding nodes for the final remainder and the associated operator to the expression tree.
Claim 4: The device as defined in claim 3 wherein said process of computing said point-of-intersection and said divisor combination further comprises the steps of: selecting the first divisor from the list of divisors; calculating a quotient as value obtained upon dividing the remainder by the divisor; computing the nearest point-of-intersection that is less than the quotient; multiplying the value of the point-of-intersection with the divisor for a new approximation; saving the point-of-intersection, the divisor, and the new approximation if new approxima tion is better than existing approximation; using a reverse indices table to lookup the next potential divisor; and, repeating all of the preceding steps for the next divisor to be processed.
Claim 5: The device as defined in claim 4 wherein said computation of said nearest point-of-intersection further comprises the steps of: computing the near est vertical lines having power-of-two origins on left and right sides of the large integer; setting up a reference cell whose both coordinates are equal to the large integer; projecting the cell in the direction parallel to negative slope lines to ob tain a y-intercept; using the y-intercept to compute negative-slope lines originat ing from the vertical on right side; saving the origins of the negative-slope line with lesser y-intercept; updating the referential cell to be located on the saved negative-slope line at x coordinate equal to the large integer; projecting the refer ential cell onto the vertical on left side in the direction parallel to positive-slope lines to compute the y-intercept; determining the positive-slope lines passing in the immediate vicinity of the referential cell using the y-intercept; recording the positive-slope line with lesser y-intercept; computing the point of intersection of the recorded positive and negative-slope lines.
Claim 6: The device as defined in claim 5 wherein said computation of said nearest positive-slope lines passing around said referential cell in the grid further comprises the steps of: calculating the distance between the referential cell and the nearest vertical on left; subtracting distance from the y-coordinate of the referential cell to compute y-intercept; computing a variable y0 by subtracting the y-intercept from the x-origin of the vertical line; taking the logarithm to base 2 of variable y0 and then the ceiling of the result to compute an integer rIndex; computing the x-origin of positive-slope lines as 2-raised-to-rIndex; computing a variable y1 by subtracting y0 from x-origin of positive-slope lines; computing the logarithm of y1 with respect to base 2 and then using the ceiling and floor functions to compute the integers fIndex and cIndex; computing 2-raised-to-fIndex and 2-raised-to-cIndex to determine the y-origins of positive-slope lines.
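The arithmetic in claim 6 can be transcribed literally as a short sketch. The function name, its parameters, and the error handling are assumptions made for illustration; the claim does not say what happens when y0 or y1 is not positive (for example when y0 is itself a power of two, which makes y1 zero), so the sketch simply rejects those inputs. Integer bit lengths replace floating-point logarithms so the code stays exact for very large integers.

```python
def ceil_log2(value: int) -> int:
    """Ceiling of log2(value) for value >= 1, using exact integer arithmetic."""
    return (value - 1).bit_length()


def floor_log2(value: int) -> int:
    """Floor of log2(value) for value >= 1, using exact integer arithmetic."""
    return value.bit_length() - 1


def positive_slope_origins(cell_x: int, cell_y: int, left_vertical: int) -> dict:
    """Follow the steps of claim 6 for a cell and the nearest vertical on its left."""
    distance = cell_x - left_vertical        # distance to the vertical on the left
    y_intercept = cell_y - distance          # projection onto that vertical
    y0 = left_vertical - y_intercept
    if y0 < 1:
        raise ValueError("y0 must be positive for the logarithm to exist")
    r_index = ceil_log2(y0)
    x_origin = 1 << r_index                  # 2 raised to rIndex
    y1 = x_origin - y0
    if y1 < 1:
        raise ValueError("y1 must be positive for the logarithm to exist")
    f_index = floor_log2(y1)
    c_index = ceil_log2(y1)
    return {
        "x_origin": x_origin,
        "y_origins": (1 << f_index, 1 << c_index),  # 2^fIndex and 2^cIndex
    }
```

For a cell at (100, 77) with the left vertical at 64, the steps give distance 36, y-intercept 41, y0 = 23, rIndex = 5, x-origin 32, y1 = 9, and y-origins 8 and 16.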
Claim 7: The device as defined in claim 2 wherein said decomposition of said large integer further comprises the steps of: initializing a remainder to the large integer associated with input; initializing an approximation value to zero; initializing an expression tree for representing the decomposition; deriving an integer from bit values at random positions of input data; calculating a quotient by dividing the remainder by the input-derived integer; computing the nearest point-of-intersection less than the quotient; calculating the approximation value as the product of point-of-intersection with the divisor; updating the remainder by subtracting the approximation value from its existing value; inserting the point-of-intersection, the divisor, and the operator nodes into the expression tree; repeating the preceding steps of deriving integers, calculating quotients, determining points-of-intersection, calculating approximation values, inserting nodes into the expression tree, and updating the remainder until the remainder becomes too small; adding nodes for the final remainder and the associated operator into the expression tree.
Claim 8: The device as defined in claim 7 wherein said derivation of said integer from said input further comprises the steps of: calculating the number of bits in the input; using a pseudo random number generator to obtain bytes of random data and subsequently converting this data into an integer; computing the modulus of the random integer with respect to the number of bits; using the value of modulus as location index to derive a bit value; inserting the bit value into a number; repeating ‘n’ number of times the steps of random data generation, modulus calculation, selection and insertion of bits to construct an n-bit divisor.
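A minimal sketch of the bit-sampling procedure in claim 8 follows. The function name, the explicit `seed` parameter, the use of four random bytes per draw, and the most-significant-bit-first indexing are all assumptions; the claim only requires a pseudo random number generator, a modulus by the bit count, and n repetitions.

```python
import random


def derive_divisor(data: bytes, n: int, seed: int) -> int:
    """Build an n-bit number from bits at pseudo-random positions of data."""
    total_bits = len(data) * 8               # number of bits in the input
    if total_bits == 0:
        raise ValueError("input must not be empty")
    rng = random.Random(seed)                # seeded so the draw is repeatable
    number = 0
    for _ in range(n):
        # Obtain random bytes and convert them to an integer.
        random_bytes = bytes(rng.getrandbits(8) for _ in range(4))
        random_int = int.from_bytes(random_bytes, "big")
        # The modulus with respect to the bit count is the location index.
        index = random_int % total_bits
        bit = (data[index // 8] >> (7 - index % 8)) & 1
        # Insert the selected bit into the number being built.
        number = (number << 1) | bit
    return number
```

Seeding the generator is a design choice of this sketch: it makes the same divisor reproducible from the same input. The result can be zero when every sampled bit is zero, a case the claim does not address and a caller would need to handle before dividing.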
Claim 9: The device as defined in claim 2 wherein said decomposition of said large integer further comprises the steps of: initializing a remainder to the large integer associated with the input; initializing an approximation value to zero; initializing an expression tree for representing the decomposition; calculating the number of bits in the input; deriving an n-bit number from random positions of the large integer; dividing a sufficiently large Mersenne number by the n-bit number to obtain a quotient; updating the remainder by performing arithmetic operations between the quotient and the remainder; repeating the preceding steps of computing the number of bits and deriving a number from input positions, dividing Mersenne number by the derived number, and updating the remainder until the remainder becomes too small; adding nodes for the remainder and the relevant operator to the expression tree.
Claim 10: The device as defined in claim 2 wherein said decomposition of said large integer further comprises the steps of: initializing a remainder to the large integer associated with the input; initializing an approximation value to zero; initializing an expression tree for representing the decomposition; generating a 32-bit number as a function of input bits; computing the modulus of the 32-bit number with respect to 100; adding a constant to the number computed in the preceding step to make it greater than 10; subtracting a constant from the number obtained in the preceding step if the number was a power-of-two; saving the number in the immediately preceding step as the ‘base’; computing the logarithm of the remainder with respect to the ‘base’ and then applying the floor function to obtain the ‘exponent’; computing the value ‘base’ raised to the ‘exponent’ and subtracting from the remainder; repeating the preceding steps related to computation of the base, the exponent, and the remainder until the remainder becomes too small; adding nodes for the final remainder and the associated operator to the expression tree.
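The base-and-exponent loop of claim 10 can be sketched as below. Several details are assumptions chosen only to make the example concrete: the 32-bit number comes from `zlib.crc32` over the input plus a round counter, the added constant is 11, the subtracted constant is 1, the stopping threshold is 1000, and the expression tree is flattened to a list of (base, exponent) pairs.

```python
import zlib


def derive_base(data: bytes, round_no: int) -> int:
    """Map the input to a base above 10 that is not a power of two."""
    word = zlib.crc32(data + round_no.to_bytes(4, "big"))  # assumed 32-bit source
    base = word % 100 + 11            # modulus 100, then shift into 11..110
    if base & (base - 1) == 0:        # 16, 32, or 64
        base -= 1
    return base


def floor_log(value: int, base: int) -> int:
    """Floor of log_base(value) for value >= 1, without floating point."""
    exponent = 0
    power = base
    while power <= value:
        power *= base
        exponent += 1
    return exponent


def decompose_by_powers(data: bytes, threshold: int = 1000):
    """Subtract base**exponent terms until the remainder is at most threshold."""
    remainder = int.from_bytes(data, "big")
    terms = []
    round_no = 0
    while remainder > threshold:
        base = derive_base(data, round_no)
        exponent = floor_log(remainder, base)
        remainder -= base ** exponent
        terms.append((base, exponent))
        round_no += 1
    return terms, remainder
```

Since `base ** exponent` never exceeds the current remainder, the sum of all recorded powers plus the final remainder equals the original integer. The integer `floor_log` avoids the rounding errors a floating-point logarithm would introduce for very large values.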
Claim 11: The device as defined in claim 1 wherein said decoding of data further comprises the steps of: reading metadata including information on any padding; reading encrypted bytes of the current segment of input; using a password and cipher algorithm to decrypt bytes; reading the plain bytes of the current segment; reconfiguring decrypted and plain bytes into a contiguous memory layout; initializing a push down stack; parsing input bytes into numbers, points, and operators while evaluating the traversal information using push and pop operations; removing any padding information for conversion to raw data; appending the data to output file; repeating the preceding steps until all input segments have been processed.
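The stack-based evaluation step of claim 11 can be illustrated with a small sketch. It assumes the traversal information has already been decrypted and parsed into a postfix token list, that points have been reduced to their integer values, and that only `+` and `*` operators occur; the token format and function name are hypothetical, and decryption and padding removal are left out.

```python
import operator

# Assumed operator set; the claim does not enumerate the operators used.
OPERATORS = {"+": operator.add, "*": operator.mul}


def evaluate_postfix(tokens) -> int:
    """Evaluate a postfix token list with a push-down stack."""
    stack = []
    for token in tokens:
        if isinstance(token, int):
            stack.append(token)              # operand: push
        else:
            right = stack.pop()              # operator: pop two operands
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


# (8 * 3) + (4 * 5) + 2
tokens = [8, 3, "*", 4, 5, "*", "+", 2, "+"]
value = evaluate_postfix(tokens)  # 46
```

A postfix layout is one natural serialization of a binary expression tree, because it can be evaluated in a single left-to-right pass without rebuilding the tree.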
Claim 12: The device as defined in claim 1 wherein said obfuscation and protection of data using extremely large integers further comprises the steps of: defining a binary grid characterized by diagonal lines having powers-of-two origins; converting raw input to a large integer by applying padding if necessary; initializing a referential cell to have x coordinate equal to the large integer associated with input; selecting a horizontal and a vertical as reference lines in the binary grid; determining diagonal lines passing in the immediate vicinity of the horizontal reference lines; moving the referential cell as close to the horizontal line as possible using the diagonals; writing the origins of the diagonal line in the immediately preceding step to output file; determining the diagonal lines in the immediate vicinity of the vertical reference lines; moving the referential cell as close to the vertical reference as possible using diagonal lines; writing the origins of the diagonal line in the immediately preceding step to output file; repeating the process of moving the referential cell closer to horizontal and vertical lines until it is positioned directly over a marker; writing the quadruple representing the marker to output file.
Claim 13: The device as defined in claim 1 wherein said decoding of data further comprises the steps of: reading data from the input file; interpreting the first four numbers as the constituents of a quadruple representing a point-of-intersection; initializing the referential cell to the coordinates of the point-of-intersection; updating the referential cell’s y coordinate to lie on next diagonal line; updating the referential cell’s x coordinate to lie on next diagonal line; repeating the steps of updating the y coordinate and x coordinate of referential cell until there is no more data to be processed; converting the x-coordinate of referential cell to raw data by removing any padding.
Claim 14: A memory for storing data for access by software being executed on an information processing system, comprising: a data structure stored in the memory, the data structure including a list of positive numbers in sorted order and a second data structure as an array such that the value at any index of the second data structure helps locate, in constant time O(1), the immediate successor of that index in the list of sorted integers.
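One way to picture the pair of data structures in claim 14 is the sketch below. It assumes that "immediate successor" means the smallest listed value strictly greater than the index, and that the second array stores positions into the sorted list; the function name and the use of `None` for "no successor" are illustrative choices.

```python
def build_successor_index(sorted_values: list[int]) -> list:
    """For each i in 0..max, store the position of the smallest value > i."""
    limit = sorted_values[-1]
    successor = [None] * (limit + 1)
    position = None                     # position of the current successor
    j = len(sorted_values) - 1
    for i in range(limit, -1, -1):
        # Step left through the list while its values are still greater than i.
        while j >= 0 and sorted_values[j] > i:
            position = j
            j -= 1
        successor[i] = position
    return successor


values = [3, 7, 12]
successor = build_successor_index(values)
next_after_5 = values[successor[5]]  # 7, found with a single array read
```

The table costs memory proportional to the largest value in the list, which is the price of the constant-time lookup; a binary search over the sorted list alone would need no extra memory but takes logarithmic time.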
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA3120417A CA3120417A1 (en) | 2018-11-20 | 2019-02-13 | Data security and obfuscation using extremely large integers |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IN201841043588 | 2018-11-20 | ||
IN201841043588 | 2018-11-20 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020104858A1 (en) | 2020-05-28 |
Family
ID=70773536
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/IB2019/051144 WO2020104858A1 (en) | 2018-11-20 | 2019-02-13 | Data security and obfuscation using extremely large integers |
Country Status (2)
Country | Link |
---|---|
CA (1) | CA3120417A1 (en) |
WO (1) | WO2020104858A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9021263B2 (en) * | 2012-08-31 | 2015-04-28 | Cleversafe, Inc. | Secure data access in a dispersed storage network |
US9245148B2 (en) * | 2009-05-29 | 2016-01-26 | Bitspray Corporation | Secure storage and accelerated transmission of information over communication networks |
2019
- 2019-02-13 CA CA3120417A patent/CA3120417A1/en active Pending
- 2019-02-13 WO PCT/IB2019/051144 patent/WO2020104858A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CA3120417A1 (en) | 2020-05-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP2020532771A (en) | | High-precision privacy protection real-valued function evaluation |
US10778410B2 (en) | | Homomorphic data encryption method and apparatus for implementing privacy protection |
CN108829899B (en) | | Data table storage, modification, query and statistical method |
JP6044738B2 (en) | | Information processing apparatus, program, and storage medium |
Wu et al. | | Secure and efficient outsourced k-means clustering using fully homomorphic encryption with ciphertext packing technique |
Mahdi et al. | | Secure similar patients query on encrypted genomic data |
CN105814833A (en) | | Secure data transformations |
Goodrich et al. | | Data-oblivious graph drawing model and algorithms |
Shatilov et al. | | Solution for secure private data storage in a cloud |
Karresand et al. | | Using ntfs cluster allocation behavior to find the location of user data |
WO2020104858A1 (en) | | Data security and obfuscation using extremely large integers |
JPWO2011013463A1 (en) | | Range search system, range search method, and range search program |
Basu et al. | | Asymptotic normality of scrambled geometric net quadrature |
WO2020145340A1 (en) | | Secret array access device, secret array access method, and program |
Hendrix et al. | | On perturbation theory and an algorithm for maximal clique enumeration in uncertain and noisy graphs |
JP4789536B2 (en) | | Data division apparatus, data division method, and computer program |
Hu et al. | | New pseudo-planar binomials in characteristic two and related schemes |
CN107667368B (en) | | System, method and storage medium for obfuscating a computer program |
US11281688B2 (en) | | Ranking and de-ranking data strings |
JPWO2019208486A1 (en) | | Secret Aggregation Median System, Secret Computing Unit, Secret Aggregation Median Method, and Program |
JP2020095437A (en) | | Clustering device, clustering method, and clustering program |
JP4924177B2 (en) | | Program obfuscation device and program |
JP2019184852A (en) | | Data analysis server, data analysis system, and data analysis method |
Matula et al. | | A p× p bit fraction model of binary floating point division and extremal rounding cases |
JP7173328B2 (en) | | Secure division system, secure computing device, secure division method, and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19886193; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 3120417; Country of ref document: CA |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 19886193; Country of ref document: EP; Kind code of ref document: A1 |