WO2020104858A1: Data security and obfuscation using extremely large integers
 Publication number
 WO2020104858A1 (application PCT/IB2019/051144)
 Authority
 WO
 WIPO (PCT)
Classifications

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06F—ELECTRIC DIGITAL DATA PROCESSING
 G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
 G06F21/60—Protecting data
 G06F21/602—Providing cryptographic facilities or services

 H—ELECTRICITY
 H04—ELECTRIC COMMUNICATION TECHNIQUE
 H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
 H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
 H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
 H04L9/0894—Escrow, recovery or storing of secret information, e.g. secret key escrow or cryptographic key storage
Abstract
A computer-readable device comprising instructions to perform data protection and obfuscation by utilizing the capabilities of modern computing systems to perform mathematical operations on extremely large integers. The computing device can be used for partial encryption and protection of disk files and cloud data. The device comprises instructions to convert raw data to a large integer, followed by a series of mathematical transformations that break the large integer down into much simpler components. The components are integers that can expand to large numbers when combined with mathematical functions. A binary expression tree is created to represent the mathematical decomposition of the input. A subset of the post-order traversal of this expression tree is encrypted with a cipher and the result recorded to disk. Large data is broken down into sequences of integers. The original data is recovered through a stack-based evaluation of the traversal information.
Description
Data Security and Obfuscation using Extremely Large Integers
PREAMBLE TO THE DESCRIPTION
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
Software tools for data security and obfuscation. Specifically, a method for protecting data using fast manipulations of extremely large integers.
BACKGROUND
Data obfuscation refers to a general transformation of data, whereas data encryption is a transformation of data with respect to a secret key.
The current state of the art in data security relies heavily on byte-based manipulation of data. Data transformation in symmetric encryption algorithms is achieved by repeated rounds of data substitution using lookup tables. Similarly, asymmetric encryption uses ordinary integers for modular arithmetic. This results in a very slow process for mangling and deciphering large amounts of data.
The common techniques for data obfuscation are masking, substitutions, nulling, data shuffling and mapping rules. These techniques work on individual fields of data without affecting the underlying structure and relationships between data items.
SUMMARY
A computing device that deals with a multitude of mechanisms for transforming raw data into non-decipherable forms using large integer arithmetic. Very large data is broken down into smaller segments, with each segment being treated as a large integer. Each large integer corresponding to a segment is transformed into a sum-of-products format using a deterministic function of the input. Such decompositions of large integers by application software require compact representations of large integers.
During the process of decomposing data, a binary expression tree is created for each segment of data. A post-order traversal of each expression tree is performed. By this stage, the data has assumed a form that is significantly different from its original structure. The next step is to secure the data by encrypting an initial portion of each segment using a key. The partially encrypted contents of consecutive segments are appended one after another to an output file. Such partial encryption prevents recalculation of the original value.
The reverse of the above process is used to reconstruct the original input: a password is used to decrypt the partially encrypted components, followed by evaluation and concatenation of the individual segments.
DESCRIPTION OF DIAGRAMS
Figure 1
This figure introduces a two-dimensional grid characterized by a horizontal axis increasing from left to right and a vertical axis increasing from top to bottom.
The horizontal axis is marked at powers-of-two positions and can extend indefinitely in positive and negative directions.
The vertical axis is also marked at powers-of-two positions and can extend indefinitely in positive and negative directions.
Each cell contains the product of its x and y coordinates.
Figure 2
This figure shows positive-slope lines originating from powers-of-two positions within the grid.
Any cell (x, y) that has both x and y as powers-of-two values with x > 1 and y > 1 can be the origin of a positive-slope line. Positive-slope lines have increasing y coordinates when moving from left to right.
Figure 3
This figure shows negative-slope lines originating from powers-of-two positions within the grid.
Any cell (x, y) that has both x and y as powers-of-two values with x > 1 and y > 1 can be the origin of a negative-slope line. Negative-slope lines have decreasing y coordinates when moving from left to right.
Figure 4
This figure shows the points of intersection of positive-slope and negative-slope lines. Since positive-slope and negative-slope lines can extend indefinitely, there are many possibilities to form points of intersection.
The tuple <+v1, -h3> represents the negative-slope line originating from point (2, 8) and extending indefinitely in the positive direction.
The tuple <-v1, +h3> represents the negative-slope line originating from point (2, 8) and extending indefinitely in the negative direction.
The tuple <+v1, +h3> represents the positive-slope line originating from point (2, 8) and extending indefinitely in the positive direction.
The tuple <-v1, -h3> represents the positive-slope line originating from point (2, 8) and extending indefinitely in the negative direction.
Figure 5
This figure (shown in landscape) is a 256 x 256 plot of the binary grid depicting horizontals, verticals, diagonal lines, and points of intersection.
The region bounded by two vertical lines can be treated as a partition of the grid. So, all points with x coordinates such that 128 <= x < 256 form a partition. Similarly, restricting y values within a partition gives rise to blocks of points:
Block 0 consists of the points within the partition 128 <= x < 256 where 0 < y <= 128.
Block 1 consists of the points within the partition 128 <= x < 256 where 128 < y <= 256.
Block 2 consists of the points within the partition 128 <= x < 256 where 256 < y <= 512.
Block N (negative) consists of the points within the partition 128 <= x < 256 where y <= 0.
This indicates that it is possible to divide the region with positive y coordinates into rectangular blocks of points bounded by lines with power-of-two origins.
Figure 6
This figure shows block 0 and block 1 of partition 128 <= x < 256. Further, within block 0 there are clusters of points of intersection:
The first cluster is formed by the intersection of positive-slope lines from vertical line 128 with negative-slope lines.
The second cluster is formed by the intersection of positive-slope lines from vertical line 64 with negative-slope lines.
The third cluster is formed by the intersection of positive-slope lines from vertical line 32 with negative-slope lines.
The fourth cluster is formed by the intersection of positive-slope lines from vertical line 16 with negative-slope lines.
The fifth cluster is formed by the intersection of positive-slope lines from vertical line 8 with negative-slope lines.
The sixth cluster is formed by the intersection of positive-slope lines from vertical line 4 with negative-slope lines.
The seventh cluster is formed by the intersection of positive-slope lines from vertical line 2 with negative-slope lines.
The highest density of points of intersection for any partition lies in block 0. Further, the left half of block 0 is denser than its right half.
Figure 7
This figure shows the first few blocks of the partition 32 < x <= 64. It shows how diagonal lines get thinner while going deep down into a partition.
Figure 8
Flowcharts 8A, 8B, and 8C combined explain the various steps involved in the XDivision algorithm.
Figure 9
This flowchart explains how to select the divisor and the point of intersection that, upon multiplication, most closely approximate the input value.
Figure 10
This flowchart explains the subroutine for computing the nearest point of intersection less than the input.
Figure 11
This flowchart explains a loop-less technique to find diagonal lines passing in the immediate vicinity of a given cell, with the diagonals having identical x-origins.
Figure 12
This flowchart shows how the XEvaluation algorithm recovers the contents of a partially encrypted file.
DETAILED DESCRIPTION
A computer-readable device containing software that enables treatment of data as sequences of large binary numbers. Data can be transformed into different forms using mathematical operations on extremely large integers. This enables data transformation in a format-invariant manner.
XDivision is an algorithm that can decompose large integers using a variety of mathematical operations related to exponentials, divisions, Mersenne numbers, and large numbers identified as points of intersection in a grid.
XEvaluation is an algorithm to recover data by parsing numbers and operators for use with a stack-based evaluation (see Figure 12).
Thus, the input value may be transformed using any possible combination of mathematical operations like the ones shown in the expressions below:
FX0(input) = factor1 * number1 + factor2 * number2 + ... + factorN * numberN + remainder
Or
FX1(input) = FX0(input) + (2^{m} - 1) / number + k^{j} + remainder, where m, k, and j are 32-bit numbers.
Using a Binary Grid for XDivision
A binary two-dimensional grid as depicted in Figure 1 can be used to impose structure on raw data by using markers in two-dimensional space.
Diagonal lines in the grid (Figure 2 and Figure 3) can be represented by a pair of numbers, where each number is related to a power of two. It suffices to use four numbers, where each number is derived from a power of two, to represent the intersection of two diagonal lines. A point of intersection can be called a marker.
In Figure 4, a positive-slope line originates from point (16, 16) whereas a negative-slope diagonal originates from (16, 2) to intersect at the cell containing the value 207. The cell with the content 207 = 23 x 9 is a marker within the grid.
The value of a point of intersection, or a marker, is taken to be the value of its x-coordinate. The value of the cell containing 207 is 23. The cell (23, 9) can be represented by a tuple <4, 4, 4, 1> using just the exponents from power-of-two values. This allows simplification of certain extremely large integers to quadruples of 32-bit numbers.
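The arithmetic behind the marker (23, 9) can be checked with a short sketch. It assumes diagonals of unit slope (which matches the y-intercept table given later) and assumes that the quadruple <4, 4, 4, 1> lists the exponents of the two origins (16, 16) and (16, 2). With unit slopes, the cell (23, 9) is reached when the line through (16, 16) has decreasing y and the line through (16, 2) has increasing y, so that pairing is what the sketch uses. The function names are hypothetical.

```python
def intersect(rising_origin, falling_origin):
    """Intersect two unit-slope diagonals given by their origins.

    rising:  y = y0 + (x - x0)   (y grows with x)
    falling: y = y1 - (x - x1)   (y shrinks with x)
    Returns the cell (x, y), or None if the lines cross between cells.
    """
    x0, y0 = rising_origin
    x1, y1 = falling_origin
    twice_x = x0 - y0 + x1 + y1      # from y0 + x - x0 == y1 - x + x1
    if twice_x % 2:
        return None
    x = twice_x // 2
    return x, y0 + (x - x0)


def marker_from_exponents(fv, fh, rv, rh):
    """Assumed reading of a quadruple: (fv, fh) are the exponents of the
    falling line's origin, (rv, rh) those of the rising line's origin."""
    return intersect((2 ** rv, 2 ** rh), (2 ** fv, 2 ** fh))


cell = marker_from_exponents(4, 4, 4, 1)
print(cell, cell[0] * cell[1])
```

Under these assumptions the quadruple expands to the cell (23, 9), whose content is 23 x 9 = 207 and whose marker value is 23.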
A nested sequence of grids can be established by basing inner grids at marker positions within the preceding grids thereby giving rise to more complex schemes for data mangling.
The XDivision algorithm comprises selecting divisors and computing markers to generate a mathematical sum-of-products decomposition of data (see Figure 8A, Figure 8B, and Figure 8C).
A list of potential divisors is examined to find the divisor that most closely approximates the current input. An integer quotient is obtained by dividing the current remainder by the candidate divisor. Next, the point of intersection nearest to this integer quotient is computed. The x-value of the point of intersection is multiplied by the candidate divisor to get the first approximation. This process is repeated for the other divisors in the list to arrive at the best approximation (see Figure 9). The remainder is updated by subtracting the best approximation. Repeating this process until the remainder becomes small leads to a sum of products: marker1 * divisor1 + marker2 * divisor2 + ... + remainder = input.
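The greedy loop described above can be sketched as follows. This is a minimal illustration, not the patented marker computation: `nearest_marker` is a stand-in that simply returns the largest power of two not exceeding the quotient, and the names `best_term`, `decompose`, and the threshold `small` are assumptions introduced for the example.

```python
def nearest_marker(q):
    """Stand-in (assumption): largest power of two <= q, for q >= 1.
    The grid-based point-of-intersection search would go here."""
    return 1 << (q.bit_length() - 1)


def best_term(remainder, divisors):
    """Try every divisor and keep the (marker, divisor) pair whose
    product comes closest to the remainder without exceeding it."""
    best = None
    for d in divisors:
        q = remainder // d
        if q == 0:                 # divisor larger than the remainder
            continue
        m = nearest_marker(q)
        approx = m * d
        if best is None or approx > best[2]:
            best = (m, d, approx)
    return best


def decompose(value, divisors, small=16):
    """Return ([(marker, divisor), ...], remainder) such that
    sum(marker * divisor) + remainder == value."""
    terms = []
    remainder = value
    while remainder >= small:
        best = best_term(remainder, divisors)
        if best is None:           # no divisor fits any more
            break
        m, d, approx = best
        terms.append((m, d))
        remainder -= approx
    return terms, remainder


terms, rem = decompose(2 ** 200 + 12345, [3, 5, 7, 11, 13])
print(len(terms), rem)
```

Because each approximation never exceeds the current remainder, the remainder stays non-negative and the sum of products plus the final remainder reproduces the input exactly.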
To speed up the checking of divisors, a reverse lookup data structure is utilized. A reverse lookup table contains pointers into a list of divisors; these pointers help in skipping over some divisors. Figure 5 and Figure 6 show large gaps between points of intersection. Divisors that always produce quotients lying within the same gap can be skipped over with the help of reverse indices.
Figure 7 shows even more places for data obfuscation: there are points of intersection in deeper blocks that have the same value as some markers in upper blocks, including the negative region. A data mangling algorithm can use a mapping scheme to replace equivalent marker points.
Alternatively, XDivision can construct divisors on the fly by selecting bits from random positions of the input value. If 'r' denotes the number of bits in some binary input x, 'p' the number of bits in the divisor, and 'q' the number of bits in the quotient, then |p + q - r| <= 1. This permits construction of divisors of different sizes.
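The bit-count relation can be checked numerically. The sketch below assumes integer (floor) division and a divisor no larger than the input; the helper names are hypothetical.

```python
import random


def bit_counts(x, divisor):
    """Return (p, q, r): bit counts of the divisor, quotient and input."""
    return divisor.bit_length(), (x // divisor).bit_length(), x.bit_length()


def relation_holds(x, divisor):
    p, q, r = bit_counts(x, divisor)
    return abs(p + q - r) <= 1


def random_check(trials=1000, seed=0):
    """Test the relation on random 256-bit inputs and smaller divisors."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.getrandbits(256) | (1 << 255)             # force 256 bits
        width = rng.randint(1, 200)
        d = rng.getrandbits(width) | (1 << (width - 1))   # force `width` bits
        if not relation_holds(x, d):
            return False
    return True


print(bit_counts(207, 9), random_check())
```

For example, dividing the 8-bit value 207 by the 4-bit divisor 9 gives the 5-bit quotient 23, so p + q - r = 1.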
Another way of utilizing a binary grid is to use distances between diagonal lines to break down a large integer into a sum of parts. A pair of horizontal and vertical lines can be made to guide a moving cell until it is directly over a marker. This is done by taking the cell as close to the horizontal and vertical lines as possible. The output file will only record the origins of the diagonal lines and the marker. Decoding is achieved through reversal of these steps.
Data obfuscation can also be done with Mersenne numbers, which are integers with all 1's in their binary representation. A sufficiently large number can be constructed by dividing a Mersenne value by a 32-bit number. This number can be used in arithmetic operations along with the current remainder.
Exponential values provide yet another way for data mangling. This is done by constructing a number k^{j} where the base 'k' is derived from the input and the exponent 'j' is determined using logarithmic and floor calculations.
The preferred embodiment of this device incorporates exponential numbers, Mersenne numbers, divisors constructed from input values, divisor lists, and references to points of intersection and diagonal lines in a grid.
Calculation of the Nearest Marker
Consider the partition [2^{50}, 2^{51}) which has the vertical line 2^{50} as its left boundary and the vertical line 2^{51} as its right boundary.
Block 0 is defined as all points for which 0 < y < 2^{50} and 2^{50} <= x < 2^{51}.
The positive-slope lines entering block 0 have origins in the vertical lines 2^{50}, 2^{49}, 2^{48}, ..., 2. The regular expression "<[+-]?v[0-9]+, [+-]?h[0-9]+>" incorporates vertical and horizontal origins of diagonal lines, with the numbers standing for the exponent value of the origin when expressed in a power-of-two form.
The set of negative-slope lines in block 0 is N = {<v50, h1>, <v50, h2>, ..., <v50, h49>, <v50, h50>, <v51, h1>, <v51, h2>, <v51, h3>, <v51, h4>, ..., <v51, h48>, <v51, h49>}.
The set of positive-slope lines in block 0 originating from the vertical 2^{50} is {<v50, h1>, <v50, h2>, <v50, h3>, ..., <v50, h49>}.
The set of positive-slope lines in block 0 originating from the vertical 2^{49} is {<v49, h1>, <v49, h2>, <v49, h3>, ..., <v49, h48>}.
The set of positive-slope lines in block 0 originating from the vertical 2^{2} is {<v2, h1>, <v2, h2>, <v2, h3>, ..., <v2, h48>}.
The set of positive-slope lines in block 0 originating from the vertical 2^{1} is {<v1, h1>, <v1, h2>, <v1, h3>, ..., <v1, h48>}.
For a given cell (x, y) in block 0, the nearest neighboring positive-slope lines can be determined by projecting the cell onto the nearest vertical on its left, in the direction parallel to positive lines. Then the y-intercept of this line is evaluated in comparison to the intercepts of diagonal lines:
Positive Line    y-intercept                                      Simplified y-intercept
<v50, h1>        0 + 2^{1}                                        2
<v50, h2>        0 + 2^{2}                                        4
<v50, h3>        0 + 2^{3}                                        8
<v50, h49>       0 + 2^{49}                                       2^{49}
<v49, h1>        2^{49} + 2^{1}                                   (2^{50} - 2^{49}) + 2
<v49, h2>        2^{49} + 2^{2}                                   (2^{50} - 2^{49}) + 2^{2}
<v49, h3>        2^{49} + 2^{3}                                   (2^{50} - 2^{49}) + 2^{3}
<v49, h48>       2^{49} + 2^{48}                                  (2^{50} - 2^{49}) + 2^{48}
<v48, h1>        2^{49} + 2^{48} + 2^{1}                          (2^{50} - 2^{48}) + 2^{1}
<v48, h2>        2^{49} + 2^{48} + 2^{2}                          (2^{50} - 2^{48}) + 2^{2}
<v48, h3>        2^{49} + 2^{48} + 2^{3}                          (2^{50} - 2^{48}) + 2^{3}
<v48, h47>       2^{49} + 2^{48} + 2^{47}                         (2^{50} - 2^{48}) + 2^{47}
<v3, h1>         (2^{49} + 2^{48} + 2^{47} + ... + 2^{3}) + 2^{1}  (2^{50} - 2^{3}) + 2^{1}
<v3, h2>         (2^{49} + 2^{48} + 2^{47} + ... + 2^{3}) + 2^{2}  (2^{50} - 2^{3}) + 2^{2}
<v2, h1>         (2^{49} + 2^{48} + 2^{47} + ... + 2^{2}) + 2^{1}  (2^{50} - 2^{2}) + 2
<v1, h1>         2^{50}                                           2^{50}
Let yIntercept > 0 denote the y-intercept of a cell when projected onto its nearest left vertical. Define y0 = 2^{50} - yIntercept and rIndex = ceil(log2(y0)). From the expressions above, 2^{rIndex} is the vertical. Define y1 = 2^{rIndex} - y0 to calculate fIndex = floor(log2(y1)) and cIndex = ceil(log2(y1)). 2^{fIndex} is the lower bound line and 2^{cIndex} is the upper bound line (see Figure 11).
The above technique gives the nearest positive-slope lines around a cell. Similarly, the nearest negative-slope lines around a cell are determined by taking projections onto the nearest vertical.
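The loop-less bounding-line calculation can be illustrated on a small partition. The sketch below generalizes the left vertical 2^{50} to 2^{n} (an assumption made so the numbers stay readable) and uses exact integer logarithms via `int.bit_length` instead of floating-point `log2`. The function names are hypothetical, and the case y1 = 0, which the text does not address, is rejected.

```python
def floor_log2(n):
    """floor(log2(n)) for n >= 1, using exact integer arithmetic."""
    return n.bit_length() - 1


def ceil_log2(n):
    """ceil(log2(n)) for n >= 1, using exact integer arithmetic."""
    return (n - 1).bit_length()


def line_intercept(v, h, n):
    """y-intercept on the vertical 2**n of the positive line <v, h>,
    following the table: (2**n - 2**v) + 2**h."""
    return (2 ** n - 2 ** v) + 2 ** h


def bounding_positive_lines(y_intercept, n):
    """Return (rIndex, fIndex, cIndex) for 0 < y_intercept < 2**n."""
    y0 = 2 ** n - y_intercept
    r_index = ceil_log2(y0)
    y1 = 2 ** r_index - y0
    if y1 == 0:
        raise ValueError("y1 == 0 is not covered by the description")
    return r_index, floor_log2(y1), ceil_log2(y1)


r, f, c = bounding_positive_lines(138, 8)
print(r, f, c, line_intercept(r, f, 8), line_intercept(r, c, 8))
```

With n = 8 and a projected y-intercept of 138, the result is rIndex = 7, fIndex = 3, cIndex = 4: the cell lies between the lines <v7, h3> (intercept 136) and <v7, h4> (intercept 144).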
Figure 6 and Figure 7 indicate that the nearest marker less than an input value must be located deeper down block 0; the positive-slope lines are more numerous and closer together in the deeper region of block 0. Select the negative-slope line that is farthest down in block 0 as the line on which the nearest point of intersection is located. The cell on this negative-slope line that has the same x coordinate as the input is used as a reference point to compute the upper and lower bounding positive-slope lines. The point of intersection of the negative-slope line with the upper bounding positive-slope line gives the nearest marker (see Figure 10).
XDivision Algorithm
Step 1: Load the list of divisors and reverse indices table from data files.
Step 2: Determine the size of the next data segment.
Step 3: Read raw data from the input file.
Step 4: If the data read can be represented by a large integer, then go to Step 6.
Step 5: Apply padding to convert raw data to an extremely large integer.
Step 6: Initialize remainder to the large integer representing the data.
Step 7: Set current approximation to 0.
Step 8: Initialize expression tree for storing numbers, operators, and markers.
Step 9: If the remainder is small, go to Step 14.
Step 10: Find the marker and divisor that most closely match the remainder.
Step 11: Insert marker, divisor, and operator nodes into the expression tree.
Step 12: Set remainder = remainder - (point-of-intersection * divisor).
Step 13: Go to Step 9.
Step 14: Add remainder and operator nodes to the expression tree.
Step 15: Perform post-order traversal of the expression tree.
Step 16: Encrypt a prefix of traversal results with a known cipher.
Step 17: Reconfigure encrypted and obfuscated data into contiguous memory.
Step 18: Write partially encrypted buffer to output file.
Step 19: Repeat Steps 2 through 18 for each segment of the input file.
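Steps 8, 11, 14, and 15 can be sketched with a small binary expression tree. The `Node` class and the function names are hypothetical, the tree shape (a left-leaning chain of additions) is an assumption, and the encryption of the traversal prefix in Step 16 is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Node:
    value: Union[int, str]            # an operand, or an operator such as "+"
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(terms, remainder):
    """Build marker1*divisor1 + marker2*divisor2 + ... + remainder
    from a list of (marker, divisor) pairs."""
    tree = None
    for marker, divisor in terms:
        product = Node("*", Node(marker), Node(divisor))
        tree = product if tree is None else Node("+", tree, product)
    leaf = Node(remainder)
    return leaf if tree is None else Node("+", tree, leaf)


def post_order(node):
    """Left subtree, right subtree, then the node itself (postfix order)."""
    if node is None:
        return []
    return post_order(node.left) + post_order(node.right) + [node.value]


print(post_order(build_tree([(4, 5), (2, 3)], 1)))
```

For the terms (4, 5) and (2, 3) with remainder 1 the traversal is 4 5 * 2 3 * + 1 +, which evaluates to 27. A recursive traversal is adequate for short term lists; a very long decomposition would call for an iterative one.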
Sub-Algorithm for Determining Point-of-Intersection and Divisor
Step 1: Select the first divisor from the list of divisors.
Step 2: Calculate the integer quotient obtained by dividing remainder by divisor.
Step 3: Compute the nearest point-of-intersection that is less than the quotient.
Step 4: Multiply point-of-intersection with divisor to get a new approximation.
Step 5: If the new approximation is better than the existing one, then go to Step 7.
Step 6: Go to Step 8.
Step 7: Save point-of-intersection, divisor, and approximation.
Step 8: Repeat Step 2 through Step 5 for the next divisor in the list.
Sub-Algorithm to Find the Nearest Point of Intersection
Step 1: Compute cIndex = ceiling(log2(x)) for large integer x.
Step 2: Compute fIndex = floor(log2(x)) for large integer x.
Step 3: Compute origin of nearest vertical on right side as 2^{cIndex}.
Step 4: Compute origin of nearest vertical on left side as 2^{fIndex}.
Step 5: Project referential cell (x, x) onto right vertical along negative lines.
Step 6: Compute y-intercept = x - (2^{cIndex} - x) as the intercept from Step 5.
Step 7: Compute rIndex = ceil(log2(y-intercept)).
Step 8: Compute the negative-slope line as having the origin: (2^{cIndex}, 2^{rIndex}).
Step 9: Compute the cell (x, y) located on the negative-slope line of Step 8.
Step 10: Project cell (x, y) onto the vertical on left along positive-slope lines.
Step 11: Use y-intercept with mathematical operations to compute positive lines.
Step 12: Select the positive-slope line having the lesser y-intercept.
Step 13: Compute the point-of-intersection of positive-slope and negative-slope lines.
Sub-Algorithm to Find the Nearest Positive-Slope Lines
Step 1: Calculate distance of cell (x, y) from the nearest vertical line on its left.
Step 2: Compute y-intercept by subtracting distance from y value of cell (x, y).
Step 3: Define y0 = x-origin-of-vertical - yIntercept.
Step 4: Define rIndex = ceil(log2(y0)).
Step 5: 2^{rIndex} is the x-origin of the positive-slope line.
Step 6: Define y1 = 2^{rIndex} - y0.
Step 7: Compute fIndex = floor(log2(y1)) and cIndex = ceil(log2(y1)).
Step 8: 2^{fIndex} and 2^{cIndex} are the y-origins of the positive-slope lines.
Algorithm for Building a Reverse Indices Table
Step 1: Read divisors 'listDivisors' from a file.
Step 2: Let numDivisors be the count of divisors in listDivisors.
Step 3: Set rTable[0] = 0, rTable[1] = 0, rTable[2] = 1.
Step 4: Set tIndex = 3.
Step 5: If (tIndex > (numDivisors - 1)), then go to Step 12.
Step 6: If (tIndex is in listDivisors), then go to Step 9.
Step 7: rTable[tIndex] = rTable[tIndex - 1].
Step 8: Go to Step 10.
Step 9: rTable[tIndex] = rTable[tIndex - 1] + 1.
Step 10: Set tIndex = tIndex + 1.
Step 11: Go to Step 5.
Step 12: Return from subroutine.
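The table-building steps translate directly into a loop. The sketch below follows the steps as written, including the loop bound of numDivisors - 1; the function name is hypothetical, and the fixed initial entries presuppose that 2 is the first divisor in the list.

```python
def build_reverse_table(divisors):
    """Return rTable, where rTable[t] counts the divisors <= t
    (for t >= 2), filled in up to index len(divisors) - 1."""
    num_divisors = len(divisors)
    members = set(divisors)            # constant-time membership test
    r_table = [0, 0, 1]                # Step 3: rTable[0], rTable[1], rTable[2]
    for t_index in range(3, num_divisors):
        step = 1 if t_index in members else 0
        r_table.append(r_table[-1] + step)
    return r_table


print(build_reverse_table([2, 3, 5, 7, 11, 13]))
```

For the divisor list 2, 3, 5, 7, 11, 13 the table is [0, 0, 1, 2, 2, 3]: entry 4 repeats entry 3 because 4 is not a divisor, so a lookup at 4 points at the same position in the divisor list as a lookup at 3.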
Algorithm for Constructing a 32-bit Divisor
Step 1: Let numBits be the count of bits in the input.
Step 2: Set tResult = 0.
Step 3: Repeat Step 4 through Step 7 exactly 32 times.
Step 4: Get 4 bytes of random data and construct an integer 'tValue'.
Step 5: Calculate bitPosition as 'tValue mod numBits'.
Step 6: Extract bit value at position bitPosition.
Step 7: Insert bit value into rightmost bit position of tResult.
Step 8: Calculate and return absolute value of tResult.
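A minimal sketch of the divisor construction follows. It assumes `os.urandom` as the source of random bytes and big-endian interpretation of the 4 random bytes; the `rand_bytes` parameter is a hypothetical hook added so the behavior can be reproduced in tests. The sketch can return 0, which a caller would need to reject before dividing; the steps above do not say how that case is handled.

```python
import os


def build_divisor(value, rand_bytes=os.urandom):
    """Assemble a 32-bit integer from bits of `value` taken at random
    positions (Steps 1 through 8)."""
    num_bits = value.bit_length()
    if num_bits == 0:
        raise ValueError("input must be a positive integer")
    t_result = 0
    for _ in range(32):
        t_value = int.from_bytes(rand_bytes(4), "big")
        bit_position = t_value % num_bits
        bit = (value >> bit_position) & 1
        # Shift left and place the new bit in the rightmost position.
        t_result = (t_result << 1) | bit
    # abs() matters for signed 32-bit types; Python ints are already >= 0 here.
    return abs(t_result)


print(hex(build_divisor((1 << 100) - 1)))
```

An input consisting only of 1 bits always yields 0xffffffff, whatever positions are drawn.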
Algorithm to Obfuscate Data Using Diagonal Lines
Step 1: Convert raw data to a large integer by applying padding if necessary.
Step 2: Select horizontal and vertical axes as reference lines.
Step 3: If there is a marker along x value, then go to Step 14.
Step 4: Set referential cell 'rCell' to (x, 0) for input value of x.
Step 5: Determine negative-slope lines in the immediate vicinity of 'rCell'.
Step 6: Select lower negative line, i.e. the line with lower y-intercept.
Step 7: Update 'rCell' to position (x, y) on the lower negative line.
Step 8: Write the origins of negative line to output file.
Step 9: Determine negative lines around cell (0, y) where y is from 'rCell'.
Step 10: Select greater negative line, i.e. the line with greater y-intercept.
Step 11: Update 'rCell' to position (x, y) on the greater negative line.
Step 12: Write the origins of greater negative line to output file.
Step 13: Go to Step 3.
Step 14: Write the quadruple representation of marker to output file.
Algorithm to Decode Data from References to Diagonal Lines
Step 1: Read encoded data from input file.
Step 2: Interpret the first four numbers as origins of a point-of-intersection.
Step 3: Initialize referential cell to the coordinates of the point-of-intersection.
Step 4: Update referential cell's y-coordinate to lie on next diagonal line.
Step 5: Update referential cell's x-coordinate to lie on next diagonal line.
Step 6: Repeat Step 4 and Step 5 until there is no more data to be processed.
Step 7: Convert x origin of referential cell to raw data by removing any padding.
XEvaluation Algorithm
Step 1: Read metadata and encrypted bytes of current segment.
Step 2: Apply password for conversion of bytes to plain data.
Step 3: Read obfuscated bytes into separate memory buffer.
Step 4: Reconfigure encrypted and obfuscated bytes into a contiguous buffer.
Step 5: Initialize push-down stack for postfix evaluation.
Step 6: Parse into numbers and operators while performing evaluation.
Step 7: Remove padding information from the result of evaluation.
Step 8: Append plain data to output file.
Step 9: Repeat Step 1 through Step 8 for each segment of the input file.
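Steps 5 and 6 amount to a standard postfix evaluation with a push-down stack. The sketch below assumes the traversal has already been decrypted and parsed into a token list of integers and operator strings; the token format and the set of operators are assumptions made for illustration.

```python
def evaluate_postfix(tokens):
    """Evaluate a postfix token list such as [4, 5, "*", 1, "+"]."""
    operators = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "//": lambda a, b: a // b,     # integer division (Mersenne quotient)
        "**": lambda a, b: a ** b,     # exponentiation (k^j terms)
    }
    stack = []
    for token in tokens:
        if token in operators:
            right = stack.pop()        # operands come off in reverse order
            left = stack.pop()
            stack.append(operators[token](left, right))
        else:
            stack.append(int(token))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


print(evaluate_postfix([4, 5, "*", 2, 3, "*", "+", 1, "+"]))
```

The example evaluates 4 * 5 + 2 * 3 + 1 and prints 27. Python integers have arbitrary precision, so the same loop works unchanged for operands with thousands of bits.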
Algorithm for Obfuscating Data Using Mersenne Numbers
Step 1: Let 'numBits' be the count of bits in the input.
Step 2: Let 'tDivisor' be a 32-bit integer derived from input.
Step 3: Construct a Mersenne number 'MValue' = 2^{numBits + 32 + 2} - 1.
Step 4: Set 'quotient' = 'MValue' / 'tDivisor'.
Step 5: Set 'remainder' = 'quotient' - 'input'.
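The Mersenne steps can be sketched as a round trip. The sketch assumes integer (floor) division in Step 4 and subtraction in Step 5, and it assumes that numBits and tDivisor are stored alongside the remainder so the value can be recovered. The recovery function is an illustration that is not spelled out in the text.

```python
def mersenne_obfuscate(value, t_divisor):
    """Return (numBits, remainder) for a positive `value` and a
    nonzero divisor below 2**32."""
    num_bits = value.bit_length()
    m_value = (1 << (num_bits + 32 + 2)) - 1     # all 1's in binary
    quotient = m_value // t_divisor
    remainder = quotient - value                 # positive: quotient has more bits
    return num_bits, remainder


def mersenne_recover(num_bits, t_divisor, remainder):
    """Rebuild the same quotient and undo the subtraction."""
    m_value = (1 << (num_bits + 32 + 2)) - 1
    return m_value // t_divisor - remainder


secret = int.from_bytes(b"extremely large integers", "big")
bits, rem = mersenne_obfuscate(secret, 0x9E3779B1)
print(mersenne_recover(bits, 0x9E3779B1, rem) == secret)
```

Because the Mersenne value has 34 more bits than the input and the divisor has at most 32, the quotient always exceeds the input, so the remainder stays positive.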
Algorithm for Obfuscating Data Using Exponential Numbers
Step 1: Generate a 32-bit number 'k' derived from bit positions of the input.
Step 2: Set k = k mod 100.
Step 3: Set k = k + 10.
Step 4: If k is a power of two, then k = k - 1.
Step 5: Compute j = floor(log_{k}(x)) = floor(log2(x)/log2(k)).
Step 6: Set remainder = remainder - k^{j}.
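The exponential steps can be sketched as follows. The 32-bit number from Step 1 is passed in as `seed` (how it is derived from the input bits is not modeled here), and the exponent is found by repeated multiplication rather than floating-point logarithms, which lose precision for very large x. The function names are hypothetical.

```python
def derive_base(seed):
    """Steps 2 through 4: map a 32-bit number to a base in 10..109
    that is not a power of two."""
    k = seed % 100 + 10
    if k & (k - 1) == 0:          # power-of-two test
        k -= 1
    return k


def floor_log(x, k):
    """Largest j with k**j <= x, computed with integers only (x >= 1)."""
    j = 0
    power = k
    while power <= x:
        power *= k
        j += 1
    return j


def exponential_step(x, seed):
    """Return (k, j, remainder) with k**j + remainder == x."""
    k = derive_base(seed)
    j = floor_log(x, k)
    return k, j, x - k ** j


print(exponential_step(10 ** 30 + 7, 0))
```

With seed 0 the base is 10, so the input 10^30 + 7 yields j = 30 and a remainder of 7.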
Computing Platforms for XDivision Implementations
XDivision can be implemented on computing platforms that support big integer arithmetic. Microsoft's .NET platform offers the BigInteger class with a wide range of facilities for operating on large integers; Java has its own BigInteger class, while Python's built-in integer type supports arbitrary precision. There are many open source libraries written in C and C++ for arbitrary-precision arithmetic. Such software engines internally use advanced algorithms like the Fast Fourier Transform to speed up multiplication of large integers.
Intel's Skylake-X processors provide support for advanced vectorization AVX-512 instructions for more efficient large integer arithmetic. XDivision can also be deployed on Intel Core processors. Advanced integer arithmetic libraries enjoy support on a wide range of operating systems, from Windows 8.1 and Windows 10 to Mac OS and Linux.
For practical reasons, available memory imposes limits on language libraries for large integers. Most systems restrict the number of bits in a large integer to the maximum value of a 32-bit index. This forces the XDivision algorithm to divide input into manageable chunks. Implementations will keep track of segment boundaries and adhere to a well-defined file format for recording to persistent storage.
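Segmenting input and converting each segment to a large integer can be sketched as below. The padding scheme is an assumption made for illustration: a single 0x01 sentinel byte is prefixed so that leading zero bytes of a segment survive the round trip. The segment size and function names are likewise hypothetical.

```python
def split_segments(data, size):
    """Cut `data` into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def segment_to_int(chunk):
    """Pad with a 0x01 sentinel byte (assumed scheme) and read as an integer."""
    return int.from_bytes(b"\x01" + chunk, "big")


def int_to_segment(number):
    """Inverse of segment_to_int: drop the sentinel byte."""
    raw = number.to_bytes((number.bit_length() + 7) // 8, "big")
    if raw[0] != 1:
        raise ValueError("missing padding sentinel")
    return raw[1:]


data = bytes(range(256)) * 3 + b"\x00\x00"
numbers = [segment_to_int(s) for s in split_segments(data, 100)]
restored = b"".join(int_to_segment(n) for n in numbers)
print(len(numbers), restored == data)
```

Each integer can then be decomposed independently, and concatenating the recovered segments in order restores the original bytes.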
Claims
I Claim
Claim 1: A computerreadable storage device comprising of instructions that upon execution cause a computer to perform obfuscation, protection, and decod ing of data using mathematical operations on extremely large integers.
Claim 2: The device as defined in claim 1 wherein said obfuscation and said protection of data using extremely large integers further comprises the steps of: determining the size of next segment of data from metadata; reading raw data from input file; if necessary, applying padding to convert raw data to a large in teger; decomposing the large integer into a mathematical expression; building a binary expression tree; traversing the binary tree and encrypting an initial por tion of traversal result with a cipher; creating a contiguous memory layout for storing encrypted and obfuscated portions of data segment; writing the partially encrypted segment to persistent storage; repeating the above steps for each data segment.
Claim 3: The device as defined in claim 2 wherein said decomposition of said large integer and said building of binary expression tree further comprises the steps of: imposing structure on the large integer using a grid containing horizon tal, vertical, diagonal lines, and their points of intersection; defining the value of pointofintersection as the value of its x coordinate; loading a predefined set of divisors from a file and building a list of reverse indices; initializing a remainder to the large integer obtained from input; initializing an approximation to zero; initializing the expression tree for representing the decomposition; finding the pointofintersection and the divisor that most closely matches the remainder; in serting the pointofintersection, the divisor, and the operator nodes into the ex pression tree; updating the remainder by subtracting the approximation value; re peating the immediately preceding three steps of finding the marker and the divi sor combination, inserting the operands and the operators into the expression tree, and updating the remainder until the remainder becomes small; adding nodes for the final remainder and the associated operator to the expression tree.
Claim 4: The device as defined in claim 3 wherein said process of computing said pointofintersection and said divisor combination further comprises the steps of: selecting the first divisor from the list of divisors; calculating a quotient as value obtained upon dividing the remainder by the divisor; computing the nearest pointofintersection that is less than the quotient; multiplying the value of the pointofintersection with the divisor for a new approximation; saving the pointofintersection, the divisor, and the new approximation if new approxima tion is better than existing approximation; using a reverse indices table to lookup the next potential divisor; and, repeating all of the preceding steps for the next divisor to be processed.
Claim 5: The device as defined in claim 4 wherein said computation of said nearest pointofintersection further comprises the steps of: computing the near est vertical lines having poweroftwo origins on left and right sides of the large integer; setting up a reference cell whose both coordinates are equal to the large integer; projecting the cell in the direction parallel to negative slope lines to ob tain a yintercept; using the yintercept to compute negativeslope lines originat ing from the vertical on right side; saving the origins of the negativeslope line with lesser yintercept; updating the referential cell to be located on the saved negativeslope line at x coordinate equal to the large integer; projecting the refer ential cell onto the vertical on left side in the direction parallel to positiveslope lines to compute the yintercept; determining the positiveslope lines passing in the immediate vicinity of the referential cell using the yintercept; recording the positiveslope line with lesser yintercept; computing the point of intersection of the recorded positive and negativeslope lines.
Claim 6: The device as defined in claim 5 wherein said computation of said nearest positive-slope lines passing around said referential cell in the grid further comprises the steps of: calculating the distance between the referential cell and the nearest vertical on left; subtracting distance from the y-coordinate of the referential cell to compute y-intercept; computing a variable y0 by subtracting the y-intercept from the x-origin of the vertical line; taking the logarithm to base 2 of variable y0 and then the ceiling of the result to compute an integer rIndex; computing the x-origin of positive-slope lines as 2 raised to rIndex; computing a variable y1 by subtracting y0 from x-origin of positive-slope lines; computing the logarithm of y1 with respect to base 2 and then using the ceiling and floor functions to compute the integers fIndex and cIndex; computing 2 raised to fIndex and 2 raised to cIndex to determine the y-origins of positive-slope lines.
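The arithmetic in Claim 6 can be transcribed step by step as below. This is a hedged sketch: the function and parameter names are hypothetical, the logarithms are computed with integer bit operations so that they stay exact for very large values, and the claim does not say what happens when y0 or y1 is not positive, so the sketch simply raises an error in that case.

```python
def ceil_log2(v):
    """Integer ceiling of log2(v) for v >= 1, exact for arbitrarily large v."""
    return (v - 1).bit_length()


def floor_log2(v):
    """Integer floor of log2(v) for v >= 1."""
    return v.bit_length() - 1


def positive_slope_origins(cell_x, cell_y, left_origin):
    """Return (x_origin, lower_y_origin, upper_y_origin) following Claim 6."""
    distance = cell_x - left_origin      # cell to the nearest vertical on its left
    y_intercept = cell_y - distance
    y0 = left_origin - y_intercept
    if y0 < 1:
        raise ValueError("y0 must be positive (case not covered by the claim)")
    r_index = ceil_log2(y0)
    x_origin = 1 << r_index              # 2 raised to rIndex
    y1 = x_origin - y0
    if y1 < 1:
        raise ValueError("y1 must be positive (y0 was a power of two)")
    f_index = floor_log2(y1)
    c_index = ceil_log2(y1)
    return x_origin, 1 << f_index, 1 << c_index


# Worked example with made-up coordinates: distance = 36, y-intercept = 54,
# y0 = 10, rIndex = 4, x-origin = 16, y1 = 6, fIndex = 2, cIndex = 3.
print(positive_slope_origins(100, 90, 64))  # (16, 4, 8)
```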
Claim 7: The device as defined in claim 2 wherein said decomposition of said large integer further comprises the steps of: initializing a remainder to the large integer associated with input; initializing an approximation value to zero; initializing an expression tree for representing the decomposition; deriving an integer from bit values at random positions of input data; calculating a quotient by dividing the remainder by the input-derived integer; computing the nearest point-of-intersection less than the quotient; calculating the approximation value as the product of point-of-intersection with the divisor; updating the remainder by subtracting the approximation value from its existing value; inserting the point-of-intersection, the divisor, and the operator nodes into the expression tree; repeating the preceding steps of deriving integers, calculating quotients, determining points-of-intersection, calculating approximation values, inserting nodes into the expression tree, and updating the remainder until the remainder becomes too small; adding nodes for the final remainder and the associated operator into the expression tree.
Claim 8: The device as defined in claim 7 wherein said derivation of said integer from said input further comprises the steps of: calculating the number of bits in the input; using a pseudo random number generator to obtain bytes of random data and subsequently converting this data into an integer; computing the modulus of the random integer with respect to the number of bits; using the value of modulus as location index to derive a bit value; inserting the bit value into a number; repeating ‘n’ number of times the steps of random data generation, modulus calculation, selection and insertion of bits to construct an n-bit divisor.
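The bit-sampling procedure of Claim 8 might look like the sketch below. The function name `derive_divisor`, the choice of four random bytes per draw, the big-endian byte order, and the use of a seeded `random.Random` (chosen only so the example is repeatable) are all assumptions; the claim does not fix any of them.

```python
import random


def derive_divisor(data, n, rng):
    """Build an n-bit number from bits at pseudo-random positions of data."""
    total_bits = len(data) * 8
    if total_bits == 0:
        raise ValueError("input must not be empty")
    value = int.from_bytes(data, "big")
    divisor = 0
    for _ in range(n):
        # Obtain bytes of random data and convert them into an integer.
        rand_bytes = bytes(rng.getrandbits(8) for _ in range(4))
        rand_int = int.from_bytes(rand_bytes, "big")
        index = rand_int % total_bits        # modulus gives the bit location
        bit = (value >> index) & 1           # read the bit at that location
        divisor = (divisor << 1) | bit       # insert the bit into the number
    return divisor


d = derive_divisor(b"example input", 24, random.Random(2019))
assert 0 <= d < 2**24
```

The result can be zero when every sampled bit is zero, so a caller that goes on to divide by it would need to redraw or otherwise guard against that case.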
Claim 9: The device as defined in claim 2 wherein said decomposition of said large integer further comprises the steps of: initializing a remainder to the large integer associated with the input; initializing an approximation value to zero; initializing an expression tree for representing the decomposition; calculating the number of bits in the input; deriving an n-bit number from random positions of the large integer; dividing a sufficiently large Mersenne number by the n-bit number to obtain a quotient; updating the remainder by performing arithmetic operations between the quotient and the remainder; repeating the preceding steps of computing the number of bits and deriving a number from input positions, dividing Mersenne number by the derived number, and updating the remainder until the remainder becomes too small; adding nodes for the remainder and the relevant operator to the expression tree.
Claim 10: The device as defined in claim 2 wherein said decomposition of said large integer further comprises the steps of: initializing a remainder to the large integer associated with the input; initializing an approximation value to zero; initializing an expression tree for representing the decomposition; generating a 32-bit number as a function of input bits; computing the modulus of the 32-bit number with respect to 100; adding a constant to the number computed in the preceding step to make it greater than 10; subtracting a constant from the number obtained in the preceding step if the number was a power-of-two; saving the number in the immediately preceding step as the ‘base’; computing the logarithm of the remainder with respect to the ‘base’ and then applying the floor function to obtain the ‘exponent’; computing the value ‘base’ raised to the ‘exponent’ and subtracting from the remainder; repeating the preceding steps related to computation of the base, the exponent, and the remainder until the remainder becomes too small; adding nodes for the final remainder and the associated operator to the expression tree.
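A minimal sketch of the base-and-exponent decomposition in Claim 10 follows. Several details are assumptions made for illustration: the 32-bit number is taken from the low 32 bits of the current remainder, the added constant is 11, the subtracted constant is 1, the stopping threshold `small` is 111, and the expression tree is flattened to a list of (base, exponent) pairs. The logarithm is computed by repeated multiplication rather than floating point so that it stays exact for very large integers.

```python
def pick_base(seed32):
    """Map a 32-bit number to a base above 10 that is not a power of two."""
    base = seed32 % 100 + 11        # 11..110, always greater than 10
    if base & (base - 1) == 0:      # 16, 32 or 64: a power of two
        base -= 1
    return base


def floor_log(n, base):
    """Largest exponent e with base**e <= n, for n >= 1 and base >= 2."""
    exponent, power = 0, base
    while power <= n:
        power *= base
        exponent += 1
    return exponent


def power_decompose(n, small=111):
    """Split n into sum(base**exponent) plus a final remainder below small."""
    terms, remainder = [], n
    while remainder >= small:
        base = pick_base(remainder & 0xFFFFFFFF)   # assumed source of the 32 bits
        exponent = floor_log(remainder, base)
        terms.append((base, exponent))
        remainder -= base ** exponent
    return terms, remainder


n = 2**128 + 12345
terms, rest = power_decompose(n)
assert sum(b ** e for b, e in terms) + rest == n
```

With `small` set above the largest possible base, every exponent is at least 1, so each pass removes a meaningful share of the remainder.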
Claim 11: The device as defined in claim 1 wherein said decoding of data further comprises the steps of: reading metadata including information on any padding; reading encrypted bytes of the current segment of input; using a password and cipher algorithm to decrypt bytes; reading the plain bytes of the current segment; reconfiguring decrypted and plain bytes into a contiguous memory layout; initializing a push down stack; parsing input bytes into numbers, points, and operators while evaluating the traversal information using push and pop operations; removing any padding information for conversion to raw data; appending the data to output file; repeating the preceding steps until all input segments have been processed.
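The push-down-stack evaluation in Claim 11 can be illustrated with a small postfix evaluator. The sketch assumes the traversal has already been decrypted and parsed into a list of tokens, where integers stand for numbers and points and the strings "+" and "*" stand for operators; the actual serialized format, the decryption, and the padding handling are not shown, and the function name is hypothetical.

```python
def evaluate_postfix(tokens):
    """Evaluate a postfix token list using push and pop operations."""
    stack = []
    for tok in tokens:
        if tok == "+" or tok == "*":
            right = stack.pop()     # operands come off in reverse order
            left = stack.pop()
            stack.append(left + right if tok == "+" else left * right)
        else:
            stack.append(tok)       # number or point value: push it
    if len(stack) != 1:
        raise ValueError("malformed traversal")
    return stack[0]


# (32 * 3) + 4 written in postfix order.
assert evaluate_postfix([32, 3, "*", 4, "+"]) == 100
```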
Claim 12: The device as defined in claim 1 wherein said obfuscation and protection of data using extremely large integers further comprises the steps of: defining a binary grid characterized by diagonal lines having powers-of-two origins; converting raw input to a large integer by applying padding if necessary; initializing a referential cell to have x coordinate equal to the large integer associated with input; selecting a horizontal and a vertical as reference lines in the binary grid; determining diagonal lines passing in the immediate vicinity of the horizontal reference lines; moving the referential cell as close to the horizontal line as possible using the diagonals; writing the origins of the diagonal line in the immediately preceding step to output file; determining the diagonal lines in the immediate vicinity of the vertical reference lines; moving the referential cell as close to the vertical reference as possible using diagonal lines; writing the origins of the diagonal line in the immediately preceding step to output file; repeating the process of moving the referential cell closer to horizontal and vertical lines until it is positioned directly over a marker; writing the quadruple representing the marker to output file.
Claim 13: The device as defined in claim 1 wherein said decoding of data further comprises the steps of: reading data from the input file; interpreting the first four numbers as the constituents of a quadruple representing a point-of-intersection; initializing the referential cell to the coordinates of the point-of-intersection; updating the referential cell’s y coordinate to lie on next diagonal line; updating the referential cell’s x coordinate to lie on next diagonal line; repeating the steps of updating the y coordinate and x coordinate of referential cell until there is no more data to be processed; converting the x-coordinate of referential cell to raw data by removing any padding.
Claim 14: A memory for storing data for access by software being executed on an information processing system, comprising: a data structure stored in the memory, the data structure including a list of positive numbers in sorted order and a second data structure as an array such that the value at any index of the second data structure helps locate, in constant time O(1), the immediate successor of that index in the list of sorted integers.
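One way to read Claim 14 is as a sorted list paired with a direct-address array, sketched below. This interpretation is an assumption: here the array entry at index i holds the position, in the sorted list, of the first value strictly greater than i, so a successor query is a single array read. The function names are hypothetical.

```python
def build_reverse_index(values):
    """values: non-empty sorted list of positive integers.

    Returns an array where entry i is the position in values of the first
    element strictly greater than i (len(values) when there is none).
    """
    reverse = []
    pos = 0
    for i in range(values[-1] + 1):
        while pos < len(values) and values[pos] <= i:
            pos += 1
        reverse.append(pos)
    return reverse


def successor(values, reverse, i):
    """Smallest element of values greater than i, or None, in O(1) time."""
    if i < 0:
        return values[0]
    if i >= len(reverse):
        return None
    pos = reverse[i]
    return values[pos] if pos < len(values) else None


divisors = [3, 7, 12, 20]
table = build_reverse_index(divisors)
assert successor(divisors, table, 7) == 12
```

The constant-time lookup is bought with memory proportional to the largest value in the list, which is practical only when the stored numbers are modest in size.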
Priority Applications (1)
Application Number  Priority Date  Filing Date  Title 

CA3120417A CA3120417A1 (en)  2018-11-20  2019-02-13  Data security and obfuscation using extremely large integers 
Applications Claiming Priority (2)
Application Number  Priority Date  Filing Date  Title 

IN201841043588  2018-11-20  
IN201841043588  2018-11-20 
Publications (1)
Publication Number  Publication Date 

WO2020104858A1 (en)  2020-05-28 
Family
ID=70773536
Family Applications (1)
Application Number  Priority Date  Filing Date  Title 

PCT/IB2019/051144 WO2020104858A1 (en)  2018-11-20  2019-02-13  Data security and obfuscation using extremely large integers 
Country Status (2)
Country  Link 

CA (1)  CA3120417A1 (en) 
WO (1)  WO2020104858A1 (en) 
Citations (2)
Publication number  Priority date  Publication date  Assignee  Title 

US9021263B2 (en) *  2012-08-31  2015-04-28  Cleversafe, Inc.  Secure data access in a dispersed storage network 
US9245148B2 (en) *  2009-05-29  2016-01-26  Bitspray Corporation  Secure storage and accelerated transmission of information over communication networks 

2019
2019-02-13  CA  CA3120417A  CA3120417A1 (en)  active  Pending
2019-02-13  WO  PCT/IB2019/051144  WO2020104858A1 (en)  active  Application Filing
Also Published As
Publication number  Publication date 

CA3120417A1 (en)  2020-05-28 
Similar Documents
Publication  Publication Date  Title 

JP2020532771A (en)  High-precision privacy protection real-valued function evaluation  
US10778410B2 (en)  Homomorphic data encryption method and apparatus for implementing privacy protection  
CN108829899B (en)  Data table storage, modification, query and statistical method  
JP6044738B2 (en)  Information processing apparatus, program, and storage medium  
Wu et al.  Secure and efficient outsourced k-means clustering using fully homomorphic encryption with ciphertext packing technique  
Mahdi et al.  Secure similar patients query on encrypted genomic data  
CN105814833A (en)  Secure data transformations  
Goodrich et al.  Data-oblivious graph drawing model and algorithms  
Shatilov et al.  Solution for secure private data storage in a cloud  
Karresand et al.  Using NTFS cluster allocation behavior to find the location of user data  
WO2020104858A1 (en)  Data security and obfuscation using extremely large integers  
JPWO2011013463A1 (en)  Range search system, range search method, and range search program  
Basu et al.  Asymptotic normality of scrambled geometric net quadrature  
WO2020145340A1 (en)  Secret array access device, secret array access method, and program  
Hendrix et al.  On perturbation theory and an algorithm for maximal clique enumeration in uncertain and noisy graphs  
JP4789536B2 (en)  Data division apparatus, data division method, and computer program  
Hu et al.  New pseudo-planar binomials in characteristic two and related schemes  
CN107667368B (en)  System, method and storage medium for obfuscating a computer program  
US11281688B2 (en)  Ranking and deranking data strings  
JPWO2019208486A1 (en)  Secret Aggregation Median System, Secret Computing Unit, Secret Aggregation Median Method, and Program  
JP2020095437A (en)  Clustering device, clustering method, and clustering program  
JP4924177B2 (en)  Program obfuscation device and program  
JP2019184852A (en)  Data analysis server, data analysis system, and data analysis method  
Matula et al.  A p × p bit fraction model of binary floating point division and extremal rounding cases  
JP7173328B2 (en)  Secure division system, secure computing device, secure division method, and program 
Legal Events
Date  Code  Title  Description 

121  EP: the EPO has been informed by WIPO that EP was designated in this application 
Ref document number: 19886193 Country of ref document: EP Kind code of ref document: A1 

ENP  Entry into the national phase 
Ref document number: 3120417 Country of ref document: CA 

NENP  Non-entry into the national phase 
Ref country code: DE 

122  EP: PCT application non-entry in European phase 
Ref document number: 19886193 Country of ref document: EP Kind code of ref document: A1 