CN112287374A - Excel ciphertext document recovery method, computer equipment and storage medium - Google Patents


Info

Publication number
CN112287374A
Authority
CN
China
Prior art keywords
document
key
excel
ciphertext
bytes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011295247.2A
Other languages
Chinese (zh)
Inventor
张李军
于飞
吉庆兵
谈程
石玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 30 Research Institute
Original Assignee
CETC 30 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 30 Research Institute filed Critical CETC 30 Research Institute
Priority to CN202011295247.2A priority Critical patent/CN112287374A/en
Publication of CN112287374A publication Critical patent/CN112287374A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/602: Providing cryptographic facilities or services
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218: Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database

Abstract

The invention discloses an Excel ciphertext document recovery method, computer equipment and a storage medium. The document recovery method comprises the following steps: constructing a key rainbow table for Excel documents, in which a rainbow table is built according to the RC4 encryption algorithm adopted by Excel documents; cracking the decryption key of the Excel ciphertext document, in which the intermediate key of the Excel document is cracked by a rainbow table attack and used as the decryption key; and decrypting and restoring the corresponding Excel plaintext document, in which the plaintext document is decrypted and restored using the cracked key and the ciphertext data storage structure information of the Excel document. The rainbow table for key cracking only needs to be constructed once, and the generated rainbow table is applicable to key cracking of all Excel ciphertext documents, so that recovery of Excel ciphertext documents is achieved. The document recovery method of the invention is independent of the complexity of the document encryption password, avoids the inability of existing document cracking methods to crack complex passwords, and can effectively restore Excel documents encrypted with RC4.

Description

Excel ciphertext document recovery method, computer equipment and storage medium
Technical Field
The invention relates to the technical field of document recovery, in particular to an Excel ciphertext document recovery method, computer equipment and a storage medium.
Background
Excel is spreadsheet software written by Microsoft for users of the Windows and Apple Macintosh operating systems. Thanks to its intuitive interface, powerful calculation functions and comprehensive charting tools, it has become the most popular data processing software. Excel provides encryption protection for the spreadsheet documents it generates, and a user can set two types of document protection passwords: a document opening password and a document modification password. The user can browse the document contents only if the correct opening password is entered, whereas the modification password determines whether the user has the right to modify the document. Once the document is opened, its content can be viewed even if the modification password is not entered, so the opening password is the more important of the two.
In practice, users often forget the opening passwords they have set, and an important document that cannot be opened can cause serious losses to individuals or companies. In addition, the need to decrypt Excel ciphertext documents frequently arises in electronic forensics scenarios. Research on the decryption of Excel ciphertext documents therefore has significant practical value.
If the opening password of a document is recovered, the document content can be decrypted automatically simply by entering that password when the Excel ciphertext document is opened. Decrypting a document by cracking its password is therefore a relatively intuitive method, and at present the decryption and restoration of Excel ciphertext documents is in fact achieved by recovering the document password. Three main password recovery tools for Excel are known on the market: Advanced Office Password Recovery from Elcomsoft, Passware Kit Forensic from Passware, and Passfab for Excel from Passfab. All three tools provide password cracking for Excel ciphertext documents, in three modes: dictionary attack, mask attack and brute force attack. A dictionary attack looks for the correct document opening password by trying each password in a password dictionary in turn, so the success of password recovery depends on whether the user password is present in the dictionary. A mask attack fixes the characters at certain positions of the candidate password in advance, which requires partial knowledge of the target user's password; although this can in theory narrow the password search, the partial information is in practice mostly guessed, for example from a birthday or a name abbreviation. A brute force attack is generally used after the first two modes fail and searches the password space exhaustively; if the password set by the user is long or highly complex, a brute force attack often fails. Cracking by searching the password space therefore has a low password recovery rate, and the recovery time cannot be predicted.
In summary, the current decryption and restoration of the Excel ciphertext document basically depends on a password cracking mode, and although the success rate of cracking can be improved to a certain extent by adopting password dictionary or mask cracking, the following main problems exist:
(1) Complex passwords are difficult to crack. If the target user's password is long or mixes several character types, the space of candidate passwords is large and the password is difficult to crack successfully.
(2) Password recovery for a ciphertext document takes a long time and cannot meet practical timeliness requirements. In practice, the shorter the password cracking time, the more useful the method, and sometimes the document must even be decrypted in real time. The cracking speed of existing methods can hardly meet practical needs.
(3) The success rate of password cracking for ciphertext documents is low. Tests with existing cracking tools show a low success rate, and successful cracking of a target ciphertext document cannot be guaranteed.
Disclosure of Invention
In order to solve the above problems, the present invention provides an Excel ciphertext document recovery method, a computer device and a storage medium, which can be used to decrypt and restore encrypted Excel documents of version 2003 and earlier, thereby realizing document recovery. The technical scheme is as follows.
The invention discloses a method for recovering an Excel ciphertext document, which comprises the following steps of:
step one, constructing a key rainbow table of an Excel document: constructing a rainbow table according to an RC4 encryption algorithm adopted by the Excel document;
step two, deciphering a decryption key of the Excel ciphertext document: breaking the intermediate key of the Excel document by utilizing rainbow table attack and taking the intermediate key as a decryption key;
step three, decrypting and restoring a corresponding Excel plaintext document: decrypting and restoring the plaintext document by using the decryption key cracked in step two and the Excel document ciphertext data storage structure information.
Further, in the first step, the rainbow table is constructed as follows: selecting m starting points $S_1, S_2, \ldots, S_m$ in a key space K, and defining a reduction function $R: C \to K$ from the ciphertext space C to the key space K and a composite function $F(k) = R(E_k(p))$; applying the function F to the m starting points $S_i$ to compute m rainbow chains, and storing only the pairs of starting points and end points $(S_i, E_i)$ after the computation is finished, the table thus obtained being the rainbow table to be established.
Further, the following parameters are required when the rainbow table is constructed: the size N of the key space, the key cracking success rate p, the number n of rainbow tables, the number m of chains in a single rainbow table, the rainbow chain length t, and the storage space M; the relationships among these parameters are given by formulas (1) to (3);
$n = -\ln(1-p)/2$ (1)
$m = M/n$ (2)
$t = -(N/M)\ln(1-p)$ (3).
Further, for the intermediate key cracking, the cryptographic algorithm E is RC4; a 206-byte key stream needs to be generated, and the 16 bytes from the 191st to the 206th byte are intercepted as the key stream ks, so that the encryption function $E_k(p)$ is the truncated RC4 key stream generation function, denoted TRC4(ks, 16); the reduction function R reduces the 16-byte key stream ks to 5 bytes by directly taking the first 5 bytes of ks, denoted R(ks, 5); rainbow chains are generated for different starting points using the encryption function $E_k(p)$ and the reduction function R, and all pairs of starting points and end points $(S_i, E_i)$ are stored to obtain the rainbow table.
Further, in the second step, when the intermediate key is cracked, 16 bytes of data starting at byte offset 702 of the Excel ciphertext document are read as the target ciphertext $C_0$ and XORed with the 16-byte plaintext of all 0x20 to obtain the 16-byte key stream $Ks_0$; the reduction function R is applied to the key stream $Ks_0$ to obtain the key $Y_1$, and the function F is then iterated starting from $Y_1$, the result $Y_s$ of each iteration being compared with the stored end points $E_i$ until a match is found; after a match is found, the matching chain is regenerated from its starting point $S_j$ to obtain the key $k = X_{j,(t-s)} = F^{(t-s-1)}(S_j)$, and it is checked whether the equation $\mathrm{TRC4}(Ks, 16) = Ks_0$ holds; if it holds, the correctness of the key k is verified; if the verification passes, the key k is the correct intermediate key; otherwise, the search returns to the rainbow table to continue looking for a match and compute the key k again.
Further, when verifying the correctness of the key k, the following method for verifying the correctness of the intermediate key is adopted:
a) reading the RC4 encryption header structure data of the ciphertext document, and parsing out the 16-byte EncryptedVerifier value and the 16-byte EncryptedVerifierHash value;
b) using the RC4 algorithm and the key k cracked with the rainbow table, decrypting the plaintext value PlainVerifier[16] of EncryptedVerifier and the plaintext value PlainVerifierHash[16] of EncryptedVerifierHash;
c) calculating the MD5 hash value of PlainVerifier, Hash = MD5(PlainVerifier);
d) comparing whether Hash is the same as PlainVerifierHash; if so, the key k is correct; otherwise, the key k is wrong.
Further, since RC4 is a stream cipher algorithm, the ciphertext data is decrypted with the same RC4 encryption algorithm: the plaintext data is obtained by XORing the generated key stream ks with the ciphertext.
Further, the byte format of the WorkBook is parsed according to the document structure of Excel, the record data of the WorkBook starting at document offset 0x200; the FilePass record in the WorkBook is read, and the Salt value, EncryptedVerifier value and EncryptedVerifierHash value used in the RC4 encryption algorithm and the intermediate key correctness verification method are parsed out; after decryption, the byte data at the FilePass position in the plaintext is set to all zeros to indicate that the document is plaintext data; the contents of all records of the WorkBook are decrypted in sequence, the other unencrypted fields are kept unchanged at their corresponding positions in the document, and the whole plaintext document is finally formed.
The computer equipment comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps of the Excel ciphertext document recovery method when executing the computer program.
The invention relates to a computer readable storage medium, which stores a computer program, wherein the computer program realizes the steps of the Excel ciphertext document recovery method when being executed by a processor.
The invention has the beneficial effects that:
(1) the rainbow table for key cracking only needs to be constructed once, and the generated rainbow table is suitable for key cracking of all Excel ciphertext documents, so that recovery of the Excel ciphertext documents is realized;
(2) the document recovery method of the invention is irrelevant to the complexity of the document encryption password, thereby avoiding the defect that the complex password cannot be cracked in the existing document cracking method;
(3) the document recovery method can effectively restore the Excel document encrypted by the RC4, and the success rate can be ensured to be over 99 percent;
(4) the document recovery method of the invention can decrypt a ciphertext document quickly; in tests, the average decryption time of a ciphertext document is within 3 minutes on an ordinary desktop computer (clock frequency above 3.0 GHz), which satisfies the practical timeliness requirement for decryption.
Drawings
FIG. 1 is a flowchart of an Excel ciphertext document recovery process according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a conventional Excel document data encryption scheme;
FIG. 3 is a diagram illustrating the construction and storage of a rainbow table according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating an embodiment of decrypting a decryption key using a rainbow table;
FIG. 5 is a schematic diagram of a data storage structure of a conventional Excel document.
Detailed Description
In order to more clearly understand the technical features, objects, and effects of the present invention, specific embodiments of the present invention will now be described. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment provides an Excel ciphertext document recovery method, a computer device and a storage medium, wherein as shown in fig. 1, the Excel ciphertext document recovery method specifically comprises the following three steps:
Step one: constructing the key rainbow table of the Excel document
According to the Excel document encryption principle published by Microsoft, Excel encrypts the data content of a document with the RC4 cryptographic algorithm. RC4 is a stream cipher whose seed key is generated from the password entered by the user. The key generation and data encryption process is as follows: first, a 5-byte intermediate key is obtained from the user password through a series of calculations (see Algorithm 1 for details); the intermediate key is then concatenated with the block number of each 1024-byte data block in the Excel document, and the MD5 hash algorithm is applied to compute the seed key (16 bytes) of the RC4 algorithm; finally, the RC4 algorithm generates a key stream of the same length as each data block, which is XORed with the block to obtain the corresponding ciphertext (the whole encryption process is shown in FIG. 2).
Algorithm 1: Excel data block encryption algorithm
(1) Convert the user password into its Unicode form, UniPwd = Unicode(password);
(2) Compute the MD5 hash value of the password, $H_0$ = MD5(UniPwd);
(3) Read the RC4 encryption header structure data of the ciphertext document, and parse out the 16-byte Salt value;
(4) Denote the first 5 bytes of $H_0$ as the truncated hash value T0Hash[5], and concatenate it with the 16-byte Salt value to form a 21-byte array Buffer0[21] = T0Hash[5] + Salt[16];
(5) Copy Buffer0 16 times to form a 336-byte array InterBuffer[336];
(6) Compute the MD5 hash value of InterBuffer, $H_1$ = MD5(InterBuffer);
(7) Denote the first 5 bytes of $H_1$ as T1Hash[5]; these 5 bytes are called the intermediate key. Concatenate the intermediate key with the 4-byte data block number BlockNum to form a 9-byte array Buffer1[9] = T1Hash[5] + BlockNum[4];
(8) Compute the MD5 hash value of Buffer1, $H_2$ = MD5(Buffer1); $H_2$ is 16 bytes long and serves as the seed key input of the RC4 encryption algorithm. For each data block (1024 bytes) in the document, the RC4 algorithm generates a 1024-byte key stream, which is XORed with the data bytes of the current block to output the ciphertext of the data. The number BlockNum in step (7) is incremented by 1 each time a data block is encrypted.
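The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, and two details that the text does not state are assumptions, namely that the Unicode form of the password is UTF-16LE and that BlockNum is a 4-byte little-endian integer.

```python
import hashlib
import struct

BLOCK_SIZE = 1024  # data block size stated in Algorithm 1


def rc4_keystream(key: bytes, length: int) -> bytes:
    """Generate `length` bytes of RC4 key stream for `key`."""
    s = list(range(256))
    j = 0
    for i in range(256):  # key-scheduling algorithm (KSA)
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for _ in range(length):  # pseudo-random generation algorithm (PRGA)
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)


def intermediate_key(password: str, salt: bytes) -> bytes:
    """Steps (1)-(7): derive the 5-byte intermediate key."""
    uni_pwd = password.encode("utf-16-le")   # assumption: UTF-16LE encoding
    h0 = hashlib.md5(uni_pwd).digest()
    inter_buffer = (h0[:5] + salt) * 16      # 21 bytes x 16 = 336 bytes
    return hashlib.md5(inter_buffer).digest()[:5]


def block_seed_key(inter_key: bytes, block_num: int) -> bytes:
    """Steps (7)-(8): 16-byte RC4 seed key for one data block."""
    # assumption: BlockNum is a 4-byte little-endian unsigned integer
    return hashlib.md5(inter_key + struct.pack("<I", block_num)).digest()


def xor_blocks(data: bytes, inter_key: bytes) -> bytes:
    """Encrypt or decrypt `data` block by block (XOR is its own inverse)."""
    out = bytearray()
    for block_num, start in enumerate(range(0, len(data), BLOCK_SIZE)):
        block = data[start:start + BLOCK_SIZE]
        ks = rc4_keystream(block_seed_key(inter_key, block_num), len(block))
        out.extend(b ^ k for b, k in zip(block, ks))
    return bytes(out)
```

Because the cipher is a plain XOR with the key stream, calling `xor_blocks` twice with the same intermediate key returns the original data. Note that only the 5-byte value returned by `intermediate_key` enters the per-block key derivation.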
Conventional methods for decrypting Excel ciphertext documents start from cracking the user's password: the longer the password, the larger the search space and the longer the cracking time. In particular, when the user sets a password of high complexity (for example, one that contains digits, upper- and lower-case letters and special characters and is longer than 8 characters), the probability of successful cracking is very low. However, the document data encryption process in Algorithm 1 shows that the security strength of the encryption depends only on the 5-byte intermediate key that determines the RC4 seed key, regardless of the length of the password entered by the user. The 5-byte (40-bit) intermediate key can therefore be cracked directly; this approach is independent of the complexity of the user password and has a clear efficiency advantage.
By studying the data storage structure of Excel documents, we found that the WriteAccess record in an unencrypted Excel document has 112 bytes of content: the name of the user who created the document comes first, and the remainder is filled entirely with the hexadecimal byte 0x20. The user name usually occupies fewer than 20 bytes, so the remaining bytes of the record content are all 0x20. Based on the document storage structure, this embodiment selects the 16-byte all-0x20 plaintext field beginning at document offset 702 as the known plaintext. By the data encryption process in Algorithm 1, with the intermediate key denoted k, the RC4 algorithm produces the key stream ks corresponding to the intermediate key k and block number 0, and the ciphertext is the exclusive OR (XOR) of the plaintext data and the key stream bytes, i.e., c = p XOR ks, where the plaintext bytes p are all 0x20. Hence, since ks = p XOR c, the ciphertext data c can be read from the ciphertext document and XORed with the known plaintext p to obtain the key stream ks. This establishes a correspondence between the intermediate key k and the key stream ks, and because p is a fixed plaintext, the intermediate key k can be cracked by a rainbow table attack.
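The known-plaintext step described above amounts to a single XOR. The following sketch shows it under the assumptions stated in the text (offset 702, 16 bytes of 0x20); the function and constant names are hypothetical.

```python
KNOWN_PLAINTEXT_OFFSET = 702     # document offset stated in the text
KNOWN_PLAINTEXT = b"\x20" * 16   # 16 padding bytes of the WriteAccess record


def recover_keystream(document: bytes) -> bytes:
    """Return the 16-byte key stream ks = p XOR c at the known-plaintext offset."""
    c = document[KNOWN_PLAINTEXT_OFFSET:KNOWN_PLAINTEXT_OFFSET + 16]
    if len(c) != 16:
        raise ValueError("document too short to contain the known plaintext")
    return bytes(p ^ x for p, x in zip(KNOWN_PLAINTEXT, c))
```

The result is the target value that the rainbow table lookup tries to invert.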
The rainbow table attack is a time-memory trade-off algorithm. For a cryptographic algorithm E and a known fixed plaintext p, the goal of the attack is, given a target ciphertext c, to find in the rainbow table a key k satisfying $E_k(p) = c$. To carry out a rainbow table attack, a rainbow table must therefore first be constructed for the specific encryption algorithm E. The rainbow table is constructed by selecting m starting points $S_1, S_2, \ldots, S_m$ in a key space K, and defining a reduction function $R: C \to K$ from the ciphertext space C to the key space K and a composite function $F(k) = R(E_k(p))$. The function F is applied to the m starting points $S_i$ to compute m rainbow chains, and only the pairs of starting points and end points $(S_i, E_i)$ are stored after the computation is finished; the table thus obtained is the rainbow table to be created (as shown in FIG. 3).
For intermediate key cracking, the cryptographic algorithm E is RC4; a 206-byte key stream must be generated, of which the 16 bytes from the 191st to the 206th byte are taken as the key stream ks. The encryption function $E_k(p)$ is therefore the truncated RC4 key stream generation function, denoted TRC4(ks, 16). The reduction function R reduces the 16-byte key stream ks to 5 bytes by directly taking the first 5 bytes of ks, denoted R(ks, 5). Using the encryption function $E_k(p)$ and the reduction function R, rainbow chains are generated for different starting points, and all pairs of starting points and end points $(S_i, E_i)$ are stored to obtain the rainbow table.
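A minimal sketch of this chain construction is given below. It assumes that the RC4 seed key for an intermediate key k is derived as in step (8) of Algorithm 1 with block number 0 encoded as a 4-byte little-endian integer; the function names `trc4`, `reduce_ks`, `step`, `build_chain` and `build_table` are hypothetical, and the tiny chain length used in practice here is for illustration only.

```python
import hashlib
import struct


def rc4_keystream(key: bytes, length: int) -> bytes:
    """Generate `length` bytes of RC4 key stream for `key`."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for _ in range(length):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)


def trc4(k: bytes) -> bytes:
    """Truncated key stream: bytes 191..206 (1-based) of the block-0 key stream."""
    seed = hashlib.md5(k + struct.pack("<I", 0)).digest()  # assumption: block 0
    return rc4_keystream(seed, 206)[190:206]


def reduce_ks(ks: bytes) -> bytes:
    """Reduction function R: keep the first 5 bytes of the key stream."""
    return ks[:5]


def step(k: bytes) -> bytes:
    """Composite function F(k) = R(E_k(p))."""
    return reduce_ks(trc4(k))


def build_chain(start: bytes, t: int) -> bytes:
    """Apply F to `start` t times and return the end point."""
    k = start
    for _ in range(t):
        k = step(k)
    return k


def build_table(starts, t):
    """Store only (starting point, end point) pairs, as described in the text."""
    return [(s, build_chain(s, t)) for s in starts]
```

For example, `build_table([i.to_bytes(5, "big") for i in range(3)], 4)` yields three pairs of 5-byte keys. A real table would use the chain length and chain count discussed next, which makes the precomputation expensive but one-off.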
The following parameters are also required when the rainbow table is constructed: the size N of the key space, the key cracking success rate p, the number n of rainbow tables, the number m of chains in a single rainbow table, the rainbow chain length t, and the storage space M. The relationships among these parameters are given by equations (1) to (3).
$n = -\ln(1-p)/2$ (1)
$m = M/n$ (2)
$t = -(N/M)\ln(1-p)$ (3)
For the rainbow table used to crack the Excel intermediate key, the intermediate key to be cracked is 5 bytes (40 bits), so the key space size is set to $N = 2^{40}$. With the cracking success rate p set to 99%, according to equations (1) to (3), n = 4, m = 53.5 million and t = 35800, and the 4 rainbow tables together occupy 3.2 GB of storage space.
Step two: decryption key for deciphering Excel ciphertext document
The decryption key is the 5-byte intermediate key. After the key rainbow table for Excel documents has been constructed, the intermediate key k can be cracked by a rainbow table attack. During cracking, 16 bytes of data starting at byte offset 702 of the Excel ciphertext document are read as the target ciphertext $C_0$ and XORed with the 16-byte plaintext of all 0x20 to obtain the 16-byte key stream $Ks_0$. As shown in FIG. 4, the reduction function R is first applied to the key stream $Ks_0$ to obtain the key $Y_1$; the function F is then iterated starting from $Y_1$, and the result $Y_s$ of each iteration is compared with the stored end points $E_i$ until a match is found. After a match is found, the matching chain is regenerated from its starting point $S_j$ to obtain the key $k = X_{j,(t-s)} = F^{(t-s-1)}(S_j)$, and it is checked whether the equation $\mathrm{TRC4}(Ks, 16) = Ks_0$ holds. If it holds, the correctness of the key k is verified with Algorithm 2 below; if the verification passes, the key k is the correct intermediate key; otherwise, the search returns to the rainbow table to continue looking for a match and compute the key k again.
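The lookup procedure can be illustrated on a toy scale. In the sketch below, a hypothetical stand-in cipher `toy_encrypt` (an MD5 of the key) replaces TRC4 and the key space is only 2 bytes, so that a table can be built instantly; these are assumptions for illustration and not part of the described method. Chains are indexed from 0 here, so the candidate key is obtained by applying F to the starting point T - s times.

```python
import hashlib

KEY_BYTES = 2   # toy key space of 2^16 keys (the method itself uses 5 bytes)
T = 10          # toy chain length


def toy_encrypt(k: bytes) -> bytes:
    """Stand-in for TRC4: maps a key to a 16-byte 'key stream'."""
    return hashlib.md5(k).digest()


def reduce_ks(ks: bytes) -> bytes:
    """Reduction function R: keep the first KEY_BYTES bytes."""
    return ks[:KEY_BYTES]


def step(k: bytes) -> bytes:
    """Composite function F(k) = R(E_k(p))."""
    return reduce_ks(toy_encrypt(k))


def chain_key(start: bytes, steps: int) -> bytes:
    """Apply F to `start` the given number of times."""
    k = start
    for _ in range(steps):
        k = step(k)
    return k


def build_table(starts):
    """Map each end point to the starting points whose chains end there."""
    table = {}
    for s in starts:
        table.setdefault(chain_key(s, T), []).append(s)
    return table


def lookup(ks0: bytes, table):
    """Return a key k with toy_encrypt(k) == ks0, or None if not covered."""
    y = reduce_ks(ks0)                       # Y_1
    for s in range(1, T + 1):                # y holds Y_s in iteration s
        for start in table.get(y, []):
            k = chain_key(start, T - s)      # candidate key in matching chain
            if toy_encrypt(k) == ks0:        # reject false alarms
                return k
        y = step(y)
    return None
```

A key that lies on one of the stored chains is recovered by `lookup(toy_encrypt(k), table)`; a key outside the table's coverage yields `None`, which is why the success rate depends on the table parameters.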
Algorithm 2: Intermediate key correctness verification algorithm
(1) Read the RC4 encryption header structure data of the ciphertext document, and parse out the 16-byte EncryptedVerifier value and the 16-byte EncryptedVerifierHash value;
(2) Using the RC4 algorithm and the key k cracked with the rainbow table, decrypt the plaintext value PlainVerifier[16] of EncryptedVerifier and the plaintext value PlainVerifierHash[16] of EncryptedVerifierHash;
(3) Calculate the MD5 hash value of PlainVerifier, Hash = MD5(PlainVerifier);
(4) Compare whether Hash is the same as PlainVerifierHash; if so, the key k is correct; otherwise, the key k is wrong.
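Algorithm 2 can be sketched as follows. The text does not say how the RC4 key for this check is formed, so the sketch assumes the seed key of block number 0 from Algorithm 1 (4-byte little-endian block number) and a single continuous key stream covering both 16-byte fields; the function name is hypothetical.

```python
import hashlib
import struct


def rc4_keystream(key: bytes, length: int) -> bytes:
    """Generate `length` bytes of RC4 key stream for `key`."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for _ in range(length):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)


def verify_intermediate_key(k: bytes,
                            encrypted_verifier: bytes,
                            encrypted_verifier_hash: bytes) -> bool:
    """Return True if MD5(PlainVerifier) equals PlainVerifierHash under key k."""
    # assumption: seed key of block 0, as in step (8) of Algorithm 1
    seed = hashlib.md5(k + struct.pack("<I", 0)).digest()
    # assumption: both fields are decrypted with one continuous key stream
    ks = rc4_keystream(seed, 32)
    plain_verifier = bytes(a ^ b for a, b in zip(encrypted_verifier, ks[:16]))
    plain_hash = bytes(a ^ b for a, b in zip(encrypted_verifier_hash, ks[16:]))
    return hashlib.md5(plain_verifier).digest() == plain_hash
```

This check filters out candidate keys that happen to reproduce the 16-byte target key stream without being the true intermediate key.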
Step three: decrypting and restoring to obtain corresponding Excel plaintext document
And after the decryption key k is decoded in the second step, the plaintext document can be decrypted and restored by using the decryption key and the document ciphertext data storage structure information. Note that the decryption key k is not a user password, but is simply an intermediate key that cannot be used as a password to open a ciphertext document.
Excel stores its data in the Microsoft compound document structure. A compound document can contain not only text but also other information such as pictures, sound, tabular data and video; it acts as a data container that can store various types of data. Intuitively, a compound document is a file system that divides the stored data into a number of streams (Stream) and places the streams in different storages (Storage), the topmost of which is the root storage. A stream is analogous to a file in a file system, while a storage is analogous to a folder. A stream may be composed of a number of substreams (Substream), and for Excel documents each substream is in turn made up of records (Record) of a specified format. Specifically, an Excel document, taken as a stream, is composed of substreams such as the header structure (Header), workbook (WorkBook), summary information (Summary Information), document summary information (Document Summary Information), header extension (Header Extension), big block pointer (Big Block Pointer) and root entry (Root Entry); the storage order and size of these data segments are shown in FIG. 5. All table data of Excel is stored in the WorkBook substream.
Since RC4 is a stream cipher algorithm, ciphertext data can be decrypted with Algorithm 1 as well: the plaintext data is obtained by XORing the generated key stream ks with the ciphertext. During decryption, note that only the WorkBook substream is encrypted, and the other substreams remain unchanged. Within the WorkBook, only the content of each record is encrypted; the record identifier and record size are plaintext. Specifically, the first 512 bytes of the document are the compound document header, which is not encrypted and remains unchanged. The record data of the WorkBook begins at document offset 0x200, and the WorkBook byte format is parsed according to the document structure of Excel. The data is organized in records (Record), each of which has three parts: record identifier, record size and record content. The WorkBook data begins with a BOF (Beginning Of File) record, whose record identifier is 0x0809 and record size is 16 bytes (0x0010), followed by 16 bytes of record content. This record occupies 20 bytes in total and is not encrypted. It is followed by a FilePass record, which relates to verification of the user password; its identifier is 0x002F and its record size is 54 bytes (0x0036), followed by 54 bytes of record content, so that it occupies 58 bytes in total. This record is added when the document is encrypted and indicates that the document is encrypted; the RC4 encryption header structure data is contained in this record.
This embodiment reads this record, i.e., the FilePass record in the WorkBook, and parses out the Salt value, EncryptedVerifier value and EncryptedVerifierHash value used in Algorithm 1 and Algorithm 2. After decryption, the byte data at the FilePass position in the plaintext is set to all zeros to indicate that the document is plaintext data. The contents of all records of the WorkBook are decrypted in sequence, and the other unencrypted fields are kept unchanged at their corresponding positions in the document, finally forming the whole plaintext document, which can be opened directly.
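Reading the FilePass record can be sketched as below. The record identifiers, sizes and the offset 0x200 are taken from the text; the internal layout of the 54 content bytes is not given there, so the sketch assumes a 2-byte encryption type and a 4-byte version field followed by the three 16-byte values, with little-endian 2-byte identifier and size fields. The function names are hypothetical.

```python
import struct

WORKBOOK_OFFSET = 0x200            # start of the WorkBook record data, per the text
BOF_ID, FILEPASS_ID = 0x0809, 0x002F


def read_record(data: bytes, pos: int):
    """Return (record identifier, content, position of the next record)."""
    rec_id, size = struct.unpack_from("<HH", data, pos)
    content = data[pos + 4:pos + 4 + size]
    return rec_id, content, pos + 4 + size


def parse_filepass(document: bytes):
    """Extract Salt, EncryptedVerifier and EncryptedVerifierHash."""
    rec_id, _, pos = read_record(document, WORKBOOK_OFFSET)
    if rec_id != BOF_ID:
        raise ValueError("WorkBook does not start with a BOF record")
    rec_id, content, _ = read_record(document, pos)
    if rec_id != FILEPASS_ID or len(content) != 54:
        raise ValueError("no 54-byte FilePass record found")
    # assumed layout: 2-byte encryption type, 4-byte version, then 3 x 16 bytes
    salt = content[6:22]
    encrypted_verifier = content[22:38]
    encrypted_verifier_hash = content[38:54]
    return salt, encrypted_verifier, encrypted_verifier_hash
```

The three returned values are the inputs needed by Algorithm 1 (Salt) and Algorithm 2 (EncryptedVerifier and EncryptedVerifierHash).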
Correspondingly, the computer device of the embodiment includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the Excel ciphertext document recovery method when executing the computer program. A computer-readable storage medium of this embodiment stores a computer program, and the computer program, when executed by a processor, implements the steps of the above Excel ciphertext document recovery method.
The foregoing is illustrative of the preferred embodiments of this invention, and it is to be understood that the invention is not limited to the precise form disclosed herein and that various other combinations, modifications, and environments may be resorted to, falling within the scope of the concept as disclosed herein, either as described above or as apparent to those skilled in the relevant art. And that modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method for recovering an Excel ciphertext document is characterized by comprising the following steps:
step one, constructing a key rainbow table of an Excel document: constructing a rainbow table according to an RC4 encryption algorithm adopted by the Excel document;
step two, deciphering a decryption key of the Excel ciphertext document: breaking the intermediate key of the Excel document by utilizing rainbow table attack and taking the intermediate key as a decryption key;
step three, decrypting and restoring a corresponding Excel plaintext document: decrypting and restoring the plaintext document by using the decryption key cracked in step two and the Excel document ciphertext data storage structure information.
2. The Excel ciphertext document recovery method, the computer device and the storage medium according to claim 1, wherein in the first step, the rainbow table is constructed as follows: selecting m starting points $S_1, S_2, \ldots, S_m$ in a key space K, and defining a reduction function $R: C \to K$ from the ciphertext space C to the key space K and a composite function $F(k) = R(E_k(p))$; applying the function F to the m starting points $S_i$ to compute m rainbow chains, and storing only the pairs of starting points and end points $(S_i, E_i)$ after the computation is finished, the table thus obtained being the rainbow table to be established.
3. The Excel ciphertext document recovery method, the computer device and the storage medium according to claim 2, wherein constructing the rainbow table further requires the following parameters: the key space size N, the key cracking success rate p, the number n of rainbow tables, the number m of chains in a single rainbow table, the rainbow chain length t, and the storage space M; the relationships among these parameters are given by formulas (1) to (3):
$n = -\ln(1-p)/2$ (1)
$m = M/n$ (2)
$t = -(N/M)\ln(1-p)$ (3).
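As a worked illustration of formulas (1) to (3), the following minimal sketch evaluates them for one set of inputs. The key space size N = 2^40 follows from the 5-byte key of claim 4; the success rate p = 0.99 and the storage budget M = 2^30 (taken here to mean a number of stored chain entries) are illustrative assumptions, not values from the source, and the function name is hypothetical.

```python
import math


def rainbow_parameters(key_space: int, success_rate: float, storage: int):
    """Evaluate formulas (1)-(3) for a given N, p and M."""
    n = -math.log(1.0 - success_rate) / 2.0                      # formula (1)
    m = storage / n                                               # formula (2)
    t = -(key_space / storage) * math.log(1.0 - success_rate)    # formula (3)
    return n, m, t


N = 2 ** 40   # 5-byte key space, per claim 4
p = 0.99      # assumed target success rate
M = 2 ** 30   # assumed storage budget, counted in chain entries

n, m, t = rainbow_parameters(N, p, M)
print(f"tables n = {n:.3f}, chains per table m = {m:.0f}, chain length t = {t:.1f}")
```

In practice n and t would be rounded up to integers; the sketch leaves them as real values so the relationship between the three formulas stays visible.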
4. The Excel ciphertext document recovery method, the computer device and the storage medium according to claim 2, wherein, for recovering the intermediate key, when the cryptographic algorithm E is RC4, a 206-byte keystream ks is generated and the 16 bytes from the 191st to the 206th byte are extracted; the encryption function $E_k(p)$ is this truncated RC4 keystream generation function, denoted TRC4(ks, 16); the reduction function R reduces the 16-byte keystream ks to 5 bytes by directly taking its first 5 bytes, denoted R(ks, 5); rainbow chains are generated for the different starting points using the encryption function $E_k(p)$ and the reduction function R, and all start point and end point pairs $(S_i, E_i)$ are stored to obtain the rainbow table.
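The truncated keystream function and the reduction function of this claim can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names are hypothetical, and keying RC4 directly with the 5-byte key is an assumption, since the claim does not spell out how the intermediate key is expanded into the RC4 key.

```python
def rc4_keystream(key: bytes, length: int) -> bytes:
    """Standard RC4: key scheduling followed by keystream generation."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for _ in range(length):
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def trc4(key: bytes) -> bytes:
    """TRC4: generate 206 keystream bytes and keep bytes 191..206 (1-based).

    Assumption: RC4 is keyed directly with the 5-byte key.
    """
    return rc4_keystream(key, 206)[190:206]


def reduce_ks(ks: bytes) -> bytes:
    """R(ks, 5): keep the first 5 bytes of the 16-byte keystream."""
    return ks[:5]


def chain_step(key: bytes) -> bytes:
    """F(k) = R(E_k(p)): one link of a rainbow chain."""
    return reduce_ks(trc4(key))


def build_chain(start: bytes, t: int):
    """Apply F t times and return the (start point, end point) pair."""
    key = start
    for _ in range(t):
        key = chain_step(key)
    return start, key
```

Calling `build_chain` for each of the m starting points and keeping only the returned pairs yields the table described in claim 2.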
5. The Excel ciphertext document recovery method, the computer device and the storage medium according to claim 3, wherein in step two, when recovering the intermediate key, the 16 bytes of data starting at byte offset 702 of the Excel ciphertext document are read as the target ciphertext $C_0$; these 16 bytes are XORed with the plaintext 0x20 to obtain the 16-byte keystream $Ks_0$; the reduction function R is applied to the keystream $Ks_0$ to obtain the key $Y_1$; the function F is then iterated starting from $Y_1$, and after each iteration the result $Y_s$ is compared with the stored end points $E_i$ until a match is found; once a match is found, the matching chain is regenerated from its starting point $S_j$ to obtain the key $k = X_{j(t-s)} = F^{(t-s-1)}(S_j)$, and the equation $\mathrm{TRC4}(ks, 16) = Ks_0$ is checked, with ks generated under the key k; if it holds, the correctness of the key k is verified; if the verification passes, k is the correct intermediate key; otherwise the search returns to the rainbow table to continue looking for matches and to compute the key k again.
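The lookup procedure of this claim can be sketched on a toy scale. To keep the example fast and self-contained, the code below replaces TRC4 with an MD5-based stand-in over a 2-byte key space; that substitution, the function names, and the 0-based indexing (the end point is F applied t times to the start point, so a match at step s gives the candidate key after t - s applications of F) are all assumptions of this sketch rather than details from the source.

```python
import hashlib


def toy_encrypt(key: bytes) -> bytes:
    """Stand-in for TRC4 (assumption): maps a 2-byte key to 16 bytes."""
    return hashlib.md5(key).digest()


def reduce_ks(ks: bytes) -> bytes:
    """Reduction R: truncate the 16-byte value to the toy key size."""
    return ks[:2]


def chain_step(key: bytes) -> bytes:
    """F(k) = R(E_k(p))."""
    return reduce_ks(toy_encrypt(key))


def build_table(starts, t: int):
    """Map each end point to the start points whose chains end there."""
    table = {}
    for start in starts:
        key = start
        for _ in range(t):
            key = chain_step(key)
        table.setdefault(key, []).append(start)
    return table


def lookup(table, t: int, target_ks: bytes):
    """Search for a key k with toy_encrypt(k) == target_ks, or return None."""
    y = reduce_ks(target_ks)                 # Y_1
    for s in range(1, t + 1):
        for start in table.get(y, []):
            # Regenerate the matching chain up to the candidate position.
            key = start
            for _ in range(t - s):
                key = chain_step(key)
            if toy_encrypt(key) == target_ks:
                return key                   # candidate confirmed
            # Otherwise this match was a false alarm; keep searching.
        y = chain_step(y)                    # Y_{s+1} = F(Y_s)
    return None


t = 50
starts = [i.to_bytes(2, "big") for i in range(200)]
table = build_table(starts, t)

# Pick a key that is known to lie on a stored chain, then recover it.
secret = starts[1]
for _ in range(17):
    secret = chain_step(secret)
recovered = lookup(table, t, toy_encrypt(secret))
```

Storing a list of start points per end point handles merged chains, and the final comparison against the target keystream filters out false alarms, which corresponds to the claim's instruction to return to the table and keep searching.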
6. The Excel ciphertext document recovery method, the computer device and the storage medium according to claim 5, wherein when verifying the correctness of the key k, the following intermediate key correctness verification method is adopted:
a) reading the RC4 encryption header structure of the ciphertext document, and parsing and extracting the 16-byte EncryptedVerifier value and the 16-byte EncryptedVerifierHash value;
b) using the RC4 algorithm and the key k recovered from the rainbow table to decrypt EncryptedVerifier into the plaintext value PlainVerifier[16] and EncryptedVerifierHash into the plaintext value PlainVerifierHash[16];
c) calculating the MD5 hash value of PlainVerifier: Hash = MD5(PlainVerifier);
d) comparing Hash with PlainVerifierHash; if they are the same, the key k is correct; otherwise the key k is wrong.
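Steps a) to d) can be sketched as below. Two points are assumptions of this sketch, since the claim does not state them: RC4 is keyed directly with the recovered key k, and the verifier and its hash are decrypted as one continuous 32-byte RC4 stream. The function names are hypothetical, and the demo values are synthetic rather than taken from a real document.

```python
import hashlib


def rc4(key: bytes, data: bytes) -> bytes:
    """Standard RC4; encryption and decryption are the same operation."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def key_is_correct(k: bytes, enc_verifier: bytes, enc_verifier_hash: bytes) -> bool:
    """Steps b)-d): decrypt both fields, hash the verifier, compare."""
    plain = rc4(k, enc_verifier + enc_verifier_hash)   # assumed: one stream
    plain_verifier, plain_verifier_hash = plain[:16], plain[16:32]
    return hashlib.md5(plain_verifier).digest() == plain_verifier_hash


# Synthetic header fields built with a known key, for demonstration only.
demo_key = bytes.fromhex("0102030405")
demo_verifier = bytes(range(16))
blob = rc4(demo_key, demo_verifier + hashlib.md5(demo_verifier).digest())
enc_verifier, enc_verifier_hash = blob[:16], blob[16:]

print(key_is_correct(demo_key, enc_verifier, enc_verifier_hash))
```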
7. The Excel ciphertext document recovery method, the computer device and the storage medium according to claim 6, wherein, since RC4 is a stream cipher algorithm, the ciphertext data is decrypted with the RC4 algorithm by XORing the generated keystream ks with the ciphertext to obtain the plaintext data.
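Because a stream cipher combines keystream and data with XOR, the same operation both decrypts ciphertext and, given known plaintext, exposes the keystream (the step claim 5 performs with the plaintext 0x20). A minimal sketch with synthetic values; the helper name and the demo keystream are hypothetical:

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


demo_keystream = bytes(range(100, 116))        # synthetic 16-byte keystream
known_plaintext = b"\x20" * 16                 # sixteen 0x20 bytes, as in claim 5
ciphertext = xor_bytes(known_plaintext, demo_keystream)

recovered_keystream = xor_bytes(ciphertext, known_plaintext)
recovered_plaintext = xor_bytes(ciphertext, demo_keystream)
```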
8. The Excel ciphertext document recovery method, the computer device and the storage medium according to claim 7, wherein the WorkBook byte format is parsed according to the Excel document structure, the record data of the WorkBook stream being located at document offset 0x200; the FilePass record in the WorkBook stream is read, and the Salt value, the EncryptedVerifier value and the EncryptedVerifierHash value used by the RC4 encryption algorithm and the intermediate key correctness verification method are parsed from it; after decryption, the data at the FilePass position in the plaintext is set to all-zero bytes to indicate that the document is plaintext data; all record contents of the WorkBook are decrypted in sequence, and the other non-encrypted fields are kept unchanged at their corresponding positions in the document, finally forming the complete plaintext document.
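Locating the FilePass record and extracting the three values might look like the sketch below. The layout used here is an assumption that should be checked against the published file format documentation: records are a 2-byte type and a 2-byte size followed by the payload, FilePass has type 0x002F, and its RC4 payload is a 2-byte encryption type, a 4-byte version, then the 16-byte Salt, EncryptedVerifier and EncryptedVerifierHash. The function names are hypothetical and the demo stream is synthetic.

```python
import struct

FILEPASS = 0x002F  # assumed record type of FilePass


def find_filepass(stream: bytes):
    """Walk the records and return (payload offset, payload) of FilePass."""
    pos = 0
    while pos + 4 <= len(stream):
        rec_type, size = struct.unpack_from("<HH", stream, pos)
        if rec_type == FILEPASS:
            return pos + 4, stream[pos + 4:pos + 4 + size]
        pos += 4 + size
    return None


def parse_rc4_filepass(payload: bytes):
    """Split the payload into Salt, EncryptedVerifier, EncryptedVerifierHash."""
    if len(payload) < 54:
        raise ValueError("payload too short for the layout assumed here")
    enc_type, v_major, _v_minor = struct.unpack_from("<HHH", payload, 0)
    if enc_type != 1 or v_major != 1:
        raise ValueError("not the RC4 FilePass layout assumed here")
    return payload[6:22], payload[22:38], payload[38:54]


# Synthetic stream: one 16-byte leading record followed by a FilePass record.
leading = struct.pack("<HH", 0x0809, 16) + bytes(16)
body = struct.pack("<HHH", 1, 1, 1) + bytes(range(48))
demo_stream = leading + struct.pack("<HH", FILEPASS, len(body)) + body

offset, payload = find_filepass(demo_stream)
salt, enc_verifier, enc_verifier_hash = parse_rc4_filepass(payload)
```

The returned payload offset is what a decryptor would use to overwrite the FilePass data with zero bytes once the remaining records have been decrypted.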
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor when executing the computer program implements the steps of the method according to any of claims 1-8.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
CN202011295247.2A 2020-11-18 2020-11-18 Excel ciphertext document recovery method, computer equipment and storage medium Pending CN112287374A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011295247.2A CN112287374A (en) 2020-11-18 2020-11-18 Excel ciphertext document recovery method, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112287374A true CN112287374A (en) 2021-01-29

Family

ID=74398022

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011295247.2A Pending CN112287374A (en) 2020-11-18 2020-11-18 Excel ciphertext document recovery method, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112287374A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113672963A (en) * 2021-08-30 2021-11-19 国家计算机网络与信息安全管理中心 Matching method and device based on rainbow table, storage medium and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130290731A1 (en) * 2012-04-26 2013-10-31 Appsense Limited Systems and methods for storing and verifying security information
CN103916456A (en) * 2013-01-09 2014-07-09 国际商业机器公司 Transparent Encryption/decryption Gateway For Cloud Storage Services
CN105933120A (en) * 2016-04-06 2016-09-07 清华大学 Spark platform-based password hash value recovery method and device
CN106357384A (en) * 2016-08-26 2017-01-25 广州慧睿思通信息科技有限公司 Word2003 document cracking system based on FPGA hardware and method
CN106778292A (en) * 2016-11-24 2017-05-31 中国电子科技集团公司第三十研究所 A kind of quick restoring method of Word encrypted documents

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIJUN ZHANG 等: "Fast Decryption of Excel Document Encrypted by RC4 Algorithm", 《2020 IEEE 20TH INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY (ICCT)》 *

Similar Documents

Publication Publication Date Title
US9172533B2 (en) Method and system for securing communication
Krawczyk et al. HMAC: Keyed-hashing for message authentication
US8284933B2 (en) Encrypting variable-length passwords to yield fixed-length encrypted passwords
Krawczyk et al. RFC2104: HMAC: Keyed-hashing for message authentication
JP5914604B2 (en) Apparatus and method for decrypting encrypted file
US9537657B1 (en) Multipart authenticated encryption
US8300828B2 (en) System and method for a derivation function for key per page
US20140355754A1 (en) Partial CipherText Updates Using Variable-Length Segments Delineated by Pattern Matching and Encrypted by Fixed-Length Blocks
US10461924B2 (en) Format-preserving cipher
US10009169B2 (en) Format-preserving cipher
CN106878013B (en) File encryption and decryption method and device
CN106778292B (en) A kind of quick restoring method of Word encrypted document
US9313023B1 (en) Format-preserving cipher
Karimov et al. Encryption Methods and Algorithms Based on Domestic Standards in Open-Source Operating Systems
CN112287374A (en) Excel ciphertext document recovery method, computer equipment and storage medium
US20230050675A1 (en) Data processing device, data processing method, and computer program
Cortez et al. Cryptanalysis of the Modified SHA256
Sorini et al. Pylocky ransomware source code analysis
Tang et al. Side channel attack resistant cross-user generalized deduplication for cloud storage
Zhang et al. An extensive analysis of truecrypt encryption forensics
Rahouma Reviewing and applying security services with non-english letter coding to secure software applications in light of software trade-offs
CN116894273B (en) File encryption method, decryption method, equipment and medium based on exclusive or sum remainder
Zhang et al. An Efficient Recovery Method of Encrypted Word Document
CN113569262B (en) Ciphertext storage method and system based on block chain
Zhang et al. Fast Decryption of Excel Document Encrypted by RC4 Algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210129
