CN105117235A - Method for reorganizing Office file - Google Patents
Method for reorganizing Office file
- Publication number: CN105117235A
- Application number: CN201510600871.1A
- Authority: CN (China)
- Prior art keywords: data, file, office file, office, xml
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention discloses a method for reorganizing an Office file. The method includes the following steps: analyzing disk partitions; distinguishing unused space; finding the header signature of the Office file; finding the index table of the Office file; parsing the xml index table; finding the data area of the Office file; reorganizing the Office file; and decompressing and displaying the Office file. The method has the following advantages: the data area and the trailing xml index table of a lost Office file can be found from the file's characteristics, and the position of the data area can be determined from the xml index table; the data area and the xml index table are combined into a file in the specific format; and even if parts of the file are damaged, the data of the other parts can be recovered, minimizing the loss caused by data loss.
Description
Technical field
The present invention relates to the field of information security technology, and in particular to a method for reorganizing Office files.
Background technology
In the 21st century, with informatization developing rapidly, computer technology advances by the day and is closely bound up with daily life. Data loss on storage devices often brings endless worry and trouble to individuals and even to businesses. In enterprise work, for example, the damage caused by lost business data is inestimable: at best the company cannot carry on its work, and at worst it may go out of business. This holds especially for information enterprises, whose electronic data is their lifeblood, and the main form of that data is the Office compound document. Recovering the key information in Office compound documents can reduce an enterprise's losses to a minimum. In policing, the key information in these Office compound documents may itself be the electronic evidence in a case; the more data examiners can obtain from a computer, the sooner the case can be solved and the more reliable the evidence that can be presented in court. This places higher demands on forensics, and on data recovery in particular. At present, the most popular commercial data forensics software on the market includes EnCase, WinHex and R-Studio. When recovering Office compound documents, however, these tools can only perform RAW-mode recovery based on the file header signature. Office compound files have their own encoding and decoding format, and their content cannot be displayed normally from file header data alone, so RAW-mode recovery does not work well for incomplete Office compound documents. Yet in practical data recovery and forensics, what is usually needed is to recover complete Office compound documents, or the key information in them.
Electronic files are all designed according to some standard or format, and each file type has its own format or decoding constraints. These can be exploited in data recovery to recover discrete data and maximize the probability of recovering a complete file, so that the recovered data can be decoded normally and its content displayed, achieving the purpose of forensics or data recovery. The present invention offers a particularly good recovery approach for Office compound documents.
Summary of the invention
To address the defects of the prior art, the present invention provides a method for reorganizing Office files that effectively solves the above problems of the prior art.
A method for reorganizing an Office file comprises the following steps:
S1: open the MBR disk partition table, parse it, and obtain the start position and size of each partition from the partition table;
S2: using the partition information from S1, locate the file system partition and obtain the file system parameters; parse the file system to obtain the attributes and storage locations of all files and directories, and distinguish space in normal use from unused free space;
S3: scan the free space for the data header signature of all Office files;
S4: from the data header signature, offset downward by "0x1e" to find the xml index table of this Office file;
S5: parse the xml index table to obtain all index table bitmap information;
S6: obtain the specific location of the Office file data area data according to the index table bitmap information;
S7: reorganize the Office file according to the index table bitmap information and the Office file data area data;
S8: use the standard decompression algorithm to decompress the reorganized Office file data and obtain a normally encoded data stream.
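As an illustration of step S1, the following Python sketch parses the four primary partition entries of a classic MBR sector. It assumes 512-byte sectors and a raw copy of sector 0; the names `parse_mbr` and `make_demo_mbr` are hypothetical and do not come from the patent, and extended partitions and GPT disks are not handled.

```python
import struct

SECTOR_SIZE = 512               # assumed sector size
PARTITION_TABLE_OFFSET = 0x1BE  # partition table starts at byte 446 of the MBR
ENTRY_SIZE = 16                 # each of the 4 entries is 16 bytes


def parse_mbr(sector: bytes):
    """Return (type, start_byte, size_bytes) for each non-empty partition entry."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    partitions = []
    for i in range(4):
        begin = PARTITION_TABLE_OFFSET + i * ENTRY_SIZE
        entry = sector[begin:begin + ENTRY_SIZE]
        # status, CHS start (3 bytes), type, CHS end (3 bytes), LBA start, sector count
        _status, _chs_first, ptype, _chs_last, lba_start, num_sectors = struct.unpack(
            "<B3sB3sII", entry
        )
        if ptype == 0 or num_sectors == 0:
            continue  # unused slot
        partitions.append((ptype, lba_start * SECTOR_SIZE, num_sectors * SECTOR_SIZE))
    return partitions


def make_demo_mbr() -> bytes:
    """Build a synthetic MBR with one FAT32 (type 0x0C) partition, for demonstration only."""
    sector = bytearray(512)
    sector[0x1BE:0x1BE + 16] = struct.pack(
        "<B3sB3sII", 0x80, b"\x00" * 3, 0x0C, b"\x00" * 3, 2048, 204800
    )
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


if __name__ == "__main__":
    for ptype, start, size in parse_mbr(make_demo_mbr()):
        print(f"type=0x{ptype:02X} start={start} size={size}")
```

On a real disk image the input would be the first 512 bytes of the device; the start and size values then bound the region whose file system is parsed in S2.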
Preferably, the detailed steps of S6 are as follows:
S601: extract one xml index entry, in order;
S602: find the corresponding data block header using the position feature in the xml index entry, and find the start and end positions of the data block using the data block header signature;
S603: match the structure given by the xml index entry against the data in the data block; if the structural features match, the data block is considered usable and S604 is performed; if the check code does not match, the data in this data area is damaged and S605 is performed;
S604: record the position of this data area;
S605: determine whether all entries of the xml index table have been extracted; if so, end; if not, return to S601.
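The loop S601 to S605 can be sketched in Python as follows. This is only an illustration under stated assumptions: each index entry is taken to carry an entry name, the offset of its data block header and the compressed size, as a ZIP central directory record does, and the "structure match" of S603 is reduced to checking the header signature and the stored name. The function names are hypothetical, and `zipfile` is used only to build demonstration data and to stand in for an already parsed index table.

```python
import io
import struct
import zipfile

LOCAL_SIG = b"PK\x03\x04"   # data block header signature (0x504B0304)
FIXED_HEADER_LEN = 30       # fixed part of the header; the entry name follows


def locate_data_areas(image: bytes, index_entries):
    """index_entries: iterable of (name, header_offset, compressed_size).

    Returns (found, damaged): found holds (name, data_start, data_end).
    """
    found, damaged = [], []
    for name, offset, csize in index_entries:               # S601: next index entry
        header = image[offset:offset + FIXED_HEADER_LEN]    # S602: locate the header
        if len(header) < FIXED_HEADER_LEN or header[:4] != LOCAL_SIG:
            damaged.append(name)                             # S603 failed, go to S605
            continue
        name_len, extra_len = struct.unpack("<HH", header[26:30])
        name_start = offset + FIXED_HEADER_LEN
        stored_name = image[name_start:name_start + name_len]
        if stored_name != name.encode("utf-8"):              # S603: structure mismatch
            damaged.append(name)
            continue
        start = name_start + name_len + extra_len
        found.append((name, start, start + csize))           # S604: record position
    return found, damaged                                    # S605: index exhausted


def build_demo_image():
    """Create a small in-memory container and the index entries describing it."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<document/>" * 20)
    image = buf.getvalue()
    with zipfile.ZipFile(io.BytesIO(image)) as zf:
        entries = [(i.filename, i.header_offset, i.compress_size) for i in zf.infolist()]
    return image, entries


if __name__ == "__main__":
    image, entries = build_demo_image()
    print(locate_data_areas(image, entries))
```

Entries whose header is overwritten end up in the `damaged` list while the others are still located, which mirrors the partial recovery described for damaged files.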
Preferably, S7 is carried out as follows: following the order of the xml index information, take the data block corresponding to each index entry, and reorganize the Office file by arranging the data blocks from top to bottom in index order.
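The ordering rule of S7 can be illustrated with a few lines of Python. This is a simplified sketch, not the patent's implementation: `reassemble` is a hypothetical helper, the recovered data blocks are assumed to be available as byte strings keyed by entry name, and entries whose block was not recovered are simply skipped.

```python
def reassemble(index_order, blocks):
    """Concatenate recovered data blocks in index order.

    index_order: entry names in the order given by the xml index table.
    blocks:      dict mapping entry name -> recovered block bytes.
    Returns (data, offsets), where offsets maps each name to its new start offset.
    """
    out = bytearray()
    offsets = {}
    for name in index_order:
        block = blocks.get(name)
        if block is None:
            continue  # block damaged or not found: leave it out
        offsets[name] = len(out)
        out += block
    return bytes(out), offsets


if __name__ == "__main__":
    order = ["[Content_Types].xml", "word/document.xml", "word/styles.xml"]
    recovered = {"word/styles.xml": b"CCC", "[Content_Types].xml": b"A"}
    print(reassemble(order, recovered))
```

The returned offsets are recorded because, in a ZIP-style container, the index table appended after the data blocks refers to each block by its position in the output.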
Compared with the prior art, the advantages of the present invention are as follows: the data area and the trailing xml index table of a lost Office file can be found from the file's characteristics, and the positions of the Office file's data areas can be parsed from the xml index table; the data areas and the xml index table are combined into a file of the specific format; and even if part of the file is damaged, the data of the remaining parts can still be recovered, minimizing the loss caused by data loss.
Embodiment
To make the purpose, technical solution and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments.
A method for reorganizing an Office file comprises the following steps:
S1: open the MBR disk partition table, parse it, and obtain the start position and size of each partition from the partition table;
S2: using the partition information from S1, locate the file system partition and obtain the file system parameters; parse the file system to obtain the attributes and storage locations of all files and directories, and distinguish space in normal use from unused free space;
S3: scan the free space for the data header signature "0x504B0304" of all Office files;
S4: from the data header signature, offset downward by "0x1e" to find the xml index table of this Office file;
S5: parse the xml index table to obtain all index table bitmap information;
S6: obtain the specific location of the Office file data area data according to the index table bitmap information;
S7: reorganize the Office file according to the index table bitmap information and the Office file data area data;
S7 is carried out as follows: following the order of the xml index information, take the data block corresponding to each index entry, and reorganize the Office file by arranging the data blocks from top to bottom in index order.
For example, in a DOCX file the XML bitmap table has the following structure:
[Content_Types].xml
word/_rels/document.xml.rels
word/document.xml
word/media/image1.png
word/theme/theme1.xml
word/settings.xml
word/webSettings.xml
word/styles.xml
docProps/core.xml
word/numbering.xml
word/fontTable.xml
docProps/app.xml
Each entry of the xml index table above corresponds to one data block; once the data blocks have been located through the index table information, they are extracted and assembled into a complete file in the order above.
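Entry names such as those listed above can be read using the signature from S3 and the "0x1e" offset from S4: in the ZIP container used by DOCX files, the entry name begins 0x1e (30) bytes after each "0x504B0304" signature, and its length is stored at byte 26 of the header. The Python sketch below scans a byte region for that signature and reads the name that follows. The function names are hypothetical, and the demonstration region is synthetic data built with `zipfile`, not a real disk image.

```python
import io
import struct
import zipfile

SIGNATURE = bytes.fromhex("504B0304")  # data header signature from S3
NAME_OFFSET = 0x1E                     # offset from S4: the entry name starts here


def scan_entry_names(region: bytes):
    """Return [(offset, name)] for every header signature found in region."""
    results = []
    pos = region.find(SIGNATURE)
    while pos != -1:
        header = region[pos:pos + NAME_OFFSET]
        if len(header) == NAME_OFFSET:
            # 2-byte little-endian name length at byte 26 of the header
            name_len = struct.unpack_from("<H", header, 26)[0]
            raw = region[pos + NAME_OFFSET:pos + NAME_OFFSET + name_len]
            if len(raw) == name_len:
                results.append((pos, raw.decode("utf-8", errors="replace")))
        pos = region.find(SIGNATURE, pos + 1)
    return results


def demo_region() -> bytes:
    """Simulate free space: 100 zero bytes followed by a small container."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<w:document/>")
    return b"\x00" * 100 + buf.getvalue()


if __name__ == "__main__":
    for offset, name in scan_entry_names(demo_region()):
        print(offset, name)
```

A plain signature search can also hit the same four bytes inside unrelated data, so a practical scanner would validate each candidate further, for example with the structure check of S603.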
S8: use the standard decompression algorithm to decompress the reorganized Office file data and obtain a normally encoded data stream.
The algorithm code for decompressing an Office data block is as follows:
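The patent's own listing does not appear in this text. As a stand-in, the following Python sketch shows what decompressing one data block could look like, assuming that the "standard decompression algorithm" of S8 is raw DEFLATE, the compression method normally used inside DOCX containers. The function names are hypothetical, and `deflate_block` exists only to produce demonstration input.

```python
import zlib


def inflate_block(compressed: bytes) -> bytes:
    """Inflate a raw DEFLATE stream, as stored for a compressed container entry."""
    # Negative wbits selects a raw stream without zlib header or trailer.
    decomp = zlib.decompressobj(-zlib.MAX_WBITS)
    return decomp.decompress(compressed) + decomp.flush()


def deflate_block(data: bytes) -> bytes:
    """Demo helper only: produce a raw DEFLATE stream from plain data."""
    comp = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS)
    return comp.compress(data) + comp.flush()


if __name__ == "__main__":
    original = b"<w:document/>" * 10
    restored = inflate_block(deflate_block(original))
    print(restored == original)
    print(hex(zlib.crc32(restored)))  # value to compare with a stored check code
```

Comparing `zlib.crc32` of the inflated data with the check code stored for the entry is one way to confirm that a recovered block is intact.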
The detailed steps of S6 are as follows:
S601: extract one xml index entry, in order;
S602: find the corresponding data block header using the position feature in the xml index entry, and find the start and end positions of the data block using the data block header signature;
S603: match the structure given by the xml index entry against the data in the data block; if the structural features match, the data block is considered usable and S604 is performed; if the check code does not match, the data in this data area is damaged and S605 is performed;
S604: record the position of this data area;
S605: determine whether all entries of the xml index table have been extracted; if so, end; if not, return to S601.
Those of ordinary skill in the art will appreciate that the embodiments described here are intended to help the reader understand how the invention is implemented, and that the scope of protection of the invention is not limited to these particular statements and embodiments. Based on the technical teaching disclosed herein, those of ordinary skill in the art can make various other specific variations and combinations that do not depart from the essence of the invention, and such variations and combinations remain within its scope of protection.
Claims (3)
1. A method for reorganizing an Office file, characterized by comprising the following steps:
S1: open the MBR disk partition table, parse it, and obtain the start position and size of each partition from the partition table;
S2: using the partition information from S1, locate the file system partition and obtain the file system parameters; parse the file system to obtain the attributes and storage locations of all files and directories, and distinguish space in normal use from unused free space;
S3: scan the free space for the data header signature of all Office files;
S4: from the data header signature, offset downward by "0x1e" to find the xml index table of this Office file;
S5: parse the xml index table to obtain all index table bitmap information;
S6: obtain the specific location of the Office file data area data according to the index table bitmap information;
S7: reorganize the Office file according to the index table bitmap information and the Office file data area data;
S8: use the standard decompression algorithm to decompress the reorganized Office file data and obtain a normally encoded data stream.
2. The method for reorganizing an Office file according to claim 1, characterized in that the detailed steps of S6 are as follows:
S601: extract one xml index entry, in order;
S602: find the corresponding data block header using the position feature in the xml index entry, and find the start and end positions of the data block using the data block header signature;
S603: match the structure given by the xml index entry against the data in the data block; if the structural features match, the data block is considered usable and S604 is performed; if the check code does not match, the data in this data area is damaged and S605 is performed;
S604: record the position of this data area;
S605: determine whether all entries of the xml index table have been extracted; if so, end; if not, return to S601.
3. The method for reorganizing an Office file according to claim 1, characterized in that S7 is carried out as follows: following the order of the xml index information, take the data block corresponding to each index entry, and reorganize the Office file by arranging the data blocks from top to bottom in index order.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510600871.1A CN105117235A (en) | 2015-09-18 | 2015-09-18 | Method for reorganizing Office file |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105117235A | 2015-12-02 |
Family
ID=54665237
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510600871.1A Pending CN105117235A (en) | 2015-09-18 | 2015-09-18 | Method for reorganizing Office file |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105117235A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109960608A (en) * | 2017-12-26 | 2019-07-02 | 北京安天网络安全技术有限公司 | The processing method and processing system of office document |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101482835A (en) * | 2008-01-11 | 2009-07-15 | 李晶 | File name regeneration technique |
US20100161608A1 (en) * | 2008-12-18 | 2010-06-24 | Sumooh Inc. | Methods and apparatus for content-aware data de-duplication |
CN102651057A (en) * | 2011-02-27 | 2012-08-29 | 孙星明 | OOXML (office open extensible markup language)-based electronic document digital evidence collecting method and device thereof |
US20120265762A1 (en) * | 2010-10-06 | 2012-10-18 | Planet Data Solutions | System and method for indexing electronic discovery data |
CN102937924A (en) * | 2012-10-30 | 2013-02-20 | 厦门市美亚柏科信息股份有限公司 | File allocation table (FAT) data recovery method based on file characteristic and file system |
CN103645974A (en) * | 2013-12-31 | 2014-03-19 | 厦门市美亚柏科信息股份有限公司 | Method and device for recovering portable document format (PDF) file |
CN104462433A (en) * | 2014-12-17 | 2015-03-25 | 四川效率源信息安全技术有限责任公司 | Method for recovering data of FAT32 partition |
Non-Patent Citations (1)
Title |
---|
Yang Deming (杨德明): "A fast recovery method for valid data under FAT32" (FAT32下有效数据快速恢复方法), 《计算机应用》 (Computer Applications) * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20151202 |