CN105117235A - Method for reorganizing Office file - Google Patents

Method for reorganizing Office file

Info

Publication number
CN105117235A
Authority
CN
China
Prior art keywords
data
file
office file
office
xml
Prior art date
Legal status
Pending
Application number
CN201510600871.1A
Other languages
Chinese (zh)
Inventor
梁效宁
许超明
赵飞
Current Assignee
SICHUAN XLY INFORMATION SAFETY TECHNOLOGY Co Ltd
Original Assignee
SICHUAN XLY INFORMATION SAFETY TECHNOLOGY Co Ltd
Priority date
2015-09-18
Filing date
2015-09-18
Publication date
2015-12-02
Application filed by SICHUAN XLY INFORMATION SAFETY TECHNOLOGY Co Ltd
Priority to CN201510600871.1A
Publication of CN105117235A
Pending (current legal status)


Abstract

The invention discloses a method for reorganizing an Office file. The method includes the following steps: analyzing disk partitions; identifying unused space; finding the header signature of the Office file; finding the index table of the Office file; parsing the xml index table; finding the data area of the Office file; reorganizing the Office file; and decompressing and displaying the Office file. The advantages of the method are that the data area and the trailing xml index table of a lost Office file can be found from the file's characteristics, and the location of the Office file's data area can be resolved from the xml index table; the data area and the xml index table of the Office file are combined into a file of the specific format; and even if part of the file is damaged, the data in the remaining parts can still be recovered, minimizing the losses caused by data loss.

Description

Method for reorganizing an Office file
Technical field
The present invention relates to the field of information security technology, and in particular to a method for reorganizing Office files.
Background
In the 21st century, as informatization develops rapidly, computer technology advances day by day and is closely bound up with everyday life. Loss of data on storage devices often brings endless worry and trouble to individuals and even to businesses. Wherever computer data storage is involved, for example in enterprise work, the losses caused by losing business data are incalculable: at best the company cannot carry on working, at worst the business may fail. In information enterprises in particular, a company's electronic data is its lifeblood, and most of that data consists of Office compound documents; recovering the key information in these documents reduces the enterprise's losses to a minimum. In police work, the key information in Office compound documents may be the electronic evidence in a case. The more data investigators can recover from a computer, the sooner the case can be solved and the more reliable the evidence that can be presented to the court. This places higher demands on forensics, and on data recovery in particular.
At present, the most popular commercial data forensics software on the market, such as EnCase, WinHex and R-Studio, can only recover Office compound documents in RAW mode, that is, by searching for the file header signature. However, Office compound documents use their own encoding format, and data containing only the file header cannot be decoded to display the content normally, so RAW-mode recovery performs poorly on incomplete Office compound documents. Yet in real data recovery or forensics, what is usually needed is the complete Office compound document, or the key information it contains.
Electronic information files are all designed according to some standard or format, and each file type has its own format or decoding constraints. In data recovery this information can be exploited to recover discrete (fragmented) data and to maximize the probability of recovering complete files, so that the recovered data can be decoded normally, its content displayed correctly, and the goal of forensics or data recovery achieved. The present invention provides a particularly effective recovery approach for Office compound documents.
Summary of the invention
To address the shortcomings of the prior art, the present invention provides a method for reorganizing an Office file that effectively solves the problems of the prior art described above.
A method for reorganizing an Office file comprises the following steps:
S1: open the MBR disk partition table, parse it, and obtain the start position and size of each partition from it;
S2: using the partition information from S1, locate the partition's file system table to obtain the file system parameters; parse the file system to obtain the attribute values and storage locations of all files and directories, and distinguish normally used space from unused free space;
S3: scan the free space for the header signature of every Office file;
S4: from the header signature, offset 0x1e bytes forward to find the xml index table of this Office file;
S5: parse the xml index table to obtain the bitmap information of all index table entries;
S6: obtain the exact location of the data in the Office file's data area from the index table bitmap information;
S7: reorganize the Office file from the index table bitmap information and the data in the Office file's data area;
S8: decompress the reorganized Office file data with the decompression algorithm specified by the algorithm standard to obtain a normally encoded data stream.
Preferably, S6 comprises the following detailed steps:
S601: extract the next xml index entry in order;
S602: find the corresponding data block header from the position information in the xml index entry, and find the start and end positions of the data block from the data block header signature;
S603: match the data in the data block against the structure given by the xml index entry; if the structural features match, the data block is considered usable and S604 is executed; if the check code does not match, the data in this data area is damaged and S605 is executed;
S604: record the position of this data area;
S605: determine whether every entry of the xml index table has been extracted; if so, end; if not, return to S601.
Preferably, in S7 the data blocks corresponding to the index entries are arranged from top to bottom in the order of the xml index entries to reorganize the Office file.
Compared with the prior art, the present invention has the following advantages: the data area and the trailing xml index table of a lost Office file can be found from the file's characteristics, and the location of the Office file's data area can be resolved from the xml index table; the data area and the xml index table are combined into a file of the specific format; and even if part of the file is damaged, the data in the remaining parts can still be recovered, minimizing the losses caused by data loss.
Embodiment
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments.
A method for reorganizing an Office file comprises the following steps:
S1: open the MBR disk partition table, parse it, and obtain the start position and size of each partition from it;
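Step S1 can be pictured with a short Python sketch. It assumes a classic MBR with 512-byte sectors, in which the four 16-byte primary entries start at byte offset 446 and the sector ends with the 0x55AA signature; extended partitions and GPT disks are not handled, and the names `parse_mbr` and `PartitionEntry` are hypothetical rather than taken from the patent.

```python
import struct
from typing import List, NamedTuple

SECTOR_SIZE = 512  # assumption: classic 512-byte sectors


class PartitionEntry(NamedTuple):
    index: int          # slot 0-3 in the partition table
    bootable: bool      # status byte 0x80
    type_id: int        # partition type, e.g. 0x07 (NTFS/exFAT) or 0x0C (FAT32 LBA)
    start_lba: int      # first sector of the partition
    sector_count: int   # partition size in sectors

    @property
    def start_offset(self) -> int:
        return self.start_lba * SECTOR_SIZE

    @property
    def size_bytes(self) -> int:
        return self.sector_count * SECTOR_SIZE


def parse_mbr(sector0: bytes) -> List[PartitionEntry]:
    """Return the non-empty primary partitions described by an MBR sector."""
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector (missing 0x55AA signature)")
    entries = []
    for slot in range(4):
        base = 446 + 16 * slot                      # each entry is 16 bytes
        status, type_id = sector0[base], sector0[base + 4]
        start_lba, count = struct.unpack_from("<II", sector0, base + 8)
        if type_id == 0 or count == 0:
            continue                                # unused slot
        entries.append(PartitionEntry(slot, status == 0x80, type_id, start_lba, count))
    return entries
```

On a disk image, `parse_mbr(image[:512])` yields the byte ranges that S2 would then hand to a file-system parser.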
S2: using the partition information from S1, locate the partition's file system table to obtain the file system parameters; parse the file system to obtain the attribute values and storage locations of all files and directories, and distinguish normally used space from unused free space;
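The patent does not say how used and free space are told apart in S2. One common approach, shown here purely as an assumption, is to read a cluster allocation bitmap where the file system keeps one (one bit per cluster, least-significant bit first, 1 meaning allocated) and collect the runs of clear bits. The function `free_extents` is hypothetical.

```python
from typing import List, Tuple


def free_extents(bitmap: bytes, total_clusters: int) -> List[Tuple[int, int]]:
    """Return (first_cluster, cluster_count) runs of unallocated clusters.

    Assumes one bit per cluster, least-significant bit first, 1 = allocated.
    """
    extents = []
    run_start = None
    for cluster in range(total_clusters):
        allocated = (bitmap[cluster // 8] >> (cluster % 8)) & 1
        if not allocated and run_start is None:
            run_start = cluster                      # a free run begins
        elif allocated and run_start is not None:
            extents.append((run_start, cluster - run_start))
            run_start = None                         # the free run ends
    if run_start is not None:                        # free run reaches the last cluster
        extents.append((run_start, total_clusters - run_start))
    return extents
```

Each extent, multiplied by the cluster size and offset by the partition start, gives a byte range to search in S3.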
S3: scan the free space for the header signature "0x504B0304" of every Office file;
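The scan in S3 amounts to searching the free byte ranges for the four bytes 50 4B 03 04 ("PK\x03\x04"), which is how 0x504B0304 appears on disk; read as a little-endian 32-bit integer, the same bytes are 0x04034B50, the ZIP local-file-header signature. A minimal sketch, with a hypothetical `scan_signatures` helper:

```python
from typing import Iterator, Optional

# 0x504B0304 as it appears on disk; little-endian this is the ZIP
# local-file-header signature 0x04034B50.
OFFICE_HEADER_SIG = b"PK\x03\x04"


def scan_signatures(image: bytes, start: int = 0, end: Optional[int] = None) -> Iterator[int]:
    """Yield each offset in image[start:end] where the header signature begins."""
    if end is None:
        end = len(image)
    pos = image.find(OFFICE_HEADER_SIG, start, end)
    while pos != -1:
        yield pos
        pos = image.find(OFFICE_HEADER_SIG, pos + 1, end)
```

In practice each free extent from S2 would be converted to a byte range and passed as `start` and `end`.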
S4: from the header signature, offset 0x1e bytes forward to find the xml index table of this Office file;
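The 0x1E offset in S4 matches the size of the fixed part of a ZIP local file header: 30 bytes holding the signature, version, flags, method, time, date, CRC-32, sizes and name/extra lengths, after which the entry name (for example `[Content_Types].xml`) begins. The sketch below reads that header under this interpretation; `read_local_header` and `LocalHeader` are hypothetical names, and entries written with a data descriptor (general-purpose flag bit 3) may carry zero sizes in this header.

```python
import struct
from typing import NamedTuple

# Fixed part of a ZIP local file header: 30 bytes (0x1E).
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


class LocalHeader(NamedTuple):
    offset: int             # where the signature was found
    flags: int              # general-purpose bit flags
    method: int             # 0 = stored, 8 = deflate
    crc32: int
    compressed_size: int
    uncompressed_size: int
    name: str               # starts at offset + 0x1E
    data_offset: int        # first byte of the compressed data block


def read_local_header(image: bytes, offset: int) -> LocalHeader:
    (sig, _version, flags, method, _mtime, _mdate, crc, csize, usize,
     name_len, extra_len) = LOCAL_HEADER.unpack_from(image, offset)
    if sig != b"PK\x03\x04":
        raise ValueError(f"no local file header at {offset:#x}")
    name_start = offset + LOCAL_HEADER.size            # offset + 0x1E
    raw_name = image[name_start:name_start + name_len]
    name = raw_name.decode("utf-8", errors="replace")  # assumes ASCII/UTF-8 part names
    data_offset = name_start + name_len + extra_len
    return LocalHeader(offset, flags, method, crc, csize, usize, name, data_offset)
```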
S5: parse the xml index table to obtain the bitmap information of all index table entries;
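For S5, one plausible reading (an assumption, since the patent does not give the record layout) is that the trailing "xml index table" is the ZIP central directory, reached through the end-of-central-directory record ("PK\x05\x06"). The sketch locates that record, derives where the archive starts inside the carved image, and lists each entry's name, method, CRC-32, sizes and local-header offset. All names are hypothetical, `build_sample_archive` exists only for illustration, and ZIP64 archives or comments that contain the signature are not handled.

```python
import io
import struct
import zipfile
from typing import List, NamedTuple, Optional, Tuple

EOCD = struct.Struct("<4sHHHHIIH")              # end of central directory, 22 bytes
CDIR = struct.Struct("<4sHHHHHHIIIHHHHHII")     # central directory record, 46 bytes


class IndexEntry(NamedTuple):
    name: str
    method: int
    crc32: int
    compressed_size: int
    uncompressed_size: int
    local_header_offset: int   # relative to the start of the archive


def find_eocd(image: bytes, search_from: int = 0) -> Optional[int]:
    """Offset of the first end-of-central-directory signature at or after search_from."""
    pos = image.find(b"PK\x05\x06", search_from)
    return None if pos == -1 else pos


def read_index_table(image: bytes, eocd_pos: int) -> Tuple[int, List[IndexEntry]]:
    """Return (archive start in image, index entries in central-directory order)."""
    (_sig, _disk, _cd_disk, _n_here, total, cd_size, cd_offset,
     _comment_len) = EOCD.unpack_from(image, eocd_pos)
    base = eocd_pos - cd_size - cd_offset     # assumes the directory sits right before the EOCD
    pos = base + cd_offset
    entries = []
    for _ in range(total):
        f = CDIR.unpack_from(image, pos)
        if f[0] != b"PK\x01\x02":
            raise ValueError(f"bad central directory record at {pos:#x}")
        name_len, extra_len, comment_len = f[10], f[11], f[12]
        name = image[pos + CDIR.size:pos + CDIR.size + name_len].decode("utf-8", "replace")
        entries.append(IndexEntry(name, f[4], f[7], f[8], f[9], f[16]))
        pos += CDIR.size + name_len + extra_len + comment_len
    return base, entries


def build_sample_archive() -> bytes:
    """Illustration only: a tiny two-part archive built with the standard library."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<w:document/>")
    return buf.getvalue()
```

For a DOCX, the entry names returned here would be the part list given in the example further on, in central-directory order.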
S6: obtain the exact location of the data in the Office file's data area from the index table bitmap information;
S7: reorganize the Office file from the index table bitmap information and the data in the Office file's data area;
In S7, the data blocks corresponding to the index entries are arranged from top to bottom in the order of the xml index entries to reorganize the Office file.
For example, in a DOCX file the XML bitmap table structure is:
[Content_Types].xml
word/_rels/document.xml.rels
word/document.xml
word/media/image1.png
word/theme/theme1.xml
word/settings.xml
word/webSettings.xml
word/styles.xml
docProps/core.xml
word/numbering.xml
word/fontTable.xml
docProps/app.xml
Each of the xml index entries above corresponds to a data block; the data blocks are located through the index entries, extracted, and assembled into a complete file in the order above.
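The ordered assembly can be sketched with Python's standard `zipfile` module: recovered parts are written into a new container strictly in index-table order, and parts whose blocks failed verification are skipped so that the rest of the document survives. As a simplification, this sketch takes already-decoded part contents and re-compresses them, rather than copying the original compressed blocks verbatim as the patent's S7-then-S8 order implies; in exchange, `zipfile` writes a consistent central directory. `reassemble` is a hypothetical name.

```python
import io
import zipfile
from typing import Dict, List


def reassemble(index_order: List[str], parts: Dict[str, bytes]) -> bytes:
    """Write recovered parts into a new ZIP container in index-table order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as out:
        for name in index_order:               # top to bottom, as in the index table
            data = parts.get(name)
            if data is None:
                continue                        # damaged or missing block: skip, keep the rest
            out.writestr(name, data)
    return buf.getvalue()
```

With the DOCX list above as `index_order`, a document whose `word/media/image1.png` block was damaged would still be rebuilt with its text, styles and properties.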
S8: decompress the reorganized Office file data with the decompression algorithm specified by the algorithm standard to obtain a normally encoded data stream.
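For S8, OOXML parts are normally stored with ZIP method 8 (DEFLATE) or method 0 (stored). Python's `zlib` decodes a raw DEFLATE stream when given negative window bits, and the decompressor's `eof` flag reveals a stream that was cut short. The helper below is a hypothetical sketch under that assumption, not the listing the patent refers to.

```python
import zlib

STORED, DEFLATED = 0, 8   # ZIP compression method codes


def decompress_block(method: int, data: bytes) -> bytes:
    """Decode one data block, raising ValueError if it is corrupt or incomplete."""
    if method == STORED:
        return data                                   # stored parts are not compressed
    if method != DEFLATED:
        raise ValueError(f"unsupported compression method {method}")
    d = zlib.decompressobj(-zlib.MAX_WBITS)           # negative wbits: raw DEFLATE, no header
    try:
        out = d.decompress(data) + d.flush()
    except zlib.error as exc:
        raise ValueError(f"corrupt DEFLATE data: {exc}") from exc
    if not d.eof:
        raise ValueError("DEFLATE stream ends early (block truncated)")
    return out
```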
The algorithm code for decompressing Office data blocks is as follows:
S6 comprises the following detailed steps:
S601: extract the next xml index entry in order;
S602: find the corresponding data block header from the position information in the xml index entry, and find the start and end positions of the data block from the data block header signature;
S603: match the data in the data block against the structure given by the xml index entry; if the structural features match, the data block is considered usable and S604 is executed; if the check code does not match, the data in this data area is damaged and S605 is executed;
S604: record the position of this data area;
S605: determine whether every entry of the xml index table has been extracted; if so, end; if not, return to S601.
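The S601 to S605 loop can be sketched as follows, assuming the index entries come from the ZIP central directory, that "structure matching" means the local header at the indexed position carries the expected signature and part name, and that the "check code" is the entry's CRC-32. Entries that fail either test are skipped instead of aborting the loop, which is what allows the remaining parts to be recovered. `IndexEntry` and `locate_data_blocks` are hypothetical names.

```python
import struct
import zlib
from typing import Dict, List, NamedTuple

LOCAL = struct.Struct("<4sHHHHHIIIHH")   # 30-byte ZIP local file header


class IndexEntry(NamedTuple):
    name: str
    method: int                # 0 = stored, 8 = deflate
    crc32: int                 # used here as the "check code"
    compressed_size: int
    local_header_offset: int   # relative to the archive start


def locate_data_blocks(image: bytes, base: int, entries: List[IndexEntry]) -> Dict[str, int]:
    """Return {part name: data offset} for every block that passes both checks."""
    usable: Dict[str, int] = {}
    for entry in entries:                                  # S601: next index entry, in order
        hdr = base + entry.local_header_offset             # S602: block header position
        try:
            f = LOCAL.unpack_from(image, hdr)
        except struct.error:
            continue                                       # header cut off: damaged, go to S605
        name_len, extra_len = f[9], f[10]
        name = image[hdr + LOCAL.size:hdr + LOCAL.size + name_len]
        if f[0] != b"PK\x03\x04" or name != entry.name.encode("utf-8"):
            continue                                       # S603: structure mismatch
        start = hdr + LOCAL.size + name_len + extra_len
        payload = image[start:start + entry.compressed_size]
        try:
            if entry.method == 8:
                data = zlib.decompress(payload, -15)       # raw DEFLATE
            elif entry.method == 0:
                data = payload                             # stored, no compression
            else:
                continue                                   # unsupported method
        except zlib.error:
            continue                                       # undecodable: damaged
        if zlib.crc32(data) != entry.crc32:
            continue                                       # S603: check code mismatch
        usable[entry.name] = start                         # S604: record the data-area position
    return usable                                          # S605: all entries processed
```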
Those of ordinary skill in the art will appreciate that the embodiments described here are intended to help the reader understand how the invention is implemented, and that the scope of protection of the invention is not limited to these specific statements and embodiments. Based on the technical teachings disclosed herein, those of ordinary skill in the art can make various other specific variations and combinations without departing from the essence of the invention, and such variations and combinations remain within the scope of protection of the invention.

Claims (3)

1. A method for reorganizing an Office file, characterized by comprising the following steps:
S1: open the MBR disk partition table, parse it, and obtain the start position and size of each partition from it;
S2: using the partition information from S1, locate the partition's file system table to obtain the file system parameters; parse the file system to obtain the attribute values and storage locations of all files and directories, and distinguish normally used space from unused free space;
S3: scan the free space for the header signature of every Office file;
S4: from the header signature, offset 0x1e bytes forward to find the xml index table of this Office file;
S5: parse the xml index table to obtain the bitmap information of all index table entries;
S6: obtain the exact location of the data in the Office file's data area from the index table bitmap information;
S7: reorganize the Office file from the index table bitmap information and the data in the Office file's data area;
S8: decompress the reorganized Office file data with the decompression algorithm specified by the algorithm standard to obtain a normally encoded data stream.
2. The method for reorganizing an Office file according to claim 1, characterized in that S6 comprises the following detailed steps:
S601: extract the next xml index entry in order;
S602: find the corresponding data block header from the position information in the xml index entry, and find the start and end positions of the data block from the data block header signature;
S603: match the data in the data block against the structure given by the xml index entry; if the structural features match, the data block is considered usable and S604 is executed; if the check code does not match, the data in this data area is damaged and S605 is executed;
S604: record the position of this data area;
S605: determine whether every entry of the xml index table has been extracted; if so, end; if not, return to S601.
3. The method for reorganizing an Office file according to claim 1, characterized in that in S7 the data blocks corresponding to the index entries are arranged from top to bottom in the order of the xml index entries to reorganize the Office file.
CN201510600871.1A 2015-09-18 2015-09-18 Method for reorganizing Office file Pending CN105117235A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510600871.1A CN105117235A (en) 2015-09-18 2015-09-18 Method for reorganizing Office file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510600871.1A CN105117235A (en) 2015-09-18 2015-09-18 Method for reorganizing Office file

Publications (1)

Publication Number Publication Date
CN105117235A 2015-12-02

Family

ID=54665237

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510600871.1A Pending CN105117235A (en) 2015-09-18 2015-09-18 Method for reorganizing Office file

Country Status (1)

Country Link
CN (1) CN105117235A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109960608A (en) * 2017-12-26 2019-07-02 北京安天网络安全技术有限公司 The processing method and processing system of office document

Citations (7)

Publication number Priority date Publication date Assignee Title
CN101482835A (en) * 2008-01-11 2009-07-15 李晶 File name regeneration technique
US20100161608A1 (en) * 2008-12-18 2010-06-24 Sumooh Inc. Methods and apparatus for content-aware data de-duplication
CN102651057A (en) * 2011-02-27 2012-08-29 孙星明 OOXML (office open extensible markup language)-based electronic document digital evidence collecting method and device thereof
US20120265762A1 (en) * 2010-10-06 2012-10-18 Planet Data Solutions System and method for indexing electronic discovery data
CN102937924A (en) * 2012-10-30 2013-02-20 厦门市美亚柏科信息股份有限公司 File allocation table (FAT) data recovery method based on file characteristic and file system
CN103645974A (en) * 2013-12-31 2014-03-19 厦门市美亚柏科信息股份有限公司 Method and device for recovering portable document format (PDF) file
CN104462433A (en) * 2014-12-17 2015-03-25 四川效率源信息安全技术有限责任公司 Method for recovering data of FAT32 partition

Non-Patent Citations (1)

Title
Yang Deming (杨德明): "Fast recovery method for valid data under FAT32" (FAT32下有效数据快速恢复方法), Computer Applications (《计算机应用》) *

Similar Documents

Publication Publication Date Title
CN100499824C (en) Methods and systems for preventing start code emulation at locations that include non-byte aligned and/or bit-shifted positions
CN105976303B (en) A kind of reversible information based on vector quantization hides and extracting method
CN104035839A (en) Method for implementation of recovery of Android system private data
JP6720788B2 (en) Log management device and log management program
CN102682024B (en) Method for recombining incomplete JPEG file fragmentation
US9882582B2 (en) Non-transitory computer-readable recording medium, encoding method, encoding device, decoding method, and decoding device
US20100278427A1 (en) Method and system for processing text
CN104021217A (en) System and method for extracting fragment file and deleted file of mobile phone
Tang et al. Recovery of heavily fragmented JPEG files
US20120163475A1 (en) Fast matching system for digital video
CN101558405A (en) Migration apparatus which convert database of mainframe system into database of open system and method for thereof
CN104462433A (en) Method for recovering data of FAT32 partition
CN105447168A (en) Method for restoring and recombining fragmented files in MP4 format
US20120193424A1 (en) Method of encoding and decoding data on a matrix code symbol
Sari et al. A review of graph theoretic and weightage techniques in file carving
Griffen-Foley Party games: Australian politicians and the media from war to dismissal
CN107122424B (en) A kind of relational database log abstracting method
CN105677797B (en) A kind of fragment recombination method based on data similarity in JPEG picture file
CN105117235A (en) Method for reorganizing Office file
US10037476B2 (en) Method and device for use when reassembling a fragmented JPEG image
Ravi et al. A method for carving fragmented document and image files
CN106648988A (en) Method for extracting data in monitoring equipment
CN105022677A (en) USB device usage record recovery and check method
CN103942122A (en) Method for recognizing AVI type block
US9436551B2 (en) Method for codec-based recovery of a video using a cluster search

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20151202