CN115422122A - Deep analysis method and system for mirror image file - Google Patents

Publication number
CN115422122A
Authority
CN
China
Prior art keywords
file
mirror image
analysis
node
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211104975.XA
Other languages
Chinese (zh)
Inventor
蒋蕊
许全聪
丁文波
吴江煌
周锐
邢健坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Meiya Yian Information Technology Co ltd
Sdic Intelligent Technology Co ltd
Original Assignee
Xiamen Meiya Yian Information Technology Co ltd
Sdic Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Meiya Yian Information Technology Co ltd and Sdic Intelligent Technology Co ltd
Priority to CN202211104975.XA
Publication of CN115422122A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files

Abstract

The invention discloses a deep parsing method and system for image files. The method comprises: configuring a parsing plug-in according to the type of the image file, and using the parsing plug-in to parse the file nodes of the file system directory structure of the image file; constructing a file system tree based on the parsed file nodes; in response to a file node to be parsed existing in the file system tree, the file node to be parsed being a nested image file, having the parsing plug-in read and parse the data according to the offset of the nested image file within its host file; and saving the file system tree and the image file parsing configuration, locating a target file according to a search string, and reading and exporting the target file. The method can be used in electronic data forensics products or file search products to deeply parse and search nested files.

Description

Deep analysis method and system for mirror image file
Technical Field
The invention relates to the technical field of data parsing, and in particular to a deep parsing method and system for image files.
Background
An image file is a form of file storage in which multiple files are packed into a single image. Image files come in many types, each with its own format. Common image file formats include Advanced Forensic Format files (.aff), raw image files (.dd, .001, .raw, .bin), DMG image files (.dmg), EnCase image files (.E01, .Ex01), GHOST image files (.gho), CloneCD image files (.img, .dvd), CD/DVD image files (.iso), and so on. In addition, common document formats such as compressed archives and Office compound documents can, like image files, contain internal files. Image files are created, and the files stored in them are read, with dedicated tools. For example, WinRAR is used to create and decompress common compressed files, and FTK Imager is used to parse several types of image files. However, these tools can only parse the specified image file and do not support further parsing of other image files contained within it. In real-world scenarios, an image file often itself contains image files, compressed archives and compound documents, so a method for deep parsing of image files is needed.
In the prior art, image files are parsed on the basis of a specific image type, and other image files nested within an image file cannot be parsed. The prior art therefore has the following problems:
1. parsing is tied to a specific image file type, so data parsing capability is limited;
2. parsing is based on a single image file, so nested image files, compressed archives, compound documents and the like cannot be deeply parsed;
3. image file types are numerous, and newly added image file types cannot be supported by extension;
4. because nested image files are not deeply parsed, embedded data cannot be searched in the parsing result.
Disclosure of Invention
To solve the technical problem that, in the prior art, image files are parsed on the basis of a specific image type and other image files nested within them cannot be parsed, the invention provides a deep parsing method and system for image files.
According to a first aspect of the invention, a deep parsing method for image files is provided, comprising:
S1: configuring a parsing plug-in according to the type of the image file, and using the parsing plug-in to parse the file nodes of the file system directory structure of the image file;
S2: constructing a file system tree based on the parsed file nodes;
S3: in response to a file node to be parsed existing in the file system tree, the file node to be parsed being a nested image file, the parsing plug-in reading and parsing data according to the offset of the nested image file within its host file;
S4: saving the file system tree and the image file parsing configuration, locating a target file according to a search string, and reading and exporting the target file.
In some specific embodiments, the content of a file node includes a file name, a parent node pointer, file time attributes, a file type, a list of child node pointers, the parsing handle of the image in which the file is located, the file size, and the location of the file in the image or the ID of the exported file corresponding to the file.
In some specific embodiments, if the location of a file cannot be determined within the image, the file is extracted from the image file into a temporary directory, a mapping table between file nodes and temporary files is established, and the ID in the mapping table is recorded in the exported-file ID field of the file node.
In some specific embodiments, file nodes are stored in dynamically allocated memory blocks: one memory block stores a plurality of file nodes, and the next memory block is allocated when the current one is full. This allocation scheme reduces the number of memory allocations and speeds up file node searches.
In some specific embodiments, S2 specifically comprises constructing the file system tree according to the parent node pointers and child node pointer lists of the parsed file nodes.
In some specific embodiments, in S3, a nested image file that cannot be parsed directly is extracted into a temporary directory, the temporary file is parsed using a parsing plug-in, and the parsed file nodes are mounted under the file node of the nested image file in the file system tree.
In some specific embodiments, S4 specifically comprises: starting a plurality of search threads according to the number of CPU cores, the search threads searching file names one by one, taking a file node memory block as the unit of work, until all memory blocks have been searched; and reading and exporting the target file according to the image file parsing handle stored in the target file node and the location of the file in the image file.
In some specific embodiments, if the target file is a file that was exported to the temporary directory, the exported temporary file is located according to the ID of the exported file corresponding to the file, and the target file is read and exported by reading the data of the temporary file.
According to a second aspect of the invention, a computer-readable storage medium is provided, having one or more computer programs stored thereon, wherein the one or more computer programs, when executed by a computer processor, implement the above-described method.
According to a third aspect of the invention, a deep parsing system for image files is provided, the system comprising:
a file node acquisition unit, configured to configure a parsing plug-in according to the type of the image file and to use the parsing plug-in to parse the file nodes of the file system directory structure of the image file;
a file system tree construction unit, configured to construct a file system tree based on the parsed file nodes;
a nested image parsing unit, configured to respond to a file node to be parsed existing in the file system tree, the file node to be parsed being a nested image file, wherein the parsing plug-in reads and parses data according to the offset of the nested image file within its host file;
a target file export unit, configured to save the file system tree and the image file parsing configuration, locate a target file according to a search string, and read and export the target file.
The deep parsing method and system for image files of the invention build on parsing support for multiple image file types: image files are parsed in a nested fashion, and the parsing results of all image files are mounted in the same file system tree, achieving deep parsing of the data. Nested image files, compressed archives, compound documents and the like can be deeply parsed; newly added data source types can be supported by extension; and deep parsing of compound documents that are not image types is also supported. Embedded data can be searched within the same file system tree, so embedded files can be deeply parsed and searched in electronic data forensics products and file search products.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments and together with the description serve to explain the principles of the invention. Other embodiments and many of the intended advantages of embodiments will be readily appreciated as they become better understood by reference to the following detailed description. The elements of the drawings are not necessarily to scale relative to each other. Like reference numerals designate corresponding similar parts.
FIG. 1 is a flow chart of a deep parsing method for image files according to an embodiment of the present application;
FIG. 2 is a flow chart of a deep parsing method for image files according to a specific embodiment of the present application;
FIG. 3 is a structure diagram of a file node in the file system tree according to a specific embodiment of the present application;
FIG. 4 is a schematic diagram of a deep parsing result for an image file according to a specific embodiment of the present application;
FIG. 5 is a framework diagram of a deep parsing system for image files according to an embodiment of the present application;
FIG. 6 is a framework diagram of a multi-data-source file system for deep parsing of image files according to a specific embodiment of the present application;
FIG. 7 is a block diagram of a computer system suitable for implementing the electronic device of an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows a flow chart of a deep parsing method for image files according to an embodiment of the present application. As shown in Fig. 1, the method includes the following steps:
S101: Configure a parsing plug-in according to the type of the image file, and use the parsing plug-in to parse the file nodes of the file system directory structure of the image file. The content of a file node includes a file name, a parent node pointer, file time attributes, a file type, a list of child node pointers, the parsing handle of the image in which the file is located, the file size, and the location of the file in the image or the ID of the exported file corresponding to the file.
In a specific embodiment, if the location of a file cannot be determined within the image, the file is extracted from the image file into a temporary directory, a mapping table between file nodes and temporary files is established, and the ID in the mapping table is recorded in the exported-file ID field of the file node.
In a specific embodiment, file nodes are stored in dynamically allocated memory blocks: one memory block stores a plurality of file nodes, and the next memory block is allocated when the current one is full.
S102: Construct a file system tree based on the parsed file nodes. The file system tree is constructed according to the parent node pointers and child node pointer lists of the parsed file nodes.
S103: In response to a file node to be parsed existing in the file system tree, the file node to be parsed being a nested image file, the parsing plug-in reads and parses data according to the offset of the nested image file within its host file.
S104: Save the file system tree and the image file parsing configuration, locate the target file according to the search string, and read and export the target file. A plurality of search threads are started according to the number of CPU cores; the search threads search file names one by one, taking a file node memory block as the unit of work, until all memory blocks have been searched. The target file is then read and exported according to the image file parsing handle stored in the target file node and the location of the file in the image file.
In a specific embodiment, if the target file is a file that was exported to the temporary directory, the exported temporary file is located according to the ID of the exported file corresponding to the file, and the target file is read and exported by reading the data of the temporary file.
Fig. 2 shows a flow chart of a deep parsing method for image files according to a specific embodiment of the present application. As shown in Fig. 2, the method specifically includes the following steps:
Step 1: Configure the image file types to be parsed. The configuration comprises the types of image file that need to be parsed and the parsing plug-in required for each type. The type of an image file is given by its file extension. The image file to be parsed is also selected in this step.
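The step 1 configuration can be pictured as a lookup table keyed by file extension. The following is a minimal sketch under that assumption; the table contents and the names PLUGIN_BY_EXTENSION and select_plugin are illustrative and do not come from the application.

```python
import os

# Hypothetical configuration: file extension -> name of the parsing plug-in.
PLUGIN_BY_EXTENSION = {
    ".iso": "iso_parser",
    ".dmg": "dmg_parser",
    ".pdf": "pdf_parser",
    ".docx": "office_parser",
}


def select_plugin(file_name):
    """Return the configured plug-in name, or None if the type is not configured."""
    extension = os.path.splitext(file_name)[1].lower()
    return PLUGIN_BY_EXTENSION.get(extension)
```

A file whose extension is absent from the table is treated as an ordinary file and is not queued for parsing.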
Step 2: Determine whether the image file has already been parsed. This is decided from the image parsing record generated in a later step (step 7); if the image file has not been parsed, go to step 3, and if it has, go to step 8.
Step 3: Parse the image file. The image file to be parsed is parsed with the corresponding parsing plug-in, and the plug-in's parsing handle is saved for use in subsequent steps. The plug-in parses the directory structure of the file system. The content of each file node is shown in Fig. 3, a structure diagram of a file node in the file system tree according to a specific embodiment of the present application; it includes a file name, a parent node pointer, file time attributes, a file type, a list of child node pointers, the parsing handle of the image in which the file is located, the file size, and the location of the file in the image or the ID of the exported file corresponding to the file. For image file types whose internal file structure is complex, so that the location of a file cannot simply be determined within the image, the file is extracted from the image file into a temporary directory, a mapping table between file nodes and temporary files is established, and the ID in the mapping table is recorded in the exported-file ID field of the file node. In subsequent steps, file nodes for which such a mapping has been established are accessed by reading the exported temporary file. For example, a picture embedded in a PDF can be parsed by releasing the picture into the temporary directory; reads of the embedded picture are then redirected to the picture in the temporary directory.
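The file node fields and the temporary-file mapping table described in step 3 could be modelled as below. This is only a sketch: the class names FileNode and TempFileTable, the field names, and the use of sequential integer IDs are assumptions made for illustration.

```python
import itertools
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class FileNode:
    name: str
    parent: Optional["FileNode"] = None
    times: Dict[str, float] = field(default_factory=dict)  # file time attributes
    file_type: str = "file"
    children: List["FileNode"] = field(default_factory=list)
    image_handle: Any = None         # parsing handle of the containing image
    size: int = 0
    offset: Optional[int] = None     # location in the image, when it can be located
    export_id: Optional[int] = None  # ID of the exported temporary file, if any


class TempFileTable:
    """Mapping table from an ID to the path of an extracted temporary file."""

    def __init__(self):
        self._ids = itertools.count(1)  # assumed ID scheme: 1, 2, 3, ...
        self._paths = {}

    def register(self, node, temp_path):
        """Record temp_path for node and store the new ID in the node."""
        node.export_id = next(self._ids)
        self._paths[node.export_id] = temp_path
        return node.export_id

    def path_for(self, node):
        """Return the temporary file path that node was redirected to."""
        return self._paths[node.export_id]
```

A node therefore carries either an offset into its image or an export ID, and later reads pick whichever is present.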
Step 4: Construct the file system tree. A file system tree is constructed from the parsed file nodes (those produced in steps 3 and 6). The file nodes are stored in dynamically allocated memory blocks: each memory block stores N file nodes, and when one memory block is full the next is allocated. This allocation scheme reduces the number of memory allocations and speeds up file node searches. A file node contains the file name, file attributes and the location of the file in the image, together with a pointer to its parent node, a queue of pointers to its child nodes, and the image file handle used to read the file. The parsed file nodes are assembled into the file system tree according to their parent-child structure. If a parsed file is of an image type configured for parsing in step 1, its file node is placed in the queue of files to be parsed. Fig. 4 is a schematic diagram of a deep parsing result for an image file according to a specific embodiment of the present application: after "image file 1.iso" is parsed, two embedded image files, "image file 2.dmg" and "image file 3.docx", are detected in the constructed file system tree, so these two file nodes are placed in the queue to be parsed for subsequent processing.
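The block-wise storage of file nodes can be sketched as a pool that grows one fixed-capacity block at a time. Python lists only approximate raw memory blocks, and the name NodePool and the default block size are assumptions; the application does not give a value for N.

```python
class NodePool:
    """Stores nodes in fixed-capacity blocks; a new block is added only when the last is full."""

    def __init__(self, block_size=1024):
        self.block_size = block_size  # the "N" of the description (value assumed)
        self.blocks = []

    def add(self, node):
        """Append node to the current block, allocating the next block if needed."""
        if not self.blocks or len(self.blocks[-1]) == self.block_size:
            self.blocks.append([])  # one allocation per block_size nodes
        self.blocks[-1].append(node)
        return node

    def __len__(self):
        return sum(len(block) for block in self.blocks)
```

Because every block except the last is full, the blocks also make natural units of work when the search is later split across threads.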
Step 5: Determine whether any image file remains to be parsed in the file system tree. If step 4 left an image file in the queue to be parsed, go to step 6 to parse it; if not, parsing is complete.
Step 6: Parse the nested image file. The parsing plug-in reads and parses the data according to the offset of the nested image file within its host file. A nested image file that cannot be parsed directly can be extracted into a temporary directory, and the temporary file is then parsed with the parsing plug-in. Through step 4, the parsed file nodes are mounted under the file node of the nested image file in the file system tree. In the example of Fig. 4, the nested image file "image file 2.dmg" is mounted directly into the file system tree of its host file after parsing.
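Reading a nested image by its offset in the host file can be sketched as a read-only window over the host. The sketch assumes the nested image is stored contiguously and uncompressed inside the host; the class WindowReader is hypothetical, and the standard zipfile module stands in for a parsing plug-in.

```python
import io
import zipfile


class WindowReader(io.RawIOBase):
    """Read-only view of host[offset : offset + size] that behaves like a standalone file."""

    def __init__(self, host, offset, size):
        super().__init__()
        self._host, self._offset, self._size, self._pos = host, offset, size, 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, pos, whence=io.SEEK_SET):
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: self._size}[whence]
        if base + pos < 0:
            raise OSError("negative seek position")
        self._pos = base + pos
        return self._pos

    def readinto(self, buffer):
        # Never read past the end of the window, even if the host has more data.
        count = max(0, min(len(buffer), self._size - self._pos))
        if count == 0:
            return 0
        self._host.seek(self._offset + self._pos)
        data = self._host.read(count)
        buffer[:len(data)] = data
        self._pos += len(data)
        return len(data)


def build_demo_host():
    """Return (host stream, offset, size) for a ZIP archive embedded in other bytes."""
    inner = io.BytesIO()
    with zipfile.ZipFile(inner, "w") as archive:
        archive.writestr("hello.txt", b"hi")
    payload = inner.getvalue()
    prefix = b"HOSTDATA"
    return io.BytesIO(prefix + payload + b"TRAILER"), len(prefix), len(payload)


host, offset, size = build_demo_host()
with zipfile.ZipFile(WindowReader(host, offset, size)) as nested:
    names = nested.namelist()
    content = nested.read("hello.txt")
```

The nested archive is parsed in place, without copying it out of the host. When the nested data is compressed or fragmented in the host, the temporary-directory fallback described in step 6 applies instead.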
Step 7: Save the file system tree and the image file parsing configuration. After the image file has been deeply parsed, the in-memory file system tree, the image file parsing configuration from step 1, the mapping information for exported temporary files, and a record of the image file parsed in this run are saved to a file for persistent storage, so that the parsing result can be loaded directly next time without parsing the image file again.
Step 8: Load the file system tree and the image file parsing configuration. The file system data saved on disk is read, memory blocks are allocated to store the file nodes, and the parent and child pointers in the file nodes are set so as to rebuild the file system tree in memory.
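Saving and reloading the tree as in steps 7 and 8 amounts to replacing pointers with indices on the way out and restoring them on the way in. The sketch below assumes JSON as the on-disk format and dict-based nodes; the application does not specify either, and save_tree and load_tree are illustrative names.

```python
import json


def save_tree(nodes):
    """Serialise nodes as JSON; each parent pointer is replaced by the parent's list index."""
    index = {id(node): i for i, node in enumerate(nodes)}
    records = [
        {"name": node["name"], "parent": index.get(id(node["parent"]))}
        for node in nodes
    ]
    return json.dumps(records)


def load_tree(text):
    """Rebuild the nodes and restore the parent and child pointers from the saved indices."""
    records = json.loads(text)
    nodes = [{"name": r["name"], "parent": None, "children": []} for r in records]
    for node, record in zip(nodes, records):
        if record["parent"] is not None:
            node["parent"] = nodes[record["parent"]]
            node["parent"]["children"].append(node)
    return nodes
```

Parsing handles are deliberately not saved, since they are only valid in the process that opened the image; they are re-established after loading.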
Step 9: Open the parsed image files. For a file system loaded directly from disk, the image files that were opened in steps 3 and 6 are opened again with the corresponding parsing plug-ins, and the handles of the opened image files are set in the corresponding file nodes. In the example of the deep parsing result shown in Fig. 4, a DMG parsing plug-in is used to open "image file 2.dmg", and the resulting parsing handle is set in the child nodes parsed directly from "image file 2.dmg".
Step 10: Locate the target file according to the search string. A plurality of search threads are started according to the number of CPU cores of the computer. The search threads search file names one by one, taking a file node memory block as the unit of work, until all memory blocks have been searched. A search thread can match file names using a brute-force matching algorithm or a fast matching algorithm. The fast matching algorithm may be the BNDM algorithm, the KMP algorithm, the BM algorithm, or a variant of any of these. The path of a matched file node can be obtained by following the parent pointers from that node.
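The block-wise, multi-threaded search of step 10 can be sketched with a thread pool sized to the CPU count, KMP as the matching algorithm, and a walk up the parent pointers to build the path. The names Node, kmp_find, search and path_of are illustrative. Note that in CPython the global interpreter lock limits the speed-up for this kind of CPU-bound work, so the sketch shows the structure rather than the performance.

```python
import os
from concurrent.futures import ThreadPoolExecutor


class Node:
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent


def kmp_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1 (Knuth-Morris-Pratt)."""
    if not pattern:
        return 0
    failure = [0] * len(pattern)  # longest proper prefix that is also a suffix
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    k = 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


def search_block(block, query):
    """Search the file names of one block of nodes."""
    return [node for node in block if kmp_find(node.name, query) != -1]


def search(blocks, query):
    """Search all blocks, one block per task, on a pool sized to the CPU count."""
    workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(search_block, blocks, [query] * len(blocks))
        return [node for hits in results for node in hits]


def path_of(node):
    """Build the path of a node by following parent pointers up to the root."""
    parts = []
    while node is not None:
        parts.append(node.name)
        node = node.parent
    return "/".join(reversed(parts))
```

Since each task only reads its own block, the search threads need no locking.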
Step 11: Read and export the target file. The target file is read and exported according to the image file parsing handle stored in the target file node and the location of the file in the image file. If the target file is a file that was exported to the temporary directory, the exported temporary file is located according to the ID of the exported file corresponding to the file, and the target file is read and exported by reading the data of the temporary file.
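The two read paths of step 11 can be sketched as a single function that prefers the temporary file when an export ID is present and otherwise reads through the handle at the recorded location. The dict layout of the node, the seekable file-like handle, and the function names are assumptions for illustration.

```python
def read_node_data(node, temp_paths):
    """Return the bytes of a file node.

    node is a dict with keys 'handle', 'offset', 'size' and 'export_id' (hypothetical layout);
    temp_paths maps export IDs to temporary file paths.
    """
    if node.get("export_id") is not None:
        # The file was extracted earlier: read the redirected temporary file.
        with open(temp_paths[node["export_id"]], "rb") as temp_file:
            return temp_file.read()
    # Otherwise read directly from the image via its handle and location.
    handle = node["handle"]
    handle.seek(node["offset"])
    return handle.read(node["size"])


def export_node(node, temp_paths, destination):
    """Write the contents of a file node to destination."""
    with open(destination, "wb") as out:
        out.write(read_node_data(node, temp_paths))
```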
With continued reference to Fig. 5, Fig. 5 shows a framework diagram of a deep parsing system for image files according to an embodiment of the present application. As shown in Fig. 5, the system includes a file node acquisition unit 501, a file system tree construction unit 502, a nested image parsing unit 503, and a target file export unit 504. The file node acquisition unit 501 is configured to configure a parsing plug-in according to the type of the image file and to use the parsing plug-in to parse the file nodes of the file system directory structure of the image file; the file system tree construction unit 502 is configured to construct a file system tree based on the parsed file nodes; the nested image parsing unit 503 is configured to respond to a file node to be parsed existing in the file system tree, the file node to be parsed being a nested image file, wherein the parsing plug-in reads and parses data according to the offset of the nested image file within its host file; the target file export unit 504 is configured to save the file system tree and the image file parsing configuration, locate a target file according to a search string, and read and export the target file.
In a specific embodiment, Fig. 6 shows a framework diagram of a multi-data-source file system for deep parsing of image files according to a specific embodiment of the present application. As shown in Fig. 6, the file system mainly includes an image file parsing scheduling module, a file system construction and reading module, a storage module, and a plurality of image file parsing modules (the figure shows an ISO parsing module, a DMG parsing module, a PDF parsing module and an OFFICE parsing module). The image file parsing scheduling module schedules the parsing plug-ins to perform deep parsing of the image file. The file system construction and reading module constructs the file system tree in memory from the plug-ins' parsing results or from the parsing results saved in the storage module, and provides multi-threaded fast searching for target files as well as export of target files. The storage module stores the parsing results of image files and the temporary files that have to be extracted for parsing during the parsing process. The image parsing modules consist of parsing plug-ins for the various image file types; each parsing plug-in implements a standard interface with functions for parsing image files and for reading and exporting file data, and further parsing plug-ins can be added to support more image file types. Building on parsing support for multiple image file types, the file system parses in a nested fashion the image files, compressed archives, compound documents and similar files contained in an image file, and mounts the parsing results of all image files in the same file system tree. Through the file system tree, a target file can be searched for, read and exported, achieving deep parsing of the data.
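The interplay of the scheduling module and the plug-ins can be sketched end to end with a hypothetical plug-in interface and a queue of images awaiting parsing. ParserPlugin, ZipPlugin and deep_parse are illustrative names, and ZIP archives handled by the standard zipfile module stand in for real image formats. For brevity the sketch passes nested data as in-memory bytes rather than reading it by offset from the host file.

```python
import io
import os
import zipfile
from abc import ABC, abstractmethod
from collections import deque


class ParserPlugin(ABC):
    """Hypothetical standard interface that every parsing plug-in implements."""

    extensions = ()  # file extensions this plug-in handles

    @abstractmethod
    def list_files(self, data):
        """Return {file name: file bytes} for the image held in data."""


class ZipPlugin(ParserPlugin):
    """Stand-in plug-in built on the standard zipfile module."""

    extensions = (".zip",)

    def list_files(self, data):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return {name: archive.read(name) for name in archive.namelist()}


def deep_parse(root_name, root_data, plugins):
    """Parse an image and all nested images into one tree of dict nodes."""
    by_extension = {ext: plugin for plugin in plugins for ext in plugin.extensions}

    def plugin_for(name):
        return by_extension.get(os.path.splitext(name)[1].lower())

    root = {"name": root_name, "children": []}
    pending = deque([(root, root_data)])  # images waiting to be parsed
    while pending:
        node, data = pending.popleft()
        plugin = plugin_for(node["name"])
        if plugin is None:
            continue  # root of an unconfigured type: nothing to parse
        for name, content in plugin.list_files(data).items():
            child = {"name": name, "children": []}
            node["children"].append(child)  # mount under the image's own node
            if plugin_for(name) is not None:
                pending.append((child, content))  # nested image: parse later
    return root
```

Supporting a further format then only requires another ParserPlugin subclass in the list passed to deep_parse; the scheduling loop itself does not change.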
Compared with the prior art, the deep parsing method and system for image files have the following advantages: nested image files, compressed archives, compound documents and the like can be deeply parsed; newly added data source types can be supported by extension; deep parsing of compound documents that are not image types is supported; and embedded data can be searched in the same file system tree. The method can be applied in electronic data forensics products and file search products to deeply parse and search nested files.
Referring now to FIG. 7, shown is a block diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the use range of the embodiment of the present application.
As shown in Fig. 7, the computer system includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The RAM 703 also stores various programs and data necessary for system operation. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Liquid Crystal Display (LCD) and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as necessary, so that a computer program read from it can be installed into the storage section 708 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flow charts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program, when executed by the Central Processing Unit (CPU) 701, performs the above-described functions defined in the method of the present application. Note that the computer readable medium of the present application can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In the present application, by contrast, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor, which may be described, for example, as: a processor including a deployment unit, an instruction processing unit, and a file access unit. The names of these units do not, in some cases, constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable storage medium, which may be included in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: configure a parsing plug-in according to the type of an image file, and parse the file nodes of the file system directory structure of the image file by using the parsing plug-in; construct a file system tree based on the parsed file nodes; in response to a file node to be parsed existing in the file system tree, where the file node to be parsed is a nested image file, have the parsing plug-in read the data to be parsed according to the offset of the nested image file in its host file; and store the file system tree and the image file parsing configuration, locate a target file according to a search string, and read and export the target file.
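The steps above can be illustrated with a minimal Python sketch. It is not the implementation of the present application: plain tar archives stand in for image files, Python's standard tarfile module stands in for the parsing plug-in, and the names OffsetWindow, FileNode, parse_image, find, read_node and make_tar are hypothetical. The point it shows is that a nested image can be parsed in place by reading at its offset inside the host file, without first extracting it.

```python
import io
import tarfile
from dataclasses import dataclass, field
from typing import List, Optional


class OffsetWindow:
    """Read-only view of `size` bytes starting at `base` inside a host file object."""

    def __init__(self, host, base: int, size: int):
        self.host, self.base, self.size, self.pos = host, base, size, 0

    def read(self, n: int = -1) -> bytes:
        remaining = max(self.size - self.pos, 0)
        if n is None or n < 0 or n > remaining:
            n = remaining
        self.host.seek(self.base + self.pos)
        data = self.host.read(n)
        self.pos += len(data)
        return data

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_CUR:
            offset += self.pos
        elif whence == io.SEEK_END:
            offset += self.size
        self.pos = max(offset, 0)
        return self.pos

    def tell(self) -> int:
        return self.pos


@dataclass(eq=False)
class FileNode:
    name: str
    is_image: bool
    size: int
    offset: int                 # start of the file's data inside its image
    reader: object              # stands in for the image "parsing handle"
    parent: Optional["FileNode"] = None
    children: List["FileNode"] = field(default_factory=list)


def parse_image(reader, image_node: FileNode) -> None:
    """Parse one tar image, mount its files under image_node, recurse into nested tars."""
    with tarfile.open(fileobj=reader, mode="r:") as tar:
        members = [m for m in tar.getmembers() if m.isfile()]
    for m in members:
        node = FileNode(m.name, m.name.endswith(".tar"), m.size,
                        m.offset_data, reader, image_node)
        image_node.children.append(node)
        if node.is_image:
            # Nested image: parse it in place through a window at its offset in the host.
            parse_image(OffsetWindow(reader, m.offset_data, m.size), node)


def find(node: FileNode, text: str) -> List[FileNode]:
    """Depth-first search for nodes whose name contains the search string."""
    hits = [node] if text in node.name else []
    for child in node.children:
        hits.extend(find(child, text))
    return hits


def read_node(node: FileNode) -> bytes:
    """Export a file's bytes using the reader and offset stored in its node."""
    node.reader.seek(node.offset)
    return node.reader.read(node.size)


def make_tar(files: dict) -> bytes:
    """Build an uncompressed tar archive in memory (demo helper only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


inner = make_tar({"secret.txt": b"hello from the nested image"})
outer = make_tar({"readme.txt": b"outer", "nested.tar": inner})
root = FileNode("outer.tar", True, len(outer), 0, None)
parse_image(io.BytesIO(outer), root)
hit = find(root, "secret")[0]
exported = read_node(hit)
```

In this sketch every file is mounted directly under the node of the image that contains it; directory nodes and the fallback of extracting an unparseable nested image to a temporary directory are left out for brevity.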
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (10)

1. A deep parsing method for an image file, characterized by comprising the following steps:
S1: configuring a parsing plug-in according to the type of the image file, and parsing the file nodes of the file system directory structure of the image file by using the parsing plug-in;
S2: constructing a file system tree based on the parsed file nodes;
S3: in response to a file node to be parsed existing in the file system tree, the file node to be parsed being a nested image file, reading, by the parsing plug-in, the data to be parsed according to the offset of the nested image file in its host file;
S4: storing the file system tree and the image file parsing configuration, locating a target file according to a search string, and reading and exporting the target file.
2. The deep parsing method for an image file according to claim 1, wherein the content of the file node comprises a file name, a parent node pointer, a file time attribute, a file type, a child node pointer list, a parsing handle of the image where the file is located, a file size, a position of the file in the image, or an ID of the exported file corresponding to the file.
3. The method according to claim 1, wherein if the position of a file in the image cannot be determined, the file is extracted from the image to a temporary directory, a mapping table between file nodes and temporary files is created, and the ID in the mapping table is recorded in the exported-file ID field of the file node.
4. The method according to claim 1, wherein the file nodes of the image file are stored in dynamically allocated memory blocks, one memory block stores a plurality of file nodes, and a next memory block is allocated when the current memory block is full.
5. The method according to claim 2, wherein S2 specifically comprises constructing the file system tree according to the parent node pointers and the child node pointer lists of the parsed file nodes.
6. The method according to claim 1, wherein, in S3, a nested image file that cannot be parsed directly is extracted into a temporary directory, the resulting temporary file is parsed by using the parsing plug-in, and the parsed file nodes are mounted under the file node of the nested image file in the file system tree.
7. The deep parsing method for an image file according to claim 1, wherein step S4 specifically comprises: starting a plurality of search threads according to the number of CPU cores, the search threads searching file names one by one, taking a file node memory block as the unit of work, until all memory blocks have been searched; and reading and exporting the target file according to the image file parsing handle stored in the target file node and the position of the file in the image file.
8. The method of claim 7, wherein if the target file is a file that was exported to a temporary directory, the exported temporary file is located according to the ID of the exported file corresponding to the file, and the data of the temporary file is read so as to read and export the target file.
9. A computer-readable storage medium having one or more computer programs stored thereon, which when executed by a computer processor perform the method of any one of claims 1 to 8.
10. A deep parsing system for an image file, the system comprising:
a file node acquisition unit, configured to configure a parsing plug-in according to the type of the image file, and to parse the file nodes of the file system directory structure of the image file by using the parsing plug-in;
a file system tree construction unit, configured to construct a file system tree based on the parsed file nodes;
a nested image parsing unit, configured to, in response to a file node to be parsed existing in the file system tree, the file node to be parsed being a nested image file, have the parsing plug-in read the data to be parsed according to the offset of the nested image file in its host file;
a target file export unit, configured to store the file system tree and the image file parsing configuration, locate a target file according to a search string, and read and export the target file.
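The block-based node storage of claim 4 and the per-block multi-threaded search of claim 7 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: NodePool, BLOCK_CAPACITY, search_block and parallel_search are hypothetical names, a Python list stands in for a memory block, and each file node is reduced to its file name.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional

BLOCK_CAPACITY = 4  # hypothetical block size; kept tiny so the demo spans several blocks


class NodePool:
    """Stores file-node names in fixed-capacity blocks, adding a new block when one is full."""

    def __init__(self, capacity: int = BLOCK_CAPACITY):
        self.capacity = capacity
        self.blocks: List[List[str]] = [[]]

    def add(self, name: str) -> None:
        if len(self.blocks[-1]) >= self.capacity:
            self.blocks.append([])      # current block is full: allocate the next one
        self.blocks[-1].append(name)


def search_block(block: List[str], text: str) -> List[str]:
    """Scan one block name by name and return the names containing the search string."""
    return [name for name in block if text in name]


def parallel_search(pool: NodePool, text: str, workers: Optional[int] = None) -> List[str]:
    """Search every block, one block per task, with one worker thread per CPU core by default."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as executor:
        per_block = executor.map(lambda block: search_block(block, text), pool.blocks)
        return [hit for hits in per_block for hit in hits]


pool = NodePool()
for i in range(10):
    pool.add(f"file{i}.log" if i % 3 == 0 else f"file{i}.txt")
matches = parallel_search(pool, ".log")
```

Because a block is the unit of work, threads never share a partially scanned block, so no locking is needed during the search. In CPython the global interpreter lock limits the speed-up for this kind of CPU-bound scan, so the sketch shows only how the work is partitioned.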
CN202211104975.XA 2022-09-09 2022-09-09 Deep analysis method and system for mirror image file Pending CN115422122A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211104975.XA CN115422122A (en) 2022-09-09 2022-09-09 Deep analysis method and system for mirror image file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211104975.XA CN115422122A (en) 2022-09-09 2022-09-09 Deep analysis method and system for mirror image file

Publications (1)

Publication Number Publication Date
CN115422122A true CN115422122A (en) 2022-12-02

Family

ID=84202909

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211104975.XA Pending CN115422122A (en) 2022-09-09 2022-09-09 Deep analysis method and system for mirror image file

Country Status (1)

Country Link
CN (1) CN115422122A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination