CN109933342B - Method and device for extracting file content from local docker mirror image - Google Patents

Info

Publication number
CN109933342B
Authority
CN
China
Prior art keywords
layer
mirror image
installation package
reading
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910207652.5A
Other languages
Chinese (zh)
Other versions
CN109933342A (en)
Inventor
杜雄
程度
张福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shengxin Network Technology Co ltd
Original Assignee
Beijing Shengxin Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-03-18
Filing date
2019-03-18
Publication date
2020-10-16
Application filed by Beijing Shengxin Network Technology Co ltd
Priority to CN201910207652.5A
Publication of CN109933342A
Application granted
Publication of CN109933342B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for extracting file content from a local Docker image, comprising the following steps: enumerating the image list of the current host through the Docker API (Application Programming Interface); traversing the image list, obtaining each image's id and the storage paths of its layers to form a layer linked list, scanning the layer linked list layer by layer, and obtaining the installation package information of the layers. The layer linked list is formed as follows: selecting an image from the image list and obtaining its detailed information through the Docker API; determining whether the image is stored with a supported storage driver; if so, entering a first read path; if not, entering a second read path. Another aspect of the invention provides an apparatus for extracting file content from a local Docker image, comprising a traversal judging module, a first read path module, and a second read path module.

Description

Method and device for extracting file content from local docker mirror image
Technical Field
The invention relates to the field of computer information processing, and in particular to a method and an apparatus for extracting file content from a local Docker image.
Background
Docker is currently the most popular open-source application container engine. It lets developers package an application and its dependencies into a portable container and then deploy that container to any machine, so that it is built once and runs anywhere. The image is at the core of Docker usage, and the image registry, as the storage back end for images, plays a very important role in Docker-based development. In a container runtime environment, every service program is packaged into a container image and distributed as an image package. Irregular operations during image packaging and distribution, such as illegitimate installation package download sources and man-in-the-middle attacks, can introduce Trojans or vulnerabilities into an image. A mechanism is therefore needed to quickly extract file content from an image and discover the CVEs (Common Vulnerabilities and Exposures) present in the image package.
The existing open-source image risk scanning tool Clair/clairctl provides a general scanning method:
1) Parse the image package information according to the image's identity (ID) to obtain a list of layer ids.
2) Send the layer id through the image-layer transfer API; the client downloads the specified image layer from the Docker engine (the image source) through the local Docker API.
3) Decompress all files of the layer in memory, find the installation package database file, scan the layer's installation package information, and return the result.
However, this scanning method does not optimize its data access patterns, such as layer access order and decompression method, for the characteristics of local images, which leads to high Central Processing Unit (CPU) and memory consumption and slow scanning.
Therefore, the art needs a better method for extracting file content from a local Docker image so as to improve access speed.
The present invention is proposed in view of the above.
Disclosure of Invention
The invention aims to provide a method and an apparatus that extract file content from a local Docker image more efficiently, so as to solve the technical problems of high CPU and memory consumption and slow scanning in existing approaches to obtaining file information from a local Docker image. The technical scheme is as follows:
The invention provides a method for extracting file content from a local Docker image, comprising the following steps:
enumerating the image list of the current host through the Docker API (Application Programming Interface);
traversing the image list, obtaining each image's id and the storage paths of its layers to form a layer linked list, scanning the layer linked list layer by layer, and obtaining the installation package information of the layers.
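The enumeration step can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the Docker Engine API is reachable on the default Unix socket `/var/run/docker.sock` and uses the `GET /images/json` endpoint; the helper names `UnixHTTPConnection`, `parse_image_list`, and `list_local_images` are hypothetical.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection over a Unix domain socket (hypothetical helper)."""

    def __init__(self, socket_path):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def parse_image_list(body):
    """Turn the JSON body of GET /images/json into (image id, tags) pairs."""
    return [(img["Id"], img.get("RepoTags") or []) for img in json.loads(body)]


def list_local_images(socket_path="/var/run/docker.sock"):
    """Ask the local Docker engine for its image list (socket path is an assumption)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/images/json")
        return parse_image_list(conn.getresponse().read())
    finally:
        conn.close()
```

Keeping the parsing separate from the socket call lets the list handling be exercised without a running Docker daemon.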
Preferably, forming the layer linked list comprises the following steps:
selecting an image from the image list and obtaining its detailed information through the Docker API;
determining whether the image is stored with a supported storage driver;
if so, entering a first read path, which comprises the following step:
obtaining, through the supported storage driver, the image id and the storage paths of the layers on the host to form the layer linked list;
if not, entering a second read path, which comprises the following steps:
exporting the compressed image package through the Docker API and saving it to a local cache directory;
decompressing the compressed image package into a temporary directory, reading the manifest (metadata) file, and obtaining the image id and the storage paths of the layers in the temporary directory to form the layer linked list.
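A sketch of the driver check and of the first read path follows. It assumes the image details come from the Docker image inspect endpoint, whose `GraphDriver.Name` field names the storage driver and whose `GraphDriver.Data` carries `UpperDir` and a colon-separated `LowerDir` for overlay2. The function names are hypothetical, and only the overlay2 layout is handled here.

```python
SUPPORTED_DRIVERS = {"aufs", "overlay", "overlay2"}


def choose_read_path(inspect_info):
    """Return "first" when the image sits on a supported driver, else "second"."""
    driver = (inspect_info.get("GraphDriver") or {}).get("Name")
    return "first" if driver in SUPPORTED_DRIVERS else "second"


def overlay2_layer_dirs(inspect_info):
    """List the layer directories of an overlay2 image from top to bottom."""
    data = inspect_info["GraphDriver"]["Data"]
    dirs = [data["UpperDir"]]            # topmost layer
    lower = data.get("LowerDir")         # absent for single-layer images
    if lower:
        dirs.extend(lower.split(":"))    # remaining layers, upper to lower
    return dirs
```

Because the directories already exist on the host, this path needs no export or decompression at all.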
Preferably, with a supported storage driver, the image is stored on AUFS (Advanced Multi-Layered Unification Filesystem) or OverlayFS (Overlay File System) and can be read through one of the aufs, overlay2, and overlay storage drivers.
Preferably, the head of the layer linked list is the image layer id, followed from top to bottom, in linked-list order, by the storage path of each layer: on the host in the first read path, or in the temporary directory in the second read path.
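One possible shape for this structure is sketched below. The class and function names are hypothetical, and the sketch assumes that the id at the head identifies the image the chain belongs to.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class LayerNode:
    """One layer: its storage path and a link to the layer below it."""
    path: str
    next: Optional["LayerNode"] = None


@dataclass
class LayerChain:
    """Head of the list: the id, then the layers from top to bottom."""
    image_id: str
    head: Optional[LayerNode] = None

    def __iter__(self) -> Iterator[str]:
        node = self.head
        while node is not None:
            yield node.path
            node = node.next


def build_chain(image_id: str, paths_top_down: List[str]) -> LayerChain:
    """Link the given paths so that iteration starts at the topmost layer."""
    head = None
    for path in reversed(paths_top_down):
        head = LayerNode(path, head)
    return LayerChain(image_id, head)
```

Both read paths can produce the same structure; only the origin of the paths differs (host directories versus the temporary directory).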
Preferably, scanning the layer linked list layer by layer comprises the following steps:
determining from the layer linked list whether a next layer exists;
if so, looking up the layer's installation package information in a cache database;
if not, reading the next image in the image list, or finishing the extraction of file content once the traversal is complete.
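The layer-by-layer scan can be sketched as a loop that stops at the first layer yielding package information. This is one reading of the steps, not a definitive implementation: the cache is modelled as a plain dict, and `scan_layer` is a hypothetical callback that returns `None` when a layer holds no installation package database file.

```python
def scan_chain(layers, cache, scan_layer):
    """Walk layers top-down; return the first package info found, else None."""
    for layer in layers:
        if layer in cache:               # cache hit: no file access needed
            return cache[layer]
        packages = scan_layer(layer)     # cache miss: inspect the layer itself
        if packages is not None:
            cache[layer] = packages      # remember the result for later scans
            return packages
    return None                          # bottom reached without a database file
```

Stopping early is what fixes the access order: lower layers are never opened once an upper layer has supplied the information.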
Preferably, in the first read path, looking up the layer's installation package information in the cache database comprises the following steps:
determining whether the layer's installation package information exists in the cache database;
if so, obtaining the layer's installation package information from the cache database and returning it;
exiting the loop and returning to the check of whether the traversal is complete;
if not, looking up the layer's installation package information in the image's installation package database file under the layer's directory.
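The cache lookup could be backed by any key-value store. The sketch below uses SQLite from the Python standard library purely as an assumption; the table name `layer_packages` and the function names are hypothetical, and the patent does not specify the cache database's format.

```python
import json
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the cache database keyed by layer id."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS layer_packages ("
        "layer_id TEXT PRIMARY KEY, packages TEXT NOT NULL)"
    )
    return db


def cache_get(db, layer_id):
    """Return the cached package info for a layer, or None on a miss."""
    row = db.execute(
        "SELECT packages FROM layer_packages WHERE layer_id = ?", (layer_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def cache_put(db, layer_id, packages):
    """Store the package info for a layer, replacing any earlier entry."""
    db.execute(
        "INSERT OR REPLACE INTO layer_packages VALUES (?, ?)",
        (layer_id, json.dumps(packages)),
    )
    db.commit()
```

Since images commonly share base layers, a result cached for one image can serve every other image built on the same layer.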
Preferably, in the first read path, looking up the layer's installation package information in the image's installation package database file comprises the following steps:
checking whether the image's installation package database file exists;
if so, scanning the database file line by line to obtain the layer's installation package information;
storing the layer's installation package information in the cache database and returning it;
exiting the loop and returning to the check of whether the traversal is complete;
if not, returning to the check of whether a next layer exists.
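A line-by-line scan might look like the sketch below. The patent does not name the database file; this example assumes a Debian-style image whose package database is the text file `var/lib/dpkg/status`, and the function names are hypothetical.

```python
import os


def parse_dpkg_status(lines):
    """Collect {package: version} from dpkg status lines, one line at a time."""
    packages = {}
    name = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("Package: "):
            name = line[len("Package: "):]
        elif line.startswith("Version: ") and name is not None:
            packages[name] = line[len("Version: "):]
        elif line == "":                 # blank line ends one package stanza
            name = None
    return packages


def scan_layer_dir(layer_dir, db_relpath="var/lib/dpkg/status"):
    """Scan the database file under a layer directory; None if it is absent."""
    path = os.path.join(layer_dir, db_relpath)
    if not os.path.isfile(path):
        return None
    with open(path, encoding="utf-8", errors="replace") as fh:
        return parse_dpkg_status(fh)     # file objects iterate line by line
```

Iterating over the open file keeps memory use proportional to one line rather than to the whole database file.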
Preferably, in the second read path, looking up the layer's installation package information in the cache database comprises the following steps:
determining, by layer id, whether the layer's installation package information exists in the cache database;
if so, obtaining the layer's installation package information from the cache database and returning it;
exiting the loop and returning to the check of whether the traversal is complete;
if not, reading the layer's compressed package and looking up the layer's installation package information in the installation package database file.
Preferably, in the second read path, looking up the layer's installation package information in the installation package database file comprises the following steps:
checking whether the installation package database file exists;
if so, decompressing the installation package database file;
scanning the database file line by line to obtain the layer's installation package information;
storing the layer's installation package information in the cache database and returning it;
exiting the loop and returning to the check of whether the traversal is complete;
if not, returning to the check of whether a next layer exists.
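In the second read path only the database file needs to leave the layer archive. A minimal sketch with Python's `tarfile` module follows; the member path `var/lib/dpkg/status` is again an assumption about the image's package manager, and `read_member_from_layer` is a hypothetical name.

```python
import io
import tarfile


def read_member_from_layer(fileobj, member="var/lib/dpkg/status"):
    """Return the bytes of one file inside a layer tarball, or None if absent.

    fileobj is a seekable binary file object holding the layer archive.
    """
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for candidate in (member, "./" + member):   # archives vary in prefix
            try:
                info = tar.getmember(candidate)
            except KeyError:
                continue                            # not stored under this name
            extracted = tar.extractfile(info)
            if extracted is not None:               # None for non-regular files
                return extracted.read()
    return None
```

Only the requested member is read into memory, which matches the stated goal of extracting content on demand instead of unpacking every file of the layer.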
Another aspect of the invention provides an apparatus for extracting file content from a local Docker image, comprising:
a traversal judging module, a first read path module, and a second read path module;
the traversal judging module is configured to enumerate the image list of the current host, determine whether each image is stored with a supported storage driver, and enter the corresponding read path;
the first read path module implements the read method used when the image is stored with a supported storage driver, following the first read path of the above technical scheme;
the second read path module implements the read method used when the image is not stored with a supported storage driver, following the second read path of the above technical scheme.
In summary, the invention has the following beneficial effects:
1. by enumerating the image list of the current host and choosing how to read each layer's installation package information according to whether the image is stored with a supported storage driver, the data access pattern is optimized for the characteristics of local images and reading speed is improved;
2. by building a layer linked list, reading layer by layer, and fixing the layer access order, CPU and memory usage is reduced;
3. by using a different decompression method in each of the two read paths, CPU and memory usage is reduced and reading speed is improved;
4. a local URL is constructed from the Docker storage driver type on the host to access file content directly;
5. a fallback method of obtaining files from the Docker image covers the case where files cannot be accessed directly;
6. in fallback mode, only the required file content is extracted on demand, saving disk and memory space.
Drawings
To illustrate the embodiments of the present invention and the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for extracting file content from a local Docker image according to the first embodiment of the present invention;
Fig. 2 is a schematic diagram of an apparatus for extracting file content from a local Docker image according to the second embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
The present invention will be described in detail below by way of examples.
Example one
Referring to Fig. 1, the first embodiment of the invention provides a method for extracting file content from a local Docker image, comprising the following steps:
S101, enumerating the image list of the current host through the Docker API (Application Programming Interface);
In executing S101, a current host image is an image stored on a local or cloud host. It may have been saved from another repository (save) or pulled from a remote repository (pull). Docker information on the local or cloud host is accessed through the Docker API to enumerate the current image list, which may contain one image or several.
S102, determining whether the image list has been fully traversed;
if so, ending the extraction of file content;
S103, if not, selecting an image from the image list and obtaining its detailed information through the Docker API;
S104, determining whether the image is stored with a supported storage driver;
In a preferred embodiment of the invention, with a supported storage driver, the image is stored on AUFS (Advanced Multi-Layered Unification Filesystem) or OverlayFS (Overlay File System) and can be read through one of the aufs, overlay2, and overlay storage drivers.
In executing S102 and S103, once file content has been read from one image, the next image is read, until the image list has been fully traversed. When reading each image, the method first determines whether the image is stored with a supported storage driver.
If so, a first read path is entered, which comprises the following steps:
S1001, obtaining, through the supported storage driver, the image id and the storage paths of the layers on the host to form the layer linked list;
In executing S1001, the first read path is entered: the supported storage driver in Docker is used to quickly read the image id and the storage path of each image layer, and the layer linked list is formed from top to bottom.
If not, a second read path is entered, which comprises the following steps:
S2001, exporting the compressed image package through the Docker API and saving it to a local cache directory;
S2002, decompressing the compressed image package into a temporary directory, reading the manifest (metadata) file, and obtaining the image id and the storage paths of the layers in the temporary directory to form the layer linked list.
In executing S2001 and S2002, the second read path is entered: the compressed image package is decompressed locally, the manifest file is located, the image id and the layer storage paths are obtained from it, and the layer linked list is formed from top to bottom.
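Steps S2001 and S2002 can be sketched as follows, assuming the export uses the Docker Engine API's image export endpoint (`GET /images/{name}/get`, the API behind `docker save`) and that the archive contains a `manifest.json` whose entries carry `Config` and `Layers` keys. To my understanding `Layers` is ordered base layer first, so the sketch reverses it to obtain a top-down chain; deriving the image id from the `Config` file name is likewise an assumption, and `parse_manifest` is a hypothetical name.

```python
import json
import os
import posixpath


def parse_manifest(manifest_text, extract_dir):
    """Return (image id, layer paths top-down) from a docker-save manifest."""
    entry = json.loads(manifest_text)[0]            # one entry per saved image
    config_name = os.path.basename(entry["Config"])
    if config_name.endswith(".json"):
        config_name = config_name[: -len(".json")]  # file name is the id digest
    # Layers are listed base first (assumption); reverse for top-down order.
    # posixpath is used because Docker hosts here are assumed to be Linux.
    layers = [posixpath.join(extract_dir, rel) for rel in reversed(entry["Layers"])]
    return config_name, layers
```

The resulting paths point at the per-layer archives inside the temporary directory, which the second read path then opens one at a time.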
In a preferred embodiment of the invention, the head of the layer linked list is the image layer id, followed from top to bottom, in linked-list order, by the storage path of each layer: on the host in the first read path, or in the temporary directory in the second read path.
In a preferred embodiment of the invention, scanning the layer linked list layer by layer in the first read path comprises the following steps:
S1002, determining from the layer linked list whether a next layer exists;
if so, looking up the layer's installation package information in the cache database;
if not, returning to S102.
In executing S1002, if a next layer exists, that layer is read and the cache database is checked for its installation package information. If no next layer exists, the bottom of the layer linked list has been reached: the whole image has been read without finding the information, so the method returns to the check of whether any image remains unread and starts reading the next image.
In a preferred embodiment of the invention, scanning the layer linked list layer by layer in the second read path comprises the following steps:
S2003, determining from the layer linked list whether a next layer exists;
if so, looking up the layer's installation package information in the cache database;
if not, returning to S102 to determine whether the traversal is complete.
The execution of S2003 is the same as that of S1002.
In a preferred embodiment of the invention, looking up the layer's installation package information in the cache database in the first read path comprises the following steps:
S1003, determining whether the layer's installation package information exists in the cache database;
S1004, if so, obtaining the layer's installation package information from the cache database and returning it;
exiting the loop and returning to S102 to determine whether the traversal is complete;
if not, looking up the layer's installation package information in the image's installation package database file under the layer's directory.
In executing S1003, if the layer's installation package information exists in the cache database, it is obtained and returned, and reading moves on to the next image; if it does not exist, the layer's installation package information is looked up in the image's installation package database file.
In a preferred embodiment of the invention, looking up the layer's installation package information in the image's installation package database file in the first read path comprises the following steps:
S1005, checking whether the image's installation package database file exists;
S1006, if so, scanning the database file line by line, obtaining the layer's installation package information, and returning it;
exiting the loop and returning to S102 to determine whether the traversal is complete;
if not, returning to S1002 to determine whether a next layer exists.
In executing S1005, if the image's installation package database file exists, the file is scanned, the layer's installation package information is obtained and returned, and reading moves on to the next image; if it does not exist, the next layer in the layer linked list is read.
In a preferred embodiment of the invention, looking up the layer's installation package information in the cache database in the second read path comprises the following steps:
S2004, determining, by layer id, whether the layer's installation package information exists in the cache database;
S2005, if so, obtaining the layer's installation package information from the cache database and returning it;
exiting the loop and returning to S102 to determine whether the traversal is complete;
if not, reading the layer's compressed package and looking up the layer's installation package information in the installation package database file.
In executing S2004, if the layer's installation package information exists in the cache database, it is returned and reading moves on to the next image; if it does not exist, the layer's compressed package is read and the layer's installation package information is looked up in the installation package database file.
In a preferred embodiment of the invention, looking up the layer's installation package information in the installation package database file in the second read path comprises the following steps:
S2006, checking whether the installation package database file exists;
S2007, if so, decompressing the installation package database file;
scanning the database file line by line to obtain the layer's installation package information;
storing the layer's installation package information in the cache database and returning it;
exiting the loop and returning to S102 to determine whether the traversal is complete;
if not, returning to S2003 to determine whether a next layer exists.
In executing S2006, if the installation package database file exists, it is decompressed, the layer's installation package information is obtained, and reading moves on to the next image; if it does not exist, reading moves to the next lower layer in the layer linked list.
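Taken together, steps S102 to S2007 amount to an outer loop over images that dispatches to one of the two read paths. The sketch below shows only that control flow; every callback (`inspect`, `chain_direct`, `chain_exported`, `scan_chain`) is a hypothetical stand-in for the corresponding step, and the `GraphDriver.Name` field is assumed to come from the Docker image inspect data.

```python
SUPPORTED_DRIVERS = ("aufs", "overlay", "overlay2")


def extract_packages(image_ids, inspect, chain_direct, chain_exported, scan_chain):
    """Map each image id to the package info found by scanning its layers."""
    results = {}
    for image_id in image_ids:                    # S102: until list is exhausted
        info = inspect(image_id)                  # S103: detailed image info
        driver = (info.get("GraphDriver") or {}).get("Name")
        if driver in SUPPORTED_DRIVERS:           # S104: supported driver?
            chain = chain_direct(info)            # S1001: paths on the host
        else:
            chain = chain_exported(image_id)      # S2001-S2002: export, unpack
        results[image_id] = scan_chain(chain)     # S1002 onward or S2003 onward
    return results
```

Passing the steps in as callbacks keeps the dispatch logic independent of how each read path is actually implemented.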
Example two
Referring to Fig. 2, the second embodiment of the invention provides an apparatus for extracting file content from a local Docker image, the apparatus comprising:
a traversal judging module 101, a first read path module 102, and a second read path module 103;
the traversal judging module 101 is configured to enumerate the image list of the current host, determine whether each image is stored with a supported storage driver, and enter the corresponding read path;
the first read path module 102 implements the read method used when the image is stored with a supported storage driver, following the first read path of the above technical scheme;
the second read path module 103 implements the read method used when the image is not stored with a supported storage driver, following the second read path of the above technical scheme.
It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the protection scope of the claims of the present invention.

Claims (9)

1. A method for extracting file content from a local Docker image, characterized by comprising the following steps:
enumerating the image list of the current host through the Docker API; traversing the image list, obtaining each image's id and the storage paths of its layers to form a layer linked list, scanning the layer linked list layer by layer, and obtaining the installation package information of the layers;
wherein forming the layer linked list comprises the following steps:
selecting an image from the image list and obtaining its detailed information through the Docker API; determining whether the image is stored with a supported storage driver;
if so, entering a first read path, which comprises: obtaining, through the supported storage driver, the image id and the storage paths of the layers on the host to form the layer linked list;
if not, entering a second read path, which comprises: exporting the compressed image package through the Docker API and saving it to a local cache directory; and decompressing the compressed image package into a temporary directory, reading the manifest file, and obtaining the image id and the storage paths of the layers in the temporary directory to form the layer linked list.
2. The method for extracting file content from a local Docker image according to claim 1, wherein, with a supported storage driver, the image is stored on AUFS or OverlayFS and can be read through one of the aufs, overlay2, and overlay storage drivers.
3. The method for extracting file content from a local Docker image according to claim 1, wherein the head of the layer linked list is the image layer id, followed from top to bottom, in linked-list order, by the storage path of each layer: on the host in the first read path, or in the temporary directory in the second read path.
4. The method for extracting file content from a local Docker image according to claim 3, wherein scanning the layer linked list layer by layer comprises the following steps:
determining from the layer linked list whether a next layer exists;
if so, looking up the layer's installation package information in a cache database;
if not, reading the next image in the image list, or finishing the extraction of file content once the traversal is complete.
5. The method for extracting file content from a local Docker image according to claim 4, wherein, in the first read path, looking up the layer's installation package information in the cache database comprises the following steps:
determining whether the layer's installation package information exists in the cache database;
if so, obtaining the layer's installation package information from the cache database and returning it; exiting the loop and returning to the check of whether the traversal is complete;
if not, looking up the layer's installation package information in the image's installation package database file under the layer's directory.
6. The method for extracting file content from a local Docker image according to claim 5, wherein, in the first read path, looking up the layer's installation package information in the image's installation package database file comprises the following steps:
checking whether the image's installation package database file exists;
if so, scanning the database file line by line to obtain the layer's installation package information; storing the layer's installation package information in the cache database and returning it; exiting the loop and returning to the check of whether the traversal is complete;
if not, returning to the check of whether a next layer exists.
7. The method for extracting file content from a local Docker image according to claim 4, wherein, in the second read path, looking up the layer's installation package information in the cache database comprises the following steps:
determining, by layer id, whether the layer's installation package information exists in the cache database;
if so, obtaining the layer's installation package information from the cache database and returning it; exiting the loop and returning to the check of whether the traversal is complete;
if not, reading the layer's compressed package and looking up the layer's installation package information in the installation package database file.
8. The method for extracting file content from a local Docker image according to claim 7, wherein, in the second read path, looking up the layer's installation package information in the installation package database file comprises the following steps:
checking whether the installation package database file exists;
if so, decompressing the installation package database file; scanning the database file line by line to obtain the layer's installation package information;
storing the layer's installation package information in the cache database and returning it; exiting the loop and returning to the check of whether the traversal is complete;
if not, returning to the check of whether a next layer exists.
9. An apparatus for extracting file content from a local Docker image, comprising:
a traversal judging module, a first read path module, and a second read path module;
the traversal judging module is configured to enumerate the image list of the current host, determine whether each image is stored with a supported storage driver, enter the corresponding read path, and extract the file content from the local Docker image according to the method of any one of claims 1 to 8;
the first read path module implements the read method used when the image is stored with a supported storage driver, following the first read path of any one of claims 1 to 6;
the second read path module implements the read method used when the image is not stored with a supported storage driver, following the second read path of any one of claims 1 to 4 and 7 to 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910207652.5A CN109933342B (en) 2019-03-18 2019-03-18 Method and device for extracting file content from local docker mirror image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910207652.5A CN109933342B (en) 2019-03-18 2019-03-18 Method and device for extracting file content from local docker mirror image

Publications (2)

Publication Number Publication Date
CN109933342A CN109933342A (en) 2019-06-25
CN109933342B true CN109933342B (en) 2020-10-16

Family

ID=66987599

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910207652.5A Active CN109933342B (en) 2019-03-18 2019-03-18 Method and device for extracting file content from local docker mirror image

Country Status (1)

Country Link
CN (1) CN109933342B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111124442A (en) * 2019-12-20 2020-05-08 珠海金山网络游戏科技有限公司 Docker container installation package comparison method and device and readable medium
CN113885936A (en) * 2021-08-16 2022-01-04 统信软件技术有限公司 Solution method for software package dependence in customized mirror image

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106487850B (en) * 2015-08-29 2019-10-25 华为技术有限公司 The methods, devices and systems of mirror image are obtained under a kind of cloud environment
CN105511943B (en) * 2015-12-03 2019-04-12 华为技术有限公司 A kind of Docker container operation method and device
CN107729020B (en) * 2017-10-11 2020-08-28 北京航空航天大学 Method for realizing rapid deployment of large-scale container

Also Published As

Publication number Publication date
CN109933342A (en) 2019-06-25

Similar Documents

Publication Publication Date Title
US8776027B2 (en) Extracting and collecting platform use data
JP5976020B2 (en) System and method for performing anti-malware metadata lookup
US8296758B2 (en) Deployment and versioning of applications
US9111094B2 (en) Malware detection
US9819695B2 (en) Scanning method and device, and client apparatus
EP1782191B1 (en) Method for loading software with an intermediate object oriented language in a portable device
CN102982121B (en) A kind of file scanning method, file scanning device and file detection system
CN109933342B (en) Method and device for extracting file content from local docker mirror image
CN111049889B (en) Static resource uploading method and device, integrated server and system
WO2020010724A1 (en) Front-end static resource management method, apparatus, computer device and storage medium
CN109460345B (en) Real-time data calculation method and system
CN114047949A (en) Application system domestic platform migration adaptation method
KR101228902B1 (en) Cloud Computing-Based System for Supporting Analysis of Malicious Code
Raharjo et al. Reliability Evaluation of Microservices and Monolithic Architectures
CN106529281A (en) Executable file processing method and device
CN112214231A (en) CI-based virtualized software upgrade package generation method and system
US20160162365A1 (en) Storing difference information in a backup system
CN116795486A (en) Analysis method and device for container mirror image file purification, storage medium and terminal
CN116069729A (en) Intelligent document packaging method, system and medium
CN104281486A (en) Processing method and device of VM (virtual machine)
US10878104B2 (en) Automated multi-credential assessment
ElBanna et al. NONYM! ZER: mitigation framework for browser fingerprinting
CN104285221A (en) Efficient in-place preservation of content across content sources
CN112860481A (en) Local Docker mirror image information acquisition system and acquisition method thereof
CN106354602A (en) Service monitoring method and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant