CN116841972A - Container mirror image redundancy removing method through mirror image layer reconstruction - Google Patents

Info

Publication number
CN116841972A
Authority
CN
China
Prior art keywords
mirror image
layer
file
files
redundancy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310702283.3A
Other languages
Chinese (zh)
Inventor
王晓飞
沈仕浩
冯一诚
张程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202310702283.3A
Publication of CN116841972A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1748 De-duplication implemented within the file system, e.g. based on file segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for removing container image redundancy through image layer reconstruction, comprising: S1, collecting the paths, names, sizes and hash values of all files in the image layers by parallel traversal to obtain the corresponding image file metadata; S2, establishing a merged view of each image according to the image file metadata; S3, comparing the merged views of different images to determine the redundancy of each file; S4, dividing the files into image-unique layers and image-shared layers according to their redundancy; S5, judging by a layer creation threshold whether each new layer needs to be created, and judging by a trade-off threshold whether the overall image reconstruction needs to be executed. By reconstructing the files contained in the image layers, the invention makes as many layers as possible completely identical, thereby reducing image redundancy without breaking the native layer structure and while maintaining high compatibility.

Description

Container mirror image redundancy removing method through mirror image layer reconstruction
Technical field:
The invention belongs to the technical field of containerized services, and particularly relates to a method for removing container image redundancy through image layer reconstruction.
Background:
As cloud computing technology matures and is applied ever more widely, containerization technology has become increasingly important. In containerization, the container image is a central concept: it provides a lightweight, portable and reusable way to build and deploy applications. However, container images are typically large, which leads to higher storage and transmission costs and to more time and resources being consumed during deployment. How to reduce the deployment cost of container images has therefore become an important issue. The background technology mainly related to the invention comprises the following aspects:
Container: A container is a lightweight virtualization technology that allows applications to run in an isolated environment without being affected by the host operating system. Containers virtualize at the operating system level. Unlike a conventional virtual machine, a container requires neither an additional virtual machine manager nor virtualized hardware, so its performance overhead is very low. Containers play an important role in application development and deployment, providing a lightweight, portable and reusable way to build and deploy applications. With containers, a developer can package an application and all of its dependencies into one image that runs on any platform supporting containerization, without additional configuration or installation. Container isolation lets applications run in separate environments, so different applications can run on the same host without interfering with each other. Containers can also be connected to other containers and to external services over a network in order to implement complex application architectures. Beyond development and deployment, containers are also used in testing, continuous integration and delivery, where they make testing and deployment more automated and repeatable and improve developer productivity.
Container Image: A container image is a read-only package that contains all the files and dependencies used to build and run a container. It typically contains operating system contents, applications, libraries, configuration files and other dependencies. Images may be provided by a developer, a system administrator or a third party, and are usually obtained from an image registry or another distribution channel. The container image is the basis of the container and consists of a set of layers, each of which is a read-only file system. A container runs by combining the layers into a complete file system and running the application on top of it. Because every layer of the image is read-only, containers can isolate different applications and operating environments, which ensures security and reliability.
Image Layer: A container image is made up of multiple image layers, each of which is a read-only file system. The image layer is the basic component of the container image and contains applications, libraries, configuration files, dependencies and so on. Image layers are combined by union mounting, which mounts multiple image layers into a single file system to form a complete container. The advantage of image layers is that they improve the reusability and portability of container images. Each image layer is a separate component that can be shared and reused between different images, which reduces image size and thereby improves the efficiency of transmission and deployment. In addition, the immutability and isolation of image layers help ensure the security and reliability of the container image.
The resource isolation capability of container technology allows multiple containers to run on the same server without interfering with each other. However, this isolation also prevents files from being reused between images, resulting in file redundancy. Currently, the division into image layers is based on the Dockerfile (a configuration file): each instruction line in the Dockerfile constructs an image layer, and a layer can be shared only if two layers are completely identical. The arbitrariness of Dockerfile instructions hinders data sharing and results in high redundancy between different layers. In particular, it produces many similar but non-identical layers, in which some files are the same in both layers while a few files differ. The current reuse mechanism of Docker can only share two completely identical layers; if even a single file differs between two layers, the whole layer cannot be shared. It is therefore necessary, as in the present solution, to reconstruct the files contained in the image layers so that as many layers as possible are completely identical, thereby reducing image redundancy without breaking the native layer structure and while maintaining high compatibility.
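The all-or-nothing nature of layer sharing described above can be illustrated with a small sketch. The function `layer_digest` and the file contents below are hypothetical; the encoding is a simplified stand-in for a content-addressed layer digest, not the way any particular container engine hashes a layer archive.

```python
import hashlib


def layer_digest(files: dict[str, bytes]) -> str:
    """Simplified, hypothetical stand-in for a content-addressed layer digest.

    Assumption: real engines hash the layer archive; here we hash the sorted
    (path, content hash) pairs, which is enough to show the effect.
    """
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode())
        h.update(b"\0")
        h.update(hashlib.sha256(files[path]).digest())
    return h.hexdigest()


# Two layers that share one large file and differ only in a small config file.
layer_a = {"/usr/lib/libfoo.so": b"shared library", "/etc/app.conf": b"port=80"}
layer_b = {"/usr/lib/libfoo.so": b"shared library", "/etc/app.conf": b"port=8080"}

# One differing file changes the digest, so neither layer can be reused as a whole.
print(layer_digest(layer_a) == layer_digest(layer_b))  # False
```

Although `/usr/lib/libfoo.so` is byte-identical in both layers, it would be stored and transferred twice under whole-layer sharing, which is the redundancy the method targets.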
Summary of the invention:
In view of the technical problems in the prior art, the invention provides a method that reduces file redundancy between layers by reconstructing the files contained in the different layers of an image, while keeping compatibility with the original image layer structure. By reconstructing the files contained in the image layers, the invention makes as many layers as possible completely identical, thereby reducing image redundancy without breaking the native layer structure and while maintaining high compatibility.
The core content of the invention can be summarized as follows:
A method for removing container image redundancy through image layer reconstruction comprises the following steps:
S1, collecting the paths, names, sizes and hash values of all files in the image layers by parallel traversal to obtain the corresponding image file metadata;
S2, establishing a merged view of each image according to the image file metadata;
S3, comparing the merged views of different images to determine the redundancy of each file;
S4, dividing the files into image-unique layers and image-shared layers according to the redundancy of the image files; wherein:
files with the same file name and hash value that appear in different images are divided into a shared layer;
files that exist in only one image are divided into a unique layer;
S5, judging by threshold values whether to execute the image reconstruction; wherein:
it is calculated whether the size of the files of a shared layer exceeds the layer threshold; if so, the shared layer is created;
otherwise, creation of the shared layer is canceled and its files are divided into the unique layer;
it is calculated whether the total size of all shared layers newly added by the image reconstruction exceeds the trade-off threshold; if so, the image reconstruction is executed; otherwise, the method returns to step S4.
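The two checks in step S5 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `plan_reconstruction`, the layer names, the file paths and the threshold values are all hypothetical, and the "return to step S4" branch is left to the caller, which simply receives `False`.

```python
def plan_reconstruction(shared_candidates, layer_threshold, tradeoff_threshold):
    """Apply the layer threshold per candidate layer, then the trade-off threshold.

    shared_candidates: dict mapping a candidate shared-layer name to a list of
    (path, size_in_bytes) tuples. All names here are illustrative assumptions.
    Returns (kept_layers, files_demoted_to_unique_layer, execute_reconstruction).
    """
    kept = {}
    demoted = []
    for name, files in shared_candidates.items():
        layer_size = sum(size for _, size in files)
        if layer_size > layer_threshold:
            kept[name] = files          # shared layer is created
        else:
            demoted.extend(files)       # creation canceled, files go to the unique layer
    total_shared = sum(size for files in kept.values() for _, size in files)
    return kept, demoted, total_shared > tradeoff_threshold


candidates = {
    "shared-1": [("/lib/a.so", 40_000_000), ("/lib/b.so", 20_000_000)],
    "shared-2": [("/etc/motd", 512)],
}
kept, demoted, execute = plan_reconstruction(
    candidates, layer_threshold=1_000_000, tradeoff_threshold=50_000_000
)
print(sorted(kept))  # ['shared-1']
print(demoted)       # [('/etc/motd', 512)]
print(execute)       # True
```

Because both decisions use only sizes from the metadata, the plan can be evaluated before any file is moved.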
Further, the process of establishing the merged image view according to the image file metadata comprises:
merging files of the lower layer whose paths and file names differ from those in the upper layer into the upper layer;
overwriting files of the lower layer with the files of the upper layer that have the same path and file name;
and deleting the files of the lower layer that are marked as hidden.
Advantageous effects
Compared with the prior art, the invention has the advantages that:
The image reconstruction of the invention improves the efficiency of image use by adjusting the number of image layers and the files contained in each layer so that more completely identical layers appear, thereby exploiting layer sharing between different images and reducing redundant files between images. Image reconstruction therefore not only reduces the storage space occupied by images in the image repository, but also benefits the container deployment process by avoiding redundant file transfers and fetches, while preserving the native layer structure design and compatibility.
Drawings
FIG. 1 is a flow chart of a container image reconstruction in accordance with the present invention;
FIG. 2 is an example diagram of a container image reconstruction in accordance with the present invention.
Detailed Description
The invention is described below with reference to FIGS. 1 and 2:
The goal of image reconstruction is to improve image layer reuse and to achieve a redundancy removal capability close to file level. The overall flow of the image reconstruction is shown in FIG. 1 and comprises three main steps:
Step 1: generating file metadata for each image according to the initial structure of the image, and then creating a merged image view by merging the image layers.
Step 2: identifying redundant files between the images by using the merged views.
Step 3: dividing the files into unique layers and shared layers according to the redundancy information.
To speed up the flow, steps 1 and 2 operate on the file metadata only, while actual file operations are performed only after the threshold conditions of step 3 are met. The steps are described in detail below using the example of FIG. 2.
Step 1: generating file metadata and the merged view. In this step, all files in the images are traversed in parallel and their path, name, size and hash value (SHA-256) are collected, generating file metadata for each image. The layers are then merged one by one on the basis of the metadata, creating a merged image view according to the following rules:
(i) If files and folders in the lower layer have paths and file names different from those in the upper layer, they are merged into the upper layer, as with file x when layers 1 and 2 of image 1 are merged in FIG. 2.
(ii) If files and folders in the lower layer have the same path and file name as those in the upper layer, the upper-layer files and folders cover those of the lower layer, regardless of whether their contents are identical, as shown by file f in FIG. 2.
(iii) Files and folders marked as hidden by the "whiteout" deletion mechanism are removed, such as .wh..wh..opq (hiding all sub-files) and .wh.z (hiding file z) in FIG. 2.
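Step 1 as a whole can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the use of a thread pool for the parallel traversal, and the metadata dictionary layout are assumptions, and the layers are assumed to be already unpacked into directories with POSIX-style relative paths.

```python
import hashlib
import os
import posixpath
from concurrent.futures import ThreadPoolExecutor

WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"


def file_metadata(root, rel_path):
    """Return (relative path, metadata) for one file of an unpacked layer."""
    full = os.path.join(root, rel_path)
    with open(full, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    meta = {"name": posixpath.basename(rel_path),
            "size": os.path.getsize(full),
            "sha256": digest}
    return rel_path, meta


def collect_layer_metadata(root):
    """Traverse one unpacked layer directory and hash its files in parallel."""
    rel_paths = [os.path.relpath(os.path.join(d, n), root).replace(os.sep, "/")
                 for d, _, names in os.walk(root) for n in names]
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(lambda p: file_metadata(root, p), rel_paths))


def merge_layers(layers):
    """Merge per-layer metadata dicts (lowest layer first) into one view."""
    view = {}
    for layer in layers:
        # Rule (iii): whiteouts in this layer hide entries from the layers below.
        for path in layer:
            directory, name = posixpath.split(path)
            if name == OPAQUE_MARKER:
                prefix = directory + "/" if directory else ""
                hidden = [p for p in view if p.startswith(prefix)]
            elif name.startswith(WHITEOUT_PREFIX):
                target = posixpath.join(directory, name[len(WHITEOUT_PREFIX):])
                hidden = [p for p in view
                          if p == target or p.startswith(target + "/")]
            else:
                continue
            for p in hidden:
                del view[p]
        # Rules (i) and (ii): new paths are added, same paths are overridden.
        for path, meta in layer.items():
            if not posixpath.basename(path).startswith(WHITEOUT_PREFIX):
                view[path] = meta
    return view


lower = {"a/x": {"sha256": "1"}, "a/f": {"sha256": "2"},
         "b/z": {"sha256": "3"}, "c/k": {"sha256": "4"}}
upper = {"a/f": {"sha256": "9"}, "b/.wh.z": {"sha256": "0"},
         "c/.wh..wh..opq": {"sha256": "0"}, "c/new": {"sha256": "5"}}
print(sorted(merge_layers([lower, upper])))  # ['a/f', 'a/x', 'c/new']
```

In the usage example, `a/x` survives by rule (i), `a/f` takes the upper layer's metadata by rule (ii), and `b/z` and `c/k` are hidden by the two whiteout markers under rule (iii). The hash values in the example are placeholders rather than real digests.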
Step 2: determining the shareability of each file according to the merged views. First, a global key-value table is generated by traversing all merged image views in parallel:
the key is the hash value of a file and the value is the list of images that contain the file. Files whose image list contains more than one image are potentially shareable files and are divided into the shared layer, such as file f and file i in FIG. 2. Files that appear in only one image remain in the unique layer, such as file x and file q in FIG. 2. Finally, a new layer structure for each image can be inferred from the file metadata.
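The key-value table and the resulting split can be sketched as below. The sketch is sequential for brevity, whereas the description traverses the views in parallel; it keys on the hash value alone, as in this step. The function names, paths and hash placeholders are hypothetical, and only the file names f, i, x and q are taken from the example of FIG. 2.

```python
from collections import defaultdict


def build_hash_table(merged_views):
    """Map each file hash to the set of images whose merged view contains it."""
    table = defaultdict(set)
    for image, view in merged_views.items():
        for meta in view.values():
            table[meta["sha256"]].add(image)
    return table


def partition(merged_views):
    """Split every image's files into shared and unique candidates."""
    table = build_hash_table(merged_views)
    plan = {}
    for image, view in merged_views.items():
        shared = sorted(p for p, m in view.items() if len(table[m["sha256"]]) > 1)
        unique = sorted(p for p, m in view.items() if len(table[m["sha256"]]) == 1)
        plan[image] = {"shared": shared, "unique": unique}
    return plan


# Hypothetical merged views; "H_f" and so on stand in for SHA-256 values.
views = {
    "image1": {"/app/f": {"sha256": "H_f"}, "/app/i": {"sha256": "H_i"},
               "/app/x": {"sha256": "H_x"}},
    "image2": {"/srv/f": {"sha256": "H_f"}, "/srv/i": {"sha256": "H_i"},
               "/srv/q": {"sha256": "H_q"}},
}
plan = partition(views)
print(plan["image1"])  # {'shared': ['/app/f', '/app/i'], 'unique': ['/app/x']}
print(plan["image2"])  # {'shared': ['/srv/f', '/srv/i'], 'unique': ['/srv/q']}
```

Since only metadata is consulted, the new layer structure is known before any file is read again or moved.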
Step 3: reconstructing the new layer structure. To optimize efficiency, two thresholds are set (the user can customize the threshold values as needed):
(i) Layer creation threshold:
To avoid creating too many layers, a shared layer is created only when its size exceeds the threshold; otherwise, creation of the shared layer is canceled and its files remain in the unique layer.
(ii) Trade-off threshold:
Considering the cost of image reconstruction, the size of the reusable files added by the reconstruction needs to be calculated. If this size exceeds the threshold, the reconstruction process is performed; otherwise, the current reconstruction is abandoned. When the threshold requirement is met and the reconstruction is performed, the files are re-divided according to the layer structure planned in step 2. As shown in FIG. 2, file f and file i are moved to the specified shared path /a/z/, while their original locations are replaced with soft links pointing to the files in /a/z/.
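The relocation at the end of step 3 can be sketched as follows, assuming the image's file system has been unpacked into a working directory. The function name `relocate_to_shared` is hypothetical. The sketch writes relative link targets so that it works inside a temporary directory, whereas links inside a real image would more likely point at the absolute path /a/z/. It also assumes that the moved files have distinct base names.

```python
import os
import shutil
import tempfile


def relocate_to_shared(root, rel_paths, shared_dir):
    """Move files into the shared directory and leave soft links behind.

    root: directory holding the unpacked image file system (assumption).
    rel_paths: files to move, relative to root.
    shared_dir: shared path relative to root, e.g. "a/z".
    """
    shared_root = os.path.join(root, shared_dir)
    os.makedirs(shared_root, exist_ok=True)
    for rel in rel_paths:
        src = os.path.join(root, rel)
        dst = os.path.join(shared_root, os.path.basename(rel))
        shutil.move(src, dst)
        # Relative target keeps the link valid wherever root is mounted.
        link_target = os.path.relpath(dst, os.path.dirname(src))
        os.symlink(link_target, src)


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "app"))
    with open(os.path.join(root, "app", "f"), "w") as fh:
        fh.write("payload")
    relocate_to_shared(root, ["app/f"], "a/z")
    original = os.path.join(root, "app", "f")
    print(os.path.islink(original))  # True
    with open(original) as fh:
        print(fh.read())             # payload
```

After relocation the shared directory holds the only copy of each file, so it can be packaged as a layer that is byte-identical across the images that contain those files, while applications still find the files at their original paths through the links.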

Claims (2)

1. A method for removing container image redundancy through image layer reconstruction, comprising the following steps:
S1, collecting the paths, names, sizes and hash values of all files in the image layers by parallel traversal to obtain the corresponding image file metadata;
S2, establishing a merged view of each image according to the image file metadata;
S3, comparing the merged views of different images to determine the redundancy of each file;
S4, dividing the files into image-unique layers and image-shared layers according to the redundancy of the image files; wherein:
files with the same file name and hash value that appear in different images are divided into a shared layer;
files that exist in only one image are divided into a unique layer;
S5, judging by threshold values whether to execute the image reconstruction; wherein:
it is calculated whether the size of the files of a shared layer exceeds the layer threshold; if so, the shared layer is created; otherwise, creation of the shared layer is canceled and its files are divided into the unique layer;
it is calculated whether the total size of all shared layers newly added by the image reconstruction exceeds the trade-off threshold; if so, the image reconstruction is executed; otherwise, the method returns to step S4.
2. The method for removing container image redundancy through image layer reconstruction according to claim 1, wherein the process of establishing the merged image view according to the image file metadata comprises:
merging files of the lower layer whose paths and file names differ from those in the upper layer into the upper layer;
overwriting files of the lower layer with the files of the upper layer that have the same path and file name;
and deleting the files of the lower layer that are marked as hidden.
CN202310702283.3A 2023-06-14 2023-06-14 Container mirror image redundancy removing method through mirror image layer reconstruction Pending CN116841972A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310702283.3A CN116841972A (en) 2023-06-14 2023-06-14 Container mirror image redundancy removing method through mirror image layer reconstruction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310702283.3A CN116841972A (en) 2023-06-14 2023-06-14 Container mirror image redundancy removing method through mirror image layer reconstruction

Publications (1)

Publication Number Publication Date
CN116841972A (en) 2023-10-03

Family

ID=88166202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310702283.3A Pending CN116841972A (en) 2023-06-14 2023-06-14 Container mirror image redundancy removing method through mirror image layer reconstruction

Country Status (1)

Country Link
CN (1) CN116841972A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination