CN116841972A - Container mirror image redundancy removing method through mirror image layer reconstruction - Google Patents
Container mirror image redundancy removing method through mirror image layer reconstruction
- Publication number
- CN116841972A
- Authority
- CN
- China
- Prior art keywords
- mirror image
- layer
- file
- files
- redundancy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1748—De-duplication implemented within the file system, e.g. based on file segments
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for removing redundancy from container images through image layer reconstruction, comprising: S1, collecting the paths, names, sizes and hash values of all files in the image layers by parallel traversal to obtain the corresponding image file metadata; S2, building a merged view of each image from the image file metadata; S3, comparing the merged views of different images to determine the redundancy of each file; S4, dividing the files into image-unique layers and image-shared layers according to their redundancy; S5, using a layer creation threshold to decide whether each new layer is worth creating, and a trade-off threshold to decide whether the overall image reconstruction is worth executing. By reconstructing the files contained in the image layers, the invention makes as many layers as possible completely identical, thereby reducing image redundancy without breaking the native layer structure and while keeping high compatibility.
Description
Technical field:
The invention belongs to the technical field of containerized services, and in particular relates to a method for removing redundancy from container images through image layer reconstruction.
Background:
As cloud computing technology matures and is applied ever more widely, containerization technology has become increasingly important. In containerization, the container image is a central concept: it provides a lightweight, portable, reusable way to build and deploy applications. However, container images are typically large, which leads to higher storage and transfer costs and consumes more time and resources during deployment. How to reduce the deployment cost of container images has therefore become an important problem. The background technology most relevant to the invention is as follows:
Container: a container is a lightweight virtualization technology that allows applications to run in an isolated environment without being affected by the host operating system. Containers virtualize at the operating-system level; unlike a conventional virtual machine, a container needs neither an additional virtual machine manager nor virtualized hardware, so containers perform very well. Containers play an important role in application development and deployment. Using containers, a developer can package an application and all of its dependencies into one image that runs on any platform supporting containerization, without additional configuration or installation. Container isolation lets applications run in separate environments, so different applications can run on the same host without interfering with each other. Containers can also be connected to other containers and to external services over a network in order to implement complex application architectures. Beyond developing and deploying applications, containers are also used in testing, continuous integration, and delivery, where they make testing and deployment more automated and repeatable and improve developer efficiency and productivity.
Container Image: a container image is a read-only file that contains all the files and dependencies needed to build and run a container. A container image typically contains an operating system, applications, libraries, configuration files, and other dependencies. Images may be provided by a developer, a system administrator, or a third party, and are typically obtained from an image registry or another distribution channel. The container image is the basis of the container and consists of a set of layers, each of which is a read-only file system. A container is run by combining the layers into a complete file system and running the application on top of it. Because each layer of the image is read-only, containers can isolate different applications and runtime environments, which helps ensure security and reliability.
Image Layer: a container image is made up of multiple image layers, each of which is a read-only file system. The image layer is the basic component of the container image and contains applications, libraries, configuration files, dependencies, and so on. Image layers are combined by union mounting, which mounts multiple image layers into one file system to form a complete container. Image layers improve the reusability and portability of container images: each layer is a separate component that can be shared and reused between different images. This reduces image size and therefore improves the efficiency of transfer and deployment. In addition, the immutability and isolation of image layers help ensure the security and reliability of container images.
The resource isolation capability of container technology allows multiple containers to run on the same server without interfering with each other. However, this isolation also prevents files from being reused between images, resulting in file redundancy. Currently, image layers are divided according to the Dockerfile (a configuration file): each instruction line in the Dockerfile builds one image layer, and a layer can be shared only if two layers are identical. The arbitrariness of Dockerfile instructions hinders data sharing and causes high redundancy between layers. In particular, it produces many similar but non-identical layers, in which some files are the same in both layers while a few files differ. Docker's current reuse mechanism can only share two completely identical layers; if even one file differs between two layers, the whole layer cannot be shared. It is therefore necessary to reconstruct the files contained in the image layers so that as many layers as possible are completely identical, thereby reducing image redundancy without breaking the native layer structure and while keeping high compatibility.
Summary of the invention:
To address the technical problems in the prior art, the invention provides a method that reduces file redundancy between layers by reconstructing the files contained in the different layers of an image, while remaining compatible with the original image layer structure. By reconstructing the files contained in the image layers, the invention makes as many layers as possible completely identical, thereby reducing image redundancy without breaking the native layer structure and while keeping high compatibility.
The core content of the invention can be summarized as follows:
A method for removing redundancy from container images through image layer reconstruction comprises the following steps:
S1, collecting the paths, names, sizes and hash values of all files in the image layers by parallel traversal to obtain the corresponding image file metadata;
S2, building a merged view of each image from the image file metadata;
S3, comparing the merged views of different images to determine the redundancy of each file;
S4, dividing the files into image-unique layers and image-shared layers according to the redundancy of the image files; wherein:
files that appear in different images with the same file name and hash value are assigned to a shared layer;
files that exist in only one image are assigned to a unique layer;
S5, deciding whether to execute the image reconstruction by means of threshold values; wherein:
it is calculated whether the size of the files in a shared layer exceeds the layer threshold; if so, the shared layer is created;
otherwise, creation of the shared layer is cancelled and its files are assigned to a unique layer;
it is calculated whether the total size of all shared layers newly added by the image reconstruction exceeds the trade-off threshold; if so, the image reconstruction is executed; otherwise, the method returns to step S4.
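The two threshold checks in S5 can be illustrated with a short Python sketch. This is only an illustration under assumptions: the names `PlannedSharedLayer`, `plan_reconstruction`, `layer_threshold` and `tradeoff_threshold` are hypothetical, sizes are taken to be in bytes, and "exceeds" is read as a strict comparison.

```python
from dataclasses import dataclass


@dataclass
class PlannedSharedLayer:
    """A candidate shared layer (hypothetical structure, not from the patent)."""
    name: str
    size_bytes: int  # total size of the files planned for this layer


def plan_reconstruction(candidates, layer_threshold, tradeoff_threshold):
    """Apply the layer threshold, then the trade-off threshold.

    Returns (kept, demoted, should_rebuild):
    - kept: shared layers large enough to be created
    - demoted: candidates whose files fall back to a unique layer
    - should_rebuild: True if the total size of kept layers exceeds
      the trade-off threshold
    """
    kept = [c for c in candidates if c.size_bytes > layer_threshold]
    demoted = [c for c in candidates if c.size_bytes <= layer_threshold]
    gained = sum(c.size_bytes for c in kept)
    return kept, demoted, gained > tradeoff_threshold
```

For example, with candidates of 50, 5 and 30 bytes, a layer threshold of 10 and a trade-off threshold of 60, the 5-byte candidate is demoted and the remaining 80 bytes justify the reconstruction.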
Further, the process of building the merged view from the image file metadata comprises the following steps:
merging files in the lower layer whose paths and file names differ from those in the upper layer into the upper layer;
overriding files in the lower layer with the files in the upper layer that have the same paths and file names;
and deleting the files in the lower layer that are marked as hidden.
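The three merge rules above can be sketched in Python as follows. The sketch assumes each layer is represented as a dictionary from absolute path to file metadata, ordered from the lowest layer to the highest, and that hidden files are marked with the whiteout naming convention used by Docker and OCI image layers (a `.wh.` prefix, with `.wh..wh..opq` hiding all lower-layer children of a directory). The function name `merge_layers` is hypothetical.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"


def merge_layers(layers):
    """Build a merged view from a list of {path: metadata} dicts, lowest first."""
    view = {}
    for layer in layers:
        # First apply the deletions this layer declares against lower layers.
        for path in layer:
            parent, base = posixpath.split(path)
            if base == OPAQUE_MARKER:
                # Opaque marker: hide everything below the parent directory.
                prefix = parent.rstrip("/") + "/"
                for old in [p for p in view if p.startswith(prefix)]:
                    del view[old]
            elif base.startswith(WHITEOUT_PREFIX):
                # Plain whiteout: hide one sibling (and its children, if any).
                target = posixpath.join(parent, base[len(WHITEOUT_PREFIX):])
                for old in [p for p in view
                            if p == target or p.startswith(target + "/")]:
                    del view[old]
        # Then add new paths and let this layer override same-path entries.
        for path, meta in layer.items():
            if not posixpath.basename(path).startswith(WHITEOUT_PREFIX):
                view[path] = meta
    return view
```

Working on metadata dictionaries rather than on extracted files keeps this step cheap, which matches the document's point that file operations are deferred until the thresholds are met.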
Advantageous effects
Compared with the prior art, the invention has the advantages that:
The image reconstruction of the invention improves image usage efficiency by adjusting the number of image layers and the files contained in each layer so that more completely identical layers appear, which makes use of layer sharing between different images and reduces redundant files between them. Image reconstruction therefore not only reduces the storage space that images occupy in the image registry, but also benefits container deployment by avoiding redundant file transfers and fetches, while preserving the native layer structure design and compatibility.
Drawings
FIG. 1 is a flow chart of container image reconstruction according to the invention;
FIG. 2 is an example diagram of container image reconstruction according to the invention.
Detailed Description
The invention is described below with reference to FIG. 1 and FIG. 2:
The goal of image reconstruction is to improve image layer reuse and achieve redundancy removal at close to file-level granularity. The overall flow of image reconstruction is shown in FIG. 1 and comprises three main steps:
Step 1. Generate file metadata for each image according to its initial structure, then create the merged view of the image by merging its layers.
Step 2. Identify redundant files between images by using the merged views.
Step 3. Divide the files into unique layers and shared layers according to the redundancy information.
To speed up the flow, steps 1 and 2 operate on file metadata only; actual file operations are performed only after the threshold conditions of step 3 are met. The example in FIG. 2 is described in detail below.
Step 1: generate file metadata and the merged view. In this step, all files in the images are traversed in parallel and their path, name, size and hash value (SHA-256) are collected to generate the file metadata of each image. The layers are then merged one by one on the basis of this metadata, and the merged view of the image is created according to the following rules:
(i) If files and folders in the lower layer have paths and file names that differ from those in the upper layer, they are merged into the upper layer, as with file x in FIG. 2 when layers 1 and 2 of image 1 are merged.
(ii) If files and folders in the lower layer have the same path and file name as ones in the upper layer, the upper-layer files and folders override the lower-layer ones regardless of whether their contents are identical, as shown by file f in FIG. 2.
(iii) Files and folders that the "whiteout" deletion mechanism marks as hidden are removed, such as .wh..wh..opq (which hides all child files) and .wh.z (which hides file z) in FIG. 2.
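The metadata collection at the start of step 1 could look roughly like the Python sketch below. It assumes that a layer has already been extracted to a local directory, which the document does not state, and it uses a thread pool as one possible form of parallel traversal. The function names and the dictionary keys are hypothetical.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def file_metadata(root, rel_path):
    """Return path, name, size and SHA-256 for one file under root."""
    full = os.path.join(root, rel_path)
    digest = hashlib.sha256()
    with open(full, "rb") as fh:
        # Hash in 1 MiB chunks so large files are not loaded into memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "path": "/" + rel_path.replace(os.sep, "/"),
        "name": os.path.basename(rel_path),
        "size": os.path.getsize(full),
        "sha256": digest.hexdigest(),
    }


def collect_layer_metadata(root, workers=8):
    """Walk an extracted layer directory and hash regular files in parallel."""
    rel_paths = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; only regular files are hashed.
            if os.path.isfile(full) and not os.path.islink(full):
                rel_paths.append(os.path.relpath(full, root))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: file_metadata(root, p), rel_paths))
```

A thread pool is a reasonable fit here because the work is dominated by file reads; a process pool would be an alternative if hashing turned out to be the bottleneck.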
Step 2: determine the shareability of each file from the merged views. First, a global key-value table is generated by traversing the merged views of all images in parallel: the key is the hash value of a file and the value is the list of images that contain the file. Files whose image list contains more than one image are potentially shareable and are assigned to a shared layer, such as file f and file i in FIG. 2. Files that appear in only one image remain in a unique layer, such as file x and file q in FIG. 2. Finally, a new layer structure for each image can be inferred from the file metadata.
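A minimal Python sketch of the global key-value table and the resulting classification follows. It assumes each merged view is a dictionary from path to SHA-256 digest and follows the hash-keyed table of this description; the additional file-name match required by the claims is not checked here. A set is used instead of a list for the image collection so that a file duplicated inside one image is not counted twice, which is a choice of this sketch. The name `classify_files` is hypothetical.

```python
from collections import defaultdict


def classify_files(merged_views):
    """Split each image's files into shared and unique candidates.

    merged_views: {image_name: {path: sha256_digest}}
    Returns (images_by_hash, shared, unique), where shared and unique map
    each image name to a sorted list of paths.
    """
    # Global table: file hash -> set of images containing that content.
    images_by_hash = defaultdict(set)
    for image, view in merged_views.items():
        for digest in view.values():
            images_by_hash[digest].add(image)

    shared, unique = {}, {}
    for image, view in merged_views.items():
        shared[image] = sorted(
            p for p, h in view.items() if len(images_by_hash[h]) > 1)
        unique[image] = sorted(
            p for p, h in view.items() if len(images_by_hash[h]) == 1)
    return dict(images_by_hash), shared, unique
```

Because only hashes and image names are compared, this step needs no access to file contents once the metadata has been collected.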
Step 3: reconstruct the new layer structure. To optimize efficiency, two thresholds are set (the user can customize their values as needed):
(i) Layer creation threshold:
To avoid creating too many layers, a shared layer is created only when its size exceeds the threshold; otherwise, creation of the shared layer is cancelled and its files remain in the unique layer.
(ii) Trade-off threshold:
Because image reconstruction has a cost, the total size of the reusable files gained by the reconstruction is calculated. If this size exceeds the threshold, the reconstruction is performed; otherwise, the current reconstruction is abandoned. When the reconstruction is performed, the files are re-divided according to the layer structure planned in step 2. As shown in FIG. 2, file f and file i are moved to the designated shared path /a/z/, and their original locations are replaced with symbolic links pointing to the files in /a/z/.
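The relocation of a shared file can be sketched in Python as below. Several details are assumptions not given in the document: the image file system is taken to be extracted under a local directory `rootfs`, the file stored in the shared path is named after its hash digest, and the link is written as a relative symbolic link so that it resolves both in the extracted tree and inside a container. The function name `relocate_shared_file` is hypothetical, and the sketch assumes a POSIX system.

```python
import os
import shutil


def relocate_shared_file(rootfs, rel_path, digest, shared_dir="a/z"):
    """Move one file into the shared path and leave a symlink behind.

    rootfs: directory holding the extracted image file system (assumption)
    rel_path: path of the file relative to rootfs
    digest: content hash, used here as the stored file name (assumption)
    """
    src = os.path.join(rootfs, rel_path)
    store = os.path.join(rootfs, shared_dir)
    os.makedirs(store, exist_ok=True)
    dst = os.path.join(store, digest)

    if os.path.exists(dst):
        # Same content is already stored; drop the duplicate copy.
        os.remove(src)
    else:
        shutil.move(src, dst)

    # Relative link from the original location to the shared copy.
    link_target = os.path.relpath(dst, os.path.dirname(src))
    os.symlink(link_target, src)
    return dst
```

After the call, reading the original path follows the link to the copy under the shared path, so applications that open the file by its old path continue to work.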
Claims (2)
1. A method for removing redundancy from container images through image layer reconstruction, comprising the following steps:
S1, collecting the paths, names, sizes and hash values of all files in the image layers by parallel traversal to obtain the corresponding image file metadata;
S2, building a merged view of each image from the image file metadata;
S3, comparing the merged views of different images to determine the redundancy of each file;
S4, dividing the files into image-unique layers and image-shared layers according to the redundancy of the image files; wherein:
files that appear in different images with the same file name and hash value are assigned to a shared layer;
files that exist in only one image are assigned to a unique layer;
S5, deciding whether to execute the image reconstruction by means of threshold values; wherein:
it is calculated whether the size of the files in a shared layer exceeds the layer threshold; if so, the shared layer is created; otherwise, creation of the shared layer is cancelled and its files are assigned to a unique layer;
it is calculated whether the total size of all shared layers newly added by the image reconstruction exceeds the trade-off threshold; if so, the image reconstruction is executed; otherwise, the method returns to step S4.
2. The method for removing redundancy from container images through image layer reconstruction according to claim 1, wherein the process of building the merged view from the image file metadata comprises:
merging files in the lower layer whose paths and file names differ from those in the upper layer into the upper layer;
overriding files in the lower layer with the files in the upper layer that have the same paths and file names;
and deleting the files in the lower layer that are marked as hidden.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310702283.3A CN116841972A (en) | 2023-06-14 | 2023-06-14 | Container mirror image redundancy removing method through mirror image layer reconstruction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310702283.3A CN116841972A (en) | 2023-06-14 | 2023-06-14 | Container mirror image redundancy removing method through mirror image layer reconstruction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116841972A true CN116841972A (en) | 2023-10-03 |
Family
ID=88166202
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310702283.3A Pending CN116841972A (en) | 2023-06-14 | 2023-06-14 | Container mirror image redundancy removing method through mirror image layer reconstruction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116841972A (en) |
-
2023
- 2023-06-14 CN CN202310702283.3A patent/CN116841972A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||