CN107809467B - Method for deleting container mirror image data in cloud environment - Google Patents

Publication number
CN107809467B
Authority
CN
China
Prior art keywords
image
file
container
local
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710934727.0A
Other languages
Chinese (zh)
Other versions
CN107809467A (en
Inventor
邓玉辉
周毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Moyi Information Technology Co ltd
Original Assignee
Jinan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan University
Priority to CN201710934727.0A
Publication of CN107809467A
Application granted
Publication of CN107809467B
Current legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/162 Delete operations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/568 Storing data temporarily at an intermediate stage, e.g. caching
    • H04L67/5682 Policies or rules for updating, deleting or replacing the stored data

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for pruning container image data in a cloud environment. It addresses a problem that arises in container practice: images occupy too much disk space, so publishing them incurs heavy disk and network I/O overhead, raises deployment cost, and limits how flexibly images can be used. The method applies to two scenarios: local storage and image export. For local storage, it reduces the disk overhead of storing images by increasing the reuse rate of a local base image. For image export, it dynamically collects the files accessed while a container runs, builds a file export model from them, and constructs the exported image on demand, which reduces the size of the exported image while preserving its functional completeness.

Description

A method for pruning container image data in a cloud environment

Technical field

The invention relates to the technical field of cloud computing containers, specifically to a method for pruning container image data in a cloud environment, and more specifically to an image size optimization method applied when Docker container images are stored locally and exported.

Background

Container technology is a runtime-environment isolation technology similar to a sandbox mechanism. Users can create and run an operating system inside a container, achieving operating-system-level virtualization. Compared with traditional virtual machines, container technology achieves lightweight isolation of running applications by sharing kernel resources. Docker is one implementation of container technology and is characterized by high portability and the integration of development and operations.

Today, as cloud computing and big data continue to grow in scale, enterprises have an increasing need for continuous integration and efficient release of their products. A traditional virtualization architecture built around virtual machines can isolate applications and services, but it must provide fully exclusive hardware resources, which makes the resource overhead of the whole architecture very large. Docker, as a lightweight virtualization approach, reduces resource and time overhead relative to virtual machines, making the packaging, release, and coordination of applications and services more flexible and faster.

A Docker image consists of a series of image layers. Each layer contains only the changes relative to the layer below it, forming a stack. When a container is created, a readable and writable container layer is created on top of the image layers. Operations on the container, such as creating or deleting files, are performed in this container layer and do not affect the read-only image layers. Multiple containers can share the data of the same image while each keeps its own state. Container image files are stored in layers, so identical parts of images can be reused, which saves disk space and reduces the cost of transferring images over the network, although it makes the container runtime more complex. In a Docker image, the main structure and content are packaged into separate layers so that identical layers can be shared to save storage space.

In practice, however, an image contains the entire runtime environment and is therefore often large. Publishing it incurs considerable disk and network I/O overhead and limits how flexibly the image can be used. This runs counter to the lean, convenient design intent of container technology and can even make the whole system harder to deploy. Pruning oversized, redundant image packages is therefore particularly important.

Summary of the invention

The main purpose of the invention is to overcome the above defects of the prior art by providing a method for pruning container image data in a cloud environment.

The purpose of the invention can be achieved by the following technical solution:

A method for pruning container image data in a cloud environment, characterized in that the pruning method is applicable to a local image storage mode and an image export mode, which comprise the following steps respectively:

Local image storage mode:

T1. Run the local image analyzer to check which images are stored locally. If no image is stored locally, perform step T2; if images are already stored locally, perform step T3.

T2. In this case, the newly imported image is only analyzed: its size, the size of its base image layer, and the SHA-256 digest of each of its layers are saved locally.

T3. If images are already stored locally, check how many there are. If there are more than 20, the local image analyzer examines the base images of all images, applies the base image layer sharing calculation method, selects the base image with the largest ratio as the shared base image layer, and saves the absolute path, size, and MD5 digest of every file it contains to form the base image file fingerprint library.

T4. Once the shared base image layer has been selected, the SHA-256 digest of the base image layer of every image added afterwards is computed and compared with the SHA-256 digest of the shared base image layer. If they match, the image is stored locally as-is, without modification; otherwise, perform step T5.

T5. The local storage module analyzes the newly added image, obtains the MD5 digest of each file it contains, compares it with the digests in the file fingerprint library, and removes all duplicates. A new image is then regenerated from the selected shared base image and the remaining files.
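The digest comparison in step T4 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `sha256_digest` and `can_store_unmodified` are hypothetical, and the byte strings stand in for real layer archives.

```python
import hashlib
import io


def sha256_digest(stream, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a binary stream, read in chunks."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def can_store_unmodified(base_layer_digest, shared_base_digest):
    """Step T4: matching base-layer digests mean the image is stored as-is."""
    return base_layer_digest == shared_base_digest


# Stand-in layer contents (assumption: real input would be layer archives).
shared = sha256_digest(io.BytesIO(b"base layer contents"))
same = sha256_digest(io.BytesIO(b"base layer contents"))
other = sha256_digest(io.BytesIO(b"a different base layer"))
```

An image whose base layer digest equals `shared` skips step T5; any other image goes on to file-level comparison.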

Image export mode:

R1. When an image needs to be exported, run the on-demand dynamic image export method and locate the file access information table of the image by the name of the image to be exported. If the table can be located, perform step R3; otherwise, perform step R2.

R2. Collect file access information inside the image: when the container is created, inject a file access probe that collects, in real time, the files the container accesses while running and records them in text form as a file access information table, which serves as the basis for the image export in step R3.

R3. Read the file access information table of the image to be exported, build the image export file prediction model, determine the files that the image depends on at run time, and export those files to build a new image.

Further, the local image analyzer collects information on all locally stored images, including the size of each image, the number of layers in each image, and how layers are shared between images, and calculates the disk overhead saved by sharing a base image layer.

Further, the base image layer sharing calculation method computes, for each base image stored locally, the ratio of the storage overhead saved by sharing that base image layer to the total size of all images, and selects the base image with the largest ratio as the shared base image layer for local storage.

Further, the file fingerprint library contains the absolute path, size, and MD5 digest of every file in the base image layer selected for sharing.
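A fingerprint library of this shape could be built roughly as below. This sketch assumes the base image layer has already been unpacked into a directory; the names `md5_of_file` and `build_fingerprint_library` and the dictionary layout are illustrative choices, not taken from the patent.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_fingerprint_library(layer_root):
    """Map each regular file's absolute path inside the layer to size and MD5."""
    library = {}
    for dirpath, _dirnames, filenames in os.walk(layer_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # assumption: only regular files are fingerprinted
            rel = os.path.relpath(full, layer_root).replace(os.sep, "/")
            library["/" + rel] = {
                "size": os.path.getsize(full),
                "md5": md5_of_file(full),
            }
    return library


# Tiny demonstration on a temporary directory standing in for a layer.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "etc"))
    with open(os.path.join(root, "etc", "hostname"), "wb") as f:
        f.write(b"demo\n")
    demo_library = build_fingerprint_library(root)
```

Keying the library by in-layer path keeps lookups cheap when a newly imported image is compared against it later.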

Further, the file access information table records, for each accessed file, its name, MD5 digest, size, type, absolute path, and the process that accessed it.

Further, the file access probe collects, in real time while the container runs, the processes being executed, the related configuration files, and the executables together with their dependent files; after the container finishes running, it writes the absolute path, size, name, and MD5 digest of each file accessed during the run into the file access information table.

Further, the export file prediction model is used to determine all files to be included in the exported image. From the file access information table it obtains the absolute path and access count of each file, and the number and types of the accessed files in each directory; from these it computes the image's dependence on each file and derives an export probability for each directory.

Compared with the prior art, the invention has the following advantages and effects:

(1) The invention effectively reduces the disk space images occupy when stored locally and also greatly reduces the size of exported images, which makes images easier to publish and port.

(2) While reducing image size, the invention builds an export model so that the pruned image remains reliable and maintainable, and so that the image can still be re-edited after it has been packaged and published.

(3) The invention supports all existing Docker file drivers: a local image is first containerized, then pruned, then exported, so the method does not depend on any specific file driver.

(4) The invention obtains its data by communicating with Docker over the network, and its operations on images are separate from the original code, which is not modified; this preserves the stability of the original system.

Description of the drawings

Fig. 1 is a schematic structural diagram of a system to which the invention applies;

Fig. 2 is a flow chart of the method for pruning container image data in a cloud environment disclosed by the invention;

Fig. 3 is a schematic diagram of the effect of base image sharing in local storage mode;

Fig. 4 is a data flow diagram of the file access probe.

Detailed description

To make the purposes, technical solutions, and advantages of the embodiments clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.

Embodiment

Fig. 1 is a schematic structural diagram of a system for the method for pruning container image data in a cloud environment, applied to optimizing the size of container images on a single machine:

In this environment, local image data pruning covers both locally stored images and exported images. In the local environment, the aim is to increase the reuse rate of image layers shared between images, so that more upper-layer images reuse the same base image and the overall local storage overhead of images decreases. In the image export environment, the aim is to use the collected file access information so that the pruned image contains only the files its function depends on, which reduces the size of the exported image.

To explain the application scenarios more clearly, a detailed analysis follows with reference to the system flow chart (Fig. 2), the schematic diagram of base image sharing in local storage mode (Fig. 3), and the data flow diagram of the file access probe (Fig. 4).

As shown in Fig. 2, the application scenarios of the method for pruning container image data in a cloud environment comprise a local storage mode and an image export mode.

The image pruning method in local storage mode comprises the following steps:

T1. When the program runs, check which images are stored locally. If no image is stored locally, perform step T2; if images are already stored locally, perform step T3;

T2. In this case, the newly imported image is only analyzed: its size, the size of its base image layer, and the SHA-256 digest of each of its layers are saved locally;

T3. If images are already stored locally, check how many there are. If there are more than 20, examine the base images of all images, calculate the proportion of the total image size accounted for by each shared base image, select the one with the largest proportion as the shared base image layer, and save the names and MD5 digests of the files it contains to form the base image file fingerprint library;

T4. Once the shared base image layer has been selected, the SHA-256 digest of the base image layer of every image added afterwards is computed and compared with the SHA-256 digest of the shared base image layer. If they match, the image is stored locally as-is, without modification; otherwise, perform step T5;

T5. The local storage module analyzes the newly added image, obtains the MD5 digest of each file it contains, compares it with the digests in the file fingerprint library, and removes all duplicates;

T6. Regenerate a new image from the selected shared base image and the files remaining after step T5. The base image part of this image then occupies no additional local disk space, which reduces local storage overhead.
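The duplicate removal in step T5 can be illustrated with a small sketch. It assumes that a file counts as a duplicate when both its path and its MD5 digest match an entry in the fingerprint library; the patent text only says digests are compared, so the path check is an assumption, as are the function name `strip_duplicates` and the sample data.

```python
def strip_duplicates(new_image_files, fingerprint_library):
    """Return the files of a new image that the shared base does not provide.

    new_image_files: dict mapping path -> MD5 hex digest.
    fingerprint_library: dict mapping path -> {"md5": ..., "size": ...}.
    """
    remainder = {}
    for path, md5 in new_image_files.items():
        known = fingerprint_library.get(path)
        if known is not None and known["md5"] == md5:
            continue  # already provided by the shared base image layer
        remainder[path] = md5
    return remainder


# Hypothetical sample data; the digests are shortened placeholders.
library = {
    "/bin/sh": {"md5": "aaa", "size": 100},
    "/etc/os-release": {"md5": "bbb", "size": 20},
}
new_image = {
    "/bin/sh": "aaa",            # identical to the shared base: dropped
    "/etc/os-release": "ccc",    # same path, different content: kept
    "/opt/app/run.sh": "ddd",    # not in the shared base: kept
}
remaining = strip_duplicates(new_image, library)
```

The files in `remaining` are what step T6 would layer on top of the shared base image.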

The image pruning method in image export mode comprises the following steps:

R1. When an image needs to be exported, run the on-demand dynamic image export method and locate the file access information table of the image by the name of the image to be exported. If the table can be located, perform step R3; otherwise, perform step R2;

R2. Collect file access information inside the image: when the container is created, inject a file access probe that collects, in real time, the files the container accesses while running and records them in text form as a file access information table, which serves as the basis for the image export in step R3;

R3. Read the file access information table of the image to be exported, build the image export file prediction model, determine the files that the image depends on at run time, and export those files to build a new image.
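The branch between steps R2 and R3 might look like the following sketch. The patent does not say how tables are named or where they are kept, so the one-JSON-file-per-image naming scheme, the directory argument, and the function names are all hypothetical.

```python
import json
import os
import tempfile


def locate_access_table(image_name, table_dir):
    """Step R1: find the access table for an image, or return None.

    Assumption: one JSON file per image, named after the image with
    '/' and ':' replaced by '_'.
    """
    safe = image_name.replace("/", "_").replace(":", "_")
    path = os.path.join(table_dir, safe + ".json")
    return path if os.path.isfile(path) else None


def plan_export(image_name, table_dir):
    """Return ("R3", table) if a table exists, else ("R2", None)."""
    path = locate_access_table(image_name, table_dir)
    if path is None:
        return ("R2", None)  # no table yet: run the probe first
    with open(path, encoding="utf-8") as f:
        return ("R3", json.load(f))


# Demonstration with a temporary directory holding one table.
with tempfile.TemporaryDirectory() as table_dir:
    table_path = os.path.join(table_dir, "demo_app_1.0.json")
    with open(table_path, "w", encoding="utf-8") as f:
        json.dump({"app.conf": {"path": "/etc/app.conf"}}, f)
    hit = plan_export("demo/app:1.0", table_dir)
    miss = plan_export("demo/other:2.0", table_dir)
```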

The local image analyzer is a process that runs on the local machine and communicates with Docker. Its main role is to obtain the size of each image currently stored on the machine, the number of layers in each image, and how layers are shared between images. In local storage mode, the layer information obtained by the local image analyzer is fed into the base image layer sharing calculation method to achieve sharing of the base image layer.
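As a rough illustration of the information the analyzer gathers, the sketch below works on dictionaries shaped like the output of `docker image inspect` (fields `Id`, `Size`, and `RootFS.Layers`). How the analyzer actually talks to Docker is not described in the text, so the sample records are hand-written placeholders and the function names are hypothetical.

```python
from collections import Counter


def layer_sharing(inspected_images):
    """Count how many images reference each layer digest."""
    counts = Counter()
    for image in inspected_images:
        counts.update(set(image["RootFS"]["Layers"]))
    return counts


def summarize(inspected_images):
    """Per image: size, number of layers, and number of layers shared."""
    counts = layer_sharing(inspected_images)
    summary = []
    for image in inspected_images:
        layers = image["RootFS"]["Layers"]
        summary.append({
            "id": image["Id"],
            "size": image["Size"],
            "layers": len(layers),
            "shared_layers": sum(1 for d in set(layers) if counts[d] > 1),
        })
    return summary


# Placeholder inspect records (digests shortened for readability).
images = [
    {"Id": "img-a", "Size": 300, "RootFS": {"Layers": ["sha256:base", "sha256:a1"]}},
    {"Id": "img-b", "Size": 250, "RootFS": {"Layers": ["sha256:base", "sha256:b1"]}},
    {"Id": "img-c", "Size": 80, "RootFS": {"Layers": ["sha256:other"]}},
]
sharing = layer_sharing(images)
report = summarize(images)
```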

Base image layer sharing calculation method: the invention lowers the total overhead of image storage by increasing the utilization of the base image. To describe the problem, define the virtual size of an image (ignoring the space saved by layer sharing) as V. The virtual storage overhead of the n local images is then

$$\sum_{i=1}^{n} V_i$$

and the space reduction rate obtained by sharing a base image layer is defined by Equation 1:

[Equation 1: Figure GDA0002442628700000072]

where S is the storage size of the shared layer and L is the number of times the layer is shared. The larger η is, the more sharing that layer improves global storage overhead. Equation 1 also shows that the larger the shared base image, the larger η. In general, to keep the base image general-purpose, choosing an image that occupies a relatively large amount of space is also acceptable for local storage. The locally shared base image can therefore be selected according to the value of η.
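Equation 1 survives only as an image reference in this text, so the sketch below rests on an assumed form, η = S·(L − 1) / ΣVᵢ, meaning the space saved by storing a layer once instead of L times, divided by the total virtual size. That form is a guess consistent with the definitions of S, L, and V above and should not be read as the patent's exact formula. The function names and sample numbers are hypothetical.

```python
def space_reduction_rate(shared_size, share_count, virtual_sizes):
    """Assumed form of Equation 1: eta = S * (L - 1) / sum(V_i)."""
    total = sum(virtual_sizes)
    if total <= 0:
        raise ValueError("total virtual size must be positive")
    return shared_size * (share_count - 1) / total


def pick_shared_base(candidates, virtual_sizes):
    """Choose the candidate base layer with the largest eta."""
    return max(
        candidates,
        key=lambda c: space_reduction_rate(c["size"], c["shared_by"], virtual_sizes),
    )


# Hypothetical numbers: three images with virtual sizes in MB.
virtual_sizes = [200, 200, 100]
candidates = [
    {"name": "base-a", "size": 100, "shared_by": 3},
    {"name": "base-b", "size": 150, "shared_by": 2},
]
eta_a = space_reduction_rate(100, 3, virtual_sizes)
eta_b = space_reduction_rate(150, 2, virtual_sizes)
chosen = pick_shared_base(candidates, virtual_sizes)
```

Under this assumed form a smaller layer shared by more images can beat a larger layer shared by fewer, which is why the selection compares η rather than raw layer size.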

Fig. 3 shows the base image sharing effect achieved in local storage mode. Each circle in the figure is one layer of an image, and the span from the leftmost circle to the shaded circle is the part of the image that provides complete functionality. In fact, layer sizes within an image differ greatly; on average, the largest layer of an image accounts for 67% of the image's total size. The figure shows that having images share the same image layer raises the sharing ratio of the local base image layer, so that multiple images use one base image layer and the total disk space images occupy locally is reduced.

The file fingerprint library contains the absolute path, size, and MD5 digest of every file in the base image layer selected for sharing. Its purpose is to remove from a newly imported image the parts that duplicate the shared base image layer, reducing the disk space used for local storage.

The file access probe is a standalone executable. A container is started from the image to be modified and the probe is run inside that container, where it captures in real time the processes executed and the files accessed. When the container finishes running, the probe summarizes the files accessed during the run and writes them in JSON format to the file access information table, which is saved in a shared data volume. The probe is implemented as four concurrent processes and threads that communicate with one another: User, Sensor, Monitor, and Collector. User is the process that sends start and stop commands to the probe; it usually runs on the host and communicates with Sensor over a Unix socket. Sensor is the probe process that captures file access information inside the container; its main job is to receive the reports (report) produced by Monitor and forward them to the User process. The Monitor thread aggregates the events (even) collected by the Collector thread and organizes them into reports that it submits to Sensor.

Fig. 4 is the data flow diagram of the file access probe. The system has three data flows. The stop flow is issued by User once the probe has finished collecting information, to stop the probe; when Sensor receives the stop signal it passes it to Monitor, which performs the cleanup work (cleanup). The report flow is issued by Monitor: Monitor runs the event processing function (ProcessEven) to convert the raw access information collected from Collector into a report structure and returns it to Sensor, which sends it to User. The even flow is issued by Collector: while the container runs, Collector executes the event capture function (GetEven) to collect raw file access information in real time.
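The Collector, Monitor, and Sensor roles can be mimicked in a single process with threads and queues, as in the sketch below. This is a simplified stand-in: a pre-recorded list of accesses replaces real event capture, a `None` sentinel replaces the stop flow from User, and the in-process queues replace the Unix socket. All function names other than the role names are hypothetical.

```python
import queue
import threading


def collector(raw_accesses, events):
    """Stand-in for GetEven: emit one event per observed file access."""
    for pid, path in raw_accesses:
        events.put({"pid": pid, "path": path})
    events.put(None)  # sentinel standing in for the stop signal


def monitor(events, reports):
    """Stand-in for ProcessEven: fold raw events into one report."""
    report = {}
    while True:
        event = events.get()
        if event is None:
            break
        entry = report.setdefault(event["path"], {"count": 0, "pids": set()})
        entry["count"] += 1
        entry["pids"].add(event["pid"])
    reports.put(report)


def sensor(reports, to_user):
    """Forward the finished report from Monitor to User."""
    to_user.put(reports.get())


def run_probe(raw_accesses):
    """Wire the three roles together and return what User receives."""
    events, reports, to_user = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=collector, args=(raw_accesses, events)),
        threading.Thread(target=monitor, args=(events, reports)),
        threading.Thread(target=sensor, args=(reports, to_user)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return to_user.get()


probe_report = run_probe([(1, "/bin/sh"), (1, "/etc/passwd"), (2, "/bin/sh")])
```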

The file access information table is the file access information generated by the probe in the container and recorded in JSON format. Each record uses the name of the accessed file as its primary key; the attributes are the file's absolute path, size, MD5 digest, and type, and the ID of the process that accessed it. If the file is a symbolic link, the probe attempts to dereference it and records the actual path of the file.
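One record of such a table could be produced as sketched below. The JSON key names (`path`, `size`, `md5`, `type`, `pid`) and the function names are illustrative assumptions; the text lists the attributes but not their field names. The sketch reads each file fully into memory, which is acceptable only for a small demonstration.

```python
import hashlib
import json
import os
import stat
import tempfile


def file_type(mode):
    """Classify a file by its lstat mode bits."""
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular"
    return "other"


def access_record(path, pid):
    """Build one table entry keyed by file name; symlinks are dereferenced."""
    kind = file_type(os.lstat(path).st_mode)
    real = os.path.realpath(path) if kind == "symlink" else os.path.abspath(path)
    record = {"path": real, "type": kind, "pid": pid, "size": None, "md5": None}
    if os.path.isfile(real):
        with open(real, "rb") as f:
            data = f.read()
        record["size"] = len(data)
        record["md5"] = hashlib.md5(data).hexdigest()
    return {os.path.basename(path): record}


# Demonstration on a temporary file standing in for a file in the container.
with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "app.conf")
    with open(target, "wb") as f:
        f.write(b"k = v\n")
    demo_table = access_record(target, pid=42)
    demo_json = json.dumps(demo_table, indent=2)
```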

Export file prediction model: the invention uses a total probability model to decide whether a directory is exported in its entirety. The rationale is that, under a well-structured design, the files in a directory together implement some function. When many files in a directory are exported, the files in that directory are likely central to the image's function, so all files in that directory should be exported as well. Specifically, let X be the set of dependency files required by the image to be exported, with size n. Assume that each file X_i (i ∈ [1, n]) has an equal influence, 1/n, on whether its parent directory is exported. Any directory in the file system contains zero or more files and subdirectories. The export probability of any directory Y can therefore be described as:

[Formula (2): Figure GDA0002442628700000091]

where m is the total number of files and subdirectories under directory Y, and [Figure GDA0002442628700000092] is the export probability of the i-th file or directory [Figure GDA0002442628700000093]. Since the files in set X are certain to be exported, for any exported file [Figure GDA0002442628700000094], formula (2) can be further written as:

[Figure GDA0002442628700000095]

where k is the number of files under directory Y and l is the number of subdirectories. Given a threshold ε with 0 ≤ ε ≤ 1, if P(Y) > ε, all files under directory Y are exported.
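Because the formulas survive only as image references here, the sketch below uses one plausible reading rather than the patent's exact expression: an accessed file has export probability 1, an unaccessed file 0, and a directory's probability is the mean over its m entries (files and subdirectories). Directories whose probability exceeds ε are exported whole. The tree encoding, function names, and sample data are all hypothetical.

```python
def export_probability(tree, accessed, prefix=""):
    """Mean export probability over a directory's entries (assumed model).

    tree maps entry name -> nested dict (subdirectory) or None (file).
    """
    if not tree:
        return 0.0
    total = 0.0
    for name, child in tree.items():
        path = prefix + "/" + name
        if child is None:
            total += 1.0 if path in accessed else 0.0
        else:
            total += export_probability(child, accessed, path)
    return total / len(tree)


def directories_to_export(tree, accessed, epsilon, prefix=""):
    """Return the topmost directories whose probability exceeds epsilon."""
    selected = []
    for name, child in tree.items():
        if child is None:
            continue
        path = prefix + "/" + name
        if export_probability(child, accessed, path) > epsilon:
            selected.append(path)  # export everything under this directory
        else:
            selected.extend(directories_to_export(child, accessed, epsilon, path))
    return selected


# Hypothetical file tree and access set.
tree = {
    "usr": {
        "lib": {"a.so": None, "b.so": None, "c.so": None},
        "share": {"doc.txt": None},
    },
    "etc": {"app.conf": None, "other.conf": None},
}
accessed = {"/usr/lib/a.so", "/usr/lib/b.so", "/etc/app.conf"}
whole_dirs = directories_to_export(tree, accessed, epsilon=0.6)
```

Accessed files that fall outside the selected directories, such as `/etc/app.conf` here, would still be exported individually because they belong to the dependency set X.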

The above embodiment is a preferred embodiment of the invention, but the embodiments of the invention are not limited to it. Any other change, modification, substitution, combination, or simplification that does not depart from the spirit and principle of the invention is an equivalent replacement and falls within the scope of protection of the invention.

Claims (7)

1. A method for pruning container image data in a cloud environment, characterized in that the method comprises a local image storage mode and an image export mode, wherein

the local image storage mode comprises the following steps:

T1. Run the local image analyzer to survey the images stored locally; if no image is stored locally, execute step T2; if images are already stored locally, execute step T3;

T2. In this case, only analyze the newly imported image, and save locally the size of the newly imported image, the size of its base image layer, and the SHA-256 digest of each layer of the image;

T3. If images are already stored locally, check the number of locally stored images; if there are more than 20, the local image analyzer examines the base images of all images, selects, by the base image layer sharing calculation method, the base image layer with the largest ratio as the shared base image layer, and saves the absolute path, size, and MD5 digest of each file it contains to form the base image file fingerprint library;

T4. Once the shared base image layer is selected, for every image added afterwards, first analyze the SHA-256 digest of its base image layer and compare it with the SHA-256 digest of the shared base image layer; if they match, the image is stored locally without modification; if they do not match, execute step T5;

T5. The local storage module analyzes the newly added image, obtains the MD5 digest of each file it contains, compares these digests with those in the file fingerprint library, removes all duplicated parts, and regenerates a new image from the selected shared base image and the remaining part;

the image export mode comprises the following steps:

R1. When an image needs to be exported, execute the on-demand dynamic image export method: locate the file access information table of the image by the name of the image to be exported; if the table can be located, execute step R3; otherwise, execute step R2;

R2. Collect file access information within the image: when the container is created, inject a file access probe that collects, in real time, the files accessed by the container while it runs and records them as text to produce the file access information table, which provides the basis for image export in step R3;

R3. Read the file access information table of the image to be exported, build the export file prediction model, thereby obtain the files the image depends on at run time, and export these files to build a new image.

2. The method for pruning container image data in a cloud environment according to claim 1, characterized in that the local image analyzer collects information on all locally stored images, including the size of each image, the number of layers in each image, and how layers are shared between images, and calculates the disk overhead saved by sharing the base image layer.

3. The method for pruning container image data in a cloud environment according to claim 1, characterized in that the base image layer sharing calculation method calculates, for each base image stored locally, the ratio of the storage overhead saved by sharing that base image layer to the total image size, and selects the base image with the largest ratio as the shared base image layer used for local storage.

4. The method for pruning container image data in a cloud environment according to claim 1, characterized in that the file fingerprint library contains the absolute path, size, and MD5 digest of every file in the base image layer selected for sharing.

5. The method for pruning container image data in a cloud environment according to claim 1, characterized in that the file access information table records, for each accessed file, its name, MD5 digest, size, type, absolute path, and the process that accessed it.

6. The method for pruning container image data in a cloud environment according to claim 1, characterized in that the file access probe collects, in real time while the container runs, the executed processes, the related configuration files, and the executables of the dependency files, and, after the container finishes running, writes the absolute path, size, name, and MD5 digest of each file accessed during the run into the file access information table.

7. The method for pruning container image data in a cloud environment according to claim 1, characterized in that the export file prediction model is used to obtain all files to be included in the exported image: from the file access information table it obtains the absolute path and access count of each file and the number and types of accessed files in each directory, calculates the dependence of the image on each file, and derives the export probability of the directory.
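As a rough illustration of steps T3 to T5 and claims 3 and 4, the Python sketch below picks a shared base layer and strips duplicate files from a new image. It is a minimal sketch under stated assumptions: the claims do not spell out how the saved storage is computed, so the code assumes that every image after the first one using a base layer saves that layer's full size. The dictionary keys, function names, and in-memory file maps are hypothetical.

```python
import hashlib
from collections import Counter


def pick_shared_base_layer(images):
    """Pick the base layer whose sharing saves the largest share of storage.

    images: list of dicts with the hypothetical keys 'base_digest'
    (SHA-256 digest of the base layer), 'base_size', and 'size'.
    """
    total = sum(img["size"] for img in images)
    if total == 0:
        raise ValueError("no image data to analyze")
    users = Counter(img["base_digest"] for img in images)
    sizes = {img["base_digest"]: img["base_size"] for img in images}
    # Assumption: each image after the first reuses the stored base layer,
    # so the saving is base_size * (number of users - 1).
    ratios = {d: sizes[d] * (users[d] - 1) / total for d in users}
    return max(ratios, key=ratios.get), ratios


def md5_of(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def build_fingerprint_library(base_files):
    """Map absolute path -> (size, MD5 digest) for files of the shared base layer."""
    return {path: (len(data), md5_of(data)) for path, data in base_files.items()}


def strip_duplicates(new_files, library):
    """Keep only files of a new image whose MD5 digest is not in the library."""
    known = {digest for _, digest in library.values()}
    return {p: data for p, data in new_files.items() if md5_of(data) not in known}


if __name__ == "__main__":
    images = [
        {"base_digest": "sha256:aaa", "base_size": 100, "size": 300},
        {"base_digest": "sha256:aaa", "base_size": 100, "size": 250},
        {"base_digest": "sha256:bbb", "base_size": 200, "size": 450},
    ]
    print(pick_shared_base_layer(images)[0])

    library = build_fingerprint_library({"/bin/sh": b"shell", "/etc/os-release": b"ID=demo"})
    remaining = strip_duplicates({"/bin/sh": b"shell", "/app/main.py": b"print('hi')"}, library)
    print(sorted(remaining))
```

Comparing by MD5 digest alone follows the wording of step T5; a stricter variant could also require the path and size recorded in the fingerprint library to match.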
CN201710934727.0A 2017-10-10 2017-10-10 Method for deleting container mirror image data in cloud environment Active CN107809467B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710934727.0A CN107809467B (en) 2017-10-10 2017-10-10 Method for deleting container mirror image data in cloud environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710934727.0A CN107809467B (en) 2017-10-10 2017-10-10 Method for deleting container mirror image data in cloud environment

Publications (2)

Publication Number Publication Date
CN107809467A CN107809467A (en) 2018-03-16
CN107809467B true CN107809467B (en) 2020-06-16

Family

ID=61584851

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710934727.0A Active CN107809467B (en) 2017-10-10 2017-10-10 Method for deleting container mirror image data in cloud environment

Country Status (1)

Country Link
CN (1) CN107809467B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109144958B (en) * 2018-07-02 2021-08-03 广东睿江云计算股份有限公司 A method and device for collecting metadata of file access frequency in a joint file system
CN110912955B (en) * 2018-09-17 2022-04-05 阿里巴巴集团控股有限公司 Container mirror image downloading and uploading method and device
CN109639791A (en) * 2018-12-06 2019-04-16 广东石油化工学院 Cloud workflow schedule method and system under a kind of container environment
CN112084165A (en) * 2019-06-12 2020-12-15 阿里巴巴集团控股有限公司 Method, apparatus, electronic device and readable storage medium for managing mirror warehouse
US11182193B2 (en) * 2019-07-02 2021-11-23 International Business Machines Corporation Optimizing image reconstruction for container registries
CN112306621B (en) * 2019-07-24 2025-04-18 中兴通讯股份有限公司 Container layered deployment method and system
CN113495870B (en) * 2020-04-01 2025-03-18 北京沃东天骏信息技术有限公司 Image building method, device, electronic device and storage medium
CN113176886B (en) * 2021-04-29 2025-02-28 中国工商银行股份有限公司 A method and device for compressing and operating an image file
CN114138414B (en) * 2021-12-02 2023-08-15 国汽大有时空科技(安庆)有限公司 Incremental compression method and system for container mirror image
CN116932465B (en) * 2023-09-15 2024-01-23 山东未来网络研究院(紫金山实验室工业互联网创新应用基地) Mirror image file management method, system, equipment and medium
CN117389690B (en) * 2023-12-08 2024-03-15 中电云计算技术有限公司 Mirror image package construction method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102981929A (en) * 2012-11-05 2013-03-20 曙光云计算技术有限公司 Management method and system for disk mirror images
CN106227579A (en) * 2016-07-12 2016-12-14 深圳市中润四方信息技术有限公司 A kind of Docker container construction method and Docker manage control station
CN106445515A (en) * 2016-09-18 2017-02-22 深圳市华云中盛科技有限公司 PaaS cloud implementation method based on containers
CN106790483A (en) * 2016-12-13 2017-05-31 武汉邮电科学研究院 Hadoop group systems and fast construction method based on container technique
CN107105054A (en) * 2017-05-17 2017-08-29 郑州云海信息技术有限公司 A kind of mirror image garbage-cleaning system and method towards docker mirror images warehouse

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102294568B1 (en) * 2015-08-19 2021-08-26 삼성에스디에스 주식회사 Method and apparatus for security checking of image for container

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Multi-Granularity Memory Mirroring via Binary Translation in Cloud Environments.;Zhengwei Qi et al.;《IEEE Transactions on Network and Service Management》;20140425;第11卷(第1期);全文 *
Performance analysis of Union and CoW File Systems with Docker.;Rajdeep Dua et al.;《2016 International Conference on Computing, Analytics and Security Trends (CAST)》;20161221;全文 *
一种概率模型的Docker镜像删减策略.;周毅 等.;《小型微型计算机系统》;20180915;第39卷(第09期);1908-1913页 *

Also Published As

Publication number Publication date
CN107809467A (en) 2018-03-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20250311

Address after: Room 1901-1914, No. 25 Huizhi 3rd Road, Nancun Town, Panyu District, Guangzhou City, Guangdong Province 511495

Patentee after: GUANGZHOU MOYI INFORMATION TECHNOLOGY CO.,LTD.

Country or region after: China

Address before: 510632 No. 601, Whampoa Avenue, Tianhe District, Guangdong, Guangzhou

Patentee before: Jinan University

Country or region before: China