CN113239353A

CN113239353A - Content difference-based container software security detection system and method

Info

Publication number: CN113239353A
Application number: CN202110407750.0A
Authority: CN
Inventors: 陈力波; 夏懿航; 赵瑞杰; 王轶骏; 薛质; 姜开达
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2021-04-15
Filing date: 2021-04-15
Publication date: 2021-08-10
Anticipated expiration: 2041-04-15
Also published as: CN113239353B

Abstract

The invention provides a content difference-based container software security detection system and method. The system comprises an identification classification node, a data acquisition node and a security analysis node, and the three nodes work cooperatively by exchanging data. The identification classification node handles interaction with the user, identifies the base image of every input image to be detected, classifies the images and sends data acquisition tasks. The data acquisition node downloads the corresponding container images according to the issued data acquisition task and extracts the corresponding container image layer data from the downloaded images. The security analysis node identifies the non-base image layer data in the extracted container image layer data and performs security analysis on it. The invention maintains detection effectiveness without actually running the images, greatly saves computing and storage resources, and allows full-scale security detection of the massive number of images in a public repository within a short time using limited resources.

Description

Content difference-based container software security detection system and method

Technical Field

The invention relates to the technical field of container software security detection, in particular to a system and a method for detecting the security of container software based on content difference.

Background

Container software applications based on virtualization technology are increasingly common. Containers, represented mainly by Docker, carry the common enterprise-level virtualized applications of various cloud centers. Docker Hub is the official image repository of Docker and holds about 18 million application images maintained jointly by community users; users who want to use and deploy these applications download the images from the repository. The vulnerability of such container images has become a focus of attack and defense. In particular, shortcomings in security configuration expose serious security holes such as privacy disclosure and unauthorized access, and there are even malicious images intentionally planted by third parties, which affects the security of the whole enterprise software supply chain. How to rapidly evaluate the security of the massive, continuously updated set of images in the repository is therefore a difficult problem. Existing detection methods usually need to run the image, inspect the system environment, software library versions and software behavior inside the running container, and then screen out malicious or unsafe images by matching against a detection model.

Chinese patent publication No. CN108958890A discloses a container image detection method, device and electronic equipment. The method includes: statically scanning the container image to be detected to acquire the software features to be matched from its software feature set; comparing the software features to be matched with the software vulnerability features stored in a preset software vulnerability database; if a matching software vulnerability feature exists in the database, determining the test case set corresponding to the matched feature; executing the test cases in that set against the software corresponding to the features to be matched to detect whether the software has vulnerabilities; and, when a vulnerability exists, determining that the container image to be detected is abnormal. That patent thus detects a container image by checking whether the software corresponding to the matched software features has vulnerabilities.

In view of the above prior art, the inventors consider that the general problem of such methods is that they consume substantial storage and bandwidth to download a large number of images to a local test environment, and substantial computing resources and time to perform detection. They are therefore difficult to apply to large-scale container software security detection.

Disclosure of Invention

To address the defects in the prior art, the invention aims to provide a content difference-based container software security detection system and method.

The content difference-based container software security detection system comprises an identification classification node, a data acquisition node and a security analysis node, wherein the three nodes work cooperatively by exchanging data;

the identification classification node: handles interaction with the user, base image identification for all input images to be detected, classification of the images, and sending of data acquisition tasks;

the data acquisition node: downloads the corresponding container images according to the issued data acquisition task, and extracts the corresponding container image layer data from the downloaded container images;

the security analysis node: identifies the non-base image layer data in the extracted container image layer data, and performs security analysis on the non-base image layer data.

Preferably, the identification classification node comprises a user interaction module, a base image identification module and an image classification module;

the user interaction module: receives user input to the system and acquires the list of images to be detected;

the base image identification module: determines the base image on which each image in the list depends, to obtain an identification result;

the image classification module: receives the identification result of the base image identification module, merges images whose base images have similar characteristics into one class, and sends a data acquisition task.

Preferably, the data acquisition node comprises a communication scheduling module, an image downloading module and a data extraction module;

the communication scheduling module: receives the data acquisition task issued by the image classification module;

the image downloading module: according to the data acquisition task, filters out duplicate image layers that have already been downloaded, and requests the image repository to download the corresponding container images;

the data extraction module: statically extracts the container image layer data of the downloaded container images;

the communication scheduling module also sends the extracted container image layer data to the security analysis node.

Preferably, the security analysis node comprises a communication module and an analysis module;

the communication module: acquires the extracted container image layer data;

the analysis module: divides the container image layer data into base image layer data and non-base image layer data, determines the final content of every file in the non-base image layer data and analyzes it, compares the non-base image layer data with the base image layer data to find the differences, and analyzes the difference data in combination with the security of the base image.

Preferably, the data acquisition task and the container image layer data are both transmitted through a distributed task scheduling technology: the data acquisition task is transmitted from the identification classification node to the data acquisition node, and the container image layer data is transmitted from the data acquisition node to the security analysis node.

The invention provides a content difference-based container software security detection method, which comprises the following steps:

step 1: handling interaction with the user, base image identification for all input images to be detected, classification of the images, and sending of data acquisition tasks;

step 2: downloading the corresponding container images according to the issued data acquisition task, and extracting the corresponding container image layer data from the downloaded container images;

step 3: identifying the non-base image layer data in the extracted container image layer data, and performing security analysis on the non-base image layer data.

Preferably, the step 1 comprises the following steps:

step 1.1: receiving user input to the system, and acquiring the list of images to be detected;

step 1.2: determining the base image on which each image in the list depends, to obtain an identification result;

step 1.3: receiving the identification result, merging images whose base images have similar characteristics into one class, and sending a data acquisition task.

Preferably, the step 2 comprises the following steps:

step 2.1: receiving the data acquisition task issued by the image classification module;

step 2.2: according to the data acquisition task, filtering out duplicate image layers that have already been downloaded, and requesting the image repository to download the corresponding container images;

step 2.3: statically extracting the container image layer data of the downloaded container images;

step 2.4: sending, by the communication scheduling module, the extracted container image layer data to the security analysis node.

Preferably, the step 3 comprises the following steps:

step 3.1: acquiring the extracted container image layer data;

step 3.2: dividing the container image layer data into base image layer data and non-base image layer data, determining the final content of every file in the non-base image layer data and analyzing it, comparing the non-base image layer data with the base image layer data to find the differences, and analyzing the difference data in combination with the security of the base image.

Compared with the prior art, the invention has the following beneficial effects:

1. the invention clarifies the internal format of mainstream container images such as Docker images through reverse analysis, and can locate the content that differs between the file systems of different images; it further establishes a file-system-based security detection rule base and uses static analysis to perform large-scale online detection of mainstream images such as Docker images; finally, a prototype system has been developed and evaluated in practice, and it can detect the image resources of the Docker Hub repository online in real time. The results show that the invention maintains detection effectiveness without actually running the images, greatly saves computing and storage resources, and allows full-scale security detection of the massive number of images in a public repository within a short time using limited resources;

2. the invention can be used for fast static analysis of container image security. The identification classification node first determines the base image of each container image, places container images with similar base images into one group, and distributes each group to a data acquisition node for centralized downloading, which improves download efficiency and reduces the storage space required. After downloading, the data acquisition node statically extracts the container image layer data and submits it to the security analysis node. The security analysis node analyzes the layers identified as non-base image layers in reverse layer order; for a file that appears in several non-base image layers, only the content of its last occurrence is checked. Finally, the node checks whether files that carry a risk in the base image layers have been repaired in the non-base image layers;

3. the invention can be used for rapid security detection of downloaded container software images. Because it applies an original key technology based on content difference, it greatly reduces download traffic and concentrates detection on the file system content that differs from the base image, so it can be applied to rapid security detection and risk discovery for large-scale public container software repositories.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:

FIG. 1 is a block diagram of an embodiment of the present invention;

FIG. 2 is a flow chart of an embodiment of the present invention.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.

The embodiment of the invention discloses a content difference-based container software security detection system and method. As shown in fig. 1 and fig. 2, the system comprises an identification classification node, a data acquisition node and a security analysis node, and the three nodes exchange data over message channels provided by a distributed task scheduling technology so as to work cooperatively. The distributed task scheduling technology adopts the Gearman distributed task scheduling framework.
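The data flow between the three nodes can be pictured with a minimal in-process sketch. It uses Python's standard `queue.Queue` as a stand-in for the Gearman message channels (an assumption made so the example runs without a job server); the function names and the placeholder layer data are hypothetical and are not taken from the embodiment.

```python
import queue
from typing import Dict, List

# Stand-ins for the two message channels (Gearman in the embodiment).
acquisition_tasks: queue.Queue = queue.Queue()  # classification -> acquisition
layer_data: queue.Queue = queue.Queue()         # acquisition -> analysis


def classification_node(groups: List[List[str]]) -> None:
    """Issue one data acquisition task per group of similar images."""
    for group in groups:
        acquisition_tasks.put(group)


def acquisition_node() -> None:
    """Consume tasks and forward (placeholder) layer data for each image."""
    while not acquisition_tasks.empty():
        group = acquisition_tasks.get()
        extracted: Dict[str, List[str]] = {
            image: [f"{image}-layer"] for image in group  # placeholder data
        }
        layer_data.put(extracted)


def analysis_node() -> List[str]:
    """Consume layer data and return the names of the images received."""
    analysed: List[str] = []
    while not layer_data.empty():
        analysed.extend(layer_data.get().keys())
    return analysed
```

In a real deployment each function would run as a separate worker, possibly on a different host, and the queues would be replaced by jobs submitted to the task scheduler.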

Identification classification node: handles interaction with the user, base image identification for all input images to be detected, classification of the images, and sending of data acquisition tasks. The base image is the image on which a container image depended when it was created.

The identification classification node comprises a user interaction module, a base image identification module and an image classification module. User interaction module: receives user input to the system and acquires the list of images to be detected. Base image identification module: determines, according to a base image identification algorithm, the base image on which each image in the list depends, to obtain an identification result. The base image identification algorithm constructs a container image knowledge base that provides the complete container image relationships and expresses the dependencies between container images explicitly as a tree; the parent node of a container image in this relationship tree is its base image. Image classification module: receives the identification result of the base image identification module, merges images whose base images have similar characteristics into one class according to a classification algorithm, and sends a data acquisition task through the distributed task scheduling technology. The classification algorithm computes the similarity between the paths that lead from the root node of the container image relationship tree to each container image node, and treats the base images of two container images as similar when this similarity exceeds a set threshold.
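As a rough illustration of the relationship tree described above, the following sketch stores each image's parent in a dictionary, looks up the base image as the parent node, and reconstructs the path from the root. The dictionary layout, the function names and the image names are all hypothetical; the embodiment does not specify how the knowledge base is stored internally.

```python
from typing import Dict, List, Optional

# Hypothetical knowledge base: image -> parent (base) image.
# The image names are illustrative only.
PARENT_OF: Dict[str, Optional[str]] = {
    "scratch": None,
    "debian:buster": "scratch",
    "python:3.9": "debian:buster",
    "example/webapp:1.0": "python:3.9",
}


def base_image(image: str) -> Optional[str]:
    """Return the parent node of an image in the tree, i.e. its base image."""
    return PARENT_OF.get(image)


def path_from_root(image: str) -> List[str]:
    """Return the nodes on the path from the root down to the given image."""
    path: List[str] = []
    node: Optional[str] = image
    while node is not None:
        path.append(node)
        node = PARENT_OF.get(node)
    return list(reversed(path))
```

The root-to-node path returned by `path_from_root` is the kind of input the classification algorithm compares when it groups images.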

The identification classification node detects the base image of an image using the base image identification algorithm. Through the user interaction module of the identification classification node, the user stores the information of the container images to be detected in a relational database via a web interface, and the identification classification node continuously reads the container images to be detected from the database.

The identification classification node uses the classification algorithm to place images whose base images have similar characteristics into one group, and issues data acquisition tasks to the data acquisition nodes group by group. The classification algorithm represents the relationships between container images as a tree and stores the path from each node to the root node in a database. The Levenshtein algorithm is used to calculate the degree of similarity of the base images, a threshold x is set, and images whose similarity exceeds the threshold are placed in one group.
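The grouping step can be sketched as follows. The text names the Levenshtein algorithm and a threshold x but does not say how the distance is turned into a similarity or how groups are formed, so this sketch makes three assumptions: the distance is computed over sequences of tree nodes, it is normalised by the longer path length, and grouping is greedy against the first member of each group. All function names are hypothetical.

```python
from typing import List, Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two sequences (here: root-to-node paths)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Normalise the distance to [0, 1]; 1.0 means identical paths."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def group_paths(paths: List[List[str]], threshold: float) -> List[List[List[str]]]:
    """Greedy grouping: join the first group whose first member is similar enough."""
    groups: List[List[List[str]]] = []
    for path in paths:
        for group in groups:
            if similarity(group[0], path) > threshold:
                group.append(path)
                break
        else:
            groups.append([path])
    return groups
```

With a threshold of 0.5, two application images built on the same `python` base fall into one group, while an image built on an unrelated base starts a new group.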

Data acquisition node: downloads the corresponding container images according to the issued data acquisition task, and extracts the corresponding container image layer data from the downloaded container images.

The data acquisition node comprises a communication scheduling module, an image downloading module and a data extraction module. Communication scheduling module: receives the data acquisition task issued by the image classification module through the distributed task scheduling technology. Image downloading module: according to the data acquisition task, filters out duplicate image layers that have already been downloaded, and requests the image repository to download the corresponding container images. Data extraction module: statically extracts the container image layer data of each downloaded container image according to a data processing method. The communication scheduling module also sends the extracted container image layer data to the security analysis node.
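The layer filtering performed by the image downloading module might look like the sketch below. It assumes that each image in a task is described by an ordered list of layer digests and that the node keeps a set of digests it has already stored; the function name and data shapes are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def layers_to_fetch(manifests: Dict[str, List[str]], cached: Iterable[str]) -> List[str]:
    """Return the layer digests that still need downloading, in first-seen order.

    manifests maps an image name to its ordered layer digests;
    cached holds digests that are already stored locally.
    """
    seen: Set[str] = set(cached)
    pending: List[str] = []
    for layers in manifests.values():
        for digest in layers:
            if digest not in seen:
                seen.add(digest)  # a layer shared by several images is fetched once
                pending.append(digest)
    return pending
```

Because images in one group share most of their base layers, grouping similar images into the same task is what makes this filter effective.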

After receiving a data acquisition task, the data acquisition node downloads the container images centrally, statically extracts the container image layer data once downloading is complete, and distributes the extracted data to the security analysis node. Preferably, to ensure that container images are downloaded as reliably as possible, the cause of an error is checked when a download task fails: if the error was caused by a network failure, the download task is added to the download queue again; if the container image does not exist, access rights to it are lacking, or another such error occurred, the download task is discarded.
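The error handling described above can be sketched as a small queue loop. The `fetch` callback, the `DownloadError` enum and the `max_attempts` cap are assumptions added for illustration; the text itself does not state a retry limit.

```python
from collections import deque
from enum import Enum, auto
from typing import Callable, Deque, Iterable, List, Optional, Tuple


class DownloadError(Enum):
    NETWORK = auto()       # transient failure: requeue
    NOT_FOUND = auto()     # image does not exist: discard
    UNAUTHORIZED = auto()  # no access rights: discard


def run_downloads(
    images: Iterable[str],
    fetch: Callable[[str], Optional[DownloadError]],
    max_attempts: int = 3,  # assumed cap so a dead network cannot loop forever
) -> Tuple[List[str], List[str]]:
    """Download every image; return (downloaded, discarded)."""
    pending: Deque[Tuple[str, int]] = deque((image, 1) for image in images)
    done: List[str] = []
    dropped: List[str] = []
    while pending:
        image, attempt = pending.popleft()
        error = fetch(image)
        if error is None:
            done.append(image)
        elif error is DownloadError.NETWORK and attempt < max_attempts:
            pending.append((image, attempt + 1))  # back into the download queue
        else:
            dropped.append(image)
    return done, dropped
```

Here `fetch` returns `None` on success and a `DownloadError` otherwise, which keeps the retry policy separate from the actual registry client.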

After an image has been downloaded, the container image layer data is extracted by the following algorithm:

the data extraction method computes hash values for the container image layers in several passes and finally obtains the data from the directory that corresponds to the computed hash value:

let the hash values of the image layers whose data is to be extracted be layer(1) to layer(n);

obtain the mappings diff(1) to diff(n) of layer(1) to layer(n) in the container software through a mapping file kept by the container software;

calculate the chained hash values chain(1) to chain(n) of the image layers from diff(1) to diff(n), using $\mathrm{chain}(n) = \mathrm{hash}(\mathrm{chain}(n-1) \,\Vert\, \mathrm{diff}(n))$ with $\mathrm{chain}(1) = \mathrm{diff}(1)$, where $\Vert$ denotes string concatenation;

use chain(1) to chain(n) to locate the corresponding container image layer data in the container data storage directory.

Here n denotes the n-th layer of the image, layer(n) denotes the hash value of the n-th image layer, diff(n) denotes the mapping value obtained for layer(n) in the container software, chain(n) denotes the computed directory hash value corresponding to the data of the n-th image layer, chain(n-1) denotes the computed directory hash value corresponding to the data of the (n-1)-th image layer, and $\mathrm{hash}(\mathrm{chain}(n-1) \,\Vert\, \mathrm{diff}(n))$ denotes the hash value of the string obtained by splicing the directory hash value of the (n-1)-th image layer with the mapping value of the n-th image layer in the container software.
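The chained hash above can be computed as in the following sketch. The text does not name the hash function or the exact splicing format; this sketch assumes SHA-256 and a single space between the two values, which is how the OCI image specification defines a layer ChainID from DiffIDs. The function name is hypothetical.

```python
import hashlib
from typing import List


def chain_ids(diff_ids: List[str]) -> List[str]:
    """Compute chain(1)..chain(n) from diff(1)..diff(n).

    Each diff value is expected in the form "sha256:<hex digest>".
    """
    chains: List[str] = []
    for diff in diff_ids:
        if not chains:
            chains.append(diff)  # chain(1) = diff(1)
        else:
            # Assumed splicing format: previous chain, one space, current diff.
            joined = f"{chains[-1]} {diff}"
            digest = hashlib.sha256(joined.encode("ascii")).hexdigest()
            chains.append("sha256:" + digest)
    return chains
```

Because each chain value depends on all layers below it, two images share a stored layer directory only when they share the whole stack of layers up to that point.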

Security analysis node: identifies the non-base image layer data in the extracted container image layer data and performs security analysis on it. The container image layer data is transmitted to the security analysis node by the communication scheduling module of the data acquisition node through the distributed task scheduling technology.

The security analysis node comprises a communication module and an analysis module. Communication module: acquires the extracted container image layer data through the distributed task scheduling technology. Analysis module: uses a differential analysis method to divide the container image layer data into base image layer data and non-base image layer data, determine the final content of every file in the non-base image layer data and analyze it, compare the non-base image layer data with the base image layer data to find the differences, and analyze the difference data in combination with the security of the base image. The method comprises the following steps:

S1: the security analysis node receives the container image layer data;

S2: using the container's base image, the extracted container image layer data is divided into base image layer data and non-base image layer data;

S3: for the non-base image layer data, the files in the image layers are checked layer by layer, from the last layer to the first;

S4: for the file currently being checked, it is determined whether a file with the same name and the same path has already been checked; if so, the current file is skipped; if not, the check proceeds;

S5: for the base image layer data, it is determined whether the corresponding base image has known potential security hazards; if so, proceed to S6; otherwise detection of the current image ends and the detection result is stored in the database;

S6: the files containing the potential security hazards are determined, and it is checked whether a file with the same name and the same path was checked in S4; if so, proceed to S7; otherwise the potential security hazard is confirmed to still exist, detection of the current image ends, and the detection result is stored in the database;

S7: it is determined whether the file with the same name and the same path checked in S4 has a potential security hazard; if so, the hazard is confirmed to still exist in the file; otherwise the file is confirmed to be free of the hazard. Detection of the current image then ends and the detection result is stored in the database.
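Steps S1 to S7 above can be condensed into the following sketch. It models each layer as a dictionary from file path to content, takes the number of base layers and the set of base image files with known hazards as inputs, and uses a caller-supplied `is_risky` rule in place of the detection rule base. These data shapes and names are hypothetical, and the sketch ignores file deletions between layers.

```python
from typing import Callable, Dict, Iterable, List, Set

Layer = Dict[str, bytes]  # file path -> file content within one layer


def analyse(
    layers: List[Layer],              # ordered from first (bottom) to last (top)
    base_count: int,                  # how many leading layers belong to the base image
    base_risky_paths: Iterable[str],  # base image files with known hazards
    is_risky: Callable[[bytes], bool],
) -> Set[str]:
    """Return the paths that still carry a hazard in the final image."""
    non_base = layers[base_count:]           # S2: split off the non-base layers
    verdicts: Dict[str, bool] = {}
    for layer in reversed(non_base):         # S3: last layer first
        for path, content in layer.items():
            if path in verdicts:             # S4: a newer copy was already checked
                continue
            verdicts[path] = is_risky(content)
    findings = {path for path, risky in verdicts.items() if risky}
    for path in base_risky_paths:            # S5 and S6: known base image hazards
        if path not in verdicts:
            findings.add(path)               # never overridden, hazard remains
        # S7: otherwise the verdict on the overriding file already decides
    return findings
```

Only the final version of each changed file is inspected, and base image files are consulted only when a hazard is already known, which is where the reduction in analysis work comes from.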

Because the structure of the detection results may vary, the final detection results are stored in JSON format in a NoSQL database.

The security analysis node performs security analysis using the differential analysis method: it checks the non-base image layer data layer by layer, from the last layer to the first, and does not check a file with the same name and path a second time; for the base image layer data, it compares each known potential security hazard in turn against the non-base image layer data and checks whether the hazard has been repaired. The whole process reduces the number of files that need to be analyzed and thereby improves detection efficiency.

The container image relationship knowledge base is stored as a tree. Container image relationships change dynamically over the long term: container images are modified, deleted and added, and each of these operations changes the knowledge base. The knowledge base is therefore maintained regularly over the long term.

First, the base image of each image to be detected is obtained using the base image identification algorithm. The classification algorithm then combines images whose base images have similar characteristics into groups, and the identification classification node issues data acquisition tasks to the data acquisition nodes group by group. After downloading, a data acquisition node statically extracts the container image layer data from the container images and submits it to the security analysis node. Finally, the security analysis node evaluates the security of the container images using the differential analysis method.

The identification classification node determines the base images of the container images, classifies the container images, and hands the grouped data acquisition tasks to the data acquisition nodes; a data acquisition node downloads the corresponding container images from the container repository according to the issued task, extracts the container image data and passes the result to the security analysis node; the security analysis node divides the container image layer data into base image layer data and non-base image layer data and performs a security check on the container image data using the differential analysis method.

The numbers of identification classification nodes, data acquisition nodes and security analysis nodes in the content difference-based container software security detection system are not limited; in this embodiment the system is set up with one identification classification node, one security analysis node and a plurality of data acquisition nodes. Each node in the system can run independently on its own host or share a host with other nodes.

The embodiment of the invention also discloses a content difference-based container software security detection method, which comprises the following steps:

Step 1 comprises the following steps. Step 1.1: receiving user input to the system, and acquiring the list of images to be detected. Step 1.2: determining the base image on which each image in the list depends, to obtain an identification result. Step 1.3: receiving the identification result, merging images whose base images have similar characteristics into one class, and sending a data acquisition task.

Step 2 comprises the following steps. Step 2.1: receiving the data acquisition task issued by the image classification module. Step 2.2: according to the data acquisition task, filtering out duplicate image layers that have already been downloaded, and requesting the image repository to download the corresponding container images. Step 2.3: statically extracting the container image layer data of the downloaded container images. Step 2.4: sending, by the communication scheduling module, the extracted container image layer data to the security analysis node.

Step 3 comprises the following steps. Step 3.1: the security analysis node acquires the extracted container image layer data. Step 3.2 comprises the following sub-steps. Step 3.2.1: using the container's base image, dividing the extracted container image layer data into base image layer data and non-base image layer data. Step 3.2.2: for the non-base image layer data, checking the files in the image layers layer by layer, from the last layer to the first. Step 3.2.3: for the file currently being checked, determining whether a file with the same name and the same path has already been checked; if so, skipping the current file; if not, proceeding with the check. Step 3.2.4: for the base image layer data, determining whether the corresponding base image has known potential security hazards; if so, proceeding to step 3.2.5; otherwise ending detection of the current image and storing the detection result in the database. Step 3.2.5: determining the files containing the potential security hazards, and checking whether a file with the same name and the same path was checked in step 3.2.3; if so, proceeding to step 3.2.6; otherwise confirming that the potential security hazard still exists, ending detection of the current image, and storing the detection result in the database. Step 3.2.6: determining whether the file with the same name and the same path checked in step 3.2.3 has a potential security hazard; if so, confirming that the hazard still exists in the file; otherwise confirming that the file is free of the hazard; then ending detection of the current image and storing the detection result in the database. Because the structure of the detection results may vary, the final detection results are stored in JSON format in a NoSQL database.

Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules, units provided by the present invention as pure computer readable program code, the system and its various devices, modules, units provided by the present invention can be fully implemented by logically programming method steps in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.

The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims

1. A content difference-based container software security detection system is characterized by comprising an identification classification node, a data acquisition node and a security analysis node, wherein the three nodes work cooperatively by exchanging data;

2. The content difference-based container software security detection system according to claim 1, wherein the identification classification node comprises a user interaction module, a base image identification module and an image classification module;

3. The content difference-based container software security detection system according to claim 2, wherein the data acquisition node comprises a communication scheduling module, an image downloading module and a data extraction module;

4. The content difference-based container software security detection system according to claim 3, wherein the security analysis node comprises a communication module and an analysis module;

5. The content difference-based container software security detection system according to claim 1, wherein the data acquisition task and the container image layer data are both transmitted by a distributed task scheduling technique, wherein the data acquisition task is transmitted from the identification classification node to the data acquisition node by the distributed task scheduling technique, and the container image layer data is transmitted from the data acquisition node to the security analysis node by the distributed task scheduling technique.

6. A content difference-based container software security detection method, which is applied to the content difference-based container software security detection system of any one of claims 1 to 5, and comprises the following steps:

7. The content difference-based container software security detection method according to claim 6, wherein the step 1 comprises the steps of:

8. The content difference-based container software security detection method according to claim 7, wherein the step 2 comprises the steps of:

step 2.1: receiving the data acquisition task issued by the image classification module;

9. The content difference-based container software security detection method according to claim 8, wherein the step 3 comprises the steps of:

step 3.1: acquiring the extracted container image layer data;

10. The content difference-based container software security detection method according to claim 6, wherein the data acquisition task and the container image layer data are both transmitted by a distributed task scheduling technique, wherein the data acquisition task is transmitted from the identification classification node to the data acquisition node by the distributed task scheduling technique, and the container image layer data is transmitted from the data acquisition node to the security analysis node by the distributed task scheduling technique.