CN113296784B - Container base image recommendation method and system based on configuration code characterization - Google Patents


Info

Publication number
CN113296784B
Authority
CN
China
Prior art keywords
container
mirror image
configuration
open source
data set
Legal status
Active
Application number
CN202110539905.6A
Other languages
Chinese (zh)
Other versions
CN113296784A (en)
Inventor
毛新军
张银园
张洋
卢遥
王涛
张璋
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Application filed by National University of Defense Technology
Priority to CN202110539905.6A
Publication of CN113296784A
Application granted
Publication of CN113296784B
Current legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING › G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code › G06F8/41 Compilation: G06F8/427 Parsing (under G06F8/42 Syntactic analysis); G06F8/436 Semantic checking (under G06F8/43 Checking; Contextual analysis); G06F8/44 Encoding
    • G06F8/70 Software maintenance or management › G06F8/71 Version control; Configuration management
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks: G06N3/04 Architecture, e.g. interconnection topology; G06N3/08 Learning methods

Abstract

The invention relates to a container base image recommendation method and system based on configuration code characterization. The method comprises: parsing each container image configuration file in a container image configuration dataset to obtain the functional code segment and the base image corresponding to that file; characterizing each functional code segment as an abstract syntax tree structure; obtaining the paths of the abstract syntax tree structure from the root node to each leaf node, each path comprising the structure sequence from the root node to the corresponding leaf node and that leaf node; training a neural network model with the structure sequences and corresponding leaf nodes of each functional code segment as input and the corresponding base image as output; and using the trained neural network model to obtain the base image for a functional code segment to be recommended. The invention improves the efficiency and accuracy of obtaining a container base image.

Description

Container base image recommendation method and system based on configuration code characterization
Technical Field
The invention relates to the field of container base images, and in particular to a container base image recommendation method and system based on configuration code characterization.
Background
In recent years, Docker container technology has attracted wide attention in industry thanks to its support for rapid deployment. However, developing software on Docker containers requires writing configuration files such as the Dockerfile. To complete a Dockerfile, the developer must first specify the base image on which it depends, a choice that usually rests on the developer's personal experience. The choice matters: a suitable base image helps reduce the image size and improves the success rate of image builds. Yet in image hosting communities such as Docker Hub, container image search still relies mainly on the developer's personal experience.
Disclosure of Invention
The invention aims to provide a container base image recommendation method and system based on configuration code characterization that improve the efficiency and accuracy of obtaining a container base image.
In order to achieve the above object, the present invention provides the following solutions:
A container base image recommendation method based on configuration code characterization, the method comprising:
obtaining a container image configuration dataset, the dataset including a plurality of container image configuration files;
parsing the data in each container image configuration file in the dataset to obtain the functional code segment and the base image corresponding to each container image configuration file;
characterizing each functional code segment as an abstract syntax tree structure;
obtaining a plurality of paths of the abstract syntax tree structure from the root node to each leaf node, each path comprising the structure sequence from the root node to the corresponding leaf node and that leaf node;
training a neural network model, with the structure sequences and corresponding leaf nodes of each functional code segment as input and the base image corresponding to each functional code segment as output, to obtain a container base image recommendation model;
obtaining the structure sequences and corresponding leaf nodes of a functional code segment to be recommended; and
inputting the structure sequences and corresponding leaf nodes of the functional code segment to be recommended into the container base image recommendation model to obtain the base image corresponding to that functional code segment.
Optionally, obtaining the container image configuration dataset specifically includes:
acquiring an open source project set;
screening out, from the open source project set, the projects that contain image configuration files, to obtain a container image database; and
eliminating duplicate container image configuration files from the container image database, to obtain a container image configuration dataset formed by container image configuration files with distinct contents.
Optionally, acquiring the open source project set specifically includes:
screening, on an open source community code hosting platform, the open source projects whose star count exceeds a first set value and whose issue count exceeds a second set value, to obtain the open source project set.
Optionally, eliminating duplicate container image configuration files from the container image database, to obtain a container image configuration dataset formed by container image configuration files with distinct contents, specifically includes:
obtaining the hash value of each container image configuration file in the container image database; and
eliminating duplicate container image configuration files from the container image database according to these hash values, to obtain a container image configuration dataset formed by container image configuration files with distinct contents.
Optionally, the neural network model is an attention-based neural network model.
The invention also discloses a container base image recommendation system based on configuration code characterization, comprising:
a dataset acquisition module, configured to acquire a container image configuration dataset, the dataset including a plurality of container image configuration files;
a data analysis module, configured to parse the data in each container image configuration file in the dataset to obtain the functional code segment and base image corresponding to each container image configuration file;
a code segment characterization module, configured to characterize each functional code segment as an abstract syntax tree structure;
a multi-path acquisition module, configured to acquire a plurality of paths of the abstract syntax tree structure from the root node to each leaf node, each path comprising the structure sequence from the root node to the corresponding leaf node and that leaf node;
a container base image recommendation model training module, configured to train a neural network model, with the structure sequences and corresponding leaf nodes of each functional code segment as input and the base image corresponding to each functional code segment as output, to obtain a container base image recommendation model;
an input feature acquisition module, configured to acquire the structure sequences and corresponding leaf nodes of the functional code segment to be recommended; and
a container base image recommendation model application module, configured to input the structure sequences and corresponding leaf nodes of the functional code segment to be recommended into the container base image recommendation model to obtain the base image corresponding to that functional code segment.
Optionally, the dataset acquisition module specifically includes:
an open source project set acquisition unit, configured to acquire an open source project set;
a container image database acquisition unit, configured to screen out, from the open source project set, the projects that contain image configuration files, to obtain a container image database; and
a container image configuration dataset acquisition unit, configured to eliminate duplicate container image configuration files from the container image database, to obtain a container image configuration dataset formed by container image configuration files with distinct contents.
Optionally, the open source project set acquisition unit specifically includes:
an open source project set acquisition subunit, configured to screen, on an open source community code hosting platform, the open source projects whose star count exceeds a first set value and whose issue count exceeds a second set value, to obtain the open source project set.
Optionally, the container image configuration dataset acquisition unit specifically includes:
a hash value acquisition subunit, configured to obtain the hash value of each container image configuration file in the container image database; and
a duplicate elimination subunit, configured to eliminate duplicate container image configuration files from the container image database according to these hash values, to obtain a container image configuration dataset formed by container image configuration files with distinct contents.
Optionally, the neural network model is an attention-based neural network model.
According to the specific embodiments provided, the invention achieves the following technical effects:
The container base image recommendation method and system based on configuration code characterization characterize functional code segments as abstract syntax tree structures and thereby capture the semantic and structural features of the configuration information. A neural network model is trained with the structure sequences and corresponding leaf nodes of each functional code segment as input and the corresponding base image as output, yielding a container base image recommendation model, which then gives the base image for a functional code segment to be recommended.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for recommending container base images based on configuration code characterization;
FIG. 2 is a schematic diagram of a configuration code representation-based container base image recommendation system;
FIG. 3 is a detailed flowchart of a method for recommending container base images based on configuration code characterization according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a container base image recommendation method and system based on configuration code characterization that improve the efficiency and accuracy of obtaining a container base image.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
FIG. 1 is a schematic flowchart of the container base image recommendation method based on configuration code characterization. As shown in FIG. 1, the method comprises the following steps:
Step 101: obtain a container image configuration dataset; the dataset includes a plurality of container image configuration files.
Obtaining the container image configuration dataset specifically includes:
Acquiring an open source project set.
Screening out, from the open source project set, the projects that contain image configuration files, to obtain a container image database.
Eliminating duplicate container image configuration files from the container image database, to obtain a container image configuration dataset formed by container image configuration files with distinct contents.
Acquiring the open source project set specifically includes:
Screening, on an open source community code hosting platform, the open source projects whose star count exceeds a first set value and whose issue count exceeds a second set value, to obtain the open source project set. Filtering by star and issue counts makes the selected projects more reliable, which in turn makes the model trained on them as sample data more reliable.
Eliminating duplicate container image configuration files from the container image database, to obtain a container image configuration dataset formed by container image configuration files with distinct contents, comprises the following steps:
Obtaining the hash value of each container image configuration file in the container image database.
Eliminating duplicate container image configuration files from the container image database according to these hash values, to obtain a container image configuration dataset formed by container image configuration files with distinct contents.
Step 102: parse the data in each container image configuration file in the dataset to obtain the functional code segment and base image corresponding to each container image configuration file.
Step 103: characterize each functional code segment as an abstract syntax tree structure.
Step 104: obtain a plurality of paths of the abstract syntax tree structure from the root node to each leaf node, each path comprising the structure sequence from the root node to the corresponding leaf node and that leaf node.
Step 105: train a neural network model, with the structure sequences and corresponding leaf nodes of each functional code segment as input and the corresponding base image as output, to obtain a container base image recommendation model.
The neural network model is an attention-based neural network model.
Step 106: obtain the structure sequences and corresponding leaf nodes of the functional code segment to be recommended.
Step 107: input the structure sequences and corresponding leaf nodes of the functional code segment to be recommended into the container base image recommendation model to obtain the corresponding base image.
The method is described in more detail below; its detailed flowchart is shown in FIG. 3.
S1: Construct an active open source project set according to metrics of the open source community code hosting platform, such as star and issue counts.
S2: For the active open source project set obtained in step S1, use the platform API to check whether each project contains a Dockerfile image configuration file, screen out the projects that do, and construct a container image database from the Dockerfile image configuration data they contain.
S3: From the container image database obtained in step S2, remove duplicate Dockerfiles so that only containers with distinct Dockerfile contents are retained, then parse each Dockerfile to obtain a functional code segment X and a base image Y.
S4: Characterize the functional code segment X obtained in step S3 as an abstract syntax tree (AST) structure, and obtain the paths from the root node to the leaf nodes of the AST.
S5: Split each path obtained in step S4 into a structure sequence and a leaf node; use the structure sequences and their leaf nodes as features and the base image Y obtained in step S3 as the label (output) to train an attention-based neural network model. The trained model (the container base image recommendation model) predicts a base image from a Dockerfile functional code segment.
In the present invention, step S1 includes the following:
S1.1: On the collaborative development community GitHub, collect basic project information through the API, and screen popular open source projects according to the star count.
S1.2: From the popular projects, screen out active open source projects according to the issue data submitted by developers, and construct the active open source project set.
In the present invention, step S2 includes the following:
S2.1: For the active open source project set obtained in step S1, acquire the file names contained in each project, and eliminate the projects that contain no image configuration file.
S2.2: Traverse the image configuration information of the remaining projects to construct a container image configuration dataset.
In the present invention, step S3 includes the following:
S3.1: Traverse the content of each configuration file in the dataset and remove duplicate image configuration data to obtain the image configuration dataset.
S3.2: Parse the instructions of each container Dockerfile, extracting the functional instruction data X (all instructions other than FROM) and the base image instruction data, i.e. the base image name Y declared by the FROM instruction.
In the present invention, step S4 includes the following:
S4.1: Parse the Dockerfile functional instruction data X into an AST structure, whose root node is DOCKER-FILE, whose state nodes are abstract instruction or command information, and whose leaf nodes are PACKAGE or ARG information.
S4.2: Traverse the AST of each Dockerfile to obtain a plurality of syntax paths, each path being the set of node information from the root node to a leaf node.
In the present invention, step S5 includes the following:
S5.1: Split each path into the structure sequence between the root node and the leaf node, and the semantic information expressed by the leaf node.
S5.2: Input the structure sequence and semantic information features into the model, with the base image name as the label, and train to obtain an automatic base image recommendation model (the container base image recommendation model), which can automatically recommend a base image for a Dockerfile that contains only functional code segments.
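Steps S5.1 and S5.2 can be illustrated with a minimal sketch of the feature preparation. It assumes each path is a tuple of node labels running from the root to a leaf, gives each whole structure sequence a single vocabulary id (matching "encoded as a whole"), and splits leaf labels such as `GCC-Y` into sub-tokens at a hypothetical `-` separator. The names `split_path`, `leaf_subtokens` and `build_vocab` are illustrative only, not taken from the patent.

```python
def split_path(path):
    """Split a root-to-leaf path into (root, structure sequence, leaf).

    `path` is assumed to be a tuple of node labels with at least two entries:
    the root first, the leaf last, and the state nodes in between.
    """
    root, *middle, leaf = path
    return root, tuple(middle), leaf


def leaf_subtokens(leaf, sep="-"):
    # Hypothetical separator: the exact delimiter is an assumption.
    return [part for part in leaf.split(sep) if part]


def build_vocab(items):
    """Assign each distinct item (e.g. a whole structure sequence) an integer id."""
    vocab = {}
    for item in items:
        vocab.setdefault(item, len(vocab))
    return vocab


paths = [
    ("DOCKER-FILE", "RUN", "APT-GET-INSTALL", "GCC-Y"),
    ("DOCKER-FILE", "RUN", "APT-GET-INSTALL", "MAKE"),
]
split = [split_path(p) for p in paths]
sequence_vocab = build_vocab(seq for _, seq, _ in split)
subtoken_vocab = build_vocab(tok for _, _, leaf in split for tok in leaf_subtokens(leaf))
```

Both example paths share one structure sequence, so they map to the same sequence id while their leaves contribute different sub-tokens.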
The invention achieves the following technical effects:
The invention proposes a base image recommendation method based on structured Dockerfile functional fragments. By characterizing the functional fragments of a Dockerfile as abstract syntax trees, it captures the semantic and structural features of the configuration information, and the attention mechanism of the neural network captures the important paths, so that the correct base image can be recommended. The method can effectively help developers select a suitable base image automatically and improves the efficiency of container configuration.
The following describes a container base image recommendation method based on configuration code characterization according to a specific embodiment.
S1: Construct an active open source project set.
For an open source community (taking GitHub as an example), the open source projects with more than 10 stars and more than 10 issues are screened out, and the projects meeting these requirements form the open source project set.
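The screening rule of this embodiment (more than 10 stars and more than 10 issues) amounts to a simple filter. This sketch assumes the project metadata has already been fetched from the hosting platform; the `Project` record and `select_active_projects` function are hypothetical names, and no platform API is called here.

```python
from dataclasses import dataclass


@dataclass
class Project:
    # Hypothetical record of basic project metadata from the hosting platform.
    full_name: str
    stars: int
    issues: int


def select_active_projects(projects, min_stars=10, min_issues=10):
    """Keep projects whose star count AND issue count both exceed the thresholds."""
    return [p for p in projects if p.stars > min_stars and p.issues > min_issues]


candidates = [
    Project("a/x", stars=50, issues=20),
    Project("b/y", stars=11, issues=10),   # exactly 10 issues: not "more than 10"
    Project("c/z", stars=5, issues=100),
]
active = select_active_projects(candidates)
```

The thresholds are parameters because the general method only speaks of a "first set value" and a "second set value".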
S2: Construct a container image database.
Each file of a project is traversed, and the project is eliminated if it contains no file whose name ends with Dockerfile. To reject Dockerfiles with duplicate contents, the hash value of each Dockerfile's content is computed, and a Dockerfile is added to the final container image database only if that hash value has not been seen before.
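The traversal and hash-based deduplication of step S2 can be sketched as follows. The sketch assumes each project is given as a mapping from file path to file content, and uses SHA-256 as the content hash; the patent does not name a hash function, and `is_dockerfile` and `build_image_database` are hypothetical helpers.

```python
import hashlib
from pathlib import PurePosixPath


def is_dockerfile(path):
    # Assumption: any file whose name ends with "Dockerfile" (e.g. "dev.Dockerfile").
    return PurePosixPath(path).name.endswith("Dockerfile")


def build_image_database(projects):
    """projects: {project name: {file path: content}} -> list of distinct Dockerfile contents."""
    seen = set()
    database = []
    for files in projects.values():
        dockerfiles = [content for path, content in files.items() if is_dockerfile(path)]
        # A project without any Dockerfile contributes nothing, i.e. it is eliminated.
        for content in dockerfiles:
            digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
            if digest not in seen:  # add only contents whose hash is new
                seen.add(digest)
                database.append(content)
    return database


projects = {
    "a": {"Dockerfile": "FROM ubuntu\nRUN make", "README.md": "hi"},
    "b": {"docker/dev.Dockerfile": "FROM ubuntu\nRUN make"},  # duplicate content
    "c": {"main.py": "print()"},                               # no Dockerfile
    "d": {"Dockerfile": "FROM alpine"},
}
database = build_image_database(projects)
```

Comparing fixed-size digests keeps the "seen" set small even when Dockerfiles are long.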
S3: Extract functional fragments and base images.
To extract functional fragments and base images, comment lines (lines beginning with #) are first removed. For a line beginning with the FROM instruction, the base image name is extracted as a name or (name, version) tuple; all instructions other than FROM are treated as the functional code segment.
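A minimal sketch of this extraction, assuming standard Dockerfile syntax: `#` starts a comment line, a trailing backslash continues an instruction, and `FROM` may carry flags such as `--platform=...`. Joining continuations, skipping flags and keeping the last `FROM` of a multi-stage file are choices made for this illustration; `split_dockerfile` is a hypothetical name.

```python
import re


def split_dockerfile(content):
    """Split a Dockerfile into (functional lines X, base image Y).

    Y is a (name, version) tuple, with version None when no tag is given.
    """
    # Join backslash line continuations so each instruction is one logical line.
    text = re.sub(r"\\[ \t]*\r?\n", " ", content)
    functional, base = [], None
    for raw in text.splitlines():
        line = " ".join(raw.split())  # normalise whitespace
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        keyword = line.split(" ", 1)[0].upper()
        if keyword == "FROM":
            # Skip flags such as --platform=...; the first remaining token is the image.
            args = [tok for tok in line.split()[1:] if not tok.startswith("--")]
            if not args:
                continue
            ref = args[0]
            name, sep, tag = ref.rpartition(":")
            if not sep or "/" in tag:  # no tag, or the colon was a registry port
                name, tag = ref, None
            base = (name, tag)  # a multi-stage file keeps its last FROM
        else:
            functional.append(line)
    return functional, base


X, Y = split_dockerfile(
    "# build image\nFROM ubuntu:20.04\nRUN apt-get install -y \\\n    gcc\nCOPY . /app\n"
)
```

Here `Y` is `("ubuntu", "20.04")` and `X` holds the `RUN` and `COPY` lines, with the continued `RUN` joined into one line.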
S4: AST characterization and path acquisition.
According to the information type of each instruction, the functional code segment is characterized as an AST structure, and the paths are acquired in depth-first order. Common instruction contents such as APT-GET-INSTALL are characterized as state nodes, and PACKAGE or ARG information such as GCC-Y is characterized as leaf nodes.
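The AST characterization and depth-first path acquisition can be sketched with a toy tree builder. The tree shape follows the text (root `DOCKER-FILE`, instruction and command state nodes, package or argument leaves), but the tokenization heuristic is a hypothetical stand-in: each `&&`-separated `RUN` command becomes a state node named after its first two words, and flags such as `-y` are simply dropped, so this sketch will not reproduce a leaf like `GCC-Y` exactly.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)


def build_ast(functional_lines):
    """Build a toy AST: DOCKER-FILE -> instruction -> command -> argument leaves."""
    root = Node("DOCKER-FILE")
    for line in functional_lines:
        instruction, _, rest = line.partition(" ")
        instr_node = Node(instruction.upper())
        root.children.append(instr_node)
        if instruction.upper() == "RUN":
            # Heuristic: each `&&`-separated shell command becomes a state node
            # labelled by its first two words, e.g. "apt-get install" -> APT-GET-INSTALL.
            for command in rest.split("&&"):
                words = command.split()
                if not words:
                    continue
                cmd_node = Node("-".join(words[:2]).upper())
                # Remaining non-flag words (packages, arguments) become leaves.
                cmd_node.children = [Node(w.upper()) for w in words[2:] if not w.startswith("-")]
                instr_node.children.append(cmd_node)
        else:
            # Other instructions: arguments hang directly under the instruction node.
            instr_node.children = [Node(w.upper()) for w in rest.split()]
    return root


def root_to_leaf_paths(node, prefix=()):
    """Depth-first traversal yielding every root-to-leaf label path."""
    path = prefix + (node.label,)
    if not node.children:
        yield path
        return
    for child in node.children:
        yield from root_to_leaf_paths(child, path)


ast = build_ast(["RUN apt-get install -y gcc make", "COPY . /app"])
paths = list(root_to_leaf_paths(ast))
```

A real implementation would more likely use a proper Dockerfile and shell parser; the depth-first traversal is the part that carries over directly.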
Each path $x_i$ in a Dockerfile functional fragment can be characterized as a triple
$$x_i = \langle r,\ s_i,\ t_i \rangle,$$
where $s_i$ denotes the path sequence (structure sequence) of the path, $r$ the root node, and $t_i$ the leaf node, which carries the semantic information.
Each Dockerfile functional fragment can be characterized as a set of paths $\langle x_1, x_2, \ldots, x_k \rangle$, where $k$ is the number of paths.
The state-node sequence (structure sequence) $s_i$ is encoded as a whole. Using an embedding matrix $E^{s}$, the structure-sequence encoding is the row of $E^{s}$ indexed by $s_i$:
$$\mathrm{encode\_sequence}(s_i) = E^{s}_{s_i}.$$
A leaf node can be split into sub-tokens at its separator characters, and a learned embedding matrix $E^{subtoken}$ gives the encoding of each sub-token. The sub-token vectors are then summed to encode the complete leaf node:
$$\mathrm{encode\_token}(t) = \sum_{w \in \mathrm{split}(t)} E^{subtoken}_{w},$$
where $t$ denotes a leaf node.
The root-node encoding, the structure-sequence encoding and the leaf-node encoding are concatenated into a new vector $z_i$:
$$z_i = \left[\, e_{r} \,;\, \mathrm{encode\_sequence}(s_i) \,;\, \mathrm{encode\_token}(t_i) \,\right],$$
where $e_r$ denotes the encoding of the root node.
A fully connected layer learns how to combine the components of each $z_i$:
$$\tilde{z}_i = \tanh(W \cdot z_i),$$
where $W$ is a weight matrix and $\tanh(\cdot)$ is the activation function.
The attention weight $\alpha_i$ of each $\tilde{z}_i$ is
$$\alpha_i = \frac{\exp(\tilde{z}_i^{\top} \cdot \mathbf{a})}{\sum_{j=1}^{k} \exp(\tilde{z}_j^{\top} \cdot \mathbf{a})},$$
where the attention vector $\mathbf{a} \in \mathbb{R}^{2d}$ is randomly initialized and learned jointly with the network (the attention-based neural network model), $k$ is the number of paths, and $\mathbb{R}^{2d}$ denotes the $2d$-dimensional real vector space.
The Dockerfile vector $v$ is the linear combination of the $\tilde{z}_i$ weighted by their attention weights:
$$v = \sum_{i=1}^{k} \alpha_i \cdot \tilde{z}_i.$$
The prediction of the attention-based neural network model is the softmax-normalized dot product between the Dockerfile vector and each base image label:
$$q(y_{i'}) = \frac{\exp(v^{\top} \cdot \mathrm{image\_tag}_{i'})}{\sum_{j=1}^{Q} \exp(v^{\top} \cdot \mathrm{image\_tag}_{j})},$$
where $Q$ is the number of base images, $\mathrm{image\_tag}_{i'}$ is the $i'$-th base image, $v^{\top}$ is the transpose of $v$, and $q(y_{i'})$ is the probability assigned to $\mathrm{image\_tag}_{i'}$. The $\mathrm{image\_tag}_{i'}$ with the highest probability is the base image $Y$ corresponding to $v$.
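The forward pass described by these formulas can be sketched in plain Python. This is an untrained illustration with random parameters, not the patented model. It assumes each of the three encodings is $d$-dimensional, so $z_i$ has $3d$ entries, and it assumes $W$ has shape $2d \times 3d$ so that $\tilde{z}_i$ matches the attention vector $\mathbf{a} \in \mathbb{R}^{2d}$. The class and method names are hypothetical, and each path must have at least one leaf sub-token.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, vectors):
    return [sum(w * vec[j] for w, vec in zip(weights, vectors)) for j in range(len(vectors[0]))]


class PathAttentionModel:
    """Forward pass of the path-attention model; parameters are random, not trained."""

    def __init__(self, n_roots, n_sequences, n_subtokens, n_tags, d, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.E_root = matrix(n_roots, d)          # root-node embeddings
        self.E_s = matrix(n_sequences, d)         # one row per whole structure sequence
        self.E_subtoken = matrix(n_subtokens, d)  # leaf sub-token embeddings
        self.W = matrix(2 * d, 3 * d)             # assumed shape: maps z_i (3d) to 2d
        self.a = matrix(1, 2 * d)[0]              # attention vector a in R^{2d}
        self.image_tags = matrix(n_tags, 2 * d)   # one embedding per base image label

    def combined_path_vector(self, root_id, seq_id, subtoken_ids):
        """z~_i = tanh(W z_i), with z_i = [e_r; encode_sequence(s_i); encode_token(t_i)]."""
        leaf = weighted_sum([1.0] * len(subtoken_ids), [self.E_subtoken[t] for t in subtoken_ids])
        z = self.E_root[root_id] + self.E_s[seq_id] + leaf  # list concatenation
        return [math.tanh(dot(row, z)) for row in self.W]

    def predict(self, paths):
        """paths: list of (root_id, seq_id, [subtoken ids]); returns q over base images."""
        z_tilde = [self.combined_path_vector(*p) for p in paths]
        alpha = softmax([dot(z, self.a) for z in z_tilde])  # attention weights
        v = weighted_sum(alpha, z_tilde)                     # Dockerfile vector
        return softmax([dot(v, tag) for tag in self.image_tags])

    def recommend(self, paths):
        """Index of the base image label with the highest probability."""
        q = self.predict(paths)
        return max(range(len(q)), key=q.__getitem__)


model = PathAttentionModel(n_roots=1, n_sequences=3, n_subtokens=5, n_tags=4, d=8, seed=42)
example_paths = [(0, 0, [1, 2]), (0, 2, [3])]
q = model.predict(example_paths)
```

Training would fit all embedding matrices, $W$, $\mathbf{a}$ and the label embeddings, for example by minimizing cross-entropy between `q` and the true base image. A framework with automatic differentiation would normally handle that part.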
S5: Split each path in the container image configuration dataset obtained in step S4 into a structure sequence and a leaf node. Use the structure sequences and their leaf nodes as features and the base image as the label (output) to train the attention-based neural network model. Then use this model (the container base image recommendation model) to predict a base image from a Dockerfile functional code segment.
FIG. 2 is a schematic structural diagram of the container base image recommendation system based on configuration code characterization. As shown in FIG. 2, the system includes:
A dataset acquisition module 201, configured to acquire a container image configuration dataset; the dataset includes a plurality of container image configuration files.
A data analysis module 202, configured to parse the data in each container image configuration file in the dataset and obtain the functional code segment and base image corresponding to each container image configuration file.
A code segment characterization module 203, configured to characterize each functional code segment as an abstract syntax tree structure.
A multi-path acquisition module 204, configured to obtain a plurality of paths of the abstract syntax tree structure from the root node to each leaf node, each path comprising the structure sequence from the root node to the corresponding leaf node and that leaf node.
A container base image recommendation model training module 205, configured to train a neural network model, with the structure sequences and corresponding leaf nodes of each functional code segment as input and the corresponding base image as output, to obtain a container base image recommendation model.
An input feature acquisition module 206, configured to acquire the structure sequences and corresponding leaf nodes of the functional code segment to be recommended.
A container base image recommendation model application module 207, configured to input the structure sequences and corresponding leaf nodes of the functional code segment to be recommended into the container base image recommendation model and obtain the corresponding base image.
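The inference side of this module structure can be pictured as a small composition of callables. This is a hypothetical wiring for illustration only: `BaseImageRecommender` and its fields are invented names, and any parser, path extractor and trained model with these shapes could be plugged in. The usage below uses stub lambdas in place of real modules.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class BaseImageRecommender:
    """Hypothetical wiring of the inference path (parsing, path features, model 207)."""

    extract_functional: Callable[[str], List[str]]  # parse out the functional code segment
    to_paths: Callable[[List[str]], list]           # AST characterization + path features
    model: Callable[[list], Sequence[float]]        # trained model: probability per base image
    labels: Sequence[str]                           # base image names, same order as model output

    def recommend(self, dockerfile_text: str) -> str:
        fragment = self.extract_functional(dockerfile_text)
        features = self.to_paths(fragment)
        probs = self.model(features)
        best = max(range(len(probs)), key=lambda i: probs[i])
        return self.labels[best]


rec = BaseImageRecommender(
    extract_functional=lambda text: [l for l in text.splitlines() if not l.upper().startswith("FROM")],
    to_paths=lambda lines: [tuple(l.split()) for l in lines],
    model=lambda paths: [0.2, 0.7, 0.1],  # stub standing in for a trained model
    labels=["alpine", "ubuntu", "python"],
)
```

Keeping each module behind a plain callable mirrors the patent's module boundaries and lets the training module 205 swap in a retrained model without touching the rest.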
The data set acquisition module 201 specifically includes:
and the open source item set acquisition unit is used for acquiring the open source item set.
And the container mirror image database acquisition unit is used for screening out the items comprising the mirror image configuration file from the open source item set to acquire the container mirror image database.
The container mirror image configuration data set obtaining unit is used for removing repeated container mirror image configuration files in the container mirror image database to obtain a container mirror image configuration data set formed by a plurality of container mirror image configuration files with different contents.
The open source item set acquisition unit specifically comprises:
the open source project set acquisition subunit is used for screening open source projects with star indexes larger than a first set value and Issue indexes larger than a second set value from the open source community code hosting platform to obtain the open source project set.
The container image configuration data set acquisition unit specifically includes:
a hash value acquisition subunit, configured to compute a hash value for each container image configuration file in the container image database; and
a duplicate removal subunit, configured to remove duplicate container image configuration files from the container image database according to their hash values, thereby obtaining a container image configuration data set made up of configuration files with distinct contents.
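Together, the two subunits amount to content-hash deduplication. The patent does not name a hash function; SHA-256 over the raw UTF-8 bytes is an assumption here, and `deduplicate_configs` is a hypothetical helper. Only byte-identical files collapse, so two Dockerfiles that differ only in whitespace or comments both survive; normalizing the text before hashing would be a separate design choice.

```python
import hashlib
from typing import Iterable, List, Tuple


def deduplicate_configs(files: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first (path, content) pair for each distinct content hash."""
    seen = set()
    unique: List[Tuple[str, str]] = []
    for path, content in files:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((path, content))
    return unique
```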
The neural network model is an attention-based neural network model.
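The patent states only that the model is attention-based. One common way to apply attention to a bag of path representations, used for example by code2vec, is to score each path vector against a learned attention vector, normalize the scores with softmax, and take the weighted sum as a code vector that feeds a softmax classifier over base images. The pure-Python forward pass below illustrates that idea under the assumption that the path vectors, the attention vector and the class weights have already been learned; the path embedding layer and training are omitted, and all names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_pool(path_vectors: Sequence[Vector], attention: Vector) -> Vector:
    """Weight each path vector by softmax(path . attention) and sum them."""
    weights = softmax([dot(v, attention) for v in path_vectors])
    dim = len(path_vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, path_vectors)) for i in range(dim)]


def recommend_base_image(
    path_vectors: Sequence[Vector],
    attention: Vector,
    class_weights: Sequence[Vector],
    labels: Sequence[str],
) -> str:
    """Return the label whose class score is highest for the pooled code vector."""
    code_vector = attention_pool(path_vectors, attention)
    probs = softmax([dot(code_vector, w) for w in class_weights])
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best]
```

The attention weights make the pooled vector depend mostly on the paths that score highest against the attention vector, which is what lets a few informative instructions dominate the recommendation.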
The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and identical or similar parts can be cross-referenced. Because the system disclosed in the embodiment corresponds to the disclosed method, its description is kept brief; for related details, refer to the description of the method.
Specific examples have been used herein to explain the principles and embodiments of the present invention; they are intended only to aid understanding of the method and its core idea. Meanwhile, those of ordinary skill in the art may, in accordance with the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, this description should not be construed as limiting the invention.

Claims (10)

1. A container base image recommendation method based on configuration code characterization, the method comprising:
obtaining a container image configuration data set, the container image configuration data set comprising a plurality of container image configuration files;
parsing each container image configuration file in the container image configuration data set to obtain the functional code segment and the base image corresponding to that configuration file;
characterizing each functional code segment as an abstract syntax tree;
obtaining a plurality of paths in the abstract syntax tree from the root node to each leaf node, each path comprising the structure sequence from the root node to the corresponding leaf node, together with that leaf node;
training a neural network model, with the plurality of structure sequences and corresponding leaf nodes of each functional code segment as input and the base image corresponding to that functional code segment as output, to obtain a container base image recommendation model;
obtaining a plurality of structure sequences and corresponding leaf nodes of a functional code segment to be recommended; and
inputting the plurality of structure sequences and corresponding leaf nodes of the functional code segment to be recommended into the container base image recommendation model to obtain the base image corresponding to that functional code segment.
2. The container base image recommendation method based on configuration code characterization according to claim 1, wherein obtaining the container image configuration data set specifically comprises:
obtaining an open source project set;
selecting, from the open source project set, the projects that contain an image configuration file, to obtain a container image database; and
removing duplicate container image configuration files from the container image database, to obtain a container image configuration data set made up of a plurality of container image configuration files with distinct contents.
3. The container base image recommendation method based on configuration code characterization according to claim 2, wherein obtaining the open source project set specifically comprises:
selecting, from an open source community code hosting platform, the open source projects whose star count exceeds a first set value and whose issue count exceeds a second set value, to obtain the open source project set.
4. The container base image recommendation method based on configuration code characterization according to claim 2, wherein removing duplicate container image configuration files from the container image database to obtain a container image configuration data set made up of a plurality of container image configuration files with distinct contents specifically comprises:
obtaining a hash value of each container image configuration file in the container image database; and
removing duplicate container image configuration files from the container image database according to the hash value of each file, to obtain the container image configuration data set made up of a plurality of container image configuration files with distinct contents.
5. The container base image recommendation method based on configuration code characterization according to claim 1, wherein the neural network model is an attention-based neural network model.
6. A container base image recommendation system based on configuration code characterization, the system comprising:
a data set acquisition module, configured to obtain a container image configuration data set, the container image configuration data set comprising a plurality of container image configuration files;
a data analysis module, configured to parse each container image configuration file in the container image configuration data set to obtain the functional code segment and the base image corresponding to that configuration file;
a code segment characterization module, configured to characterize each functional code segment as an abstract syntax tree;
a multi-path acquisition module, configured to obtain a plurality of paths in the abstract syntax tree from the root node to each leaf node, each path comprising the structure sequence from the root node to the corresponding leaf node, together with that leaf node;
a container base image recommendation model training module, configured to train a neural network model, with the plurality of structure sequences and corresponding leaf nodes of each functional code segment as input and the base image corresponding to that functional code segment as output, to obtain a container base image recommendation model;
an input feature acquisition module, configured to obtain a plurality of structure sequences and corresponding leaf nodes of a functional code segment to be recommended; and
a container base image recommendation model application module, configured to input the plurality of structure sequences and corresponding leaf nodes of the functional code segment to be recommended into the container base image recommendation model to obtain the base image corresponding to that functional code segment.
7. The container base image recommendation system based on configuration code characterization according to claim 6, wherein the data set acquisition module specifically comprises:
an open source project set acquisition unit, configured to obtain an open source project set;
a container image database acquisition unit, configured to select, from the open source project set, the projects that contain an image configuration file, to obtain a container image database; and
a container image configuration data set acquisition unit, configured to remove duplicate container image configuration files from the container image database, to obtain a container image configuration data set made up of a plurality of container image configuration files with distinct contents.
8. The container base image recommendation system based on configuration code characterization according to claim 7, wherein the open source project set acquisition unit specifically comprises:
an open source project set acquisition subunit, configured to select, from an open source community code hosting platform, the open source projects whose star count exceeds a first set value and whose issue count exceeds a second set value, to obtain the open source project set.
9. The container base image recommendation system based on configuration code characterization according to claim 7, wherein the container image configuration data set acquisition unit specifically comprises:
a hash value acquisition subunit, configured to obtain a hash value of each container image configuration file in the container image database; and
a duplicate removal subunit, configured to remove duplicate container image configuration files from the container image database according to the hash value of each file, to obtain a container image configuration data set made up of a plurality of container image configuration files with distinct contents.
10. The container base image recommendation system based on configuration code characterization according to claim 6, wherein the neural network model is an attention-based neural network model.
CN202110539905.6A 2021-05-18 2021-05-18 Container base mirror image recommendation method and system based on configuration code characterization Active CN113296784B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110539905.6A CN113296784B (en) 2021-05-18 2021-05-18 Container base mirror image recommendation method and system based on configuration code characterization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110539905.6A CN113296784B (en) 2021-05-18 2021-05-18 Container base mirror image recommendation method and system based on configuration code characterization

Publications (2)

Publication Number Publication Date
CN113296784A CN113296784A (en) 2021-08-24
CN113296784B true CN113296784B (en) 2023-11-14

Family

ID=77322600

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110539905.6A Active CN113296784B (en) 2021-05-18 2021-05-18 Container base mirror image recommendation method and system based on configuration code characterization

Country Status (1)

Country Link
CN (1) CN113296784B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114327753A (en) * 2021-12-13 2022-04-12 National University of Defense Technology of the Chinese People's Liberation Army Method, device, equipment and medium for predicting container construction result

Citations (6)

Publication number Priority date Publication date Assignee Title
CN106250427A (en) * 2016-07-25 2016-12-21 Inspur (Beijing) Electronic Information Industry Co., Ltd. Method and system for generating container image recommendation information
CN110221900A (en) * 2019-06-05 2019-09-10 Institute of Software, Chinese Academy of Sciences Method and device for automatically completing Dockerfile base image version information
CN111079014A (en) * 2019-12-17 2020-04-28 Ctrip Computer Technology (Shanghai) Co., Ltd. Recommendation method, system, medium and electronic device based on tree structure
CN111459491A (en) * 2020-03-17 2020-07-28 Nanjing University of Aeronautics and Astronautics Code recommendation method based on tree neural network
CN112035165A (en) * 2020-08-26 2020-12-04 Shangu Wang'an Technology Co., Ltd. Code clone detection method and system based on homogeneous network
CN112181584A (en) * 2019-07-02 2021-01-05 International Business Machines Corporation Optimizing image reconstruction for container warehouses

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US10545733B2 (en) * 2018-06-25 2020-01-28 Hcl Technologies Ltd. Code reusability
US10983761B2 (en) * 2019-02-02 2021-04-20 Microsoft Technology Licensing, Llc Deep learning enhanced code completion system

Non-Patent Citations (1)

Title
Dockerfile-oriented container image building tool; Geng Peng, Chen Wei, Wei Jun; Computer Systems & Applications (No. 11); full text *

Also Published As

Publication number Publication date
CN113296784A (en) 2021-08-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant