CN113296784A - Container base mirror image recommendation method and system based on configuration code representation - Google Patents

Container base mirror image recommendation method and system based on configuration code representation

Info

Publication number
CN113296784A
Authority
CN
China
Prior art keywords
container
mirror image
image configuration
container mirror
open source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110539905.6A
Other languages
Chinese (zh)
Other versions
CN113296784B (en)
Inventor
毛新军
张银园
张洋
卢遥
王涛
张璋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202110539905.6A
Publication of CN113296784A
Application granted
Publication of CN113296784B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G06F8/427 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/43 Checking; Contextual analysis
    • G06F8/436 Semantic checking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method and a system for recommending a container base image based on configuration code representation. The method comprises the following steps: parsing each container image configuration file in a container image configuration data set to obtain the functional code segment and the base image corresponding to that file; representing each functional code segment as an abstract syntax tree; obtaining the paths of the abstract syntax tree from the root node to each leaf node, wherein each path comprises the structure sequence from the root node to the corresponding leaf node together with that leaf node; training a neural network model with the structure sequences and corresponding leaf nodes of each functional code segment as input and the base image corresponding to each functional code segment as output; and obtaining, from the trained neural network model, the base image corresponding to a functional code segment for which a recommendation is sought. The invention improves the efficiency and accuracy of obtaining a container base image.

Description

Container base mirror image recommendation method and system based on configuration code representation
Technical Field
The invention relates to the field of container base mirror images, in particular to a container base mirror image recommendation method and system based on configuration code representation.
Background
In recent years, Docker container technology has attracted a great deal of attention in industry because containers can be deployed rapidly. Software development based on Docker containers, however, requires writing configuration files such as a Dockerfile. To write a Dockerfile, a developer first needs to specify the base image on which it depends, and this choice often rests on the developer's personal experience. The choice matters: a suitable base image not only reduces the size of the resulting image but also benefits the image build itself. Yet in an image hosting community such as Docker Hub, searching for a container image still relies heavily on the developer's personal experience.
Disclosure of Invention
The invention aims to provide a container base mirror image recommendation method and system based on configuration code representation, and the efficiency and the accuracy of container base mirror image acquisition are improved.
In order to achieve the purpose, the invention provides the following scheme:
a method for recommendation of a container base image based on configuration code characterization, the method comprising:
obtaining a container mirror image configuration data set; the container image configuration dataset comprises a plurality of container image configuration files;
analyzing data in each container mirror image configuration file in the container mirror image configuration data set to obtain a functional code segment and a basic mirror image corresponding to each container mirror image configuration file;
characterizing each of the functional code fragments as an abstract syntax tree structure;
obtaining a plurality of paths of the abstract syntax tree structure from a root node to each leaf node, wherein each path comprises a structure sequence from the root node to the corresponding leaf node and the corresponding leaf node;
taking a plurality of structure sequences corresponding to each functional code segment and corresponding leaf nodes as input, and taking a basic mirror image corresponding to each functional code segment as output to train a neural network model, so as to obtain a container basic mirror image recommendation model;
obtaining a plurality of structural sequences of a functional code segment to be recommended and corresponding leaf nodes;
and inputting the plurality of structural sequences of the functional code segments to be recommended and the corresponding leaf nodes into the container basic mirror image recommendation model to obtain the basic mirror image corresponding to the functional code segments to be recommended.
Optionally, the obtaining a container mirror configuration data set specifically includes:
obtaining an open source project set;
selecting the projects that contain an image configuration file from the open source project set to obtain a container image database;
and removing duplicate container image configuration files from the container image database to obtain a container image configuration data set consisting of a plurality of container image configuration files with different contents.
Optionally, the obtaining of the open source project set specifically includes:
and selecting, from an open source community code hosting platform, the open source projects whose star metric is greater than a first set value and whose Issue metric is greater than a second set value, to obtain the open source project set.
Optionally, the removing of the repeated container mirror image configuration files in the container mirror database to obtain a container mirror image configuration data set composed of a plurality of container mirror image configuration files with different contents specifically includes:
obtaining the hash value of each container mirror image file in a container mirror database;
and removing repeated container mirror image configuration files in the container mirror image database according to the hash value of each container mirror image file to obtain a container mirror image configuration data set formed by a plurality of container mirror image configuration files with different contents.
Optionally, the neural network model is an attention-based neural network model.
The invention also discloses a container base image recommendation system based on configuration code representation, the system comprising:
the data set acquisition module is used for acquiring a container mirror image configuration data set; the container image configuration dataset comprises a plurality of container image configuration files;
the data analysis module is used for analyzing data in each container mirror image configuration file in the container mirror image configuration data set to obtain a functional code segment and a basic mirror image corresponding to each container mirror image configuration file;
a code segment representation module for representing each of the functional code segments into an abstract syntax tree structure;
a multi-path obtaining module, configured to obtain multiple paths of the abstract syntax tree structure from a root node to each leaf node, where each path includes a structure sequence from the root node to a corresponding leaf node and the corresponding leaf node;
the container basic mirror image recommendation model training module is used for training a neural network model by taking a plurality of structure sequences corresponding to the functional code segments and corresponding leaf nodes as input and taking a basic mirror image corresponding to the functional code segments as output to obtain a container basic mirror image recommendation model;
the input characteristic acquisition module is used for acquiring a plurality of structural sequences of the functional code segments to be recommended and corresponding leaf nodes;
and the container basic mirror image recommendation model application module is used for inputting the plurality of structure sequences and the corresponding leaf nodes of the functional code segments to be recommended into the container basic mirror image recommendation model to obtain the basic mirror image corresponding to the functional code segments to be recommended.
Optionally, the data set obtaining module specifically includes:
the open source project set acquisition unit is used for acquiring an open source project set;
a container mirror database acquisition unit, configured to filter out items including mirror configuration files from the open source item set, and acquire a container mirror database;
and the container mirror image configuration data set acquisition unit is used for eliminating repeated container mirror image configuration files in the container mirror image database and acquiring a container mirror image configuration data set consisting of a plurality of container mirror image configuration files with different contents.
Optionally, the open-source item set obtaining unit specifically includes:
and the open source item set acquisition subunit is used for screening open source items of which the star indexes are greater than a first set value and the Issue indexes are greater than a second set value from the open source community code hosting platform to obtain an open source item set.
Optionally, the container mirror image configuration data set obtaining unit specifically includes:
the hash value acquisition subunit is used for acquiring the hash value of each container image file in the container mirror database;
and the repeated removing subunit is used for removing repeated container mirror image configuration files in the container mirror image database according to the hash values of the container mirror image files to obtain a container mirror image configuration data set formed by a plurality of container mirror image configuration files with different contents.
Optionally, the neural network model is an attention-based neural network model.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
The invention provides a container base image recommendation method and system based on configuration code representation. A functional code segment is represented as an abstract syntax tree, from which the semantic and structural features of the configuration information are obtained. A neural network model is trained with the structure sequences and corresponding leaf nodes of each functional code segment as input and the corresponding base image as output, yielding a container base image recommendation model, which is then used to obtain the base image for a functional code segment to be recommended. Compared with the traditional practice of selecting a base image from personal experience, this improves the efficiency and accuracy of obtaining a container base image.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of the container base image recommendation method based on configuration code representation according to the present invention;
FIG. 2 is a schematic structural diagram of the container base image recommendation system based on configuration code representation according to the present invention;
FIG. 3 is a detailed flowchart of the container base image recommendation method based on configuration code representation according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a container base mirror image recommendation method and system based on configuration code representation, and the efficiency and the accuracy of container base mirror image acquisition are improved.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Fig. 1 is a schematic flow chart of the container base image recommendation method based on configuration code representation according to the present invention. As shown in Fig. 1, the method includes:
step 101: obtaining a container mirror image configuration data set; the container image configuration data set includes a plurality of container image configuration files.
The obtaining of the container mirror image configuration data set specifically includes:
an open source item set is obtained.
And screening out items comprising mirror image configuration files from the open source item set to obtain a container mirror image database.
And removing repeated container mirror image configuration files in the container mirror image database to obtain a container mirror image configuration data set consisting of a plurality of container mirror image configuration files with different contents.
The obtaining of the open source item set specifically includes:
and selecting, from the open source community code hosting platform, the open source projects whose star metric is greater than a first set value and whose Issue metric is greater than a second set value, to obtain the open source project set. Filtering on the star and Issue metrics improves the reliability of the selected open source projects, and therefore the reliability of a model trained with those projects as sample data.
The removing of the repeated container mirror image configuration files in the container mirror image database to obtain a container mirror image configuration data set composed of a plurality of container mirror image configuration files with different contents specifically includes:
and obtaining the hash value of each container image file in the container image database.
And removing repeated container mirror image configuration files in the container mirror image database according to the hash value of each container mirror image file to obtain a container mirror image configuration data set formed by a plurality of container mirror image configuration files with different contents.
Step 102: and analyzing the data in each container mirror image configuration file in the container mirror image configuration data set to obtain a functional code segment and a basic mirror image corresponding to each container mirror image configuration file.
Step 103: each of the functional code fragments is characterized as an abstract syntax tree structure.
Step 104: and obtaining a plurality of paths of the abstract syntax tree structure from the root node to each leaf node, wherein each path comprises a structure sequence from the root node to the corresponding leaf node and the corresponding leaf node.
Step 105: and taking a plurality of structure sequences corresponding to each functional code segment and corresponding leaf nodes as input, and taking a basic mirror image corresponding to each functional code segment as output to train a neural network model, so as to obtain a container basic mirror image recommendation model.
The neural network model is based on an attention mechanism.
Step 106: and obtaining a plurality of structural sequences and corresponding leaf nodes of the functional code segments to be recommended.
Step 107: and inputting the plurality of structural sequences of the functional code segments to be recommended and the corresponding leaf nodes into the container basic mirror image recommendation model to obtain the basic mirror image corresponding to the functional code segments to be recommended.
The container base image recommendation method based on configuration code representation is described in detail below; its detailed flowchart is shown in Fig. 3.
S1: Construct an active open source project set according to metrics such as stars and Issues on an open source community code hosting platform.
S2: For the active open source project set obtained in step S1, use an API (application programming interface) to check whether each open source project contains a Dockerfile image configuration file, select the projects that do, and construct a container image database from the Dockerfile configuration data they contain.
S3: For the container image database obtained in step S2, remove duplicate Dockerfiles so that only Dockerfiles with different contents are retained, and parse each Dockerfile to obtain a functional code segment X and a base image Y.
S4: Represent the functional code segment X obtained in step S3 as an abstract syntax tree (AST), and obtain the paths from the root node to the leaf nodes of the AST.
S5: Split each path obtained in step S4 into a structure sequence and a leaf node. With the structure sequences and their corresponding leaf nodes as features and the base image Y obtained in step S3 as the label (output), train a neural network model with a multi-encoding attention mechanism. The resulting model (the container base image recommendation model) can predict a base image from a Dockerfile functional code segment.
In the present invention, step S1 includes the following:
S1.1: In the collaborative development community GitHub, collect the basic information of projects through the API, and select popular open source projects according to the star metric.
S1.2: From the popular open source projects, select the active ones according to the Issue data submitted by developers, and construct the active open source project set.
In the present invention, step S2 includes the following:
S2.1: For the active open source project set obtained in step S1, obtain the names of the files contained in each project, and remove the projects that do not contain an image configuration file.
S2.2: Traverse the image configuration information of the remaining projects, and construct the container image configuration data set.
In the present invention, step S3 includes the following:
S3.1: Traverse the content of each configuration file in the data set and remove duplicate image configuration data to obtain a deduplicated image configuration data set.
S3.2: Parse the instructions of each Dockerfile image configuration file, and extract the functional instruction data X (all instructions except FROM) and the base image instruction data, namely the base image name Y declared by the FROM instruction.
In the present invention, step S4 includes the following:
S4.1: Parse the Dockerfile functional instruction data X into an AST structure, in which the root node is DOCKER-FILE, the state nodes are abstract instruction or command information, and the leaf nodes are information such as packages or ARGs.
S4.2: Traverse the abstract syntax tree of each Dockerfile to obtain a plurality of syntax paths, each path being the set of node information from the root node to a leaf node.
In the present invention, step S5 includes the following:
S5.1: Split each path into the structure sequence between the root node and the leaf node and the semantic information expressed by the leaf node.
S5.2: Input the structure sequences and semantic information into the model as features and the base image name as the label, and train to obtain an automatic base image recommendation model (the container base image recommendation model), which can automatically recommend a base image for a Dockerfile that contains only functional code segments.
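Step S5 can be illustrated with a minimal Python sketch that splits each path into a structure sequence and a leaf, then maps them to integer ids that a model could consume. The helper names `split_path` and `build_vocab`, the sample paths, and the choice to keep the root node inside the structure sequence are assumptions made for illustration; the patent does not fix these details.

```python
def split_path(path):
    """Split a root-to-leaf path into (structure sequence, leaf).

    Assumption: the structure sequence is every node except the leaf.
    """
    return tuple(path[:-1]), path[-1]


def build_vocab(items):
    """Assign consecutive integer ids in order of first appearance."""
    vocab = {}
    for item in items:
        vocab.setdefault(item, len(vocab))
    return vocab


# Hypothetical training samples: (paths of one functional fragment X, base image Y).
samples = [
    ([["DOCKER-FILE", "RUN", "APT-GET-INSTALL", "GCC"],
      ["DOCKER-FILE", "ENV", "APP_HOME"]], "ubuntu"),
    ([["DOCKER-FILE", "RUN", "PIP-INSTALL", "FLASK"]], "python"),
]

split = [([split_path(p) for p in paths], y) for paths, y in samples]
seq_vocab = build_vocab(s for feats, _ in split for s, _ in feats)
leaf_vocab = build_vocab(t for feats, _ in split for _, t in feats)
label_vocab = build_vocab(y for _, y in split)

# Each sample becomes a list of (sequence id, leaf id) features plus a label id.
encoded = [
    ([(seq_vocab[s], leaf_vocab[t]) for s, t in feats], label_vocab[y])
    for feats, y in split
]
```

The integer ids would index the embedding matrices of the neural network, and the label id selects the base image to be predicted.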
The invention achieves the following technical effects:
The invention proposes a method for recommending a base image from structured Dockerfile functional segments. Representing the functional segments of a Dockerfile as an abstract syntax tree makes the semantic and structural features of the configuration information available, and the attention mechanism in the neural network can capture the important paths, so that a correct base image is recommended. The method can effectively help developers select an appropriate base image automatically and improves container configuration efficiency.
The container base image recommendation method based on configuration code representation is described below through a specific embodiment.
S1: Construct an active open source project set.
For an open source community (for example, GitHub), the open source projects with a star metric greater than 10 and an Issue metric greater than 10 are selected, and the projects that meet these requirements form the open source project set.
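The threshold filter in this step can be sketched in Python as follows. The `Project` record, its field names, and the sample data are hypothetical; only the two thresholds of 10 come from the embodiment, and fetching the metrics from a hosting platform is left out.

```python
from dataclasses import dataclass


@dataclass
class Project:
    # Hypothetical record; real metadata would come from the hosting platform's API.
    name: str
    stars: int
    issues: int


def select_active_projects(projects, min_stars=10, min_issues=10):
    """Keep projects whose star and Issue counts both strictly exceed the thresholds."""
    return [p for p in projects if p.stars > min_stars and p.issues > min_issues]


projects = [
    Project("org/a", stars=250, issues=40),
    Project("org/b", stars=10, issues=90),  # stars not strictly greater than 10
    Project("org/c", stars=33, issues=3),   # too few Issues
]
active = select_active_projects(projects)
```

Both comparisons are strict, matching the wording "greater than" in the text.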
S2: Construct a container image database.
Each file of a project is traversed, and the project is rejected if it contains no file whose name ends with the Dockerfile suffix. To remove Dockerfiles with duplicate content, the hash value of each Dockerfile's content is computed, and the Dockerfile is added to the final container image database only if that hash value has not appeared before.
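A minimal Python sketch of the hash-based deduplication described above is given below. The text does not name a hash function, so SHA-256 from the standard library is an assumption, as are the helper names and the in-memory `(path, content)` representation of files.

```python
import hashlib


def is_dockerfile(path):
    """Assumption: a file counts if its file name ends with 'Dockerfile'."""
    return path.rsplit("/", 1)[-1].endswith("Dockerfile")


def deduplicate(files):
    """Keep the first Dockerfile seen for each distinct content hash.

    files: iterable of (path, content) pairs, content given as str.
    """
    seen = set()
    unique = []
    for path, content in files:
        if not is_dockerfile(path):
            continue
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest not in seen:  # add only if this hash has not appeared before
            seen.add(digest)
            unique.append((path, content))
    return unique


files = [
    ("a/Dockerfile", "FROM ubuntu:20.04\nRUN apt-get update\n"),
    ("b/Dockerfile", "FROM ubuntu:20.04\nRUN apt-get update\n"),  # duplicate content
    ("c/dev.Dockerfile", "FROM python:3.9\n"),
    ("c/README.md", "docs"),
]
unique = deduplicate(files)
```

Hashing the content rather than comparing files pairwise keeps the check at one set lookup per file.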
S3: Extract functional segments and base images.
To extract the functional segments and base images, comment information (lines beginning with //) is removed. For lines beginning with the FROM instruction, the name of the base image is extracted as a namespace/name (version) tuple; all instruction data other than FROM is treated as the functional code segment.
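The extraction step might look like the following Python sketch. The function name `split_dockerfile` is hypothetical. Two assumptions are worth flagging: the sketch skips lines starting with `#` (the Dockerfile comment marker) in addition to the `//` prefix mentioned in the text, and it fills in `latest` when a FROM line carries no tag, which is Docker's default. Line continuations and digest references (`@sha256:...`) are not handled.

```python
def split_dockerfile(text):
    """Return (functional_lines, base_images) for one Dockerfile.

    base_images is a list of (namespace/name, version) tuples taken from FROM lines.
    """
    functional, base_images = [], []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines ('#' is an added assumption).
        if not line or line.startswith("#") or line.startswith("//"):
            continue
        keyword, _, rest = line.partition(" ")
        if keyword.upper() == "FROM":
            # Drop flags such as --platform=...; the first remaining token is the image.
            tokens = [t for t in rest.split() if not t.startswith("--")]
            if not tokens:
                continue
            image = tokens[0]
            # Only a colon in the last path segment separates the tag.
            if ":" in image.rsplit("/", 1)[-1]:
                name, version = image.rsplit(":", 1)
            else:
                name, version = image, "latest"
            base_images.append((name, version))
        else:
            functional.append(line)
    return functional, base_images


text = """// build tools
FROM ubuntu:20.04
RUN apt-get install -y gcc
ENV APP_HOME /app
"""
X, Y = split_dockerfile(text)
```

Here `X` holds the functional code segment and `Y` the base image label used for training.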
S4: AST representation and path acquisition.
According to the information type of each instruction, the functional code segments are represented as an AST, and a plurality of paths are obtained in sequence by depth-first traversal. Common instruction content such as APT-GET-INSTALL is represented as state nodes, and PACKAGE or ARG information such as GCC or -Y is represented as leaf nodes.
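The depth-first path extraction can be sketched in Python on a hand-built tree. The dictionary node layout, the helper names, and the exact shape of the example tree are assumptions for illustration; how the patent's parser turns an instruction into state and leaf nodes is not specified beyond the examples in the text.

```python
def node(label, *children):
    """Build a tree node as a plain dict (hypothetical representation)."""
    return {"label": label, "children": list(children)}


def root_to_leaf_paths(tree):
    """Depth-first traversal that returns one list of labels per leaf."""
    paths = []

    def visit(current, prefix):
        labels = prefix + [current["label"]]
        if not current["children"]:
            paths.append(labels)  # reached a leaf: record the full path
            return
        for child in current["children"]:
            visit(child, labels)

    visit(tree, [])
    return paths


# Hand-built AST for "RUN apt-get install -y gcc" and "ENV APP_HOME ...".
ast = node(
    "DOCKER-FILE",
    node("RUN", node("APT-GET-INSTALL", node("GCC"), node("-Y"))),
    node("ENV", node("APP_HOME")),
)
paths = root_to_leaf_paths(ast)
```

Each returned list starts at the DOCKER-FILE root, passes through the state nodes, and ends at a leaf.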
Each path $x_i$ in a Dockerfile functional segment can be characterized as
$$x_i = \langle r,\ s_i,\ t_i \rangle, \qquad s_i = (n_1^{i}, n_2^{i}, \dots, n_{l_i}^{i}),$$
where $s_i$ denotes the path sequence (structure sequence) of the path, made up of state nodes $n_j^{i}$, $r$ denotes the root node, and $t_i$ denotes the leaf node, which carries the semantic information.
Each Dockerfile functional fragment can be characterized as a set of paths $\langle x_1, x_2, \dots, x_k \rangle$, where $k$ represents the number of paths.
The state node sequence (structure sequence) $s_i$ is encoded as a whole. The structure sequence encoding is represented using an embedding matrix $E^{s}$:
$$e_{s_i} = \mathrm{encode\_sequence}(s_i) = E^{s}_{s_i}.$$
A leaf node can be split into sub-information (subtokens) according to its separator information, and a learned embedding matrix $E^{subtoken}$ represents the encoding of each subtoken. The subtoken vectors are then summed to give the encoding of the complete leaf node:
$$e_{t} = \sum_{w \in \mathrm{split}(t)} E^{subtoken}_{w},$$
where $t$ represents a leaf node.
The encoding of the root node, the encoding of the structure sequence, and the encoding of the leaf node are concatenated into a new vector $z_i$:
$$z_i = [\, e_{r};\ e_{s_i};\ e_{t_i} \,],$$
where $e_{r}$ represents the encoding of the root node, $e_{s_i}$ represents the encoding of the structure sequence, and $e_{t_i}$ represents the encoding of the leaf node.
The fully connected layer learns how to combine the components of the $z_i$ corresponding to each path; the calculation is represented as:
$$\tilde{z}_i = \tanh(W z_i),$$
where $W$ represents a weight matrix and $\tanh(\cdot)$ represents the activation function.
The attention weight $\alpha_i$ of each $\tilde{z}_i$ is expressed as
$$\alpha_i = \frac{\exp(\tilde{z}_i^{T} \boldsymbol{\alpha})}{\sum_{j=1}^{k} \exp(\tilde{z}_j^{T} \boldsymbol{\alpha})},$$
where the attention vector $\boldsymbol{\alpha} \in \mathbb{R}^{2d}$ is randomly initialized and learned simultaneously with the network (the attention-based neural network model), $k$ represents the number of paths, and $\mathbb{R}^{2d}$ denotes a space of dimension $2d$.
The Dockerfile vector $v$ is expressed as:
$$v = \sum_{i=1}^{k} \alpha_i \tilde{z}_i.$$
The prediction of the attention-based neural network model is calculated as the (softmax-normalized) dot product between the Dockerfile vector and each base image label:
$$q(y_{i'}) = \frac{\exp(v^{T}\, \mathrm{image\_tag}_{i'})}{\sum_{j=1}^{Q} \exp(v^{T}\, \mathrm{image\_tag}_{j})},$$
where $Q$ represents the number of base images, $\mathrm{image\_tag}_{i'}$ denotes the $i'$-th base image, $v^{T}$ denotes the transpose of $v$, and $q(y_{i'})$ represents the probability assigned to $\mathrm{image\_tag}_{i'}$. The $\mathrm{image\_tag}_{i'}$ with the highest probability is the base image Y corresponding to $v$.
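The encoding, attention, and softmax steps described above can be illustrated with a small NumPy forward pass. This is a sketch under stated assumptions, not the patented implementation: all sizes, the random parameter values, the variable names, and the choice of a separate root encoding `e_root` are hypothetical, and no training is performed.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                           # embedding size (hypothetical)
n_seq, n_sub, n_img = 3, 5, 2   # vocabulary sizes (hypothetical)

E_seq = rng.normal(size=(n_seq, d))      # one embedding per whole structure sequence
E_sub = rng.normal(size=(n_sub, d))      # one embedding per leaf subtoken
e_root = rng.normal(size=d)              # encoding of the root node
W = rng.normal(size=(2 * d, 3 * d))      # fully connected layer
a = rng.normal(size=2 * d)               # attention vector in R^{2d}
tags = rng.normal(size=(n_img, 2 * d))   # one vector per base image label


def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()


def forward(paths):
    """paths: list of (sequence id, [subtoken ids]) for one Dockerfile fragment."""
    z = np.stack([
        np.concatenate([e_root, E_seq[s], E_sub[subs].sum(axis=0)])
        for s, subs in paths
    ])                               # shape (k, 3d): root, sequence, leaf encodings
    z_tilde = np.tanh(z @ W.T)       # shape (k, 2d): combined path vectors
    alpha = softmax(z_tilde @ a)     # shape (k,): attention weight per path
    v = alpha @ z_tilde              # shape (2d,): Dockerfile vector
    q = softmax(tags @ v)            # shape (Q,): probability per base image
    return alpha, q


alpha, q = forward([(0, [0, 1]), (2, [4])])
recommended = int(np.argmax(q))  # index of the base image with the highest probability
```

In a trained model the parameters would be learned jointly, for example by minimizing cross-entropy between `q` and the true base image label.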
S5: Each path in the container image configuration data set obtained in step S4 is split into a structure sequence and a leaf node. With the structure sequences and their corresponding leaf nodes as features and the base image as the label (output), a neural network model with a multi-encoding attention mechanism is trained, and the trained model (the container base image recommendation model) predicts the base image from a Dockerfile functional code segment.
Fig. 2 is a schematic structural diagram of the container base image recommendation system based on configuration code representation according to the present invention. As shown in Fig. 2, the system includes:
a data set obtaining module 201, configured to obtain a container image configuration data set, where the container image configuration data set includes a plurality of container image configuration files;
a data analysis module 202, configured to analyze the data in each container image configuration file in the container image configuration data set, so as to obtain the functional code segment and the base image corresponding to each container image configuration file;
a code segment representation module 203, configured to represent each functional code segment as an abstract syntax tree structure;
a multi-path obtaining module 204, configured to obtain a plurality of paths of the abstract syntax tree structure from the root node to each leaf node, where each path includes the structure sequence from the root node to the corresponding leaf node, together with that leaf node;
a container base image recommendation model training module 205, configured to train a neural network model by taking the plurality of structure sequences and corresponding leaf nodes of each functional code segment as input and the base image corresponding to each functional code segment as output, so as to obtain a container base image recommendation model;
an input feature obtaining module 206, configured to obtain a plurality of structure sequences and corresponding leaf nodes of a functional code segment to be recommended; and
a container base image recommendation model application module 207, configured to input the plurality of structure sequences and corresponding leaf nodes of the functional code segment to be recommended into the container base image recommendation model, so as to obtain the base image corresponding to the functional code segment to be recommended.
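As an illustration of what the data analysis module 202 produces, the following Python sketch separates a Dockerfile into its base image and its functional code segment. It assumes that the base image is the argument of the first `FROM` instruction and that the functional code segment is every other instruction; this split, the function name, and the sample Dockerfile are assumptions. Line continuations and multi-stage builds are deliberately not handled.

```python
def parse_dockerfile(text):
    """Return (base_image, functional_lines) for a single-stage Dockerfile."""
    base_image = None
    functional_lines = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if parts[0].upper() == "FROM" and base_image is None:
            # Ignore optional flags such as --platform=... before the image
            args = [p for p in parts[1:] if not p.startswith("--")]
            base_image = args[0] if args else None
        else:
            # Everything else is treated as the functional code segment
            # (any later FROM line is kept here unchanged in this sketch)
            functional_lines.append(line)
    return base_image, functional_lines

# Hypothetical Dockerfile content
dockerfile = """# demo service
FROM python:3.9
RUN pip install flask
CMD ["python", "app.py"]
"""
base_image, segment = parse_dockerfile(dockerfile)
print(base_image)
print(segment)
```

The base image then serves as the training label, and the remaining instructions are what the later modules turn into an abstract syntax tree.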
The data set obtaining module 201 specifically includes:
an open source project set acquisition unit, configured to acquire an open source project set;
a container image database acquisition unit, configured to select, from the open source project set, the projects that include image configuration files, so as to obtain a container image database; and
a container image configuration data set acquisition unit, configured to remove duplicate container image configuration files from the container image database, so as to obtain a container image configuration data set consisting of a plurality of container image configuration files with different contents.
The open source project set acquisition unit specifically includes:
an open source project set acquisition subunit, configured to select, from an open source community code hosting platform, the open source projects whose Star metric is greater than a first set value and whose Issue metric is greater than a second set value, so as to obtain the open source project set.
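The two selection steps above can be sketched in Python. The `Project` record, the threshold values, and the file name used to recognise an image configuration file are all hypothetical; the patent only specifies that the Star and Issue metrics must exceed a first and a second set value.

```python
from dataclasses import dataclass

@dataclass
class Project:
    name: str
    stars: int
    issues: int
    files: list  # repository-relative file paths

def select_projects(projects, min_stars, min_issues):
    """Keep projects whose Star and Issue metrics both exceed the set values."""
    return [p for p in projects if p.stars > min_stars and p.issues > min_issues]

def with_image_config(projects, config_name="Dockerfile"):
    """Keep projects that contain at least one image configuration file."""
    return [
        p for p in projects
        if any(f.split("/")[-1] == config_name for f in p.files)
    ]

# Hypothetical project metadata
projects = [
    Project("a", 1500, 300, ["Dockerfile", "src/main.py"]),
    Project("b", 50, 10, ["Dockerfile"]),
    Project("c", 2000, 500, ["README.md"]),
]
project_set = select_projects(projects, min_stars=1000, min_issues=100)
image_database = with_image_config(project_set)
print([p.name for p in image_database])
```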
The container image configuration data set acquisition unit specifically includes:
a hash value acquisition subunit, configured to obtain the hash value of each container image configuration file in the container image database; and
a duplicate removal subunit, configured to remove duplicate container image configuration files from the container image database according to the hash values of the container image configuration files, so as to obtain a container image configuration data set consisting of a plurality of container image configuration files with different contents.
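Hash-based duplicate removal of this kind can be sketched in Python as follows. SHA-256 is used here only as an example; this section does not name a hash algorithm, and the function name and sample files are hypothetical.

```python
import hashlib

def deduplicate(configs):
    """Keep the first configuration file seen for each distinct content hash.

    configs is a list of (path, content) pairs.
    """
    seen = set()
    unique = []
    for path, content in configs:
        # Files with identical content produce identical digests
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((path, content))
    return unique

# Hypothetical configuration files; the first two have identical content
configs = [
    ("a/Dockerfile", "FROM python:3.9\nRUN pip install flask\n"),
    ("b/Dockerfile", "FROM python:3.9\nRUN pip install flask\n"),
    ("c/Dockerfile", "FROM node:14\nRUN npm install\n"),
]
unique = deduplicate(configs)
print([path for path, _ in unique])
```

Comparing digests instead of full file contents keeps the check cheap when the database holds many files.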
The neural network model is based on an attention mechanism.
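One common way an attention mechanism combines per-path vectors into a single code vector is attention-weighted pooling. The Python sketch below shows that general technique under stated assumptions: each path has already been encoded as a vector, and a single learned attention vector scores the paths. It is not taken from the patent, whose network details are not given in this section, and all names and values are hypothetical.

```python
import math

def attention_pool(path_vectors, attention_vector):
    """Return (code_vector, weights) using softmax attention over paths."""
    # Score each path vector against the attention vector
    scores = [
        sum(c_k * a_k for c_k, a_k in zip(c, attention_vector))
        for c in path_vectors
    ]
    # Softmax turns the scores into weights that sum to 1
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The code vector is the weighted sum of the path vectors
    dim = len(path_vectors[0])
    code_vector = [
        sum(w * c[k] for w, c in zip(weights, path_vectors))
        for k in range(dim)
    ]
    return code_vector, weights

# Hypothetical path vectors and attention vector
code_vector, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
print(code_vector, weights)
```

Paths that align with the attention vector receive larger weights and therefore contribute more to the resulting code vector.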
The embodiments in this description are described in a progressive manner: each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be cross-referenced. Because the system disclosed in the embodiment corresponds to the method disclosed in the embodiment, its description is relatively brief; for the relevant details, refer to the description of the method.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help in understanding the method and the core concept of the present invention. A person skilled in the art may, following the idea of the present invention, change the specific embodiments and the scope of application. In view of the above, the contents of this description should not be construed as limiting the invention.

Claims (10)

1. A container base image recommendation method based on configuration code characterization, characterized by comprising the following steps:
obtaining a container image configuration data set, wherein the container image configuration data set comprises a plurality of container image configuration files;
analyzing the data in each container image configuration file in the container image configuration data set to obtain the functional code segment and the base image corresponding to each container image configuration file;
characterizing each functional code segment as an abstract syntax tree structure;
obtaining a plurality of paths of the abstract syntax tree structure from the root node to each leaf node, wherein each path comprises the structure sequence from the root node to the corresponding leaf node, together with that leaf node;
training a neural network model by taking the plurality of structure sequences and corresponding leaf nodes of each functional code segment as input and the base image corresponding to each functional code segment as output, so as to obtain a container base image recommendation model;
obtaining a plurality of structure sequences and corresponding leaf nodes of a functional code segment to be recommended;
and inputting the plurality of structure sequences and corresponding leaf nodes of the functional code segment to be recommended into the container base image recommendation model to obtain the base image corresponding to the functional code segment to be recommended.
2. The container base image recommendation method based on configuration code characterization according to claim 1, wherein obtaining the container image configuration data set specifically comprises:
obtaining an open source project set;
selecting, from the open source project set, the projects that include image configuration files, so as to obtain a container image database;
and removing duplicate container image configuration files from the container image database to obtain a container image configuration data set consisting of a plurality of container image configuration files with different contents.
3. The container base image recommendation method based on configuration code characterization according to claim 2, wherein obtaining the open source project set specifically comprises:
selecting, from an open source community code hosting platform, the open source projects whose Star metric is greater than a first set value and whose Issue metric is greater than a second set value, so as to obtain the open source project set.
4. The container base image recommendation method based on configuration code characterization according to claim 2, wherein removing duplicate container image configuration files from the container image database to obtain a container image configuration data set consisting of a plurality of container image configuration files with different contents specifically comprises:
obtaining the hash value of each container image configuration file in the container image database;
and removing duplicate container image configuration files from the container image database according to the hash value of each container image configuration file, so as to obtain a container image configuration data set consisting of a plurality of container image configuration files with different contents.
5. The method of claim 1, wherein the neural network model is an attention-based neural network model.
6. A container base image recommendation system based on configuration code characterization, the system comprising:
a data set acquisition module, configured to acquire a container image configuration data set, wherein the container image configuration data set comprises a plurality of container image configuration files;
a data analysis module, configured to analyze the data in each container image configuration file in the container image configuration data set to obtain the functional code segment and the base image corresponding to each container image configuration file;
a code segment representation module, configured to represent each functional code segment as an abstract syntax tree structure;
a multi-path obtaining module, configured to obtain a plurality of paths of the abstract syntax tree structure from the root node to each leaf node, wherein each path comprises the structure sequence from the root node to the corresponding leaf node, together with that leaf node;
a container base image recommendation model training module, configured to train a neural network model by taking the plurality of structure sequences and corresponding leaf nodes of each functional code segment as input and the base image corresponding to each functional code segment as output, so as to obtain a container base image recommendation model;
an input feature acquisition module, configured to acquire a plurality of structure sequences and corresponding leaf nodes of a functional code segment to be recommended;
and a container base image recommendation model application module, configured to input the plurality of structure sequences and corresponding leaf nodes of the functional code segment to be recommended into the container base image recommendation model to obtain the base image corresponding to the functional code segment to be recommended.
7. The system according to claim 6, wherein the data set acquisition module specifically comprises:
an open source project set acquisition unit, configured to acquire an open source project set;
a container image database acquisition unit, configured to select, from the open source project set, the projects that include image configuration files, so as to obtain a container image database;
and a container image configuration data set acquisition unit, configured to remove duplicate container image configuration files from the container image database, so as to obtain a container image configuration data set consisting of a plurality of container image configuration files with different contents.
8. The system according to claim 7, wherein the open source project set acquisition unit specifically comprises:
an open source project set acquisition subunit, configured to select, from an open source community code hosting platform, the open source projects whose Star metric is greater than a first set value and whose Issue metric is greater than a second set value, so as to obtain the open source project set.
9. The system according to claim 7, wherein the container image configuration data set acquisition unit specifically comprises:
a hash value acquisition subunit, configured to obtain the hash value of each container image configuration file in the container image database;
and a duplicate removal subunit, configured to remove duplicate container image configuration files from the container image database according to the hash values of the container image configuration files, so as to obtain a container image configuration data set consisting of a plurality of container image configuration files with different contents.
10. The configuration code characterization based container base image recommendation system according to claim 6, wherein the neural network model is an attention mechanism based neural network model.
CN202110539905.6A 2021-05-18 2021-05-18 Container base mirror image recommendation method and system based on configuration code characterization Active CN113296784B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110539905.6A CN113296784B (en) 2021-05-18 2021-05-18 Container base mirror image recommendation method and system based on configuration code characterization

Publications (2)

Publication Number Publication Date
CN113296784A (en) 2021-08-24
CN113296784B (en) 2023-11-14

Family

ID=77322600

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110539905.6A Active CN113296784B (en) 2021-05-18 2021-05-18 Container base mirror image recommendation method and system based on configuration code characterization

Country Status (1)

Country Link
CN (1) CN113296784B (en)

Also Published As

Publication number Publication date
CN113296784B (en) 2023-11-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant