WO2022093172A1 - CI/CD pipeline code file duplication notifications - Google Patents

CI/CD pipeline code file duplication notifications

Info

Publication number
WO2022093172A1
Authority
WO
WIPO (PCT)
Prior art keywords
code
pipeline
files
sections
pipeline code
Prior art date
Application number
PCT/US2020/057350
Other languages
French (fr)
Inventor
Mauricio COUTINHO MORAES
Natalia MACHADO DOS SANTOS
Jhonny Marcos ACORDI MERTZ
Original Assignee
Hewlett-Packard Development Company, L.P.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett-Packard Development Company, L.P.
Priority to PCT/US2020/057350
Publication of WO2022093172A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/75 Structural analysis for program understanding
    • G06F8/751 Code clone detection

Definitions

  • Computing devices rely on program code to execute applications and perform any variety of operations.
  • a programmer may desire to create a budgeting application or program. To do so, the programmer writes computer readable code to instruct the hardware resources of the computing device how to operate to carry out the intended function.
  • FIG. 1 is a block diagram of a system for identifying duplications in continuous integration/continuous deployment (CI/CD) pipeline code files, according to an example of the principles described herein.
  • CI/CD continuous integration/continuous deployment
  • Fig. 2 is a flowchart of a method for identifying duplications in CI/CD pipeline code files, according to an example of the principles described herein.
  • Fig. 3 is a flowchart of a method for identifying duplications in CI/CD pipeline code files, according to an example of the principles described herein.
  • FIG. 4 is a block diagram of a system for identifying duplications in CI/CD pipeline code files, according to an example of the principles described herein.
  • Figs. 5A - 5C depict the identification and removal of duplicate CI/CD code sections, according to an example of the principles described herein.
  • Fig. 6 depicts a graph model of a portion of a CI/CD pipeline code file, according to an example of the principles described herein.
  • Fig. 7 depicts a graph model of a portion of a CI/CD pipeline code file, according to an example of the principles described herein.
  • Fig. 8 depicts a template of duplicated CI/CD code sections, according to an example of the principles described herein.
  • Fig. 9 depicts a non-transitory machine-readable storage medium for identifying duplications in CI/CD pipeline code files, according to an example of the principles described herein.
  • programmers may develop applications to execute any number of operations. These applications are created using program code, which is a language that is recognizable by computer processors to carry out certain functions.
  • program code is tested and verified. For example, program code may be tested by providing inputs to the program code and observing how the program code behaves. For example, for each input into the program code, a system test may determine whether the program code produces an expected output. When performed manually, such a process may be long and complex and may involve coordination between a plurality of different systems and devices to keep track of modifications made to the program code.
  • DevOps refers to the process of coordinating program code development and operations with the intent to reduce the risk of change and increase the speed of application development and deployment.
  • an entity may employ a continuous integration/continuous deployment (CI/CD) pipeline which automates the processes of validating, building, testing, and deploying the program code.
  • a CI/CD pipeline allows developers to deliver applications to customers by introducing automation into the stages of application development.
  • the main concepts attributed to CI/CD are continuous integration, continuous delivery, and continuous deployment.
  • CI/CD introduces ongoing automation and continuous monitoring throughout the lifecycle of an application, from integration and testing phases to delivery and deployment. Taken together, these connected practices may be referred to as a CI/CD pipeline. That is, the CI/CD pipeline employs automated processes related to code quality instead of relying on manual performance of such operations.
  • During development of program code it may be that the code is changed, sometimes by different programmers working on different aspects of the code. If not coordinated, one programmer’s contributions may conflict with another programmer’s efforts, such that bugs may result.
  • Continuous integration involves programmers integrating, checking-in, or merging their work to a shared repository for an application.
  • Each code update may be verified by a rebuild of the application with the changes included. Automated tests are performed to ensure the source file operates correctly with the addition of the update. Successful integration ensures that new code changes to an application are built, tested, and merged to a shared repository. Doing so prevents having too many updates to an application in development at the same time, which updates may conflict with one another.
  • Continuous deployment or continuous delivery refers to a practice that allows for frequent releases of application updates by maintaining the application in a deployable state. Continuous deployment, by automating deployment operations, reduces the load on operations personnel to perform manual deployment operations.
  • Particular examples of processes that may be automated include compiling source files and packaging the source files into another format such as a .zip file. Another step may be to deploy this generated file into a runtime structure where it can receive and respond to HTTP requests. Yet another operation may be to run automated tests.
  • While CI/CD pipelines may simplify and increase the efficacy of application development and release, certain enhancements may further increase their efficacy and reliability.
  • some CI/CD pipelines provide platforms to allow programmers to share and leverage CI/CD pipeline components created by other programmers in a community.
  • some entities desire entity-specific reusable components of CI/CD pipelines.
  • programmers may develop their own CI/CD pipelines, with such pipeline code files accessible to a community such that other programmers in the community may leverage code already generated when creating their own CI/CD pipeline. This allows multiple teams and projects to maintain, share, and reuse CI/CD pipeline code.
  • the present specification describes systems and methods for identifying and addressing duplicated sections of CI/CD pipeline code.
  • the present specification leverages the unique characteristics of CI/CD pipelines, i.e., the stages, jobs, and steps that form the CI/CD pipeline, to identify and remove CI/CD pipeline code duplication. In general, this is accomplished by considering CI/CD pipeline code and identifying similarities by scanning CI/CD pipeline code in a repository. The duplications are reported, and in some cases a remedial action is executed.
  • identifying similar CI/CD pipeline code content is based on semi-structured text analysis and comparison of graph representations of the CI/CD pipeline code files.
  • the present specification describes a system.
  • the system includes a repository to store continuous integration/continuous deployment (CI/CD) pipeline code files, each pipeline code file to validate an application.
  • the system also includes a non-transitory machine-readable storage medium to store instructions and a processor to execute the instructions.
  • the instructions cause the processor to receive an update to application files. A reception of the update calls a pipeline code file to verify the update.
  • the instructions also cause the processor to 1) compare pipeline code files from the repository to identify duplicate code sections in each of the pipeline code files and 2) responsive to a detected duplication, present a notification of the duplication.
  • the present specification also describes a method.
  • a number of CI/CD pipeline code files are identified from a repository.
  • Each pipeline code file includes a hierarchy of stages, jobs, and steps.
  • a number of pipeline code files from the repository are compared to identify similar code sections in each of the pipeline code files. Specifically, this comparison may be done by 1) performing a semi-structured text analysis of the pipeline code files and 2) performing a graph model similarity analysis of the pipeline code files. Responsive to a detected duplication, a notification of the duplication is presented to a user.
  • the present specification also describes a non-transitory machine-readable storage medium encoded with instructions executable by a processor.
  • the machine-readable storage medium includes instructions to, when executed by the processor, cause the processor to identify, from a repository, a number of continuous integration/continuous deployment (CI/CD) pipeline code files, wherein each pipeline code file includes a hierarchy of stages, jobs, and steps.
  • the instructions also cause the processor to generate a tree graph model of each pipeline code file based on the stages, jobs, and steps.
  • the instructions also cause the processor to perform semi-structured text analysis between the pipeline code files to identify code sections with a threshold similarity. For code sections having a threshold similarity, the instructions cause the processor to identify duplicate code sections based on the tree graph models. Responsive to a detected duplication, the instructions cause the processor to present a notification of the duplication.
  • using such a system, method, and machine-readable storage medium may, for example, 1) reduce CI/CD code duplication in a repository; 2) simplify maintainability of CI/CD pipeline code files; 3) detect when a CI/CD code duplication occurs; and 4) automatically address a detected CI/CD code duplication.
  • the devices disclosed herein may address other matters and deficiencies in a number of technical areas.
  • Fig. 1 is a block diagram of a system (100) for identifying duplications in continuous integration/continuous deployment (CI/CD) pipeline code files, according to an example of the principles described herein.
  • CI/CD pipeline code files include the program code that executes the automated processes carried out on an application. That is, throughout its life, an application is subject to quality assurance and production stages.
  • Other examples of pipeline stages include development and staging. Each of these stages is executed via program code that is compiled in a CI/CD pipeline code file.
  • the system (100) includes a repository (102) to store the CI/CD pipeline code files.
  • Each pipeline code file may correspond to a different CI/CD pipeline.
  • each CI/CD pipeline may include a hierarchical structure.
  • the CI/CD pipeline may be made up of stages, with each stage being made up of jobs, and each job being made up of steps.
  • stages represent the highest level (e.g., development, quality assurance, staging, production, etc.)
  • jobs represent an intermediate level (e.g., build, deploy, test, etc.)
  • steps represent the lowest level (e.g., download a tool, run a tool, verify results, etc.).
  • the pipeline code files may include tags, or other distinguishing metadata, for each of the stages, jobs, and steps that make up the CI/CD pipeline code files.
  • different CI/CD pipelines may be developed to automate different tasks.
  • different CI/CD pipelines may include different stages with the different stages potentially having different jobs found therein. Even if different CI/CD pipelines include the same stages, each stage may be different in that different jobs and steps are used in a particular stage.
  • two CI/CD pipelines may include a testing stage. However, the steps of each testing stage may be different between the different CI/CD pipelines.
  • a quality assurance stage and a production stage may each include a build job and a deploy job. However, the quality assurance stage may have a test job that the production stage does not.
  • the system (100) also includes a non-transitory machine-readable storage medium (104) to store CI/CD code instructions (105).
  • the system (100) also includes a processor (106) to execute the CI/CD code instructions (105).
  • the non-transitory machine-readable storage medium (104) is communicatively coupled to the processor (106).
  • the non-transitory machine-readable storage medium (104) may be an electronic, magnetic, optical, or other physical storage device that contains or stores executable instructions.
  • the non-transitory machine-readable storage medium (104) may be, for example, Random-Access Memory (RAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a storage device, an optical disc, etc.
  • RAM Random-Access Memory
  • EEPROM Electrically Erasable Programmable Read-Only Memory
  • the CI/CD code instructions (105) cause the processor (106) to execute certain operations. Specifically, the CI/CD code instructions (105) cause the processor (106) to receive an update to application files. That is, as described above, programmers may submit updates to application program code to the system (100) for automated building, testing, and deployment. The processor (106) receives these update files and executes the automated processes. That is, reception of the updates calls an associated pipeline code file to verify (build, test, and deploy) the application code with the update incorporated.
  • the processor (106) may also compare pipeline code files from the repository (102) to identify duplicate code sections in each of the pipeline code files. That is, as described above, different pipeline code files pertain to different CI/CD pipelines. While each CI/CD pipeline may be distinct and unique, each may include similar code sections, which code sections may include stages, jobs, or steps executed along the CI/CD pipeline. As described above, duplicated code sections in different CI/CD pipeline code files may lead to inefficiencies in maintaining the repository (102) and may lead to other complications. Accordingly, duplicated sections are identified and addressed, so as to increase the efficiency of maintaining pipeline code files and to reduce the complexity and size of the repository (102). For example, rather than having multiple instances of a portion of code distributed throughout dozens if not hundreds of pipeline code files and consuming large amounts of space, the identification of duplicated code sections may prompt actions to reduce the duplication, complexity, and repository (102) maintenance efforts.
  • the CI/CD code instructions (105) cause the processor (106) to present a notification of the duplication.
  • a popup window, or other generated notification, may indicate to a programmer or administrator that there is a duplication between different pipeline code files.
  • the notification may indicate which of the pipeline code files includes the duplicated content.
  • the indication may be based on user input.
  • the system (100) may present a user interface wherein a programmer, administrator, or other user may enter a code section as a candidate for refactoring.
  • the processor (106) may identify pipeline code files with that code section and display a notification to the user of which pipeline code files include that particular code section. As will be described below, in some examples in addition to providing a notification, the processor (106) may take an action, such as generating a template that includes the duplicated code sections.
  • Fig. 2 is a flowchart of a method (200) for identifying duplications in CI/CD pipeline code files, according to an example of the principles described herein.
  • CI/CD pipeline code files are identified (block 201) from a repository (Fig. 1, 102).
  • each pipeline code file may include a hierarchical structure of stages, jobs, and steps.
  • the pipeline code file may include stages, where a stage is made up of a number of jobs.
  • each job may include a number of steps.
  • the identified (block 201) pipeline code files may be those files that are subject to an analysis to detect duplicated code.
  • the identification (block 201) may be based on user input.
  • a user may select a pipeline code file to determine if other pipeline code files contain duplicated content.
  • the system (Fig. 1, 100) may perform a comparison of all the pipeline code files in the repository (Fig. 1, 102). Note that in either of these examples, not all duplications may be searched for. That is, a user may input a particular code section to search for in all, or a subset of, the pipeline code files. In this example, the results may therefore not indicate all instances of duplication of any code section, but may identify duplications of the code section specifically identified and searched for.
  • the method (200) also includes comparing (block 202) a number of pipeline code files from the repository (Fig. 1, 102) to identify similar code sections in the subject pipeline code files.
  • a variety of operations may be carried out. For example, a semi-structured text analysis of the pipeline code files may be performed. That is, the pipeline code may be semi-structured text, meaning that it has certain consistent characteristics, but also has other features that are not consistent.
  • the pipeline code files include certain attributes such as stages, jobs, and steps, that may have consistent nomenclature. By comparison, the values associated with these attributes may vary.
  • the processor (Fig. 1, 106) may carry out semi-structured text searching to identify a threshold similarity between a search query or particular code section of interest and the code found in the pipeline code files.
  • Semi-structured text analysis may serve to find similarities between code sections in the pipeline code files. While semi-structured text similarity detection may provide a measure of similarity, a second comparative step may be executed to more reliably detect code section similarities. Accordingly, the semi-structured text analysis may serve as a filter, to reduce the sets of pipeline code files that are likely to present duplications.
  • the comparison may include performing a graph model similarity analysis of the pipeline code files. That is, each pipeline code file may be represented as a graph representation of the hierarchy. For example, stages of the CI/CD pipeline may be represented at one hierarchical level as stage nodes and the jobs that make up that stage may be represented as child nodes to the stage node. Similarly, the steps that make up a job may be represented as child nodes to a job node. Accordingly, the graph model analysis may identify those code sections that are similar across different pipeline code files based on similar parent/child node relationships. Note that as stages, jobs, and steps can be represented as graph models, the comparison of pipeline code file graph models may be at a stage, job, or step hierarchical level.
  • the comparison (block 202) of the number of pipeline code files includes a semi-structured text analysis and a graph model analysis of the pipeline code files to identify duplicate code sections. That is, a comparison (block 202) to identify similar code sections may include identifying identical code sections.
  • a notification is then presented (block 203).
  • the form of the notification may be of a variety of types including a pop-up window, an email message, or another type of message.
  • the present method (200) identifies duplications in code sections across CI/CD pipeline code files in a repository (Fig. 1, 102) to reduce the presence of duplicated content and to ease maintenance of the repository (Fig. 1, 102).
  • Fig. 3 is a flowchart of a method (300) for identifying duplications in CI/CD pipeline code files, according to an example of the principles described herein.
  • the method (300) may include identifying (block 301) a number of CI/CD pipeline code files and comparing (block 302) a number of these pipeline code files to identify similar code sections. Responsive to the detection of such a duplication, a notification may be presented (block 303) to the user. These operations may be performed as described above in connection with Fig. 2.
  • additional measures may be taken. That is, in addition to providing a notification of a duplication, the system (Fig. 1, 100) may take a remedial measure. For example, responsive to a determined duplicated code section, the processor (Fig. 1, 106) may delete duplicated code sections from each of the pipeline code files.
  • a template may be generated (block 304), which template includes the duplicated code section. That is, the template may refer to a file that includes duplicated program code sections. This template may be stored in the repository (Fig. 1, 102). The template may include the step attributes and values.
  • the duplicated code section may be replaced (block 305) with a pointer to the template. That is, rather than having the duplicated code section, the pipeline code files may include a pointer to the location in the repository (Fig. 1, 102) where the template is stored. Then during execution of the CI/CD pipeline, the processor (Fig. 1, 106) may, upon reaching this point in the pipeline code, call the template and execute the functionality described therein.
  • Figs. 5A - 5C depict an example of a duplicated code section, a generated template, and a reference or pointer to the template pipeline code file.
  • FIG. 4 is a block diagram of a system (100) for identifying duplications in CI/CD pipeline code files, according to an example of the principles described herein.
  • the system (100) includes a repository (102) of pipeline code files as well as a non-transitory machine-readable storage medium (104) and processor (106) to carry out the functionality of identifying and addressing duplicated code sections among the pipeline code files in the repository (102).
  • the system (100) includes additional components.
  • the system (100) includes a graph database (408) to store graph models of the pipeline code files.
  • each pipeline code file may be represented as a graph model to be compared against pipeline code files for other CI/CD pipelines.
  • the graph database (408) stores each of these graph models such that they may be compared to identify duplication. Examples of graph models are depicted below in connection with Figs. 6 and 7.
  • Figs. 5A - 5C depict the identification and removal of duplicate CI/CD code sections, according to an example of the principles described herein.
  • Fig. 5A depicts a duplicated code section (510) that has been found duplicated in a variety of pipeline code files. That is, a code section that is identically repeated in two or more pipeline code files may be a candidate for refactoring. If left in each respective location, the duplicated code section (510) may occupy more space than is needed. Moreover, if a fault is detected in one instance of the duplicated code section (510), the effort to identify and address the issue is repeated for each instance, which is ineffective and inefficient. As described above, the duplicated code section (510) may be removed from the associated pipeline code files and formed into a template (512). Note that in Fig. 5A, script that is not identically repeated is indicated with an ellipsis.
  • Fig. 5B depicts an example of such a template (512) that includes certain header information and the same content as the duplicated code section (510). That is, the template (512) includes the same task and inputs as the duplicated code section (510). Note that the ellipses in the template (512) indicate non-identically repeated input parameter values that are present in the original calls, so that the original calls that were impacted by the creation of the new template continue to work as intended.
  • the original pipeline code files may be updated to include a pointer (514) to the location of the template (512). That is, the duplicated code section (510) is erased after the creation and the template (512) may be used in its place.
  • Fig. 5C depicts a code section with a pointer (514) in place of the duplicated code section (510).
  • the pointer (514) calls the template, “NewTemplate.yaml”, two times and executes the operations described in the template (512). While Figs. 5A - 5C depict a particular example of code section identification and removal, and reliance on a pointer (514) to execute the same functionality while maintaining a single instance of the code (i.e., in the template file on the repository (Fig. 1, 102)), any kind of identically repeated code section may benefit from the approach presented above, and not only the task’s inputs as shown in the example depicted in Figs. 5A - 5C.
  • Fig. 6 depicts a graph model (616) of a portion of a CI/CD pipeline code file, according to an example of the principles described herein.
  • the system may include a graph database (Fig. 4, 408) that includes graph models (616) of the pipeline code files.
  • Fig. 6 depicts a section of such a graph model (616).
  • Fig. 6 depicts steps that may be found in different pipeline code files.
  • Step X and Step Y share an attribute of b with a value of I and an attribute of c with a value of m.
  • While Fig. 6 depicts a hierarchical indication of steps and attributes, a similar hierarchical indication may be made for stages and jobs.
  • steps include attributes
  • jobs include steps and attributes
  • stages include jobs and attributes.
  • each may be represented recursively. That is, each of Step X and Step Y may have a job node as its parent, and each job node may have a stage node as its parent. That is, in the graph model (616), jobs of a stage may be represented as child nodes to a stage node. Similarly, in the graph model (616), steps of a job may be represented as child nodes to a job node.
  • Such a graph model (616) may lead to more accurate results and an easier way to understand the pipeline code files and to troubleshoot and maintain the repository (Fig. 1, 102).
  • the child nodes may be selectively hidden or displayed. That is, the graph model (616) may be dynamic and navigable so as to present information in an easily viewable way.
  • Fig. 7 depicts graph models (616-1, 616-2) of a portion of a CI/CD pipeline code file, according to an example of the principles described herein.
  • graph models (616) may indicate a sequential relationship of stages, jobs, and steps of the CI/CD pipeline.
  • a first graph model (616-1) pertaining to a first pipeline code file may include steps F and G performed in that order.
  • a second graph model (616-2) pertaining to a second pipeline code file may include the steps G and F in that order.
  • the different order may be relevant such that these steps should not be indicated as duplicated and therefore consolidated.
  • the semi-structured text analysis and the graphical model analysis may be performed separately or in combination. For example, from a semi-structured text analysis alone, the first and second graph models (616-1, 616-2) may be determined to be identical and therefore consolidated.
  • the graphical representation allows for an analysis of sequential operation such that both are retained as separate code sections.
  • the processor (Fig. 1, 106) may compare a sequential execution of similar code sections and determine that similar code sections executed in a different order are not duplicated code sections.
  • Fig. 8 depicts a template (512) of duplicated CI/CD code sections, according to an example of the principles described herein.
  • a duplicated code section is identical to another code section.
  • a code section may not be identical, but still may be a candidate for refactoring.
  • one pipeline code file may include two jobs, a job A and a job B.
  • a second pipeline code file may include a job A. Accordingly, while not identical, portions of these pipeline code files may be managed as described above.
  • code sections pertaining to job A and job B may be removed from the first pipeline code file while the code section pertaining to job A is removed from the second pipeline code file.
  • Both code sections may be generated in a template (512) with the job B code section being conditionally executed, based upon a conditional input parameter.
  • consider a first pipeline that includes a job to deploy a web application and then finishes, and a second pipeline that performs the same web app deployment job and also includes a job to destroy the deployment before finishing.
  • Fig. 8 depicts such an example where a TemplateDeploy.yaml is called and, if a condition to destroy is met, the TemplateDestroyDeployment.yaml is called.
  • this template (512) may be called by the first pipeline and the second pipeline where the original code sections were initially found. That is, the template (512) may include multiple sections of duplicated code and at least one section of duplicated code may be conditionally executed.
  • Fig. 9 depicts a non-transitory machine-readable storage medium (918) for identifying duplications in CI/CD pipeline code files, according to an example of the principles described herein.
  • the system (Fig. 1, 100) includes various hardware components. Specifically, the system (Fig. 1, 100) includes a processor (Fig. 1, 106) and a machine-readable storage medium (918).
  • the machine-readable storage medium (918) is communicatively coupled to the processor.
  • the machine-readable storage medium (918) includes a number of instructions (920, 922, 924, 926, 928) for performing a designated function. In some examples, the instructions may be machine code and/or script code.
  • the machine-readable storage medium (918) causes the processor to execute the designated function of the instructions (920, 922, 924, 926, 928).
  • the machine-readable storage medium (918) can store data, programs, instructions, or any other machine-readable data that can be utilized to operate the system (Fig. 1, 100).
  • Machine-readable storage medium (918) can store machine-readable instructions that the processor of the system (Fig. 1, 100) can process, or execute.
  • the machine-readable storage medium (918) can be an electronic, magnetic, optical, or other physical storage device that contains or stores executable instructions.
  • Machine-readable storage medium (918) may be, for example, Random-Access Memory (RAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a storage device, an optical disc, etc.
  • the machine-readable storage medium (918) may be a non-transitory machine-readable storage medium (718).

Abstract

In an example in accordance with the present disclosure, a system is described. The system includes a repository to store continuous integration/continuous deployment (CI/CD) pipeline code files. Each pipeline code file is to validate an application. The system also includes a non-transitory memory to store instructions and a processor to execute the instructions. The instructions cause the processor to receive an update to application files, wherein reception calls a pipeline code file to verify the update. The instructions also cause the processor to compare pipeline code files from the repository to identify duplicate code sections in each of the pipeline code files and, responsive to a detected duplication, present a notification of the duplication.

Description

CI/CD PIPELINE CODE FILE DUPLICATION NOTIFICATIONS
BACKGROUND
[0001] Computing devices rely on program code to execute applications and perform any variety of operations. For example, a programmer may desire to create a budgeting application or program. To do so, the programmer writes computer readable code to instruct the hardware resources of the computing device how to operate to carry out the intended function.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] The accompanying drawings illustrate various examples of the principles described herein and are part of the specification. The illustrated examples are given merely for illustration, and do not limit the scope of the claims.
[0003] Fig. 1 is a block diagram of a system for identifying duplications in continuous integration/continuous deployment (CI/CD) pipeline code files, according to an example of the principles described herein.
[0004] Fig. 2 is a flowchart of a method for identifying duplications in CI/CD pipeline code files, according to an example of the principles described herein.
[0005] Fig. 3 is a flowchart of a method for identifying duplications in CI/CD pipeline code files, according to an example of the principles described herein.
[0006] Fig. 4 is a block diagram of a system for identifying duplications in CI/CD pipeline code files, according to an example of the principles described herein.
[0007] Figs. 5A - 5C depict the identification and removal of duplicate CI/CD code sections, according to an example of the principles described herein.
[0008] Fig. 6 depicts a graph model of a portion of a CI/CD pipeline code file, according to an example of the principles described herein.
[0009] Fig. 7 depicts a graph model of a portion of a CI/CD pipeline code file, according to an example of the principles described herein.
[0010] Fig. 8 depicts a template of duplicated CI/CD code sections, according to an example of the principles described herein.
[0011] Fig. 9 depicts a non-transitory machine-readable storage medium for identifying duplications in CI/CD pipeline code files, according to an example of the principles described herein.
[0012] Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements. The figures are not necessarily to scale, and the size of some parts may be exaggerated to more clearly illustrate the example shown. Moreover, the drawings provide examples and/or implementations consistent with the description; however, the description is not limited to the examples and/or implementations provided in the drawings.
DETAILED DESCRIPTION
[0013] As described above, programmers may develop applications to execute any number of operations. These applications are created using program code, which is a language that is recognizable by computer processors to carry out certain functions. In general, before release to the public, program code is tested and verified. For example, program code may be tested by providing inputs to the program code and observing how the program code behaves. For example, for each input into the program code, a system test may determine whether the program code produces an expected output. When performed manually, such a process may be long and complex and may involve coordination between a plurality of different systems and devices to keep track of modifications made to the program code.
[0014] DevOps refers to the process of coordinating program code development and operations with the intent to reduce the risk of change and increase the speed of application development and deployment. To expedite the development and deployment of application program code, an entity may employ a continuous integration/continuous deployment (CI/CD) pipeline which automates the processes of validating, building, testing, and deploying the program code.
[0015] A CI/CD pipeline allows developers to deliver applications to customers by introducing automation into the stages of application development. The main concepts attributed to CI/CD are continuous integration, continuous delivery, and continuous deployment. Specifically, CI/CD introduces ongoing automation and continuous monitoring throughout the lifecycle of an application, from integration and testing phases to delivery and deployment. Taken together, these connected practices may be referred to as a CI/CD pipeline. That is, the CI/CD pipeline employs automated processes related to code quality instead of relying on manual performance of such operations.
[0016] During development of program code, it may be that the code is changed, sometimes by different programmers working on different aspects of the code. If not coordinated, one programmer’s contributions may conflict with another programmer’s efforts, such that bugs may result. Continuous integration involves programmers integrating, checking-in, or merging their work to a shared repository for an application. Each code update may be verified by a rebuild of the application with the changes included. Automated tests are performed to ensure the source file operates correctly with the addition of the update. Successful integration ensures that new code changes to an application are built, tested, and merged to a shared repository. Doing so prevents having too many updates to an application in development at the same time, which updates may conflict with one another.
[0017] Continuous deployment or continuous delivery refers to a practice that allows for frequent releases of application updates by maintaining the application in a deployable state. Continuous deployment, by automating deployment operations, reduces the load on operations personnel to perform manual deployment operations.
[0018] Particular examples of processes that may be automated include compiling source files and packaging the compiled files into another format such as a .zip file. Another step may be to deploy this generated file into a runtime structure where it can receive and respond to HTTP requests. Yet another operation may be to run automated tests.
[0019] While such CI/CD pipelines may simplify and increase the efficacy of application development and release, certain enhancements may further increase their efficacy and reliability. For example, some CI/CD pipelines provide platforms to allow programmers to share and leverage CI/CD pipeline components created by other programmers in a community. Moreover, some entities desire entity-specific reusable components of CI/CD pipelines.
[0020] That is to say, programmers may develop their own CI/CD pipelines, with such pipeline code files accessible to a community such that other programmers in the community may leverage code already generated when creating their own CI/CD pipeline. This allows multiple teams and projects to maintain, share, and reuse CI/CD pipeline code.
[0021] As such, there may be multiple people from different contexts, expertise, and time-constraints working together to create and maintain CI/CD pipeline code. As the complexity and number of these scripts grows, and more and more teams and projects become engaged, complications may emerge with diverging configurations and patterns, lack of awareness and communication, duplicated efforts, and different sharing mindsets. As such, different pipeline code files may have duplicated components. Teams eventually may start losing time maintaining and supporting their own pipeline code. As a particular example, a source of duplication may be a result of parallel or uncoordinated efforts. For example, if a bug is detected in a section of duplicated code, each instance of the duplicated code may be individually repaired. Accordingly, the present specification describes systems and methods for identifying and addressing duplicated sections of CI/CD pipeline code.
[0022] Specifically, the present specification leverages the unique characteristics of CI/CD pipelines, i.e., the stages, jobs, and steps that form the CI/CD pipeline, to identify and remove CI/CD pipeline code duplication. In general, this is accomplished by considering CI/CD pipeline code and identifying similarities by scanning CI/CD pipeline code in a repository. The duplications are reported, and in some cases a remedial action is executed. In one particular example, identifying similar CI/CD pipeline code content is based on semi-structured text analysis and comparison of graph representations of the CI/CD pipeline code files.
[0023] Specifically, the present specification describes a system. The system includes a repository to store continuous integration/continuous deployment (CI/CD) pipeline code files, each pipeline code file to validate an application. The system also includes a non-transitory machine-readable storage medium to store instructions and a processor to execute the instructions. The instructions cause the processor to receive an update to application files. A reception of the update calls a pipeline code file to verify the update. The instructions also cause the processor to 1) compare pipeline code files from the repository to identify duplicate code sections in each of the pipeline code files and 2) responsive to a detected duplication, present a notification of the duplication.
[0024] The present specification also describes a method. According to the method, a number of CI/CD pipeline code files are identified from a repository. Each pipeline code file includes a hierarchy of stages, jobs, and steps. A number of pipeline code files from the repository are compared to identify similar code sections in each of the pipeline code files. Specifically, this comparison may be done by 1) performing a semi-structured text analysis of the pipeline code files and 2) performing a graph model similarity analysis of the pipeline code files. Responsive to a detected duplication, a notification of the duplication is presented to a user.
[0025] The present specification also describes a non-transitory machine-readable storage medium encoded with instructions executable by a processor. The machine-readable storage medium includes instructions to, when executed by the processor, cause the processor to identify, from a repository, a number of continuous integration/continuous deployment (CI/CD) pipeline code files, wherein each pipeline code file includes a hierarchy of stages, jobs, and steps. The instructions also cause the processor to generate a tree graph model of each pipeline code file based on the stages, jobs, and steps. The instructions also cause the processor to perform semi-structured text analysis between the pipeline code files to identify code sections with a threshold similarity. For code sections having a threshold similarity, the instructions cause the processor to identify duplicate code sections based on the tree graph models. Responsive to a detected duplication, the instructions cause the processor to present a notification of the duplication.
[0026] In summary, using such a system, method, and machine-readable storage medium may, for example, 1) reduce CI/CD code duplication in a repository; 2) simplify maintenance of CI/CD pipeline code files; 3) detect when a CI/CD code duplication occurs; and 4) automatically address a detected CI/CD code duplication. However, it is contemplated that the devices disclosed herein may address other matters and deficiencies in a number of technical areas.
[0027] As used in the present specification and in the appended claims, the term “a number of” or similar language is meant to be understood broadly as any positive number including 1 to infinity.
[0028] Fig. 1 is a block diagram of a system (100) for identifying duplications in continuous integration/continuous deployment (CI/CD) pipeline code files, according to an example of the principles described herein. As described above, CI/CD pipeline code files include the program code that executes the automated processes carried out on an application. That is, throughout its life, an application is subject to quality assurance and production stages. Other examples of pipeline stages include development and staging. Each of these stages are executed via program code that is compiled in a CI/CD pipeline code file. Accordingly, the system (100) includes a repository (102) to store the CI/CD pipeline code files. Each pipeline code file may correspond to a different CI/CD pipeline. In general, each CI/CD pipeline may include a hierarchical structure. For example, the CI/CD pipeline may be made up of stages, with each stage being made up of jobs, and each job being made up of steps. As such, stages represent the highest level (e.g., development, quality assurance, staging, production, etc.), jobs represent an intermediate level (e.g., build, deploy, test, etc.), and steps represent the lowest level (e.g., download a tool, run a tool, verify results, etc.). Accordingly, the pipeline code files may include tags, or other distinguishing metadata, for each of the stages, jobs, and steps that make up the CI/CD pipeline code files.
[0029] That is, different CI/CD pipelines may be developed to automate different tasks. Specifically, different CI/CD pipelines may include different stages with the different stages potentially having different jobs found therein. Even if different CI/CD pipelines include the same stages, each stage may be different in that different jobs and steps are used in a particular stage. For example, two CI/CD pipelines may include a testing stage. However, the steps of each testing stage may be different between the different CI/CD pipelines. As another example, a quality assurance stage and a production stage may each include a build job and a deploy job. However, the quality assurance stage may have a test job that the production stage does not.
[0030] The system (100) also includes a non-transitory machine-readable storage medium (104) to store CI/CD code instructions (105). The system (100) also includes a processor (106) to execute the CI/CD code instructions (105). The non-transitory machine-readable storage medium (104) is communicatively coupled to the processor (106). The non-transitory machine-readable storage medium (104) may be an electronic, magnetic, optical, or other physical storage device that contains or stores executable instructions. The non-transitory machine-readable storage medium (104) may be, for example, Random-Access Memory (RAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a storage device, an optical disc, etc.
[0031] As described above, the CI/CD code instructions (105) cause the processor (106) to execute certain operations. Specifically, the CI/CD code instructions (105) cause the processor (106) to receive an update to application files. That is, as described above, programmers may submit updates to application program code to the system (100) for automated building, testing, and deployment. The processor (106) receives these update files and executes the automated processes. That is, reception of the updates calls an associated pipeline code file to verify (build, test, and deploy) the application code with the update incorporated.
[0032] The processor (106) may also compare pipeline code files from the repository (102) to identify duplicate code sections in each of the pipeline code files. That is, as described above, different pipeline code files pertain to different CI/CD pipelines. While each CI/CD pipeline may be distinct and unique, each may include similar code sections, which code sections may include stages, jobs, or steps executed along the CI/CD pipeline. As described above, duplicated code sections in different CI/CD pipeline code files may lead to inefficiencies in maintaining the repository (102) and may lead to other complications. Accordingly, duplicated sections are identified and addressed, so as to increase the efficiency of maintaining pipeline code files and to reduce the complexity and size of the repository (102). For example, rather than having multiple instances of a portion of code distributed throughout dozens if not hundreds of pipeline code files and consuming large amounts of space, the identification of duplicated code sections may prompt actions to reduce the duplication, complexity, and repository (102) maintenance efforts.
[0033] Responsive to a detected duplication, the CI/CD code instructions (105) cause the processor (106) to present a notification of the duplication. For example, a popup window, or other generated notification may indicate to a programmer or administrator that there is a duplication between different pipeline code files. In an example, the notification may indicate which of the pipeline code files includes the duplicated content. In some examples, rather than indicating each duplication that occurs throughout the corpus of pipeline code files in the repository (102), the indication may be based on user input. For example, the system (100) may present a user interface wherein a programmer, administrator, or other user may enter a code section as a candidate for refactoring. Responsive to this query, the processor (106) may identify pipeline code files with that code section and display a notification to the user of which pipeline code files include that particular code section. As will be described below, in some examples in addition to providing a notification, the processor (106) may take an action, such as generating a template that includes the duplicated code sections.
[0034] Fig. 2 is a flowchart of a method (200) for identifying duplications in CI/CD pipeline code files, according to an example of the principles described herein. According to the method (200), CI/CD pipeline code files are identified (block 201) from a repository (Fig. 1, 102). As described above, each pipeline code file may include a hierarchical structure of stages, jobs, and steps. Specifically, the pipeline code file may include stages, where a stage is made up of a number of jobs. Similarly, each job may include a number of steps. The identified (block 201) pipeline code files may be those files that are subject to an analysis to detect duplicated code. In one example, the identification (block 201) may be based on user input. For example, a user may select a pipeline code file to determine if other pipeline code files contain duplicated content.
[0035] In another example, the system (Fig. 1, 100) may perform a comparison of all the pipeline code files in the repository (Fig. 1, 102). Note that in either of these examples, not all duplications may be searched for. That is, a user may input a particular code section to search for in all, or a subset of, the pipeline code files. In this example, the results may therefore not indicate all instances of duplication of any code section, but may identify duplications of the code section specifically identified and searched for.
[0036] The method (200) also includes comparing (block 202) a number of pipeline code files from the repository (Fig. 1, 102) to identify similar code sections in the subject pipeline code files. To identify the similar code sections, a variety of operations may be carried out. For example, a semi-structured text analysis of the pipeline code files may be performed. That is, the pipeline code may be semi-structured text, meaning that it has certain consistent characteristics but also has other features that are not consistent. For example, the pipeline code files include certain attributes, such as stages, jobs, and steps, that may have consistent nomenclature. By comparison, the values associated with these attributes may vary. Accordingly, the processor (Fig. 1, 106) may carry out semi-structured text searching to identify a threshold similarity between a search query or particular code section of interest and the code found in the pipeline code files.
[0037] Take as an example, the code “bash: curl -verbose http://aSite.com” and the code “script: curl http://anotherSite.com.” If such lines were analysed with semi-structured techniques, they may be considered much more similar (or near) to each other than if they were compared with free-text techniques as “bash” and “script” are closely related and each make a curl call with different arguments.
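As a minimal sketch of the distinction drawn in this example, the following Python compares the two lines once as free text and once by attribute name and value. The grouping of "bash" and "script" as equivalent attribute names, the equal weighting of the partial scores, and all function names are assumptions made for illustration; the specification does not prescribe a particular scoring scheme.

```python
# Hypothetical groups of attribute names treated as interchangeable.
EQUIVALENT_KEYS = [{"bash", "script"}]

LINE_A = "bash: curl -verbose http://aSite.com"
LINE_B = "script: curl http://anotherSite.com"


def jaccard(a, b):
    """Jaccard overlap of two token collections (1.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def free_text_similarity(line_a, line_b):
    """Treat each line as an unstructured bag of whitespace-separated tokens."""
    return jaccard(line_a.split(), line_b.split())


def semi_structured_similarity(line_a, line_b):
    """Score the attribute name and the attribute value separately."""
    key_a, _, value_a = line_a.partition(": ")
    key_b, _, value_b = line_b.partition(": ")
    keys_match = key_a == key_b or any(
        key_a in group and key_b in group for group in EQUIVALENT_KEYS
    )
    tokens_a, tokens_b = value_a.split(), value_b.split()
    # The first token of the value is taken to be the invoked command.
    same_command = bool(tokens_a) and bool(tokens_b) and tokens_a[0] == tokens_b[0]
    argument_score = jaccard(tokens_a[1:], tokens_b[1:])
    value_score = 0.5 * same_command + 0.5 * argument_score
    return 0.5 * keys_match + 0.5 * value_score
```

Under these assumed weights the two lines score 0.75 with the attribute-aware comparison, against roughly 0.17 when compared as unstructured token sets.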
[0038] Semi-structured text analysis may serve to find similarities between code sections in the pipeline code files. While semi-structured text similarity detection may provide a measure of similarity, a second comparative step may be executed to more reliably detect code section similarities. Accordingly, the semi-structured text analysis may serve as a filter that narrows the pipeline code files down to those likely to contain duplications.
[0039] Accordingly, the comparison (block 202) may include performing a graph model similarity analysis of the pipeline code files. That is, each pipeline code file may be represented as a graph representation of the hierarchy. For example, stages of the CI/CD pipeline may be represented at one hierarchical level as stage nodes, and the jobs that make up a stage may be represented as child nodes to that stage node. Similarly, the steps that make up a job may be represented as child nodes to a job node. Accordingly, the graph model analysis may identify code sections that are similar across different pipeline code files based on similar parent/child node relationships. Note that as stages, jobs, and steps can each be represented as graph models, the comparison of pipeline code file graph models may be at a stage, job, or step hierarchical level.
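The parent/child representation described above can be sketched as follows. This is an illustrative simplification, not the specification's implementation: pipelines are assumed to be nested dictionaries of stage, job, and step names, nodes are identified by name alone, and the function names are hypothetical.

```python
def build_edges(pipeline):
    """Return the (parent, child) edges of a stages -> jobs -> steps hierarchy.

    `pipeline` maps each stage name to a dict that maps each job name
    to a list of step names.
    """
    edges = set()
    for stage, jobs in pipeline.items():
        edges.add(("pipeline", f"stage:{stage}"))
        for job, steps in jobs.items():
            edges.add((f"stage:{stage}", f"job:{job}"))
            for step in steps:
                edges.add((f"job:{job}", f"step:{step}"))
    return edges


def shared_edges(pipeline_a, pipeline_b):
    """Parent/child relationships that appear in both pipelines."""
    return build_edges(pipeline_a) & build_edges(pipeline_b)


# Example pipelines built from the stage, job, and step names used in the text.
PIPELINE_A = {
    "quality assurance": {
        "build": ["download a tool", "run a tool"],
        "test": ["verify results"],
    }
}
PIPELINE_B = {"production": {"build": ["download a tool", "run a tool"]}}
```

Here the two pipelines share the job-to-step relationships of the build job even though the job sits under differently named stages, which is the kind of overlap a comparison at the job level would surface.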
[0040] Accordingly, as described above, the comparison (block 202) of the number of pipeline code files includes a semi-structured text analysis and a graph model analysis of the pipeline code files to identify duplicate code sections. That is, a comparison (block 202) to identify similar code sections may include identifying identical code sections.
[0041] Responsive to a detected duplication, a notification is then presented (block 203). The form of the notification may be of a variety of types including a pop-up window, an email message, or another type of message. Accordingly, the present method (200) identifies duplications in code sections across CI/CD pipeline code files in a repository (Fig. 1, 102) to reduce the presence of duplicated content and to ease the repository (Fig. 1, 102) maintenance.
[0042] Fig. 3 is a flowchart of a method (300) for identifying duplications in CI/CD pipeline code files, according to an example of the principles described herein. As described above, the method (300) may include identifying (block 301) a number of CI/CD pipeline code files and comparing (block 302) a number of these pipeline code files to identify similar code sections. Responsive to the detection of such a duplication, a notification may be presented (block 303) to the user. These operations may be performed as described above in connection with Fig. 2.
[0043] In some examples, additional measures may be taken. That is, in addition to providing a notification of a duplication, the system (Fig. 1, 100) may take a remedial measure. For example, responsive to detecting a duplicated code section, the processor (Fig. 1, 106) may delete the duplicated code section from each of the pipeline code files. A template may be generated (block 304), which template includes the duplicated code section. That is, the template may refer to a file that includes duplicated program code sections. This template may be stored in the repository (Fig. 1, 102). The template may include the step attributes and values.
[0044] In the pipeline code files that include the duplicated content, the duplicated code section may be replaced (block 305) with a pointer to the template. That is, rather than having the duplicated code section, the pipeline code files may include a pointer to the location in the repository (Fig. 1, 102) where the template is stored. Then during execution of the CI/CD pipeline, the processor (Fig. 1, 106) may, upon reaching this point in the pipeline code, call the template and execute the functionality described therein. Figs. 5A - 5C depict an example of a duplicated code section, a generated template, and a reference or pointer to the template.
[0045] Fig. 4 is a block diagram of a system (100) for identifying duplications in CI/CD pipeline code files, according to an example of the principles described herein. As described above, the system (100) includes a repository (102) of pipeline code files as well as a non-transitory machine-readable storage medium (104) and processor (106) to carry out the functionality of identifying and addressing duplicated code sections among the pipeline code files in the repository (102). In this example, the system (100) includes additional components. Specifically, the system (100) includes a graph database (408) to store graph models of the pipeline code files.
[0046] That is, as described above, each pipeline code file may be represented as a graph model to be compared against pipeline code files for other CI/CD pipelines. Accordingly, the graph database (408) stores each of these graph models such that they may be compared to detect duplication. Examples of graph models are depicted below in connection with Figs. 6 and 7.
[0047] Figs. 5A - 5C depict the identification and removal of duplicate CI/CD code sections, according to an example of the principles described herein. Specifically, Fig. 5A depicts a duplicated code section (510) that has been found in a variety of pipeline code files. That is, a code section that is identically repeated in two or more pipeline code files may be a candidate for refactoring. If left in each respective location, the duplicated code section (510) may occupy more space than is needed. Moreover, if a fault is detected in one instance of the duplicated code section (510), the effort to identify and address the issue is repeated for each instance, which is ineffective and inefficient. As described above, the duplicated code section (510) may be removed from the associated pipeline code files and formed into a template (512). Note that in Fig. 5A, script that is not identically repeated is indicated with ellipses.
[0048] Fig. 5B depicts an example of such a template (512) that includes certain header information and the same content as the duplicated code section (510). That is, the template (512) includes the same task and inputs as the duplicated code section (510). Note that the ellipses in the template (512) indicate input parameter values that are not identically repeated and that are present in the original calls, so that the original calls affected by the creation of the new template continue to work as intended.
[0049] To ensure the original pipeline code files execute as expected, the original pipeline code files may be updated to include a pointer (514) to the location of the template (512). That is, the duplicated code section (510) is erased after the template (512) is created, and the template (512) may be used in its place. Specifically, Fig. 5C depicts a code section with a pointer (514) in place of the duplicated code section (510). As depicted in Fig. 5C and as described above, the pointer (514) calls the template, “NewTemplate.yaml,” two times and executes the operations described in the template (512). While Figs. 5A - 5C depict a particular example of code section identification and removal and reliance on a pointer (514) to execute the same functionality while maintaining a single instance of the code (i.e., in the template file on the repository (Fig. 1, 102)), any kind of identically repeated code section may benefit from the approach presented above, and not only the task’s inputs as shown in the example depicted in Figs. 5A - 5C.
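The template-and-pointer replacement of Figs. 5A - 5C can be sketched in Python as follows. This is a minimal illustration under stated assumptions: pipeline code files are modeled as lists of text lines, the pointer is written as a hypothetical `template: <name>` line, and the function name and sample lines are invented for the example. Only the template name “NewTemplate.yaml” comes from the description.

```python
def extract_template(pipelines, section, template_name):
    """Move an identically repeated section into a template.

    `pipelines` maps a file name to its list of lines and `section` is the
    list of lines repeated across files. Returns the new template and the
    updated pipelines, in which each occurrence of the section is replaced
    by a single pointer line.
    """
    if not section:
        raise ValueError("section must contain at least one line")
    pointer = f"template: {template_name}"  # hypothetical pointer syntax
    width = len(section)
    updated = {}
    for name, lines in pipelines.items():
        out, i = [], 0
        while i < len(lines):
            if lines[i:i + width] == section:
                out.append(pointer)  # replace the whole duplicated section
                i += width
            else:
                out.append(lines[i])
                i += 1
        updated[name] = out
    return {template_name: list(section)}, updated


# Hypothetical pipeline code files sharing a two-line section.
PIPELINES = {
    "first": ["checkout", "task: X", "inputs: Y", "publish"],
    "second": ["task: X", "inputs: Y", "notify"],
}
TEMPLATES, UPDATED = extract_template(
    PIPELINES, ["task: X", "inputs: Y"], "NewTemplate.yaml"
)
```

After the call, a single copy of the section lives in the template, and each pipeline holds only the pointer line where the section used to be.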
[0050] Fig. 6 depicts a graph model (616) of a portion of a CI/CD pipeline code file, according to an example of the principles described herein. As described above, the system (Fig. 1, 100) may include a graph database (Fig. 4, 408) that includes graph models (616) of the pipeline code files. Fig. 6 depicts a section of such a graph model (616). Specifically, Fig. 6 depicts steps that may be found in different pipeline code files. As depicted in Fig. 6, Step X and Step Y share an attribute of b with a value of I and an attribute of c with a value of m. While Fig. 6 depicts a hierarchical indication of steps and attributes, a similar hierarchical indication may be made for stages and jobs.
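To illustrate the shared attributes of Fig. 6, the following Python sketch models each step as a dictionary of attribute names and values and reports the pairs two steps have in common. The shared pairs (b with value I, c with value m) follow the description; the remaining attributes and the function name are assumptions added so that the two steps differ.

```python
def shared_attributes(step_a, step_b):
    """Return the attribute/value pairs present in both steps."""
    return {
        name: value
        for name, value in step_a.items()
        if name in step_b and step_b[name] == value
    }


# Attributes "a" and "d" are hypothetical; "b" and "c" follow the description.
STEP_X = {"a": "k", "b": "I", "c": "m"}
STEP_Y = {"b": "I", "c": "m", "d": "n"}
```

In a graph model, each shared pair would correspond to an attribute node reachable from both step nodes.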
[0051] That is, steps include attributes, jobs include steps and attributes, and stages include jobs and attributes. Accordingly, each may be represented recursively. That is, each of Step X and Step Y may have a parent job node, and each job node may have a parent stage node. That is, in the graph model (616), jobs of a stage may be represented as child nodes to a stage node. Similarly, in the graph model (616), steps of a job may be represented as child nodes to a job node. Such a graph model (616) may lead to more accurate results and an easier way to understand the pipeline code files and to troubleshoot and maintain the repository (Fig. 1, 102). In this example, the child nodes may be selectively hidden or displayed. That is, the graph model (616) may be dynamic and navigable so as to present information in an easily viewable way.
[0052] Fig. 7 depicts graph models (616-1, 616-2) of a portion of a CI/CD pipeline code file, according to an example of the principles described herein. In the example depicted in Fig. 7, graph models (616) may indicate a sequential relationship of stages, jobs, and steps of the CI/CD pipeline. As depicted in Fig. 7, it may be the case that two pipeline code files include similar steps, but in different orders. For example, a first graph model (616-1) pertaining to a first pipeline code file may include steps F and G performed in that order. However, a second graph model (616-2) pertaining to a second pipeline code file may include the steps G and F in that order. Although the associated pipeline code files include similar steps, the different order may be relevant, such that these steps should not be indicated as duplicated and should not be consolidated. Note that in some examples the semi-structured text analysis and the graph model analysis may be performed separately or in combination. For example, from a semi-structured text analysis alone, the first and second graph models (616-1, 616-2) might be determined to be identical and therefore consolidated. However, as described above, the graphical representation allows for an analysis of sequential operation such that both are retained as separate code sections. In summary, the processor (Fig. 1, 106) may compare a sequential execution of similar code sections and determine that similar code sections executed in a different order are not duplicated code sections.
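The order-sensitive check summarized above can be sketched in a few lines of Python. The function names are hypothetical, and steps are reduced to their labels (F and G, as in Fig. 7) for illustration.

```python
from collections import Counter


def same_content(steps_a, steps_b):
    """True when both sections contain the same steps, ignoring order."""
    return Counter(steps_a) == Counter(steps_b)


def is_duplicate_section(steps_a, steps_b):
    """True only when the same steps are executed in the same order."""
    return list(steps_a) == list(steps_b)


FIRST_MODEL = ["F", "G"]   # steps of the first graph model (616-1)
SECOND_MODEL = ["G", "F"]  # steps of the second graph model (616-2)
```

With these inputs, `same_content` reports a match while `is_duplicate_section` does not, so the two sections would be retained as separate code sections.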
[0053] Fig. 8 depicts a template (512) of duplicated CI/CD code sections, according to an example of the principles described herein. As described above, in one example, a duplicated code section is identical to another code section. In another example, a code section may not be identical, but still may be a candidate for refactoring. For example, one pipeline code file may include two jobs, a job A and a job B. A second pipeline code file may include a job A. Accordingly, while not identical, portions of these pipeline code files may be managed as described above. In this example, code sections pertaining to job A and job B may be removed from the first pipeline code file while the code section pertaining to job A is removed from the second pipeline code file. Both code sections may be generated in a template (512) with the job B code section being conditionally executed, based upon a conditional input parameter. Take, as a particular example, a first pipeline that includes a job to deploy a web application and then finishes, and a second pipeline that performs the same web app deployment job and also includes a job to destroy the deployment before finishing. Fig. 8 depicts such an example where a TemplateDeploy.yaml is called and, if a condition to destroy is met, the TemplateDestroyDeployment.yaml is called. As with the above examples, this template (512) may be called by the first pipeline and the second pipeline where the original code sections were initially found. That is, the template (512) may include multiple sections of duplicated code and at least one section of duplicated code may be conditionally executed.
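A template with one conditionally executed section might be sketched in Python as below. The functions `build_template` and `run_template`, the `destroy` parameter, and the job names "deploy" and "destroy" are hypothetical stand-ins for the conditional input parameter and the deploy and destroy jobs of the example; this is an illustration under those assumptions, not the disclosed implementation.

```python
def build_template(first_jobs, second_jobs):
    """Split jobs into those shared by both pipelines and those found in only one."""
    shared = [j for j in first_jobs if j in second_jobs]
    conditional = [j for j in first_jobs + second_jobs if j not in shared]
    return {"always": shared, "conditional": conditional}


def run_template(template, destroy=False):
    """Return the jobs that would run for a caller passing the `destroy` parameter."""
    jobs = list(template["always"])
    if destroy:                       # conditional input parameter
        jobs.extend(template["conditional"])
    return jobs


# The first pipeline only deploys; the second deploys and then destroys.
template = build_template(["deploy"], ["deploy", "destroy"])
```

Both pipelines would then call the same template: the first with the default `destroy=False`, so only the shared job runs, and the second with `destroy=True`, so the conditional job runs as well.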
[0054] Fig. 9 depicts a non-transitory machine-readable storage medium (918) for identifying duplications in CI/CD pipeline code files, according to an example of the principles described herein. To achieve its desired functionality, the system (Fig. 1, 100) includes various hardware components. Specifically, the system (Fig. 1, 100) includes a processor (Fig. 1, 106) and a machine-readable storage medium (918). The machine-readable storage medium (918) is communicatively coupled to the processor. The machine-readable storage medium (918) includes a number of instructions (920, 922, 924, 926, 928) for performing a designated function. In some examples, the instructions may be machine code and/or script code.
[0055] The machine-readable storage medium (918) causes the processor to execute the designated function of the instructions (920, 922, 924, 926, 928). The machine-readable storage medium (918) can store data, programs, instructions, or any other machine-readable data that can be utilized to operate the system (Fig. 1, 100). The machine-readable storage medium (918) can store machine-readable instructions that the processor of the system (Fig. 1, 100) can process or execute. The machine-readable storage medium (918) can be an electronic, magnetic, optical, or other physical storage device that contains or stores executable instructions. The machine-readable storage medium (918) may be, for example, Random-Access Memory (RAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a storage device, an optical disc, etc. The machine-readable storage medium (918) may be a non-transitory machine-readable storage medium (918).
[0056] Referring to Fig. 9, identify instructions (920), when executed by the processor (Fig. 1, 106), cause the processor (Fig. 1, 106) to identify, from a repository (Fig. 1, 102), a number of CI/CD pipeline code files, wherein each pipeline code file includes a hierarchy of stages, jobs, and steps. Generate graph model instructions (922), when executed by the processor, cause the processor to generate a tree graph model for each pipeline code file based on stages, jobs, and steps.
[0057] Semi-structured text analysis instructions (924), when executed by the processor, cause the processor to perform semi-structured text analysis between the pipeline code files to identify code sections with a threshold similarity. Identify duplicate instructions (926), when executed by the processor, also cause the processor to, for code sections having a threshold similarity, identify duplicate code sections based on the tree graph models. Notify instructions (928), when executed by the processor, also cause the processor to, responsive to a detected duplication, present a notification of a duplication.
[0058] In summary, using such a system, method, and machine-readable storage medium may, for example, 1) reduce CI/CD code duplication in a repository; 2) simplify maintainability of CI/CD pipeline code files; 3) detect when a CI/CD code duplication occurs; and 4) automatically address a detected CI/CD code duplication. However, it is contemplated that the devices disclosed herein may address other matters and deficiencies in a number of technical areas, for example.
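The sequence of instructions (920 through 928) described with reference to Fig. 9 can be sketched end to end in Python. This is an illustration under stated assumptions, not the disclosed implementation: `difflib.SequenceMatcher` from the Python standard library stands in for the semi-structured text analysis, each pipeline code file is reduced to a hypothetical dictionary of job names to ordered step lists, the tree-based check is simplified to requiring identical ordered step lists, and the file names, job names, and 0.9 threshold are invented for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def text_similarity(steps_a, steps_b):
    """Order-insensitive text score over sorted step lines (stand-in for 924)."""
    a = "\n".join(sorted(steps_a))
    b = "\n".join(sorted(steps_b))
    return SequenceMatcher(None, a, b).ratio()


def find_duplicates(pipelines, threshold=0.9):
    """Return notifications for jobs that pass both the text and the order check."""
    notes = []
    for (file_a, jobs_a), (file_b, jobs_b) in combinations(pipelines.items(), 2):
        for name_a, steps_a in jobs_a.items():
            for name_b, steps_b in jobs_b.items():
                if text_similarity(steps_a, steps_b) < threshold:
                    continue          # below the threshold similarity (924)
                if steps_a != steps_b:
                    continue          # similar text, different order (926)
                notes.append(f"{file_a}:{name_a} duplicates {file_b}:{name_b}")  # 928
    return notes


# Hypothetical repository contents: file name -> job name -> ordered steps.
pipelines = {
    "a.yaml": {"build": ["checkout", "compile", "test"]},
    "b.yaml": {"build": ["checkout", "compile", "test"]},
    "c.yaml": {"build": ["compile", "checkout", "test"]},
}
```

With this data, `find_duplicates(pipelines)` reports only the pair from `a.yaml` and `b.yaml`. The job in `c.yaml` passes the text threshold but is filtered out by the order check, which is the reason for running the second, structure-aware pass on candidates rather than relying on text similarity alone.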

Claims

CLAIMS What is claimed is:
1. A system, comprising: a repository to store continuous integration/continuous development (CI/CD) pipeline code files, each pipeline code file to validate an application; a non-transitory machine-readable storage medium to store instructions; and a processor to execute the instructions, the instructions to cause the processor to: receive an update to application files, wherein reception calls a pipeline code file to verify the update; compare pipeline code files from the repository to identify duplicate code sections in the pipeline code files; and responsive to a detected duplication, present a notification of the duplication.
2. The system of claim 1, wherein comparing pipeline code files from the repository to identify duplicate code sections in the pipeline code files comprises at least one of a semi-structured text analysis and a graph model analysis of the pipeline code files to identify duplicate code sections.
3. The system of claim 1, wherein the instructions further cause the processor to delete duplicated code sections from each pipeline code file.
4. The system of claim 3, wherein the instructions further cause the processor to, responsive to a detected duplication: generate a template to include duplicated code sections; and include in each pipeline code file a pointer to the template.
5. The system of claim 1, further comprising a graph database to store graph models of the pipeline code files.
6. The system of claim 5, wherein: a graph model represents a pipeline code file as a hierarchy; jobs of a stage are represented as child nodes to a stage node; and steps of a job are represented as child nodes to a job node.
7. The system of claim 5, wherein a graph model indicates a sequential relationship of stages, jobs, and steps of the CI/CD pipeline.
8. A method, comprising: identifying, from a repository, a number of continuous integration/continuous development (CI/CD) pipeline code files, wherein each pipeline code file comprises a hierarchy of stages, jobs, and steps; comparing a number of pipeline code files from the repository to identify similar code sections in each of the pipeline code files by: performing a semi-structured text analysis of the pipeline code files; and performing a graph model similarity analysis of the pipeline code files; and responsive to a detected duplication, present a notification of the duplication.
9. The method of claim 8, wherein identifying similar code sections comprises identifying code sections that are identical.
10. The method of claim 8, further comprising, responsive to a determined duplicated code section: generating a template which includes the duplicated code section; and replacing the duplicated code section in each pipeline code file with a pointer to the template.
11. The method of claim 10, wherein: a template comprises multiple sections of duplicated code; and at least one section of duplicated code is conditionally executed.
12. The method of claim 10, wherein comparing the number of pipeline code files comprises comparing a sequential execution of similar code sections.
13. The method of claim 12, wherein similar code sections executed in a different order between pipeline code files are not identified as duplicated code sections.
14. A non-transitory machine-readable storage medium encoded with instructions executable by a processor, the machine-readable storage medium comprising instructions to, when executed by the processor cause the processor to: identify, from a repository, a number of continuous integration/continuous development (CI/CD) pipeline code files, wherein each pipeline code file comprises a hierarchy of stages, jobs, and steps; generate a tree graph model of each pipeline code file based on the stages, jobs, and steps; performing semi-structured text analysis between the pipeline code files to identify code sections with a threshold similarity; and for code sections having the threshold similarity, identify duplicate code sections based on the tree graph models; responsive to a detected duplication, present a notification of the duplication.
15. The non-transitory machine-readable storage medium of claim 14, wherein the machine-readable storage medium comprises instructions to generate a template based on duplicated code sections, wherein duplicated step attributes and values are included in the template.
PCT/US2020/057350 2020-10-26 2020-10-26 Ci/cd pipeline code file duplication notifications WO2022093172A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2020/057350 WO2022093172A1 (en) 2020-10-26 2020-10-26 Ci/cd pipeline code file duplication notifications

Publications (1)

Publication Number Publication Date
WO2022093172A1 (en) 2022-05-05

Family

ID=81384267

Country Status (1)

Country Link
WO (1) WO2022093172A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6715145B1 (en) * 1999-08-31 2004-03-30 Accenture Llp Processing pipeline in a base services pattern environment
US20100017459A1 (en) * 2000-06-30 2010-01-21 International Business Machines Corporation Device and method for updating code
US20100057663A1 (en) * 2008-08-29 2010-03-04 Oracle International Corporation Techniques for matching a certain class of regular expression-based patterns in data streams
US8037453B1 (en) * 2006-09-13 2011-10-11 Urbancode, Inc. System and method for continuous software configuration, test and build management
US20110307879A1 (en) * 2009-02-24 2011-12-15 Toyota Jidosha Kabushiki Kaisha Program update device, program update method, and information processing device
US20140189641A1 (en) * 2011-09-26 2014-07-03 Amazon Technologies, Inc. Continuous deployment system for software development

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20960094

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20960094

Country of ref document: EP

Kind code of ref document: A1