US11886850B2 - Transformation templates to automate aspects of computer programming - Google Patents
Transformation templates to automate aspects of computer programming
- Publication number
- US11886850B2 (application US17/903,496)
- Authority
- US
- United States
- Prior art keywords
- source code
- transformation
- migration
- templates
- library
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/51—Source to source
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/71—Version control; Configuration management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/76—Adapting program code to run in a different environment; Porting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Definitions
- Source code maintenance often includes migration of source code, which is time consuming and expensive. Some large code bases may require numerous years' worth of engineer and/or programmer time in order to be migrated from one version to another. This type of work is often considered tedious and/or cumbersome, which may lead to mistakes being made and/or failure to implement transformations that are critical to the migration.
- Implementations are described herein for building and/or applying a library of transformation templates to automate migration of source code.
- the library of transformation templates may be built during a training phase then applied to new source code during an inference phase.
- “training” source code bases that have undergone previous migrations may be analyzed to identify and learn source code transformations (e.g., edits).
- These learned transformations may be a foundation for capturing software developers' skills and/or institutional knowledge, as well as for learning how various types of source code have evolved over time.
- the learned transformations may be generalized to form what will be referred to herein as “transformation templates,” which are rules for transforming source code snippets.
- Transformation templates may be represented in various ways, such as pairs of code snippets, one predecessor and the other successor, and/or as pairs of graphs, again one predecessor and the other successor.
- each graph may take various forms, such as an abstract syntax tree (AST) or a control flow graph (CFG).
- two versions of source code may be analyzed to identify transformation(s) made to the pre-migration version to yield the post-migration version.
- various techniques for aligning source code may be applied first, e.g., to ensure that a post-migration source code snippet in fact corresponds to a pre-migration source code snippet.
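The alignment step can be illustrated with a minimal sketch. It assumes plain line-level text alignment with Python's standard `difflib` module as a stand-in for whatever alignment technique an implementation actually uses (the description elsewhere mentions aligning graphs such as ASTs or CFGs). The function name `extract_transformations` and the sample snippets are hypothetical.

```python
import difflib

def extract_transformations(pre_source: str, post_source: str):
    """Return (pre_snippet, post_snippet) pairs for aligned spans that changed."""
    pre_lines = pre_source.splitlines()
    post_lines = post_source.splitlines()
    matcher = difflib.SequenceMatcher(a=pre_lines, b=post_lines, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" opcode marks a pre-migration span that is aligned with a
        # differing post-migration span, i.e. a candidate transformation.
        if tag == "replace":
            pairs.append(("\n".join(pre_lines[i1:i2]),
                          "\n".join(post_lines[j1:j2])))
    return pairs

# Hypothetical pre- and post-migration versions of the same file.
pre = "total = 0\nfor i in xrange(6):\n    total += i\n"
post = "total = 0\nfor i in range(6):\n    total += i\n"
transformations = extract_transformations(pre, post)
print(transformations)
```

Unchanged lines on either side of the edit act as anchors, which is what keeps the post-migration snippet paired with the correct pre-migration snippet.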
- candidate transformation templates may be generated for each identified transformation.
- Each candidate transformation template may be a variation or permutation of the transformation in which different tokens are replaced with what will be referred to herein as “placeholders” or “wildcards,” while other tokens are preserved.
- the snippet xrange(6) is transformed during migration to range(6).
- This transformation could be represented by multiple different candidate transformation templates, each having a different combination of placeholders (Z, Y, and X below) and preserved tokens, such as the following: xrange(Z) → range(Z); Z(6) → Y(6); Z(X) → Y(X).
- the first candidate transformation template would match any instance of the xrange( ) function with argument(s) passed to it, and replace it with range( ) while preserving the argument(s).
- the second candidate transformation template would match any instance of a function that includes, as a single argument, the number 6.
- the third candidate transformation template would match any instance of any function that includes any argument(s).
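As a rough illustration of how such candidates could be enumerated, the sketch below replaces every non-empty subset of identifier and literal tokens with placeholders. It is a simplified token-level approach under stated assumptions: the regex tokenizer, the `candidate_templates` function, and the placeholder alphabet are all hypothetical, and whitespace is not preserved.

```python
import itertools
import re

PLACEHOLDERS = "ZYXWVUTSRQ"  # hypothetical placeholder names

def tokenize(snippet):
    """Split a snippet into word-like tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", snippet)

def candidate_templates(pre, post):
    """Enumerate (predecessor, successor) templates for one transformation."""
    pre_toks, post_toks = tokenize(pre), tokenize(post)
    # Only identifiers and literals are eligible; punctuation is always preserved.
    eligible = []
    for tok in pre_toks + post_toks:
        if re.fullmatch(r"\w+", tok) and tok not in eligible:
            eligible.append(tok)
    candidates = []
    for size in range(1, len(eligible) + 1):
        for subset in itertools.combinations(eligible, size):
            # The same token always maps to the same placeholder on both sides.
            names = dict(zip(subset, PLACEHOLDERS))
            pre_t = "".join(names.get(t, t) for t in pre_toks)
            post_t = "".join(names.get(t, t) for t in post_toks)
            candidates.append((pre_t, post_t))
    return candidates

candidates = candidate_templates("xrange(6)", "range(6)")
for predecessor, successor in candidates:
    print(predecessor, "->", successor)
```

For this input the enumeration yields seven candidates, including `xrange(Z) -> range(Z)` and `Z(6) -> Y(6)`. The fully generalized candidate appears as `Z(Y) -> X(Y)`, which differs from `Z(X) -> Y(X)` only in placeholder naming.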
- the criteria may include preservation of programming language built-in keyword(s) in and/or across the candidate transformation template.
- Programming language built-in keywords such as function names or other operators—especially if imported from standard or commonly-used application programming interfaces (APIs)—may be particularly important to preserve.
- Function arguments may be transient between different instances of the same function call.
- Other criteria may be provided to determine which candidate transformation template should be selected for inclusion in the library.
- these criteria may include successful application of the candidate transformation template to a pre-migration version training source code snippet to accurately generate a post-migration version of the training source code snippet. If the candidate transformation does not properly transform some other sampled source code snippet from a pre-migration version to a post-migration version, that candidate transformation template can be discarded or a score associated with it may be decremented.
- the criteria may include a count of transformations being implementable using the candidate transformation template. One broader candidate transformation template that is applicable to multiple source code snippets may be more likely selected than a narrower candidate transformation template that is only applicable to a single source code snippet.
- transformations detected in the training source code may be grouped into clusters based on similarity. Transformation templates may then be generated on a cluster-to-cluster basis, rather than on an individual transformation basis.
- the training source code and/or pertinent snippets thereof e.g., transformations and immediately surrounding contextual code
- tokens of the transformation may be encoded into embeddings, e.g., using techniques such as word2vec or a Bidirectional Encoder Representations from Transformers (BERT).
- these embeddings may be further encoded to include structure (e.g., syntactic or semantic) of the source code itself.
- a graph representation of the source code such as an abstract syntax tree (AST) or control flow graph (CFG)
- a graph-based machine learning model such as a graph neural network (GNN) may be applied to the graph representation to generate another embedding that encodes both semantics and structure of the original source code transformation.
- the resulting embedding may be grouped into a cluster with similar embeddings representing similar transformations. This cluster may then be leveraged to generate a transformation template that is applicable to any source code that maps to the cluster.
- a method implemented using one or more processors may include: analyzing pre-migration and post-migration versions of source code that exist prior to and after migration of the source code; based on the analyzing, identifying one or more transformations made to the pre-migration version of the source code to yield the post-migration version of the source code; and building a library of transformation templates that are applicable subsequently to automate migration of new source code.
- the building may include, for one or more of the transformations: generating a plurality of candidate transformation templates, wherein for each candidate transformation template, different permutations of tokens of the transformation are replaced with placeholders, and selecting one of the plurality of candidate transformation templates for inclusion in the library of transformation templates, wherein the selecting is based on one or more criteria.
- the library may include a lattice of transformation templates.
- the one or more criteria may include successful application of the candidate transformation template to a pre-migration version training source code snippet to accurately generate a post-migration version of the training source code snippet.
- the one or more criteria may include preservation of a programming language keyword in the candidate transformation template. In various implementations, the one or more criteria may include a count of the transformations being implementable using the candidate transformation template.
- the one or more transformations may include a plurality of transformations
- the method may further include: grouping the plurality of transformations into a plurality of clusters; generating a plurality of candidate transformation templates for a given cluster of the plurality of clusters, and selecting, based on one or more of the criteria, one of the plurality of candidate transformation templates generated for the given cluster for inclusion in the library of transformation templates.
- the grouping may include encoding each transformation of the plurality of transformations into an embedding.
- the encoding may be based on a transformer network and a graph neural network (GNN).
- one or more of the transformations may include contextual code surrounding the transformation.
- the method may include analyzing a pre-migration version of the new source code to match one or more transformation templates from the library to one or more snippets of the new source code; and applying the matched one or more transformation templates to the one or more snippets of the new source code to generate a post-migration version of the new source code.
- implementations include one or more processors of one or more computing devices, where the one or more processors are operable to execute instructions stored in associated memory, and where the instructions are configured to cause performance of any of the aforementioned methods. Some implementations also include one or more non-transitory computer readable storage media storing computer instructions executable by one or more processors to perform any of the aforementioned methods.
- FIG. 1 schematically depicts an example environment in which selected aspects of the present disclosure may be implemented, in accordance with various implementations.
- FIG. 2 schematically demonstrates an example of how aspects of the present disclosure may be implemented, in accordance with various implementations.
- FIG. 3 depicts an example graphical user interface (GUI) that may be presented in accordance with various implementations described herein.
- FIG. 4 depicts another example of how aspects of the present disclosure may be implemented, in accordance with various implementations.
- FIG. 5 depicts another example of how aspects of the present disclosure may be implemented, in accordance with various implementations.
- FIG. 6 depicts a flowchart illustrating an example method for practicing selected aspects of the present disclosure.
- FIG. 7 illustrates an example architecture of a computing device.
- Any computing devices depicted in FIG. 1 or elsewhere in the figures may include logic such as one or more microprocessors (e.g., central processing units or “CPUs”, graphical processing units or “GPUs”, tensor processing units or “TPUs”) that execute computer-readable instructions stored in memory, or other types of logic such as application-specific integrated circuits (“ASIC”), field-programmable gate arrays (“FPGA”), and so forth.
- Some of the systems depicted in FIG. 1, such as a code knowledge system 102, may be implemented using one or more server computing devices that form what is sometimes referred to as a “cloud infrastructure,” although this is not required.
- a code knowledge system 102 may be operably coupled with clients 110 1-P via one or more computer networks (114) to help clients 110 1-P manage their respective code bases 112 1-P.
- code knowledge system 102 may be implemented locally at a client 110 .
- Code knowledge system 102 may include, among other things, a transformation module 104 that is configured to perform selected aspects of the present disclosure in order to help one or more clients 110 1-P to manage and/or make changes to one or more corresponding code bases 112 1-P .
- Each client 110 may be, for example, an entity or organization such as a business (e.g., financial institute, bank, etc.), non-profit, club, university, government agency, or any other organization that operates one or more software systems.
- a bank may operate one or more software systems to manage the money under its control, including tracking deposits and withdrawals, tracking loans, tracking investments, and so forth.
- An airline may operate one or more software systems for booking/canceling/rebooking flight reservations, managing delays or cancellations of flights, managing people associated with flights, such as passengers, air crews, and ground crews, managing airport gates, and so forth.
- Transformation module 104 may be configured to leverage prior source code transformations contained in training code 106 to facilitate building and/or application of transformation templates to automate aspects of computer programming, e.g., to aid clients 110 1-P in editing, updating, replatforming, migrating, or otherwise acting upon their code bases 112 1-P .
- training code 106 may include multiple different corpuses 108 1-N of source code that can be leveraged in this manner. These corpuses 108 1-N may be publicly available, proprietary, stored on a cloud, stored in a version control system (VCS), and so forth.
- one or more corpuses 108 of training code 106 may include pre-migration and post-migration versions of source code that exist prior to and after migration of the source code, respectively.
- a VCS may store all or at least some previous versions of source code.
- transformation module 104 may identify one or more transformations made to the pre-migration version of the source code to yield the post-migration version of the source code. Transformation module 104 may then build a library 107 of transformation templates that are applicable subsequently to automate migration of new source code.
- library 107 may be configured as a lattice of transformation templates, although this is not required.
- a “transformation template” may include one or more rules for transforming a source code snippet.
- different tokens are replaced with what are referred to herein as “placeholders” or “wildcards,” while other tokens are preserved.
- Transformation templates may be represented in various ways, such as pairs of code snippets, one predecessor and the other successor, and/or as pairs of graphs, again one predecessor and the other successor. In the latter case, subsequent source code to which the graph-based transformation template is to be applied may also be converted to graph form, such as an abstract syntax tree (AST) or control flow graph (CFG).
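One way to picture the two representations is a small record that stores the predecessor/successor pair as snippets and can also render the pair as graphs. This is only a sketch: the `TransformationTemplate` class and its `as_graphs` method are hypothetical names, and Python's standard `ast` module is assumed as the graph form, with placeholders such as `X` parsed as ordinary names.

```python
import ast
from dataclasses import dataclass

@dataclass(frozen=True)
class TransformationTemplate:
    predecessor: str  # pattern matched in pre-migration code
    successor: str    # replacement emitted in post-migration code

    def as_graphs(self):
        """Return the (predecessor, successor) pair as dumped Python ASTs."""
        return (ast.dump(ast.parse(self.predecessor, mode="eval")),
                ast.dump(ast.parse(self.successor, mode="eval")))

# Snippet-pair form of the xrange example; X is a placeholder.
template = TransformationTemplate("xrange(X)", "range(X)")
pre_graph, post_graph = template.as_graphs()
print(pre_graph)
print(post_graph)
```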
- each client 110 may include an integrated development environment (IDE) 111 that can be used to edit/write source code.
- other applications may be used to edit source code, such as a simple text editor, a word processing application, a source code editor application with specific functionality to aid in computer programming, etc.
- the source code the programmer sees may be visually annotated, e.g., with different tokens being rendered in different colors to facilitate ease of reading.
- the source code editor may include extra functionality specifically designed to ease programming tasks, such as tools for automating various programming tasks, a compiler, real time syntax checking, etc.
- techniques described herein may enhance aspects of this extra functionality provided by a source code editor (whether a standalone application or part of an IDE), e.g., by generating and/or recommending code edit suggestions (e.g., to comport with prior successful transformations).
- code knowledge system 102 may include a machine learning (“ML” in FIG. 1) module 105 that has access to data indicative of one or more trained machine learning models (not depicted).
- These trained machine learning models may take various forms, including but not limited to a graph-based network such as a graph neural network (GNN), graph attention neural network (GANN), or graph convolutional neural network (GCN), a sequence-to-sequence model such as an encoder-decoder, various flavors of a recurrent neural network (“RNN”, e.g., long short-term memory, or “LSTM”, gated recurrent units, or “GRU”, etc.), and any other type of machine learning model that may be applied to facilitate selected aspects of the present disclosure.
- ML module 105 may apply these machine learning models to source code transformations made previously in order to group the transformations into clusters of embeddings corresponding to semantically and/or syntactically similar source code transformations.
- ML module 105 may apply a machine learning model such as a GNN or an encoder portion of an autoencoder to pre- and post-migration versions of a source code snippet to generate an embedding (or feature vector) representation of the transformation.
- a graph-based machine learning model such as a GNN
- the source code transformation may be represented in the form of a graph, such as an AST or CFG.
- transformation module 104 may then generate transformation templates on a cluster-by-cluster basis. However, this is not meant to be limiting. In other implementations, transformation module 104 may generate transformation templates on an individual source code transformation basis.
- transformation module 104 may generate multiple different variations of candidate transformation templates, and then analyze these candidates against one or more criteria to determine whether and/or how effective they will be in automating similar transformations in other source code. For example, in some implementations, for a given source code transformation, transformation module 104 may generate a plurality of candidate transformation templates. In each candidate transformation template, a different permutation of tokens of the transformation is replaced with placeholders. Transformation module 104 may then select one of the plurality of candidate transformation templates for inclusion in library 107 of transformation templates.
- Transformation module 104 may select candidate transformation templates for inclusion in library 107 based on one or more criteria. These criteria may come in various forms. In some implementations, the criteria may include successful application of the candidate transformation template to a pre-migration version of a training source code snippet to accurately generate a post-migration version of the training source code snippet. In some such implementations, the success or failure of such an application may be dispositive. In other implementations, a count of transformations being implementable using the candidate transformation template may be considered when determining whether to select the candidate transformation for inclusion in library 107. And as will be described below, in some implementations, unsuccessful applications of a candidate transformation template may be identified during a “guard search” and used to identify counter-candidate transformation templates to handle these exceptions.
- the criteria may include preservation of a programming language built-in keyword in the candidate transformation template, as opposed to “variabilization” of the programming language keyword. For example, suppose a source code transformation for which candidate transformation templates are being evaluated comprises changing “xrange(6)” to “range(6).” The following candidate transformations might be generated: xrange(Z) → range(Z); Z(6) → Y(6); Z(X) → Y(X). The first transformation may receive a score that is higher than the other two transformations because the programming language built-in keywords “xrange” and “range” are preserved. Intuitively, downstream application of transformation templates is less likely to depend on programmer-defined tokens (e.g., variable names, custom function names) than programming language keywords. In some implementations, an exception to this criterion may occur where a programming language built-in keyword is located within an outer call node, in which case the built-in keyword may be replaced with a placeholder.
- Transformation module 104 may consider other criteria for selecting candidate transformation templates for inclusion in library 107 .
- candidate transformation templates in which programmer-defined tokens are not replaced with placeholders may receive a lower score or have their score decremented.
- the number of nodes in a graph e.g., AST, CFG
- a score for such a transformation template may be decremented for each node in the rule.
- a rule that allows a variable to appear in a successor portion of a transformation template that did not appear on the predecessor portion of the transformation template may be penalized or even forbidden.
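The scoring criteria above might be combined along the following lines. This is a sketch only. The weights are arbitrary, the token count stands in for the node count of a graph, literals such as `6` are treated like programmer-defined tokens, and `xrange` is added by hand to the set of known built-ins because it is not a Python 3 built-in. `score_template` and `KNOWN_BUILTINS` are hypothetical names.

```python
import builtins
import re

# Assumption: the built-in vocabulary is Python 3's, plus "xrange" for the example.
KNOWN_BUILTINS = set(dir(builtins)) | {"xrange"}

def is_placeholder(token):
    """Placeholders are assumed to be single uppercase letters such as X."""
    return len(token) == 1 and token.isupper()

def score_template(predecessor, successor):
    """Return a heuristic score, or None if the template is forbidden."""
    pre_toks = re.findall(r"\w+", predecessor)
    post_toks = re.findall(r"\w+", successor)
    pre_vars = {t for t in pre_toks if is_placeholder(t)}
    post_vars = {t for t in post_toks if is_placeholder(t)}
    # A placeholder that appears only in the successor cannot be bound.
    if post_vars - pre_vars:
        return None
    score = 0.0
    for tok in pre_toks + post_toks:
        if is_placeholder(tok):
            continue
        if tok in KNOWN_BUILTINS:
            score += 2.0   # reward preserved built-in keywords
        else:
            score -= 1.0   # penalize tokens left unreplaced
    # Small penalty per token, standing in for a per-node penalty.
    score -= 0.1 * len(pre_toks + post_toks)
    return score

general = score_template("xrange(Z)", "range(Z)")
literal = score_template("xrange(6)", "range(6)")
unbound = score_template("Z(6)", "Y(6)")
print(general, literal, unbound)
```

Under these assumed weights the template that keeps `xrange` and `range` but variabilizes the argument scores highest, and the template whose successor introduces an unbound placeholder is rejected outright.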
- xrange(X) → range(X) may be applicable to any of the following source code snippets: “xrange(7),” “xrange(id),” “xrange(list(foo)[0]),” etc. This may occur in batches, such that programmer(s) are presented with lists of changes and/or multiple changes are implemented automatically.
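Applying the template xrange(X) → range(X) to the snippets listed above can be sketched with Python's standard `ast` module (`ast.unparse` requires Python 3.9 or later). The hard-coded `XrangeToRange` transformer is a hypothetical illustration of a single template rather than a general template engine.

```python
import ast

class XrangeToRange(ast.NodeTransformer):
    """Applies xrange(X) -> range(X), leaving the argument(s) X untouched."""

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite any nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == "xrange":
            new_func = ast.Name(id="range", ctx=ast.Load())
            node.func = ast.copy_location(new_func, node.func)
        return node

def apply_template(snippet):
    """Parse a snippet, apply the template, and return the rewritten source."""
    tree = XrangeToRange().visit(ast.parse(snippet))
    return ast.unparse(tree)

snippets = ["xrange(7)", "xrange(id)", "xrange(list(foo)[0])"]
results = [apply_template(s) for s in snippets]
print(results)
```

Matching on the call node rather than on raw text is what lets the placeholder X bind to an arbitrary argument expression, including the nested `list(foo)[0]`.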
- the application may occur one-at-a-time, e.g., while a programmer operates IDE 111 to modify source code.
- FIG. 3 depicts one example of a graphical user interface that may be presented in accordance with techniques described herein.
- FIG. 5 schematically depicts one example of how application of a transformation template may be implemented.
- FIG. 2 schematically demonstrates an example lattice pipeline 218 for building library 107 of transformation templates, in accordance with various implementations.
- the input to the system is a collection 220 of source code transformations extracted from one or more source code files, such as from across a corpus of source code 108 .
- the collection 220 of source code transformations may be collected in various ways, such as by aligning graphs (e.g., ASTs, CFGs) corresponding to pre- and post-migration source code and identifying aligned code snippets in which a transformation occurred.
- Each source code transformation may be represented in various ways, such as a pair of source code snippets, a pair of graphs (e.g., ASTs, CFGs) corresponding to source code snippets, etc.
- the output of lattice pipeline 218 is a plurality of transformation templates 244 1-3 (three are provided here for illustrative purposes; any number of templates may be generated) that are stored in library 107 for future use to automate aspects of programming.
- the collection 220 of source code transformations may be grouped into clusters using various clustering techniques.
- each source code transformation in collection 220 may be processed, e.g., by ML module 105 , to generate an embedding.
- a graph-based machine learning model such as a GNN may be employed.
- a combination of machine learning models may be applied to generate the embeddings.
- a pure text-based encoder such as Bidirectional Encoder Representations from Transformers (BERT)-based transformer or word2vec may be used to transform each token/node into a vector, and a graph of those vectors (e.g., arranged in accordance with an AST or CFG of the underlying source code) may be processed using a GNN.
- just a text-based encoder may be employed.
- Once the embeddings are generated, they may be grouped into clusters using any applicable clustering techniques for embeddings in embedding space, including but not limited to K-means clustering.
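K-means clustering of transformation embeddings can be sketched in a few lines of standard-library Python. The two-dimensional vectors below are made-up stand-ins for real embeddings, and the deterministic initialization (the first k points) is an assumption chosen to keep the example reproducible.

```python
import math

def kmeans(points, k, iterations=20):
    """Return a cluster index for each point using plain K-means."""
    centroids = [list(p) for p in points[:k]]  # deterministic initialization
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments

# Hypothetical embeddings of four transformations (two similar pairs).
embeddings = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9)]
labels = kmeans(embeddings, k=2)
print(labels)
```

Transformations that receive the same label would then be generalized together into a shared template.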
- one or more filters may be applied to weed out source code transformations that are not suitable for creation of transformation templates. These may include malformed source code transformations such as changes that encompass entire functions (which may be too large to generate a reliable transformation template) or which involve import statements.
- there may remain some number of raw source code transformations 226 1-5 (five shown for illustrative purposes; any number may result) for further processing by lattice pipeline 218.
- generalization processing may occur for each raw source code transformation 226 , at block 228 .
- this generalization 228 may occur in parallel across multiple raw source code transformations 226 1-5 , although this is not required.
- only one such parallel processing pipeline will be described here, but the process may be the same for each raw source code transformation 226 .
- raw source code transformation 226 which may include a predecessor snippet/graph and a successor snippet/graph, may be generalized in multiple different ways into a plurality of candidate transformation templates 230 1-3 (three are shown in FIG. 2 but any positive number of candidates is possible).
- each of these candidate transformation templates 230 1-3 may be assessed, e.g., by transformation module 104, against one or more of the criteria mentioned previously.
- the assessment block 234 is only illustrated for first candidate transformation template 230 1 for the sake of simplicity, but the same assessment would occur, e.g., in parallel, for all candidate transformation templates 230 1-3 .
- a “guard search” also may be performed with candidate transformation templates 230 1-3 .
- transformations that should not occur, in spite of the candidate transformation template 230 otherwise being applicable, may be identified. This may result in a list of “do not apply” conditions being established for one or more of candidate transformation templates 230 1-3 .
- the guard search procedure 232 discovers these “do not apply” conditions by applying the candidate transformation template to training code 106 to detect incorrect/failed transformations, and generalizing these contexts using the same generalization procedure described herein.
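A guard search of this kind might look like the sketch below. It assumes a regex stand-in for the candidate template xrange(X) → range(X) and a small hypothetical set of training pairs. The later step of generalizing the failing contexts into "do not apply" conditions is omitted.

```python
import re

def apply_candidate(snippet):
    """Hypothetical stand-in for the candidate template xrange(X) -> range(X)."""
    return re.sub(r"\bxrange\(", "range(", snippet)

def guard_search(training_pairs):
    """Collect pre-migration snippets where the candidate fires but is wrong."""
    guards = []
    for pre, actual_post in training_pairs:
        predicted = apply_candidate(pre)
        fired = predicted != pre
        if fired and predicted != actual_post:
            guards.append(pre)  # a "do not apply" context
    return guards

# Hypothetical (pre-migration, post-migration) training snippets.
training_pairs = [
    ("xrange(6)", "range(6)"),
    ("list(xrange(n))", "list(range(n))"),
    ("helper.xrange(3)", "helper.xrange(3)"),  # user-defined method, unchanged
]
do_not_apply = guard_search(training_pairs)
print(do_not_apply)
```

Here the candidate wrongly rewrites a method named `xrange` on a programmer-defined object, so that context is recorded as an exception.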
- output of the assessment of block 234 may be analyzed by a scoring block 236 to assign candidate transformation templates 230 1-3 corresponding scores 238 1-3 .
- this score 238 may be incremented or decremented based on assessment of each candidate transformation template 230 against the various criteria described previously.
- one or more candidate transformation templates 230 may be selected, e.g., by transformation module 104 , based on scores 238 1-3 .
- scores 238 1-3 are processed using an argmax function 240 to select what will be referred to herein as a “selected” transformation template 241 from candidate transformation templates 230 1-3 .
- this process of guard searching ( 232 ), assessment ( 234 ), scoring ( 236 ), and selecting (via argmax 240 ) is performed for each candidate transformation template 230 .
- a plurality of selected transformation templates 241 1-5 are generated for, and correspond to, the plurality of raw source code transformations 226 1-5 .
- an inter-transformation selection process may be performed on selected transformation templates 241 1-5 to further whittle down the number of subsequently-applicable transformation templates to a final set of usable transformation templates 244 1-3 .
- This whittling down may include deduplication of identical transformation templates and/or elimination of selected transformation templates 241 that are subsumed by other transformation templates.
- These usable transformation templates 244 1-3 may then be stored in library 107 (e.g., as a lattice structure) for future use in automating aspects of computer programming.
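The selection and deduplication steps described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Template` class, the function names, and the numeric scores are all hypothetical, and elimination of subsumed templates is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Hypothetical stand-in for a transformation template."""
    predecessor: str  # pattern matched in the old code
    successor: str    # pattern emitted in the new code


def select_template(candidates, scores):
    """Return the candidate with the highest score (an argmax over scores)."""
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index]


def deduplicate(selected):
    """Drop identical templates while keeping first-seen order."""
    return list(dict.fromkeys(selected))


# Three candidates generalized from one raw transformation; scores are made up.
candidates = [
    Template("xrange(Z)", "range(Z)"),
    Template("Z(6)", "Y(6)"),
    Template("Z(X)", "Y(X)"),
]
scores = [3, 1, 2]
chosen = select_template(candidates, scores)

# Inter-transformation selection: identical selected templates collapse to one.
other = Template("X.keys()", "list(X.keys())")
usable = deduplicate([chosen, chosen, other])
```

Because `Template` is a frozen dataclass, instances are hashable, which is what lets `dict.fromkeys` remove duplicates without changing the order.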
- FIG. 3 depicts an example GUI that may be presented to a user to recommend one or more auto edits, and for facilitating navigation to relevant portions of source code for potential transformation.
- a number of instances of source code that match various transformation templates have been identified in a codebase.
- the file “foo.cc” includes two instances of source code that match two different transformation templates: a first instance that matches a transformation template that transforms the function call “xrange” to “range” while preserving (via the placeholder X) argument(s) that are passed to the function; and a second instance that matches a transformation template that wraps a KEYS call in a LIST call.
- the file “bar.cc” also includes a number of applicable transformations.
- the programmer may be able to select (e.g., click) any of the filenames and/or the instances of applicable transformation templates to be taken directly to the corresponding locations in source code.
- these transformations may be implemented automatically, or the user may have the option of accepting them, rejecting them, and/or modifying them.
- FIG. 4 schematically depicts a non-limiting example of how N (positive integer) training source code transformations may be used to assess (block 234 in FIG. 2 ) candidate transformation templates 230 .
- each training source code transformation is represented as a predecessor AST 450 and a successor AST 450 ′.
- training source code transformations may be represented in other ways/data structures.
- nodes of first predecessor AST 450 1 are selected, as indicated by the dashed arrows.
- a first candidate transformation template (“TT” in FIG. 4 ) 230 1 is also selected.
- first candidate transformation template 230 1 is applied to the selected nodes of first predecessor AST 450 1 to generate a transformed AST (not depicted).
- the transformed AST is compared to first successor AST 450 ′ 1 to evaluate the accuracy of the transformation. If the transformation was accurate, a true positive may be output at block 458 . In some implementations, this may result in a score associated with first candidate transformation template 230 1 being incremented (or at least not decremented). If the transformed AST differs from first successor AST 450 ′ 1 , then a false positive may be output at block 460 . In some implementations, this may result in a score associated with first candidate transformation template 230 1 being decremented.
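A rough sketch of this assessment loop, using Python's standard `ast` module, might look like the following. The `RenameCall` class is a hypothetical stand-in for a candidate transformation template, the training pairs are invented, and the template is applied to the whole predecessor tree rather than to selected nodes as in FIG. 4.

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Hypothetical template: rename calls to `old` into `new`, keeping arguments."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Call(self, node):
        self.generic_visit(node)  # also transform nested calls
        if isinstance(node.func, ast.Name) and node.func.id == self.old:
            node.func = ast.Name(id=self.new, ctx=ast.Load())
        return node


def assess(template, training_pairs):
    """Increment the score on a true positive, decrement it on a false positive."""
    score = 0
    for predecessor_src, successor_src in training_pairs:
        transformed = template.visit(ast.parse(predecessor_src))
        expected = ast.parse(successor_src)
        # ast.dump omits line/column attributes by default, so only structure is compared.
        if ast.dump(transformed) == ast.dump(expected):
            score += 1
        else:
            score -= 1
    return score


# Invented training transformations (predecessor source, successor source).
pairs = [
    ("for i in xrange(10): pass", "for i in range(10): pass"),
    ("n = xrange(a, b)", "n = range(a, b)"),
    ("total = xrange(5)", "total = list(range(5))"),  # template gets this one wrong
]
score = assess(RenameCall("xrange", "range"), pairs)
```

Here the first two pairs are true positives and the third is a false positive, so the net score is 1.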
- FIG. 5 schematically depicts a non-limiting example of how a transformation template 244 may be applied to predict a successor source code snippet 570 from a predecessor source code snippet 562 , e.g., during inference.
- Placeholders/variables of predecessor portion 564 of transformation template 244 may be bound to token(s) of predecessor source code snippet 562 at block 568 .
- the placeholder Z is bound to the programmer-defined function name “foo.”
- the placeholder Y is bound to the programmer-defined variable x.
- the placeholder X is bound to the programmer-defined variables (y, z).
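The binding step can be illustrated with a short sketch. The actual template of FIG. 5 is not reproduced in this excerpt, so the predecessor pattern `Z(Y, *X)`, the successor pattern `Z(*X, Y)`, and the snippet `foo(x, y, z)` are assumptions chosen only so that the bindings match the ones listed above.

```python
import ast


def bind(snippet):
    """Bind placeholders of the assumed pattern Z(Y, *X) to tokens of a call."""
    call = ast.parse(snippet, mode="eval").body
    is_named_call = isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
    if not (is_named_call and call.args):
        return None  # the pattern does not match this snippet
    return {
        "Z": call.func.id,                                  # function name
        "Y": ast.unparse(call.args[0]),                     # first argument
        "X": [ast.unparse(arg) for arg in call.args[1:]],   # remaining arguments
    }


def apply_successor(bindings):
    """Render the assumed successor pattern Z(*X, Y) from the bindings."""
    args = bindings["X"] + [bindings["Y"]]
    return f"{bindings['Z']}({', '.join(args)})"


bindings = bind("foo(x, y, z)")
successor = apply_successor(bindings)
```

`ast.unparse` requires Python 3.9 or later. A snippet that does not match the pattern, such as a bare name, yields `None` instead of bindings.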
- FIG. 6 is a flowchart illustrating an example method 600 of practicing selected aspects of the present disclosure, in accordance with implementations disclosed herein. For convenience, the operations of the flow chart are described with reference to a system that performs the operations. This system may include various components of various computer systems, such as one or more components of code knowledge system 102 . Moreover, while operations of method 600 are shown in a particular order, this is not meant to be limiting. One or more operations may be reordered, omitted or added.
- the system may analyze pre-migration and post-migration versions of source code (also referred to herein as predecessor and successor versions) that exist prior to and after migration of the source code.
- this analysis may include aligning source code snippet(s) in the pre-migration version with corresponding source code snippet(s) in the post-migration version, e.g., using ASTs, CFGs, or other techniques.
- the system may identify one or more transformations (e.g., raw source code transformations 226 1-5 in FIG. 2 ) made to the pre-migration version of the source code to yield the post-migration version of the source code.
- these source code transformations may be grouped into clusters, e.g., based on embeddings generated from the transformations using machine learning models such as GNNs and/or transformers.
- the system may build a library (e.g., 107 ) of transformation templates (e.g., 244 1-3 ) that are applicable subsequently to automate migration of new source code.
- the building of block 606 may be implemented in various ways. For example, at block 608 , for one or more of the transformations (or for a cluster of transformations in some cases), a plurality of candidate transformation templates (e.g., 230 1-3 in FIG. 2 ) may be generated. In some implementations, for each candidate transformation template, different permutations of tokens of the transformation may be replaced with placeholders.
- the system may select one of the plurality of candidate transformation templates for inclusion in the library of transformation templates. This selection may be based on an assessment (e.g., block 234 in FIG. 2 ) that is performed against one or more criteria.
- Example criteria were described previously, and may include, but are not limited to, preservation of programming language built-in keywords, variabilization of programmer-defined tokens, a count of the transformations that are implementable using the candidate transformation template, and so forth.
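Candidate generation at block 608 can be sketched as enumerating which tokens of a transformation get replaced with placeholders. The sketch below enumerates non-empty subsets of replaceable tokens, which is one possible reading of "different permutations"; the token lists, the placeholder letters, and the `generalize` function are all hypothetical.

```python
from itertools import combinations

PLACEHOLDERS = "ZYXW"  # arbitrary placeholder names


def generalize(pred_tokens, succ_tokens, replaceable):
    """Build one candidate template per non-empty subset of replaceable tokens."""
    candidates = []
    for size in range(1, len(replaceable) + 1):
        for subset in combinations(replaceable, size):
            # Each chosen token gets its own placeholder; a token shared by both
            # sides (e.g. the argument) maps to the same placeholder on each side.
            mapping = dict(zip(subset, PLACEHOLDERS))
            pred = "".join(mapping.get(tok, tok) for tok in pred_tokens)
            succ = "".join(mapping.get(tok, tok) for tok in succ_tokens)
            candidates.append((pred, succ))
    return candidates


# Raw transformation xrange(6) -> range(6), already split into tokens.
pred_tokens = ["xrange", "(", "6", ")"]
succ_tokens = ["range", "(", "6", ")"]
candidates = generalize(pred_tokens, succ_tokens, ["6", "xrange", "range"])
```

With three replaceable tokens this yields seven candidates, including `xrange(Z)→range(Z)` and `Z(6)→Y(6)`. Each candidate would then be assessed against the criteria listed above before one is selected for the library.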
- FIG. 7 is a block diagram of an example computing device 710 that may optionally be utilized to perform one or more aspects of techniques described herein.
- Computing device 710 typically includes at least one processor 714 which communicates with a number of peripheral devices via bus subsystem 712 .
- peripheral devices may include a storage subsystem 724 , including, for example, a memory subsystem 725 and a file storage subsystem 726 , user interface output devices 720 , user interface input devices 722 , and a network interface subsystem 716 .
- the input and output devices allow user interaction with computing device 710 .
- Network interface subsystem 716 provides an interface to outside networks and is coupled to corresponding interface devices in other computing devices.
- User interface input devices 722 may include a keyboard, pointing devices such as a mouse, trackball, touchpad, or graphics tablet, a scanner, a touch screen incorporated into the display, audio input devices such as voice recognition systems, microphones, and/or other types of input devices.
- use of the term “input device” is intended to include all possible types of devices and ways to input information into computing device 710 or onto a communication network.
- User interface output devices 720 may include a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices.
- the display subsystem may include a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image.
- the display subsystem may also provide non-visual display such as via audio output devices.
- use of the term “output device” is intended to include all possible types of devices and ways to output information from computing device 710 to the user or to another machine or computing device.
- Storage subsystem 724 stores programming and data constructs that provide the functionality of some or all of the modules described herein.
- the storage subsystem 724 may include the logic to perform selected aspects of the method of FIG. 6 , as well as to implement various components depicted in FIGS. 1 - 2 and 4 - 5 .
- Memory 725 used in the storage subsystem 724 can include a number of memories including a main random access memory (RAM) 730 for storage of instructions and data during program execution and a read only memory (ROM) 732 in which fixed instructions are stored.
- a file storage subsystem 726 can provide persistent storage for program and data files, and may include a hard disk drive, a floppy disk drive along with associated removable media, a CD-ROM drive, an optical drive, or removable media cartridges.
- the modules implementing the functionality of certain implementations may be stored by file storage subsystem 726 in the storage subsystem 724 , or in other machines accessible by the processor(s) 714 .
- Bus subsystem 712 provides a mechanism for letting the various components and subsystems of computing device 710 communicate with each other as intended. Although bus subsystem 712 is shown schematically as a single bus, alternative implementations of the bus subsystem may use multiple busses.
- Computing device 710 can be of varying types including a workstation, server, computing cluster, blade server, server farm, or any other data processing system or computing device. Due to the ever-changing nature of computers and networks, the description of computing device 710 depicted in FIG. 7 is intended only as a specific example for purposes of illustrating some implementations. Many other configurations of computing device 710 are possible having more or fewer components than the computing device depicted in FIG. 7 .
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Security & Cryptography (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Stored Programmes (AREA)
Abstract
Description
xrange(Z)→range(Z);
Z(6)→Y(6);
Z(X)→Y(X);
The first transformation may receive a score that is higher than the other two transformations because the programming language built-in keywords “xrange” and “range” are preserved. Intuitively, downstream application of transformation templates is less likely to depend on programmer-defined tokens (e.g., variable names, custom function names) than programming language keywords. In some implementations, an exception to this criterion may occur where a programming language built-in keyword is located within an outer call node, in which case the built-in keyword may be replaced with a placeholder.
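The keyword-preservation criterion can be illustrated with a small scoring sketch. The `BUILTINS` set and the one-point-per-keyword rule are assumptions made for illustration; the source only states that the first template may score higher than the other two, and the outer-call exception is not modeled.

```python
import re

# Assumed set of built-in keywords for this illustration only.
BUILTINS = {"xrange", "range"}


def keyword_score(template):
    """Count built-in keywords that the template preserves on either side."""
    predecessor, successor = template
    tokens = re.findall(r"[A-Za-z_]\w*", predecessor + " " + successor)
    return sum(1 for tok in tokens if tok in BUILTINS)


templates = [
    ("xrange(Z)", "range(Z)"),
    ("Z(6)", "Y(6)"),
    ("Z(X)", "Y(X)"),
]
scores = [keyword_score(t) for t in templates]
```

Under these assumptions the first template scores 2 and the other two score 0, which matches the ordering described in the paragraph above.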
Claims (18)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/903,496 US11886850B2 (en) | 2021-02-16 | 2022-09-06 | Transformation templates to automate aspects of computer programming |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/176,730 US11481202B2 (en) | 2021-02-16 | 2021-02-16 | Transformation templates to automate aspects of computer programming |
US17/903,496 US11886850B2 (en) | 2021-02-16 | 2022-09-06 | Transformation templates to automate aspects of computer programming |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/176,730 Continuation US11481202B2 (en) | 2021-02-16 | 2021-02-16 | Transformation templates to automate aspects of computer programming |
Publications (2)
Publication Number | Publication Date |
---|---|
US20220413820A1 (en) | 2022-12-29 |
US11886850B2 (en) | 2024-01-30 |
Family
ID=82800359
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/176,730 Active US11481202B2 (en) | 2021-02-16 | 2021-02-16 | Transformation templates to automate aspects of computer programming |
US17/903,496 Active US11886850B2 (en) | 2021-02-16 | 2022-09-06 | Transformation templates to automate aspects of computer programming |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/176,730 Active US11481202B2 (en) | 2021-02-16 | 2021-02-16 | Transformation templates to automate aspects of computer programming |
Country Status (1)
Country | Link |
---|---|
US (2) | US11481202B2 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11481202B2 (en) | 2021-02-16 | 2022-10-25 | X Development Llc | Transformation templates to automate aspects of computer programming |
US11941372B2 (en) * | 2021-04-01 | 2024-03-26 | Microsoft Technology Licensing, Llc | Edit automation using an anchor target list |
US11875136B2 (en) | 2021-04-01 | 2024-01-16 | Microsoft Technology Licensing, Llc | Edit automation using a temporal edit pattern |
US20230079904A1 (en) * | 2021-09-16 | 2023-03-16 | Dell Products L.P. | Code Migration Framework |
Citations (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060195436A1 (en) * | 2005-02-28 | 2006-08-31 | Fujitsu Network Communications, Inc. | Phased migration of database software application |
US20080155519A1 (en) | 2006-12-21 | 2008-06-26 | Loughborough University Enterprises Limited | Code translator |
US7617449B2 (en) * | 2004-05-28 | 2009-11-10 | Microsoft Corporation | Method and system for mapping content between a starting template and a target template |
US20120204161A1 (en) * | 2011-02-09 | 2012-08-09 | Particle Code, Inc. | Automated Code Map Generation for an Application Programming Interface of a Programming Language |
US20130227533A1 (en) * | 2008-11-06 | 2013-08-29 | Albert Donald Tonkin | Code transformation |
US20130263091A1 (en) * | 2012-03-31 | 2013-10-03 | Bmc Software, Inc. | Self-evolving computing service template translation |
US20130339943A1 (en) * | 2012-06-18 | 2013-12-19 | Syntel, Inc. | Computerized migration tool and method |
US8788935B1 (en) | 2013-03-14 | 2014-07-22 | Media Direct, Inc. | Systems and methods for creating or updating an application using website content |
US20140282371A1 (en) * | 2013-03-14 | 2014-09-18 | Media Direct, Inc. | Systems and methods for creating or updating an application using a pre-existing application |
US9317266B1 (en) * | 2014-11-12 | 2016-04-19 | Bank Of America Corporation | Leveraging legacy applications for use with modern applications |
US10019259B2 (en) * | 2013-01-29 | 2018-07-10 | Mobilize.Net Corporation | Code transformation using extensibility libraries |
US10095511B1 (en) | 2017-02-23 | 2018-10-09 | Amdocs Development Limited | System, method, and computer program for converting a current Java project to a Maven project |
US20180349109A1 (en) | 2017-06-03 | 2018-12-06 | Apple Inc. | Integration of learning models into a software development system |
US20190034429A1 (en) * | 2017-07-29 | 2019-01-31 | Splunk Inc. | Translating a natural language request to a domain-specific language request using templates |
US20190361684A1 (en) * | 2018-05-25 | 2019-11-28 | Paypal, Inc. | Systems and methods for providing an application transformation tool |
US20200097261A1 (en) * | 2018-09-22 | 2020-03-26 | Manhattan Engineering Incorporated | Code completion |
US20200150953A1 (en) | 2018-11-09 | 2020-05-14 | Manhattan Engineering Incorporated | Deployment models |
US10776721B1 (en) | 2019-07-25 | 2020-09-15 | Sas Institute Inc. | Accelerating configuration of machine-learning models |
US20200311352A1 (en) | 2019-03-29 | 2020-10-01 | Fujitsu Limited | Translation method, learning method, and non-transitory computer-readable storage medium for storing translation program |
US20200319879A1 (en) | 2019-04-03 | 2020-10-08 | Accenture Global Solutions Limited | Development project blueprint and package generation |
US20210011694A1 (en) * | 2019-07-09 | 2021-01-14 | X Development Llc | Translating between programming languages using machine learning |
US20210149650A1 (en) | 2019-11-14 | 2021-05-20 | Mojatatu Networks | Systems and methods for creating and deploying applications and services |
US20210165647A1 (en) * | 2019-12-03 | 2021-06-03 | Bank Of America Corporation | System for performing automatic code correction for disparate programming languages |
US20210192321A1 (en) * | 2019-12-18 | 2021-06-24 | X Development Llc | Generation and utilization of code change intents |
US20220236971A1 (en) * | 2021-01-27 | 2022-07-28 | X Development Llc | Adapting existing source code snippets to new contexts |
US20220261231A1 (en) | 2021-02-16 | 2022-08-18 | X Development Llc | Transformation templates to automate aspects of computer programming |
Patent Citations (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7617449B2 (en) * | 2004-05-28 | 2009-11-10 | Microsoft Corporation | Method and system for mapping content between a starting template and a target template |
US20060195436A1 (en) * | 2005-02-28 | 2006-08-31 | Fujitsu Network Communications, Inc. | Phased migration of database software application |
US20080155519A1 (en) | 2006-12-21 | 2008-06-26 | Loughborough University Enterprises Limited | Code translator |
US20130227533A1 (en) * | 2008-11-06 | 2013-08-29 | Albert Donald Tonkin | Code transformation |
US20120204161A1 (en) * | 2011-02-09 | 2012-08-09 | Particle Code, Inc. | Automated Code Map Generation for an Application Programming Interface of a Programming Language |
US20130263091A1 (en) * | 2012-03-31 | 2013-10-03 | Bmc Software, Inc. | Self-evolving computing service template translation |
US20130339943A1 (en) * | 2012-06-18 | 2013-12-19 | Syntel, Inc. | Computerized migration tool and method |
US10019259B2 (en) * | 2013-01-29 | 2018-07-10 | Mobilize.Net Corporation | Code transformation using extensibility libraries |
US8788935B1 (en) | 2013-03-14 | 2014-07-22 | Media Direct, Inc. | Systems and methods for creating or updating an application using website content |
US20140282371A1 (en) * | 2013-03-14 | 2014-09-18 | Media Direct, Inc. | Systems and methods for creating or updating an application using a pre-existing application |
US9317266B1 (en) * | 2014-11-12 | 2016-04-19 | Bank Of America Corporation | Leveraging legacy applications for use with modern applications |
US20160132308A1 (en) * | 2014-11-12 | 2016-05-12 | Bank Of America Corpoaration | Leveraging legacy applications for use with modern applications |
US10095511B1 (en) | 2017-02-23 | 2018-10-09 | Amdocs Development Limited | System, method, and computer program for converting a current Java project to a Maven project |
US20180349109A1 (en) | 2017-06-03 | 2018-12-06 | Apple Inc. | Integration of learning models into a software development system |
US20190034429A1 (en) * | 2017-07-29 | 2019-01-31 | Splunk Inc. | Translating a natural language request to a domain-specific language request using templates |
US20190361684A1 (en) * | 2018-05-25 | 2019-11-28 | Paypal, Inc. | Systems and methods for providing an application transformation tool |
US20200097261A1 (en) * | 2018-09-22 | 2020-03-26 | Manhattan Engineering Incorporated | Code completion |
US20200150953A1 (en) | 2018-11-09 | 2020-05-14 | Manhattan Engineering Incorporated | Deployment models |
US20200311352A1 (en) | 2019-03-29 | 2020-10-01 | Fujitsu Limited | Translation method, learning method, and non-transitory computer-readable storage medium for storing translation program |
US20200319879A1 (en) | 2019-04-03 | 2020-10-08 | Accenture Global Solutions Limited | Development project blueprint and package generation |
US20210011694A1 (en) * | 2019-07-09 | 2021-01-14 | X Development Llc | Translating between programming languages using machine learning |
US10776721B1 (en) | 2019-07-25 | 2020-09-15 | Sas Institute Inc. | Accelerating configuration of machine-learning models |
US20210149650A1 (en) | 2019-11-14 | 2021-05-20 | Mojatatu Networks | Systems and methods for creating and deploying applications and services |
US20210165647A1 (en) * | 2019-12-03 | 2021-06-03 | Bank Of America Corporation | System for performing automatic code correction for disparate programming languages |
US20210192321A1 (en) * | 2019-12-18 | 2021-06-24 | X Development Llc | Generation and utilization of code change intents |
US20220236971A1 (en) * | 2021-01-27 | 2022-07-28 | X Development Llc | Adapting existing source code snippets to new contexts |
US20220261231A1 (en) | 2021-02-16 | 2022-08-18 | X Development Llc | Transformation templates to automate aspects of computer programming |
Non-Patent Citations (4)
Title |
---|
Alrubaye et al., "Learning to Recommend Third-Party Library Migration Opportunities at the API Level" Applied Soft Computing. vol. 90. 11 pages, dated May 2020. |
Levin et al., "Boosting Automatic Commit Classification Into Maintenance Activities by Utilizing Source Code Changes" arXiv:1711.05340v1 [cs.SE] Nov. 14, 2017. |
Plaice et al., "A New Approach to Version Control" Apr. 1992. IEEE Transactions on Software Engineering. 14 pp. |
Ronald J. Williams, "Version Space Learning" CSG220: Machine Learning, Spring 2007. 17 pp. |
Also Published As
Publication number | Publication date |
---|---|
US20220413820A1 (en) | 2022-12-29 |
US20220261231A1 (en) | 2022-08-18 |
US11481202B2 (en) | 2022-10-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11169786B2 (en) | Generating and using joint representations of source code | |
US11886850B2 (en) | Transformation templates to automate aspects of computer programming | |
US11604628B2 (en) | Generation and/or recommendation of tools for automating aspects of computer programming | |
US20210192321A1 (en) | Generation and utilization of code change intents | |
US20200371778A1 (en) | Automated identification of code changes | |
EP3671526B1 (en) | Dependency graph based natural language processing | |
US11281864B2 (en) | Dependency graph based natural language processing | |
US20190303115A1 (en) | Automated source code sample adaptation | |
US11656867B2 (en) | Conditioning autoregressive language model to improve code migration | |
US11822909B2 (en) | Adapting existing source code snippets to new contexts | |
US12001951B2 (en) | Automated contextual processing of unstructured data | |
US10706030B2 (en) | Utilizing artificial intelligence to integrate data from multiple diverse sources into a data structure | |
US11775271B1 (en) | Annotations for developers | |
US11842175B2 (en) | Dynamic recommendations for resolving static code issues | |
US20230297784A1 (en) | Automated decision modelling from text | |
Hosseini et al. | Compositional generalization for natural language interfaces to web apis | |
US11775267B2 (en) | Identification and application of related source code edits | |
US20240176604A1 (en) | Predicting and/or applying symbolic transformation templates | |
US12014160B2 (en) | Translating between programming languages independently of sequence-to-sequence decoders | |
US20230350657A1 (en) | Translating large source code using sparse self-attention | |
US20230251856A1 (en) | Refactoring and/or rearchitecting source code using machine learning | |
US20240086164A1 (en) | Generating synthetic training data for programming language translation | |
Jacob et al. | Webscalding: A framework for big data web services | |
Krishnan | Feature-based analysis of open source using big data analytics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | AS | Assignment | Owner name: X DEVELOPMENT LLC, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LEWIS, OWEN;NI, BIN;SIGNING DATES FROM 20210208 TO 20210216;REEL/FRAME:061017/0841 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | AS | Assignment | Owner name: GOOGLE LLC, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:X DEVELOPMENT LLC;REEL/FRAME:062572/0565 Effective date: 20221227 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |