GB2420638A - Method of substituting code fragments in Internal Representation - Google Patents

Info

Publication number
GB2420638A
Authority
GB
United Kingdom
Prior art keywords
pattern
code
source code
alternative
created
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB0425878A
Other versions
GB0425878D0 (en)
Inventor
Geetha Manjunath
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hewlett Packard Development Co LP
Original Assignee
Hewlett Packard Development Co LP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Development Co LP
Priority to GB0425878A
Publication of GB0425878D0
Publication of GB2420638A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/72 Code refactoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/443 Optimisation

Abstract

A method and apparatus is disclosed for compiling software in which the intermediate code representation produced by the compiler is processed further to automatically perform code substitutions defined by a user.

Description

A Method and Apparatus for Compiling Software
Field of Invention
The present invention relates to a method and apparatus for compiling software. More particularly, but not exclusively, the present invention relates to a method and apparatus for identifying patterns in the intermediate representation of a compiler, which facilitates the
Background of the Invention
Software development often involves reengineering of existing source code to ensure shortest time-to-market. The reengineering process typically involves manual scans of the source code from either legacy software or open source. Tools based on regular expressions and intelligent code browsers and analysers are available for reengineering. These tools help the user to find certain syntactic elements of the program, based on queries by the programmer.
Part of the complexity of the reengineering task is due to the innumerable ways a specific functionality can be programmed. This is referred to as code duplication. One implication of code duplication in software reengineering is that if a bug is repaired in a system with duplicated code, all possible duplications of that bug must be repaired. Similarly, during reengineering of software for migration from one platform to another, changes are commonly required to all duplicated software portions. Most often duplicated code portions are not exact copies but closely resemble each other.
New applications are often developed using languages which use optimised scientific libraries for scientific applications. However, selecting the correct function from a set of software libraries can be difficult because the libraries are large, complex and numerous. It is difficult for a developer to keep track of the various functions in a library and to select the correct library function. As a result, programmers often write custom code or select inappropriate functions.
C W Kessler in "Pattern-Driven Automatic Parallelization", Scientific Programming, Vol. 5, pp. 251-274, 1996 discloses a source code pattern recognition tool to parallelise sequential programs for distributed and shared memory architectures. However, the technique is limited to array operations in scientific computations, and the compiler had to be manually enhanced to identify those primitive patterns. The library itself is also required to be written in a special language.
Santanu Paul & Atul Prakash in "A Framework for Source Code Search using Program Patterns", University of Michigan, USA disclose a system in which pattern matching is performed on application source code to enhance code readability and maintainability. The system requires the pattern to be described in a special pattern language and that the patterns are identified by the infrastructure.
An object of the invention is to provide a developer tool and framework for software reengineering that performs automatic code replacement of duplicated code.
It is an object of the present invention to provide a method or apparatus for compiling software, which avoids some of the above disadvantages or at least provides the public with a useful choice.
Summary of the Invention
According to a first aspect of the invention there is provided a method of compiling software comprising the steps of: a) converting source code into an intermediate representation; b) creating a code pattern in the intermediate representation, the pattern being associated with a portion of alternative code; c) searching the source code in the intermediate representation for a match with the pattern; and d) if the pattern matches a portion of the source then substituting that portion with the associated alternative code in the source code.
Preferably, in step b) a set of patterns is created, each pattern being associated with a portion of alternative code, and in step c) the intermediate code is searched for each of the patterns.
Preferably, the alternative code is a call to a library of code sequences. Preferably, a finite state machine is used for the searching for the match. Preferably, the pattern is matched if it exactly corresponds to a portion of the source code. Preferably, the pattern is a template which matches selected elements of a portion of the source code. Preferably, the pattern is created in source code and converted into the intermediate representation for use in the method by the compiler. Preferably, the pattern and the alternative code are created from the same source code sequence. Preferably, the pattern is generated from a code sequence from a library of such sequences. Preferably, the pattern is generated from a secondary representation of a code sequence from a library of such sequences. Preferably, the pattern and the alternative code are created from the same source code sequence.
According to a second aspect of the invention there is provided apparatus for compiling software comprising: a parser operable to convert source code into an intermediate representation; a pattern generator operable to create a code pattern in the intermediate representation, the pattern being associated with a portion of alternative code; and a pattern matcher operable to search the source code in the intermediate representation for a match with the pattern and if the pattern matches a portion of the source then further operable to substitute that portion with the associated alternative code in the source code.
According to a third aspect of the invention there is provided a computer program or group of computer programs arranged to enable a computer or group of computers to carry out the method of compiling software comprising the steps of: a) converting source code into an intermediate representation; b) creating a code pattern in the intermediate representation, the pattern being associated with a portion of alternative code; c) searching the source code in the intermediate representation for a match with the pattern; and d) if the pattern matches a portion of the source then substituting that portion with the associated alternative code in the source code.
According to a fourth aspect of the invention there is provided a computer program or group of computer programs arranged to enable a computer or group of computers to provide apparatus for compiling software comprising: a parser operable to convert source code into an intermediate representation; a pattern generator operable to create a code pattern in the intermediate representation, the pattern being associated with a portion of alternative code; and a pattern matcher operable to search the source code in the intermediate representation for a match with the pattern and if the pattern matches a portion of the source then further operable to substitute that portion with the associated alternative code in the source code.
Brief Description of the Drawings
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings in which: Figure 1 is a schematic illustration of a compiler system according to an embodiment of the invention; and Figure 2 is a flow chart illustrating the processing of the compiler system of figure 1.
Detailed Description of a Preferred Embodiment of the Invention
Figure 1 shows a compiler system 100 running on a computer (not shown). The system comprises a compiler 101 which, according to this embodiment, corresponds to a SUIF2 compiler released by the SUIF Compiler Group, Stanford University, California, USA, running on a computer with a Windows™ operating system. The compiler 101 comprises a parser that takes source code 103 as an input and a code generator that generates object code 105 for execution.
Between the parsing and the code generation, the compiler produces an intermediate representation (IR) 107 of the parsed source code 103 which is independent of the language of the input source code or of the computer on which the compiler runs.
The compiler system 100 further comprises a pattern generator 109 which takes as its input a set of source code sequences 111 and from them generates a set of patterns 113. Each of the source code sequences 111 is associated with one of a set of IR code substitutions 115. The pattern generator 109 operates in the same manner as the parser of the compiler 101 when it parses the source code sequences and generates from each one a pattern in the same form as the IR. The source code sequences 111 and each of the associated IR code substitutions 115 are created or selected by the user. The patterns 113 are used in conjunction with the IR code substitutions by a pattern matcher 117. The pattern matcher 117 scans the IR for code sequences which match any of the patterns 113. If a match is found, the pattern matcher 117 performs a code substitution by exchanging the matched code sequence in the IR with the associated element from the set of IR code substitutions 115.
The pattern matcher 117 is arranged to find and replace single patterns in the IR 107 in either exact match or template matching modes. The user inputs find and replace code fragments in a single C source file as functions find() and replace(). The replace pattern may be another code sequence or a call to a library of code sequences. The pattern generator converts the find patterns into the patterns 113 and the replace patterns into the IR code substitutions 115.
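The idea of writing find and replace fragments as ordinary functions and letting the compiler's own parser turn them into patterns can be sketched as follows. This is only an illustration under stated assumptions: Python's standard `ast` module stands in for the SUIF2 parser, the function parameter `n` stands in for the `Node n;` declaration, and the helper name `generate` is hypothetical.

```python
import ast

# Find and replace fragments written as two functions in one source text.
# They are only parsed, never executed, so MAX_IND and error need no definition.
SOURCE = '''
def find(n):
    n.index += 10
    if n.index > MAX_IND:
        error("Index Exceeded")

def replace(n):
    INCRINDEX(n)
'''

def generate(source):
    """Parse find()/replace() and return their bodies as dumped AST nodes."""
    module = ast.parse(source)
    bodies = {f.name: f.body for f in module.body
              if isinstance(f, ast.FunctionDef)}
    dump = lambda stmts: [ast.dump(s) for s in stmts]
    return dump(bodies["find"]), dump(bodies["replace"])

pattern, substitution = generate(SOURCE)

# Application code parsed by the same parser yields the same representation,
# which is what makes a direct comparison with the pattern possible.
application_stmt = ast.dump(ast.parse("n.index += 10").body[0])
print(application_stmt == pattern[0])
```

Because the pattern and the application code pass through the same parser, no separate pattern language is needed, which is the point the paragraph above makes about the pattern generator.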
The pattern matcher identifies the pattern by performing a node-to-node match in the parse tree represented in the IR 107. The pattern matcher uses a finite state machine to identify code fragments in the IR. Since SUIF supports multiple languages (C++, Java™), the pattern matcher is not limited to the IR created by any particular source code language.
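The text does not say how the finite state machine is constructed. One possible reading, shown here purely as an assumption, is to linearise each tree into a token stream and scan it with a Knuth-Morris-Pratt style automaton whose state is the number of pattern tokens matched so far. The tuple-based node layout and the names `flatten`, `build_failure` and `find_all` are hypothetical.

```python
def flatten(node):
    """Pre-order linearisation of a tuple-based tree into a token list."""
    if isinstance(node, tuple):
        tokens = ["("]
        for child in node:
            tokens.extend(flatten(child))
        tokens.append(")")
        return tokens
    return [node]

def build_failure(pattern):
    """KMP failure table: the fallback state for each automaton state."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def find_all(pattern, stream):
    """Run the automaton over the stream; return start offsets of matches."""
    fail = build_failure(pattern)
    state, hits = 0, []
    for pos, tok in enumerate(stream):
        while state and tok != pattern[state]:
            state = fail[state - 1]
        if tok == pattern[state]:
            state += 1
        if state == len(pattern):
            hits.append(pos - len(pattern) + 1)
            state = fail[state - 1]
    return hits

# Hypothetical IR for a block containing `var->index += 10` twice.
STMT = ("add_assign", ("field", "var", "index"), 10)
PROGRAM = ("block", STMT, ("call", "f"), STMT)

hits = find_all(flatten(STMT), flatten(PROGRAM))
print(hits)
```

The explicit bracket tokens keep subtree boundaries in the stream, so a hit on a bracketed pattern always corresponds to one complete subtree rather than a fragment spanning two nodes.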
In exact match mode the pattern matcher carries out a node to node comparison on the parse tree to identify exactly matching patterns. In the template match mode, templates are matched instead of exact code patterns. Templates are created by masking certain elements of a code pattern with the keyword "ANY" that signifies a "don't care" condition for a specific attribute. This don't care condition could be applied to a type of a variable, a data type, an upper or lower bound of a loop, a sequence of statements and so on. As a more general case, the pattern generator can optionally ignore all variable names, data types and other syntactic attributes from an input source code sequence.
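The two matching modes can be sketched with a small recursive comparison. This is a minimal illustration, assuming IR nodes are nested tuples of the form (kind, child, ...) and that the "ANY" mask is represented by a plain string; neither detail comes from SUIF2.

```python
ANY = "ANY"  # stands for the "don't care" keyword used when masking a pattern

def matches(pattern, node):
    """Node-to-node comparison; an ANY entry accepts any subtree."""
    if pattern == ANY:
        return True
    if isinstance(pattern, tuple) and isinstance(node, tuple):
        return (len(pattern) == len(node)
                and all(matches(p, n) for p, n in zip(pattern, node)))
    return pattern == node

# Hypothetical IR for `var->index += 10` and `tmp->index += 10`.
code_var = ("add_assign", ("field", "var", "index"), ("const", 10))
code_tmp = ("add_assign", ("field", "tmp", "index"), ("const", 10))

# Exact pattern: every node must correspond.
exact = ("add_assign", ("field", "var", "index"), ("const", 10))
# Template: the variable name and the constant are masked out.
template = ("add_assign", ("field", ANY, "index"), ("const", ANY))

print(matches(exact, code_var), matches(exact, code_tmp))
print(matches(template, code_var), matches(template, code_tmp))
```

The exact pattern accepts only the statement it was derived from, while the template accepts both statements because the masked positions are skipped during comparison.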
An example of a find and replace code sequence is as follows:

    Find() {
        Node n;
        n->index += 10;
        if (n->index > MAX_IND)
            error("Index Exceeded");
    }
    Replace() {
        Node n;
        INCRINDEX(n);
    }

When the above find and replace code sequence is applied to the following source code:

    Node var, tmp;
    var->index += 10;
    if (var->index > MAX_IND)
        error("Index Exceeded");
    tmp->index += 10;
    if (tmp->index > MAX_IND)
        error("Index Exceeded");

then the result is as follows:

    Node var, tmp;
    INCRINDEX(var);
    INCRINDEX(tmp);

In the above example, the Find function has been applied as a template to the code fragment and resulted in each equivalent sequence in the source code being substituted for a call to a standard function for carrying out the task. This standard function may be in a public library of functions. While the above examples are provided in pseudo code to aid understanding, it will be understood that the code would be represented internally in the compiler system 100 in the SUIF2 IR.
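The find and replace example above can be traced with a short sketch. It assumes a tuple-based IR and treats the variable declared in the Find function as a placeholder, written here as "$n", that must bind to the same name throughout one match; the declaration statement is omitted for brevity. The names `bind`, `fill` and `substitute` are hypothetical.

```python
def bind(pattern, node, env):
    """Match node against pattern; names starting with '$' are placeholders."""
    if isinstance(pattern, str) and pattern.startswith("$"):
        if pattern in env:
            return env[pattern] == node   # must agree with the earlier binding
        env[pattern] = node
        return True
    if isinstance(pattern, tuple) and isinstance(node, tuple):
        return len(pattern) == len(node) and all(
            bind(p, n, env) for p, n in zip(pattern, node))
    return pattern == node

def fill(template, env):
    """Copy the replacement, swapping each placeholder for its bound value."""
    if isinstance(template, tuple):
        return tuple(fill(t, env) for t in template)
    return env.get(template, template)

def substitute(stmts, find, replace):
    """Replace each run of statements matching `find` with `replace`."""
    out, i = [], 0
    while i < len(stmts):
        env = {}
        window = stmts[i:i + len(find)]
        if len(window) == len(find) and all(
                bind(p, s, env) for p, s in zip(find, window)):
            out.extend(fill(r, env) for r in replace)
            i += len(find)
        else:
            out.append(stmts[i])
            i += 1
    return out

def check(var):
    """Hypothetical IR for the increment-and-check pair applied to `var`."""
    return [("add_assign", ("field", var, "index"), 10),
            ("if", (">", ("field", var, "index"), "MAX_IND"),
             ("call", "error", "Index Exceeded"))]

FIND = check("$n")
REPLACE = [("call", "INCRINDEX", "$n")]
PROGRAM = check("var") + check("tmp")

result = substitute(PROGRAM, FIND, REPLACE)
print(result)
```

Requiring a placeholder to bind consistently is a design choice of this sketch: an increment on one variable followed by a bounds check on a different one is left untouched, since replacing it with a single call would change the program's behaviour.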
The operation of the compiler system 100 will now be described with reference to the flow chart in figure 2. At step 201, the user inputs the source software code sequences 111 and the associated substitutions 115 in the form of Find and Replace statements as noted above.
Once this process is complete, the processing moves to step 203 as the user initiates the pattern generation process and the source code sequences 111 are converted either into exact match or template patterns 113. Processing then moves to step 205 where the patterns 113 are linked to their associated IR code substitutions 115. At step 207, the search process for matching patterns in the IR is initiated and a list of matches created.
Processing then moves to step 209 where each entry in the list of matches is processed in turn. If a match is an exact match then processing moves to step 211 where the associated code substitution is retrieved and the matched code sequence in the IR replaced. Processing then moves to step 213 where the list is consulted to see if any more matches have been found. If not, processing ends at step 215. If further matches have been found then processing returns to step 209.
If at step 209 a template match has been found then processing moves to step 217. In step 217, the template is populated with the variable, type definitions and any other applicable data before being inserted into the IR in place of the matched code sequence. Processing then moves to step 213 and processing continues as described above. Once all of the matches have been processed and the relevant code substitutions completed, the IR can be processed by the compiler 101 to produce the object code 105.
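Steps 207 to 217 of the flow chart can be sketched as a two-phase loop: first build the list of matches, then process each entry as either an exact or a template match. The sketch below is an assumption-laden illustration, not the SUIF2 implementation: it uses a tuple-based IR, single-statement patterns, and fills the "ANY" slots of a substitution left to right with the subtrees captured during matching. It also assumes a substitution never contains more "ANY" slots than its pattern.

```python
ANY = "ANY"

def match(pattern, node, captured):
    """Compare node to pattern, recording whatever each ANY slot matched."""
    if pattern == ANY:
        captured.append(node)
        return True
    if isinstance(pattern, tuple) and isinstance(node, tuple):
        return len(pattern) == len(node) and all(
            match(p, n, captured) for p, n in zip(pattern, node))
    return pattern == node

def populate(template, captured):
    """Step 217: fill the ANY slots of the substitution, left to right."""
    values = iter(captured)
    def walk(t):
        if t == ANY:
            return next(values)
        if isinstance(t, tuple):
            return tuple(walk(c) for c in t)
        return t
    return walk(template)

def run(ir, rules):
    # Step 207: search the IR and create the list of matches.
    found = []
    for index, stmt in enumerate(ir):
        for pattern, substitution in rules:
            captured = []
            if match(pattern, stmt, captured):
                found.append((index, substitution, captured))
                break
    # Steps 209 and 213: process each entry in the list in turn.
    for index, substitution, captured in found:
        if captured:
            ir[index] = populate(substitution, captured)   # step 217
        else:
            ir[index] = substitution                       # step 211
    return ir

# Hypothetical rules: one template rule and one exact rule.
RULES = [
    (("call", "old_sqrt", ANY), ("call", "fast_sqrt", ANY)),
    (("call", "init_legacy"), ("call", "init")),
]
IR = [("call", "init_legacy"), ("call", "old_sqrt", "x"), ("call", "print", "x")]

rewritten = run(list(IR), RULES)
print(rewritten)
```

Collecting all matches before rewriting mirrors the flow chart's separate search and processing stages, and works here because every substitution replaces exactly one statement, so the recorded indices stay valid.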
The compiler system 100 uses source code pattern matching technology in a multi-language compiler infrastructure. This framework is both platform independent and language independent and is an extendable and customisable developer tool. It can also be used for solution development, migrating applications between platforms and to improve application performance by using performance code sequence libraries such as those supplied by chip vendors.
The standard compiler is effectively given access to a library of patterns that are annotated intermediate representations (graphs) of various code fragments (patterns) with a corresponding or associated action (substitution) to be taken by the compiler so that automatic code modifications are performed on the input source code. The find and replace patterns are bundled and specified by the programmer for a particular software reengineering purpose. A useful application of the system is in the identification of usage opportunities of the Integrated Performance Primitives (IPP) and math kernel routines (Intel™ Programming Primitives) in legacy and simple applications, which is known to increase the application performance significantly on some platforms.
In the above embodiment, a matched code sequence results in the substitution of another code sequence. In another embodiment, the substituted code is different from the code from which the pattern is derived. In other words, the pattern and the substituted code are created from separate source code sequences. While code substitution is one form of compiler action whenever a pattern match instance is identified, the system also supports configuration of other types of compiler actions. The user can input a specific function/code that can be directly executed by the compiler on a pattern match. In this case, the source code may not be modified or substituted at all. In a further embodiment, multiple intermediate code patterns may be derived from a single specified pattern code. This caters for multiple ways of writing a program for a given specification. In yet a further embodiment, a pattern being matched in the IR results in a secondary process being carried out either in addition to or instead of any associated code substitution. The secondary process may be the insertion of instructions into the IR for the compiler or a message generation process.
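A pattern whose match triggers a user-supplied action instead of a substitution can be sketched as a list of (pattern, action) pairs. This is an illustrative assumption about how such actions might be registered: the tuple-based IR, the `run_actions` driver and the `warn_unsafe` message generator are all hypothetical, and the IR is deliberately left unmodified.

```python
ANY = "ANY"

def matches(pattern, node):
    """Node-to-node comparison; an ANY entry accepts any subtree."""
    if pattern == ANY:
        return True
    if isinstance(pattern, tuple) and isinstance(node, tuple):
        return (len(pattern) == len(node)
                and all(matches(p, n) for p, n in zip(pattern, node)))
    return pattern == node

def subtrees(node):
    """Yield a node and every subtree beneath it, in pre-order."""
    yield node
    if isinstance(node, tuple):
        for child in node[1:]:
            yield from subtrees(child)

def run_actions(ir, rules):
    """Call the action registered for a pattern wherever that pattern matches."""
    results = []
    for stmt in ir:
        for node in subtrees(stmt):
            for pattern, action in rules:
                if matches(pattern, node):
                    results.append(action(node))
    return results

def warn_unsafe(node):
    """Message generation: report the match without changing the code."""
    return "warning: call to gets() with argument %r" % (node[2],)

IR = [("call", "gets", "buf"),
      ("if", ("call", "gets", "line"), ("call", "puts", "line"))]
RULES = [(("call", "gets", ANY), warn_unsafe)]

messages = run_actions(IR, RULES)
for message in messages:
    print(message)
```

Keeping the action as an ordinary callable means the same driver could serve substitution, message generation or instruction insertion, depending on what the registered function does with the matched node.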
It will be understood by those skilled in the art that the apparatus that embodies a part or all of the present invention may be a general purpose device having software arranged to provide a part or all of an embodiment of the invention. The device could be a single device or a group of devices and the software could be a single program or a set of programs.
Furthermore, any or all of the software used to implement the invention can be communicated via various transmission or storage means such as computer network, floppy disc, CD-ROM or magnetic tape so that the software can be loaded onto one or more devices.
While the present invention has been illustrated by the description of the embodiments thereof, and while the embodiments have been described in considerable detail, it is not the intention of the applicant to restrict or in any way limit the scope of the appended claims to such detail. Additional advantages and modifications will readily appear to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details, representative apparatus and method, and illustrative examples shown and described.
Accordingly, departures may be made from such details without departure from the spirit or scope of applicant's general inventive concept.

Claims (24)

  1. A method of compiling software comprising the steps of: a) converting source code into an intermediate representation; b) creating a code pattern in said intermediate representation, said pattern being associated with a portion of alternative code; c) searching said source code in said intermediate representation for a match with said pattern; and d) if said pattern matches a portion of said source then substituting that portion with said associated alternative code in said source code.
  2. A method according to claim 1 in which in step b) a set of patterns is created, each said pattern associated with a portion of alternative code and in step c) the intermediate code is searched for each of said patterns.
  3. A method according to claim 1 in which said alternative code is a call to a library of code sequences.
  4. A method according to claim 1 in which a finite state machine is used for said searching for said match.
  5. A method according to claim 1 in which said pattern is matched if it exactly corresponds to a portion of said source code.
  6. A method according to claim 1 in which said pattern is a template which matches selected elements of a portion of said source code.
  7. A method according to claim 1 in which said pattern is created in source code and converted into said intermediate representation for use in said method by said compiler.
  8. A method according to claim 1 in which said pattern and said alternative code are created from the same source code sequence.
  9. A method according to claim 1 in which said pattern is generated from a code sequence from a library of such sequences.
  10. A method according to claim 1 in which said pattern is generated from a secondary representation of a code sequence from a library of such sequences.
  11. A method according to claim 1 in which said pattern and said alternative code are created from the same source code sequence.
  12. Apparatus for compiling software comprising: a parser operable to convert source code into an intermediate representation; a pattern generator operable to create a code pattern in said intermediate representation, said pattern being associated with a portion of alternative code; and a pattern matcher operable to search said source code in said intermediate representation for a match with said pattern and if said pattern matches a portion of said source then further operable to substitute that portion with said associated alternative code in said source code.
  13. Apparatus according to claim 12 in which a set of patterns is created, each said pattern associated with a portion of alternative code and said pattern matcher is operable to search said intermediate code for each of said patterns.
  14. Apparatus according to claim 12 in which said alternative code is a call to a library of code sequences.
  15. Apparatus according to claim 12 in which a finite state machine is used by said pattern matcher for said searching for said match.
  16. Apparatus according to claim 12 in which said pattern is matched if it exactly corresponds to a portion of said source code.
  17. Apparatus according to claim 12 in which said pattern is a template which matches selected elements of a portion of said source code.
  18. Apparatus according to claim 12 in which said pattern is created in source code and converted into said intermediate representation for use by said pattern matcher.
  19. Apparatus according to claim 12 in which said pattern and said alternative code are created from the same source code sequence.
  20. Apparatus according to claim 12 in which said pattern is generated from a code sequence from a library of such sequences.
  21. Apparatus according to claim 12 in which said pattern is generated from a secondary representation of a code sequence from a library of such sequences.
  22. Apparatus according to claim 12 in which said pattern and said alternative code are created from the same source code sequence.
  23. A computer program or group of computer programs arranged to enable a computer or group of computers to carry out the method of claim 1.
  24. A computer program or group of computer programs arranged to enable a computer or group of computers to provide the apparatus of claim 12.
GB0425878A 2004-11-24 2004-11-24 Method of substituting code fragments in Internal Representation Withdrawn GB2420638A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB0425878A GB2420638A (en) 2004-11-24 2004-11-24 Method of substituting code fragments in Internal Representation

Publications (2)

Publication Number Publication Date
GB0425878D0 GB0425878D0 (en) 2004-12-29
GB2420638A true GB2420638A (en) 2006-05-31

Family

ID=33561317

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0425878A Withdrawn GB2420638A (en) 2004-11-24 2004-11-24 Method of substituting code fragments in Internal Representation

Country Status (1)

Country Link
GB (1) GB2420638A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6195792B1 (en) * 1998-02-19 2001-02-27 Nortel Networks Limited Software upgrades by conversion automation
US20020129347A1 (en) * 2001-01-10 2002-09-12 International Business Machines Corporation Dependency specification using target patterns
EP1308838A2 (en) * 2001-10-31 2003-05-07 Aplix Corporation Intermediate code preprocessing apparatus, intermediate code execution apparatus, intermediate code execution system, and computer program product for preprocessing or executing intermediate code

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2435703A (en) * 2006-03-01 2007-09-05 Symbian Software Ltd Reducing the length of program code by identifying and removing repeated instruction sequences
EP2553571A1 (en) * 2010-03-31 2013-02-06 Irdeto Canada Corporation A system and method for encapsulating and enabling protection through diverse variations in software libraries
EP2553571A4 (en) * 2010-03-31 2014-05-07 Irdeto Canada Corp A system and method for encapsulating and enabling protection through diverse variations in software libraries
EP3812894A1 (en) * 2010-03-31 2021-04-28 Irdeto B.V. A system and method for encapsulating and enabling protection through diverse variations in software libraries
US9892272B2 (en) 2010-03-31 2018-02-13 Irdeto B.V. System and method for encapsulating and enabling protection through diverse variations in software libraries
US10185837B2 (en) 2010-03-31 2019-01-22 Irdeto B.V. System and method for encapsulating and enabling protection through diverse variations in software libraries
US10042740B2 (en) 2015-12-04 2018-08-07 Microsoft Technology Licensing, Llc Techniques to identify idiomatic code in a code base
WO2017095720A1 (en) * 2015-12-04 2017-06-08 Microsoft Technology Licensing, Llc Techniques to identify idiomatic code in a code base
CN109564518A (en) * 2016-09-09 2019-04-02 欧姆龙株式会社 Executable program creating device, executable program creation method and executable program create program
EP3511827A4 (en) * 2016-09-09 2019-12-25 Omron Corporation Executable program creation device, executable program creation method, and executable program creation program
WO2018129327A1 (en) * 2017-01-06 2018-07-12 Google Llc Loop and library fusion
WO2019075390A1 (en) * 2017-10-12 2019-04-18 Versata Development Group, Inc. Blackbox matching engine
US10782946B2 (en) 2017-10-12 2020-09-22 Devfactory Innovations Fz-Llc Blackbox matching engine
US20220121677A1 (en) * 2019-06-25 2022-04-21 Sisense Sf, Inc. Method for automated query language expansion and indexing
US11954113B2 (en) * 2019-06-25 2024-04-09 Sisense Sf, Inc. Method for automated query language expansion and indexing

Also Published As

Publication number Publication date
GB0425878D0 (en) 2004-12-29

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)