WO2018236384A1 - Compiling and optimizing a computer code by minimizing a number of states in a finite machine corresponding to the computer code - Google Patents

Compiling and optimizing a computer code by minimizing a number of states in a finite machine corresponding to the computer code

Info

Publication number
WO2018236384A1
Authority
WO
WIPO (PCT)
Prior art keywords
code
dfsm
language
nfsm
generating
Prior art date
Application number
PCT/US2017/038825
Other languages
French (fr)
Inventor
Daniel Joseph Bentley Kluss
Original Assignee
Archeo Futurus, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Archeo Futurus, Inc.
Priority to PCT/US2017/038825
Publication of WO2018236384A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/51 Source to source
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security

Definitions

  • This disclosure relates generally to data processing and, more specifically, to methods and systems for compiling and optimizing a computer code by minimizing a number of states in a finite machine corresponding to the computer code.
  • a computer can be defined as a device configured to automatically perform a set of logical or arithmetic operations.
  • Such devices include mechanical and electromechanical computers, analog computers, vacuum tubes and digital electronic circuits, transistors, and integrated circuits, and the like.
  • computing platforms ranging from portable mobile computers to supercomputer systems. This has resulted in a tremendous amount of code written for different computer platforms.
  • Embodiments disclosed herein are directed to methods and systems for compiling and optimizing a computer code by minimizing a number of states in a finite machine corresponding to the computer code.
  • a method includes acquiring a first code, with the first code being written in a first language.
  • the method may include generating, based on the first code, a first deterministic finite state machine (DFSM).
  • the method may include optimizing the first DFSM to obtain a second DFSM.
  • the method may also include generating, based on the second DFSM, a second code, with the second code being written in a second language.
  • DFSM deterministic finite state machine
  • generating the first DFSM includes generating, based on the first code, a non-deterministic finite state machine (NFSM) and converting the NFSM to the first DFSM.
  • NFSM non-deterministic finite state machine
  • generating the NFSM includes parsing, based on a first grammar associated with the first language, the first code to obtain an abstract syntax tree (AST) and converting the AST to the NFSM.
  • AST abstract syntax tree
  • generating the second code includes converting the second DFSM to a NFSM and generating, based on the NFSM, the second code. In certain embodiments, generating the second code includes converting the NFSM to an AST and recompiling, based on a second grammar associated with the second language, the AST to the second code.
  • optimizing the first DFSM is performed to minimize a number of states in the second DFSM.
  • Optimizing the first DFSM includes twice reversing the first DFSM to an NFSM.
  • the first language or the second language includes a programming language such as one of the following: JavaScript, C, C++, Perl, C#, PHP, Python, an assembly language, and so forth.
  • the first language or the second language includes a presentation language (e.g., Hypertext Markup Language (HTML), Extensible Markup Language (XML), and so forth).
  • the first language or the second language includes a style sheet language, for example, Cascading Style Sheets (CSS).
  • the first language or second language includes Hardware Description Language (HDL) or bits native to a field-programmable gate array.
  • HDL Hardware Description Language
  • the method may include generating the second code and performing, based on a formal specification, a formal verification of the second DFSM.
  • the first language or the second language may include a binary assembly executable by a processor.
  • the steps of the method for compiling and optimizing a computer code by minimizing a number of states in a finite machine corresponding to the computer code are stored on a machine-readable medium comprising instructions, which, when implemented by one or more processors, perform the recited steps.
  • FIG. 1 is a block diagram showing a system for compiling source code, according to some example embodiments.
  • FIG. 2 is a block diagram showing an example system for processing of a Hypertext Transfer Protocol (HTTP) request, according to an example embodiment.
  • HTTP Hypertext Transfer Protocol
  • FIG. 3 is a process flow diagram showing a method for compiling source code, according to an example embodiment.
  • FIG. 4 shows a diagrammatic representation of a computing device for a machine in the example electronic form of a computer system, within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed.
  • the technology described herein allows translating a source code from one programming language to another programming language.
  • Some embodiments of the present disclosure may allow optimizing source code in terms of a number of states of a DFSM.
  • Some embodiments of the present disclosure may facilitate optimizing the source code according to requirements of a hardware description.
  • Embodiments of the present disclosure may allow reducing or eliminating a source code's security vulnerabilities, including buffer overflows, stack overflows, memory leaks, uninitialized data, and so forth.
  • the method for compiling and optimizing a computer code by minimizing a number of states in a finite machine corresponding to the computer code is disclosed.
  • the method may include acquiring a first code, with the first code being written in a first language.
  • the method may include generating, based on the first code, a first DFSM.
  • the method may also include optimizing the first DFSM to obtain a second DFSM.
  • the method may also include generating, based on the second DFSM, a second code, the second code being written in a second language.
  • FIG. 1 is a block diagram showing an example system 100 for compiling source code, according to some example embodiments.
  • the example system 100 may include a parsing expression grammar (PEG) module 110, a converter 120 between AST and NFSM, a converter 130 between NFSM and DFSM, and an optimizer 140.
  • PEG parsing expression grammar
  • the system 100 can be implemented with a computer system. An example computer system is described below with reference to FIG. 4.
  • the PEG module 110 may be configured to receive an input code 105.
  • the input code 105 may be written in an input programming language.
  • the input programming language may be associated with a grammar 170.
  • the grammar 170 may be determined by an augmented Backus-Naur Form (ABNF).
  • ABNF augmented Backus-Naur Form
  • the PEG module may be configured to convert the input code 105 into an AST 115 based on the grammar 170.
  • the AST 115 may be further provided to converter 120.
  • the converter 120 may be configured to transform the AST 115 into NFSM 125. Thereafter, NFSM 125 may be provided to the converter 130. The converter 130 may be configured to translate the NFSM 125 into DFSM 135. The DFSM 135 can be provided to optimizer 140.
  • optimizer 140 may be configured to optimize the DFSM 135 to obtain a DFSM 145.
  • the optimization may include minimizing a number of states in the DFSM 135.
  • optimization can be performed by an implication chart method, Hopcroft's algorithm, Moore reduction procedure, Brzozowski's algorithm, and other techniques.
  • Brzozowski's algorithm includes reversing the edges of a DFSM to produce an NFSM, and converting this NFSM to a DFSM using a standard powerset construction that constructs only the reachable states of the converted DFSM. Repeating the reversal and conversion a second time produces a DFSM with a provably minimal number of states.
  • the DFSM 145, which is an optimized DFSM 135, can be further provided to converter 130.
  • the converter 130 may be configured to translate the DFSM 145 into a NFSM 150.
  • the NFSM 150 may be further provided to converter 120.
  • the converter 120 may be configured to translate the NFSM 150 into an AST 155.
  • the AST 155 may be further provided to PEG module 110.
  • the PEG module 110 may be configured to convert the AST 155 into output code 160 based on a grammar 180.
  • the grammar 180 may specify an output programming language.
  • the input languages or output languages may include one of high level programming languages, such as but not limited to C, C++, C#, JavaScript, PHP, Python, Perl, and the like.
  • the input code or output source code can be optimized to run on various hardware platforms like Advanced RISC Machine (ARM), x86-64, graphics processing unit (GPU), a field-programmable gate array (FPGA), or a custom application-specific integrated circuit (ASIC).
  • the input code or source code can be optimized to run on various operating systems and platforms, such as Linux, Windows, Mac OS, Android, iOS, OpenCL/CUDA, bare metal, FPGA, and a custom ASIC.
  • the output programming language can be the same as the input programming language.
  • the system 100 can be used to optimize the input code 105 by converting the input code 105 to the DFSM 135, optimizing the DFSM 135 in terms of number of states, and converting the optimized DFSM 135 to output code 160 in the original programming language.
  • the input programming language may include a domain specific language (DSL) which is determined by a strict grammar (i.e., ABNF).
  • DSL domain specific language
  • the system 100 may be used to convert documents written in a DSL to an output code 160 written in a high-level programming language or a code written in a low level programming language.
  • input code 105 or output code 160 may include CSS.
  • the system 100 may further include a database.
  • the database may be configured to store frequently occurring patterns in the input code written in specific programming languages and parts of optimized DFSM corresponding to the frequently occurring patterns.
  • the system 100 may include an additional module for looking up a specific pattern of the input code 105 in the database.
  • system 100 may be configured to substitute the specific pattern with the corresponding part of DFSM directly, skipping the steps of converting the specific pattern to the AST and generating the NFSM and the DFSM.
  • the input code or output code may include a binary assembly executable by a processor.
  • the input code 105 or output code 160 may be written in a HDL, such as SystemC, Verilog, and Very High Speed Integrated Circuits Hardware Description Language (VHDL).
  • the input code 105 or output code 160 may include bits native to the FPGA as programmed using Joint Test Action Group (JTAG) standards.
  • JTAG Joint Test Action Group
  • DFSM 135 can be optimized using a constraint solver.
  • the constraint solver may include some requirements on a hardware platform described by the HDL.
  • the requirements may include requirements for a runtime, power usage, and cost of the hardware platform.
  • the optimization of the DFSM 135 can be carried out to satisfy one of the restrictions of the requirements.
  • the optimization of the DFSM may be performed to satisfy several requirement restrictions with weights assigned to each of the restrictions.
  • the DFSM 135 may be formally verified in accordance with a formal specification to detect software-related security vulnerabilities, including but not limited to, memory leak, division-by-zero, out-of-bounds array access, and others.
  • the input source can be written in terms of a technical specification.
  • An example technical specification can include a Request for Comments (RFC).
  • the technical specification may be associated with a specific grammar. Using the specific grammar, the input code, written in terms of the technical specification, can be translated into the AST 115 and further into the DFSM 135.
  • the DFSM 135 can be optimized using a constraint solver. The constraint solver may include restrictions described in the technical specification.
  • FIG. 2 is a block diagram showing an example system 200 for processing of HTTP requests, according to an example embodiment.
  • the system 200 may include a client 210, the system 100 for compiling source codes, and a FPGA 240.
  • the system 100 may be configured to receive a RFC 105 for Internet protocol (IP), Transmission Control Protocol (TCP), and HTTP.
  • the system 100 may be configured to program the RFC into a VHDL code, and, in turn, compile the VHDL code into bits 235 native to FPGA 240.
  • the FPGA 240 may be programmed with bits 235.
  • the FPGA includes a finite state machine, FSM 225, corresponding to bits 235.
  • the bits 235 may be stored in a flash memory and the FPGA 240 may be configured to request bits 235 from the flash memory upon startup.
  • the client 210 may be configured to send a HTTP request 215 to the FPGA 240.
  • the HTTP request 215 can be read by the FPGA 240.
  • the FSM 225 may be configured to recognize the HTTP request 215 and return an HTTP response 245 corresponding to the HTTP request 215 back to the client 210.
  • the FPGA 240 may include a fabric of FSMs 250-260 to hold customers' application logic for recognizing different HTTP requests and providing different HTTP responses.
  • FIG. 3 is a process flow diagram showing a method 300 for compiling source codes, according to an example embodiment.
  • the method 300 can be implemented with a computer system. An example computer system is described below with reference to FIG. 4.
  • the method 300 may commence, in block 302, with acquiring a first code, the first code being written in a first language.
  • method 300 may include parsing, based on a first grammar associated with the first language, the first code to obtain a first AST.
  • the method 300 may include converting the first AST to a first NFSM.
  • the method 300 may include converting the first NFSM to a first DFSM.
  • the method 300 may include optimizing the first DFSM to obtain the second DFSM.
  • the method may include converting the second DFSM to a second NFSM.
  • the method 300 may include converting the second NFSM to a second AST.
  • the method 300 may include recompiling, based on a second grammar associated with a second language, the second AST into the second code, the second code being written in the second language.
  • FIG. 4 shows a diagrammatic representation of a computing device for a machine in the exemplary electronic form of a computer system 400, within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed.
  • the machine operates as a standalone device or can be connected (e.g., networked) to other machines.
  • the machine can operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
  • the machine can be a server, a personal computer (PC), a tablet PC, a set-top box (STB), a PDA, a cellular telephone, a digital camera, a portable music player (e.g., a portable hard drive audio device, such as a Moving Picture Experts Group Audio Layer 3 (MP3) player), a web appliance, a network router, a switch, a bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
  • the example computer system 400 includes a processor or multiple processors 402, a hard disk drive 404, a main memory 406, and a static memory 408, which
  • the computer system 400 may also include a network interface device 412.
  • the hard disk drive 404 may include a computer-readable medium 420, which stores one or more sets of instructions 422 embodying or utilized by any one or more of the methodologies or functions described herein.
  • the instructions 422 can also reside, completely or at least partially, within the main memory 406 and/or within the processors 402 during execution thereof by the computer system 400.
  • the main memory 406 and the processors 402 also constitute machine-readable media.
  • the term "computer-readable medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions.
  • the term "computer-readable medium" shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the machine and that causes the machine to perform any one or more of the
  • Computer-readable medium shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media. Such media can also include, without limitation, hard disks, floppy disks, NAND or NOR flash memory, digital video disks, RAM, ROM, and the like.
  • the exemplary embodiments described herein can be implemented in an operating environment comprising computer-executable instructions (e.g., software) installed on a computer, in hardware, or in a combination of software and hardware.
  • the computer-executable instructions can be written in a computer programming language or can be embodied in firmware logic. If written in a programming language conforming to a recognized standard, such instructions can be executed on a variety of hardware platforms and for interfaces to a variety of operating systems.
  • computer software programs for implementing the present method can be written in any number of suitable programming languages such as, for example, C, Python, JavaScript, Go, or other computer languages, and can be processed using other compilers, assemblers, interpreters, or platforms.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

Methods and systems for compiling codes from programming languages into programming languages are disclosed. An example method may include acquiring a first code written in a first language. The method allows generating, based on the first code, a first deterministic finite state machine (DFSM). The method includes optimizing the first DFSM to obtain a second DFSM. The method includes generating, based on the second DFSM, a second code. The second code can be written in a second language. Generating the first DFSM includes parsing the first code into a first abstract syntax tree (AST), translating the first AST into a first non-deterministic finite state machine (NFSM), and converting the first NFSM into the first DFSM. Generating the second code includes translating the second DFSM into a second NFSM, translating the second NFSM into a second AST, and recompiling the second AST into a second code.

Description

COMPILING AND OPTIMIZING A COMPUTER CODE BY MINIMIZING A NUMBER OF STATES IN A FINITE MACHINE CORRESPONDING TO THE COMPUTER CODE
TECHNICAL FIELD
[0001] This disclosure relates generally to data processing and, more specifically, to methods and systems for compiling and optimizing a computer code by minimizing a number of states in a finite machine corresponding to the computer code.
BACKGROUND
[0002] The approaches described in this section could be pursued but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
[0003] A computer can be defined as a device configured to automatically perform a set of logical or arithmetic operations. Such devices include mechanical and electromechanical computers, analog computers, vacuum tubes and digital electronic circuits, transistors, and integrated circuits, and the like. There has been a variety of computing platforms, ranging from portable mobile computers to supercomputer systems. This has resulted in a tremendous amount of code written for different computer platforms.
[0004] Translating code from one computing platform to another computing platform is a challenging task. One of the translation issues needing to be addressed is that code written for one computing platform may not satisfy limits of another computing platform in terms of memory, run-time, and power usage. Another issue is that the code translated from one computing platform to another computing platform may include software-related security vulnerabilities, like memory leaks, buffer overflows, and so forth.
SUMMARY
[0005] This summary is provided to introduce a selection of concepts in a simplified form that is further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
[0006] Embodiments disclosed herein are directed to methods and systems for compiling and optimizing a computer code by minimizing a number of states in a finite machine corresponding to the computer code. According to an example embodiment, a method includes acquiring a first code, with the first code being written in a first language. The method may include generating, based on the first code, a first deterministic finite state machine (DFSM). The method may include optimizing the first DFSM to obtain a second DFSM. The method may also include generating, based on the second DFSM, a second code, with the second code being written in a second language.
[0007] In some embodiments, generating the first DFSM includes generating, based on the first code, a non-deterministic finite state machine (NFSM) and converting the NFSM to the first DFSM. In certain embodiments, generating the NFSM includes parsing, based on a first grammar associated with the first language, the first code to obtain an abstract syntax tree (AST) and converting the AST to the NFSM.
[0008] In some embodiments, generating the second code includes converting the second DFSM to a NFSM and generating, based on the NFSM, the second code. In certain embodiments, generating the second code includes converting the NFSM to an AST and recompiling, based on a second grammar associated with the second language, the AST to the second code.
[0009] In some embodiments, optimizing the first DFSM is performed to minimize a number of states in the second DFSM. Optimizing the first DFSM includes twice reversing the first DFSM to an NFSM.
[0010] In some embodiments, the first language or the second language includes a programming language such as one of the following: JavaScript, C, C++, Perl, C#, PHP, Python, an assembly language, and so forth. In certain embodiments, the first language or the second language includes a presentation language (e.g., Hypertext Markup Language (HTML), Extensible Markup Language (XML), and so forth). In some embodiments, the first language or the second language includes a style sheet language, for example, Cascading Style Sheets (CSS). In some embodiments, the first language or second language includes Hardware Description Language (HDL) or bits native to a field-programmable gate array.
[0011] In some embodiments, the method may include generating the second code and performing, based on a formal specification, a formal verification of the second DFSM. In further embodiments, the first language or the second language may include a binary assembly executable by a processor.
[0012] According to another example embodiment of the present disclosure, the steps of the method for compiling and optimizing a computer code by minimizing a number of states in a finite machine corresponding to the computer code are stored on a machine-readable medium comprising instructions, which, when implemented by one or more processors, perform the recited steps.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] Embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements.
[0014] FIG. 1 is a block diagram showing a system for compiling source code, according to some example embodiments.
[0015] FIG. 2 is a block diagram showing an example system for processing of a Hypertext Transfer Protocol (HTTP) request, according to an example embodiment.
[0016] FIG. 3 is a process flow diagram showing a method for compiling source code, according to an example embodiment.
[0017] FIG. 4 shows a diagrammatic representation of a computing device for a machine in the example electronic form of a computer system, within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed.
DETAILED DESCRIPTION
[0018] The following detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings show illustrations in accordance with exemplary embodiments. These exemplary embodiments, which are also referred to herein as "examples," are described in enough detail to enable those skilled in the art to practice the present subject matter. The embodiments can be combined, other embodiments can be utilized, or structural, logical and electrical changes can be made without departing from the scope of what is claimed. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope is defined by the appended claims and their equivalents.
[0019] The technology described herein allows translating a source code from one programming language to another programming language. Some embodiments of the present disclosure may allow optimizing source code in terms of a number of states of a DFSM. Some embodiments of the present disclosure may facilitate optimizing the source code according to requirements of a hardware description. Embodiments of the present disclosure may allow reducing or eliminating a source code's security vulnerabilities, including buffer overflows, stack overflows, memory leaks, uninitialized data, and so forth.
[0020] According to an example embodiment, a method for compiling and optimizing a computer code by minimizing a number of states in a finite machine corresponding to the computer code is disclosed. The method may include acquiring a first code, with the first code being written in a first language. The method may include generating, based on the first code, a first DFSM. The method may also include optimizing the first DFSM to obtain a second DFSM. The method may also include generating, based on the second DFSM, a second code, the second code being written in a second language.
[0021] FIG. 1 is a block diagram showing an example system 100 for compiling source code, according to some example embodiments. The example system 100 may include a parsing expression grammar (PEG) module 110, a converter 120 between AST and NFSM, a converter 130 between NFSM and DFSM, and an optimizer 140. The system 100 can be implemented with a computer system. An example computer system is described below with reference to FIG. 4.
[0022] In some embodiments of the present disclosure, the PEG module 110 may be configured to receive an input code 105. In some embodiments, the input code 105 may be written in an input programming language. The input programming language may be associated with a grammar 170. In some embodiments, the grammar 170 may be determined by an augmented Backus-Naur Form (ABNF). The PEG module may be configured to convert the input code 105 into an AST 115 based on the grammar 170. The AST 115 may be further provided to converter 120.
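By way of illustration only, the parsing step described in paragraph [0022] can be sketched in Python as follows. The four-rule grammar shown in the docstring is a hypothetical grammar for simple regular expressions and is not the grammar 170 of any particular language. The function name `parse` and the tuple-based shape of the AST are assumptions made for this sketch.

```python
def parse(text):
    """Parse `text` with a tiny PEG-style grammar and return a tuple AST.

    Hypothetical grammar used only for this sketch:
        expr   <- term ('|' term)*
        term   <- factor+
        factor <- atom '*'?
        atom   <- [a-z] / '(' expr ')'
    """
    pos = 0

    def peek():
        # Return the next character, or "" at end of input.
        return text[pos] if pos < len(text) else ""

    def is_symbol(ch):
        return "a" <= ch <= "z"

    def expr():
        nonlocal pos
        node = term()
        while peek() == "|":
            pos += 1
            node = ("alt", node, term())
        return node

    def term():
        node = factor()
        while is_symbol(peek()) or peek() == "(":
            node = ("cat", node, factor())
        return node

    def factor():
        nonlocal pos
        node = atom()
        if peek() == "*":
            pos += 1
            node = ("star", node)
        return node

    def atom():
        nonlocal pos
        ch = peek()
        if ch == "(":
            pos += 1
            node = expr()
            if peek() != ")":
                raise SyntaxError(f"expected ')' at offset {pos}")
            pos += 1
            return node
        if is_symbol(ch):
            pos += 1
            return ("sym", ch)
        raise SyntaxError(f"unexpected {ch!r} at offset {pos}")

    tree = expr()
    if pos != len(text):
        raise SyntaxError(f"trailing input at offset {pos}")
    return tree


ast = parse("a(b|c)*")
```

In this sketch the AST is a nested tuple whose first element names the node kind, which keeps the tree easy to walk in a later conversion step.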
[0023] In some embodiments of the disclosure, the converter 120 may be configured to transform the AST 115 into NFSM 125. Thereafter, NFSM 125 may be provided to the converter 130. The converter 130 may be configured to translate the NFSM 125 into DFSM 135. The DFSM 135 can be provided to optimizer 140.
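As a non-limiting illustration of the NFSM-to-DFSM translation attributed to converter 130, the following Python sketch applies a powerset construction that builds only reachable subsets. The dictionary representation of transitions, the function names, and the example NFSM are assumptions of this sketch, and epsilon transitions are not handled.

```python
from collections import deque


def nfsm_to_dfsm(alphabet, delta, start, accepting):
    """Powerset construction for an NFSM without epsilon moves.

    delta maps (state, symbol) -> set of next states.
    Returns (dfsm_delta, dfsm_start, dfsm_accepting); each DFSM state is a
    frozenset of NFSM states, and only reachable subsets are constructed.
    """
    dfsm_start = frozenset([start])
    dfsm_delta = {}
    dfsm_accepting = set()
    seen = {dfsm_start}
    queue = deque([dfsm_start])
    while queue:
        current = queue.popleft()
        if current & accepting:
            dfsm_accepting.add(current)
        for symbol in alphabet:
            target = frozenset(
                nxt for s in current for nxt in delta.get((s, symbol), ())
            )
            dfsm_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfsm_delta, dfsm_start, dfsm_accepting


def accepts(dfsm, word):
    """Run the DFSM on `word` and report whether it ends in an accepting state."""
    delta, state, accepting = dfsm
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting


# Hypothetical NFSM: strings over {a, b} that end in "ab".
nfsm_delta = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
dfsm = nfsm_to_dfsm("ab", nfsm_delta, 0, {2})
dfsm_states = {state for (state, _symbol) in dfsm[0]}
```

For this example only three of the eight possible subsets are reachable, which is why constructing reachable states alone tends to keep the resulting DFSM small.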
[0024] In some embodiments, optimizer 140 may be configured to optimize the DFSM 135 to obtain a DFSM 145. In some embodiments, the optimization may include minimizing a number of states in the DFSM 135. In various embodiments, optimization can be performed by an implication chart method, Hopcroft's algorithm, Moore reduction procedure, Brzozowski's algorithm, and other techniques. Brzozowski's algorithm includes reversing the edges of a DFSM to produce an NFSM, and converting this NFSM to a DFSM using a standard powerset construction that constructs only the reachable states of the converted DFSM. Repeating the reversal and conversion a second time produces a DFSM with a provably minimal number of states.
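The reverse-and-determinize procedure of Brzozowski's algorithm can be sketched in Python as shown below. This is a minimal sketch under stated assumptions rather than the implementation of optimizer 140: the function names, the dictionary encoding of transitions, and the four-state example machine are all hypothetical, and missing transitions are treated as rejecting so that no dead state is created.

```python
def reverse(delta, start, accepting):
    """Reverse every edge and swap the start/accepting roles (yields an NFSM)."""
    r_delta = {}
    for (src, symbol), dst in delta.items():
        r_delta.setdefault((dst, symbol), set()).add(src)
    return r_delta, set(accepting), {start}


def determinize(alphabet, delta, starts, accepting):
    """Powerset construction over reachable subsets, renumbered as integers."""
    start = frozenset(starts)
    ids = {start: 0}
    worklist = [start]
    d_delta, d_accepting = {}, set()
    while worklist:
        subset = worklist.pop()
        if subset & accepting:
            d_accepting.add(ids[subset])
        for symbol in alphabet:
            target = frozenset(
                t for s in subset for t in delta.get((s, symbol), ())
            )
            if not target:
                continue  # no dead state: a missing transition means reject
            if target not in ids:
                ids[target] = len(ids)
                worklist.append(target)
            d_delta[(ids[subset], symbol)] = ids[target]
    return d_delta, 0, d_accepting


def brzozowski(alphabet, delta, start, accepting):
    """Reverse and determinize twice to obtain a minimal DFSM."""
    for _ in range(2):
        r_delta, r_starts, r_accepting = reverse(delta, start, accepting)
        delta, start, accepting = determinize(
            alphabet, r_delta, r_starts, r_accepting
        )
    return delta, start, accepting


def state_count(delta, start):
    return len({start} | {s for (s, _sym) in delta} | set(delta.values()))


# Hypothetical 4-state DFSM for "strings over {a, b} ending in b";
# states 0/1 and states 2/3 are pairwise equivalent.
big = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 3,
}
small, small_start, small_accepting = brzozowski("ab", big, 0, {2, 3})
```

In this example the first pass yields a DFSM for the reversed language and the second pass restores the original language with the redundant states merged.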
[0025] In some embodiments, the DFSM 145, which is an optimized DFSM 135, can be further provided to converter 130. The converter 130 may be configured to translate the DFSM 145 into a NFSM 150. The NFSM 150 may be further provided to converter 120. The converter 120 may be configured to translate the NFSM 150 into an AST 155. The AST 155 may be further provided to PEG module 110.
[0026] In some embodiments, the PEG module 110 may be configured to convert the AST 155 into output code 160 based on a grammar 180. The grammar 180 may specify an output programming language.
[0027] In some embodiments, the input languages or output languages may include one of high level programming languages, such as but not limited to C, C++, C#, JavaScript, PHP, Python, Perl, and the like. In various embodiments, the input code or output source code can be optimized to run on various hardware platforms like Advanced RISC Machine (ARM), x86-64, graphics processing unit (GPU), a field-programmable gate array (FPGA), or a custom application-specific integrated circuit (ASIC). In various embodiments, the input code or source code can be optimized to run on various operational systems and platforms, such as Linux, Windows, Mac OS, Android, iOS, OpenCL/CUDA, bare metal, FPGA, and a custom ASIC.
[0028] In certain embodiments, the output programming language can be the same as the input programming language. In these embodiments, the system 100 can be used to optimize the input code 105 by converting the input code 105 to the DFSM 135, optimizing the DFSM 135 in terms of the number of states, and converting the optimized DFSM 145 to output code 160 in the original programming language.
[0029] In some other embodiments, the input programming language may include a domain specific language (DSL) which is determined by a strict grammar (e.g., a grammar expressed in ABNF). In these embodiments, the system 100 may be used to convert documents written in a DSL to an output code 160 written in a high-level programming language or a low-level programming language. In certain embodiments, input code 105 or output code 160 can be written in a presentation language, including, but not limited to, HTML, XML, and XHTML. In some embodiments, input code 105 or output code 160 may include CSS.
[0030] In some embodiments, the system 100 may further include a database. The database may be configured to store frequently occurring patterns in the input code written in specific programming languages and parts of an optimized DFSM corresponding to the frequently occurring patterns. In these embodiments, the system 100 may include an additional module for looking up a specific pattern of the input code 105 in the database. If the database includes an entry containing the specific pattern and the corresponding part of a DFSM, then the system 100 may be configured to substitute the specific pattern with the corresponding part of the DFSM directly, thereby skipping the steps of converting the specific pattern to the AST and generating the NFSM and the DFSM.
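The pattern database lookup can be pictured as a cache placed in front of the full conversion pipeline. The sketch below is an assumption-laden illustration: the class name `PatternCache`, the `(language, pattern)` key, and the `build_dfsm` callback standing in for the AST, NFSM, and DFSM steps are all hypothetical and do not come from the specification.

```python
class PatternCache:
    """Maps (language, source pattern) to a previously optimized DFSM fragment."""

    def __init__(self, build_dfsm):
        # build_dfsm is a stand-in for the full parse -> NFSM -> DFSM -> optimize path.
        self._build_dfsm = build_dfsm
        self._store = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, language, pattern):
        key = (language, pattern)
        if key in self._store:
            # Known pattern: reuse the stored fragment and skip the pipeline.
            self.hits += 1
            return self._store[key]
        # Unknown pattern: run the pipeline once and remember the result.
        self.misses += 1
        fragment = self._build_dfsm(pattern)
        self._store[key] = fragment
        return fragment


def fake_pipeline(pattern):
    """Placeholder that pretends to build a DFSM fragment for a pattern."""
    return {"source": pattern, "states": len(pattern)}


cache = PatternCache(fake_pipeline)
first = cache.lookup("C", "for(;;){}")
second = cache.lookup("C", "for(;;){}")
```

Keying on the language as well as the pattern text reflects the statement that patterns are stored per programming language; a real system would likely normalize the pattern before lookup.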
[0031] In some embodiments, the input code or output code may include a binary assembly executable by a processor.
[0032] In some embodiments, the input code 105 or output code 160 may be written in an HDL, such as SystemC, Verilog, and Very High Speed Integrated Circuits Hardware Description Language (VHDL). The input code 105 or output code 160 may include bits native to the FPGA as programmed using Joint Test Action Group (JTAG) standards. In certain embodiments, DFSM 135 can be optimized using a constraint solver. The constraint solver may include some requirements on a hardware platform described by the HDL. For example, the requirements may include requirements for a runtime, power usage, and cost of the hardware platform. The optimization of the DFSM 135 can be carried out to satisfy one of the requirements. In certain embodiments, the optimization of the DFSM may be performed to satisfy several requirements, with a weight assigned to each requirement. In some embodiments, the DFSM 135 may be formally verified in accordance with a formal specification to detect software-related security vulnerabilities, including, but not limited to, memory leak, division-by-zero, out-of-bounds array access, and others.
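One way to read the weighted-requirements optimization is as selecting, among candidate realizations of the DFSM, the one with the lowest weighted sum of runtime, power usage, and cost, subject to optional hard limits. The sketch below uses a brute-force search as a stand-in for a constraint solver; the `Candidate` fields, the weights, and every number in it are made-up illustrative values, not data from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical hardware realization of the same DFSM."""
    name: str
    runtime_ns: float
    power_mw: float
    cost_usd: float


def weighted_score(candidate, weights):
    """Weighted sum over the requirement fields named in `weights`."""
    return sum(w * getattr(candidate, field) for field, w in weights.items())


def pick(candidates, weights, limits=None):
    """Return the feasible candidate with the lowest weighted score."""
    limits = limits or {}
    feasible = [
        c for c in candidates
        if all(getattr(c, field) <= bound for field, bound in limits.items())
    ]
    if not feasible:
        raise ValueError("no candidate satisfies the hard limits")
    return min(feasible, key=lambda c: weighted_score(c, weights))


# Illustrative values only.
CANDIDATES = [
    Candidate("fast", runtime_ns=10, power_mw=500, cost_usd=40),
    Candidate("frugal", runtime_ns=25, power_mw=120, cost_usd=15),
    Candidate("balanced", runtime_ns=15, power_mw=300, cost_usd=90),
]
WEIGHTS = {"runtime_ns": 1.0, "power_mw": 0.05, "cost_usd": 0.5}

best_overall = pick(CANDIDATES, WEIGHTS)
best_under_20ns = pick(CANDIDATES, WEIGHTS, limits={"runtime_ns": 20})
```

Satisfying a single requirement corresponds to passing only `limits`; the weighted case corresponds to passing several entries in `weights`.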
[0033] In certain embodiments, the input source can be written in terms of a technical specification. An example technical specification can include a Request for Comments (RFC). In some embodiments, the technical specification may be associated with a specific grammar. Using the specific grammar, the input code, written in terms of the technical specification, can be translated into the AST 115 and further into the DFSM 135. In some embodiments, the DFSM 135 can be optimized using a constraint solver. The constraint solver may include restrictions described in the technical specification.
[0034] FIG. 2 is a block diagram showing an example system 200 for processing of HTTP requests, according to an example embodiment. The system 200 may include a client 210, the system 100 for compiling source codes, and a FPGA 240.
[0035] In certain embodiments, the system 100 may be configured to receive an RFC 105 for Internet Protocol (IP), Transmission Control Protocol (TCP), and HTTP. The system 100 may be configured to translate the RFC into VHDL code, and, in turn, compile the VHDL code into bits 235 native to FPGA 240. The FPGA 240 may be programmed with bits 235. In an example illustrated by FIG. 2, the FPGA includes a finite state machine, FSM 225, corresponding to bits 235. In other embodiments, the bits 235 may be stored in a flash memory and the FPGA 240 may be configured to request bits 235 from the flash memory upon startup.
[0036] In some embodiments, the client 210 may be configured to send an HTTP request 215 to the FPGA 240. In some embodiments, the HTTP request 215 can be read by the FPGA 240. The FSM 225 may be configured to recognize the HTTP request 215 and return an HTTP response 245 corresponding to the HTTP request 215 back to the client 210. In certain embodiments, the FPGA 240 may include a fabric of FSMs 250-260 to hold customers' application logic for recognizing different HTTP requests and providing different HTTP responses.
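As a software analogy for an FSM that recognizes an HTTP request and returns a response, the following sketch walks a request line one character at a time through a fixed set of states. It is deliberately simplified and is an assumption of this illustration: it accepts only an uppercase method, a target that starts with "/", and the literal version "HTTP/1.1", which is far narrower than the real HTTP grammar, and the state names and canned responses are hypothetical.

```python
VERSION = "HTTP/1.1"
START = ("start", 0)
ERROR = ("error", 0)
DONE = ("done", 0)


def step(state, ch):
    """Transition function: (state kind, index) x character -> next state."""
    kind, idx = state
    if kind == "start" and "A" <= ch <= "Z":
        return ("method", 0)
    if kind == "method":
        if "A" <= ch <= "Z":
            return state
        if ch == " ":
            return ("slash", 0)
    if kind == "slash" and ch == "/":
        return ("target", 0)
    if kind == "target":
        if ch == " ":
            return ("version", 0)
        if "!" <= ch <= "~":  # any visible ASCII character
            return state
    if kind == "version" and ch == VERSION[idx]:
        # One state per character of the version literal keeps the machine finite.
        return ("version", idx + 1) if idx + 1 < len(VERSION) else ("cr", 0)
    if kind == "cr" and ch == "\r":
        return ("lf", 0)
    if kind == "lf" and ch == "\n":
        return DONE
    return ERROR  # includes any input after DONE and anything in ERROR


def recognize(request_line):
    """Return True if the whole input drives the machine from START to DONE."""
    state = START
    for ch in request_line:
        state = step(state, ch)
    return state == DONE


def respond(request_line):
    """Map recognition to a canned response, as a stand-in for HTTP response 245."""
    if recognize(request_line):
        return "HTTP/1.1 200 OK\r\n\r\n"
    return "HTTP/1.1 400 Bad Request\r\n\r\n"
```

For example, `respond("GET /index.html HTTP/1.1\r\n")` yields the 200 response, while a lowercase method or a missing line terminator yields the 400 response.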
[0037] The system 200 may be an improvement over conventional HTTP servers, since the system 200 does not require large computing resources or maintenance of software for handling HTTP requests. The FPGA 240 does not need to be physically large and requires a smaller amount of power than conventional HTTP servers.
[0038] FIG. 3 is a process flow diagram showing a method 300 for compiling source codes, according to an example embodiment. The method 300 can be implemented with a computer system. An example computer system is described below with reference to FIG. 4.
[0039] The method 300 may commence, in block 302, with acquiring a first code, the first code being written in a first language. In block 304, the method 300 may include parsing, based on a first grammar associated with the first language, the first code to obtain a first AST. In block 306, the method 300 may include converting the first AST to a first NFSM. In block 308, the method 300 may include converting the first NFSM to a first DFSM. In block 310, the method 300 may include optimizing the first DFSM to obtain a second DFSM. In block 312, the method 300 may include converting the second DFSM to a second NFSM. In block 314, the method 300 may include converting the second NFSM to a second AST. In block 316, the method 300 may include recompiling, based on a second grammar associated with a second language, the second AST into a second code, the second code being written in the second language.
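Blocks 306 and 308 can be illustrated with a small sketch that converts an AST to an NFSM and then to a DFSM. The specification does not define the shape of the AST, so this sketch assumes a regex-like AST built from nested tuples ("lit", "cat", "alt", "star"), uses the Thompson construction for the AST-to-NFSM step, and uses the subset construction for the NFSM-to-DFSM step; parsing (block 304) and the later blocks are not covered, and all names are hypothetical.

```python
from itertools import count

EPS = None  # label used for epsilon (empty) transitions


def ast_to_nfsm(ast):
    """Thompson construction: returns (start, accept, edges) with (src, label, dst) edges."""
    fresh = count()
    edges = []

    def build(node):
        start, accept = next(fresh), next(fresh)
        kind = node[0]
        if kind == "lit":
            edges.append((start, node[1], accept))
        elif kind == "cat":
            s1, a1 = build(node[1])
            s2, a2 = build(node[2])
            edges.extend([(start, EPS, s1), (a1, EPS, s2), (a2, EPS, accept)])
        elif kind == "alt":
            s1, a1 = build(node[1])
            s2, a2 = build(node[2])
            edges.extend([(start, EPS, s1), (start, EPS, s2),
                          (a1, EPS, accept), (a2, EPS, accept)])
        elif kind == "star":
            s1, a1 = build(node[1])
            edges.extend([(start, EPS, s1), (start, EPS, accept),
                          (a1, EPS, s1), (a1, EPS, accept)])
        else:
            raise ValueError(f"unknown AST node: {kind!r}")
        return start, accept

    start, accept = build(ast)
    return start, accept, edges


def eps_closure(states, edges):
    """All states reachable from `states` through epsilon edges only."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for src, label, dst in edges:
            if src == s and label is EPS and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return frozenset(closure)


def nfsm_to_dfsm(start, accept, edges):
    """Subset construction over reachable subsets; returns (start, accepting, table)."""
    alphabet = sorted({label for _, label, _ in edges if label is not EPS})
    d_start = eps_closure({start}, edges)
    seen, worklist, table = {d_start}, [d_start], {}
    while worklist:
        subset = worklist.pop()
        for sym in alphabet:
            moved = {dst for src, label, dst in edges if src in subset and label == sym}
            if not moved:
                continue
            target = eps_closure(moved, edges)
            table[(subset, sym)] = target
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    accepting = {s for s in seen if accept in s}
    return d_start, accepting, table


def dfsm_accepts(dfsm, word):
    """Run the DFSM; a missing table entry means the word is rejected."""
    state, accepting, table = dfsm
    for ch in word:
        state = table.get((state, ch))
        if state is None:
            return False
    return state in accepting


# Hand-written AST for the pattern (a|b)*c.
AST = ("cat", ("star", ("alt", ("lit", "a"), ("lit", "b"))), ("lit", "c"))
DFSM = nfsm_to_dfsm(*ast_to_nfsm(AST))
```

The resulting DFSM is not yet minimized; in the flow of method 300 it would be the input to the optimization of block 310.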
[0040] FIG. 4 shows a diagrammatic representation of a computing device for a machine in the exemplary electronic form of a computer system 400, within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed. In various exemplary embodiments, the machine operates as a standalone device or can be connected (e.g., networked) to other machines. In a networked deployment, the machine can operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine can be a server, a personal computer (PC), a tablet PC, a set-top box (STB), a PDA, a cellular telephone, a digital camera, a portable music player (e.g., a portable hard drive audio device, such as a Moving Picture Experts Group Audio Layer 3 (MP3) player), a web appliance, a network router, a switch, a bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
[0041] The example computer system 400 includes a processor or multiple processors 402, a hard disk drive 404, a main memory 406, and a static memory 408, which communicate with each other via a bus 410. The computer system 400 may also include a network interface device 412. The hard disk drive 404 may include a computer-readable medium 420, which stores one or more sets of instructions 422 embodying or utilized by any one or more of the methodologies or functions described herein. The instructions 422 can also reside, completely or at least partially, within the main memory 406 and/or within the processors 402 during execution thereof by the computer system 400. The main memory 406 and the processors 402 also constitute machine-readable media.
[0042] While the computer-readable medium 420 is shown in an exemplary embodiment to be a single medium, the term "computer-readable medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term "computer-readable medium" shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methodologies of the present application, or that is capable of storing, encoding, or carrying data structures utilized by or associated with such a set of instructions. The term "computer-readable medium" shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media. Such media can also include, without limitation, hard disks, floppy disks, NAND or NOR flash memory, digital video disks, RAM, ROM, and the like.
[0043] The exemplary embodiments described herein can be implemented in an operating environment comprising computer-executable instructions (e.g., software) installed on a computer, in hardware, or in a combination of software and hardware. The computer-executable instructions can be written in a computer programming language or can be embodied in firmware logic. If written in a programming language conforming to a recognized standard, such instructions can be executed on a variety of hardware platforms and can interface with a variety of operating systems. Although not limited thereto, computer software programs for implementing the present method can be written in any number of suitable programming languages such as, for example, C, Python, JavaScript, Go, or other compilers, assemblers, interpreters, or other computer languages or platforms.
[0044] Thus, systems and methods for compiling source code from programming languages to programming languages are disclosed. Although embodiments have been described with reference to specific example embodiments, it may be evident that various modifications and changes can be made to these example embodiments without departing from the broader spirit and scope of the present application. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Claims

What is claimed is:
1. A computer-implemented method for compiling and optimizing computer code, the method comprising:
acquiring a first code, the first code being written in a first language;
generating, based on the first code, a first deterministic finite state machine (DFSM);
optimizing the first DFSM to obtain a second DFSM; and
generating, based on the second DFSM, a second code, the second code being written in a second language.
2. The method of claim 1, wherein generating the first DFSM includes:
generating, based on the first code, a non-deterministic finite state machine (NFSM); and
converting the NFSM into the first DFSM.
3. The method of claim 2, wherein generating the NFSM includes:
parsing, based on a first grammar associated with the first language, the first code to obtain an abstract syntax tree (AST); and
converting the AST into the NFSM.
4. The method of claim 1, wherein generating the second code includes:
converting the second DFSM into a non-deterministic finite state machine (NFSM); and
generating, based on the NFSM, the second code.
5. The method of claim 4, wherein generating the second code includes:
converting the NFSM into an abstract syntax tree (AST); and
recompiling, based on a second grammar associated with the second language, the AST into the second code.
6. The method of claim 1, wherein optimizing the first DFSM includes minimizing a number of states in the second DFSM.
7. The method of claim 1, wherein optimizing the first DFSM includes reversing twice the first DFSM to the NFSM.
8. The method of claim 1, wherein the first language or the second language includes one of the following: JavaScript, C, C++, and a domain specific language.
9. The method of claim 1, wherein the first language or the second language includes one of the following: a hardware description language and bits native to a field-programmable gate array.
10. The method of claim 1, wherein the first language or the second language includes a presentation language.
11. The method of claim 1, further comprising prior to generating the second code, performing, based on a formal specification, a formal verification of the second DFSM.
12. A system for compiling codes, the system comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor, the memory storing instructions, which, when executed by the at least one processor, perform a method comprising:
acquiring a first code, the first code being written in a first language;
generating, based on the first code, a first deterministic finite state machine (DFSM);
optimizing the first DFSM to obtain a second DFSM; and
generating, based on the second DFSM, a second code, the second code being written in a second language.
13. The system of claim 12, wherein generating the first DFSM includes:
generating, based on the first code, a non-deterministic finite state machine (NFSM); and
converting the NFSM into the first DFSM.
14. The system of claim 13, wherein generating NFSM includes:
parsing, based on a first grammar associated with the first language, the first code to obtain an abstract syntax tree (AST); and
converting the AST to the NFSM.
15. The system of claim 12, wherein generating the second code includes:
converting the second DFSM into a non-deterministic finite state machine (NFSM); and
generating, based on the NFSM, the second code.
16. The system of claim 15, wherein generating the second code includes:
converting the NFSM into an abstract syntax tree (AST); and
recompiling, based on a second grammar associated with the second language, the AST into the second code.
17. The system of claim 12, wherein optimizing the first DFSM includes minimizing a number of states in the second DFSM.
18. The system of claim 12, wherein the first language or the second language includes one of the following: JavaScript, C, C++, and a domain specific language.
19. The system of claim 12, wherein the first language or the second language includes one of the following: a hardware description language and bits native to a field-programmable gate array.
20. A non-transitory computer-readable storage medium having embodied thereon instructions, which, when executed by one or more processors, perform a method for organizing data, the method comprising:
acquiring a first code, the first code being written in a first language;
parsing, based on a first grammar associated with the first language, the first code to obtain a first abstract syntax tree (AST);
converting the first AST into a first non-deterministic finite state machine (NFSM);
converting the first NFSM into a first deterministic finite state machine (DFSM);
optimizing the first DFSM to obtain the second DFSM;
converting the second DFSM into a second NFSM;
converting the second NFSM into a second AST; and
recompiling, based on a second grammar associated with a second language, the AST into the second code, the second code being written in the second language.
PCT/US2017/038825 2017-06-22 2017-06-22 Compiling and optimizing a computer code by minimizing a number of states in a finite machine corresponding to the computer code WO2018236384A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2017/038825 WO2018236384A1 (en) 2017-06-22 2017-06-22 Compiling and optimizing a computer code by minimizing a number of states in a finite machine corresponding to the computer code

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2017/038825 WO2018236384A1 (en) 2017-06-22 2017-06-22 Compiling and optimizing a computer code by minimizing a number of states in a finite machine corresponding to the computer code

Publications (1)

Publication Number Publication Date
WO2018236384A1 true WO2018236384A1 (en) 2018-12-27

Family

ID=64735746

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2017/038825 WO2018236384A1 (en) 2017-06-22 2017-06-22 Compiling and optimizing a computer code by minimizing a number of states in a finite machine corresponding to the computer code

Country Status (1)

Country Link
WO (1) WO2018236384A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140089249A1 (en) * 2007-07-16 2014-03-27 Sonicwall, Inc. Data pattern analysis using optimized deterministic finite automation
US20130174131A1 (en) * 2012-01-04 2013-07-04 International Business Machines Corporation Code converting method, program, and system
US20150277865A1 (en) * 2012-11-07 2015-10-01 Koninklijke Philips N.V. Compiler generating operator free code
US20150082207A1 (en) * 2013-09-13 2015-03-19 Fujitsu Limited Extracting a deterministic finite-state machine model of a gui based application
US20150135171A1 (en) * 2013-11-08 2015-05-14 Fujitsu Limited Information processing apparatus and compilation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BARRETT: "Compiler Design", FALL 2005, 21 December 2005 (2005-12-21), pages 39, 592, XP055559431, Retrieved from the Internet <URL:http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.462.9894&rep=rep1&type=pdf> *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10481881B2 (en) 2017-06-22 2019-11-19 Archeo Futurus, Inc. Mapping a computer code to wires and gates
CN111625224A (en) * 2020-05-28 2020-09-04 北京百度网讯科技有限公司 Code generation method, device, equipment and storage medium
CN111625224B (en) * 2020-05-28 2023-11-24 北京百度网讯科技有限公司 Code generation method, device, equipment and storage medium

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17915101

Country of ref document: EP

Kind code of ref document: A1