US20140317628A1 - Memory apparatus for processing support of long routing in processor, and scheduling apparatus and method using the memory apparatus - Google Patents


Info

Publication number
US20140317628A1
Authority
US
United States
Prior art keywords
memory
spill
instruction
processor
data flow
Prior art date
Legal status
Abandoned
Application number
US14/258,795
Inventor
Won-Sub Kim
Current Assignee
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Assigned to SAMSUNG ELECTRONICS CO., LTD. reassignment SAMSUNG ELECTRONICS CO., LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: KIM, WON-SUB
Publication of US20140317628A1 publication Critical patent/US20140317628A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 - Arrangements for software engineering
    • G06F8/40 - Transformation of program code
    • G06F8/41 - Compilation
    • G06F8/44 - Encoding
    • G06F8/445 - Exploiting fine grain parallelism, i.e. parallelism at instruction level
    • G06F8/4452 - Software pipelining
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 - Arrangements for executing machine instructions, e.g. instruction decode
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/48 - Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 - Task transfer initiation or dispatching
    • G06F9/4843 - Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 - Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 - Interfaces specially adapted for storage systems
    • G06F3/0602 - Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 - Improving or facilitating administration, e.g. storage management
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 - Interfaces specially adapted for storage systems
    • G06F3/0628 - Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 - Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656 - Data buffering arrangements
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 - Interfaces specially adapted for storage systems
    • G06F3/0668 - Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 - In-line storage system
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs

Definitions

  • The memory port may include at least one write port configured to process a data write request, which the processor transmits in response to the memory spill store instruction, and at least one read port configured to process a data read request, which the processor transmits in response to the memory spill load instruction.
  • A number of the at least one memory element may be equal to a number of the at least one write port, such that the memory elements and the write ports correspond to each other, respectively.
  • According to an aspect of another exemplary embodiment, there is provided a scheduling method including: determining whether operations in a data flow of a program cause long routing; and generating, in response to determining that the operations cause the long routing, a memory spill instruction corresponding to a memory distinct from a local register.
  • FIG. 1 is a diagram illustrating a scheduling apparatus according to an exemplary embodiment.
  • FIG. 2 is a diagram illustrating an example of a data flow graph for explaining long routing in the scheduling apparatus according to an exemplary embodiment.
  • FIG. 3 is a flowchart illustrating a scheduling method according to an exemplary embodiment.
  • FIG. 4 is a block diagram illustrating a memory apparatus according to an exemplary embodiment.
  • FIG. 1 is a diagram illustrating a scheduling apparatus 100 according to an exemplary embodiment.
  • A coarse grained reconfigurable array (CGRA) may use a modulo scheduling method that employs software pipelining. Unlike general modulo scheduling, the modulo scheduling used for a CGRA also takes routing between operations into consideration during scheduling.
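The modulo-scheduling idea above can be sketched in a few lines. The following is an illustrative toy, not the patent's scheduler (function and variable names are invented): each operation is placed in the earliest cycle that respects its dependencies, and a slot `cycle % II` may hold at most as many operations as there are functional units.

```python
def modulo_schedule(ops, deps, num_fus, ii):
    """Greedy toy modulo scheduler: place each operation in the earliest
    cycle after its predecessors such that at most num_fus operations
    occupy the same slot modulo ii."""
    cycle_of = {}
    usage = {}  # slot (cycle % ii) -> number of operations placed there
    for op in ops:  # ops are assumed to be topologically ordered
        earliest = max((cycle_of[p] + 1 for p in deps.get(op, [])), default=0)
        c = earliest
        while usage.get(c % ii, 0) >= num_fus:  # modulo resource conflict
            c += 1
        cycle_of[op] = c
        usage[c % ii] = usage.get(c % ii, 0) + 1
    return cycle_of

# Chain A -> B -> C on two functional units with initiation interval II = 2.
sched = modulo_schedule(["A", "B", "C"], {"B": ["A"], "C": ["B"]}, num_fus=2, ii=2)
```

With an initiation interval of 2, operations A and C share the same modulo slot, which is legal here only because two functional units are available.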
  • a scheduling apparatus 100 according to an exemplary embodiment is capable of modulo scheduling to allow a CGRA-based processor to effectively process long routing between operations.
  • the scheduling apparatus 100 includes an analyzer 110 , a determiner 120 , and an instruction generator 130 .
  • the analyzer 110 may analyze a degree of skew in data flow, based on a data flow graph of a program.
  • the analyzer 110 may determine the degree of skew in data flow by analyzing data dependency between operations based on the data flow graph.
  • FIG. 2 is a diagram illustrating an example of a data flow graph for explaining long routing in the scheduling apparatus 100 according to an exemplary embodiment. Referring to (a) of FIG. 2, the data dependency between operation A and operation G is notably different from the data dependencies between each pair of consecutive operations (A through G). Such skew, which occurs due to the imbalance among data dependencies, causes long routing in scheduling.
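One way to quantify such skew on a data flow graph is as the difference between the longest and shortest dependence-path lengths between two operations. The sketch below uses that metric for illustration; the metric and all names are assumptions, not taken from the patent.

```python
def path_lengths(graph, src, dst):
    """All dependence-path lengths (in edges) from src to dst in a DAG."""
    if src == dst:
        return [0]
    lengths = []
    for nxt in graph.get(src, []):
        lengths += [1 + n for n in path_lengths(graph, nxt, dst)]
    return lengths

def skew(graph, src, dst):
    """Skew: longest minus shortest routing distance between two operations."""
    ls = path_lengths(graph, src, dst)
    return max(ls) - min(ls)

# FIG. 2-style imbalance: A feeds G directly and also through B..F.
g = {"A": ["B", "G"], "B": ["C"], "C": ["D"], "D": ["E"],
     "E": ["F"], "F": ["G"]}
```

Here `skew(g, "A", "G")` is 5: the value produced by A reaches G directly in one step but also after six steps through B..F, so the direct value must be kept alive for many cycles.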
  • the determiner 120 determines whether memory spill is to be utilized, based on the analyzing result from the analyzer 110 .
  • register spill may be used, whereby a processing result from each functional unit of the processor is written in a local register file, and is utilized such that the processing result can be routed for several cycles.
  • memory spill may be used to store the execution result of the operation in memory, rather than in a local register file, and use, when necessary or desired, the stored data by reading the stored data from the memory.
  • the determiner 120 may determine whether operations (e.g., A and G) whose data dependency causes long routing on the data flow graph are present, based on the analyzing result from the analyzer 110 .
  • Memory spill may be determined to be utilized for such operations (e.g., A and G) that cause long routing.
  • the instruction generator 130 may eliminate the data dependency between the operations A and G. In addition, the instruction generator 130 may generate a memory spill instruction to allow the processor to utilize memory in writing and reading a processing result of the operations A and G.
  • the instruction generator 130 may generate a memory spill store instruction to allow a functional unit of the processor to store a processing result of the first operation A in the memory, as opposed to in a local register file (i.e., a register spill). Moreover, the instruction generator 130 may generate a memory spill load instruction to allow the functional unit of the processor to load the processing result of first operation A from the memory, as opposed to from the local register file, when executing the second operation G.
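The dependency elimination and spill-pair generation described above can be illustrated as a toy graph transformation. The mnemonics `MSPILL_ST`/`MSPILL_LD` and the `slot` parameter are invented for illustration; the patent does not specify an instruction encoding.

```python
def apply_memory_spill(graph, producer, consumer, slot):
    """Remove the direct long-routing dependence edge and return the
    store/load instruction pair that replaces it (mnemonics illustrative)."""
    edges = {op: [d for d in dsts if not (op == producer and d == consumer)]
             for op, dsts in graph.items()}
    store = f"MSPILL_ST r_{producer} -> mem[{slot}]"
    load = f"MSPILL_LD mem[{slot}] -> r_{consumer}"
    return edges, [store, load]

# Spill the long-routing A -> G edge from the skewed graph of FIG. 2.
g = {"A": ["B", "G"], "B": ["C"], "C": ["D"], "D": ["E"],
     "E": ["F"], "F": ["G"]}
g2, spill = apply_memory_spill(g, "A", "G", slot=0)
```

After the transformation, A's result no longer needs to be routed through functional units or a local register file for several cycles; it waits in memory until G loads it.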
  • the instruction generator 130 may perform scheduling to avoid addresses being allocated to the same memory bank with respect to an iteration of a program loop.
  • the instruction generator 130 may generate a memory spill instruction to enable the same logic index and different physical indices to be allocated to iterations in a program loop.
  • the CGRA increases a throughput by use of software pipeline technology in which iterations of a program loop are performed in parallel with one another at a given initiation interval (II).
  • Variables generated during each iteration may have an overlapped lifetime, and such overlapped lifetime may be overcome by using a rotating register file. That is, the same logical address and different physical addresses are allocated to the variables generated during each iteration so as to allow access to the rotating register file.
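One conventional way to realize this mapping in a rotating register file is to advance the register window by one physical slot per iteration, e.g. `phys = (logical + iteration) mod size`. This formula is a common sketch of rotating registers in general, not one stated in the patent.

```python
def rotating_phys_index(logical, iteration, size):
    """Rotating-register mapping: the window advances one physical slot
    per iteration, so the same logical register used by overlapping
    iterations lands in distinct physical registers."""
    return (logical + iteration) % size

# Logical register 2 across four overlapping iterations of an 8-entry file.
slots = [rotating_phys_index(2, it, 8) for it in range(4)]
```

Each of the four overlapping iterations sees the same logical index 2 but writes a different physical register, so their lifetimes no longer collide.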
  • the memory may have a structure that allows the scheduling apparatus 100 to support scheduling by use of memory spill.
  • the memory may include one or more memory elements with different physical indices.
  • the instruction generator 130 may vary the physical indices by allocating different addresses to different iterations of the program loop, based on the number of memory elements included in the memory. By doing so, the problem of overlapping address banks in the same cycle, which may arise when data is written to the memory during the execution of iterations of a software-pipelined program, can be avoided. If the determiner 120 determines that there is no operation that will utilize memory spill, the instruction generator 130 may generate a register spill instruction to store a processing result of each operation in the local register.
  • FIG. 3 is a flowchart illustrating a scheduling method according to an exemplary embodiment. With reference to FIG. 3 , a method for allowing memory spill through the scheduling apparatus 100 of FIG. 1 is described.
  • the scheduling apparatus 100 analyzes a degree of skew in data flow based on the data flow (e.g., a data flow graph) of a program.
  • the scheduling apparatus 100 may determine the degree of skew in data flow by analyzing data dependency between operations based on the data flow graph. Referring back to ( a ) of FIG. 2 , by way of example, data dependency between operation A and operation G is notably different from other data dependencies between every other two operations (A through G), and consequently, skew occurs in the entire data flow, which causes long routing in scheduling.
  • In operation 320, it is determined whether memory spill is to be utilized, based on the analysis result. Referring back to (b) of FIG. 2, by way of example, it is determined that memory spill is to be utilized for the execution of the operations (e.g., operations A and G in FIG. 2) with data dependency that causes long routing in a data flow graph.
  • In response to a determination that there are operations (e.g., A and G in FIG. 2) that are to utilize memory spill, the scheduling apparatus 100 eliminates the data dependency between the operations in operation 330, and generates, in operation 340, a memory spill instruction to allow a processor to use the memory, rather than a local register, for writing and reading a processing result of the operations.
  • the memory spill instruction may include a memory spill store instruction and a memory spill load instruction.
  • the memory spill store instruction allows (i.e., instructs) a functional unit of the processor to store a processing result of the first operation (e.g., operation A in FIG. 2) in the memory, and the memory spill load instruction instructs the functional unit to load the stored processing result from the memory when executing the second operation (e.g., operation G in FIG. 2).
  • the instruction generator 130 may perform scheduling to avoid addresses being allocated to the same memory bank with respect to an iteration of a program loop. That is, the instruction generator 130 may generate a memory spill instruction to enable the same logic index and different physical indices to be allocated to iterations in a program loop.
  • the CGRA increases a throughput by use of software pipeline technology in which iterations of a program loop are performed in parallel with one another at a given initiation interval (II).
  • Variables generated during each iteration may have an overlapped lifetime, and such overlapped lifetime may be overcome by using a rotating register file. That is, the same logical address and different physical addresses may be allocated to the variables so as to allow access to the rotating register file.
  • the memory may have a structure that allows the scheduling apparatus 100 to provide scheduling support by use of memory spill.
  • the memory may include one or more memory elements with different physical indices.
  • the instruction generator 130 may vary the physical indices by allocating different addresses to different iterations of the program loop, based on the number of memory elements included in the memory. By doing so, the problem of overlapping address banks in the same cycle, which may arise when data is written to the memory during the execution of iterations of a software-pipelined program, can be avoided.
  • In response to a determination that there is no operation that is to utilize a memory spill, the scheduling apparatus 100 generates, in operation 350, a register spill instruction to store a processing result of each operation in the local register.
  • FIG. 4 is a block diagram illustrating a memory apparatus 400 according to an exemplary embodiment. As shown in FIG. 4 , the memory apparatus 400 is structured to support a processor 500 to store and load a different value for each iteration when executing the iterations of a software-pipelined program loop.
  • the memory apparatus 400 includes memory ports 410 and 420 , a memory controller 430 , and one or more memory elements 450 .
  • the memory ports 410 and 420 may include at least one write port 410 to process a write request from the processor 500 , and at least one read port 420 to process a read request from the processor 500 .
  • the number of memory elements 450 may correspond to the number of memory ports 410 or 420 .
  • the memory apparatus 400 may include the same number of memory elements 450 as the number of write ports 410 through which to receive a write request from the processor 500 .
  • the memory apparatus 400 may further include one or more control buffers 440 a and 440 b.
  • the control buffers 440 a and 440 b may temporarily store a number of requests from the processor 500 when the number of requests exceeds the number of memory ports 410 or 420 , and input the requests to the memory ports 410 or 420 after a predetermined period of delay time, thereby preventing the processor 500 from stalling.
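A toy model of such a control buffer, assuming it simply queues excess requests and drains at most one request per port per cycle (the class and method names are invented for illustration):

```python
from collections import deque

class ControlBuffer:
    """Toy control buffer: accepts any number of requests per cycle and
    drains at most num_ports of them to the memory ports each cycle,
    so the processor need not stall on a burst of requests."""
    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.pending = deque()

    def accept(self, requests):
        # Requests beyond the port count simply wait in the queue.
        self.pending.extend(requests)

    def drain_cycle(self):
        # Issue up to num_ports queued requests in this cycle.
        issued = []
        while self.pending and len(issued) < self.num_ports:
            issued.append(self.pending.popleft())
        return issued

buf = ControlBuffer(num_ports=2)
buf.accept(["w0", "w1", "w2"])  # burst of three writes, only two ports
first = buf.drain_cycle()       # two requests issued this cycle
second = buf.drain_cycle()      # remaining request issued next cycle
```

The burst of three writes is absorbed without a stall: two writes go through the ports immediately, and the third follows one cycle later.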
  • the memory controller 430 may process a request input through the memory port 410 or 420 from the processor 500 that executes a memory spill instruction generated by the scheduling apparatus 100. Based on the logic index information included in the request, the memory controller 430 calculates the physical index to control access to the corresponding memory element 450.
  • the processor 500 may transmit a write request to store the processing result of operation A in the memory apparatus 400 with respect to each iteration of the program loop. At least one write request from the processor 500 is input through at least one write port 410 , and the memory controller 430 controls data to be stored in the corresponding memory element 450 .
  • the write control buffer 440 a may temporarily store the at least one write request from the processor 500 , and then input the at least one write request to the write ports 410 after a predetermined period of time delay.
  • a memory spill load instruction is input to a functional unit of the processor 500 when executing operation G.
  • the functional unit executes the memory spill load instruction to transmit, to the memory apparatus 400 , a read request for the processing result data of operation A.
  • the same logic index information is transmitted during each iteration, and the memory controller 430 may calculate a physical index using the logic index information.
  • the read request may include the logic index information and information on each iteration identifier (ID), based on which the memory controller 430 may calculate the physical index.
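A hypothetical sketch of that calculation: the iteration ID selects the memory element (bank) so that iterations alive in the same cycle hit different banks, while the logic index serves as the in-bank address. This mapping is one plausible choice, not a formula specified by the patent.

```python
def resolve(logic_index, iteration_id, num_elements):
    """Hypothetical controller mapping: the iteration ID picks the memory
    element (bank), and the logic index is the in-bank address, so
    iterations alive in the same cycle access different banks."""
    bank = iteration_id % num_elements
    return bank, logic_index

# Two iterations alive in the same cycle share logic index 5 across 4 banks.
a = resolve(5, 0, 4)
b = resolve(5, 1, 4)
```

Both requests carry the same logic index, yet they resolve to different memory elements, which is exactly the property needed to avoid bank conflicts between overlapping iterations.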
  • the read control buffer 440 b may temporarily store data read from the memory element 450 , and transmit the data to the read port of the processor after a predetermined period of delay time, so as to prevent the processor 500 from stalling.
  • long routing caused by skew in data flow on a data flow graph may be spilled to the memory apparatus 400, and a memory structure for effectively supporting the memory spill is provided, thereby improving the processing performance of the processor and reducing the processor size.
  • One or more exemplary embodiments can be implemented as computer readable codes stored in a computer readable record medium and executed by a hardware processor or controller. Codes and code segments constituting the computer program can be easily inferred by a skilled computer programmer in the art.
  • the computer readable record medium includes all types of record media in which computer readable data are stored. Examples of the computer readable record medium include a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, and an optical data storage.
  • the computer readable record medium may be distributed to computer systems over a network, in which computer readable codes may be stored and executed in a distributed manner.
  • one or more of the above-described elements may be implemented by a processor, circuitry, etc.


Abstract

Provided are a scheduling apparatus and method for effective processing support of long routing in a coarse grain reconfigurable array (CGRA)-based processor. The scheduling apparatus includes: an analyzer configured to analyze a degree of skew in a data flow of a program; a determiner configured to determine whether operations in the data flow utilize a memory spill based on the analyzed degree of skew; and an instruction generator configured to eliminate dependency between the operations that are determined to utilize the memory spill, and to generate a memory spill instruction.

Description

    CROSS-REFERENCE TO RELATED APPLICATION
  • This application claims priority from Korean Patent Application No. 10-2013-0044430, filed on Apr. 22, 2013 in the Korean Intellectual Property Office, the disclosure of which is incorporated herein by reference in its entirety.
  • BACKGROUND
  • 1. Field
  • Apparatuses and methods consistent with exemplary embodiments relate to a memory apparatus for effective process support of long routing in a coarse grain reconfigurable array (CGRA)-based processor, and a scheduling apparatus and method using the memory apparatus.
  • 2. Description of the Related Art
  • A coarse grain reconfigurable array (CGRA)-based processor with a functional unit array supports point-to-point connections among all functional units in the array, and thus handles routing directly, unlike communication through general write and read registers. Specifically, when skew occurs in a data flow (i.e., when a dependence graph is imbalanced), long routing may occur in scheduling.
  • A local rotating register file is used to support such long routing because values of functional units are routed for a number of cycles. The local rotating register file may be suitable to store the values for several cycles. However, when long routing frequently occurs, the local rotating register file is limited in use by its limited number of connections to read and write ports, and overall processing performance is thereby reduced.
  • SUMMARY
  • According to an aspect of an exemplary embodiment, there is provided a scheduling apparatus including: an analyzer configured to analyze a degree of skew in a data flow of a program; a determiner configured to determine whether operations of the data flow utilize a memory spill based on a result of the analysis of the degree of skew; and an instruction generator configured to eliminate dependency between the operations that are determined, by the determiner, to utilize the memory spill, and to generate a memory spill instruction.
  • The generated memory spill instruction may include a memory spill store instruction and a memory spill load instruction, wherein the memory spill store instruction instructs a processor to store a processing result of a first operation of the data flow in memory, and wherein the memory spill load instruction instructs the processor to load the processing result of the first operation from the memory when the processor performs a second operation of the data flow that uses the processing result of the first operation.
  • The analyzer may be configured to analyze the degree of skew by analyzing a long routing path on a data flow graph of the program.
  • The instruction generator may be configured to, in response to a determination that there is no operation that utilizes the memory spill, generate a register spill instruction for the processor to store a processing result of each operation in a local register.
  • The instruction generator may be configured to generate a memory spill instruction to enable an identical logic index and different physical indices to be allocated to iterations of the program performed during a same cycle.
  • The instruction generator may be configured to differentiate the physical indices by allocating addresses with respect to the iterations based on a number of at least one memory element included in the memory.
  • According to an aspect of another exemplary embodiment, there is provided a scheduling method including: analyzing a degree of skew in a data flow of a program; determining whether operations of the data flow utilize a memory spill based on a result of the analysis of the degree of skew; and eliminating a dependency between the operations that utilize the memory spill, and generating a memory spill instruction.
  • The memory spill instruction may include a memory spill store instruction and a memory spill load instruction, wherein the memory spill store instruction instructs a processor to store a processing result of a first operation of the data flow in the memory, and wherein the memory spill load instruction instructs the processor to load the processing result of the first operation from the memory when the processor performs a second operation of the data flow that uses the processing result of the first operation.
  • The analyzing may include analyzing the degree of skew by analyzing a long routing path on a data flow graph of the program.
  • The generating the instruction may include, in response to a determination that there is no operation that utilizes the memory spill, generating a register spill instruction to store a processing result of each operation in a local register.
  • The generating the instruction may include generating a memory spill instruction to enable an identical logic index and different physical indices to be allocated to iterations of the program performed during a same cycle.
  • The generating the instruction may include differentiating the physical indices by allocating addresses with respect to the iterations based on a number of at least one memory element included in the memory.
  • According to an aspect of another exemplary embodiment, there is provided a memory apparatus including: a memory port; a memory element with a physical index; and a memory controller configured to control access to the memory element by calculating the physical index based on logic index information included in a request input, through the memory port, from a processor in response to a memory spill instruction generated as a result of program scheduling, and to process the input request.
  • The memory apparatus may further include a write control buffer configured to, in response to a write request from the processor, control an input to the memory via the memory port by temporarily storing data.
  • The memory apparatus may further include a read control buffer configured to, in response to a read request from the processor, control an input to the processor by temporarily storing data that is output from the memory through the memory port.
  • The memory spill instruction may include a memory spill store instruction and a memory spill load instruction, wherein the memory spill store instruction instructs the processor to store a processing result of a first operation in the memory element, and wherein the memory spill load instruction instructs the processor to load the stored processing result of the first operation from the memory element when the processor performs a second operation that uses the processing result of the first operation.
  • The memory port may include at least one write port configured to process a data write request, which the processor transmits in response to the memory spill store instruction, and at least one read port configured to process a data read request, which the processor transmits in response to the memory spill load instruction.
  • A number of the at least one memory element may be equal to a number of at least one write port such that they correspond to each other, respectively.
  • According to an aspect of another exemplary embodiment, there is provided a scheduling method including: determining whether operations in a data flow of a program cause long routing; and generating, in response to determining that the operations cause the long routing, a memory spill instruction corresponding to a memory distinct from a local register.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The above and/or other aspects will become apparent and more readily appreciated from the following description of certain exemplary embodiments, taken in conjunction with the accompanying drawings, in which:
  • FIG. 1 is a diagram illustrating a scheduling apparatus according to an exemplary embodiment;
  • FIG. 2 is a diagram illustrating an example of a data flow graph for explaining long routing in the scheduling apparatus according to an exemplary embodiment;
  • FIG. 3 is a flowchart illustrating a scheduling method according to an exemplary embodiment; and
  • FIG. 4 is a block diagram illustrating a memory apparatus according to an exemplary embodiment.
  • DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS
  • The following description is provided to assist the reader in gaining a comprehensive understanding of methods, apparatuses, and/or systems described herein. Accordingly, various changes, modifications, and equivalents of the methods, apparatuses, and/or systems described herein will be suggested to those of ordinary skill in the art. Also, descriptions of well-known functions and constructions may be omitted for increased clarity and conciseness.
  • Throughout the drawings and the detailed description, unless otherwise described, the same drawing reference numerals will be understood to refer to the same elements, features, and structures. The relative size and depiction of these elements may be exaggerated for clarity, illustration, and convenience.
  • Herein, a memory apparatus for processing support of long routing in a processor according to one or more exemplary embodiments, and a scheduling apparatus and method using the memory apparatus will be described in detail with reference to the accompanying drawings.
  • FIG. 1 is a diagram illustrating a scheduling apparatus 100 according to an exemplary embodiment. A coarse grained reconfigurable array (CGRA) may use a modulo scheduling method that employs software pipelining. Unlike general modulo scheduling, the modulo scheduling used for CGRA takes into consideration routing between operations for a scheduling process. A scheduling apparatus 100 according to an exemplary embodiment is capable of modulo scheduling to allow a CGRA-based processor to effectively process long routing between operations.
  • Referring to FIG. 1, the scheduling apparatus 100 includes an analyzer 110, a determiner 120, and an instruction generator 130.
  • The analyzer 110 may analyze a degree of skew in data flow, based on a data flow graph of a program. The analyzer 110 may determine the degree of skew by analyzing the data dependencies between operations on the data flow graph. FIG. 2 is a diagram illustrating an example of a data flow graph for explaining long routing in the scheduling apparatus 100 according to an exemplary embodiment. Referring to (a) of FIG. 2, the data dependency between operation A and operation G is notably different from the data dependencies between each pair of consecutive operations (A through G). Such skew, which occurs due to the imbalance among data dependencies, causes long routing in scheduling.
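The analysis described above can be sketched as follows. This is a minimal illustration, not the patent's algorithm: the graph encoding, the depth-based notion of "skew," and the threshold are all assumptions made for the example.

```python
# Sketch: detecting skewed (long-routing) edges on a data flow graph.
# The graph maps each operation to the operations that consume its
# result; an edge is "skewed" when the producer and consumer sit far
# apart in schedule depth, as with A -> G in FIG. 2(a).

def schedule_depths(graph):
    """Depth of each node = longest path from any root (topological order)."""
    indeg = {n: 0 for n in graph}
    for dsts in graph.values():
        for d in dsts:
            indeg[d] = indeg.get(d, 0) + 1
    depth = {n: 0 for n in indeg}
    ready = [n for n, k in indeg.items() if k == 0]
    while ready:
        n = ready.pop()
        for d in graph.get(n, []):
            depth[d] = max(depth[d], depth[n] + 1)
            indeg[d] -= 1
            if indeg[d] == 0:
                ready.append(d)
    return depth

def skewed_edges(graph, threshold=2):
    """Edges whose producer/consumer depth gap exceeds the threshold."""
    depth = schedule_depths(graph)
    return [(src, dst) for src, dsts in graph.items() for dst in dsts
            if depth[dst] - depth[src] > threshold]

# Chain A -> B -> ... -> G plus a skewed direct edge A -> G.
g = {'A': ['B', 'G'], 'B': ['C'], 'C': ['D'], 'D': ['E'],
     'E': ['F'], 'F': ['G'], 'G': []}
```

Running `skewed_edges(g)` flags only the A-to-G dependency, matching the imbalance the analyzer 110 is described as detecting.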
  • The determiner 120 determines whether memory spill is to be utilized, based on the analyzing result from the analyzer 110. Generally, when long routing occurs in scheduling for a processor, a "register spill" may be used, whereby a processing result from each functional unit of the processor is written in a local register file so that the processing result can be routed over several cycles. Meanwhile, when a functional unit of a processor executes an operation that causes long routing, a "memory spill" may be used to store the execution result of the operation in memory, rather than in a local register file, and to read the stored data back from the memory when necessary or desired.
  • Referring to (b) of FIG. 2, the determiner 120 may determine whether operations (e.g., A and G) whose data dependency causes long routing on the data flow graph are present, based on the analyzing result from the analyzer 110. Memory spill may be determined to be utilized for such operations (e.g., A and G) that cause long routing.
  • In the presence of the operations A and G that will utilize memory spill, the instruction generator 130 may eliminate the data dependency between the operations A and G. In addition, the instruction generator 130 may generate a memory spill instruction to allow the processor to utilize memory in writing and reading a processing result of the operations A and G.
  • For example, the instruction generator 130 may generate a memory spill store instruction to allow a functional unit of the processor to store a processing result of the first operation A in the memory, as opposed to in a local register file (i.e., as in a register spill). Moreover, the instruction generator 130 may generate a memory spill load instruction to allow the functional unit of the processor to load the processing result of the first operation A from the memory, as opposed to from the local register file, when executing the second operation G.
  • In this case, the instruction generator 130 may perform scheduling to avoid addresses being allocated to the same memory bank with respect to an iteration of a program loop. In other words, the instruction generator 130 may generate a memory spill instruction to enable the same logic index and different physical indices to be allocated to iterations in a program loop.
  • Generally, the CGRA increases a throughput by use of software pipeline technology in which iterations of a program loop are performed in parallel with one another at a given initiation interval (II). Variables generated during each iteration may have an overlapped lifetime, and such overlapped lifetime may be overcome by using a rotating register file. That is, the same logical address and different physical addresses are allocated to the variables generated during each iteration so as to allow access to the rotating register file.
  • As described in detail below, the memory may have a structure that allows the scheduling apparatus 100 to support scheduling by use of memory spill. The memory may include one or more memory elements with different physical indices. The instruction generator 130 may vary the physical indices by allocating different addresses to different iterations of the program loop, based on the number of memory elements included in the memory. Doing so overcomes the problem of overlapping bank addresses in the same cycle, which may arise when data is written to the memory during the execution of iterations of a software-pipelined program. If the determiner 120 determines that there is no operation that will utilize memory spill, the instruction generator 130 may generate a register spill instruction to store a processing result of each operation in the local register.
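The same-logic-index, different-physical-index allocation can be illustrated with a small sketch. The rotation-by-iteration scheme and the bank count are assumptions chosen for the example; the patent does not specify the mapping function.

```python
# Sketch: mapping one logic index to different physical indices across
# loop iterations, so iterations running in the same cycle never hit
# the same memory bank (in the spirit of a rotating register file).

NUM_MEMORY_ELEMENTS = 4  # hypothetical bank count

def physical_index(logic_index, iteration_id,
                   num_elements=NUM_MEMORY_ELEMENTS):
    # Rotate the bank selection by the iteration ID.
    return (logic_index + iteration_id) % num_elements

# Four iterations active in the same cycle share logic index 2 but
# land on four distinct banks.
banks = [physical_index(2, it) for it in range(4)]
```

With four memory elements, up to four overlapping iterations can write the "same" spilled variable without a bank conflict.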
  • FIG. 3 is a flowchart illustrating a scheduling method according to an exemplary embodiment. With reference to FIG. 3, a method for allowing memory spill through the scheduling apparatus 100 of FIG. 1 is described.
  • In operation 310, the scheduling apparatus 100 analyzes a degree of skew in data flow based on the data flow (e.g., a data flow graph) of a program. The scheduling apparatus 100 may determine the degree of skew by analyzing the data dependencies between operations on the data flow graph. Referring back to (a) of FIG. 2, by way of example, the data dependency between operation A and operation G is notably different from the data dependencies between each pair of consecutive operations (A through G); consequently, skew occurs in the entire data flow, which causes long routing in scheduling.
  • In operation 320, it is determined whether memory spill is to be utilized, based on the analysis result. Referring back to (b) of FIG. 2, by way of example, it is determined that memory spill is to be utilized for the execution of the operations (e.g., operations A and G in FIG. 2) with data dependency that causes long routing in a data flow graph.
  • In response to a determination that there are operations (e.g., A and G in FIG. 2) that are to utilize memory spill, the scheduling apparatus 100 eliminates the data dependency between the operations in operation 330, and, in operation 340, generates a memory spill instruction to allow a processor to use the memory, rather than a local register, for writing and reading a processing result of the operations. In this case, the memory spill instruction may include a memory spill store instruction and a memory spill load instruction. The memory spill store instruction instructs a functional unit of the processor to store a processing result of the first operation (e.g., operation A in FIG. 2) in the memory, as opposed to in a local register file, and the memory spill load instruction instructs the functional unit to load the processing result of the first operation from the memory when performing a second operation. At this time, the instruction generator 130 may perform scheduling to avoid addresses being allocated to the same memory bank with respect to an iteration of a program loop. That is, the instruction generator 130 may generate a memory spill instruction to enable the same logic index and different physical indices to be allocated to iterations in a program loop.
  • Generally, the CGRA increases a throughput by use of software pipeline technology in which iterations of a program loop are performed in parallel with one another at a given initiation interval (II). Variables generated during each iteration may have an overlapped lifetime, and such overlapped lifetime may be overcome by using a rotating register file. That is, the same logical address and different physical addresses may be allocated to the variables so as to allow the access to the rotating register file.
  • As described in detail below, the memory may have a structure that allows the scheduling apparatus 100 to provide scheduling support by use of memory spill. The memory may include one or more memory elements with different physical indices. The instruction generator 130 may vary the physical indices by allocating different addresses to different iterations of the program loop, based on the number of memory elements included in the memory. Doing so overcomes the problem of overlapping bank addresses in the same cycle, which may arise when data is written to the memory during the execution of iterations of a software-pipelined program.
  • In response to a determination that there is no operation that is to utilize a memory spill, the scheduling apparatus 100 generates a register spill instruction to store a processing result of each operation in the local register in operation 350.
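The decision made in operations 330 through 350 of FIG. 3 can be sketched as follows. The instruction names and tuple encoding are hypothetical, chosen only to make the branch between memory spill and register spill concrete.

```python
# Sketch: emit spill instructions for producer/consumer pairs that
# cause long routing (operations 330-340 of FIG. 3); if there are
# none, fall back to a register spill (operation 350).

def generate_spill_instructions(long_routing_pairs):
    if not long_routing_pairs:
        # No long routing: keep results in the local register file.
        return [('REG_SPILL',)]
    instrs = []
    for producer, consumer in long_routing_pairs:
        # The direct dependency between producer and consumer is
        # eliminated; the value travels through memory instead.
        instrs.append(('MEM_SPILL_STORE', producer))
        instrs.append(('MEM_SPILL_LOAD', consumer, producer))
    return instrs
```

For the FIG. 2 example, the pair (A, G) yields one store for A's result and one load when G executes.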
  • FIG. 4 is a block diagram illustrating a memory apparatus 400 according to an exemplary embodiment. As shown in FIG. 4, the memory apparatus 400 is structured to enable a processor 500 to store and load a different value for each iteration when executing the iterations of a software-pipelined program loop.
  • Referring to FIG. 4, the memory apparatus 400 includes memory ports 410 and 420, a memory controller 430, and one or more memory elements 450.
  • The memory ports 410 and 420 may include at least one write port 410 to process a write request from the processor 500, and at least one read port 420 to process a read request from the processor 500.
  • There are provided one or more memory elements 450, which may have different physical indices to allocate different memory addresses to iterations of the program loop. In this case, the number of memory elements 450 may correspond to the number of memory ports 410 or 420. Particularly, the memory apparatus 400 may include the same number of memory elements 450 as the number of write ports 410 through which to receive a write request from the processor 500.
  • The memory apparatus 400 may further include one or more control buffers 440 a and 440 b. The control buffers 440 a and 440 b may temporarily store a number of requests from the processor 500 when the number of requests exceeds the number of memory ports 410 or 420, and input the requests to the memory ports 410 or 420 after a predetermined period of delay time, thereby preventing the processor 500 from stalling.
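The buffering behavior of the control buffers 440a and 440b can be illustrated with a simple queue. This is a sketch under assumed semantics (one drain per cycle, FIFO order); the class and method names are invented for the example.

```python
# Sketch: a control buffer that accepts more requests in one cycle
# than the memory has ports, then drains them over later cycles so
# the processor never stalls.
from collections import deque

class ControlBuffer:
    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.pending = deque()

    def submit(self, requests):
        """Queue all requests; the processor continues immediately."""
        self.pending.extend(requests)

    def drain_cycle(self):
        """Forward up to num_ports requests to the memory this cycle."""
        return [self.pending.popleft()
                for _ in range(min(self.num_ports, len(self.pending)))]

buf = ControlBuffer(num_ports=2)
buf.submit(['w0', 'w1', 'w2'])  # 3 write requests, only 2 write ports
first, second = buf.drain_cycle(), buf.drain_cycle()
```

The third request is simply delayed by one cycle rather than forcing the processor to wait.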
  • The memory controller 430 may process a request input through the memory port 410 or 420 from the processor 500 that executes a memory spill instruction generated by the scheduling apparatus 100. Based on the logic index information of the memory element 450 included in the request, the memory controller 430 calculates the physical index to control access to the corresponding memory element 450.
  • As shown in FIG. 2(b), in response to the memory spill store instruction generated as a result of the scheduling process, the processor 500 may transmit a write request to store the processing result of operation A in the memory apparatus 400 with respect to each iteration of the program loop. At least one write request from the processor 500 is input through at least one write port 410, and the memory controller 430 controls data to be stored in the corresponding memory element 450.
  • In this case, if the number of write ports in the functional unit of the processor 500 is greater than the number of write ports 410 of the memory apparatus 400, the write control buffer 440 a may temporarily store the at least one write request from the processor 500, and then input the at least one write request to the write ports 410 after a predetermined period of delay time.
  • In addition, a memory spill load instruction is input to a functional unit of the processor 500 when executing operation G. The functional unit executes the memory spill load instruction to transmit, to the memory apparatus 400, a read request for the processing result data of operation A. At this time, the same logic index information is transmitted during each iteration, and the memory controller 430 may calculate a physical index using the logic index information. In this case, the read request may include the logic index information and information on each iteration identifier (ID), based on which the memory controller 430 may calculate the physical index.
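The store/load round trip through the memory controller 430 can be sketched end to end. This is an assumed model: the rotation-based physical-index calculation, the per-bank dictionaries, and the class interface are all illustrative, not taken from the patent.

```python
# Sketch: a memory controller that resolves (logic index, iteration ID)
# to a physical bank, so overlapping iterations of a software-pipelined
# loop store and load distinct values under the same logic index.

class MemoryController:
    def __init__(self, num_elements=4):
        # One dict per memory element (bank).
        self.banks = [dict() for _ in range(num_elements)]

    def _physical(self, logic_index, iteration_id):
        # Assumed mapping: rotate the bank by the iteration ID.
        return (logic_index + iteration_id) % len(self.banks)

    def store(self, logic_index, iteration_id, value):
        bank = self._physical(logic_index, iteration_id)
        self.banks[bank][logic_index] = value

    def load(self, logic_index, iteration_id):
        bank = self._physical(logic_index, iteration_id)
        return self.banks[bank][logic_index]

mc = MemoryController()
# Three overlapping iterations store operation A's result under the
# same logic index; each lands in a different bank and reads back intact.
for it in range(3):
    mc.store(logic_index=1, iteration_id=it, value=f'A_result_{it}')
values = [mc.load(1, it) for it in range(3)]
```

Because the read request carries the same logic index plus the iteration ID, the controller recovers exactly the value stored by that iteration.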
  • If the number of input ports of the functional unit of the processor 500 is smaller than the number of read ports 420 of the memory apparatus 400, the read control buffer 440 b may temporarily store data read from the memory element 450, and transmit the data to the read port of the processor 500 after a predetermined period of delay time, so as to prevent the processor 500 from stalling.
  • According to aspects of the above-described exemplary embodiments, long routing caused by skew in data flow on a data flow graph may be spilled to the memory apparatus 400, and a memory structure for effectively supporting the memory spill is provided, thereby improving the processing performance of the processor and reducing processor size.
  • One or more exemplary embodiments can be implemented as computer readable codes stored in a computer readable record medium and executed by a hardware processor or controller. Codes and code segments constituting the computer program can be easily inferred by a skilled computer programmer in the art. The computer readable record medium includes all types of record media in which computer readable data are stored. Examples of the computer readable record medium include a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, and an optical data storage. In addition, the computer readable record medium may be distributed to computer systems over a network, in which computer readable codes may be stored and executed in a distributed manner. Furthermore, it is understood that one or more of the above-described elements may be implemented by a processor, circuitry, etc.
  • A number of exemplary embodiments have been described above. Nevertheless, it will be understood that various modifications may be made to the exemplary embodiments. For example, suitable results may be achieved if the described techniques are performed in a different order and/or if components in a described system, architecture, device, or circuit are combined in a different manner and/or replaced or supplemented by other components or their equivalents. Accordingly, other implementations are within the scope of the following claims.

Claims (26)

1. A scheduling apparatus comprising:
an analyzer configured to analyze a degree of skew in a data flow of a program;
a determiner configured to determine whether operations in the data flow utilize a memory spill based on a result of the analysis of the degree of skew by the analyzer; and
an instruction generator configured to eliminate dependency between the operations that are determined, by the determiner, to utilize the memory spill, and to generate a memory spill instruction corresponding to a memory distinct from a local register.
2. The scheduling apparatus of claim 1, wherein:
the generated memory spill instruction comprises a memory spill store instruction and a memory spill load instruction;
the memory spill store instruction instructs a processor to store a processing result of a first operation of the data flow in the memory; and
the memory spill load instruction instructs the processor to load the stored processing result of the first operation from the memory when the processor performs a second operation of the data flow that uses the processing result of the first operation.
3. The scheduling apparatus of claim 1, wherein the analyzer is configured to analyze the degree of skew by analyzing a long routing path on a data flow graph of the program.
4. The scheduling apparatus of claim 1, wherein the instruction generator is configured to, in response to a determination that there is no operation that utilizes the memory spill, generate a register spill instruction for the processor to store a processing result of each operation of the data flow in the local register.
5. The scheduling apparatus of claim 1, wherein the instruction generator is configured to generate a memory spill instruction to allocate a same logic index and different physical indices to iterations of a program performed during a same cycle.
6. The scheduling apparatus of claim 5, wherein the instruction generator is configured to differentiate the different physical indices by allocating addresses with respect to the iterations based on a number of at least one memory element included in the memory.
7. A scheduling method comprising:
analyzing a degree of skew in a data flow of a program;
determining whether operations in the data flow utilize a memory spill based on a result of the analyzing the degree of skew; and
eliminating a dependency between the operations that are determined, by the determining, to utilize the memory spill, and generating a memory spill instruction corresponding to a memory distinct from a local register.
8. The scheduling method of claim 7, wherein:
the generated memory spill instruction comprises a memory spill store instruction and a memory spill load instruction;
the memory spill store instruction instructs a processor to store a processing result of a first operation of the data flow in the memory; and
the memory spill load instruction instructs the processor to load the stored processing result of the first operation from the memory when the processor performs a second operation of the data flow that uses the processing result of the first operation.
9. The scheduling method of claim 7, wherein the analyzing comprises analyzing the degree of skew by analyzing a long routing path on a data flow graph of the program.
10. The scheduling method of claim 7, wherein the generating the memory spill instruction comprises, in response to a determination that there is no operation that utilizes the memory spill, generating a register spill instruction to store a processing result of each operation of the data flow in the local register.
11. The scheduling method of claim 7, wherein the generating the memory spill instruction comprises generating a memory spill instruction to allocate a same logic index and different physical indices to iterations of a program performed during a same cycle.
12. The scheduling method of claim 11, wherein the generating the memory spill instruction further comprises differentiating the different physical indices by allocating addresses with respect to the iterations based on a number of at least one memory element included in the memory.
13. A memory apparatus comprising:
a memory port;
a memory element with a physical index; and
a memory controller configured to control access to the memory element by determining the physical index based on logic index information included in a request input, through the memory port, from a processor in response to a memory spill instruction generated as a result of program scheduling, and to process the input request,
wherein the memory element is distinct from a local register of the processor.
14. The memory apparatus of claim 13, further comprising:
a write control buffer configured to, in response to a write request from the processor, control an input to the memory via the memory port by temporarily storing data.
15. The memory apparatus of claim 13, further comprising:
a read control buffer configured to, in response to a read request from the processor, control an input to the processor by temporarily storing data that is output from the memory through the memory port.
16. The memory apparatus of claim 13, wherein:
the memory spill instruction comprises a memory spill store instruction and a memory spill load instruction;
the memory spill store instruction instructs the processor to store a processing result of a first operation in the memory element; and
the memory spill load instruction instructs the processor to load the stored processing result of the first operation from the memory element when the processor performs a second operation that uses the processing result of the first operation.
17. The memory apparatus of claim 16, wherein the memory port comprises:
a write port configured to process a data write request, which the processor transmits in response to the memory spill store instruction; and
a read port configured to process a data read request, which the processor transmits in response to the memory spill load instruction.
18. The memory apparatus of claim 17, wherein:
a plurality of memory elements, including the memory element, is provided, and a plurality of write ports, including the write port, is provided; and
a number of the plurality of memory elements is equal to a number of the plurality of write ports such that the plurality of memory elements and the plurality of write ports respectively correspond to each other.
19. The memory apparatus of claim 13, wherein a plurality of memory elements, including the memory element, is provided, and each of the plurality of memory elements has a different physical index.
20. A scheduling method comprising:
determining whether operations in a data flow of a program cause long routing; and
generating, in response to determining that the operations cause the long routing, a memory spill instruction corresponding to a memory distinct from a local register.
21. The scheduling method of claim 20, wherein the determining comprises analyzing dependencies between the operations in a data flow graph of the program.
22. The scheduling method of claim 20, wherein the generating the memory spill instruction comprises:
generating a memory spill store instruction which instructs a processor to store a processing result of a first operation, among the operations that cause the long routing, in the memory; and
generating a memory spill load instruction which instructs the processor to load the stored processing result of the first operation from the memory when the processor performs a second operation that uses the processing result of the first operation.
23. The scheduling method of claim 20, wherein the generating the memory spill instruction comprises, in response to a determination that there is no operation that utilizes the memory spill, generating a register spill instruction to store a processing result of each operation of the data flow in a local register.
24. The scheduling method of claim 20, wherein the generating the memory spill instruction comprises generating a memory spill instruction to allocate a same logic index and different physical indices to iterations of a program performed during a same cycle.
25. The scheduling method of claim 24, wherein the generating the memory spill instruction further comprises differentiating the different physical indices by allocating addresses with respect to the iterations based on a number of at least one memory element included in the memory.
26-27. (canceled)
US14/258,795 2013-04-22 2014-04-22 Memory apparatus for processing support of long routing in processor, and scheduling apparatus and method using the memory apparatus Abandoned US20140317628A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2013-0044430 2013-04-22
KR1020130044430A KR20140126190A (en) 2013-04-22 2013-04-22 Memory apparatus for supporting long routing of processor, scheduling apparatus and method using the memory apparatus

Publications (1)

Publication Number Publication Date
US20140317628A1 true US20140317628A1 (en) 2014-10-23

Family

ID=51730055

Family Applications (1)

Application Number Title Priority Date Filing Date
US14/258,795 Abandoned US20140317628A1 (en) 2013-04-22 2014-04-22 Memory apparatus for processing support of long routing in processor, and scheduling apparatus and method using the memory apparatus

Country Status (2)

Country Link
US (1) US20140317628A1 (en)
KR (1) KR20140126190A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108052347A (en) * 2017-12-06 2018-05-18 北京中科睿芯智能计算产业研究院有限公司 A kind of device for executing instruction selection, method and command mappings method
US20190121575A1 (en) * 2017-10-23 2019-04-25 Micron Technology, Inc. Virtual partition management
US10698853B1 (en) 2019-01-03 2020-06-30 SambaNova Systems, Inc. Virtualization of a reconfigurable data processor
US10768899B2 (en) 2019-01-29 2020-09-08 SambaNova Systems, Inc. Matrix normal/transpose read and a reconfigurable data processor including same
US10831507B2 (en) 2018-11-21 2020-11-10 SambaNova Systems, Inc. Configuration load of a reconfigurable data processor
US11055141B2 (en) 2019-07-08 2021-07-06 SambaNova Systems, Inc. Quiesce reconfigurable data processor
US11188497B2 (en) 2018-11-21 2021-11-30 SambaNova Systems, Inc. Configuration unload of a reconfigurable data processor
US11327771B1 (en) 2021-07-16 2022-05-10 SambaNova Systems, Inc. Defect repair circuits for a reconfigurable data processor
US11386038B2 (en) 2019-05-09 2022-07-12 SambaNova Systems, Inc. Control flow barrier and reconfigurable data processor
US11409540B1 (en) 2021-07-16 2022-08-09 SambaNova Systems, Inc. Routing circuits for defect repair for a reconfigurable data processor
US11487694B1 (en) 2021-12-17 2022-11-01 SambaNova Systems, Inc. Hot-plug events in a pool of reconfigurable data flow resources
US11556494B1 (en) 2021-07-16 2023-01-17 SambaNova Systems, Inc. Defect repair for a reconfigurable data processor for homogeneous subarrays
US11782729B2 (en) 2020-08-18 2023-10-10 SambaNova Systems, Inc. Runtime patching of configuration files
US11809908B2 (en) 2020-07-07 2023-11-07 SambaNova Systems, Inc. Runtime virtualization of reconfigurable data flow resources

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5058053A (en) * 1988-03-31 1991-10-15 International Business Machines Corporation High performance computer system with unidirectional information flow
US20030023733A1 (en) * 2001-07-26 2003-01-30 International Business Machines Corporation Apparatus and method for using a network processor to guard against a "denial-of-service" attack on a server or server cluster
US20030237080A1 (en) * 2002-06-19 2003-12-25 Carol Thompson System and method for improved register allocation in an optimizing compiler
US20050005267A1 (en) * 2003-07-03 2005-01-06 International Business Machines Corporation Pairing of spills for parallel registers
US20060195707A1 (en) * 2005-02-25 2006-08-31 Bohuslav Rychlik Reducing power by shutting down portions of a stacked register file
US20110246170A1 (en) * 2010-03-31 2011-10-06 Samsung Electronics Co., Ltd. Apparatus and method for simulating a reconfigurable processor
US20120096247A1 (en) * 2010-10-19 2012-04-19 Hee-Jin Ahn Reconfigurable processor and method for processing loop having memory dependency
US20130024621A1 (en) * 2010-03-16 2013-01-24 Snu R & Db Foundation Memory-centered communication apparatus in a coarse grained reconfigurable array
US8972697B2 (en) * 2012-06-02 2015-03-03 Intel Corporation Gather using index array and finite state machine

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Manoj Kumar Jain, "Exploring Storage Organization in ASIP Synthesis," Proceedings of the Euromicro Symposium on Digital System Design (DSD'03), IEEE, 2003, pp. 1-8. *
Mohammed Ashraful Alam Tuhin, "Compiling Parallel Applications to Coarse-Grained Reconfigurable Architectures," IEEE, May 2008. *

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11340836B2 (en) 2017-10-23 2022-05-24 Micron Technology, Inc. Virtual partition management in a memory device
US20190121575A1 (en) * 2017-10-23 2019-04-25 Micron Technology, Inc. Virtual partition management
CN109697028A (en) * 2017-10-23 2019-04-30 美光科技公司 Virtual partition management
US10754580B2 (en) * 2017-10-23 2020-08-25 Micron Technology, Inc. Virtual partition management in a memory device
US11789661B2 (en) 2017-10-23 2023-10-17 Micron Technology, Inc. Virtual partition management
CN108052347A (en) 2017-12-06 2018-05-18 北京中科睿芯智能计算产业研究院有限公司 Apparatus and method for executing instruction selection, and instruction mapping method
US11983140B2 (en) 2018-11-21 2024-05-14 SambaNova Systems, Inc. Efficient deconfiguration of a reconfigurable data processor
US10831507B2 (en) 2018-11-21 2020-11-10 SambaNova Systems, Inc. Configuration load of a reconfigurable data processor
US11188497B2 (en) 2018-11-21 2021-11-30 SambaNova Systems, Inc. Configuration unload of a reconfigurable data processor
US11609769B2 (en) 2018-11-21 2023-03-21 SambaNova Systems, Inc. Configuration of a reconfigurable data processor using sub-files
US11681645B2 (en) 2019-01-03 2023-06-20 SambaNova Systems, Inc. Independent control of multiple concurrent application graphs in a reconfigurable data processor
US11237996B2 (en) 2019-01-03 2022-02-01 SambaNova Systems, Inc. Virtualization of a reconfigurable data processor
US10698853B1 (en) 2019-01-03 2020-06-30 SambaNova Systems, Inc. Virtualization of a reconfigurable data processor
US10768899B2 (en) 2019-01-29 2020-09-08 SambaNova Systems, Inc. Matrix normal/transpose read and a reconfigurable data processor including same
US11580056B2 (en) 2019-05-09 2023-02-14 SambaNova Systems, Inc. Control barrier network for reconfigurable data processors
US11386038B2 (en) 2019-05-09 2022-07-12 SambaNova Systems, Inc. Control flow barrier and reconfigurable data processor
US11055141B2 (en) 2019-07-08 2021-07-06 SambaNova Systems, Inc. Quiesce reconfigurable data processor
US11928512B2 (en) 2019-07-08 2024-03-12 SambaNova Systems, Inc. Quiesce reconfigurable data processor
US11809908B2 (en) 2020-07-07 2023-11-07 SambaNova Systems, Inc. Runtime virtualization of reconfigurable data flow resources
US11782729B2 (en) 2020-08-18 2023-10-10 SambaNova Systems, Inc. Runtime patching of configuration files
US11556494B1 (en) 2021-07-16 2023-01-17 SambaNova Systems, Inc. Defect repair for a reconfigurable data processor for homogeneous subarrays
US11409540B1 (en) 2021-07-16 2022-08-09 SambaNova Systems, Inc. Routing circuits for defect repair for a reconfigurable data processor
US11327771B1 (en) 2021-07-16 2022-05-10 SambaNova Systems, Inc. Defect repair circuits for a reconfigurable data processor
US11487694B1 (en) 2021-12-17 2022-11-01 SambaNova Systems, Inc. Hot-plug events in a pool of reconfigurable data flow resources

Also Published As

Publication number Publication date
KR20140126190A (en) 2014-10-30

Similar Documents

Publication Publication Date Title
US20140317628A1 (en) Memory apparatus for processing support of long routing in processor, and scheduling apparatus and method using the memory apparatus
US10877757B2 (en) Binding constants at runtime for improved resource utilization
US9292291B2 (en) Instruction merging optimization
US9513915B2 (en) Instruction merging optimization
US9335947B2 (en) Inter-processor memory
US10496659B2 (en) Database grouping set query
JP2017102919A (en) Processor with multiple execution units for instruction processing, method for instruction processing using processor, and design mechanism used in design process of processor
US10223269B2 (en) Method and apparatus for preventing bank conflict in memory
US9344115B2 (en) Method of compressing and restoring configuration data
US20120089813A1 (en) Computing apparatus based on reconfigurable architecture and memory dependence correction method thereof
US20150269073A1 (en) Compiler-generated memory mapping hints
US9678752B2 (en) Scheduling apparatus and method of dynamically setting the size of a rotating register
US20140013312A1 (en) Source level debugging apparatus and method for a reconfigurable processor
US9405546B2 (en) Apparatus and method for non-blocking execution of static scheduled processor
KR20150051083A (en) Re-configurable processor, method and apparatus for optimizing use of configuration memory thereof
JP6473023B2 (en) Performance evaluation module and semiconductor integrated circuit incorporating the same
KR101910934B1 (en) Apparatus and method for processing invalid operation of prologue or epilogue of loop
US11797280B1 (en) Balanced partitioning of neural network based on execution latencies
KR102168175B1 (en) Re-configurable processor, method and apparatus for optimizing use of configuration memory thereof
KR101225577B1 (en) Apparatus and method for analyzing assembly language code
KR20170065845A (en) Processor and controlling method thereof
US10481867B2 (en) Data input/output unit, electronic apparatus, and control methods thereof
KR102185280B1 (en) Re-configurable processor, method and apparatus for optimizing use of configuration memory thereof
KR20170122082A (en) Method and system for storing swap data using non-volatile memory
KR20150051115A (en) Re-configurable processor, method and apparatus for optimizing use of configuration memory thereof

Legal Events

Date Code Title Description
AS Assignment

Owner name: SAMSUNG ELECTRONICS CO., LTD., KOREA, REPUBLIC OF

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:KIM, WON-SUB;REEL/FRAME:032730/0224

Effective date: 20140421

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION