CN107656880B - Processor having memory controller with dynamically programmable functional units - Google Patents

Processor having memory controller with dynamically programmable functional units

Info

Publication number
CN107656880B
Authority
CN
China
Prior art keywords
pfu
program
programmable
memory
processor
Legal status
Active
Application number
CN201710873051.9A
Other languages
Chinese (zh)
Other versions
CN107656880A (en)
Inventor
G·葛兰·亨利
罗德尼·E·虎克
泰瑞·派克斯
道格拉斯·R·瑞德
Current Assignee
Shanghai Zhaoxin Semiconductor Co Ltd
Original Assignee
Shanghai Zhaoxin Integrated Circuit Co Ltd
Priority claimed from US15/337,169 (US10268586B2)
Priority claimed from US15/337,140 (US10642617B2)
Priority claimed from US15/590,883 (US11061853B2)
Application filed by Shanghai Zhaoxin Integrated Circuit Co Ltd
Publication of CN107656880A
Application granted
Publication of CN107656880B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0844 Multiple simultaneous or quasi-simultaneous cache accessing
    • G06F12/0855 Overlapped cache accessing, e.g. pipeline
    • G06F12/0862 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3802 Instruction prefetching
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/45 Caching of specific data in cache memory
    • G06F2212/452 Instruction code
    • G06F2212/60 Details of cache memory
    • G06F2212/602 Details relating to cache prefetching
    • G06F2212/6022 Using a prefetch buffer or dedicated prefetch cache
    • G06F2212/6028 Prefetching based on hints or prefetch instructions

Abstract

A processor having a memory controller that includes a dynamically programmable functional unit. The processor includes a memory controller that interfaces with an external memory and that includes a Programmable Functional Unit (PFU). The PFU, which includes programmable logic elements and programmable interconnects, is programmed with a PFU program to modify operation of the memory controller. For example, the PFU may be programmed with a PFU program during operation of the processor to add functionality to the memory controller or to modify its existing functionality. Thus, the functions and/or operations of the memory controller are not fixed once the processor is manufactured; instead, the memory controller may be modified after manufacture to improve the efficiency and/or performance of the processor, such as when executing particular processes.

Description

Processor having memory controller with dynamically programmable functional units
Technical Field
The present invention relates generally to programmable resources of processors, and more particularly to processors having dynamically programmable functional units at the memory controller level.
Background
Processors continue to become more powerful, delivering higher performance with greater efficiency. The term "processor," as used herein, refers to any type of processing unit, including a microprocessor, a Central Processing Unit (CPU), one or more processing cores, a microcontroller, or the like. The term "processor" as used herein also includes any type of processor configuration, such as a processing unit integrated on a chip or Integrated Circuit (IC), including chips or integrated circuits contained within a system on a chip (SOC), and the like. Semiconductor manufacturing techniques continue to improve, increasing speed and reducing both the power consumption and the size of the circuits integrated on the processing chip. The reduced integration size allows additional functionality to be incorporated within the processing unit. However, once a conventional processor is manufactured, many of its internal functions and operations are substantially fixed.
The memory controller provides an interface between the processor and external system memory, which is typically configured as Dynamic Random Access Memory (DRAM). Although the memory controller may be provided separately, in many modern conventional processing configurations it is integrated onto the same chip or IC as the processor, providing an input/output (I/O) interface to the external system memory. In conventional configurations, the functionality of the memory controller is essentially fixed once the processor is manufactured.
Disclosure of Invention
A processor according to one embodiment includes a memory controller that interfaces with an external memory and that includes a Programmable Functional Unit (PFU). The PFU, which includes programmable logic elements and programmable interconnects, is programmed with a PFU program to modify operation of the memory controller. For example, the PFU may be programmed with a PFU program during operation of the processor to add functionality to the memory controller or to modify its existing functionality. Thus, the functions and/or operations of the memory controller are not fixed once the processor is manufactured; instead, the memory controller may be modified after manufacture to improve the efficiency and/or performance of the processor, such as when executing particular processes.
The processor may include a local memory for storing a PFU program. The local memory may include a Random Access Memory (RAM) for storing PFU programs retrieved from the external memory, and the processor may respond to a write command instructing it to write a PFU program from the external memory to the RAM. The processor may also include a PFU programmer that programs the PFU using a PFU program stored in the local memory (also referred to as the PFU memory). The PFU memory may be or may include a Read Only Memory (ROM) storing at least one predetermined PFU program for programming the PFU to operate according to a predetermined PFU definition. For example, a predetermined PFU program may serve as a default PFU program with which the PFU programmer programs the PFU at startup of the processor. Alternatively or additionally, the processor may respond to a program command that causes the PFU programmer to program the PFU with a specified one of a plurality of PFU programs stored in the PFU memory. A configuration map may also be included that maps each of a plurality of different processing modes to a respective one of the PFU programs stored in the PFU memory.
The programmable logic elements and programmable interconnects may be subdivided into a plurality of substantially identical programmable sections. A PFU programmer may be included that allocates a plurality of the programmable sections and programs the allocated sections with a PFU program, thereby programming the PFU.
The programmable logic elements may include programmable look-up tables. Additionally or alternatively, the programmable logic elements may include adders, multiplexers, and registers. The PFU may include a programmable memory, in which case the PFU program may be a bitstream that is scanned into the programmable memory of the PFU. The PFU may be programmed with a plurality of PFU programs, and a PFU programmer may be included for enabling at least one of the PFU programs at a time during operation of the processor.
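The look-up-table and bitstream-scanning ideas above can be illustrated with a toy model. The sketch below assumes a k-input LUT whose 2^k configuration bits sit in a shift register that is loaded one bit per step; the class and function names are hypothetical, and a real PFU engine would scan a much longer bitstream covering many LUTs, adders, multiplexers, registers, and interconnect settings.

```python
from typing import List


class ScanLoadedLUT:
    """Hypothetical k-input LUT whose configuration memory is a shift register."""

    def __init__(self, num_inputs: int) -> None:
        self.num_inputs = num_inputs
        # One configuration bit per possible input combination (2**k bits).
        self.config: List[int] = [0] * (1 << num_inputs)

    def scan_in(self, bitstream: List[int]) -> None:
        """Shift the bitstream in one bit per step, as a scan chain would."""
        if len(bitstream) != len(self.config):
            raise ValueError("bitstream length must equal the LUT size")
        for bit in bitstream:
            # The new bit enters at position 0; every stored bit moves one place.
            self.config = [bit & 1] + self.config[:-1]

    def evaluate(self, inputs: List[int]) -> int:
        """Use the input bits (inputs[0] = least significant) as a table index."""
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        index = sum((b & 1) << i for i, b in enumerate(inputs))
        return self.config[index]


def truth_table_to_bitstream(table: List[int]) -> List[int]:
    """The first bit scanned ends up at the far end, so send the table reversed."""
    return list(reversed(table))


# Program a 2-input AND gate: entry i is the output for input index i.
and_lut = ScanLoadedLUT(2)
and_lut.scan_in(truth_table_to_bitstream([0, 0, 0, 1]))
print(and_lut.evaluate([1, 1]), and_lut.evaluate([1, 0]))  # prints "1 0"
```

Reprogramming the same LUT with a different bitstream changes its function without changing the hardware, which is the property the PFU relies on.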
As a more specific, non-limiting example, a PFU program may program the PFU to perform a cryptographic function for encrypting data stored in the external memory. The cryptographic function may include an encryption function and an inverse encryption function, wherein the inverse encryption function combines a predetermined key with the address to develop a pad value, which is then combined with the data value.
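The pad-based scheme can be sketched as follows, assuming (the text does not specify this) that "combined" means bitwise XOR and that the pad is derived by scrambling the key with the address. The splitmix64 finalizer used here is only a stand-in mixing function, not a cipher named in the patent and not cryptographically secure; `make_pad`, `encrypt_store`, and `decrypt_load` are hypothetical names.

```python
MASK64 = (1 << 64) - 1


def make_pad(key: int, address: int) -> int:
    """Hypothetical pad generator: mix the predetermined key with the address.

    The splitmix64 finalizer is a stand-in scrambler; it is a bijection on
    64-bit values, so distinct key/address mixes give distinct pads.
    Illustration only, NOT a secure cipher.
    """
    x = (key ^ address) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def encrypt_store(key: int, address: int, data: int) -> int:
    """Value the memory controller would write to system memory at `address`."""
    return (data ^ make_pad(key, address)) & MASK64


def decrypt_load(key: int, address: int, stored: int) -> int:
    """Inverse function applied on a load: XOR with the same pad."""
    return (stored ^ make_pad(key, address)) & MASK64


KEY = 0x0123456789ABCDEF  # hypothetical predetermined key
ct = encrypt_store(KEY, 0x1000, 0xDEADBEEF)
print(hex(decrypt_load(KEY, 0x1000, ct)))  # prints "0xdeadbeef"
```

Because XOR is its own inverse, the same pad both encrypts on a store and decrypts on a load, and mixing in the address means identical data written to different addresses produces different stored values.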
A method according to one embodiment provides a programmable memory controller that interfaces a processor with an external memory. The method comprises: incorporating a Programmable Functional Unit (PFU), comprising programmable logic elements and programmable interconnects, into the memory controller; and programming the PFU with a PFU program to modify operation of the memory controller.
The method may include storing the PFU program in a local memory of the processor. The method may further include executing, with the processor, a write command that instructs the processor to write the PFU program from the external memory to a random access memory of the local memory. The method may include providing a PFU programmer and a PFU engine within the PFU, wherein the PFU programmer programs the PFU engine with the PFU program stored in the local memory. The method may include executing, with the processor, a program command that instructs the PFU programmer to program the PFU engine with a PFU program stored in a PFU memory. The method may include providing a configuration map in the PFU that maps each of a plurality of different processing modes to a respective one of a plurality of PFU programs stored in the PFU memory.
The method may include: subdividing the programmable logic elements and the programmable interconnects into a plurality of substantially identical programmable sections; allocating a plurality of the programmable sections to configure the PFU according to the PFU program; and programming the allocated programmable sections with at least one PFU program. The method may include: providing the PFU with a programmable memory; and scanning the at least one PFU program as a bitstream into the programmable memory of a PFU engine. The method may include: programming the PFU with a plurality of PFU programs; and enabling at least one of the plurality of PFU programs at a time during operation of the processor.
Drawings
The benefits, features, and advantages of the present invention will become better understood with regard to the following description, and accompanying drawings where:
FIG. 1 is a simplified block diagram of a processor including a Programmable Functional Unit (PFU) coupled to external memory and memory devices implemented according to one embodiment of the present invention;
FIG. 2 is a more detailed block diagram of the PFU of FIG. 1 implemented according to one embodiment of the present invention;
FIG. 3 is a simplified block diagram of the PFU programmer and controller of FIG. 2 interfacing with a PFU engine, implemented using programmable logic, in accordance with one embodiment of the present invention;
FIG. 4 is a block diagram illustrating a method for initially programming the PFU of FIG. 1 in accordance with one embodiment of the present invention;
FIG. 5 is a simplified block diagram depicting an executable binary application that may be used to program or otherwise reprogram the PFU of FIG. 1, in accordance with an embodiment of the present invention;
FIG. 6 is a more detailed block diagram of the programmable logic of FIG. 3 implemented according to one embodiment of the invention;
FIG. 7 is a schematic block diagram of the programmable logic element of FIG. 6 implemented according to one embodiment of the present invention;
FIG. 8 is a schematic diagram of the LUT of FIG. 7 implemented according to one embodiment of the invention;
FIG. 9 is a simplified block diagram of a format of a PFU program for programming the PFU engine of FIG. 2, according to an embodiment of the present invention;
FIG. 10 is a simplified block diagram illustrating an exemplary method for generating the PFU program of FIG. 1 for programming the PFU engine of FIG. 2 in accordance with one embodiment of the present invention;
FIG. 11 is a simplified block diagram illustrating an exemplary encryption process that may be programmed into a PFU and performed by an MC when storing data to the system memory of FIG. 1; and
FIG. 12 is a simplified block diagram illustrating the reverse encryption process that may be programmed into the PFU and performed by the MC when loading data from the system memory of FIG. 1.
Detailed Description
The present inventors have recognized possible limitations associated with the fixed memory controllers of conventional processors. Accordingly, the inventors have developed processors having memory controllers that include Programmable Functional Units (PFUs), which are configurable or otherwise programmable to modify or enhance the operation of the memory controller. The basic input/output system (BIOS) or the Operating System (OS) may include configuration information for programming the PFU. At power-up, reset, reboot, or the like (collectively referred to herein as a POR), the BIOS may copy this configuration information into memory and send commands to the PFU to access it; alternatively, the OS, which is loaded later during boot-up, may do so. Additionally or alternatively, programmers or developers of a particular software program, process, or application may incorporate into the application a PFU program for programming the PFU to modify or enhance the operation of the memory controller for that application. As an example, the PFU may be configured to perform programmed cryptographic functions when writing to or reading from the external system memory used by the processor.
FIG. 1 is a simplified block diagram of a processor 100 including a Programmable Functional Unit (PFU) 114 coupled to external memory and memory devices, implemented according to an embodiment of the present invention. The standard Instruction Set Architecture (ISA) of the processor 100 may be the x86 architecture, in which case the processor 100 can correctly execute most applications designed to execute on an x86 processor, an application being executed correctly if it obtains the expected results. In particular, the processor 100 executes instructions of the x86 instruction set and includes the x86 user-visible register set. However, the present invention is not limited to the x86 architecture, and the processor 100 may be implemented according to any alternative ISA known to those of ordinary skill in the art.
The processor 100 includes four slices, individually labeled S0, S1, S2, and S3 (S0-S3); the number of slices is arbitrary and may be any positive integer, including one (1). Each slice S0-S3 includes a respective one of four cores C0, C1, C2, and C3 (C0-C3), a respective one of four caches or "last level caches" LLC0, LLC1, LLC2, and LLC3 (LLC0-LLC3), and a respective one of four ring stations RS0, RS1, RS2, and RS3 (RS0-RS3). Each core C0-C3 includes one or more internal cache memories (e.g., one or more L1 and L2 caches, not shown) and is coupled to a respective one of the ring stations RS0-RS3, which in turn is coupled to a respective one of the last level caches LLC0-LLC3. It should be understood that, rather than multiple slices with multiple cores, the processor 100 may instead be configured as a single-core processor, a Central Processing Unit (CPU), or a microprocessor.
The processor 100 also includes an "uncore" 102 with a corresponding ring station RSU, and a Memory Controller (MC) 104 with a corresponding ring station RSM. The ring stations RS0-RS3, RSU, and RSM are coupled together in a ring configuration to enable communication among the slices S0-S3, the uncore 102, and the memory controller 104. As shown, for example, RS0 communicates bi-directionally with RS1, RS1 with RSM, RSM with RS2, RS2 with RS3, RS3 with RSU, and RSU with RS0. Because the ring is bidirectional, the particular ordering of the ring stations is arbitrary; the configuration shown is only one of many possible alternatives.
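The ring ordering described above can be modeled as a list, which shows why bidirectional links make the ordering flexible: any station reaches any other in at most half the ring's length. The routing rule below (take the shorter direction) and the function name `shortest_route` are assumptions for illustration; the text does not describe how the ring stations route traffic.

```python
from typing import List, Tuple

# Ring order taken from the example configuration in the text.
RING: List[str] = ["RS0", "RS1", "RSM", "RS2", "RS3", "RSU"]


def shortest_route(src: str, dst: str, ring: List[str] = RING) -> Tuple[str, int]:
    """Return (direction, hop count) for the shorter way around a bidirectional ring.

    Hypothetical routing rule: the text does not say how a direction is chosen.
    """
    n = len(ring)
    forward = (ring.index(dst) - ring.index(src)) % n  # hops in listed order
    backward = n - forward if forward else 0            # hops in reverse order
    if forward <= backward:
        return ("forward", forward)
    return ("backward", backward)


print(shortest_route("RS0", "RSM"))  # prints "('forward', 2)"
print(shortest_route("RS0", "RS3"))  # prints "('backward', 2)"
```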
The uncore 102 contains or otherwise interfaces with functions of the processor 100 that are not located in any of the slices S0-S3 or the respective cores C0-C3, but that should be tightly coupled to these cores to achieve a desired level of performance. In the illustrated configuration, for example, the uncore 102 interfaces with an external Read Only Memory (ROM) 106 that typically contains a basic input/output system (BIOS) 108. The BIOS 108 is firmware that the processor 100 executes at POR to perform hardware initialization and to provide runtime services to the Operating System (OS) 120 and to programs or applications. The uncore 102 also interfaces with an external memory 110, which may include any number of data storage devices, such as one or more hard disk drives, optical disk drives, flash drives, etc., and which typically stores the OS 120.
The MC 104 interfaces the processor 100 with an external system memory 112. The slices S0-S3 share the resources of the system memory 112 and may also share information with each other via the ring stations RS0-RS3, RSU, and RSM. The system memory 112 may be implemented using any suitable memory devices or chips, such as one or more Dynamic Random Access Memory (DRAM) chips or the like.
The MC 104 also includes the PFU 114, which may be programmed to modify or otherwise enhance the functionality of the MC 104. The PFU 114 may be programmed in any of a number of ways, depending on the details of the configuration. In one case, after initializing the memory 110 and the system memory 112, the BIOS 108 accesses a PFU program (PGM) 116 stored in the memory 110 and copies it to memory on the processor 100 or to the system memory 112. For example, after copying, a copy of the PFU program 116 is shown as PFU program 118 stored in the system memory 112. In one embodiment, the PFU program 116 may be stored in an encrypted and/or compressed format, in which case it may first be decrypted and/or decompressed before being stored in memory on the processor 100 or in the system memory 112. However, as described further herein, the PFU program 116 may instead be in the form of a bitstream, that is, a series of logical ones (1) and zeros (0), that requires no decryption or decompression. The BIOS 108 then sends a command, instruction, or the like to the PFU 114, instructing it to locate the copied PFU program 118 and to program itself with it. Once programmed, the PFU 114 modifies or enhances the operation of the MC 104 during operation of the processor 100.
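The BIOS-driven flow above can be sketched in a few lines. In this model, `zlib` stands in for whatever compression format PFU program 116 might use (decryption is omitted), and `ProgramCommand`, `bios_stage_pfu_program`, and `PfuModel` are hypothetical names; the actual format of the command sent to the PFU 114 is not specified in the text.

```python
import zlib
from dataclasses import dataclass


@dataclass
class ProgramCommand:
    """Hypothetical command telling the PFU where its program was copied."""
    address: int  # start of PFU program 118 in system memory
    length: int   # size of the program image in bytes


def bios_stage_pfu_program(stored_image: bytes, system_memory: bytearray,
                           address: int, compressed: bool) -> ProgramCommand:
    """Copy PFU program 116 into system memory (as PFU program 118)."""
    image = zlib.decompress(stored_image) if compressed else stored_image
    end = address + len(image)
    if end > len(system_memory):
        raise ValueError("PFU program does not fit at the requested address")
    system_memory[address:end] = image
    return ProgramCommand(address=address, length=len(image))


class PfuModel:
    """Toy PFU that 'programs itself' from the image the command points to."""

    def __init__(self) -> None:
        self.configuration = b""

    def execute(self, cmd: ProgramCommand, system_memory: bytearray) -> None:
        self.configuration = bytes(system_memory[cmd.address:cmd.address + cmd.length])


system_memory = bytearray(4096)          # stand-in for system memory 112
program_116 = bytes([0b10110010]) * 32   # hypothetical bitstream bytes
cmd = bios_stage_pfu_program(zlib.compress(program_116), system_memory, 0x400, True)
pfu = PfuModel()
pfu.execute(cmd, system_memory)
```

The OS or an application would follow the same pattern; only the party that stages the image and issues the command changes.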
In another case, after the BIOS 108 executes, the OS 120 is loaded into and installed on the processor 100, and during OS installation the OS 120 performs substantially the same process: it copies the PFU program 116 and then instructs the PFU 114 to locate and program itself with the copied PFU program, such as PFU program 118. In yet another case, a program, application, or the like performs a similar process: the application contains the PFU program 116 and instructs the PFU 114 to locate and program itself using the copied program information, such as PFU program 118. In another embodiment, the PFU 114 includes a local memory (e.g., local memory 206 of FIG. 2) for storing the PFU program 118. In this case, the BIOS 108, OS 120, or application performs a similar programming process, except that the PFU program 118 is stored in the local memory 206 of the PFU 114, and the PFU 114 accesses the PFU program 118 from its local memory for programming.
FIG. 2 is a more detailed block diagram of the PFU 114 implemented according to one embodiment of the present invention. A PFU engine 202 is provided, which is programmed with the PFU program 118 to modify and/or enhance the operation of the MC 104. A PFU programmer and controller 204 may be included in the PFU 114 to manage and/or control the operation of the PFU engine 202, including programming the PFU engine 202. The PFU programmer and controller 204 accesses one or more identified PFU programs and programs at least one of them into the PFU engine 202. Although shown as a separate unit, the PFU programmer and controller 204 may instead be contained within the PFU engine 202 itself. In one embodiment, the PFU 114 does not include the local memory 206, in which case the system memory 112 may be used to store the PFU program 118. Without the local memory 206, the BIOS 108, OS 120, or an application sends a programming command identifying the location of the PFU program 118 in the system memory 112, and the PFU programmer and controller 204 accesses the PFU program 118 from the system memory 112 and programs the PFU engine 202.
In one embodiment, the PFU engine 202 may be configured with sufficient resources to be programmed with multiple PFU programs, in which case the PFU programmer and controller 204 programs each PFU program into the PFU engine 202 but activates or enables only the PFU program associated with the particular process being executed or the particular operating mode of the processor 100. As an example, the PFU engine 202 may be initially programmed at POR and enabled for most operations of the processor 100. A process (e.g., a program or application) may then program the PFU engine 202 with another PFU program for use while that process is active and executing. The PFU programmer and controller 204 manages the operation of the PFU engine 202 by activating only one of the PFU programs programmed into it at a time. In a configuration without local memory, the PFU engine 202 may be programmed with only a limited number of PFU programs.
It should be appreciated that the PFU engine 202 may be a limited resource that can hold only a limited number of PFU programs at any given time, and it may not have sufficient capacity for all of the PFU programs that may be active during operation of the processor 100. For example, depending on its implementation, the PFU engine 202 may have sufficient resources for only one large PFU program or two smaller ones. In such a configuration, it may be difficult to reprogram the PFU engine 202 over time with different PFU programs for different modes, particularly where the location information for one or more of the PFU programs within the system memory 112 is no longer valid or available.
In another embodiment, the PFU 114 includes the local memory 206, which stores at least one PFU program used to program the PFU engine 202. The local memory 206 may include a Random Access Memory (RAM) 208, in which case the PFU program 116 may be copied to the RAM 208 and accessed there by the PFU programmer and controller 204 to program the PFU engine 202. In one embodiment, the RAM 208 may be large enough to store a plurality of PFU programs, shown as PGMA, PGMB, PGMC, etc. In response to a program command, the PFU programmer and controller 204 accesses the identified one of these PFU programs to program the PFU engine 202. Thus, if the PFU engine 202 lacks sufficient resources to hold all of the PFU programs that may be activated at any time, the PFU programmer and controller 204 may reprogram the PFU engine 202 from the local memory 206 on the fly, in response to commands or to mode changes.
The local memory 206 may also include a Read Only Memory (ROM) 210, which stores one or more standard or predetermined PFU programs, shown as PGM1, PGM2, PGM3, etc. In one embodiment, one of these predetermined PFU programs (e.g., PGM1) is designated as the default PFU program. During initial startup of the processor 100, instead of (or in addition to) copying the PFU program 116 from the memory 110, the BIOS 108 or OS 120 instructs the PFU programmer and controller 204 to program the PFU engine 202 with the default PFU program (where included) and then to activate it. Alternatively or additionally, the BIOS 108, the OS 120, or any application or process may identify any of the predetermined PFU programs stored within the ROM 210 for programming the PFU engine 202.
To support multiple PFU programs, a PFU configuration map 212 may be provided, which maps each particular operating mode of the processor 100 to the corresponding PFU program provided for that mode. Where a particular process employs a corresponding PFU program, the operating mode may include process identification information. As shown, for example, modes M1, M2, M3, M4, etc., are associated with PFU programs PGMA, PGM1, PGM2, PGMB, etc., respectively. The PFU programmer and controller 204 updates the PFU configuration map 212 each time a PFU program is programmed into the PFU engine 202. Based on the mappings in the PFU configuration map 212, the PFU programmer and controller 204 identifies the active mode (or process) at any given time and either activates the corresponding PFU program already programmed into the PFU engine 202 or programs the PFU engine 202 with it. Once the correct PFU program is loaded and/or activated, the PFU engine 202 modifies or enhances the operation of the MC 104 accordingly.
In this way, the PFU programmer and controller 204 may map each mode (or process) to a corresponding PFU program until that mapping is replaced. In response to each subsequent programming command or mode change, the PFU programmer and controller 204 activates the identified PFU program within the PFU engine 202, or programs the PFU engine 202 with the identified PFU program from the ROM 210 or RAM 208, and then updates the PFU configuration map 212 accordingly. In particular, the PFU programmer and controller 204 consults the PFU configuration map 212 to determine whether the PFU program associated with the new mode has already been loaded into the PFU engine 202. If so, the PFU programmer and controller 204 deactivates the current PFU program (if any) and activates the PFU program for the new mode within the PFU engine 202. If not, the PFU programmer and controller 204 accesses the RAM 208 or ROM 210 storing the identified PFU program and programs the PFU engine 202 accordingly.
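The mode-switching procedure above maps naturally onto a small controller model. This sketch assumes that program images live in dictionaries standing in for RAM 208 and ROM 210 and that the engine has unlimited room (capacity is ignored); the class name, method names, and log are hypothetical.

```python
from typing import Dict, List, Optional


class PfuProgrammerController:
    """Toy model of configuration-map-driven activation (hypothetical names).

    Engine capacity is ignored; only one loaded program is active at a time.
    """

    def __init__(self, rom: Dict[str, bytes], ram: Dict[str, bytes]) -> None:
        self.rom = rom                             # stand-in for ROM 210 (PGM1, ...)
        self.ram = ram                             # stand-in for RAM 208 (PGMA, ...)
        self.config_map: Dict[str, str] = {}       # mode -> PFU program name
        self.engine: Dict[str, bytes] = {}         # programs loaded in the PFU engine
        self.active: Optional[str] = None
        self.programming_log: List[str] = []       # each (re)programming of the engine

    def map_mode(self, mode: str, program: str) -> None:
        self.config_map[mode] = program

    def switch_mode(self, mode: str) -> str:
        program = self.config_map[mode]
        if program not in self.engine:
            # Not loaded yet: fetch from RAM or ROM (KeyError if stored in neither).
            bits = self.ram[program] if program in self.ram else self.rom[program]
            self.engine[program] = bits            # a real engine would scan these in
            self.programming_log.append(program)
        # Activating the new program implicitly deactivates the previous one.
        self.active = program
        return program


ctrl = PfuProgrammerController(rom={"PGM1": b"\x01"}, ram={"PGMA": b"\x0a"})
ctrl.map_mode("M1", "PGMA")
ctrl.map_mode("M2", "PGM1")
for m in ["M1", "M2", "M1"]:
    ctrl.switch_mode(m)
print(ctrl.active, ctrl.programming_log)  # prints "PGMA ['PGMA', 'PGM1']"
```

Returning to M1 reactivates PGMA without reprogramming the engine, which is the saving the configuration map is meant to provide.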
In one embodiment, the PFU programmer and controller 204 determines whether the PFU engine 202 has sufficient space available to program the next PFU program without overwriting any PFU program currently loaded within it. If so, the next PFU program is loaded into the available space. Otherwise, the PFU programmer and controller 204 uses a replacement policy to select one or more PFU programs currently residing within the PFU engine 202 to be overwritten. The replacement policy may be a Least Recently Used (LRU) algorithm or the like, but may also take into account the amount of programmable space required by the PFU program being loaded. For example, if the smaller, least recently used PFU program would not free sufficient space for the next PFU program, a larger PFU program may be selected and overwritten even though it was used more recently. In one embodiment, if no copy of a PFU program about to be overwritten within the PFU engine 202 is stored within the ROM 210 or RAM 208, and if the RAM 208 has sufficient available space, the PFU programmer and controller 204 may offload or copy that PFU program from the PFU engine 202 into the RAM 208 before overwriting it.
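One possible reading of this size-aware replacement policy is sketched below: among resident programs, take the least recently used one whose eviction alone frees enough sections; if none does, evict in LRU order until the new program fits. This exact rule, the logical timestamp counter, and treating RAM capacity as a simple slot count are assumptions, not details from the text, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Resident:
    size: int        # programmable sections occupied
    last_used: int   # logical timestamp of last load/activation
    bits: bytes


class PfuEngineCache:
    """Size-aware LRU replacement for programs resident in the PFU engine (sketch)."""

    def __init__(self, capacity: int, ram_slots: int, rom_names: Set[str]) -> None:
        self.capacity = capacity        # total programmable sections
        self.ram_slots = ram_slots      # simplification: RAM holds N programs
        self.rom_names = rom_names      # programs that always have a ROM copy
        self.ram: Dict[str, bytes] = {}
        self.resident: Dict[str, Resident] = {}
        self.clock = 0

    def free(self) -> int:
        return self.capacity - sum(r.size for r in self.resident.values())

    def _choose_victims(self, needed: int) -> List[str]:
        lru = sorted(self.resident, key=lambda n: self.resident[n].last_used)
        free = self.free()
        # Prefer the least recently used program that frees enough space by itself,
        # even if a smaller, older program would be the plain-LRU choice.
        for name in lru:
            if free + self.resident[name].size >= needed:
                return [name]
        # Otherwise evict in LRU order until the new program fits.
        victims: List[str] = []
        for name in lru:
            victims.append(name)
            free += self.resident[name].size
            if free >= needed:
                break
        return victims

    def _evict(self, name: str) -> None:
        r = self.resident.pop(name)
        backed_up = name in self.rom_names or name in self.ram
        if not backed_up and len(self.ram) < self.ram_slots:
            self.ram[name] = r.bits     # offload to RAM before the sections are reused

    def load(self, name: str, size: int, bits: bytes) -> List[str]:
        """Make `name` resident; return the names of any programs overwritten."""
        if size > self.capacity:
            raise ValueError("program larger than the whole PFU engine")
        self.clock += 1
        if name in self.resident:
            self.resident[name].last_used = self.clock
            return []
        victims = [] if self.free() >= size else self._choose_victims(size)
        for v in victims:
            self._evict(v)
        self.resident[name] = Resident(size, self.clock, bits)
        return victims


eng = PfuEngineCache(capacity=4, ram_slots=2, rom_names={"PGM1"})
eng.load("PGMA", 1, b"a")         # oldest, but only one section
eng.load("PGMB", 2, b"b")
eng.load("PGM1", 1, b"1")         # engine now full
print(eng.load("PGMC", 2, b"c"))  # prints "['PGMB']"
```

Here PGMB is chosen over the older PGMA because evicting PGMA alone would free only one section; PGMB is copied to RAM first because it has no ROM or RAM copy.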
Although the RAM 208 may store a considerable number of PFU programs, the PFU programmer and controller 204 may take appropriate action in the event that the RAM 208 is not large enough to store all of the PFU programs that processes attempt to load at any given time. For example, if a process attempts to configure a PFU program that cannot be found or is unavailable, the PFU programmer and controller 204 may simply disable operation of the PFU engine 202 for that process. Alternatively, the PFU programmer and controller 204 may load or otherwise activate a standard PFU program, such as the default PFU program PGM1, provided that doing so does not permanently overwrite any other PFU program.
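The fallback behavior above amounts to a lookup with two possible outcomes when the requested program is missing: substitute the default PFU program, or return nothing so the caller disables the PFU engine for that process. The search order (RAM before ROM), the `fall_back` flag, and the function name are hypothetical, and the check that no other program is permanently overwritten is left out of this sketch.

```python
from typing import Dict, Optional


def resolve_pfu_program(name: str, rom: Dict[str, bytes], ram: Dict[str, bytes],
                        default: str = "PGM1", fall_back: bool = True) -> Optional[bytes]:
    """Return the program image to load, or None to disable the PFU engine.

    Hypothetical policy: RAM first, then ROM, then the ROM default if allowed.
    """
    if name in ram:
        return ram[name]
    if name in rom:
        return rom[name]
    if fall_back and default in rom:
        return rom[default]   # load/activate the standard program instead
    return None               # caller disables the PFU engine for this process


rom_images = {"PGM1": b"\x01"}
print(resolve_pfu_program("PGMX", rom_images, {}))                   # prints "b'\x01'"
print(resolve_pfu_program("PGMX", rom_images, {}, fall_back=False))  # prints "None"
```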
Figure 3 is a simplified block diagram of PFU programmer and controller 204 interfacing with PFU engine 202, which is implemented using programmable logic 301, according to one embodiment of the present invention. In the illustrated embodiment, programmable logic 301 is subdivided into a set of "P" substantially identical programmable sections 303, shown as programmable sections P1, P2, …, PP, where "P" is a positive integer. PFU programmer and controller 204 programs one or more PFU programs into programmable logic 301. In particular, PFU programmer and controller 204 allocates as many programmable sections 303 as are needed to program a PFU program, and then loads the PFU program into the allocated sections 303 to implement the corresponding PFU function within PFU engine 202. PFU programmer and controller 204 maintains pointers or the like to identify and locate the PFU programs loaded into PFU engine 202, and activates or deactivates the loaded PFU programs based on the operating mode or the active process.
Programmable logic 301 may be a relatively large resource, such as one implemented by a Field Programmable Gate Array (FPGA) or the like, capable of holding multiple PFU programs at once for each of multiple application processes. Programmable logic 301 is nonetheless a limited resource, and the remaining unallocated sections 303 may be insufficient to program a new PFU program. In this case, if no copy of an existing PFU program already resides in RAM 208 and sufficient space is available in RAM 208, PFU programmer and controller 204 copies that existing PFU program from programmable logic 301 into RAM 208, and may then program the reallocated sections 303 with the new PFU program. When a process completes its operations or is terminated, or upon a mode switch, any PFU program that was programmed for that process may be invalidated and eventually overwritten within PFU engine 202 and/or RAM 208.
Each programmable section 303 may include enough programmable logic to implement a simple PFU program. As shown, for example, a first (relatively simple) PFU program PGMA is loaded into the first programmable section P1 to implement a first PFU function PFUA, and a second (more complex) PFU program PGMB is loaded into the two programmable sections P2 and P3 to implement a second PFU function PFUB. Even more complex PFU programs may be loaded into more than two sections 303. Any number of PFU programs may be programmed into programmable logic 301, depending on the relative size and complexity of the PFU programs and the total number of programmable sections 303.
In one embodiment, PFU programmer and controller 204 performs dynamic allocation: when scanning in a new PFU program, it identifies the next section 303 available for allocation and begins programming that section. If the PFU program still has remaining content after the first allocated section 303 has been fully programmed, so that an additional section 303 is needed to complete the programming, additional sections are allocated dynamically, on the fly, until the PFU program is fully programmed into PFU engine 202. In an alternative embodiment, PFU programmer and controller 204 first evaluates the size of the new PFU program and allocates an appropriate number of programmable sections 303 before programming begins. In another alternative embodiment, the PFU program may include a resource declaration (RSRC) 903 or the like (Fig. 9) indicating the number of sections 303 (or at least the number and type of programmable elements) that the PFU program requires. In this case, PFU programmer and controller 204 retrieves resource declaration 903, pre-allocates the indicated number of sections 303, and then programs the allocated sections with the PFU program.
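The on-the-fly and RSRC-driven allocation strategies above can be contrasted in a small sketch. It is hypothetical throughout: `SectionAllocator` is an illustrative name, sections are indexed from 0 (so P1 is index 0), and each bitstream chunk is assumed to fill exactly one section.

```python
from typing import List, Optional, Sequence


class SectionAllocator:
    """Tracks ownership of the P programmable sections 303 (hypothetical model)."""

    def __init__(self, num_sections: int):
        self.owner: List[Optional[str]] = [None] * num_sections

    def free_count(self) -> int:
        return self.owner.count(None)

    def _next_free(self) -> Optional[int]:
        for i, who in enumerate(self.owner):
            if who is None:
                return i
        return None

    def allocate_on_the_fly(self, name: str,
                            chunks: Sequence[bytes]) -> Optional[List[int]]:
        """Dynamic allocation: claim the next free section as each chunk is scanned in."""
        taken: List[int] = []
        for _chunk in chunks:
            idx = self._next_free()
            if idx is None:
                for i in taken:          # out of space: undo the partial allocation
                    self.owner[i] = None
                return None
            self.owner[idx] = name       # the chunk would be programmed into section idx here
            taken.append(idx)
        return taken

    def preallocate(self, name: str, rsrc: int) -> Optional[List[int]]:
        """Pre-allocation: reserve the number of sections given by RSRC 903 before programming."""
        free = [i for i, who in enumerate(self.owner) if who is None]
        if len(free) < rsrc:
            return None
        for i in free[:rsrc]:
            self.owner[i] = name
        return free[:rsrc]

    def release(self, name: str) -> None:
        """Invalidate a program's sections, e.g. when its process terminates."""
        self.owner = [None if who == name else who for who in self.owner]
```

The pre-allocation path can reject a program before any section is touched, whereas the on-the-fly path only discovers a shortage partway through and must roll back.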
Once a PFU program has been programmed into programmable logic 301 for a given process and PFU configuration map 212 has been updated accordingly, PFU programmer and controller 204 monitors, or is otherwise provided with, mode information and enables the corresponding PFU program to operate during that mode.
Figure 4 is a block diagram illustrating a method for initially programming PFU 114 according to one embodiment of the present invention. At power-on reset (POR), in block 302, BIOS 108 performs initialization processes and hardware initialization routines to provide runtime services to OS 120 and to programs or applications. Initialization includes, for example, initializing memory 110 and system memory 112 for use by processor 100.
The next blocks 304, 306, and 308 may be performed by BIOS 108 or by OS 120, depending on the implementation. In block 304, it is determined whether ROM 210 is provided with PFU 114 and, if so, whether PFU program 116 is located in ROM 210. For example, the PFU program may be stored in ROM 210 (if provided) as PGM1 (e.g., a default PFU program). If PFU program 116 is not located in ROM 210, or if ROM 210 is not provided, operation proceeds to block 306, in which PFU program 116 is accessed in memory 110 and copied either to RAM 208 of local memory 206 (if provided) or to system memory 112.
Following block 304 or 306, operation proceeds to block 308, in which a program command PGM <ADDR> is sent to PFU 114 of MC 104 to program PFU engine 202. The PGM command may be received by PFU programmer and controller 204, which uses the included address ADDR to locate PFU program 118. In embodiments in which PFU program 118 is pre-stored in ROM 210 within processor 100, ADDR identifies a location within ROM 210, such as the location of PGM1 (or of any other PFU program pre-stored in ROM 210). In embodiments in which PFU program 118 is not pre-stored and local memory 206 with RAM 208 is provided on processor 100, PFU program 116 may be copied to a location in RAM 208, and ADDR identifies the location of the copied PFU program. For example, ADDR may identify the location in RAM 208 of the copied PFU program 118, stored as PGMA or the like. Where local memory 206 is not provided, PFU program 116 is copied into system memory 112 as PFU program 118, and ADDR identifies the location of PFU program 118 in system memory 112.
Operation then proceeds to block 310, in which PFU programmer and controller 204 uses the supplied ADDR to access the PFU program (e.g., PFU program 118, PGM1, and/or PGMA), programs PFU engine 202 accordingly, and enables PFU engine 202. The initial programming method is then complete. Once PFU engine 202 has been programmed and enabled, it modifies and/or enhances the operation of MC 104 according to the PFU program.
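The decision flow of blocks 304 through 310 can be sketched as follows. The sketch is hypothetical: memories are modeled as Python dictionaries keyed by program name, `None` stands for a ROM 210 or local memory 206 that is not provided, and ADDR is represented as a (memory, key) pair rather than a physical address.

```python
from typing import Dict, Optional, Tuple

Addr = Tuple[str, str]  # hypothetical stand-in for ADDR: (memory holding the program, key)


def resolve_boot_program(name: str, image: bytes,
                         rom: Optional[Dict[str, bytes]],
                         local_ram: Optional[Dict[str, bytes]],
                         system_memory: Dict[str, bytes]) -> Addr:
    """Blocks 304-306: find or copy the PFU program and return ADDR for PGM <ADDR>.

    rom is None when ROM 210 is not provided; local_ram is None when local
    memory 206 (RAM 208) is not provided; image is PFU program 116 read from memory 110.
    """
    if rom is not None and name in rom:          # block 304: already in ROM 210
        return ("ROM", name)
    if local_ram is not None:                    # block 306: copy into RAM 208
        local_ram[name] = image
        return ("RAM", name)
    system_memory[name] = image                  # block 306: copy into system memory 112
    return ("SYSMEM", name)


def program_command(addr: Addr,
                    rom: Optional[Dict[str, bytes]],
                    local_ram: Optional[Dict[str, bytes]],
                    system_memory: Dict[str, bytes]) -> bytes:
    """Block 310: fetch, via ADDR, the program to be loaded into PFU engine 202."""
    space, key = addr
    memories = {"ROM": rom, "RAM": local_ram, "SYSMEM": system_memory}
    return memories[space][key]
```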
FIG. 5 is a simplified block diagram depicting an executable binary application (APP) 502 that may be used to program or reprogram PFU 114, according to one embodiment of the invention. Binary APP 502 includes a header 504 and a body 506. Binary APP 502 is shown in generic form and may be implemented as a binary executable (.EXE) file, a bytecode file (.NET, Java, etc.), or any other type of executable code that can be executed by any one or more of processing cores C0-C3 of processor 100. In the illustrated configuration, header 504 includes at least one PFU write instruction, each of which specifies or locates a respective PFU program that may be used to program PFU 114. As shown, for example, header 504 includes a PFU write instruction WRITE_PFU containing an operand (or parameter) PGMA that identifies a corresponding PFU program PGMA_PFU contained within header 504. Alternatively, PFU program PGMA_PFU may be provided in a different section of binary APP 502. In either case, operand PGMA may be an address or offset used to locate binary APP 502 and/or PFU program PGMA_PFU within system memory 112. Although binary APP 502 is shown with only one PFU write instruction identifying a corresponding PFU program, an executable binary application may include any number of PFU write instructions to load as many PFU programs as can be loaded into processor 100 at any given time.
During operation, a processing core (e.g., C0) accesses and/or loads binary APP 502 from memory 110 into system memory 112 and executes the WRITE_PFU instruction. Assuming that RAM 208 of local memory 206 is present, operand PGMA of the WRITE_PFU instruction is used to locate PFU program PGMA_PFU within binary APP 502 and to write PFU program PGMA_PFU into RAM 208. Alternatively, PFU program PGMA_PFU may be written to any other memory accessible to PFU 114 of processor 100. Header 504 also includes a PFU program instruction PGM_PFU with a location (or address) operand LOC; the PGM_PFU instruction is forwarded to PFU programmer and controller 204 of PFU 114. LOC identifies the location within RAM 208 of PFU program PGMA_PFU as copied from binary APP 502. PFU programmer and controller 204 then programs PFU engine 202 with PFU program PGMA_PFU from RAM 208.
In configurations in which local memory 206 (or any other suitable memory) is not provided within processor 100, the WRITE_PFU instruction may simply identify the location of PFU program PGMA_PFU within binary APP 502 without actually copying PFU program PGMA_PFU into any local memory of processor 100. In this case, LOC is updated with the address of PFU program PGMA_PFU in system memory 112. PFU program instruction PGM_PFU is forwarded to PFU programmer and controller 204 of PFU 114, which uses operand LOC to locate PFU program PGMA_PFU in system memory 112 and program PFU engine 202.
In an alternative configuration, binary APP 502 may use a single instruction or command which, when executed, is forwarded to PFU programmer and controller 204. PFU programmer and controller 204 uses the included operand, in the form of an address, offset, or the like, to locate PFU program PGMA_PFU and program PFU engine 202 with it directly. In any of these programming configurations, PFU programmer and controller 204 enables the PFU program PGMA_PFU newly programmed into PFU engine 202.
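The two-instruction flow (WRITE_PFU followed by PGM_PFU), with and without local memory, can be modeled as below. This is a toy sketch with hypothetical semantics: the PGMA operand is treated as a byte offset into binary APP 502 plus an explicit length (the text does not specify how program size is conveyed), and LOC is represented as a (memory, address, length) tuple.

```python
from typing import Optional, Tuple

Loc = Tuple[str, int, int]  # hypothetical LOC: (memory space, address, length in bytes)


class Machine:
    """Toy model of executing WRITE_PFU and PGM_PFU (hypothetical semantics)."""

    def __init__(self, has_local_ram: bool, ram_size: int = 256):
        self.system_memory = bytearray(1024)                             # system memory 112
        self.local_ram = bytearray(ram_size) if has_local_ram else None  # RAM 208
        self.ram_next = 0
        self.programmed: Optional[bytes] = None                          # contents of PFU engine 202

    def load_app(self, base: int, app: bytes) -> None:
        """Load binary APP 502 into system memory at address `base`."""
        self.system_memory[base:base + len(app)] = app

    def write_pfu(self, app_base: int, offset: int, length: int) -> Loc:
        """WRITE_PFU: locate PGMA_PFU inside the APP and return LOC."""
        src = app_base + offset
        if self.local_ram is None:
            # No local memory: LOC is just the program's system-memory address.
            return ("SYSMEM", src, length)
        if self.ram_next + length > len(self.local_ram):
            raise MemoryError("RAM 208 is full")
        loc = self.ram_next
        self.local_ram[loc:loc + length] = self.system_memory[src:src + length]
        self.ram_next += length
        return ("RAM", loc, length)

    def pgm_pfu(self, loc: Loc) -> None:
        """PGM_PFU: the PFU programmer reads the program at LOC and programs the engine."""
        space, addr, length = loc
        mem = self.local_ram if space == "RAM" else self.system_memory
        self.programmed = bytes(mem[addr:addr + length])
```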
System memory 112 (and/or other external memory) may include multiple applications that are loaded over time for execution by processor 100. Multiple applications or processes may be loaded into any one or more of processing cores C0-C3, although in the illustrated embodiment each processing core typically executes only one process at a time. Embodiments in which each processing core executes multiple processes at once are also contemplated. Multiple applications may be assigned to a single processing core for execution. OS 120 includes a scheduler or the like for scheduling the execution of applications on processor 100, including swapping processes in and out so that a given processing core executes them one at a time. Each of the applications executed by a given processing core may include one or more PFU programs for programming PFU 114. PFU programmer and controller 204, local memory 206, and PFU configuration map 212 may be used to manage the different processes, corresponding to different processing modes of processor 100, and thereby control the programming of PFU engine 202 over time.
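One way PFU configuration map 212 could drive enabling as the scheduler swaps processes is sketched below. It rests on an assumption not stated in the text: because MC 104 is shared by all cores, the programs enabled at any moment are taken to be the union of those mapped to the processes currently running on any core. `PfuModeManager` is a hypothetical name.

```python
from typing import Dict, List, Set


class PfuModeManager:
    """Tracks which PFU programs to enable as processes are swapped (hypothetical)."""

    def __init__(self, config_map: Dict[str, List[str]]):
        # config_map models PFU configuration map 212: process/mode -> PFU programs.
        self.config_map = config_map
        self.running: Dict[str, str] = {}   # processing core -> active process

    def context_switch(self, core: str, process: str) -> Set[str]:
        """Record that `core` now runs `process`; return the programs to enable."""
        self.running[core] = process
        enabled: Set[str] = set()
        for proc in self.running.values():
            # Assumption: enable the union of programs for all running processes.
            enabled.update(self.config_map.get(proc, []))
        return enabled
```

A process with no entry in the map contributes no PFU programs, so swapping it in simply disables the programs of the process it replaced on that core.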
Fig. 6 is a more detailed block diagram of programmable logic 301 of Fig. 3, implemented according to an embodiment of the invention. Programmable logic 301 includes an array of programmable elements, including programmable logic elements (LEs) 601 arranged in an XY matrix, each shown as LExy, where x and y denote the row and column of the array, respectively. Each row also includes at least one miscellaneous logic block 603 of an array of such blocks, each of which includes support logic that supplements the matrix of logic elements 601. Miscellaneous logic blocks 603 may include, for example, one or more storage elements, one or more registers, one or more latches, one or more multiplexers, one or more adders for adding or subtracting digital values, a set of Boolean logic elements or gates (e.g., OR gates, AND gates, inverters, exclusive-OR (XOR) gates, etc.), and so forth. Miscellaneous logic blocks 603 may include one or more registers that can be configured as shift registers, data swizzlers, or the like for flexible data manipulation. Logic elements 601 and miscellaneous logic blocks 603 are coupled together by a routing grid that includes a matrix of programmable crossbars or interconnects 605. Each programmable interconnect 605 includes multiple switches that selectively connect the programmable devices together. The routing grid provides enough connectivity to connect logic elements 601 and the devices within miscellaneous logic blocks 603 together for both simple and more complex processing operations.
As further described herein, each programmable section 303 includes one or more programmable elements (logic elements 601 and miscellaneous logic blocks 603) and a corresponding routing grid (interconnects 605) for selectively connecting devices and elements together to implement a respective function of PFU 114 that modifies the operation of MC 104. The routing grid is a switch matrix including multiple switches or the like that route inputs and outputs between logic elements 601 and miscellaneous logic blocks 603.
Programmable logic 301 includes programmable memory 607, which receives PFU programs (e.g., one or more of PFU program 116, PFU program 118, PGMA, PGMB, PGMC, …, PGM1, PGM2, PGM3, etc.) to program selected logic elements 601, the associated miscellaneous logic blocks 603, and programmable interconnects 605, thereby creating respective PFU functions that modify the operation of MC 104 when activated or otherwise enabled. Programmable memory 607 may also include storage locations, registers, or the like to receive input operands or values and to store the output results of the PFU program. Programmable memory 607 is distributed among programmable sections 303 of programmable logic 301 and may be used individually by each programmable section 303, or collectively by the allocated sections 303 that together perform a particular PFU operation. Programmable memory 607 may be configured as a dedicated memory space within programmable logic 301, or even within MC 104, that is not externally accessible. Memory 607 may be implemented in any suitable manner, such as static random access memory (SRAM) or the like.
Fig. 7 is a schematic block diagram of a programmable logic element 601 implemented according to an embodiment of the invention. Logic element 601 includes a look-up table (LUT) 701, three 2-input multiplexers (MUXs) 705, 706, and 707, a 2-input adder 709, and a clocked register (or latch) 711. A portion of programmable memory 607 is shown for programming logic element 601, any included miscellaneous logic blocks 603, and part of one or more interconnects 605. As explained above, programmable memory 607 may be used to provide input values, store output results, and/or store intermediate values that are updated on each of multiple iterations of a processing operation.
As shown, memory 607 is programmed with a PFU program shown as PGM_PFU. LUT 701 is shown as a 4x1 LUT (four select inputs, one output) programmed with corresponding LUT value (LV) bits in memory 607. MUXs 705, 706, and 707 each have a select input controlled by a corresponding memory bit stored in memory 607, shown as memory bits M1, M2, and M3, respectively. The output of LUT 701, shown as LO, is provided to one input of MUX 705 and to the input of register 711, and the output of register 711 is provided to the other input of MUX 705. The output of MUX 705 is provided to one input of MUX 706 and to one input of adder 709. The output of adder 709 is provided to the other input of MUX 706, and the output of MUX 706 is provided to an input of programmable interconnect 605. Memory 607 includes a programmable bit V that is provided to one input of MUX 707; the other input of MUX 707 is coupled to an output of programmable interconnect 605, and the output of MUX 707 is provided to the other input of adder 709. Memory 607 may also be used to program corresponding portions of interconnect 605 and of any miscellaneous logic blocks 603.
The illustrated logic element 601 is merely exemplary, and alternative versions are contemplated depending on the particular configuration. Logic elements 601 may be configured at bit-slice granularity, each handling a single bit of a data value. For data values comprising multiple bits, multiple bit-slice logic elements are used; for example, 64 bit-slice logic elements are used in parallel for a 64-bit data value.
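To illustrate bit-slice operation, the sketch below applies one 16-entry LUT configuration to every bit position of four 64-bit operands, standing in for 64 bit-slice logic elements operating in parallel. Only the LUT path is modeled; the mapping of operand bits onto S0-S3 is an assumption, and the hypothetical helper `bit_sliced` visits the slices in a loop rather than in parallel hardware.

```python
def lut4(lv: int, s: int) -> int:
    """4-input LUT: lv holds the 16 LUT value bits, s is the select S3:S0."""
    return (lv >> s) & 1


def bit_sliced(lv: int, a: int, b: int, c: int, d: int, width: int = 64) -> int:
    """Apply the same LUT to each bit position of operands a, b, c, d.

    Assumed mapping: bit i of a, b, c, d drives S0, S1, S2, S3 of slice i.
    """
    out = 0
    for i in range(width):
        s = (((a >> i) & 1)
             | (((b >> i) & 1) << 1)
             | (((c >> i) & 1) << 2)
             | (((d >> i) & 1) << 3))
        out |= lut4(lv, s) << i
    return out


XOR_AB = 0x6666   # LV bit s is set when S0 xor S1 == 1 (S2, S3 ignored)
AND_ALL = 0x8000  # only LV15 (S3:S0 = 1111) is set
```

With these constants, `bit_sliced(XOR_AB, a, b, 0, 0)` equals `a ^ b`, and `bit_sliced(AND_ALL, a, b, c, d)` equals `a & b & c & d`, showing how a single LUT programming replicated across slices yields a 64-bit bitwise function.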
In operation, memory 607 is programmed with the LUT data values (LV) of LUT 701, select inputs M1-M3 of MUXs 705-707, and programmable data value V provided to the input of MUX 707. Four input values S0-S3, provided from memory 607, from another programmed block, or from an operand of an instruction, select one of the 16 values programmed into LUT 701, and the selected value is provided at the output of LUT 701 as LO. MUX 705 is programmed to provide either the LO output of LUT 701 directly or a registered version of it. The registered version may be used to insert delays for the purpose of timing PFU operations. MUX 706 is programmed to select either the output of MUX 705 or the output of adder 709 and provides the selected value to interconnect 605, either as an output or as an input to another programmed block. Adder 709 adds a selected value, which is either programmed value V or an output of interconnect 605 (provided from another input or from another programmed block), to the output of MUX 705.
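The datapath of logic element 601 described above can be modeled cycle by cycle, as in the following sketch. The names `LEConfig` and `LogicElement` are hypothetical; the sketch assumes select value 0 picks the first MUX input as listed in claim 1, register 711 resets to 0, and adder 709 performs unbounded integer addition because the text does not specify its width or carry handling.

```python
from dataclasses import dataclass


@dataclass
class LEConfig:
    """Configuration bits held in programmable memory 607 for one logic element."""
    lv: int   # 16 LUT value bits LV15..LV0 for LUT 701
    m1: int   # MUX 705 select: 0 = LO directly, 1 = registered LO
    m2: int   # MUX 706 select: 0 = MUX 705 output, 1 = adder 709 output
    m3: int   # MUX 707 select: 0 = programmable value V, 1 = interconnect input
    v: int    # programmable value V


class LogicElement:
    def __init__(self, cfg: LEConfig):
        self.cfg = cfg
        self.reg = 0  # register 711 (assumed reset value 0)

    def clock(self, s: int, from_interconnect: int = 0) -> int:
        """Evaluate one cycle with LUT select s (S3:S0); return the value sent to interconnect 605."""
        lo = (self.cfg.lv >> s) & 1                                # LUT 701 output LO
        mux705 = self.reg if self.cfg.m1 else lo                   # direct or registered LO
        mux707 = from_interconnect if self.cfg.m3 else self.cfg.v  # V or interconnect value
        total = mux705 + mux707                                    # adder 709
        out = total if self.cfg.m2 else mux705                     # MUX 706
        self.reg = lo                                              # register 711 captures LO at the clock edge
        return out
```

Selecting the registered path (M1 = 1) delays LO by one cycle, which is the timing-adjustment use the text describes.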
Fig. 8 is a schematic diagram of LUT 701 implemented according to an embodiment of the invention. A set of 2-input MUXs organized as a binary MUX tree selects among 16 input values LV0-LV15 based on select inputs S3:S0 (where S0 is the least significant bit). LV0-LV15 are programmed into memory 607 as previously described. Each adjacent pair of the 16 input values LV0-LV15 (LV0 and LV1, LV2 and LV3, etc.) is provided to the corresponding input pair of one of eight 2-input MUXs 801, each of which receives S0 at its select input. Each adjacent pair of the eight outputs of MUXs 801 is provided to the corresponding input pair of one of four 2-input MUXs 803, each of which receives S1 at its select input. Each adjacent pair of the four outputs of MUXs 803 is provided to the corresponding input pair of one of two 2-input MUXs 805, each of which receives S2 at its select input. The pair of outputs of MUXs 805 is provided to the input pair of output MUX 807, which receives S3 at its select input and provides LUT output LO at its output. It should be understood that the configuration shown in Fig. 8 is only one of many suitable LUT implementations, as would be understood by one of ordinary skill in the art.
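The four-stage MUX tree of Fig. 8 can be expressed directly in code. In this sketch (function names are illustrative), each loop iteration models one rank of 2-input MUXs, and the result always equals LV[S3:S0], which is what makes the tree a lookup table.

```python
from typing import List


def mux2(in0: int, in1: int, sel: int) -> int:
    """2-input multiplexer: selects in1 when sel is 1, else in0."""
    return in1 if sel else in0


def lut_mux_tree(lv: List[int], s: int) -> int:
    """Select one of LV0..LV15 using select S3:S0 (S0 = least significant bit).

    Stage 0 models the eight MUXs 801 (select S0), stage 1 the four MUXs 803
    (S1), stage 2 the two MUXs 805 (S2), and stage 3 output MUX 807 (S3).
    """
    if len(lv) != 16:
        raise ValueError("a 4-input LUT needs exactly 16 values")
    level = list(lv)
    for stage in range(4):
        sel = (s >> stage) & 1
        # Each adjacent pair (LV0/LV1, LV2/LV3, ...) feeds one 2-input MUX.
        level = [mux2(level[k], level[k + 1], sel) for k in range(0, len(level), 2)]
    return level[0]  # LUT output LO
```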
Fig. 9 is a simplified block diagram of the format of a PFU program 901 for programming PFU engine 202 according to an embodiment of the present invention, where PFU program 901 may take the form of any of PFU programs 116, 118, PGMA, PGMB, PGMC, …, PGM1, PGM2, PGM3, and the like. In this case, PFU program 901 may include a resource declaration (RSRC) 903 indicating the amount of resources within programmable logic 301 needed to implement the PFU program. As an example, resource declaration 903 may indicate the number of programmable sections 303 required to complete programming. PFU programmer and controller 204 may read resource declaration 903 while programming PFU engine 202 in order to allocate a corresponding number of programmable sections 303. Finer granularity may be used, such as tracking the number of individual logic elements 601, miscellaneous logic blocks 603, programmable interconnects 605, and/or programmable memory 607, although this may require PFU programmer and controller 204 to track individual elements of programmable logic 301 over time.
PFU program 901 may also include a series of logical ones (1) and zeros (0) referred to as a bitstream. In one embodiment, for example, in response to a programming instruction received by a processing core, PFU programmer and controller 204 arranges the programmable memories of the allocated programmable sections 303 (including programmable memory 607 and the corresponding programmable memories of interconnects 605) into one large serialized shift register, shifts in the bitstream until each allocated section is fully loaded, then restores the programmable memories to their normal arrangement and provides a pointer to locate and identify the programmed PFU. Alternative programming methods and formats, including parallel programming, may be used. In addition, the resource declaration may be placed at any suitable location, such as the beginning or the end of the PFU program, from which PFU programmer and controller 204 reads it to ensure proper programming.
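The serial programming scheme above can be sketched as follows. The program layout is hypothetical (byte 0 holds RSRC 903 as a section count, the remaining bytes hold the bitstream, most significant bit first), as is the per-section size `SECTION_BITS`; the shift-register behavior itself follows the text.

```python
from collections import deque
from typing import List, Tuple

SECTION_BITS = 8  # hypothetical number of configuration bits per section 303


def parse_pfu_program(blob: bytes) -> Tuple[int, List[int]]:
    """Hypothetical layout: byte 0 is RSRC 903 (sections needed), the rest is the bitstream."""
    rsrc = blob[0]
    bits = [(byte >> (7 - k)) & 1 for byte in blob[1:] for k in range(8)]  # MSB first
    return rsrc, bits


def shift_in(bits: List[int], num_sections: int) -> List[List[int]]:
    """Chain the allocated sections into one serial shift register and shift the bitstream in."""
    chain_len = num_sections * SECTION_BITS
    if len(bits) != chain_len:
        raise ValueError("bitstream length does not match the resource declaration")
    chain = deque([0] * chain_len, maxlen=chain_len)
    for bit in bits:
        # One shift: every stored bit moves one position down the chain and
        # the oldest bit falls off the far end.
        chain.appendleft(bit)
    flat = list(chain)
    # Restore the normal arrangement: split the chain back into per-section memories.
    return [flat[i * SECTION_BITS:(i + 1) * SECTION_BITS] for i in range(num_sections)]
```

Because the first bit shifted in travels to the far end of the chain, a programming tool producing such a bitstream would emit the configuration bits in reverse chain order.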
Figure 10 is a simplified block diagram illustrating an example method for generating PFU program 116 for programming PFU engine 202 of PFU 114 in accordance with one embodiment of the present invention. An application developer, such as a programmer or the like, writes a PFU function description 1002 in a selected format to describe or otherwise define memory controller operations that modify or enhance MC 104. PFU function description 1002 may also be referred to as a PFU definition. PFU function description 1002 may be written in any suitable hardware programming language, such as LegUp, Catapult (Catapult technology corporation), Verilog, HDL (hardware description language), Register Control Logic (RCL), Register Transfer Logic (RTL), and the like. PFU function description 1002 is provided to a corresponding PFU programming tool 1004, which transforms PFU function description 1002 into a PFU program 116 suitable for programming PFU engine 202 to operate according to PFU function description 1002. As an example, PFU programming tool 1004 may convert PFU function description 1002 into a corresponding bitstream that may be used to program one or more programmable sections 303 of programmable logic 301 of PFU engine 202.
Once generated, PFU program 116 may be stored in memory 110 at an appropriate location, where it can be accessed by BIOS 108 or OS 120 to program PFU 114 according to any of the methods previously described. Alternatively, PFU program 116 may be incorporated into an application, such as binary APP 502, so that the application programs PFU 114 when executed.
Fig. 11 is a simplified block diagram illustrating an exemplary encryption process that may be programmed into PFU 114 and performed by MC 104 when storing data to system memory 112. A move (MOV) instruction 1102 represents any type of store instruction executed by any core of processor 100 to store a data value DATA held in a register (REG) 1103 to a specified address ADDR in system memory 112. PFU engine 202 of PFU 114 is programmed with a KEY 1104 and an encryption algorithm 1106. KEY 1104 is any binary or hexadecimal value, which may be predetermined and stored within PFU program 116. Encryption algorithm 1106 may be any standard or custom encryption algorithm, such as the Data Encryption Standard (DES), the RSA public-key system, the MD5 algorithm, the Advanced Encryption Standard (AES), various hash algorithms, and the like.
In operation, MC 104, as modified by PFU 114, extracts address ADDR from MOV instruction 1102 and applies it to one input of encryption algorithm 1106. KEY 1104 is applied to the other input, and encryption algorithm 1106 provides a corresponding pad value PAD 1108 at its output. In other words, encryption algorithm 1106 converts KEY 1104 and ADDR into PAD value 1108. The DATA value from REG 1103 is applied to one input of a Boolean logic function, such as exclusive-OR (XOR) operation 1110, and PAD value 1108 is applied to the other input; XOR operation 1110 performs the indicated Boolean operation and provides a corresponding encrypted data value XDATA 1112 at its output. MC 104 stores encrypted value XDATA 1112, rather than the original DATA value, at address ADDR of system memory 112.
Fig. 12 is a simplified block diagram illustrating the reverse encryption process that may be programmed into PFU 114 and performed by MC 104 when loading data from system memory 112. The reverse encryption process of Fig. 12 is complementary to the encryption process of Fig. 11; both processes are stored together in PFU program 116 to implement a complete encryption scheme for storing information to and loading information from system memory 112. Another MOV instruction 1202 represents any type of load instruction executed by any core of processor 100 to load or read a data value from an addressed location of system memory 112 into a specified register of processor 100, such as REG 1103.
Address ADDR is extracted from load instruction 1202 and applied to one input of reverse encryption algorithm 1206 (or decryption algorithm), and KEY 1104 is applied to the other input; reverse encryption algorithm 1206 provides a corresponding PAD 1208 at its output. Address ADDR of MOV instruction 1202 is also applied to system memory 112 to retrieve encrypted value XDATA 1112. Encrypted value XDATA 1112 and PAD 1208 are applied to respective inputs of XOR operation 1110, which outputs the corresponding decrypted value DATA. MC 104 stores the DATA value, rather than the retrieved XDATA value 1112, into REG 1103 as specified by MOV instruction 1202.
Assuming that encryption algorithm 1106 and reverse encryption algorithm 1206 are complementary, so that PAD 1208 equals PAD 1108 for the same KEY 1104 and ADDR, the decrypted DATA value retrieved when executing MOV instruction 1202 is identical to the original DATA value that was held in REG 1103 when MOV instruction 1102 stored it, because XORing a value with the same pad twice restores the original value. In this way, PFU 114 modifies the operation of MC 104 to encrypt data stored to system memory 112 and decrypt data retrieved from system memory 112. Note that for symmetric-key encryption such as AES, encryption algorithm 1106 and reverse encryption algorithm 1206 are the same algorithm, so that only one encryption/decryption algorithm is required.
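The store and load paths of Figs. 11 and 12 can be sketched together as follows. The sketch is hypothetical: it substitutes HMAC-SHA-256 from the Python standard library for encryption algorithm 1106/1206 (the text names DES, AES, and others, none of which the standard library provides), models system memory 112 as a dictionary, and uses 64-bit data values. What it demonstrates is that the load path regenerates exactly the same pad from KEY and ADDR, so the second XOR cancels the first.

```python
import hashlib
import hmac
from typing import Dict


def pad_for(key: bytes, addr: int, width: int = 8) -> int:
    """Derive a `width`-byte pad from KEY and ADDR.

    Stand-in for encryption algorithm 1106 / reverse algorithm 1206:
    HMAC-SHA-256 is used here only because it is in the standard library.
    """
    digest = hmac.new(key, addr.to_bytes(8, "little"), hashlib.sha256).digest()
    return int.from_bytes(digest[:width], "little")


class EncryptingMemoryController:
    """Toy model of MC 104 as modified by PFU 114 (hypothetical)."""

    def __init__(self, key: bytes):
        self.key = key                        # KEY 1104
        self.memory: Dict[int, int] = {}      # system memory 112: ADDR -> stored value

    def store(self, addr: int, data: int) -> None:
        """Store path (Fig. 11): write XDATA = DATA xor PAD instead of DATA."""
        self.memory[addr] = data ^ pad_for(self.key, addr)

    def load(self, addr: int) -> int:
        """Load path (Fig. 12): regenerate the same PAD and XOR it back out."""
        return self.memory[addr] ^ pad_for(self.key, addr)
```

Because the pad depends on ADDR, the same DATA stored at two different addresses produces different XDATA values in memory.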
The previous description has been presented to enable any person skilled in the art to make and use the invention as provided in the context of a particular application and its requirements. Although the present invention has been described in considerable detail with reference to certain preferred versions thereof, other versions and variations are possible and contemplated. Various modifications to the preferred embodiment will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments. For example, the circuitry described herein may be implemented in any suitable manner including logic devices or circuits, and the like. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for designing or modifying other structures for carrying out the same purposes of the present invention without departing from the spirit and scope of the invention. Thus, the present invention is not intended to be limited to the particular embodiments shown and described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Cross Reference to Related Applications
This application is a continuation-in-part of the following U.S. patent application, the entire contents of which are incorporated herein by reference for all purposes.
[Table: related U.S. patent application(s); image not reproduced]
This application is related to the following U.S. patent applications, the entire contents of which are incorporated herein by reference for all purposes.
[Table: related U.S. patent applications; image not reproduced]

Claims (16)

1. A processor, comprising:
a memory controller for interfacing an external memory;
a Programmable Functional Unit (PFU) programmed by a PFU program to modify operation of the memory controller, wherein the PFU comprises a plurality of programmable logic elements and a plurality of programmable interconnects; and
a programmable memory to receive the PFU program to program the selected programmable logic element;
wherein the programmable logic element comprises:
a lookup table programmed by respective lookup table value bits in the programmable memory, wherein a plurality of input values provided by an operand of an instruction select one of the respective lookup table value bits as an output;
a register comprising an output and an input coupled to the output of the lookup table;
a first multiplexer including an output, a select input controlled by a corresponding memory bit stored in the programmable memory, a first input coupled to the output of the lookup table, and a second input coupled to the output of the register;
a second multiplexer including an output, a select input controlled by a corresponding memory bit stored in the programmable memory, a first input coupled to a programmable bit stored in the programmable memory, and a second input coupled to the programmable interconnect;
an adder including an output, a first input coupled to the output of the first multiplexer, and a second input coupled to the output of the second multiplexer; and
a third multiplexer including a select input controlled by a respective memory bit stored in the programmable memory, a first input coupled to an output of the first multiplexer, a second input coupled to an output of the adder, and an output coupled to the programmable interconnect.
2. The processor of claim 1, further comprising a PFU programmer to program the PFU using a PFU program stored in local memory.
3. The processor of claim 2, wherein the processor is responsive to a program command, wherein the program command is to cause the PFU programmer to program the PFU with a specified PFU program of a plurality of PFU programs stored in the local memory.
4. The processor of claim 1, further comprising a configuration map for mapping each of a plurality of different processing modes with a respective one of a plurality of PFU programs stored in local memory.
5. The processor of claim 1, wherein said plurality of programmable logic elements and said plurality of programmable interconnects are subdivided into a plurality of substantially identical programmable sections, wherein said processor further comprises a PFU programmer for allocating a plurality of said programmable sections and programming said allocated plurality of said programmable sections with said PFU program to program said PFU.
6. The processor of claim 1, wherein the PFU comprises the programmable memory and the PFU program comprises a bit stream scanned into the programmable memory of the PFU.
7. The processor of claim 1, wherein the PFU is programmed with a plurality of PFU programs, wherein the processor further comprises a PFU programmer to enable at least one of the plurality of PFU programs at a time during operation of the processor.
8. The processor of claim 1, wherein the PFU program programs the PFU to perform an encryption function for encrypting data stored in the external memory.
9. The processor of claim 8, wherein the encryption function includes an encryption process and a reverse (decryption) process that employ a predetermined key combined with an address to develop a pad value, which is further combined with a data value.
10. A method for providing a programmable memory controller of a processor, the memory controller interfacing the processor with an external memory, the method comprising the steps of:
providing a programmable functional unit (PFU) including a plurality of programmable logic elements and a plurality of programmable interconnects;
programming the PFU with a PFU program to modify operation of the programmable memory controller; and
providing a programmable memory that receives the PFU program to program selected ones of the programmable logic elements, comprising:
providing a lookup table, a register, a first multiplexer, a second multiplexer, an adder, and a third multiplexer, wherein:
the lookup table is programmed with corresponding lookup table value bits stored in the programmable memory, and a plurality of input values provided by an operand of an instruction select a corresponding one of the lookup table value bits as its output;
the register receives the output of the lookup table;
the first multiplexer receives a select input controlled by a respective memory bit stored in the programmable memory, the output of the lookup table as a first input, and the output of the register as a second input;
the second multiplexer receives a select input controlled by a respective memory bit stored in the programmable memory, a programmable bit stored in the programmable memory as a first input, and an output of the programmable interconnect as a second input;
the adder receives the output of the first multiplexer as a first input and the output of the second multiplexer as a second input; and
the third multiplexer receives a select input controlled by a respective memory bit stored in the programmable memory, the output of the first multiplexer as a first input, and the output of the adder as a second input, and provides its output to the programmable interconnect.
11. The method of claim 10, further comprising the steps of: providing a PFU programmer and a PFU engine within the PFU, wherein the PFU programmer programs the PFU engine with the PFU program stored in local memory.
12. The method of claim 11, further comprising the steps of: executing a program command with the processor, wherein the program command instructs the PFU programmer to program the PFU engine with a PFU program stored in the local memory.
13. The method of claim 10, further comprising the steps of: providing a configuration map in the PFU, wherein the configuration map maps each of a plurality of different processing modes to a corresponding one of a plurality of PFU programs stored in a local memory.
14. The method of claim 10, further comprising the steps of:
subdividing the plurality of programmable logic elements and the plurality of programmable interconnects into a plurality of substantially identical programmable sections;
allocating a plurality of the programmable sections to configure the PFU according to the PFU program; and
programming the allocated plurality of the programmable sections with at least one PFU program.
15. The method of claim 11, further comprising the steps of:
providing the programmable memory in the PFU engine; and
programming the PFU by scanning at least one of the PFU programs as a bitstream into the programmable memory of the PFU engine.
16. The method of claim 10, further comprising the steps of: programming the PFU with a plurality of PFU programs; and enabling at least one of the plurality of PFU programs at a time during operation of the processor.
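Claims 1 and 10 describe each programmable logic element as a lookup table feeding a register, three multiplexers, and an adder, all configured by bits in the programmable memory. The following Python sketch is a behavioral model under stated assumptions, not the patented circuit. The class and field names are hypothetical. The first LUT input is assumed to be the most significant index bit. The adder is treated as a 1-bit adder whose carry is dropped, because the claims describe no carry path.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LogicElementConfig:
    """Hypothetical grouping of the programmable-memory bits for one logic element."""
    lut_bits: Sequence[int]  # 2**k lookup table value bits for a k-input LUT
    sel_mux1: int            # first mux select: 0 = LUT output, 1 = register output
    sel_mux2: int            # second mux select: 0 = programmable bit, 1 = interconnect
    const_bit: int           # programmable bit on the second mux's first input
    sel_mux3: int            # third mux select: 0 = first mux output, 1 = adder output


class LogicElement:
    def __init__(self, cfg: LogicElementConfig) -> None:
        self.cfg = cfg
        self.reg = 0  # register that receives the LUT output

    def _lut(self, lut_inputs: Sequence[int]) -> int:
        # The input bits (e.g. from an instruction operand) form an index
        # into the stored LUT value bits; the first input is the MSB.
        index = 0
        for bit in lut_inputs:
            index = (index << 1) | (bit & 1)
        return self.cfg.lut_bits[index]

    def output(self, lut_inputs: Sequence[int], interconnect_in: int) -> int:
        """Combinational value driven onto the programmable interconnect."""
        lut_out = self._lut(lut_inputs)
        mux1 = self.reg if self.cfg.sel_mux1 else lut_out
        mux2 = (interconnect_in & 1) if self.cfg.sel_mux2 else self.cfg.const_bit
        adder = (mux1 + mux2) & 1  # 1-bit sum; carry out not modeled
        return adder if self.cfg.sel_mux3 else mux1

    def clock(self, lut_inputs: Sequence[int]) -> None:
        """Latch the current LUT output into the register on a clock edge."""
        self.reg = self._lut(lut_inputs)


# Usage: a 2-input AND in the LUT, inverted by adding the constant bit 1.
nand = LogicElement(LogicElementConfig(lut_bits=[0, 0, 0, 1], sel_mux1=0,
                                       sel_mux2=0, const_bit=1, sel_mux3=1))
print([nand.output([a, b], 0) for a in (0, 1) for b in (0, 1)])  # [1, 1, 1, 0]
```

Routing the constant bit through the adder lets a single element invert its LUT result without spending LUT entries. With the second multiplexer selecting the interconnect instead, the same adder combines the element's value with a neighbor's bit.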
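Claims 2 to 4 (and method claims 11 to 13) involve three pieces: PFU programs stored in local memory, a program command that names one of them, and a configuration map from processing mode to program. The following Python sketch models only that bookkeeping, assuming a PFU program is just a list of configuration bits. The class, method, program, and mode names are all hypothetical.

```python
from typing import Dict, List, Optional


class PfuProgrammer:
    """Hypothetical model of claims 2 to 4: PFU programs held in local memory,
    a program command that names one of them, and a configuration map from
    processing mode to PFU program."""

    def __init__(self, local_memory: Dict[str, List[int]],
                 config_map: Dict[str, str]) -> None:
        self.local_memory = local_memory   # program name -> configuration bits
        self.config_map = config_map       # processing mode -> program name
        self.loaded: Optional[str] = None  # program currently in the PFU
        self.pfu_bits: List[int] = []      # stand-in for the PFU's programmable memory

    def program_command(self, program_name: str) -> None:
        """Program the PFU with the specified program from local memory."""
        if program_name not in self.local_memory:
            raise KeyError(f"no PFU program {program_name!r} in local memory")
        self.pfu_bits = list(self.local_memory[program_name])
        self.loaded = program_name

    def enter_mode(self, mode: str) -> None:
        """Use the configuration map to load the program for a processing mode,
        skipping the reload when that program is already in the PFU."""
        program_name = self.config_map[mode]
        if program_name != self.loaded:
            self.program_command(program_name)


# Usage with made-up program and mode names.
programmer = PfuProgrammer(
    local_memory={"passthrough": [0, 0, 0], "encrypt": [1, 0, 1]},
    config_map={"normal": "passthrough", "secure": "encrypt"},
)
programmer.enter_mode("secure")
print(programmer.loaded, programmer.pfu_bits)  # encrypt [1, 0, 1]
```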
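Claims 5 and 14 divide the logic elements and interconnects into substantially identical programmable sections, which the PFU programmer allocates before programming. The following Python sketch shows one possible allocation policy: first-fit from a free list. It is hypothetical and assumes any free section can serve any program. A real fabric might instead require adjacent sections for routing, which the claims do not address.

```python
from typing import Dict, List


class SectionAllocator:
    """Hypothetical model of claim 5: the PFU fabric is split into identical
    programmable sections, and the PFU programmer allocates as many as a
    PFU program needs before programming them."""

    def __init__(self, num_sections: int) -> None:
        self.free: List[int] = list(range(num_sections))
        self.owner: Dict[int, str] = {}  # section index -> program using it

    def allocate(self, program: str, sections_needed: int) -> List[int]:
        if sections_needed > len(self.free):
            raise RuntimeError(f"{program!r} needs {sections_needed} sections, "
                               f"only {len(self.free)} free")
        # Sections are interchangeable, so any free ones will do.
        granted, self.free = self.free[:sections_needed], self.free[sections_needed:]
        for section in granted:
            self.owner[section] = program
        return granted

    def release(self, program: str) -> None:
        freed = [s for s, p in self.owner.items() if p == program]
        for section in freed:
            del self.owner[section]
        self.free = sorted(self.free + freed)


fabric = SectionAllocator(num_sections=8)
print(fabric.allocate("prefetch", 3), fabric.allocate("encrypt", 2))  # [0, 1, 2] [3, 4]
fabric.release("prefetch")
print(fabric.free)  # [0, 1, 2, 5, 6, 7]
```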
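Claims 6 and 15 load a PFU program by scanning it as a bitstream into the PFU's programmable memory. The following Python sketch models that memory as a simple serial shift chain, one bit per scan clock. This is an assumption for illustration: the claims do not specify the chain's structure, and the class and method names are hypothetical.

```python
from typing import Iterable, List


class ScanChainMemory:
    """Hypothetical model of a PFU's programmable memory as a serial scan chain:
    each scan clock shifts one bit into cell 0 and moves every stored bit one
    cell further along; the bit in the last cell falls off the end."""

    def __init__(self, length: int) -> None:
        self.cells: List[int] = [0] * length

    def scan_clock(self, bit: int) -> None:
        self.cells = [bit & 1] + self.cells[:-1]

    def scan_in(self, bitstream: Iterable[int]) -> None:
        for bit in bitstream:
            self.scan_clock(bit)

    def load(self, config: List[int]) -> None:
        """Scan a full configuration so that cells == config afterwards.
        The first bit scanned ends up in the last cell, hence the reversal."""
        if len(config) != len(self.cells):
            raise ValueError("bitstream length must match the chain length")
        self.scan_in(reversed(config))


memory = ScanChainMemory(length=4)
memory.load([1, 1, 0, 1])
print(memory.cells)  # [1, 1, 0, 1]
```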
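Claims 7 and 16 keep several PFU programs resident at once and enable at least one of them at a time. The following Python sketch represents each resident program as a function applied to a data value. This is a hypothetical abstraction, as are the class and program names. Enabling is modeled as a selection change that needs no reprogramming.

```python
from typing import Callable, Dict, List


class MultiProgramPfu:
    """Hypothetical model of claim 7: several PFU programs are resident in the
    PFU at once, and the PFU programmer enables one or more of them at a time
    without reprogramming the fabric."""

    def __init__(self, programs: Dict[str, Callable[[int], int]]) -> None:
        self.programs = programs
        self.enabled: List[str] = []  # nothing enabled: data passes unchanged

    def enable(self, *names: str) -> None:
        unknown = [n for n in names if n not in self.programs]
        if unknown:
            raise KeyError(f"not resident in the PFU: {unknown}")
        self.enabled = list(names)  # switching is a select change only

    def process(self, value: int) -> int:
        # Enabled programs are applied in the order they were enabled.
        for name in self.enabled:
            value = self.programs[name](value)
        return value


pfu = MultiProgramPfu({"inc": lambda v: v + 1, "dbl": lambda v: v * 2})
pfu.enable("inc", "dbl")
print(pfu.process(5))  # 12
```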
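Claim 9 states only that a predetermined key is combined with an address to develop a pad value, which is then combined with a data value. It does not say how either combination is performed. The following Python sketch makes two assumptions. First, it uses XOR for the pad-data combination, which makes encryption and decryption the same operation. Second, it uses Python's keyed BLAKE2b (`hashlib.blake2b`) purely as an illustrative stand-in for the key-address combination. The function names are hypothetical.

```python
import hashlib


def pad_for(key: bytes, address: int, width_bytes: int = 8) -> int:
    """Develop a pad value from the predetermined key and the memory address.
    Keyed BLAKE2b is only a stand-in for the PFU's real combining logic."""
    digest = hashlib.blake2b(address.to_bytes(8, "little"),
                             digest_size=width_bytes, key=key).digest()
    return int.from_bytes(digest, "little")


def encrypt_word(key: bytes, address: int, data: int) -> int:
    """Value written to external memory: the data value XORed with the pad."""
    return data ^ pad_for(key, address)


def decrypt_word(key: bytes, address: int, stored: int) -> int:
    """Reverse process: XOR is its own inverse, so regenerating the same pad
    from key and address recovers the data value."""
    return stored ^ pad_for(key, address)


key = bytes(range(16))  # made-up predetermined key
plain = 0x1122334455667788
stored = encrypt_word(key, 0x1000, plain)
print(hex(decrypt_word(key, 0x1000, stored)))  # 0x1122334455667788
```

The pad depends only on the key and the address, not on the data. A memory controller could therefore, in principle, compute it while the external memory access is still in progress.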
CN201710873051.9A 2016-10-28 2017-09-25 Processor having memory controller with dynamically programmable functional units Active CN107656880B (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US15/337,169 US10268586B2 (en) 2015-12-08 2016-10-28 Processor with programmable prefetcher operable to generate at least one prefetch address based on load requests
US15/337,169 2016-10-28
US15/337,140 US10642617B2 (en) 2015-12-08 2016-10-28 Processor with an expandable instruction set architecture for dynamically configuring execution resources
US15/337,140 2016-10-28
US15/590,883 2017-05-09
US15/590,883 US11061853B2 (en) 2015-12-08 2017-05-09 Processor with memory controller including dynamically programmable functional unit

Publications (2)

Publication Number Publication Date
CN107656880A 2018-02-02
CN107656880B 2020-12-15

Family

ID=61130952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710873051.9A Active CN107656880B (en) 2016-10-28 2017-09-25 Processor having memory controller with dynamically programmable functional units

Country Status (1)

Country Link
CN (1) CN107656880B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108932125B (en) * 2018-07-19 2021-06-22 闫伟 (Yan Wei) Control method of programmable logic controller

Citations (1)

Publication number Priority date Publication date Assignee Title
CN103176752A (en) * 2012-07-02 2013-06-26 晶天电子(深圳)有限公司 Super-endurance solid-state drive with Endurance Translation Layer (ETL) and diversion of temp files for reduced Flash wear

Family Cites Families (14)

Publication number Priority date Publication date Assignee Title
US6076152A (en) * 1997-12-17 2000-06-13 Src Computers, Inc. Multiprocessor computer architecture incorporating a plurality of memory algorithm processors in the memory subsystem
US6118300A (en) * 1998-11-24 2000-09-12 Xilinx, Inc. Method for implementing large multiplexers with FPGA lookup tables
US7142557B2 (en) * 2001-12-03 2006-11-28 Xilinx, Inc. Programmable logic device for wireless local area network
US7533256B2 (en) * 2002-10-31 2009-05-12 Brocade Communications Systems, Inc. Method and apparatus for encryption of data on storage units using devices inside a storage area network fabric
JP2006519548A (en) * 2003-02-19 2006-08-24 Koninklijke Philips Electronics N.V. Electronic circuit with an array of programmable logic cells.
US7584345B2 (en) * 2003-10-30 2009-09-01 International Business Machines Corporation System for using FPGA technology with a microprocessor for reconfigurable, instruction level hardware acceleration
US20050257186A1 (en) * 2004-05-13 2005-11-17 Michael Zilbershlag Operation system for programmable hardware
US20070288909A1 (en) * 2006-06-07 2007-12-13 Hong Kong Applied Science and Technology Research Institute Company Limited Hardware JavaTM Bytecode Translator
US9195462B2 (en) * 2007-04-11 2015-11-24 Freescale Semiconductor, Inc. Techniques for tracing processes in a multi-threaded processor
CN101316177B (en) * 2007-05-29 2013-10-30 康佳集团股份有限公司 Computer and television function integrated IP video telephone
US9753695B2 (en) * 2012-09-04 2017-09-05 Analog Devices Global Datapath circuit for digital signal processors
CN103632726B (en) * 2013-01-31 2017-02-08 中国科学院电子学研究所 Data shift register circuit based on programmable basic logic unit
US10169618B2 (en) * 2014-06-20 2019-01-01 Cypress Semiconductor Corporation Encryption method for execute-in-place memories
US10496410B2 (en) * 2014-12-23 2019-12-03 Intel Corporation Instruction and logic for suppression of hardware prefetchers

Also Published As

Publication number Publication date
CN107656880A (en) 2018-02-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: Room 301, 2537 Jinke Road, Zhangjiang High Tech Park, Pudong New Area, Shanghai 201203

Patentee after: Shanghai Zhaoxin Semiconductor Co.,Ltd.

Address before: Room 301, 2537 Jinke Road, Zhangjiang High Tech Park, Pudong New Area, Shanghai 201203

Patentee before: VIA ALLIANCE SEMICONDUCTOR Co.,Ltd.