GB2461850A

GB2461850A - Memory management unit with address translation for a range defined by upper and lower limits

Info

Publication number: GB2461850A
Application number: GB0812664A
Authority: GB
Inventors: Alistair Guy Morfey; Derek William Henderson
Original assignee: Cambridge Consultants Ltd
Current assignee: Cambridge Consultants Ltd
Priority date: 2008-07-10
Filing date: 2008-07-10
Publication date: 2010-01-20
Also published as: GB0812664D0

Abstract

A data processing apparatus has a memory management unit which translates logical addresses requested by a processor to physical addresses in memory. The physical address is based on the location of the executable in memory. The executable may be compiled to execute at a fixed logical address which is not dependent on the physical address. The executable may be a contiguous block in memory. The memory management unit may translate only those logical addresses which fall within a particular address range. This range of addresses may correspond to the location of an executable. Other logical addresses may be used as the physical address without any translation. Different logical address ranges may be translated to different physical address ranges. A range of addresses may be defined using upper and lower address limits.

Description

Data Processing Apparatus, for example using vector pointers

The present invention relates to a data processing apparatus. This invention also relates to a method of operating a data processing apparatus.

Computer processors or microprocessors are used today in a wide range of embedded systems. Mobile phones, Bluetooth headsets, electronic gas and electricity meters, pacemakers, anti-lock brakes, and engine management controllers are just a few examples of systems that employ embedded microprocessors. These systems typically contain all of their software in a non-volatile memory such as Flash memory. Such software is usually termed "firmware".

A microprocessor or processor in an embedded system is usually implemented as a semiconductor device on a printed circuit board, or similar interconnect. It has become common to implement microprocessors as part of a larger semiconductor device known as an ASIC (Application Specific Integrated Circuit), ASSP (Application Specific Standard Product), or SoC (System on Chip). It is also common to implement a microprocessor in the form of an FPGA (Field Programmable Gate Array). This is usually during the development and testing phase, termed emulation. In certain circumstances, in which there are low volumes, FPGA implementations may be used in the final product. It is also common to simulate microprocessors using software simulation tools during the development and testing of the hardware system, or software, of the microprocessor.

Types of software simulation include: instruction accurate simulations, cycle accurate simulations, and timing accurate simulations.

Embedded processor systems are designed to meet at least some of the following objectives:

- low cost production;
- low power consumption, which enables low cost packaging, long periods between battery replacement or battery recharge, and assists in reducing the overall power consumption requirements of the system in which the microprocessor is embedded;
- facilitate field upgrades of firmware, which enables bug fixes, feature enhancements, and other system attributes to be delivered to systems in use in the field (field upgrades may be via wired or wireless links); and
- provide convenience to the programmer when performing software development and testing.

This invention aims to provide an improved processor.

According to a first aspect of the invention, there is provided a data processing apparatus adapted to operate under control of an executable, the apparatus comprising: a processor, means for addressing an executable stored in a memory; and a memory management unit (MMU) for translating a logical address requested by the processor to a physical address located within the memory, wherein the physical address is computed based on the location of the executable within the memory.

Providing a memory management unit for translating a logical address to a physical address, the physical address being defined based on the actual location of the executable (which may include code, initialisation data, constants and vectors) within the memory, enables the executable to be executed by the processor without the need to alter the compiled executable's addressing, that is, without the need for a program loader to perform so-called "address fix-ups".

The term executable as used herein may refer to one or more of the following: code; instructions; a program; program code; data; a binary; and hardware circuitry adapted to execute code and/or instructions.

Preferably, the executable is compiled to be executed at fixed logical address locations which are not dependent on the physical address locations in the memory in which the executable is locatable.

Thus, the executable may be compiled and linked to execute at a fixed set of logical addresses, and then subsequently stored and executed from any physical location within the memory.

Preferably, the executable is in the form of a contiguous block including one or more of the following: code; initialisation data; constants; and vectors. More preferably, the code comprises a set of event handlers.

Preferably, the MMU is further adapted to translate a logical address relating to a data block to a physical address located within the memory. In this way the MMU is adapted to translate a data block. More preferably, the MMU is adapted to translate global variables.

Preferably, the MMU is adapted to translate only those logical addresses that fall within a particular address range.

Preferably, the MMU is adapted to translate a first range of logical addresses to a first range of physical addresses and second and subsequent groups of logical addresses to second and subsequent ranges of physical addresses.

In this way a first executable block may be translated to one location within a physical memory and a second executable block may be translated to another location within a physical memory. In an example, the MMU is used to translate both an executable block and a block of global variables.
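The mapping of two separate logical ranges onto two separate physical ranges can be illustrated with a short behavioural sketch. This is not taken from the specification: the 24-bit mask, the example addresses, the inclusive treatment of both limits and the names `RANGES` and `translate` are all assumptions made for illustration.

```python
# Behavioural sketch only; all names and addresses are hypothetical.
ADDRESS_MASK = 0xFFFFFF  # assumes a 24-bit address space

# Each entry: (lowest logical address, highest logical address, physical base).
# Both limits are treated as inclusive, which is an assumption.
RANGES = [
    (0x010000, 0x013FFF, 0x200000),  # an executable block
    (0x008000, 0x0087FF, 0x300000),  # a block of global variables
]


def translate(logical):
    """Map a logical address to a physical one using the first matching range."""
    for lower, upper, physical_base in RANGES:
        if lower <= logical <= upper:
            return (physical_base + (logical - lower)) & ADDRESS_MASK
    # Addresses outside every range are used as the physical address unchanged.
    return logical
```

With these example values, `translate(0x010004)` lands inside the first physical range and `translate(0x008010)` inside the second, while an address in neither range is passed through untouched.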

Preferably, the MMU is adapted to translate only those logical addresses that fall within a particular address range and otherwise to set the physical address to the logical address.

Preferably, the apparatus further comprises means for defining at least one translation window around a particular range of logical addresses.

Preferably, the size of the window is based on the size of an executable block. In one embodiment this may be set by software. Furthermore, the size of the window may be set to accommodate the size of the largest expected executable.

Preferably, the window comprises upper and lower address limits.

Preferably, the upper address limit is less than or equal to a highest logical memory location in which the executable is logically stored in the memory.

Preferably, the lower address limit is greater than or equal to a lowest logical memory location in which the executable is logically stored in the memory.

Preferably, the address limits are computed at link time by a linker.

Preferably, the apparatus further comprises pairs of processor memory registers for storing the upper and lower logical address limits for the or each window.

Preferably, the registers are memory mapped registers.

Preferably, the MMU is adapted to translate a logical address to a physical address by adding an offset to the logical address, the offset being computed based on the physical location in the memory at which the executable is stored.

Thus, a block of code is translated by adding an offset to logical addresses which fall within a particular address (translation) window.
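A minimal sketch of a single translation window follows, assuming a 24-bit address space and treating both limits as inclusive (the specification does not state whether the limits are inclusive). The class name `TranslationWindow` and the example values are hypothetical.

```python
from dataclasses import dataclass

ADDRESS_MASK = 0xFFFFFF  # assumes a 24-bit address space


@dataclass
class TranslationWindow:
    lower: int   # lower logical address limit
    upper: int   # upper logical address limit (inclusive here, by assumption)
    offset: int  # value added to logical addresses that fall inside the window

    def translate(self, logical: int) -> int:
        """Return the physical address for a logical address."""
        if self.lower <= logical <= self.upper:
            return (logical + self.offset) & ADDRESS_MASK
        # Outside the window the physical address equals the logical address.
        return logical


# Hypothetical window covering 64 kB of logical space, relocated by 0x040000.
window = TranslationWindow(lower=0x010000, upper=0x01FFFF, offset=0x040000)
```

In this sketch an access to logical address 0x010000 reaches physical address 0x050000, whereas an access to 0x020000 falls just above the window and is not translated.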

Preferably, the apparatus further comprises means for storing at least one offset.

Preferably, a plurality of processor memory registers are provided for storing the or each offset.

Preferably, the memory registers are memory mapped registers.

Preferably, the apparatus further comprises means for computing the or each offset.

Preferably, the computing means forms part of an operating system.

Preferably, the operating system is adapted to compute the offset at run time.

Preferably, the computing means comprises an executable event handler, which forms part of the executable.

Preferably, the computing means is adapted to compute the offset during the initialisation of the executable.

Preferably, the offset is computed by an executable loader at the time the executable is stored at a particular location in the memory.

Preferably, the offset is set to zero upon at least one of the following processor events: start up; reset; and initialisation.

Preferably, the MMU is adapted to add the offset to the logical address using modular arithmetic thereby to enable address wraparound.

In this way the offset can access either a greater memory location or a smaller memory location. Thus, for example, an executable can be translated from higher locations in memory to lower locations in memory and vice versa.
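The effect of modular addition can be shown with a small worked sketch, assuming a 24-bit address space so that all arithmetic is modulo 2^24. The function names `compute_offset` and `apply_offset` and the example addresses are illustrative assumptions, not part of the specification.

```python
ADDRESS_BITS = 24              # assumed width of the address space
MODULUS = 1 << ADDRESS_BITS    # addresses wrap around at 2**24


def compute_offset(logical_base, physical_base):
    """Offset that moves logical_base onto physical_base, modulo 2**24."""
    return (physical_base - logical_base) % MODULUS


def apply_offset(logical, offset):
    """Add the offset with wraparound, as a modular adder would."""
    return (logical + offset) % MODULUS


# Executable linked at a high logical address but stored low in memory:
# the "negative" displacement is represented as a large positive offset.
offset_down = compute_offset(logical_base=0x800000, physical_base=0x001000)

# Executable linked low but stored high: an ordinary positive offset.
offset_up = compute_offset(logical_base=0x001000, physical_base=0x800000)
```

Here `offset_down` is 0x801000, and adding it to logical address 0x800010 wraps around to 0x001010, so a single unsigned offset register serves for translation in either direction.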

Thus, in the case where the processor supports more efficient addressing of a particular portion of the memory address space (for example low memory) it is advantageous to translate the executable to that portion of the memory space.

Preferably, the processor is adapted to address at least one of the following types of memory: random access memory (RAM); read only memory (ROM); and Flash memory.

Preferably, the processor is adapted to address memory external to the data processing apparatus.

Preferably, the processor is adapted to enable the executable to be stored at any available location within the memory, and executed in place from that location.

Preferably, the data processing apparatus further comprises a memory location adapted to point (whether directly or indirectly) to an address of at least one event handler associated with the executable; and means for loading an address value relating to said at least one handler into the memory location.

This important aspect is also provided independently. According to another aspect of the invention, there is provided a data processing apparatus adapted to operate under control of an executable, the apparatus comprising a memory location adapted to point (whether directly or indirectly) to an address of at least one event handler associated with the executable; and means for loading an address value relating to said at least one handler into the memory location.

Preferably, the executable is in the form of a contiguous block including one or more of the following: code; initialisation data; constants; and vectors.

Preferably, the code comprises a set of event handlers.

Preferably, at least one of the event handlers is a start or re-start handler relating to the executable.

Preferably, the executable is in the form of a firmware block.

Preferably, the processor is adapted to load automatically an address value to the memory location upon start up or re-start.

Preferably, the processor is adapted to load an address value from a predefined location in the memory (or vector pointer start location) to said memory location (or vector pointer).

Preferably, the processor is adapted to load the value stored at the top of the memory to said memory location.

Preferably, the processor is adapted to enable code (which may form part of the executable) to write to said memory location. In this way it is possible to use software to alter the address value.

Preferably, the processor is adapted to enable privileged code to write to the memory location.

Preferably, the apparatus is on start up or restart adapted to use the address value loaded in said memory location to execute an event handler.

Thus, the writing of a new address value to the vector pointer start location will result in the vector pointer pointing, via a different vector table, to a different set of event handlers associated with a different executable when the processor restarts. This is because the address value stored in the vector pointer start location is loaded into the vector pointer when the processor restarts which will result in the processor executing a hardware reset handler, associated with a new executable, thereby switching from one executable to another.

Preferably, the address value directly points to an address of an event handler.

Preferably, the address value points to a vector table which includes the addresses in the executable of a set of event handlers.

Preferably, the address value points to a base address of the vector table.

Preferably, the vector table includes a set of addresses (vectors) that point to event handlers relating to the executable stored in memory.

Preferably, the vector table and the executable form part of a single contiguous block.

Preferably, the vector table includes vectors that point to event handlers for handling one or more of the following events: interrupts; exceptions; resets; errors; service requests; and system calls.

Preferably, the processor is adapted to use the address value stored in the memory location to index a particular event handler.

Preferably, the processor is adapted to add an offset to the address stored in the memory location. In one embodiment the address value stored in said memory location is added to an offset. The result of this addition indexes a particular event handler, and processor execution then branches to this handler.
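The indexing step can be sketched as follows for the variant in which the address value points to the base of a vector table. The entry size of 4 bytes, the dictionary used as a stand-in for memory, and the names `vector_address` and `handler_for` are assumptions made for illustration only.

```python
ADDRESS_MASK = 0xFFFFFF  # assumes a 24-bit address space
VECTOR_SIZE = 4          # bytes per vector table entry; a hypothetical value


def vector_address(vector_pointer, event_number):
    """Address of the table entry for an event: base plus a per-event offset."""
    return (vector_pointer + event_number * VECTOR_SIZE) & ADDRESS_MASK


def handler_for(memory, vector_pointer, event_number):
    """Read the handler address stored in the selected vector table entry."""
    return memory[vector_address(vector_pointer, event_number)]


# Hypothetical vector table at 0x020000 holding two handler addresses.
example_memory = {0x020000: 0x020100, 0x020004: 0x020180}
```

With the example table, event 0 resolves to the handler at 0x020100 and event 1 to the handler at 0x020180; execution would then branch to the address returned.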

Preferably, the memory location comprises a processor memory register.

The memory location defines a vector pointer to the base address of a vector table which includes a set of addresses (vectors) to event handlers which the processor is adapted to execute upon events, such as interrupts, exceptions, resets, errors, service requests and system calls. Each executable includes its own vector table, and it is therefore possible to switch executables by altering the address value stored in the vector pointer, which will result in the vector pointer pointing to the base address of a different vector table associated with a different executable. It is thus possible to switch between executables either by loading a new value to the vector pointer, or by loading a new value into the vector pointer start location and then causing the processor to reset, which will result in the copying of the new value from the vector pointer start location to the vector pointer following the reset, thereby effecting the switch to a new executable. In this way, an executable block, for example a firmware block, can be more easily replaced.
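The two ways of switching executables described above can be modelled with a toy sketch. The class `Processor`, the helper `install`, the dictionary standing in for memory and the start-location address 0xFFFFFC are hypothetical and chosen only to make the behaviour concrete.

```python
class Processor:
    """Toy model of a vector pointer (VP) and a VP start location in memory."""

    VP_START_LOCATION = 0xFFFFFC  # hypothetical address near the top of memory

    def __init__(self, memory):
        self.memory = memory
        self.vp = 0
        self.reset()

    def reset(self):
        # On start-up or restart the VP is loaded from the VP start location.
        self.vp = self.memory[self.VP_START_LOCATION]

    def write_vp(self, value):
        # Code may also load a new value into the VP directly.
        self.vp = value


def install(processor, new_vector_table_base):
    """Switch executables by overwriting one memory location, then resetting."""
    processor.memory[Processor.VP_START_LOCATION] = new_vector_table_base
    processor.reset()
```

For example, with `memory = {Processor.VP_START_LOCATION: 0x100000}` and `cpu = Processor(memory)`, calling `install(cpu, 0x200000)` leaves `cpu.vp` pointing at the second executable's vector table, and it continues to do so after any later reset because the start location itself was changed.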

The combination of the features relating to the vector pointer and vector pointer start location, which enable firmware to be updated by overwriting a single memory location, and the features relating to the window translation of a firmware block, is particularly advantageous. This is because this combination not only enables the replacement of a firmware block via an "atomic" operation (i.e. the overwriting of a single memory location) but also enables a firmware block to be stored at any location within a memory from where it can be directly executed without the need to perform address fix-ups. Thus, a firmware block can be more easily replaced, and then executed, from anywhere within physical memory, as it stands.

Preferably, the size of the register is related to the processor address space.

Preferably, selected bits within the register are set to zero. More preferably, the least significant bits within the register are set to zero.

Preferably, the processor has a 24-bit address space and a 24-bit register for storing a pointer to the address of the start up or re-start handler, and more preferably, bits [7:0] of the register are set to 0.
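Holding bits [7:0] at zero means that the value in the register is always a multiple of 256. The sketch below assumes that the register simply masks off the low byte of any value written to it; the specification does not say how the zero bits are enforced, and the names used are hypothetical.

```python
VP_WRITABLE_MASK = 0xFFFF00  # bits [23:8] are stored; bits [7:0] stay zero


def vp_register_value(written):
    """Value held by a 24-bit register whose low eight bits are fixed at zero."""
    return written & VP_WRITABLE_MASK


def is_valid_table_base(address):
    """A table base is representable only if it is 256-byte aligned."""
    return 0 <= address <= 0xFFFFFF and address % 256 == 0
```

Under this assumption, writing 0x1234AB would leave 0x123400 in the register, so a vector table would need to start on a 256-byte boundary to be addressed exactly.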

Alternatively, in another embodiment, the processor may have a 64-bit, 32-bit or 16-bit address space.

According to another aspect of the invention, there is provided a method of operating a data processing apparatus as herein described.

According to a further aspect of the invention, there is provided a method of operating a data processing apparatus, the method comprising: loading an executable block into a particular location in a memory; and translating a logical address requested by the data processing apparatus to a physical address located within the memory, wherein the physical address is computed based on the location of the executable within the memory.

According to another aspect of the invention, there is provided a method of operating a data processing apparatus, the method comprising: loading an executable into a memory accessible by the data processing apparatus, the executable comprising at least one event handler associated with the executable; and loading an address value relating to the at least one handler into a predefined memory location.

Flash memory is cited herein as an example of a non-volatile memory that can hold firmware. Clearly any other type of writable non-volatile memory could be used, including EPROM, EEPROM, battery backed SRAM, mask-ROM and other forms of non-volatile memory.

SRAM memory is cited herein as an example of a volatile memory. Clearly any other type of volatile memory could be used, including DRAM, SDRAM, and other forms of volatile memory.

The term "memory" as used herein preferably includes any kind of data storage accessible by the processor, including, for example, processor registers, on-and off-chip cache memory, and main memory (as would typically be accessed by the processor via a memory bus). Unless the context otherwise requires, memory may be read-only or may be readable and writable. The term "memory location" preferably refers to a storage location of any appropriate size within such a memory.

The invention also provides a computer program and a computer program product comprising software code adapted, when executed on a data processing apparatus, to perform any of the methods described herein, including any or all of their component steps.

The invention also provides a computer program and a computer program product comprising software code which, when executed on a data processing apparatus, comprises any of the apparatus features described herein.

The invention also provides a computer program and a computer program product having an operating system which supports a computer program for carrying out any of the methods described herein and/or for embodying any of the apparatus features described herein.

The invention also provides a computer readable medium having stored thereon the computer program as aforesaid.

The invention also provides a signal carrying the computer program as aforesaid, and a method of transmitting such a signal.

The invention extends to methods and/or apparatus substantially as herein described with reference to the accompanying drawings.

Any feature in one aspect of the invention may be applied to other aspects of the invention, in any appropriate combination. In particular, method aspects may be applied to apparatus aspects, and vice versa.

Furthermore, features implemented in hardware may be implemented in software, and vice versa. Any reference to software and hardware features herein should be construed accordingly.

The following documents are hereby incorporated by reference: GB2294137, GB2294138, GB0509738.1, GB0706918.0 and WO9609583. Also incorporated by reference are the GB Patent Applications filed the same day and having the following Agent references: P32694, P32697 and P32698. Any feature in any of these documents may be combined with any feature described herein in any appropriate combination.

The invention will now be described, purely by way of example, with reference to the accompanying drawings, in which:

Figure 1 is a schematic of a processor implemented within an ASIC (application specific integrated circuit) semiconductor device;
Figure 2 is a schematic block diagram of the hardware architecture of a processor core;
Figure 3 is a schematic block diagram of the interconnection between the processor core, a memory management unit (MMU) and an interrupt vector controller (IVC);
Figure 4 shows an example programmer's model for the processor;
Figure 5 shows the high and low memory spaces accessible by the processor;
Figure 6 shows a number of example processor memory models;
Figure 7 is a schematic diagram of the memory addressable by the processor showing a vector pointer;
Figures 8 to 10 are schematic diagrams of the memory addressable by the processor showing the operation of a translation window and a vector pointer in various configurations;
Figure 11 shows various processor memory models using a translation window and vector pointer; and
Figure 12 is a state diagram showing the various operating modes and states of the processor.

Overview

Figure 1 shows a data processing apparatus 1 implemented in the form of an ASIC (application specific integrated circuit) semiconductor device, comprising a central processing unit or processor 10, for running user programs, connected via a data bus 12 to analogue circuitry 14 and digital circuitry 16, and also to random-access memory (RAM) 18 and read-only memory (ROM) 20.

Processor 10 may be, for example, one of the XAP range of processors as supplied by Cambridge Consultants Ltd, Cambridge, England, such as the XAP5, which is a 16-bit RISC processor with von Neumann architecture, the 16-bit XAP4 or the 32-bit XAP3.

Further details relating to these processors and their associated instruction sets can be found in GB 2427722, WO 2006/120470 and PCT/GB2007/001323, which are incorporated herein by reference. The processor 10 may be described to those skilled in the art as an IP-core using a hardware description language such as Verilog or VHDL in RTL (register transfer level) code.

The processor 10 further comprises a memory management unit 22 (MMU), for interfacing with the RAM 18, ROM 20, custom analogue circuits 14, and custom digital circuits 16; a serial interface (SIF) 24, to facilitate debugging and/or control of the processor 10; and an interrupt vector controller (IVC) 26, for handling interrupts (external asynchronous events), including both maskable interrupts and non-maskable interrupts.

Analogue circuitry 14 and digital circuitry 16 are custom built for specific applications; likewise, the MMU 22 and IVC 26 may be customised as required. The processor core 11 is intended to be fixed and unchanged from one ASIC to another. RAM 18 and ROM 20 may comprise on-chip or off-chip memory.

Persons skilled in the art will appreciate that the data processing apparatus 1 may be implemented, for example, in a semiconductor material such as silicon, for example, CMOS, and be suitable for ASIC or FPGA applications.

Architecture

Processor 10 uses a load-store architecture, as will be familiar to those skilled in the art of processor design.

Figure 2 shows a block diagram of the hardware architecture of the processor core 11.

Processor core 11 can be seen to comprise an arithmetic logic unit (ALU) 100, serial interface unit (SIF) 24, interrupt and exception controller 27 and a bank of registers 120.

The ALU 100 is connected to an address generator unit 160. The processor 10 is also provided with an instruction fetch unit 150 and an instruction decode unit 152. A plurality of data input and output lines are shown, both between the constituent units of the processor 10 and to/from the processor itself, as will be familiar to those skilled in the art.

For example, in Figure 2, the data lines and opcodes (op) use the following notation:

Rd  destination register
Rs  primary source register
Rt  secondary source register
Rx  index register
Ra  base address register

Registers

Processor core 11 includes a small amount of on-chip storage in the form of registers, which provides fast-accessible working memory for the processor.

Figure 2 shows some of the registers used by the processor 10. These comprise several different types according to function, including:

* general purpose registers 130 - used for the normal operands for the majority of tasks of the instruction set
* address registers 132 - used to hold memory addresses
* special registers 134 - used, for example, to indicate processor status
* breakpoint registers 136 - used in debugging

As shown, one possible example comprises eight 16-bit general purpose registers 130 (R0-R7); four 24-bit address registers 132; three 16-bit "special" registers 134; and four 24-bit breakpoint registers 136. Examples of address registers include the program counter register (PC), which normally points to the next instruction to be executed; vector pointer register (VP), for pointing to the base of a vector table, containing a set of pointers (vectors) to software interrupt/exception handlers; and at least one stack pointer (SP), for providing access to a stack.

The example of the processor described herein generally relates to a XAP5 processor which includes a 24-bit wide address space. This processor is accordingly provided with 24-bit address registers, that is, the PC, VP and SP are all 24-bits wide. In another example, the processor might provide a 32-bit wide address space, and corresponding 32-bit wide address registers.

An "operational flags" or FLAGS special register may also be provided for storing various flags and fields, for example, "carry" (C), "mode" (M) and "interrupt" (I) flags. A "state" (S) field in the FLAGS register is also provided for recording the execution progress of an instruction, to allow interrupts and exceptions to be processed during execution of certain instructions, as explained in more detail below. Figure 4 illustrates the FLAGS register, showing the bit ranges of various fields, including, for example, the "state" field S at bit positions [11:7].

Modes

The processor 10 is adapted to operate in a number of different modes. The modes comprise a User mode, for running user code, and three privileged modes: Trusted, Supervisor, and Interrupt. The privileged modes allow system or services code to be run at a privileged access level, allowing enhanced access to the processor operations, whereas user code runs at a lower access level wherein certain processor operations are restricted.

User mode allows non-privileged code to be run safely without affecting the operation of the Privileged code running in Trusted, Supervisor or Interrupt modes. User mode requests operating system services with system call instructions that transfer operation to the Trusted mode.

The processor 10 can also operate in one of two states:

* Recovery state - which uses short, fast handlers to recover from errors that occurred in Supervisor or Interrupt mode
* Non-maskable interrupt (NMI) state - which uses short, fast handlers to recover from hardware errors

The transfer of the processor 10 from one mode to another is governed by the requirement for code to be run with a predetermined level of access, and also according to the servicing of interrupts.

A detailed modes and states diagram for the processor is shown in Figure 12.

Stacks

Temporary working storage for the processor 10 is provided by means of conventional stacks, as will be familiar to those skilled in the art, which are last-in first-out (LIFO) data structures, wherein register values may be 'pushed' onto, and subsequently 'popped' off, the stacks. A stack pointer register (SP) is used to refer to locations in the stack, preferably the most recent item pushed onto the stack. In one example, the processor 10 is provided with two stack pointer registers, one (SP1) which points to Stack1, used when the processor is in a User or Trusted mode, the other (SP0) which points to Stack0, used when the processor is in a Supervisor or Interrupt mode. Additionally, in this embodiment, Stack0 is also used when the processor is in a recovery state used to handle errors arising in Supervisor or Interrupt mode, and in a non-maskable interrupt state for handling non-maskable interrupts (e.g. hardware errors).
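The selection between the two stack pointers in this example can be summarised in a short sketch. The enumeration `Mode` and the function `active_stack_pointer` are hypothetical names, and the sketch models only the selection rule stated above, not the stacks themselves.

```python
from enum import Enum


class Mode(Enum):
    USER = "User"
    TRUSTED = "Trusted"
    SUPERVISOR = "Supervisor"
    INTERRUPT = "Interrupt"


def active_stack_pointer(mode, in_recovery_or_nmi_state=False):
    """Name of the stack pointer in use for a given mode and state."""
    if in_recovery_or_nmi_state:
        # Recovery and non-maskable interrupt states both use Stack0.
        return "SP0"
    if mode in (Mode.USER, Mode.TRUSTED):
        return "SP1"
    # Supervisor and Interrupt modes use Stack0.
    return "SP0"
```

For instance, `active_stack_pointer(Mode.TRUSTED)` selects SP1, whereas `active_stack_pointer(Mode.INTERRUPT)` selects SP0.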

Memory management unit (MMU) and interrupt vector controller (IVC)

Figure 3 shows the interconnection between the processor core 11, the MMU 22 and the IVC 26 in more detail. As can be seen from Figure 3, the processor core 11 may be packaged together with the IVC and the MMU, which are implemented as part of an ASIC or FPGA.

The MMU 22 is used to interface with various memories connected to the processor 10, for example, RAM, ROM, Flash and IO registers. The IVC 26 is used to prioritise interrupt sources and provide interrupt numbers and priority levels for the interrupts to the processor core 11.

Programmer's model

Further details relating to the various processor registers referred to above are shown in Figure 4, which shows an example programmer's model for a XAP5 processor. Figure 4 provides details of the widths of the various registers and details relating to their calling syntax in assembler code.

Memory architecture

In one example, the processor 10 is provided with a 16 MB address space, which is divided into two areas, one termed Low Memory, and the other termed High Memory.

The address breakdown of the memory space into Low and High memory is shown in Figure 5.

Near and Far memory modes

As mentioned above, the processor 10 is able to address 16 MB (that is 2^24 bytes) of linear byte-addressable memory space. Executables, or programs, which may include code, constants, initialisation data and vectors, as well as other data, may reside in this memory space.

As described in more detail in PCT/GB2007/001323 programs for execution on the processor are compiled using a customised compiler, based on the GCC (GNU Compiler Collection) which provides two compile options, one termed Near, and the other termed Far. These options allow the program to be optimised for different application types.

The processor is able to switch between these memory modes under program control and with zero time overhead. The differences between the address space for use in Near and Far modes are indicated in Table 1.

                       Near Mode   Far Mode
Maximum Code Size      16 MB       16 MB
Maximum Data Size      64 kB       16 MB
Size of Code Pointer   24 bits     24 bits
Size of Data Pointer   16 bits     24 bits

Table 1: Near and Far modes

The current operating mode of the processor 10 is indicated by the F bit (Far) which is the least significant bit of the program counter (PC). To change operating mode, the program writes to this bit using one of the processor instructions.
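The relationship between the F bit and the data pointer sizes in Table 1 can be expressed as a brief sketch. The function names are hypothetical, and the sketch only decodes the mode from a program counter value; it does not model the instruction that writes the bit.

```python
def is_far_mode(pc):
    """The F bit is the least significant bit of the program counter."""
    return (pc & 1) == 1


def data_pointer_bits(pc):
    """Data pointers are 24 bits wide in Far mode and 16 bits in Near mode."""
    return 24 if is_far_mode(pc) else 16


def max_data_bytes(pc):
    """Maximum data size reachable with a data pointer of the current width."""
    return 1 << data_pointer_bits(pc)
```

A program counter value with bit 0 clear therefore implies Near mode and a 64 kB data limit, while the same value with bit 0 set implies Far mode and a 16 MB data limit.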

Some of the differences between the Near and Far memory modes are as follows:

- Certain 16-bit instructions are interpreted differently in dependence on the mode.

- The stack is used differently: in Near mode it is limited (by software) to 16 bits, and in Far mode the full 24 bits are used.

- The compiler generates different instructions for certain constructs.

- Whilst the code address space uses all 24 bits in Near mode, function pointers are implemented as 16-bit addresses which point to a table of 24-bit addresses, known as a function table, which is stored in high memory.

- Different addressing types are used for constants: in Far mode the compiler can select between using PC-relative addressing or zero-relative addressing, whereas in Near mode zero-relative addressing is used.

Near mode

Near mode is suitable for applications requiring a larger memory space for code (instructions), but which only require 64 kilobytes for data. Data pointers are 16 bits wide in Near mode, and PC[0] is set to zero.

Far mode

Far mode is suitable for larger applications. The full 16 MB address space is available for both code and data in Far mode. PC[0] is set to one when the processor is in Far mode.

Figure 6 shows an example memory model for a XAP4 16-bit processor, a XAP3 32-bit processor and example memory models for a XAP5 processor in both Near and Far modes.

Generating addresses

When executing program code, the processor 10 generates addresses or pointers. These are either zero-relative or PC-relative.

The processor 10 usually interfaces with the following three types of (16-bit) memory mapped hardware:

* Flash memory, or other non-volatile memory such as ROM or OTP, used for the program, including vectors, constants, initialisation data and code.

* RAM memory, used for variables, including local variables on the stack and heap and global variables.

* IO registers, used for hardware interfaces to IO devices.

There are two types of load and store instructions that can be used to read from and write to memory:

* Indexed addressing, which uses two registers to form an address (Rx, Ra). Ra can be any of {R0-R7, SP}, and is able to access all the memory from 0 to 0xFFFFFF.

* Displacement addressing, which uses a single register to form an address (offset, Ra), wherein Ra can be any of {R0-R7, SP, PC, 0}, and is able to access all the memory from 0 to 0xFFFFFF.

As mentioned above, there are also two memory regions:

* Low memory = 0 to 0xFFFF.

* High memory = 0x010000 to 0xFFFFFF.

The compiler used for generating program code is accordingly adapted to adopt different address generation strategies as shown in Table 2.

Type              Likely   Near Mode                                     Far Mode
                  Device   Memory  Relative  Instruction                 Memory  Relative  Instruction
                           Range   to?                                   Range   to?
Vector            Flash    All     Zero      /                           All     Zero      /
Init Data         Flash    All     Zero      /                           All     Zero      /
Code (1)          Flash    High    Zero      mov.32.i                    High    PC        mov.32.i
Constant (2)      Flash    Low     Zero      ld.i or (mov.i then ld.r)   All     Zero      ld.i or mov.32.i then (ld.fi or ld.fr)
Global Variable   RAM      Low     Zero      ld.i or (mov.i then ld.r)   All     Zero      ld.i or mov.32.i then (ld.fi or ld.fr)
Stack             RAM      Low     SP        ld.i                        All     SP        ld.i
IO Register       IO       Low     Zero      ld.i                        Low     Zero      ld.i

(1) Window aliases to function table in High Memory.
(2) Window aliases to High Memory.

Table 2: Compiler address generation strategies

The logical addresses generated by the GCC compiler are shown in Table 3.

Section                   Near Mode                      Far Mode
Vectors                   Zero-relative                  Zero-relative
                          (Function table entries
                          accessed through window)
Code                      PC-relative                    PC-relative
Constants                 Zero-relative                  Zero-relative
Global variables & heap   Zero-relative                  Zero-relative
Stack                     SP-relative (initial value     SP-relative (initial value
                          formed zero-relative)          formed zero-relative)
IO registers              Zero-relative                  Zero-relative

Table 3: Logical addresses generated by the compiler

In both modes, global variables and the heap (stored in RAM) have their position defined at link-time, and are accessed with zero-relative addressing. IO registers are also accessed with zero-relative addressing, since their addresses are constant for a given hardware design. Vectors are accessed relative to VP by the hardware, and contain the absolute addresses of the targets.

In Near mode, since 16-bit addresses are used for data pointers, a translation window is used to redirect accesses in low memory up to the zero-relative position of the constants and function table entries in high-memory. Flash memory is usually located in High memory. This will be described in further detail below.

An example instruction set suitable for the processor 10 is provided in Appendix A. The instruction set shown in Appendix A is for a XAP5 processor. The instruction set notation is provided together with the instruction listing in Appendix A.

Further details relating to the vector pointer register (VP), the start value for VP and memory translation windows are now described in detail.

Vector Pointer and the vector pointer start location

The vector pointer is a programmer-visible register, held in a memory location within the processor core 11, that holds the address of the start of a vector table. That is, the vector pointer is loaded with an address value that points to the base of a vector table, which contains the addresses of the event handlers relating to a particular program. The vector pointer is usually visible only to privileged-mode code; user-mode code usually cannot access the vector pointer register.

The vector pointer (VP) is one of the processor core 11 address registers 132, shown in Figure 2. The vector pointer is a 24-bit address register, as shown in Figure 4 corresponding to the 24-bit address space of the processor 10. In the case of a processor having a 32-bit address space, the vector pointer is a 32-bit address register, and in the case of a processor having a 16-bit address space, the vector pointer is a 16-bit address register.

In one example, the executable or program which is to be executed by the processor 10 is loaded into the memory in the form of a single contiguous block, comprising the vectors, constants, initialisation data, and code. In this case, the vector pointer points to the base of the program block, that is, the vector table is located at the base of the program block. In an example, the program is in the form of a firmware block.

It should be noted that the load address of a firmware block is the address of the location in the memory that holds the first byte of that firmware block. The remainder of the firmware block is in the form of a contiguous set of memory locations starting at the load address.

In a processor with a 16-bit address space, the vector pointer would be 16 bits wide, and in a processor with a 32-bit address space, the vector pointer would be 32 bits wide.

The least significant byte of the vector pointer VP[7:0] is set to zero (0b00000000), and the address value that points to the vector table is loaded into VP[23:8]. This restricts the vector table to be 256-byte aligned, and reduces the number of register bits that actually need to be stored. This also means that address generation for reading an entry from the vector table does not require an adder, since the vector table is restricted to be no larger than 256 bytes. It will be appreciated that this alignment restriction has no significant cost implication in modern embedded systems in which memory sizes are typically measured in megabytes.
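The adder-free address generation can be illustrated with a short C sketch. The function names are hypothetical; the sketch relies only on the two properties stated above, namely that VP[7:0] is zero and that the vector table is no larger than 256 bytes, so that a bitwise OR yields the same result as an addition.

```c
#include <stdint.h>

/* VP[7:0] is always zero, so only VP[23:8] needs to be stored. */
static uint32_t vp_from_stored_bits(uint16_t vp_23_8)
{
    return (uint32_t)vp_23_8 << 8;
}

/* Address of a vector table entry. Because the low byte of VP is zero and
 * the offset is below 256, a bitwise OR gives the same result as
 * VP + offset, so no carry chain (adder) is needed. */
static uint32_t vector_entry_address(uint32_t vp, uint8_t offset)
{
    return (vp & 0xFFFF00u) | offset;
}
```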

The vector table contains the addresses of the event handlers for each type of exception and interrupt. In an alternative example, each entry in the vector table would contain a branch instruction capable of branching to a respective event handler.

During a hard reset sequence or a soft reset sequence, the processor is adapted to automatically load the vector pointer from a predefined location in memory called the vector pointer start location (VP_start location). The address of the base of the vector table is accordingly preloaded into the vector pointer start location in memory.

Once the vector pointer has been loaded the processor 10 uses the vector pointer to load the entry in the vector table which provides the address of the appropriate hard or soft reset handler. This handler is then executed by the processor 10. In the event that any further interrupts or exceptions occur, the processor 10 will use the vector pointer to load the address of the appropriate handler from the vector table.

In one example, the VP_start location occupies the last 4 bytes at the top of memory, starting at 0xFFFFFC for a 24-bit address space processor.
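The reset sequence described above can be sketched in C as follows. This is an illustrative software model, not the hardware implementation: the `read32_fn` callback, the function `reset_fetch_handler` and the `demo_read32` memory image (with a firmware block assumed at 0x020000) are all hypothetical, and the hard reset offset of 0x04 is taken from the example vector table (Table 4).

```c
#include <stdint.h>

#define VP_START_LOCATION 0xFFFFFCu  /* last 4 bytes of a 24-bit address space */
#define HARD_RESET_OFFSET 0x04u      /* Exc01 in the example vector table */

/* Hypothetical memory read callback: returns the 32-bit word at addr. */
typedef uint32_t (*read32_fn)(uint32_t addr);

/* Models the reset sequence: load VP from the VP_start location, then read
 * the hard reset vector from the table that VP points to. Returns the
 * vector and stores the loaded VP value in *vp_out. */
static uint32_t reset_fetch_handler(read32_fn read32, uint32_t *vp_out)
{
    uint32_t vp = read32(VP_START_LOCATION) & 0xFFFF00u; /* VP[7:0] is zero */
    *vp_out = vp;
    return read32(vp | HARD_RESET_OFFSET) & 0xFFFFFFu;   /* 24-bit address */
}

/* Demo memory image, for illustration only. */
static uint32_t demo_read32(uint32_t addr)
{
    switch (addr) {
    case 0xFFFFFCu: return 0x020000u;  /* VP_start value */
    case 0x020004u: return 0x020101u;  /* hard reset vector */
    default:        return 0u;
    }
}
```

Overwriting the single word returned for 0xFFFFFC is enough to make the model start from a different vector table after the next reset.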

The processor 10 is provided with instructions for loading the vector pointer register from other registers. The processor is further adapted to load the vector pointer register from an immediate constant in an instruction. This allows the VP value to be changed by a program, which provides flexibility. It is also possible to write an immediate to a general purpose register which can then be moved or copied to the vector pointer register.

Certain system implementations using this processor 10 will overwrite the value in the VP_start location, which will cause the processor to begin executing new firmware upon reset, with the firmware executing in place in Flash memory. Other system implementations will use the value stored in the VP_start location to reset to a boot loader that copies firmware to SRAM, and then changes the vector pointer to point to a vector table in SRAM that is associated with new firmware.

The fact that software is able to change the vector pointer value is also useful during design validation and production testing.

Figure 7 shows the memory addressable by the processor from both a logical (or virtual) point of view and a physical point of view. The vector pointer is also shown schematically in Figure 7, and the VP_start location, containing a VP_start value, is shown at address 0xFFFFFC. As shown, the vector pointer register (referred to as VP reg in Figure 7) is loaded with the VP_start value. The processor accordingly points to the base address of a vector table upon processor start up, or re-start. As described above, this causes the processor to execute the firmware.

As can be seen in Figure 7 the firmware block is a contiguous program block including a vector table, constants, initialisation data and code.

If the VP_start value stored in the VP_start location is overwritten, the VP register will be loaded with a new address after reset, pointing to the base address of a different vector table associated with another firmware block. When the processor re-starts, it will accordingly execute the new firmware block. Thus, an atomic switchover between firmware blocks is provided.


The vector table

An example vector table for a XAP5 microprocessor with 24-bit or 32-bit addressing is shown in Table 4. Each entry contains the address of an event handler.

When using a processor with 32-bit addresses, each entry is 4 bytes, or 32-bits, and contains a 32-bit address.

When using a processor with 24-bit addresses, each entry is 4 bytes, or 32-bits, and contains a 24-bit address. Address[31:24] is written with value 0.

When using a processor with 16-bit addresses, each entry is 2 bytes, or 16-bits, and contains a 16-bit address.

The vector table contains zero-relative addresses. These are the start addresses of the relevant interrupt and/or exception (IE) handlers. The vector table manages interrupts and exceptions. It occupies 256 bytes, from VP to VP+0xFF. The base addresses are:

* Exception base = VP
* Interrupt base = VP + 0x80

In another embodiment, the vector table could also contain VP-relative addresses.

Each table entry is able to store a 32-bit zero-relative event vector. This means that the IE handler can be located at any location addressable by the 24-bit processor 10. When an IE event occurs, the processor 10 performs the following operations:

* performs a 32-bit read from (VP + offset), where the offset is related to the particular event (Event Offset), thereby obtaining the address of the particular event handler from the vector table
* loads the program counter with the event handler address, that is, the processor branches to the address containing the particular event handler

VP-Relative   Event          Event     Source       Destination   Dest                PC to             Event Vector
Offset        No             Type      Mode/State   Mode/State    P[3:0]              Stack
0x00          Exc00          Reserved, to avoid NullPointer error if VP = 0.
0x04          Exc01          Reset     UTSIRN       S             0                   /                 HardReset
0x08          Exc02          Reset     TSIRN        S             0                   /                 SoftReset
0x0C          Exc03          Error     UT           S             ErrPval             next              InstructionError_S
0x10          Exc04          Error     UT           S             ErrPval             next              NullPointer_S
0x14          Exc05          Error     UT           S             ErrPval             next              DivideByZero_S
0x18          Exc06          Error     UT           S             ErrPval             next              UnknownInstruction_S
0x1C          Exc07          Error     UT           S             ErrPval             next              AlignError_S
0x20          Exc08          Error     UT           S             ErrPval             next              MMUDataError_S
0x24          Exc09          Error     UT           S             ErrPval             current           MMUProgError_S
0x28          Exc10          Error     U            S             ErrPval             next              MMUUserDataError_S
0x2C          Exc11          Error     U            S             ErrPval             current           MMUUserProgError_S
0x30          Exc12          Service   UT           T                                 next              SysCall0_T
0x34          Exc13          Service   UT           T                                 next              SysCall1_T
0x38          Exc14          Service   UT           T             =                   next              SysCall2_T
0x3C          Exc15          Service   UT           T                                 next              SysCall3_T
0x40          Exc16          Service   U            T                                 next              SingleStep_T
0x44          Exc17          Service   U            T                                 next or current   Break_T
0x48          Exc18          Error     U            S             ErrPval             next              PrivInstruction_S
0x4C          Exc19          Error     SI           R             ErrPval             next              InstructionError_R
0x50          Exc20          Error     SI           R             ErrPval             next              NullPointer_R
0x54          Exc21          Error     SI           R             ErrPval             next              DivideByZero_R
0x58          Exc22          Error     SI           R             ErrPval             next              UnknownInstruction_R
0x5C          Exc23          Error     SI           R             ErrPval             next              AlignError_R
0x60          Exc24          Error     SI           R             ErrPval             next              MMUDataError_R
0x64          Exc25          Error     SI           R             ErrPval             current           MMUProgError_R
0x68          Exc26          Service   SI           =             =                   next
0x6C          Exc27          Service   SI           =                                 next
0x70          Exc28          Service   SI           =                                 next              SysCall0_SI
0x74          Exc29          Service   SI           =                                 next              SysCall1_SI
0x78          Exc30          Service   SI           =                                 next              SysCall2_SI
0x7C          Exc31          Service   SI           =                                 next              SysCall3_SI
0x80 - 0x8C   Int00 - Int03  NMI       UTSIR        N             irq_priority[3:0]   next or current   Int00 - Int03 = Non Maskable Interrupt
0x90 - 0xFC   Int04 - Int31  MI        UTSI         I             irq_priority[3:0]   next or current   Int04 - Int31 = Maskable Interrupt

Table 4: Example vector table

In Table 4:

U = User mode
T = Trusted mode
S = Supervisor mode
I = Interrupt mode
R = Recovery state
N = NMI state
NMI = Non-maskable interrupt
MI = Maskable interrupt
P[3:0] = Interrupt priority level

In the XAP5 processor example, bit zero of each vector is used to set whether the event handler uses Near or Far mode, that is, bit [0] = 0 uses Near mode and bit [0] = 1 uses Far mode.
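As an illustration, the layout of the example vector table can be expressed in C. The function names below are hypothetical, and the sketch assumes the 4-byte entries used with 24-bit and 32-bit addressing; the base offsets (0x00 for exceptions, 0x80 for interrupts) and the meaning of bit zero are taken from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXCEPTION_BASE 0x00u  /* Exception base = VP */
#define INTERRUPT_BASE 0x80u  /* Interrupt base = VP + 0x80 */

/* VP-relative offset of exception n (Exc00..Exc31), 4 bytes per entry.
 * The event number is masked to the range 0..31. */
static uint32_t exception_offset(unsigned n)
{
    return EXCEPTION_BASE + 4u * (n & 31u);
}

/* VP-relative offset of interrupt n (Int00..Int31), 4 bytes per entry. */
static uint32_t interrupt_offset(unsigned n)
{
    return INTERRUPT_BASE + 4u * (n & 31u);
}

/* Bit 0 of a vector selects the handler's mode: 0 = Near, 1 = Far. */
static bool vector_selects_far_mode(uint32_t vector)
{
    return (vector & 1u) != 0u;
}
```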

Address translation window

As shown in Figure 3, the processor 10 is connected to an MMU 22 which is adapted to translate logical addresses (logical_address[23:0]) requested or operated on by the processor 10 to different physical addresses (address[23:0]) in actual physical memory.

Thus the MMU enables a program block to be linked to a fixed logical address, and then physically stored and executed from anywhere in the memory space. Thus, a new firmware block can be stored in the next free memory space provided in, say, a Flash memory, and executed directly from this memory space. This is particularly advantageous in a case where it is required to update firmware in the field.

The address translation window aliases a single contiguous block of memory from its logical address fixed in the program to its actual location in physical memory. In this way the processor executes instructions as though they were located at the aliased logical address when they are in fact located at a different location in physical memory. The position and size of the memory areas that are aliased, and the location to which they are aliased, are under software control.

The location to which the program is translated is controlled by an offset value, and the set of logical addresses that are translated is controlled by defining a particular address range or window.

In particular, the translation window is configured by using the following three registers:

* Window Low = WL0[23:8]
* Window High = WH0[23:8]
* Window Offset = WO0[23:8]

In one example, these registers are all set to 256-byte resolution. They are all set to zero when the processor 10 starts up or restarts. These registers are memory mapped registers in one example.

Logical addresses which fall within the window are translated; logical addresses which do not fall within the window are not translated.

The window translates logical addresses to physical addresses in accordance with the following logic:

    if ((LogicalAddress >= WL0) && (LogicalAddress <= WH0))
        PhysicalAddress = LogicalAddress + WO0;
    else
        PhysicalAddress = LogicalAddress;

The above logic is implemented in hardware circuitry within the MMU using three 16-bit adders connected in parallel. The time delay resulting from this translation operation is dependent on the time delay associated with one of the adders, which is about 1.7ns in 0.18um CMOS. The addition is computed using 24-bit modular arithmetic for Address[23:0], so that addresses wrap from 0xFFFFFF to 0x000000. This address wrapping enables the offset to move the translated block in either direction, that is, to a higher or a lower physical address.
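A software model of the window logic above might look as follows. This is a sketch under stated assumptions rather than a description of the hardware: the `xlate_window` type and `translate` function are hypothetical, and it is assumed that, because the registers hold only bits [23:8], both the range comparison and the addition operate on address bits [23:8] while bits [7:0] pass through unchanged.

```c
#include <stdint.h>

/* Window registers hold address bits [23:8] (256-byte resolution). */
typedef struct {
    uint16_t low;     /* window low, bits [23:8] */
    uint16_t high;    /* window high, bits [23:8] */
    uint16_t offset;  /* window offset, bits [23:8]; zero means window off */
} xlate_window;

/* Translate a 24-bit logical address to a 24-bit physical address. */
static uint32_t translate(const xlate_window *w, uint32_t logical)
{
    uint16_t page = (uint16_t)((logical >> 8) & 0xFFFFu);

    if (w->offset != 0u && page >= w->low && page <= w->high) {
        /* 16-bit modular add on bits [23:8], which is equivalent to a
         * 24-bit modular add when the offset's low byte is zero. */
        uint16_t phys_page = (uint16_t)(page + w->offset);
        return ((uint32_t)phys_page << 8) | (logical & 0xFFu);
    }
    return logical & 0xFFFFFFu;  /* outside the window: no translation */
}
```

With a window covering logical 0x010000 to 0x01FFFF and an offset of 0x0300 (in units of 256 bytes), logical address 0x010000 maps to physical 0x040000, while 0x020000 is passed through untranslated.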

When WO0 is set to zero the window is considered to be turned off. This means that the window is initially disabled following a hard or soft reset. While the window is off, the logical address passed to the three adders is gated (forced to zero) to reduce power consumption. In this case:

    PhysicalAddress = LogicalAddress

The values to be stored in the WL0 and WH0 registers, and hence the size of the translation window, can be calculated automatically by a program linker at link time. The value to be stored in the WO0 register is set by the program loader, based on the actual location in the physical memory at which the program is stored. The value stored in the WO0 register can also be set by the operating system at run time.
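The calculation a program loader might perform to obtain the window offset can be sketched as below. The function name `window_offset` is hypothetical, and both addresses are assumed to be 256-byte aligned; the subtraction wraps modulo 2^16 on bits [23:8], mirroring the modular addition performed by the window.

```c
#include <stdint.h>

/* Window offset, bits [23:8], that maps a block linked at logical_base
 * onto its physical load address. Both addresses are assumed to be
 * 256-byte aligned. The subtraction wraps modulo 2^16, so the block may
 * be moved either up or down in the physical address space. */
static uint16_t window_offset(uint32_t logical_base, uint32_t physical_base)
{
    return (uint16_t)((physical_base >> 8) - (logical_base >> 8));
}
```

Note that a result of zero corresponds to a block loaded at its link address, which is consistent with a zero offset meaning that the window is off.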

It should be noted that the processor itself, and any debugging tools, always use logical addresses, and the memory devices (RAM, ROM, Flash, and IO registers) are located at fixed physical addresses. All address translation is done in the MMU. (In terms of the physical partitioning, the logical address is output from processor core 11 to the MMU 22.)

If the program run by the processor 10 is a single executable, the vector pointer should point to the bottom of the translation window, that is, have the same value as WL0. In general, the vector pointer is set by the program loader.

With reference now to Figures 8 to 10, certain possible uses of the translation window, in combination with the vector pointer and vector pointer start location, are now described.

The address translation window can be used in the following ways:

a) To translate constants from High memory to Low memory, which is useful when the processor operates in Near mode. (This is shown most clearly in Figure 11.)

b) To translate a whole program block, including vectors, constants, initialisation data, and code, from any physical address in memory to a fixed logical address.

The program is linked based on this fixed logical address. This enables different versions of a program to be stored anywhere in the Flash memory and executed directly from that location. This effectively provides a 'Poor Man's MMU'. As can be seen in Figure 8, a program block or firmware block (referred to as Prog 0) can be relocated from either Flash or RAM to a fixed logical location within the memory. The firmware is linked to the fixed logical address defined by the WL0 and WH0 registers (referred to as WL reg and WH reg in Figure 8). Only the WO0 register needs to be changed depending on the location of the firmware in the physical memory. This translation process can also be used to translate a program, including code, initialisation data, constants and vectors, from High to Low memory for Near mode.

It will be appreciated that the firmware block could similarly be relocated to another location in Flash. Figure 8 also demonstrates how a particular firmware block might be replaced, that is, how one version of a program is replaced by another. As can be seen in Figure 8, Prog 0 can be replaced by Prog 1 simply by changing the value stored in the WO0 register. In this case, the firmware is also linked to a fixed logical address defined by the WL0 and WH0 registers.

In the case where the data processing apparatus 1 is employed in an embedded system using an operating system which handles several different applications, the processor 10 and MMU 22 may be configured to operate as shown in Figure 9. In particular:

* One fixed copy of the operating system is loaded in Flash, which includes the vectors for interrupts and exceptions and the associated handler software. As shown in Figure 9, the VP register points to the base of the operating system program block (referred to as Op Sys in Figure 9). The operating system is linked to this memory location and the translation window is not used to translate any addresses which fall within the operating system program block. The operating system is run in Supervisor mode.

* All applications handled by the operating system are linked to execute from a fixed window position as shown in Figure 9. The applications execute in User mode. The operating system sets up the WL0 and WH0 registers. The operating system also sets up the WO0 register and any required memory protection. The operating system then switches between the applications (referred to as App Prog 1 and App Prog 0 in Figure 9) by using a task scheduler, and altering the value stored in the WO0 register. The applications do not include any vectors (apart from the bra.m function table in Near mode).

It should be noted that as an alternative the vector pointer VP[23:8] may also be fed from the processor 10 to the MMU 22, which can also be used instead of the WL0 register to define the bottom of the program block.

As shown in Figure 11, the translation window, vector pointer and vector pointer start value can be used to configure various memory maps depending on whether the processor operates in Near or Far mode. In particular, the memory map should be configured as follows:

* Fixed on-chip devices (RAM, IO regs) from address 0x000000 upwards. These can be accessed with zero-relative addressing. The translation window is not generally required to translate addresses for these devices.

* Large off-chip memories (Flash, RAM, ROM) from address 0xFFFFFF downwards. These can be translated down to a zero-relative address with the translation window. Large off-chip Flash memories can be used as a file-store (like a disk). The translation window enables such applications to be executed directly from anywhere in this Flash file-store. The physical location does not need to be selected at application link time.

Second address translation window

As shown in Figure 10, the MMU 22 is further adapted to provide a second similar address translation window. This is referred to herein as translation window 1. This translation window is similarly controlled by three registers as described above.

In particular, translation window 1 is configured using the following three registers:

* Window Low = WL1[23:8]
* Window High = WH1[23:8]
* Window Offset = WO1[23:8]

These registers are also memory mapped registers, which can be set to 256-byte resolution. They all reset to zero.

As above, the window translates logical addresses to physical addresses in accordance with the following logic:

    if ((LogicalAddress >= WL1) && (LogicalAddress <= WH1))
        PhysicalAddress = LogicalAddress + WO1;
    else
        PhysicalAddress = LogicalAddress;

Whilst translation window 0 is most often used to relocate a firmware block typically located in Flash, as discussed above, translation window 1 is usually used to relocate the global variables in RAM. This provides a multitasking scheme where a task scheduler is used to perform context switches between a number of processes or application programs. By using the second translation window to relocate the global variables, each task can be compiled independently using a standard logical address range for the global variables, and executed in place without the need for any form of address "fix-ups". It is also possible to use the second translation window to run multiple instances of one application. In this situation only one copy of the program needs to be provided in memory, shared by each running instance, with each instance having its own copy of the global variables, stack and heap.
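The use of two windows together can be modelled with the following C sketch. The type and function names are hypothetical. Two points are assumptions and are not specified in the description: the comparison and addition are taken to operate on address bits [23:8], and window 0 is given precedence if the two windows were ever configured to overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* Window registers, each holding address bits [23:8]. */
typedef struct {
    uint16_t low, high, offset;
} xlate_window;

/* A window with a zero offset is treated as switched off. */
static int in_window(const xlate_window *w, uint16_t page)
{
    return w->offset != 0u && page >= w->low && page <= w->high;
}

/* Window 0 is checked first; this precedence is an assumption. */
static uint32_t translate2(const xlate_window *w0, const xlate_window *w1,
                           uint32_t logical)
{
    uint16_t page = (uint16_t)((logical >> 8) & 0xFFFFu);
    const xlate_window *hit = in_window(w0, page) ? w0
                            : in_window(w1, page) ? w1 : NULL;

    if (hit != NULL)
        page = (uint16_t)(page + hit->offset);  /* wraps modulo 2^16 */
    return ((uint32_t)page << 8) | (logical & 0xFFu);
}
```

In a multitasking arrangement of the kind described, a scheduler would change only the two offset values when switching task, leaving the low and high limits fixed.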

As shown in Figure 10, the vector pointer is used to initiate the execution of the operating system, as described above. The operating system is then responsible for starting the other tasks. The translation windows are then used to translate both the program block (firmware) and global variables of the currently running task.

All applications are linked to execute from the fixed logical address defined by the first window (referred to as Window0 in Figure 10). The global variables are translated using the second translation window (referred to as Window1 in Figure 10), that is, the application will expect the global variables to be at a fixed logical address translated by the second window. The operating system sets up the WL0, WH0, WL1 and WH1 registers.

The operating system then sets up WO0 and memory protection as required before calling the application. The applications themselves do not include any vectors. The operating system then sets up WO1 to point to the desired region of RAM for the global variables for each particular instance of the program application.

In addition to global variables, embedded applications written in C usually store data on the stack and in the heap. The heap is the area of memory that holds objects allocated using the "malloc" command or similar facilities. The data held by each task on its stack, or in the heap, does not generally need to be translated, as this data is accessed through a stack pointer, or pointer variable, and so there is no added cost or inefficiency for each task to be allocated a different area of SRAM for stack and heap data, when the task is started by the operating system.

Initialisation Sequence

When the processor is first started up or resets, the translation windows are disabled since the window offset registers will be reset to zero.

The vector pointer is then loaded with the VP_start value, from the VP_start location, which is the base address of a vector table of an installed firmware block. The processor then executes the hard reset vector handler as described above.

During the early part of the execution of the reset handler, the handler configures the required translation window(s) by writing to the various window registers. The reset handler can then branch to code in the translated aliased copy of the firmware to continue the start-up sequence using logical addresses. The reset handler does not access global variables prior to configuring the translation window(s) and branching to the aliased copy of the firmware.

It is important that the aliased copy of the firmware does not overlap the physical address range used by the reset handler, otherwise the reset handler will stop being accessible part way through its own execution when the overlapping translation window is enabled by writing to the window offset register. However, since the reset handler is very small relative to the available address space on even a 16-bit microprocessor, this constraint does not impose any significant limitation. Typically, this constraint is met by specifying that the aliased copy of the firmware block produced by the translation window should not overlap the physical address range containing the firmware block.
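The non-overlap constraint can be checked with a simple interval test, sketched below in C. The function names are hypothetical, and the sketch assumes that neither range wraps around the top of the address space; all bounds are inclusive.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the inclusive ranges [a_lo, a_hi] and [b_lo, b_hi] share any
 * address. Neither range is assumed to wrap. */
static bool ranges_overlap(uint32_t a_lo, uint32_t a_hi,
                           uint32_t b_lo, uint32_t b_hi)
{
    return a_lo <= b_hi && b_lo <= a_hi;
}

/* Hypothetical loader check: the logical window [win_lo, win_hi] must not
 * overlap the physical range that holds the firmware block, otherwise the
 * reset handler could disappear from view when the window is enabled. */
static bool window_placement_ok(uint32_t win_lo, uint32_t win_hi,
                                uint32_t fw_phys_lo, uint32_t fw_phys_hi)
{
    return !ranges_overlap(win_lo, win_hi, fw_phys_lo, fw_phys_hi);
}
```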

Discussion of certain advantages of various embodiments

The combination of the vector pointer, the vector pointer start value, and the translation window or windows, provides many unexpected advantages.

Flexibility of load address of firmware block

The combination of the vector pointer, VP_start value, and the translation windows enables firmware blocks to be located anywhere within the available memory. The firmware block can be loaded, without the need for address fix-ups or other changes, starting at almost any address in Flash memory. The firmware block may also be loaded into SRAM, during software development, or copied from Flash to SRAM before use, to achieve higher execution speed. This is because SRAM can be clocked faster than Flash.

To reduce implementation costs and power consumption, the start address is required to be 256-byte aligned (which means that the bottom 8 bits of the start address are set to zero). This alignment results in a gap of up to 255 bytes from an adjacent code block in the memory. However, given that modern Flash memories range in sizes from many megabytes to a few gigabytes, this gap is insignificant.
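The alignment rule and the resulting gap can be illustrated with the following C sketch. The names `align_up_256` and `alignment_gap` are hypothetical; the sketch simply rounds the next free address up to a 256-byte boundary and reports how many bytes are left unused.

```c
#include <stdint.h>

#define BLOCK_ALIGN 256u

/* Round an address up to the next 256-byte boundary. */
static uint32_t align_up_256(uint32_t addr)
{
    return (addr + (BLOCK_ALIGN - 1u)) & ~(BLOCK_ALIGN - 1u);
}

/* Bytes left unused between the end of the previous block and the start
 * of the new, aligned block. The result is always in the range 0..255. */
static uint32_t alignment_gap(uint32_t next_free_addr)
{
    return align_up_256(next_free_addr) - next_free_addr;
}
```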

The flexibility of load address may be beneficial when delivering a new firmware block to existing systems in the field. For example, a first device might be supplied with version 1 of the firmware, and a second device might be provided at a later date with version 2 of the firmware, which has grown in size, as is common with software development. There is no reason to load version 1 firmware onto the second device, as it will never be used.

The first device might subsequently be upgraded, in the field, to use the version 2 firmware.

At a later date it might be appropriate to deliver a field upgrade in the form of version 3 firmware to both devices. Version 3 firmware might again have grown in size compared to version 2. It is usually desirable to deliver a single build of the version 3 firmware to both the first and second devices. It is also usually not possible to delete firmware version 2 until version 3 has been successfully loaded and verified (via a signature such as a checksum or CRC).

Using the vector pointer, vector pointer start values and translation windows it is readily possible to load the firmware version 3 into different physical addresses in the memories of each of the devices, without the need for address fix-ups, and then execute-in-place directly from any location in the physical memory in which the new firmware is loaded.

Atomic switch facilitates safe firmware upgrades

During field upgrades of firmware it is desirable to minimise the chance of an embedded system being left in a non-operational state due to a mishap occurring during the upgrade process. Examples of such mishaps might include a loss of power to the system during the upgrade, a failure in a network connection (or other communication interface) that is delivering the upgrade, system corruption due to an alpha particle strike on a memory, cache, register or similar storage node, a static electricity discharge disrupting the system, or a user pressing a reset button at the wrong moment. Such mishaps can lead to a non-operational system if they result in the system stopping part way through the writing of data into Flash which is critical to the restart of the embedded system.

It is desirable to load completely a new firmware block into Flash memory, and then verify its correctness using a computed signature such as a checksum or CRC. During this process the system continues to execute a current version of the firmware block.

Once the signature has been verified as correct a system will update key initialisation data so as to enable a switch-over to the new firmware version when the system next resets. The switch-over is the most vulnerable period during the upgrade process. Thus, if the switch-over requires several values to be written to Flash memory, it is possible that the system might be left in an inconsistent state should a mishap occur during this process. For example, if the switch-over requires the system to write several entries into an exception vector table, and a mishap occurred during this process, the table would be left corrupted, and leave the system in a non-operational state.

In the embodiments of the invention as herein described, switch-over is performed by providing a new VP_start value in the VP_start location. This is, or is very near to, an atomic operation, and hence minimises the time window during which a mishap might occur. In the case of a XAP5 processor this involves writing two words to the VP_start location using a single 32-bit atomic instruction, and in the case of the XAP3 or XAP4 this would involve writing a single word to the VP_start location.
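The verify-then-switch sequence can be sketched in C as follows. This is a simplified model only: the function names are hypothetical, a plain additive checksum stands in for whatever signature (checksum or CRC) is actually used, and the single store to `*vp_start` stands in for the device-specific Flash programming operation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simple additive checksum, standing in for the real signature. */
static uint32_t checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0u;
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}

/* Commit the upgrade only if the new block verifies. The switch-over is
 * the single 32-bit store to *vp_start; nothing else is modified, so a
 * mishap before the store leaves the old firmware selected. */
static bool commit_upgrade(volatile uint32_t *vp_start,
                           const uint8_t *new_block, size_t len,
                           uint32_t expected_sum, uint32_t new_vp)
{
    if (checksum(new_block, len) != expected_sum)
        return false;                 /* old firmware stays selected */
    *vp_start = new_vp & 0xFFFF00u;   /* vector table is 256-byte aligned */
    return true;
}
```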

Advantages of execute in place operation The translation windows enable the processor 10 to execute software in the firmware block directly from Flash memory, with the firmware located at any load address (subject to the load address restrictions discussed earlier). This enables Execute In Place (EIP) operation, which facilitates lower system costs, and lower system power consumption.

The alternative to Execute In Place is to copy the firmware into SRAM and execute the software from the SRAM. This adds cost to the system because it requires additional SRAM to hold this copy of the firmware. The copy operation also consumes energy and takes time. As leakage current is significant in advanced semiconductor devices, it is common to power down embedded systems whenever possible, in order to conserve energy.

It is often necessary to periodically power up an embedded system to perform a brief action such as to transmit a network "keep alive" message, or to power up in response to an I/O device indicating that an action is required. Often the action must be performed within a tight response time. Execute In Place provides a considerable advantage in such a system because the action can be performed almost immediately upon power up.

Having to copy the firmware into SRAM on each power up consumes energy, and might introduce an unacceptable delay before the action is performed.

Added convenience during software development

Embodiments of the invention provide added convenience to the programmer and tester during software development and testing because the firmware can be compiled and linked in the same way, to the same virtual load address, regardless of whether the code will be loaded into SRAM or Flash memory, and regardless of the number of previous versions which have already been released, or installed onto any one device. This means that only a single firmware block needs to be produced, which can then be loaded into any address location in SRAM or Flash.

Furthermore, even when the firmware block will be stored in and executed from Flash memory in the final system, it is advantageous to be able to store the firmware block in the SRAM during software development and testing. Many versions of firmware are developed and tested in rapid succession during debugging, and it is often desirable to speed up the debugging process by running the firmware from SRAM, since this is generally faster than running it from Flash memory.

Comparison to DPVM

Demand paged virtual memory (DPVM) has a number of significant disadvantages in embedded systems:

1. The hardware costs to implement DPVM are significant as such systems must cache page table entries in a translation lookaside buffer (TLB), and store page tables in memory.

2. Accessing the TLB on every memory access increases power consumption considerably.

3. DPVM is complex for the operating system programmer.

4. DPVM introduces uncertainty to the execution time of software because of the difficulty in predicting when TLB misses will occur, with the associated time penalty of additional clock cycles to access page tables.

Benefits of no fix-ups

Many embedded systems use a loader that performs address fix-ups which modify a delivered firmware block to insert the correct addresses for branch targets within the code, and for global variables, depending on the system and specific addresses for each of these.

It is desirable to avoid address fix-ups for a number of reasons.

Firstly, the embedded system must store two versions of the firmware, at least temporarily: the originally delivered firmware block, and the fixed up firmware block.

Secondly, the embedded system must include additional code to perform the fix up.

Thirdly, modifying the firmware block prevents checking for corruption of the firmware block by comparison with a pre-computed signature such as a checksum or CRC, which is a desirable automatic system integrity check.

Finally, information provided in the firmware block to facilitate fix-ups reveals information about the internal structure of the software, which may assist third parties in reverse engineering the software, which might be undesirable in certain circumstances.
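The third point, that a fixed-up block no longer matches its pre-computed signature, can be shown with a small sketch. The four-byte block, the 16-bit address field at offset 0 and the use of CRC-32 are illustrative assumptions; `apply_fixup` is a hypothetical stand-in for a relocating loader.

```python
import zlib


def apply_fixup(block: bytes, offset: int, load_addr: int) -> bytes:
    """Patch a 16-bit little-endian address field by adding the load address."""
    patched = bytearray(block)
    value = int.from_bytes(patched[offset:offset + 2], "little")
    patched[offset:offset + 2] = ((value + load_addr) & 0xFFFF).to_bytes(2, "little")
    return bytes(patched)


delivered = bytes([0x10, 0x00, 0xAA, 0xBB])  # hypothetical block; bytes 0-1 hold an address
signature = zlib.crc32(delivered)            # signature shipped alongside the block

fixed_up = apply_fixup(delivered, 0, 0x4000)

unmodified_ok = zlib.crc32(delivered) == signature  # block executed as delivered
fixed_up_ok = zlib.crc32(fixed_up) == signature     # block after a loader fix-up
```

Here the unmodified block still verifies against its signature, whereas the fixed-up copy does not, so an integrity check would have to be run before the fix-up or against a second stored copy.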

In summary, various embodiments of the invention provide at least some of the following advantages, in particular with respect to upgrades, efficiency and multitasking:

-Allows a complete replacement of the firmware to be delivered in a single contiguous block containing exception and interrupt vectors, constant data, initialization data for global variables, and code.

-Allows the firmware block to be loaded at any address in memory (subject only to a 256 byte alignment restriction for the start of the block). During software development the firmware block may be loaded into RAM, for convenience and speed of loading, or into Flash. In the field, firmware blocks will usually be loaded into Flash memory.

-The delivered firmware block does not need to be modified in any way before it can be executed: there are no address "fixups" or any other modifications of the firmware block required to facilitate the different load addresses.

-Allows the code to be executed directly from Flash memory, which is the expected use case. It also allows the option of copying the firmware block unmodified to SRAM and executing it from the SRAM.

-Allows the code to use the same SRAM locations for global variables, the stack, and other SRAM resident data structures, regardless of the memory locations into which the firmware block is loaded.

-Facilitates a safe firmware upgrade operation by having the contents of a single word of memory control which firmware block to use at reset.

-An extension of this feature allows multi-tasking of several separately delivered firmware blocks by a task scheduler.

-Facilitates a low power implementation since the processor can be maintained most of the time in an off state and can be restarted or powered up and immediately able to execute firmware directly from Flash memory, without the overhead of first having to copy the firmware block to RAM.

It will be understood that the present invention has been described above purely by way of example, and modifications of detail can be made within the scope of the invention.

Each feature disclosed in the description, and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination.

Appendix A: Example Instruction Set for XAP5 Processor

Notation Conventions

In the following sections the following conventions are used to describe various aspects of the instruction set.

* Registers are prefixed by %.

* Rd is the destination register of the instruction.

* Rs and Rt are the primary and secondary registers containing source operands.

* Ra and Rx are registers containing a base address and index for an operand. An offset may also be added to the register contents to form the actual address of the operand. Ra is a byte address for the base. Rx is a scaled address for the index (*4 for 32-bit, *2 for 16-bit, *1 for 8-bit data). Both Ra and Rx are unsigned numbers.

* Immediate[] indicates a constant operand encoded as part of the instruction. In the documentation #immediate[highbit:lowbit]s or #immediate[highbit:lowbit]u is used to distinguish between signed and unsigned immediates. Immediates are prefixed by #. Note, the # is not used for address offsets (e.g. @(0, %r1)).

* Address[] indicates an absolute address. This is a byte-address, unless specified otherwise.

Addresses are prefixed by @. In Near mode, Absolute addresses are interpreted as 16-bit unsigned numbers. In Far mode, Absolute addresses are interpreted as 24-bit unsigned numbers.

* Base addresses stored in Ra should be interpreted as byte addresses. Index offsets stored in Rx should be multiplied by (1, 2, 4) respectively for (8, 16, 32) bit data to get the desired byte address offset (16 bits, 17 bits, 18 bits). In Near mode, Ra is a single 16-bit register. In Far mode, Ra is a 32-bit register pair.

* Offset[] indicates an address offset. This may be relative to the program counter (in branch instructions, for example), or to a pointer read from a register (the stack pointer, for example) or to 0 (i.e. absolute addressing). Offsets can be signed or unsigned numbers (indicated by appended s or u). Offsets are byte-address values, unless specified otherwise. In the documentation offset[highbit:lowbit]s or offset[highbit:lowbit]u is used to distinguish between signed, unsigned and index offsets. In Near mode, an offset should be extended to a 16-bit word before calculating an address. In Far mode, an offset should be extended to a 24-bit word before calculating an address. A signed offset should be sign extended. An unsigned offset should be zero extended.
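The offset-extension and index-scaling conventions above can be sketched as two small helpers. The function names are hypothetical, and wrapping the final address to 16 bits (Near) or 24 bits (Far) is an assumption made for the illustration rather than something stated in the conventions.

```python
def extend_offset(value: int, width: int, signed: bool, far: bool) -> int:
    """Extend a `width`-bit offset field to 16 bits (Near) or 24 bits (Far)."""
    target_bits = 24 if far else 16
    value &= (1 << width) - 1
    if signed and value & (1 << (width - 1)):
        value -= 1 << width  # sign-extend a negative field
    return value & ((1 << target_bits) - 1)


def indexed_address(ra: int, rx: int, data_bits: int, far: bool) -> int:
    """Byte address for an indexed access: base Ra plus Rx scaled by data size."""
    scale = {8: 1, 16: 2, 32: 4}[data_bits]
    mask = 0xFFFFFF if far else 0xFFFF  # assumed wrap-around at the address width
    return (ra + rx * scale) & mask
```

For instance, an 8-bit signed offset of 0xFF extends to 0xFFFF in Near mode, and an index of 3 with 32-bit data adds 12 bytes to the base address.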

Memory: 16 MB, byte addressing, little-endian. All 16 MB available for programs.

Low memory = 0x000000 to 0x00FFFF. Available for Near and Far mode data accesses.

High memory = 0x010000 to 0xFFFFFF. Available for Far mode data accesses.

Reset starts execution by reading VP start value from 0xFFFFFC, then branching to HardReset handler.

Registers: 16-bit: R0-R7, FLAGS, INFO, BRKE. 24-bit: PC, SP0, SP1, VP, BRK0-BRK3.

Flags (16-bit):

Bit:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Flag: P3 P2 P1 P0 S4 S3 S2 S1 S0  I M1 M0  V  C  N  Z

Flags are not updated when Rd = SP. PC, SP, VP imply all 24 bits of each register.

SP is SP0 in Supervisor mode, Interrupt mode, Recovery state and NMI state, SP1 in User mode and Trusted mode.

Rd, Rs, Rt, Ra, Rx imply all 16 bits of each register. Rd = Destination. Rs, Rt = Source. Ra = Address. Rx = Index.

Rad = Destination Address. Ras = Source Address. Rn = Number of bytes.

Possible register values are R0-R7 unless stated otherwise.

The figures for number of cycles assume aligned memory accesses. Add one cycle per unaligned access.

All figures for number of cycles assume 16-bit single cycle memory is used.
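As an illustration of the 16-bit FLAGS register, the sketch below splits a raw value into its fields. It assumes the bit order P3..P0, S4..S0, I, M1..M0, V, C, N, Z from bit 15 down to bit 0, as read from the register summary above; the function name and the dictionary layout are hypothetical.

```python
def decode_flags(flags: int) -> dict:
    """Split a 16-bit FLAGS value into its fields (assumed layout, bit 15 first)."""
    return {
        "P": (flags >> 12) & 0xF,   # P3..P0, bits 15:12
        "S": (flags >> 7) & 0x1F,   # S4..S0, bits 11:7
        "I": (flags >> 6) & 1,      # bit 6
        "M": (flags >> 4) & 3,      # M1..M0, bits 5:4
        "V": (flags >> 3) & 1,      # bit 3
        "C": (flags >> 2) & 1,      # bit 2
        "N": (flags >> 1) & 1,      # bit 1
        "Z": flags & 1,             # bit 0
    }
```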

This example of the XAP5 processor has 190 instructions.

Further details relating to the instruction formats and addressing are provided following the instruction listing.

Mnemonic | Operands | Operation | Flags and cycles (16-bit instr, 32-bit instr, 48-bit instr)

Unconditional

bra.i | (label, 0) | PC = label. Zero-relative. | ----4
 | (label, PC) | PC = label. PC-relative. | ---2 3 -
 | (0, Ra) | PC = {R(a+1), Ra}[23:0]. Ra-relative. | ---2 -
bra.i.2 | (label, PC) | PC = label. PC-relative. | ---2 -
bra.i.4 | (label, PC) | PC = label. PC-relative. | ----3 -
bra.m | @(label, 0) | PC = *(label) | ----6 -
 | @(0, Ra) | PC = *(Ra) | 5 -
bsr.i | (label, 0) | Stack = nextPC; PC = label. Zero-relative. | ----6
 | (label, PC) | Stack = nextPC; PC = label. PC-relative. | ---4 5 -
 | (0, Ra) | Stack = nextPC; PC = {R(a+1), Ra}[23:0]. Ra-relative. | ---4 -
bsr.m | @(label, 0) | Stack = nextPC; PC = *(label) | ----7 -
 | @(0, Ra) | Stack = nextPC; PC = *(Ra) | 6 -

Conditional

bcc (= bge.u) | (label, PC) | if (C == 0) PC = label; else PC = next_instruction; | ---1/2 2/3 -
bcs (= blt.u) | (label, PC) | if (C == 1) PC = label; else PC = next_instruction; | ---1/2 2/3 -
beq | (label, PC) | if (Z == 1) PC = label; else PC = next_instruction; | ---1/2 2/3 -
bez.r | Rs, (label, PC) | Flags set for (cmp.i Rs, #0); if (Rs == 0) PC = label; else PC = next_instruction; | ZNCV 1/2 2/3 -
bge.s | (label, PC) | if (N == V) PC = label; else PC = next_instruction; | ---1/2 2/3 -
bge.u (= bcc) | (label, PC) | if (C == 0) PC = label; else PC = next_instruction; | ---1/2 2/3 -
bgt.s | (label, PC) | if ((N == V) && (Z == 0)) PC = label; else PC = next_instruction; | ---1/2 2/3 -
bgt.u | (label, PC) | if ((C == 0) && (Z == 0)) PC = label; else PC = next_instruction; | ---1/2 2/3 -
ble.s | (label, PC) | if ((N != V) || (Z == 1)) PC = label; else PC = next_instruction; | ---1/2 2/3 -
ble.u | (label, PC) | if ((C == 1) || (Z == 1)) PC = label; else PC = next_instruction; | ---1/2 2/3 -
blt.s | (label, PC) | if (N != V) PC = label; else PC = next_instruction; | ---1/2 2/3 -
blt.u (= bcs) | (label, PC) | if (C == 1) PC = label; else PC = next_instruction; | ---1/2 2/3 -
bmi | (label, PC) | if (N == 1) PC = label; else PC = next_instruction; | ----2/3 -
bne | (label, PC) | if (Z == 0) PC = label; else PC = next_instruction; | ---1/2 2/3 -
bnz.r | Rs, (label, PC) | Flags set for (cmp.i Rs, #0); if (Rs != 0) PC = label; else PC = next_instruction; | ZNCV 1/2 2/3 -
bpl | (label, PC) | if (N == 0) PC = label; else PC = next_instruction; | ----2/3 -
bvc | (label, PC) | if (V == 0) PC = label; else PC = next_instruction; | ----2/3 -
bvs | (label, PC) | if (V == 1) PC = label; else PC = next_instruction; | ----2/3 -

Note: The number of cycles = a / b. a = branch not taken. b = branch taken.

oad (displacement) _________________________________________________ ______ d.8z.i d, @(offset, Ra) d = (uintl6) (* (int8*) (offset + Ra)) iN--2 3 4 ____________ ________________ R0-R7,_PC,_SF,_0 ______ d.i d, @(offset, Ra) d * (intl6*) (offset + Ra) TN--2 3 4 ____________ ________________ R0-R7,_PC,_SP,_0 ______ d.32.i d, @(offset, Ra) (R(d+1), Rd} = * (irlt32*) (offset + Ra) TN--3 4 5 ____________ ________________ =_R0-R7,_PC,_SP,_0 ______ Jote: )ffset range depends on Ra: R0-R7=[15:0]s, PC=[23:0]s, SP=[23:0]u, 0=[23:0]u -oad Far (displacement) ____________________________________________________ ______ d.8z.fi d, @(offset, Ra) d = (uintl6)(*(int8*)(offset[23:0]s + {R(a+1),Ra}[23:0])) TN--2 3 4 = R0-R7 ______ d.fi d, @(offset, Ra) d = * (intl6*) ( offset[23:0]s + (R(a+1), Ra}[23:0]) TN--2 3 4 _____________ _________________ R0-R7 ______ d.32.fi d, @(offset, Ra) {R(d+1),Rd} = *(int32*)(offsetl23:0]s + {R(ai-1),Ra)[23:0]) iN--3 4 5 _____________ _________________ R0-R7 ______ oad (indexed) _________________ _________________________________________________ ______ d.8z.r d, @(Rx, Ra) d = (uintl6) (* (int8*) (Rx + Ra)) 2 3 - _____________ _________________ R0-R7,_SP ______ d.r d, @(Rx, Ra) d * (intl6*) (2*Rx --Ra) iN--2 3 - ____________ ________________ =_R0-R7,_SP ______ d.32.r d, @(Rx, Ra) {R(d+1), Rd} = * (int32*) (4*Rx + Ra) N--3 4 - ____________ ________________ =_R0-R7,_SP ______ oad Far (indexed) _______________________________________________ ______ d.8z.fr d, @(Rx, Ra) d (uintl6) (* (int8*) ( Rx + {R(a+1), Ra} [23:0])) N--2 3 - = R0-R7 ______ d.fr d, @(Rx, Ra) d * (intl6*) (2*Rx + {R(a+1), Ra)[23:0]) iN--2 3 - = R0-R7 ______ d.32.fr d, @(Rx, Ra) {R(d+1), Rd} = * (int32*) (4*Rx + {R(a+1), Ra}[23:0]) iN--3 4 - = R0-R7 ______ Store (displacement) ____________________________________________________ _______ t.8.i s, @(offset, Ra) (int8*) (offset + Ra) = Rs[7:0] ---2 3 4 = R0-R7, #0, #1, #OxFF ____________ ________________ R0-R7,_PC,_SP,_0 ______ Li s, @(offset, 
Ra) (intl6*) (offset + Ra) = Rs ---2 3 4 = R0-R7, #0, #1, #OxFFFF = R0-R7, PC, SF, 0 ______ t.32.i s, @(offset, Ra) (int32*) (offset + Ra) = {R(s+1), Rs} ---3 4 5 = R0-R7, #0, #1 = R0-R7, PC, SF, 0 ______ tote: )ffset range depends on Ra: RO-R7=[15:0]s, PC=(23:OJs, SP=[23:0]u, 0=[23:OJu tore Far (displacement) -43 -t.8.fi s, @(offset, Ra) (int8*) (offsetl23:0]s + IR(a+1),Ra}[23:0]) = Rs[7:0] ---2 3 4 R0-R7, #0, #1, #OxFF _____________ _________________ R0-R7 ______ t.fi.s, @(offset, Ra) (iritl6*) (offset[23:0]s + {R(a+1),Ra}[23:0j) = Rs ---2 3 4 = R0-R7, #0, #1, #OxFFFF = R0-R7 ______ t.32.fi s, @(offset, Ra) (int32*) (offset[23:0]s + {R(a+1),Ra}[23:0J) = {R(s+1),Rs} ---3 4 5 = R0-R7, #0, #1 _____________ _________________ R0-R7 ______ Store (indexed) __________________ ____________________________________________________ _______ t.8.r s, @(Rx, Ra) * (int8*) (Rx + Ra) = Rs[7:0] ---2 3 -R0-R7, #0, #1, #OxFF _____________ _________________ R0-R7,_SP ______ t.r s, @(Rx, Ra) * (intl6*) (2*Rx + Ra) = Rs ---2 3 - = R0-R7, #0, #1, #OxFFFF ____________ ________________ =_R0-R7,_SP ______ t.32.r s, @(Rx, Ra) * (int32*) (4*P + Ra) = {R(s+1), Rs} ---3 4 - = R0-R7, #0, #1 _____________ _________________ =_R0-R7,_SP ______ Store Far (indexed) _________________________________________________ ______ t.8.fr s, @(Rx, Ra) * (int8*) (Rx + {R(a+1), Ra}[23:0] ) = Rs[7:0] ---2 3 - = R0-R7, #0, #1, #OxFF = R0-R7 ______ t.fr s, @(Rx, Ra) * (intl6*) (2*Rx + {R(a+1), Ra}123:0J) Rs ---2 3 - = R0-R7, #0, #1, #OxFFFF = R0-R7 ______ t.32.fr s, @(Rx, Ra) * (int32*) (4* + {R(a+1), Ra}[23:0] ) = {R(s+1), Rs} ---3 4 - = R0-R7, #0, #1 = R0-R7 ______ Swap ______________ _________________________________________ _____ wap.i d, @(0, Ra) wap Register with low memory: *Ra <-> Rd TN---4 - wap.fi d, @(0, Ra) wap Register with high or low memory: N---4 - _____________ __________________ *{R(a+1), Ra}[23:0] <-> Rd _______ flITUY )USh egList, #offset ush registers in RegList to stack. 
---n+1 n+2 - ush.i mmList, #0 ush up to four inimediates to stack. ---nfl n+2 - op egList, #offset op registers in RegList from stack. ---nfl n+2 - op.ret egList, #offset op registers in RegList and PC from stack (i.e return). NCV n+5 n+6 -vote: egList example (push): (%r6-%r3, %rl, %rO} mmList examples (push.i): {#1, #2, #3J {#0x8000J egList example (pop, pop.ret) : (%rO, %rl, %r3-%r6} = number of 16-bit words pushed/popped. * * *

nov.i d, #imrn d = #imm[15:0] iN--1 2 - _____________ d, (label, 0) d = label. Zero-relative. TN--1 2 - nov.r d, Rs d = Rs iN--1 2 - nov.32.i (d, #imm R(d+1), Rd) = #imm[23:0] TN--1 2 - _____________ d, (label, 0) {R(d+1), Rd} = label. Zero-relative. TN--I 2 - -44 -d, (label, PC) {R(d+1), Rd} = data label. PC-relative. N---2 3 _____________ d, !(label, PC) I R(d-*-1), Rd} = code label. PC-relative. iN---2 3 nov.32.r d, Rs IR(d+1), Rd} = (R(s+1), Rs} TN--1 2 - = R0-R7, SP _____________ _________________ R0-R7,_SP ______ irnv.32s.r d, Rs (d+1) = Rs[15], Rd = Rs iN--I - nov.32z.r d, Rs (d+1) =0, Rd Rs ZN--I -

---

dd _______________ ___________________________________________ _____ idd.i JRd, Rs, #imm d = Rs + #imm[15:0]s ZNCV 1 2 - idd.r Jd, Rs, Rt d = Rs + Rt ZNCV 1 -- idd.c.i jd, Rs, #irnm d = Rs + #imm[15:0]s + C ZNCV 1 2 - idd.c.r jd, Rs, Rt d Rs + Rt + C ZNCV 1 --idd.32.i d, Rs, #1mm ** R(d+1), Rd} = {R(s+1), Rs} + #imm[15:0]s ZNCV 1/2* 2/3* idd.32c.i d, Rs, #1mm ** R(d+1), Rd} {R(s+1), Rs} + #imm[15:0]s + C ZNCV -3 -Subtract ___________________ ______________________________________________________ _______ ub.r d, Rs, Rt d = Rs -Rt NCV 1 -- ub.c.r d, Rs, Rt d = Rs -Rt -C]NCV 1 -- ub.x.i d, Rs, #imm d = #imm[15:0]s -Rs JNCV -2 - sub.xc.i d, Rs, #imm d = #imm[15:0]s -Rs -C JNCV -2 - ub.32x.i d, Rs, #imm ** R(d+1), Rd} = #imm[15:0]s -{R(s+1), Rs) JNCV -3 - ub.32xc.i d, Rs, #1mm ** R(d-f-1), Rd} #imm[15:0]s -{R(s+1), Rs} -C NCV -3 -ogica1 ___________________ ______________________________________________________ _______ ind.i d, Rs, #imm d = Rs & #imm[15:0]u _____ 1 2 - rnd.r d, Rs, Rt d = Rs & Rt ZN--1 -- )r.i d, Rs, #imm d = Rs #imm[15:0]u ZN--1 2 - )r.r d, Rs, Rt = Rs I Rt ZN--1 -- cor.i d, Rs, #1mm d = Rs" #imm[15:0]u ZN--1 2 - cor.r d, Rs, Rt d = Rs Rt ZN--1 -- 4ultipIy ________________ _______________________________________________ ______ nult.i d, Rs, #1mm d = (intl6*) (Rs * #imm[15:OIs) ZN---4 - nult.r d, Rs, Rt d = (intl6*) (Rs * Rt) ZN---4 - nult.32s.i d, Rs, #imm {R(d+1), Rd} = Rs * #imm[15:0]s ZN---5 - riult.32s.r d, Rs, Rt {R(d+1), Rd} Rs * Rt ZN---5 - nult.32u.i d, Rs, #imm R(d�l), RdJ Rs * #imm[15:OJu ZN---5 - nult.32u.r d, Rs, Rt {R(d+1), Rd} = Rs * Rt ZN---5 - nult.sh.r d, Rs, Rt, #imm d = (Rs * Rt)>> #imm[4:0]u ZN-V -6 - )ivide and Remainder (signed) ____________________________________________________ _______ div.s.i d, Rs, #1mm d= Rs / #imm[15:0]s ZN-V -20 - div.s.r d, Rs, Rt d = Rs / Rt ZN-V -20 - div.32s.i d, Rs, #imm d = {R(s+1), Rs} / #imm[15:OJs ZN-V -20 - div.32s.r d, Rs, Rt d = I R(s-i-1), Rs} / Rt ZN-V -20 - divrem.s.i d, Rs, #imm d = Rs / 
#imm[15:0]s; R(di-1) = Rs % #imrn[15:0]s --V -20 - livrem.s.r d, Rs, Rt d = Rs / Rt; R(d+1) = Rs % Rt --V -20 - livrem.32s.i d, Rs, #1mm d (R(s+1), Rs) / #imm[15:0]s; R(d+1) = (R(s+1), Rs} % --V -20 - _____________ __________________ imm[15:OIs _______ divrem.32s.r d, Rs, Rt d = {R(s+1), Rs) / Rt; R(d+1) = (R(s+1), RsJ % Rt --V -20 - em.s.i d, Rs, #imm d = Rs % #imm(15:0]s ZN-V -20 - em.s.r d, Rs, Rt d = Rs % Rt ZN-V -20 - em.32s.i IRd, Rs, #1mm}Rd = I R(s+1), Rs} % #imm[15:0]s JZN-V -20 - em.32s.r IRd, Rs, Rt IRd = {R(s+1), Rs} % Rt RN-V -20 -Divide and Remainder (unsigned) _____________________________________________ ______ :liv.u.i d, Rs, #1mm = Rs / #imm[15:0]u iNC--18 - Jiv.u.r d, Rs, Rt = Rs / Rt iNC--18 - div.32u.i d, Rs, #1mm = {R(s+1), Rs} / #imm[15:Oju iNC--18 - div.32u.r d, Rs, Rt = (R(s+1), Rs} / Rt iNC--18 - Iivrem.u.i d, Rs #1mm = Rs / #imm[15:0}u; R(d+1)[15:0J = Rs % #imm[15:Oju -C--18 - livrem.u.r d, Rs, Rt = Rs I Rt; R(d�1) = Rs % Rt -C--18 - livrem.32u.i d, Rs, #1mm = {R(s�1), Rs} / #imm[15:0]u; R(d+1) = IR(s+1), Rs} % -C--18 - _____________ __________________ fimm[15:0}u ______ livrem.32u.r d, Rs, Rt = {R(s+1), Rs} / Rt; R(d+1) = {R(s+1), Rs} % Rt -C--18 - em.u.i d, Rs, #imm = Rs % #imm[15:0]u NC--18 - em.u.r d, Rs, Rt = Rs % Rt ?J'JC--18 - em.32u.i d, Rs, #imm = {R(s+1), Rs} % #imm[15:0]u NC--18 - em.32u.r d, Rs, Rt = {R(s+1), Rs} % Rt NC--18 -Note: f (Rs = SP) Instruction takes one fewer clock cycle (as shown); * Note: 16-bit #imm is sign extended to 32-bits ________________________ k.]i1 :mp.8.i s, #imm s[7:0] -#imm[7:0] NCV 1 2 - :mp.8.r s, Rt s[7:0] -Rt[7:0] NCV 1 -- :mp.8c.i s, #imni s[7:0] -#imm[7:0] -C NCV -2 - :mp.8c.r s, Rt s[7:0] -Rt[7:0J -C NCV 1 -- :mp.8x.i s, #imm imm[7:0J -Rs[7:0} NCV -2 - :mp.8xc.i s, #imm imm[7:0] -Rs[7:0J -C NCV -2 - :mp.i (s, #imm s -#imm[15:O] NCV 1 2 - :mp.r s, Rt s -Rt NCV 1 -- :mp.c.i (s, #imm s -#imm[15:0] -C NCV -2 - :mpc.r s, Rt s -Rt -C NCV 1 -- :mp.x.i s, #imm #imm[15:0] -Rs NCV -2 - :mp.xc.i s, #1mm 
imm[15:0J -Rs -C NCV -2 - :mp.32.i s, #imm ** R(s+1), Rs) -#imm[15:0]s NCV 2 3 - :mp.32.r s, Rt R(s+1), Rs) -{R(t-i-1), Rt} NCV 2 -- :rnp.32c.i s, #imm ** R(s+1), Rs) -#irnm[15:0]s -C NCV -3 - mp.32c.r s, Rt R(s+1), Rs) -IR(t+1), Rt) -C NCV 2 -- :mp.32x.i s, #1mm ** imm[15:0]s -{R(s-i-1), Rs} NCV -3 - :mp.32xc.i s, #imm ** imm[15:0]s -{R(s+1), Rs) -C NCV -3 - * Note: 16-bit #imm is sign extended to 32-bits hiftl.i d, Rs, #1mm s shiftl.32.i, but not including R(s+1) -R(d+1) and ZNC-I -- _____________ __________________ epeated #imm[3:0] times. _______ - hiftl.r d, Rs, Rt ts shiftl.i, but repeated Rt[3:0] times. ZNC-1 -- hiftl.32.i d, Rs, #imm repeat ZNC--2 -R(si-1) Rs #imm[4:o] ___________________ times discarded -{ R(d+1) J Rd k -LJ hiftl.32.r d, Rs, Rt s shiftl,32.i, but repeated Rt[4:Oj times, NC--2 - -46 - hiftr.s.i d, Rs, #1mm s shiftr.32s.i, but not including R(s+1) -R(d+1) and NC-1 -- _____________ _________________ epeated #imm[3:0] times. ______ hiftr.s.r d, Rs, Rt s shiftr.s.i, but repeated Rt[3:0] times. ZNC-1 -- hiftr32s.i d, Rs, #imm #irnml4:O) ZNC--2 -times R(d+1) Rd discarded hiftr.32s.r d, Rs, Rt s shiftr.32s.i, but repeated Rt[4:0] times. NC--2 - hiftr.u.i d, Rs, #imm s shiftr.32u.i, but not including R(s+1) -R(d-i-1) and NC-1 -- _____________ __________________ epeated #imm[3:0] times. _______ hiftr.u.r d, Rs, Rt s shiftr.ui, but repeated Rt[3:0] times. NC-1 -- hiftr.32u.i d, Rs, #imm rnm(4 0] !NC--2 -times o R(d+1) Rd discarded hiftr.32u.r d, Rs, Rt s shiftr.32u.i, but repeated Rt[4:0} times. NC--2 - otatel.i d, Rs, #imm s rotatel.32.i, but not including R(s+1) -R(d+1) and NC--2 - _____________ __________________ epeated #imm[3:0] times. _______ otatel.r d, Rs, Rt s rotatel.i, but repeated Rt[3:0] times. iNC--2 - otatel.32.i d, Rs, #imm R(s+1) Rs J m4'0] NC--2 -times R(d+1) Rd k otatel.32.r d, Rs, Rt ts rotatel.32.i, but repeated Rt[4:0] times. 
NC--2 - !èZIi I:1w 1] J Iear (immediate) _______________________________________________________ _______ lkcp.i a(0, Rad), @(0, Ras), opy #num bytes from memory starting at Ras to ----2n +4 -num nemory starting at Rad.

)lkst.8.i (S. @(0, Ra), #num;tore #num bytes to memory starting at Ra. All bytes ----n + 4/ _____________ _________________ qual Rs[7:0]. Rs = R0-R7, #0, #OxFF. ______ )lkst.i (5, @(0, Ra), #num;tore #num bytes to memory starting at Ra. All byte ----n +4 ______________ __________________)airs equal Rs[15:0]. Rs R0-R7. _______ lear (register) _____________________ _____________________________________________________________ _______ lkcp.r d(0, Rad), @(0, Ras), opy Rn bytes from memory starting at Ras to memory ----2n +4 ______________ __________________;tarting at Rad.

lkst.8.r (s, @(0, Ra), Rn;tore Rn bytes to memory starting at Ra. All bytes equal ----n + 4/ _____________ _________________ s[7:0]. Rs = R0-R7, #0, #OxFF. ______ -n+3* lkst.r (s, @(0, Ra), Rn;tore Rn bytes to memory starting at Ra. All byte pairs ----n + 4 _____________ _________________ qual Rs[15:0]. Rs R0-R7. ______ ar (immediate) _______________________________________________________ _______ )Ikcp.fi (0, Rad), @(0, Ras), opy #num bytes from memory starting at {R(as+1), ----2n + 4 - _____________ num (as) [23:0] to memory starting at (R(ad+1), Rad)[23:0]. ______ )lkst.8.fi (s, @(0, Ra), #num tore #num bytes to memory starting at fR(a+1), ----n + 4/ - ____________ ________________ (a}[23:0J. All bytes equal Rs[7:0}. Rs = R0-R7, #0, #OxFF. ______ - )lkSt.fi (s, @(0, Ra), #num;tore #num bytes to memory starting at {R(a+1), ----n + 4 - _____________ _________________ (a)[23:0]. All byte pairs equal Rs[15:0]. Rs R0-R7. ______ -ar (register) _____________________ _____________________________________________________________ _______ )lkcp.fr (0, Rad), @(0, Ras), Rn bytes from memory starting at (R(as+1), ----2n + 4 - _____________ _________________ (as}[23:0] to memory starting at (R(ad+1), Rad}[23:0]. ______ )lkst.8.fr (s, @(0, Ra), Rn;tore Rn bytes to memory starting at {R(a+1), Ra}[23:0]. ----n + 4/ - ____________ ________________ ll bytes equal Rs[7:0]. Rs = R0-R7, #0, #OxFF. ______ -3* -47 - )lkst.fr s, @(0, Ra), Rn tore Rn bytes to memory starting at IR(a�1), Ra}[23:0]. F----n + 4 - _____________ IA!! byte pairs equal Rs[15:0]. Rs R0-R7. I Jote: n and #num are the number of bytes copied or stored. See C7508-UM-002 for more details.

N is the number of transfers, not the number of bytes.

One fewer clock cycle when using #imm versions, as shown.

TTrIT. --* lip.r d,Rs lipbits;0-l5,l-l4,... TN---2 - iip.8.r d, Rs lip byte bits; 0-7, 8-15; 1-6, 9-14; ... iN---3 - ibs.r d, Rs f (Rs �= 0) Rd = Rs; else Rd = -Rs; TN---2 - nsbit.r d, Rs d = (1 + highest bit to contain a 1 in Rs) TN---2 ---* d.1.i %flags[c], oad selected bit into C flag. -C--3 4 ______________ D(offset[bit], 0) _____________________________________________________________ ________ dand.1.i /oflags[cI, %flags[c], ND C flag with selected bit, then store result in C flag. -C--3 4 ______________ 0(offset[bit], 0]) _____________________________________________________________ ________ dor.1.i /oflags[c], %flags[c], DR C flag with selected bit, then store result in C flag. -C--3 4 ______________ 4(offset[bit], 0) _____________________________________________________________ ________ dxor.1.i of1agsLc], %flags[c], (OR C flag with selected bit, then store result in C flag. -C--3 4 _______________ (offset[bit], 0) ________________________________________________________________ ________ t.1.i mm, ;tore 1-bit immediate to selected bit in bit-enabled 10 -3 4 _____________ (offset[bit], 0) egister. ________ /oflags[c], ;tore C flag to selected bit in bit-enabled 10 register. ----3 4 ______________ 4(offset[bit], 0) _____________________________________________________________ ________ mm, ;tore 1-bit immediate to selected bit in bit-enabled 10 -3 4 ______________ CD(offset[bit], 0) egister. ________ %flags[c], tore C flag to selected bit in bit-enabled JO register. ----3 4 ______________ c(offset[bit], 0) _____________________________________________________________ ________


ext.r jd ign Extend: Set Rd[15:8] = Rd[7] IZN--Ji:..L...:...

System Instructions _______________________________________________ ______ - tie ________________ eturn from Interrupt, Exception NCV 8 -- rk _________________ reak ---1 -- alt _________________ -lalt ---1 -- oftreset erform a Soft Reset ---13 -- sleepnop ________________ leep ---1 -- sleep_sif ___________________ leep and allow SIF ---1 -- op ________________ Jo Operation ---1 -- if ___________________ erform SIF cycle ---7 8 - rint.r s rint Register ----2 - imode ________________ lags and info mode ---1 -- ic d {R(d+1), Rd} = XAP5 licence number ----3 - ier d {R(d+1), Rd} = XAP5 version number ----3 - syscall.i um, #imm ystem Call -Immediate ----13 - yscall.r um, Rs ystem Call -Register ----13 -Flags Register ___________________ ____________________________________________________ ______ nov.1.i f%flags[F], #imm IF flag = #imm[0] [ J 1 --j______________ JF=p3,p2,pl,pO,s4,s3,s2,sl,sO,i,ml,mO,v,c,n,z I _____ nov.1.r [%flags[c], Rs JC flag = Rs[0] k-C-L....... .... -48 -

 | Rd, %flags[F] | Rd[15:1] = 0; Rd[0] = F flag. F = p3, p2, p1, p0, s4, s3, s2, s1, s0, i, m1, m0, v, c, n, z | ZN---2 -
 | %flags[i], %flags[c] | I flag = C flag | -----
mov.2.i | %flags[m], #imm | (M1, M0) flags = #imm[1:0] | -----
mov.2.r | Rd, %flags[m] | Rd[15:2] = 0; Rd[1:0] = (M1, M0) flags | ZN---2 -
mov.4.i | %flags[p], #imm | (P3, P2, P1, P0) flags = #imm[3:0] | ---1 --
mov.4.r | Rd, %flags[p] | Rd[15:4] = 0; Rd[3:0] = (P3, P2, P1, P0) flags | ZN---2 -
xor.1.i | %flags[c], %flags[c], #1 | Toggle C flag. | -C-1 --

Address Registers

mova2r.16 | Rd, As | Rd = As[15:0] | ZN--1
mova2r.32 | Rd, As | R(d+1) = {0x00, As[23:16]}; Rd = As[15:0] | ZN---2 -
movr2a.16 | Ad, Rs | Ad[23:16] = 0; Ad[15:0] = Rs | ----2 -
movr2a.32 | Ad, Rs | Ad[23:16] = R(s+1)[7:0]; Ad[15:0] = Rs | -----2 -

Breakpoint Registers

movb2r.32 | Rd, Bs | R(d+1) = {0x00, Bs[23:16]}; Rd = Bs[15:0] | ZN---2 -
movr2b.32 | Bd, Rs | Bd[23:16] = R(s+1)[7:0]; Bd[15:0] = Rs | ----2 -

Special Registers

movs2r | Rd, Ss | Rd = Ss | ZN---2 -
movr2s | Sd, Rs | Sd = Rs | ----2 -

Instruction Formats

Instructions can take several forms:

* mnemonic label // Conditional and Unconditional Branches
* mnemonic %registers
* mnemonic %registers, #immediates
* mnemonic %registers, @address

The convention is to have a tab between the mnemonic and the first argument. This should not be required by the binutils assembler, but should be generated by the C compiler.

Instructions must explicitly list every argument for each instruction, e.g.:

* ld.i %r2, @(0, %r4) // NOT ld.i %r2, @%r4

Instruction Mnemonics

The mnemonic is a string in one of the following formats:

* Base
* Base.Parameters
* Base.Type
* Base.Parameters.Type
* Base.Parameters.Type.Size

Examples of Base are:

* add
* mov
* ld
* bra

Examples of Parameter are:

* c // with carry
* x // exchange order (reverse order)
* z // zero-extend
* s // signed
* u // unsigned
* 8 // 8-bit operation
* 32 // 32-bit operation

Examples of Type are:

* r // register
* i // immediate
* h // high immediate
* a // absolute address
* p // PC-relative address
* f // far pointer (32-bit base address register, Ra)

Examples of Size are:

* 2 // 2 byte, 16-bit instruction
* 4 // 4 byte, 32-bit instruction
* 6 // 6 byte, 48-bit instruction

Instruction mnemonics do not normally include Size. It is only used when the xap5-gcc compiler wants to force the binutils assembler/linker to use a particular instruction size. This is sometimes needed for branch instructions.
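The mnemonic structure can be illustrated by a small splitter. It is only a sketch: the Type and Size sets are limited to the examples listed in this section, fused parameter fields such as "32s" are kept as a single string, and the rule that Size is recognised only as a final field following a Type is an assumption, as is the function name.

```python
TYPES = {"r", "i", "h", "a", "p", "f"}  # example Type suffixes listed in the text
SIZES = {"2", "4", "6"}                 # instruction size in bytes


def split_mnemonic(mnemonic: str) -> dict:
    """Split a dotted mnemonic into base, parameters, type and size."""
    parts = mnemonic.split(".")
    base, rest = parts[0], parts[1:]
    size = None
    kind = None
    # Assumed rule: Size is only recognised as the last field after a Type field.
    if len(rest) >= 2 and rest[-1] in SIZES and rest[-2] in TYPES:
        size = int(rest.pop())
    if rest and rest[-1] in TYPES:
        kind = rest.pop()
    return {"base": base, "parameters": rest, "type": kind, "size": size}
```

Under these assumptions "add.c.r" splits into base "add", parameter "c" and type "r", while "bra.i.2" yields type "i" with a forced 2-byte size.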

Here are some alternative versions of the add instruction:

* add.r // Add Rt to Rs and store result in Rd
* add.c.r // Add Rt to Rs with carry and store result in Rd
* add.i // Add Immediate to Rs and store result in Rd

Registers

The registers are named in the assembler as follows:

* 8 normal registers are referred to as %r0 to %r7.

* Program Counter is referred to as %pc.

* Stack Pointer is referred to as %sp.

* Vector Pointer is referred to as %vp.

* Zero (when used for zero-relative addressing) is referred to as 0.

There can be several Registers in each instruction. They will be in the following sequence from left to right:

* Rd // Destination Register
* Rs // Primary Source Register
* Rt // Secondary Source Register
* Rx // Index Address Register (scaled by {1, 2, 4} for {8, 16, 32}-bit data)
* Rn // Count Number Register
* Ra // Base Address Register (address in bytes)
* Ras // Source Base Address Register (address in bytes)
* Rad // Destination Base Address Register (address in bytes)

Each Register will be separated by , and prefixed by %.

Some instructions (push, pop, pop.ret) require a list of registers as an argument. The number of registers can vary. Such lists are enclosed in {}, e.g.:

push {%r6, %r3}, #6
pop {%r3, %r6}, #6
pop.ret {%r3, %r6}, #6

Within such register lists, registers can be specified in ranges as follows:

* %r3-%r6 // The first register must be lower than the second in pop and pop.ret.
* %r6-%r3 // The first register must be higher than the second in push.

This enables more compact Assembler code (from the C compiler and Disassembler).

Within the register list, the same register must not be specified more than once:

{ %r3-%r5, %r4} // Not valid because %r4 has been specified twice
{ %r3, %r5-%r6} // This is a valid register list

Register Pairs

Some instructions use 2 adjacent registers grouped together as a single 32-bit register-pair. XAP5 can use any adjacent register pair as a 32-bit register-pair. This assumes little-endian ordering and the register-pair is referenced by the lower register.
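Returning to the register-list rules stated earlier (ranges, the range direction required by push versus pop and pop.ret, and no register named twice), the following is a minimal sketch of how an assembler might expand and check such a list. The function names are hypothetical, and only the 8 normal registers %r0 to %r7 are accepted, which is an assumption about what a register list may contain.

```python
import re
from typing import List

# Assumption: register lists name only the normal registers %r0..%r7.
_REGISTER = re.compile(r"%r([0-7])")


def _register_number(token: str) -> int:
    match = _REGISTER.fullmatch(token.strip())
    if match is None:
        raise ValueError(f"not a normal register: {token!r}")
    return int(match.group(1))


def expand_register_list(mnemonic: str, text: str) -> List[int]:
    """Expand e.g. '{%r3, %r5-%r6}' into [3, 5, 6], enforcing the rules."""
    body = text.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("register list must be enclosed in {}")
    if mnemonic in ("pop", "pop.ret"):
        ascending = True
    elif mnemonic == "push":
        ascending = False
    else:
        raise ValueError(f"{mnemonic!r} does not take a register list")
    registers: List[int] = []
    for item in body[1:-1].split(","):
        item = item.strip()
        if "-" in item:
            first, second = (_register_number(t) for t in item.split("-"))
            # pop/pop.ret need first < second; push needs first > second.
            if ascending and not first < second:
                raise ValueError(f"range {item!r} must ascend in {mnemonic}")
            if not ascending and not first > second:
                raise ValueError(f"range {item!r} must descend in {mnemonic}")
            step = 1 if ascending else -1
            expanded = list(range(first, second + step, step))
        else:
            expanded = [_register_number(item)]
        for number in expanded:
            if number in registers:
                raise ValueError(f"%r{number} specified more than once")
            registers.append(number)
    return registers
```

With this sketch, `{ %r3, %r5-%r6}` expands to registers 3, 5 and 6 for pop, while `{ %r3-%r5, %r4}` is rejected because %r4 appears twice.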

Immediates

There can be several immediates in each instruction. Each Immediate will be separated by ,. Immediates outside () are prefixed by #. For example:

shiftl.i %r4, %r6, #6

Immediate values in the assembly files are treated as 16-bit numbers. It is the responsibility of the assembler to decide whether the immediate value is compatible with the values permitted by the instruction mnemonics.
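A minimal sketch of the first stage of such a check is given here: reducing an immediate literal to a 16-bit value. It assumes that decimal literals are converted to two's-complement form while hex and binary literals are taken as written, and that the accepted decimal range is -32768 to 65535; that range and the name `encode_immediate16` are assumptions, not details taken from the XAP5 tools. The per-instruction range check that would follow is not modelled.

```python
def encode_immediate16(literal: str) -> int:
    """Return the 16-bit pattern for an immediate such as '#-356' or '#0xF234'."""
    text = literal.strip().lstrip("#")
    if text.lower().startswith(("0x", "0b")):
        # Hex and binary literals are taken as written, with no sign extension.
        value = int(text, 0)
        if not 0 <= value <= 0xFFFF:
            raise ValueError(f"immediate wider than 16 bits: {literal}")
        return value
    # Decimal literals may be negative (assumed range: -32768..65535).
    value = int(text, 10)
    if not -0x8000 <= value <= 0xFFFF:
        raise ValueError(f"immediate wider than 16 bits: {literal}")
    return value & 0xFFFF
```

Under these assumptions `#-356` becomes the bit pattern 0xFE9C, and a literal such as `#-100000` is rejected as wider than 16 bits.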

The assembler will sign-extend positive and negative decimal numbers, but not hex or binary. The assembler will then calculate whether the resultant 16-bit number is in a valid range for the Immediate field for that particular instruction, e.g.:

add.i %r1, %r2, #0x1234 // valid positive immediate
add.i %r1, %r2, #35 // valid positive immediate
add.i %r1, %r2, #-356 // valid negative immediate
add.i %r1, %r2, #0xF234 // valid negative immediate
add.i %r1, %r2, #0x111234 // invalid (immediate > 16 bit)
add.i %r1, %r2, #0xABCD1234 // invalid (immediate > 16 bit)
add.i %r1, %r2, #-100000 // invalid (immediate > 16 bit)

Addresses

There can only be one data address in each instruction. This will be prefixed by @.

Address indirection is specified as one of:

@(offset, Ra)   @(-56, %r7)     // Ra = byte address
@(offset, SP)   @(-56, %sp)     // SP-relative addressing
@(offset, PC)   @(125, %pc)     // PC-Relative addressing
@(label, PC)    @(name1, %pc)   // PC-Relative label
@(offset, 0)    @(125, 0)       // Zero-Relative (absolute) addressing
@(label, 0)     @(name1, 0)     // Zero-Relative label
@(Rx, Ra)       @(%r4, %r7)     // Rx = scaled addr, Ra = byte addr
@(Rx, SP)       @(%r4, %sp)     // Rx = scaled addr, SP = byte addr

Code addresses are always resolved as 24-bit byte addresses. They are stored in registers and memory as 32-bit.

The following instructions set 24-bit byte addresses:

* mov.32.i
* bra.*
* bsr.*

Data addresses are resolved as 24-bit when Far pointers are used. They are stored in registers and memory as 32-bit.

Data addresses are resolved as 16-bit when Near pointers are used. They are stored in registers and memory as 16-bit.
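The addressing forms and pointer widths above can be combined into a small effective-address model. This is a sketch under stated assumptions: the function name and parameters are hypothetical, and reducing the result to 24 or 16 bits by masking (so that out-of-range sums wrap) is an assumption about how "resolved as 24-bit" and "resolved as 16-bit" behave. The index scaling of {1, 2, 4} for {8, 16, 32}-bit data is taken from the description of Rx.

```python
# Rx is scaled by {1, 2, 4} for {8, 16, 32}-bit data.
INDEX_SCALE = {8: 1, 16: 2, 32: 4}
FAR_BITS = 24   # Far pointers resolve to 24-bit byte addresses
NEAR_BITS = 16  # Near pointers resolve to 16-bit byte addresses


def effective_address(base: int, offset: int = 0, index: int = 0,
                      data_bits: int = 16, far: bool = True) -> int:
    """Model @(offset, Ra) and @(Rx, Ra): base plus offset plus scaled index."""
    raw = base + offset + index * INDEX_SCALE[data_bits]
    width = FAR_BITS if far else NEAR_BITS
    # Assumption: the address is truncated to the pointer width.
    return raw & ((1 << width) - 1)
```

Zero-relative addressing corresponds to `base=0`, so `@(125, 0)` resolves to byte address 125, and `@(%r4, %r7)` for 32-bit data with R4 = 4 and R7 = 0x2000 resolves to 0x2010.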

Syntax Examples

Here are some valid instructions in XAP5 Assembler:

ld.i %r2, @(1256, %r0) // Load value at specified address into R2
ld.i %r3, @(156, 0) // Load value at (156 + 0) into R3
st.r %r5, @(%r3, %r6) // Store register R5 at specified address
add.r %r4, %r3, %r5 // Add R5 to R3 and store in R4
add.c.r %r7, %r2, %r4 /* Start of Block comment
foo bar
End of Block Comment */
add.r %r6, %r3, %r1 // this add.r is executed
or.i %r3, %r7, #-563 // OR -563 with R7 and store in R3
and.i %r1, %r5, #0xABcd // AND hex ABCD with R5 and store in R1
xor.i %r1, %r5, #0x1230 // XOR 0x1230 with R5 and store in R1
mult.r %r6, %r4, %r2 // 16x16 unsigned multiply of R4 with R2.
                     // 16-bit result stored in R6.
mult.32s.r %r5, %r4, %r2 // 16x16 signed multiply of R4 with R2.
                         // 32-bit result stored in {R6, R5}.
mov.32.i %r1, (0x123456, %pc) // R1 = PC + 0x123456.

Claims

1. A data processing apparatus adapted to operate under control of an executable, the apparatus comprising: a processor, means for addressing an executable stored in a memory; and a memory management unit (MMU) for translating a logical address requested by the processor to a physical address located within the memory, wherein the physical address is computed based on the location of the executable within the memory.
2. A data processing apparatus according to Claim 1, wherein the executable is compiled to be executed at fixed logical address locations which are not dependent on the physical address locations in the memory in which the executable is locatable.
3. A data processing apparatus according to Claims 1 or 2, wherein the executable is in the form of a contiguous block including one or more of the following: code; initialisation data; constants; and vectors.
4. A data processing apparatus according to Claim 3, wherein the code comprises a set of event handlers.
5. A data processing apparatus according to any of the preceding claims, wherein the MMU is further adapted to translate a logical address relating to a data block to a physical address located within the memory.
6. A data processing apparatus according to any of the preceding claims, wherein the MMU is adapted to translate only those logical addresses that fall within a particular address range.
7. A data processing apparatus according to Claim 6, wherein the MMU is adapted to translate a first range of logical addresses to a first range of physical addresses and second and subsequent groups of logical addresses to second and subsequent ranges of physical addresses.

8. A data processing apparatus according to Claims 6 or 7, wherein the MMU is adapted to translate only those logical addresses that fall within a particular address range and otherwise to set the physical address to the logical address.
9. A data processing apparatus according to any of Claims 6 to 8, further comprising means for defining at least one translation window around a particular range of logical addresses.
10. A data processing apparatus according to Claim 9, wherein the size of the window is based on the size of an executable block.
11. A data processing apparatus according to Claims 9 or 10, wherein the window comprises upper and lower address limits.
12. A data processing apparatus according to Claim 11, wherein the upper address limit is less than or equal to a highest logical memory location in which the executable is logically stored in the memory.
13. A data processing apparatus according to Claims 11 or 12, wherein the lower address limit is greater than or equal to a lowest logical memory location in which the executable is logically stored in the memory.
14. A data processing apparatus according to any of Claims 11 to 13, wherein the address limits are computed at link time by a linker.
15. A data processing apparatus according to any of Claims 11 to 14, further comprising pairs of processor memory registers for storing the upper and lower logical address limits for the or each window.
16. A data processing apparatus according to Claim 15, wherein the registers are memory mapped registers.
17. A data processing apparatus according to any of the preceding claims, wherein the MMU is adapted to translate a logical address to a physical address by adding an offset to the logical address, the offset being computed based on the physical location in the memory at which the executable is stored.
18. A data processing apparatus according to Claim 17, further comprising means for storing at least one offset.
19. A data processing apparatus according to Claim 18, further comprising a plurality of processor memory registers for storing the or each offset.
20. A data processing apparatus according to Claim 19, wherein the memory registers are memory mapped registers.
21. A data processing apparatus according to Claims 17 to 20, further comprising means for computing the or each offset.
22. A data processing apparatus according to Claim 21, wherein the computing means forms part of an operating system.
23. A data processing apparatus according to Claim 22, wherein the operating system is adapted to compute the offset at run time.
24. A data processing apparatus according to Claim 21, wherein the computing means comprises an executable event handler, which forms part of the executable.
25. A data processing apparatus according to any of Claims 21 to 23, wherein the computing means is adapted to compute the offset during the initialisation of the executable.

26. A data processing apparatus according to any of Claims 17 to 25, wherein the offset is computed by an executable loader at the time the executable is stored at a particular location in the memory.
27. A data processing apparatus according to any of Claims 17 to 26, wherein the offset is set to zero upon at least one of the following processor events: start up; reset; and initialisation.
28. A data processing apparatus according to any of Claims 17 to 27, wherein the MMU is adapted to add the offset to the logical address using modular arithmetic thereby to enable address wraparound.
29. A data processing apparatus according to any of the preceding claims, wherein the processor is adapted to address at least one of the following types of memory: random access memory (RAM); read only memory (ROM); and Flash memory.
30. A data processing apparatus according to any of the preceding claims, wherein the processor is adapted to address memory external to the data processing apparatus.
31. A data processing apparatus according to any of the preceding claims, wherein the processor is adapted to enable the executable to be stored at any available location within the memory, and executed in place from that location.
32. A data processing apparatus according to any of the preceding claims, further comprising a memory location adapted to point (whether directly or indirectly) to an address of at least one event handler associated with the executable; and means for loading an address value relating to said at least one handler into the memory location.
33. A data processing apparatus adapted to operate under control of an executable, the apparatus comprising a memory location adapted to point (whether directly or indirectly) to an address of at least one event handler associated with the executable; and means for loading an address value relating to said at least one handler into the memory location.
34. A data processing apparatus according to Claims 32 or 33, wherein the executable is in the form of a contiguous block including one or more of the following: code; initialisation data; constants; and vectors.
35. A data processing apparatus according to Claim 34, wherein the code comprises a set of event handlers.
36. A data processing apparatus according to Claim 34, wherein at least one of the event handlers is a start or re-start handler relating to the executable.
37. A data processing apparatus according to any of Claims 32 to 36, wherein the executable is in the form of a firmware block.
38. A data processing apparatus according to any of Claims 32 to 37, wherein the processor is adapted to load automatically an address value to the memory location upon start up or re-start.
39. A data processing apparatus according to any of Claims 32 to 38, wherein the processor is adapted to load an address value from a predefined location in the memory to said memory location.
40. A data processing apparatus according to Claim 39, wherein the processor is adapted to load the value stored at the top of the memory to said memory location.
41. A data processing apparatus according to Claims 39 or 40, wherein the processor is adapted to enable code (which may form part of the executable) to write to said memory location.

42. A data processing apparatus according to Claim 41, wherein the processor is adapted to enable privileged code to write to said memory location.
43. A data processing apparatus according to any of Claims 32 to 42, wherein the apparatus is on start up or restart adapted to use the address value loaded in said memory location to execute an event handler.
44. A data processing apparatus according to any of Claims 32 to 43, wherein the address value directly points to an address of an event handler.
45. A data processing apparatus according to any of Claims 32 to 43, wherein the address value points to a vector table in the executable which includes the addresses of a set of event handlers.
46. A data processing apparatus according to Claim 45, wherein the address value points to a base address of the vector table.
47. A data processing apparatus according to Claims 45 or 46, wherein the vector table includes a set of addresses (vectors) that point to event handlers relating to the executable stored in memory.
48. A data processing apparatus according to any of Claims 45 to 47, wherein the vector table and the executable form part of a single contiguous (firmware) block.
49. A data processing apparatus according to any of Claims 45 to 48, wherein the vector table includes vectors that point to event handlers for handling one or more of the following events: interrupts; exceptions; resets; errors; service requests; and system calls.
50. A data processing apparatus according to any of Claims 45 to 49, wherein the processor comprises means for indexing event handlers using the address value stored in the memory location.
51. A data processing apparatus according to Claim 50, wherein the indexing means is adapted to add an offset to the address value thereby to index a particular event handler.
52. A data processing apparatus according to any of Claims 32 to 51, wherein the memory location comprises a processor memory register.
53. A data processing apparatus according to Claim 52, wherein the size of the register is related to the processor address space.
54. A data processing apparatus according to Claims 52 or 53, wherein selected bits within the register are set to zero.
55. A data processing apparatus according to Claim 54, wherein the least significant bits within the register are set to zero.
56. A method of operating a data processing apparatus according to any of the preceding claims.
57. A method of operating a data processing apparatus, the method comprising: loading an executable block into a particular location in a memory; and translating a logical address requested by the data processing apparatus to a physical address located within the memory, wherein the physical address is computed based on the location of the executable within the memory.
58. A method of operating a data processing apparatus, the method comprising: loading an executable into a memory accessible by the data processing apparatus, the executable comprising at least one event handler associated with the executable; and loading an address value relating to the at least one handler into a predefined memory location.
59. A computer program product comprising software code adapted, when executed on a data processing apparatus, to perform all the steps of the method according to any of Claims 56 to 58.
60. A data processing apparatus substantially as herein described and/or as illustrated with reference to the accompanying drawings.
61. A method of operating a data processing apparatus substantially as herein described and/or as illustrated with reference to the accompanying drawings.
62. A computer program product substantially as herein described and/or as illustrated with reference to the accompanying drawings.
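
By way of illustration only, the windowed address translation recited in Claims 6 to 11, 17 and 28 can be modelled in a few lines. This is a sketch under stated assumptions and forms no part of the claims: the names `Window`, `translate` and `offset_for` are hypothetical, the window limits are taken to be inclusive, and a 24-bit address space is assumed for the modular addition.

```python
from dataclasses import dataclass
from typing import Iterable

ADDRESS_BITS = 24  # assumption: 24-bit byte addresses


@dataclass(frozen=True)
class Window:
    lower: int   # lower logical address limit (assumed inclusive)
    upper: int   # upper logical address limit (assumed inclusive)
    offset: int  # added to logical addresses inside the window


def offset_for(logical_base: int, physical_base: int,
               address_bits: int = ADDRESS_BITS) -> int:
    """Offset that maps the link-time base onto the load-time base."""
    return (physical_base - logical_base) % (1 << address_bits)


def translate(logical: int, windows: Iterable[Window],
              address_bits: int = ADDRESS_BITS) -> int:
    """Translate addresses inside a window; pass all others through."""
    mask = (1 << address_bits) - 1
    for window in windows:
        if window.lower <= logical <= window.upper:
            # Modular addition allows address wraparound.
            return (logical + window.offset) & mask
    # Outside every window the physical address equals the logical address.
    return logical
```

For example, an executable linked at logical addresses 0x001000 to 0x001FFF and stored at physical address 0x042000 would use an offset of 0x041000, so logical address 0x001234 translates to 0x042234, while logical address 0x008000 lies outside the window and is used unchanged. An offset of zero, as on reset, makes the translation an identity mapping.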