WO2001057676A1 - Methods for synthesizing translation tables and related systems

Info

Publication number
WO2001057676A1
WO2001057676A1 PCT/US2001/002560 US0102560W WO0157676A1 WO 2001057676 A1 WO2001057676 A1 WO 2001057676A1 US 0102560 W US0102560 W US 0102560W WO 0157676 A1 WO0157676 A1 WO 0157676A1
Authority
WO
WIPO (PCT)
Prior art keywords
register
level
bits
memory
address
Prior art date
Application number
PCT/US2001/002560
Other languages
English (en)
Inventor
Gregory Allen North
Matthew Richard Perry
Brian Christopher Kircher
Original Assignee
Cirrus Logic, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cirrus Logic, Inc.
Priority to AU2001231165A (published as AU2001231165A1)
Priority to JP2001556457A (published as JP2003521780A)
Priority to EP01903336A (published as EP1256060A1)
Publication of WO2001057676A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/70 Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer
    • G06F21/71 Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G06F12/1009 Address translation using page tables, e.g. page table structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/572 Secure firmware programming, e.g. of basic input output system [BIOS]

Definitions

  • The present invention relates in general to electronic appliances and in particular to circuits, systems and methods for information privatization in personal electronic appliances.
  • Handheld personal electronic appliances have become increasingly popular as new technologies have allowed for the production of affordable devices with a high degree of functionality.
  • One such device is the portable digital audio player, which downloads digital audio data, stores those data in a read-writeable memory, and converts those data into audio on user demand.
  • The digital data are downloaded from a network or retrieved from a fixed medium, such as a compact disk, in one of several forms, including the MPEG Layer 3, AAC, and MS Audio protocols.
  • An audio decoder, supported by appropriate firmware, retrieves the encoded data from memory, applies the corresponding decoding algorithm and converts the decoded data into analog form for driving a headset or other portable speaker system.
  • A system is disclosed which includes a central processing unit operating in response to a set of instructions for processing information.
  • An interface is included which provides access to selected circuitry forming a part of the system by an external device.
  • A set of non-volatile programmable security elements selectively enables and disables the operation of the interface to provide a private environment for processing the information.
  • The principles of the present invention provide, among other things, the ability to privatize information in personal digital appliances. These principles can be implemented in a manner which does not waste processing resources, such as available memory space, which could be more directly used for processing operations. Moreover, these principles can be applied to a wide range of different system configurations that do not depend on where in the appliance the private information is to be stored, whether it be in memory internal or external to the primary processing chip.
  • FIGURE 1A is a high level functional block diagram of an integrated circuit embodying the principles of the present invention
  • FIGURE 1B is a high level diagram of a second system embodying the inventive concepts
  • FIGURE 1C is a third exemplary system to which the present inventive principles can be advantageously applied
  • 1D are two additional;
  • FIGURE 2 depicts integrated circuit 100 in a maximum utilization configuration
  • FIGURE 3 is a high level functional block diagram of the processor depicted in FIGURE 1B;
  • FIGURE 4A depicts the external clock driving a pin EXPCLK when the clock enable signal on pin CLKEN is asserted with the system entering the Standby State
  • FIGURE 4B depicts the external clock driving a pin EXPCLK when the clock enable signal on pin CLKEN is asserted and the system is exiting the Standby State
  • FIGURE 5 is a state diagram illustrating the operation of the state control circuitry of FIGURE 1A
  • FIGURE 6 is a block diagram of the three serial interfaces comprising the serial interface block of FIGURE 1A;
  • FIGURES 7A and 7B are timing diagrams illustrating the operation of the SSI (ADC) in conjunction with selected external devices;
  • FIGURE 8 is a timing diagram illustrating the operation of the Codec interface of FIGURE 6;
  • FIGURE 9 is a functional block diagram showing an interface between the I2S port of the serial interface block of FIGURE 6;
  • FIGURE 10 is a timing diagram illustrating the operation of the I2S interface of FIGURE 9;
  • FIGURE 11 is a functional block diagram illustrating the use of the SSI2 port of FIGURE 6 in a master-slave configuration;
  • FIGURE 12 is a flow chart describing system initialization at power-on reset
  • FIGURE 13 is a flow chart illustrating a procedure for locking private data in TLB
  • FIGURE 14 illustrates a cache lockdown procedure for locking secure code into cache
  • FIGURE 15 is a flow chart in which an emulated cache miss procedure is set forth
  • FIGURE 16A illustrates a preferred method of setting-up synthesized translation tables
  • FIGURE 16B is a flow diagram illustrating a table walk through the synthesized tables of FIGURE 16A; and FIGURES 17A through 17E illustrate a preferred procedure for performing an emulated table walk.
  • The illustrated embodiments are depicted in FIGURES 1A through 17E of the drawings, in which like numbers designate like parts.
  • FIGURE 1A is a high level functional block diagram of an integrated circuit 100 embodying the principles of the present invention.
  • Integrated circuit 100 could be, for example, a Cirrus Logic EPxx integrated circuit.
  • Integrated circuit 100 can advantageously be utilized in a number of consumer and industrial handheld information appliances, including personal digital assistants, electronics organizers, and two-way pagers, among other things.
  • Integrated circuit 100 can be configured to perform audio processing in battery powered internet audio decoders.
  • Two additional exemplary systems to which the present inventive principles can be advantageously applied are shown in FIGURES 1B and 1C and will be discussed further below.
  • FIGURE 2 depicts integrated circuit 100 in a system configuration, and will be referenced during the discussion of the input/output signals (ports) of the various functional blocks of integrated circuit 100.
  • Integrated circuit 100 is built around an ARM720T processor 101 as described in the ARM720T data sheet available from ARM, Ltd., Cambridge,
  • Processor 101 includes a central processing unit (CPU) core 102, 8-kilobyte cache 103, memory management unit (MMU) 104 and write buffer 105, each of which will be described in further detail below. It should be noted that in alternate embodiments, an ARM920 processor may also be used.
  • CPU 102 is a 32-bit microprocessor based on a reduced instruction set computer (RISC) architecture.
  • The associated 8-kilobyte cache 103 is a mixed instruction and data cache (IDC) and is organized as a four-way set-associative cache of 512 lines of 16 bytes (4 words).
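The cache organization stated above fixes the number of sets and the width of the address fields. The following is a minimal sketch of that arithmetic; the function names are hypothetical, and the conventional tag/index/offset split of the address is an assumption for illustration rather than something the text specifies.

```python
CACHE_BYTES = 8 * 1024   # 8-kilobyte cache 103
LINE_BYTES = 16          # 16 bytes (4 words) per line
WAYS = 4                 # four-way set-associative


def cache_geometry(cache_bytes=CACHE_BYTES, line_bytes=LINE_BYTES, ways=WAYS):
    """Derive line count, set count and address field widths."""
    lines = cache_bytes // line_bytes
    sets = lines // ways
    return {
        "lines": lines,
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # log2 of a power of two
        "index_bits": sets.bit_length() - 1,
    }


def split_address(addr, geom=None):
    """Split a 32-bit address into (tag, set index, byte offset).

    Assumption: the low bits select the byte in the line and the next
    bits select the set, as in a textbook set-associative cache.
    """
    geom = geom or cache_geometry()
    off_bits, idx_bits = geom["offset_bits"], geom["index_bits"]
    offset = addr & ((1 << off_bits) - 1)
    index = (addr >> off_bits) & ((1 << idx_bits) - 1)
    tag = addr >> (off_bits + idx_bits)
    return tag, index, offset
```

With the figures given in the text, the sketch yields 512 lines grouped into 128 sets, a 4-bit byte offset and a 7-bit set index.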
  • MMU 104 includes a translation lookaside buffer (TLB), access control logic and translation-table-walking logic.
  • The primary functions of MMU 104 are the translation of virtual addresses into physical addresses and the control of accesses to memory. It also supports a conventional two-level page-table structure.
  • The TLB caches 64 translated entries and provides the translation to the associated access control logic. If a virtual address causes a hit to a translated entry in the TLB, the access control logic determines whether the access is permitted. In the case of a permitted access, MMU 104 outputs the corresponding physical address from the TLB cache. Otherwise, if the access is not permitted, MMU 104 signals CPU 102 to execute an abort.
  • On a TLB miss, the translation-table-walking circuitry retrieves the necessary translation information from a translation table in physical memory. This translation information is written into the TLB cache at a replacement point or entry. The access control logic can then determine whether or not the access is allowed.
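The hit, miss and abort behavior described above can be sketched in a few lines. This is an illustrative model only, not the ARM720T implementation: the class and exception names are hypothetical, the translation table is flattened into a dictionary instead of a two-level structure, pages are assumed to be 4 KB, permissions are reduced to a single writable flag, and the replacement policy is assumed to be first-in first-out.

```python
from collections import OrderedDict


class AccessAbort(Exception):
    """Stands in for the abort that the MMU signals to the CPU."""


class TinyMMU:
    def __init__(self, translation_table, tlb_entries=64, page_bits=12):
        # translation_table: virtual page -> (physical page, writable)
        self.table = translation_table
        self.tlb = OrderedDict()
        self.tlb_entries = tlb_entries
        self.page_bits = page_bits
        self.walks = 0  # number of table walks performed

    def translate(self, vaddr, write=False):
        vpage = vaddr >> self.page_bits
        offset = vaddr & ((1 << self.page_bits) - 1)
        entry = self.tlb.get(vpage)
        if entry is None:
            # TLB miss: fetch the translation from the table in memory.
            if vpage not in self.table:
                raise AccessAbort("no translation for page %#x" % vpage)
            entry = self.table[vpage]
            if len(self.tlb) >= self.tlb_entries:
                self.tlb.popitem(last=False)  # assumed FIFO replacement
            self.tlb[vpage] = entry
            self.walks += 1
        ppage, writable = entry
        # Access control is checked on both hits and refilled misses.
        if write and not writable:
            raise AccessAbort("write to read-only page %#x" % vpage)
        return (ppage << self.page_bits) | offset
```

A second access to the same page hits in the TLB and does not trigger another table walk, which is the point of caching the translated entries.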
  • Write buffer 105 is used to buffer up to eight words of data and four independent addresses. When enabled, CPU 102 writes data or an instruction into write buffer 105 using an external clock and then returns to instruction execution. Write buffer 105 can then, in parallel, write data onto internal data bus 106 and addresses onto internal address bus 107.
  • An on-chip phase locked loop (PLL) 108 driven by a 3.6864 MHz crystal
  • The primary (CPU) clock can be programmed to either 18.432 MHz, 36.864 MHz, 49.152 MHz or 73.728 MHz.
  • PLL 108 preferably runs at twice the highest possible CPU clock frequency (147.456 MHz).
  • Internal data bus 106 and internal address bus 107 are also clocked at approximately 36 MHz.
  • Processor 101 runs at the higher clock rate, with internal data bus 106 and internal address bus 107 being clocked at the 36 MHz rate.
  • The CPU clock frequency is selected by programming a two-bit register field in the system control register SYSCON3.
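The clock figures quoted above are related by whole-number ratios, which the following sketch checks with exact rational arithmetic. Reading these ratios as a PLL multiplier and a set of clock dividers is an inference made here for illustration; the text does not say how the hardware actually derives the clocks, and the function names are hypothetical.

```python
from fractions import Fraction

CRYSTAL_MHZ = Fraction(36864, 10000)    # 3.6864 MHz crystal
PLL_MHZ = Fraction(147456, 1000)        # 147.456 MHz PLL output
CPU_CLOCKS_MHZ = [
    Fraction(18432, 1000),              # 18.432 MHz
    Fraction(36864, 1000),              # 36.864 MHz
    Fraction(49152, 1000),              # 49.152 MHz
    Fraction(73728, 1000),              # 73.728 MHz
]


def pll_multiplier():
    """Ratio of the PLL output to the crystal frequency."""
    return PLL_MHZ / CRYSTAL_MHZ


def cpu_dividers():
    """Ratio of the PLL output to each selectable CPU clock."""
    return [PLL_MHZ / f for f in CPU_CLOCKS_MHZ]
```

Four selectable frequencies are consistent with the two-bit SYSCON3 field mentioned in the text; the mapping of field values to frequencies is not given there and is not assumed here.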
  • A list of registers internal to integrated circuit 100 is provided as Table 1; a complete description of those registers is found in the Cirrus Logic EP7211 Preliminary Data Sheet, incorporated herein by reference.
  • Integrated circuit 100 also includes an external clock input which allows for the input of an external 13 MHz clock for driving substantially all of the on-chip circuitry in a second clocking mode.
  • The external clock drives a pin EXPCLK when the clock enable signal on pin CLKEN is asserted, as shown in FIGURES 4A and 4B, where FIGURE 4A shows integrated circuit 100 entering the Standby State and FIGURE 4B shows it exiting the Standby State. (The Standby State is discussed further below.)
  • Oscillator 110 is used to generate a 1 hertz clock which is used to drive a
  • RTC 112 can be written to or read from and includes a 32-bit output match register which allows the issuance of an interrupt when the time in the RTC matches a predetermined specific time.
  • RTC 112 is also used to drive a programmable LED flasher (not shown).
  • Integrated circuit 100 includes a pair of on-chip timer counters 113.
  • Each timer counter is independent and includes a 16-bit readable-writeable data register. The given counter is loaded to a desired value and then decrements in response to a preselected clock. When the timer counter underflows (i.e., reaches zero) the appropriate interrupt is generated.
  • The timer counter registers can be read at any time. The clock frequency of these timers can be selected by writing to corresponding bits in the system control registers SYSCON. For example, when PLL 108 is sourcing the internal clocks, 512 kHz and 2 kHz rates are available to timer counters 113.
  • Each timer counter 113 can operate in either a free running mode or a prescale mode by setting or clearing bits in system control register SYSCON1.
  • In the free running mode, the given counter wraps around to 0xFFFF when it underflows (i.e., reaches zero) and continues to count down.
  • In the prescale mode, the value written into the given timer counter is automatically reloaded when the counter underflows.
  • The prescale mode can be used to produce a programmable frequency, drive a buzzer, or generate a periodic interrupt.
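The difference between the two timer modes can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the hardware: the class name is hypothetical, and the exact clock edge on which the interrupt fires and the reload takes effect is assumed for illustration.

```python
class TimerCounter:
    """Behavioral sketch of one 16-bit down counter."""

    def __init__(self, prescale=False):
        self.prescale = prescale   # False selects free running mode
        self.reload = 0
        self.value = 0
        self.interrupts = 0

    def load(self, value):
        """Write the 16-bit data register."""
        self.reload = value & 0xFFFF
        self.value = self.reload

    def tick(self):
        """Advance by one period of the selected clock."""
        if self.value == 0:
            # After an underflow: reload in prescale mode,
            # wrap to 0xFFFF in free running mode.
            self.value = self.reload if self.prescale else 0xFFFF
        else:
            self.value -= 1
            if self.value == 0:
                self.interrupts += 1  # underflow interrupt
```

In this model a counter loaded with 3 in prescale mode interrupts once every four ticks, which is how the prescale mode yields a programmable frequency or a periodic interrupt.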
  • State control circuitry 114 allows integrated circuit 100 to be set to either an Operating, Idle, or Standby state.
  • A state diagram illustrating the operation of state control circuitry 114 is shown in FIGURE 5.
  • The Operating state is the normal program execution state and all clocks and peripheral logic are enabled.
  • The Idle state is similar to the Operating state with the exception that the CPU clock is halted pending an interrupt or wake-up to return it back to the Operating state.
  • In the Standby state, PLL 108 is shut down, although crystal 111, oscillator 110 and RTC circuitry 112 remain active.
  • The external address and data buses are also forced low in the Standby state to prevent any powered down peripherals from draining current. It should be noted that integrated circuit 100, when first powered or during a cold reset, is forced into the Standby state, which can only be left by an external wake-up prompt.
  • Power management is also effectuated through power management control block 115.
  • The state of the various functional blocks of integrated circuit 100 in each state is tabulated in TABLE 12.
  • Power management circuitry 115 forces integrated circuit 100 into the Standby mode when an active low power failure signal PWRFL is received from an external power supply unit 201. If integrated circuit 100 is being driven by an external DC power source 202, the external power sense input signal EXTPWR is driven active low. If a battery 203 is being used, an active high on the BATOK pin indicates that the main battery is OK. The falling edge of this signal generates an FIQ (fast interrupt request) while a low level signal on this pin in the Standby state inhibits system start up. The new battery sense signal BATCHG indicates that a new battery is required; an active low at this input occurs if the battery voltage falls below a "no battery" threshold.
  • The battery powering integrated circuit 100 could be, for example, one or more standard AA batteries widely available to retail consumers.
  • When unexpected events arise during execution of a program (e.g., an interrupt or a memory fault), an exception is usually generated.
  • Interrupt controller 116, operating on a fixed priority system, determines the order in which the exceptions are serviced.
  • Integrated circuit 100 operates on two interrupt types, namely the interrupt request (IRQ) and the fast interrupt request (FIQ). FIQs have a higher priority than IRQs.
  • TABLES 2A-2C set out preferred interrupt allocation, wherein INTMR1 and INTSR1 are respectively the First Interrupt Mask Register and First Interrupt Status Register, INTMR2 and INTSR2 are respectively the Second Interrupt Mask Register and Second Interrupt Status Register, and INTMR3 and INTSR3 are the Third Interrupt Mask Register and Third Interrupt Status Register. It should be noted that if two interrupts are received from within the same group (IRQ or FIQ), the order in which they are serviced is preferably resolved in software.
  • Interrupt controller 116 operates as follows. An external or internal interrupting device asserts the appropriate interrupt. If the appropriate bit is set in the corresponding Interrupt Mask Register, then either an FIQ or IRQ is asserted by interrupt controller 116. If the interrupts are enabled, processor 101 jumps to the appropriate address. Interrupt dispatch software then reads the corresponding Interrupt Status Register to establish the source of the interrupt and calls the appropriate interrupt service routine software, which then clears the interrupt source through some action specific to the interrupting device. The interrupt service routine may then re-enable interrupts, and any other pending interrupts are similarly serviced. All other external interrupt sources are held active until the corresponding service routine starts executing.
  • Processor 101 checks for a low level on its FIQ and IRQ inputs after each instruction is executed. Hence, there is an interrupt latency directly related to the amount of time it takes to complete the current instruction after an interrupt condition is first detected.
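The masking and ordering rules described above can be sketched as a single function. The bit assignments in TABLES 2A-2C are not reproduced in this text, so the `fiq_bits` argument, the function name and the lowest-bit-first ordering within a group are all hypothetical choices made for illustration; the text only states that FIQs outrank IRQs and that ordering within a group is resolved in software.

```python
def service_order(status, mask, fiq_bits):
    """Return pending source bit numbers in the order they are serviced.

    status   -- value read from an Interrupt Status Register
    mask     -- value of the matching Interrupt Mask Register (1 = enabled)
    fiq_bits -- hypothetical bit mask marking which sources raise an FIQ
    """
    pending = status & mask          # masked-off sources never reach the CPU
    fiq = pending & fiq_bits         # fast interrupt requests
    irq = pending & ~fiq_bits        # ordinary interrupt requests

    def bits(word):
        # Software-chosen order within a group: lowest bit first (assumption).
        return [b for b in range(word.bit_length()) if word & (1 << b)]

    return bits(fiq) + bits(irq)     # FIQs have priority over IRQs
```

A dispatch routine built on this would call the service routine for each returned source in turn, with each routine clearing its own interrupt source.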
  • The latency will depend on whether the system clock is shut down and whether a control bit FASTWAKE in the system control registers is set. As indicated above, PLL 108 is always shut down in the Standby state. If the FASTWAKE bit is cleared, the latency is between 0.125 seconds and 0.25 seconds. If this bit is set, however, then the latency will be between 250 microseconds and 500 microseconds.
  • If an external clock is used and disabled during Standby, the latency may also be between 0.125 seconds and 0.25 seconds to allow for oscillator stabilization. If the external clock is not disabled, the latency can be reduced to a few microseconds.
  • An interrupt can also cause integrated circuit 100 to leave the Idle state. In this case the CPU clock must be restarted and, additionally, interrupt servicing may be delayed for instruction execution as described above.
  • An on-chip boot ROM 117 is provided which maintains a set of instructions for initializing integrated circuit 100. On-chip boot ROM 117 also configures UART1, discussed further below, to receive 2048 bytes of serial data which are downloaded into on-chip SRAM 118.
  • The processor can continue executing instructions by jumping to the start of the SRAM.
  • This configuration allows code to be downloaded to program a system flash memory during the manufacture of a device employing integrated circuit 100.
  • The user may select between booting from on-chip ROM 117 or from an external memory connected to port CS[0]. Specifically, if the signal at pin MEDCHG is low, boot is from on-chip ROM 117, while a high signal applied to this pin requires that boot be performed from the external memory.
  • The effect of booting from the on-chip boot ROM is a reversal of the decoding of all chip select signals internally. This feature is illustrated in TABLE 5A, with the normal, unreversed chip select decoding illustrated in TABLE 5B. Additionally, boot can be accomplished from external memory, with the width of the boot device selectable in accordance with TABLE 4.
  • The ARM720T processor has a 4 Gbyte address space.
  • Integrated circuit 100 uses the lower 2 Gbytes of the address space for ROM/RAM/Flash and expansion space. Another 0.5 Gbyte is used for DRAM and the remaining 1.5 Gbytes, less 8K for internal registers, is unused.
  • A memory and I/O expansion interface supports six separate linear memory or expansion segments to external expansion memory 204.
  • Two additional segments are dedicated to the on-chip SRAM and ROM.
  • Each segment is 256 megabytes in size. Any of the six segments can be used to support a conventional SRAM interface.
  • Each segment can be individually programmed to be 8, 16 or 32 bits wide, to support page mode accesses, and to execute from one to eight wait states for nonsequential addresses and zero to three for burst mode accesses.
  • The zero wait state sequential feature allows integrated circuit 100 to interface with burst mode ROMs. It should be noted that the on-chip ROM space is fully decoded while the complete SRAM address space is fully decoded only up to the maximum size of the video frame buffer used to drive an external LCD (up to 128 kBytes).
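The address space partition described above can be sketched as a decode of the top four address bits. This is an illustration under assumptions: the function name is hypothetical, the eight 256-megabyte segments are assumed to tile the lower 2 Gbytes in order from address zero, and the DRAM banks are placed using the "0xC + bank number" convention that the text gives for the DRAM mapping tables, read here as the top hexadecimal digit of the address. Which segment number corresponds to which chip select, and where the internal registers sit, is not stated in the text and is not modeled.

```python
SEGMENT_BITS = 28                      # 2**28 bytes = 256 megabytes
SEGMENT_MASK = (1 << SEGMENT_BITS) - 1


def decode(addr):
    """Classify a 32-bit address as (region, number, offset)."""
    if not 0 <= addr < (1 << 32):
        raise ValueError("address outside the 4 Gbyte space")
    top = addr >> SEGMENT_BITS         # top hexadecimal digit
    offset = addr & SEGMENT_MASK
    if top < 8:
        # Lower 2 Gbytes: six expansion segments plus two on-chip ones.
        return ("segment", top, offset)
    if top in (0xC, 0xD):
        # 0.5 Gbyte of DRAM: top digit is 0xC + bank number.
        return ("dram", top - 0xC, offset)
    return ("other", top, offset)      # unused space or internal registers
```

Eight segments of 256 megabytes account for exactly the lower 2 Gbytes, and two 256-megabyte banks account for the 0.5 Gbyte of DRAM, which matches the totals given in the text.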
  • Two of the expansion segments can be reserved for establishing an interface with two PC cards 205 using the chip select signals NCS4 and NCS5.
  • Interface with the external PC cards is preferably made through Cirrus Logic CL-PS6700 PC card slot drivers 206.
  • The memory is segmented to allow different types of accesses to take place (i.e., attribute, I/O, and common memory space).
  • The EXPCLK port to expansion control block 119 outputs an expansion clock which is equal to the CPU clock in the 13 MHz and 18 MHz modes, and has a rate of 36.864 MHz when integrated circuit 100 is operating in the 36, 49, or 70 MHz modes.
  • (The EXPCLK port is used as the clock input in the 13 MHz mode discussed above.)
  • The EXPRDY pin (Expansion Port Ready) is driven low by the external expansion devices to extend the bus cycle and insert wait states.
  • The chip select signals CS[0:3] are used for SRAM expansions while chip select signals CS[4:5] can be used for either memory expansion or PC card selection.
  • The write strobe WRITE is low during reads from, and high during writes to, the expansion devices.
  • The word/halfword bits (2) indicate to the external devices during writes from integrated circuit 100 whether the access size is in words, halfwords or bytes.
  • DRAM controller 120 provides a programmable 16-bit or 32-bit wide interface to up to two banks 207 of DRAM, with each bank having a storage capacity of up to 256 Mbytes.
  • The DRAM banks can be any of a number of types of DRAMs available in the marketplace, including conventional DRAM, synchronous DRAM (SDRAM), extended data out DRAM (EDODRAM), fast page mode DRAM, and double data rate DRAM (DDRDRAM).
  • These DRAMs can be of the self-refresh type which are placed in a low power state when integrated circuit 100 enters the Standby state discussed above.
  • Two row address strobes RAS[0:1] can be generated along with four column address strobes CAS[0:3].
  • the output enable signal MOE is used for either the DRAM,
  • DRAM controller 120 includes a programmable refresh counter, with the refresh period being controlled using the refresh period register (DRFPR).
  • TABLES 7 and 8 illustrate DRAM address mappings for 32- and 16-bit DRAM memory systems.
  • The 32-bit system is assumed to be based on two x16 devices connected to each RAS line with 32-bit DRAM operations selected.
  • The mapping is repeated for every 256 Mbytes in each bank.
  • The placeholder "n" in these tables is equal to 0xC + bank number.
  • The 16/32-bit DRAM selection is programmed by setting a bit in system control register SYSCON2.
  • Flash interface 121 allows integrated circuit 100 to interface with flash memory, using the chip select signal CS[0:1] described above.
  • LCD controller 122 provides all the necessary control signals to allow integrated circuit 100 to interface directly to a single panel multiplexed LCD module 209.
  • The total frame buffer size is programmable up to 128 KBytes, using both on and off chip memory.
  • A system can be built using no external DRAM, with on-chip SRAM 118 used as the LCD video frame buffer, as described above.
  • The screen is preferably mapped to the video frame buffer.
  • LCD direct memory access (DMA) engine 123 is provided for fetching display data for LCD controller 122 from frame buffer memory.
  • The pixel bit rate, and hence the LCD refresh rate, can be programmed from 18.432 MHz to 576 kHz when operating in the 18.432-73.728 MHz modes, or 13 MHz to 203 kHz when operating from a 13 MHz clock.
  • Integrated circuit 100 includes a pair of universal asynchronous receive-transmit (UART) interfaces 124 and 125. These asynchronous ports can be used, for example, to communicate with a pair of RS-232 transceivers 210.
  • Each UART 124/125 can support data rates of up to 115.2 Kbits per second, when integrated circuit 100 is operating from clocks generated by PLL 108.
  • The UART bit rates that can be generated include 9.6 Kbps, 19.2 Kbps, 38 Kbps, 58 Kbps and 115.2 Kbps.
  • Both UARTs 124/125 include a 16-byte transmit FIFO driving a corresponding transmit (TX) pin and a 16-byte receive FIFO for receiving data from a dedicated receive (RX) pin.
  • An RX interrupt is asserted when a given RX FIFO becomes one-half full or if that FIFO is non-empty for longer than three character length times with no more characters being received.
  • A TX interrupt is asserted whenever the given TX FIFO buffer reaches one-half empty.
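The FIFO interrupt conditions above reduce to two small predicates. The following is a sketch with hypothetical function names; whether the real interrupts are level or edge sensitive is not stated in the text, and the sketch simply evaluates the stated conditions on the current fill level.

```python
FIFO_DEPTH = 16  # bytes in each transmit and receive FIFO


def rx_interrupt(fill, idle_char_times):
    """RX condition: half full, or non-empty and idle for more than
    three character times with no further characters received."""
    half_full = fill >= FIFO_DEPTH // 2
    timed_out = fill > 0 and idle_char_times > 3
    return half_full or timed_out


def tx_interrupt(fill):
    """TX condition: the transmit FIFO has drained to half empty."""
    return fill <= FIFO_DEPTH // 2
```

The timeout term ensures that a short message which never fills half the receive FIFO is still delivered to software.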
  • UART 124 (UART1) can also receive the three modem control signals CTS, DSR, and DCD.
  • An additional modem control RI input and output modem control signals RTS and DTR can be implemented using the GPIO ports 129 discussed further below.
  • A Modem Status Interrupt for UART1 is generated if any of these modem control bits change.
  • UART operation and line speeds are programmable through the UART bit rate and line control registers (UBLC1 and UBLC2).
  • Four of the FIFOs can also be programmed to have a 1-byte depth. Framing and parity error bits, which are detected as each byte is received, are also readable from 11-bit wide registers.
  • Integrated circuit 100 also includes an IrDA (infrared data association) SIR protocol post processing stage 126 at the output of UART1 124.
  • SIR encoder 126 is switched into the TX and RX ports of UART1, such that these signals can drive the infrared interface directly.
  • Integrated circuit 100 additionally includes an SPI/Microwire master mode 128 Kbps ADC interface 127 and serial interface 128, which is shown in further detail in FIGURE 6.
  • A preferred serial pin assignment for the Digital Audio Port is found in TABLE 10.
  • Serial interface block 128 includes a master-slave mode SPI/Microwire (SSI2) interface 603, digital audio interface (DAI) 601, and codec interface 604, all of which are multiplexed through multiplexer 602 onto a single set of external interface pins.
  • the selected interface drives the corresponding circuitry in block
  • ADC interface 127 is compatible in a default mode with SSI or Microwire compatible devices such as the MAXIM MAX148/9 peripherals.
  • Exemplary timing diagrams for integrated circuit 100 driving a MAX148/9 and an AD7811/2 are provided as FIGURES 7A and 7B, respectively.
  • An exemplary I2S interface is shown in FIGURE 8.
  • The clock output frequencies for ADC interface 127 can also be set using the system control registers.
  • The ADC clock (ADCCLK) can be set to either 4, 16, 64, or 128 kHz.
  • The ADC clock can be set to 4.2, 16.9, 67.7, or 135.4 kHz.
  • The sample clock SMPCLK always runs at twice the frequency of the shift clock (ADCCLK).
  • The available ADC frequency options are set forth in TABLE 12.
  • The ADC serial output ADCOUT is fed by either an 8-bit or 16-bit shift register in response to a bit set in the SYNCIO register.
  • The ADC serial input channel ADCIN is captured by a 16-bit shift register.
  • The ADC clock synchronization pulses are activated by a write to the output shift register.
  • An SSEOTI interrupt is asserted and the SSIBUSY bit is cleared.
  • The sample clock SMPCLK is independently enabled.
  • Digital Audio Interface 601 provides an interface to CD quality A/D and D/A converters, such as that shown in FIGURE 9.
  • DAIs are a subset of I2S.
  • FIGURE 10 is an exemplary timing diagram illustrating the operation of DAI 601.
  • The left-right clock (LRCK) provides the frame synchronization signal.
  • The serial clock is the bit transfer clock and preferably has a rate fixed at 128 times the audio sample frequency.
  • The SDOUT (SDATAO) and SDIN (SDATAI) signals are respectively used for sending playback data to an external D/A converter and for receiving record data from an external A/D converter.
  • Timing between integrated circuit 100, an external D/A converter and/or an external A/D converter is based on the oversampled clock MCLK.
  • The MCLK has a rate fixed at 256 times the sampling frequency.
  • Synchronous serial interface 2 (SSI2) 603 is an SPI/Microwire interface that can operate in a full master-slave mode.
  • FIGURE 11 illustrates a pair of integrated circuit 100 devices configured to operate in a master-slave fashion.
  • The preferred sustained data rate is 85.3 Kbps, which ensures a sufficiently long period between interrupts.
  • An interrupt is generated when the receive FIFO is half-full and the transmit FIFO is half-empty.
  • In slave mode, the serial clock (SSICLK), the serial receive port (SSIRXDA), the receive synch control pin (SSIRXFR) and the transmit synchronization pin (SSITXFR) are inputs and the transmit pin SSITXDA is an output.
  • In master mode, pins SSICLK, SSITXDA, SSITXFR and SSIRXFR are outputs and pin SSIRXDA is an input.
  • Mode selection is through the programming of bits in the system control registers.
  • Asymmetric (unbalanced) and continuous traffic are both supported through the use of the separate transmit and receive frame synch control lines SSITXFR and SSIRXFR.
  • The receiving node receives a byte of data on the eight clocks following the assertion of the receive frame synch control signal and the sending node transmits a byte on the eight clocks following the assertion of the independent transmit frame synch control pulse. Exemplary timing diagrams illustrating the operation of these two interfaces are provided in FIGURES 7A and 7B for reference.
  • Codec interface 604 supports a direct connection to a telephony codec. Along with clock and control signal generation, codec interface 604 also performs parallel to serial and serial to parallel conversions. The interface is full duplex and employs corresponding transmit and receive FIFOs operating at 64 Kbps. When enabled, the codec interrupt CSINT is generated every 8 bytes transferred (i.e., FIFO half full/empty) or, in other words, every 1 msec with a latency of 1 msec. This timing is illustrated in FIGURE 8, where CDENRX and CDENTX are respectively the receive and transmit control bits in system control register
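The 1 msec interrupt period quoted above follows directly from the 64 Kbps rate and the 8-byte transfer count, as this short sketch shows. The function name and its default parameter values are illustrative; only the 64 Kbps and 8-byte figures come from the text.

```python
def codec_interrupt_period_ms(bit_rate_bps=64_000, bytes_per_interrupt=8):
    """Time taken to move one interrupt's worth of bytes, in milliseconds."""
    bits_per_interrupt = bytes_per_interrupt * 8
    return bits_per_interrupt * 1000 / bit_rate_bps
```

At 64 Kbps the interface moves 8000 bytes per second, so 8 bytes take one thousandth of a second.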
  • DAI 601 supports an I2S interface, such as interface 900 shown in FIGURE 9.
  • A clock source 903 provides the time base.
  • An exemplary timing diagram is provided in FIGURE 10.
  • The MCLK is the oversampled clock which is typically fixed at 256 times the audio sampling frequency.
  • The SCLK is the bit clock which is typically fixed at 128 times the audio sampling frequency.
  • The LRCK is the frame sync signal and is typically fixed at the audio sampling frequency.
  • SDOUT is the audio data output sending playback digital audio to DAC
  • SDIN receives record data from ADC 901.
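The three DAI clocks are fixed multiples of the audio sampling frequency, which the following sketch computes. The function name is hypothetical, and the 44.1 kHz rate used in the usage note is only an example of a CD quality sample rate, not a value taken from the text.

```python
def dai_clocks(sample_rate_hz):
    """Clock rates in hertz implied by the multiples stated in the text."""
    return {
        "LRCK": sample_rate_hz,         # frame sync at the sampling frequency
        "SCLK": 128 * sample_rate_hz,   # bit clock
        "MCLK": 256 * sample_rate_hz,   # oversampled master clock
    }
```

For example, `dai_clocks(44_100)` gives a bit clock of 5,644,800 Hz and a master clock of 11,289,600 Hz.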
  • SSI2 interface 603 supports master-slave operation as shown in FIGURE 11. This interface provides a means for effectuating full duplex serial transfers between two nodes. Data are transferred in bytes in response to a clock and a frame synchronization signal.
  • Integrated circuit 100 is also provided with a set of general purpose input output (GPIO) ports 129.
  • The GPIO ports can be used for such purposes as establishing an interface with a keyboard driver 215.
  • Pulse width modulator (PWM) circuitry 130 includes two outputs for driving DC-to-DC converters 216 operating in conjunction with external power supply unit (PSU) subsystem 201.
  • The external input pins, normally connected to the outputs of comparators monitoring the external DC-to-DC converter outputs, are used to enable these clocks.
  • When integrated circuit 100 is operating from internal PLL 108, the PWM clocks have a frequency of 96 kHz.
  • The duty cycle ratio for these signals can be programmed from 1 in 16 to 15 in 16.
  • The sense of the PWM drive signal active cycle can be set high or low by latching the state of the drive signal during power-on reset (i.e., a pull-up on the drive signal results in an active low drive output, and vice versa).
  • Either positive or negative voltages can be generated by the external DC-to-DC converters. These outputs can similarly be disabled by clearing bits in a control register.
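  • The PWM timing described above can be worked through numerically. The following is a minimal sketch assuming the 96 kHz clock and a duty setting expressed in sixteenths; the function name and the `active_low` parameter are hypothetical stand-ins for the latched drive sense.

```python
PWM_FREQ_HZ = 96_000   # PWM clock when running from the internal PLL (from the text)
DUTY_STEPS = 16        # duty cycle is programmed in sixteenths (from the text)


def pwm_high_time_us(duty_setting, active_low=False):
    """Return how long the pin sits at logic high in each PWM period, in microseconds."""
    if not 1 <= duty_setting <= DUTY_STEPS - 1:
        raise ValueError("duty setting must be between 1 and 15")
    period_us = 1e6 / PWM_FREQ_HZ
    active_us = period_us * duty_setting / DUTY_STEPS
    # With an active low drive sense the pin is low during the active part.
    return period_us - active_us if active_low else active_us


print(round(pwm_high_time_us(8), 3))
```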
  • Communication between the blocks of integrated circuit 100 is established through an Advanced Peripheral Bus 132 and an Advanced Peripheral Bus Bridge 131.
  • Internal data bus 106 is 32-bits wide and can be connected to the external devices through multiplexing circuitry 132.
  • Internal address bus 107 is 28-bits wide and can communicate with external devices through multiplexing circuitry 133.
  • ICE-JTAG circuitry 134, which is IEEE 1149.1 compliant, is included for boundary scanning during test and development. Additionally, the Embedded ICE supports the debugging of the ARM processor core.
  • The internal registers of integrated circuit 100 are in the little endian configuration.
  • Integrated circuit 100 can advantageously interface with a big endian external memory system. Specifically, the big-endian bit in the CPU 101 register set determines whether words in the external memory are stored in a big endian or little endian format.
  • Memory is viewed as a linear collection of bytes numbered upwards from zero.
  • Bytes 0-3 hold the first stored word, bytes 4-7 the second stored word, and so on.
  • In the little endian scheme, the lowest numbered byte in a word is considered to be the least significant byte of the word and the highest numbered byte is the most significant byte.
  • Therefore, byte zero in a little endian system is connected to data lines 7-0.
  • In the big endian scheme, the most significant byte of a word is stored at the lowest numbered byte, and the least significant byte is stored at the highest numbered byte. Therefore, byte zero in a big endian system is connected to data lines 31-24.
  • TABLES 13 and 14 illustrate the operation of integrated circuit 100 for both reads (TABLE 13) and writes (TABLE 14).
  • The column address strobe lines NCAS[3:0] to the DRAM banks are always connected to the same byte lane irrespective of the endianness.
  • Column address strobe line NCAS[0] will be associated with data lines D[7:0] and NCAS[3] with data lines D[31:24].
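  • The byte lane rules described above can be summarized in a few lines. This is an illustrative sketch, not the device logic; the function names are hypothetical, and the only facts used are that byte zero sits on D[7:0] in little endian mode and on D[31:24] in big endian mode.

```python
def data_lines_for_byte(byte_offset, big_endian):
    """Return (high, low) data line numbers carrying the byte at this offset in a word."""
    if not 0 <= byte_offset <= 3:
        raise ValueError("byte offset within a 32-bit word must be 0..3")
    lane = (3 - byte_offset) if big_endian else byte_offset
    return (8 * lane + 7, 8 * lane)


def word_on_bus(memory_bytes, big_endian):
    """Assemble the 32-bit value seen on D[31:0] from four consecutive memory bytes."""
    return int.from_bytes(bytes(memory_bytes), "big" if big_endian else "little")


stored = [0x11, 0x22, 0x33, 0x44]          # bytes 0..3 of one word
print(hex(word_on_bus(stored, False)))     # little endian view
print(hex(word_on_bus(stored, True)))      # big endian view
```

  • Because NCAS[n] always strobes lane n, the lane index computed above also identifies which strobe line is involved in a byte access.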
  • Integrated circuit 100 includes a set of programmable fuses which allow each chip to be assigned one or more unique ID numbers and passwords.
  • The programmable fuses and related registers are disposed within Security Registers and Hardware block 133 operating off APB 132 (FIGURE 1). With specific regard to the embodiment of FIGURE 1D, the boot ROM itself will reside on the ARM local bus 107 and the access checking will be split and have logic on both the ARM local bus and within the ARM local-global AHB wrapper.
  • The addresses and values of the private fuses are hidden such that only private firmware corresponding to those fuses is allowed access. In a non-private environment, these addresses and values return all zeros.
  • Integrated circuit 100 also includes embedded hardware within block 133 to check the fused hamming code with the hamming code that matches the selected
  • Table 18 provides the addresses that return the validation codes for the public ID-CHK pairs.
  • Table 19 provides the addresses that return the validation codes for the private ID-CHK pairs. These addresses are only accessible by the firmware when integrated circuit 100 is operating in a private mode and will read 0's otherwise.
  • FIGURE 1B is a high level functional block diagram of a second system on a chip 140 suitable for practicing the principles of the present invention.
  • This embodiment employs an ARM920T processor 141 having both instruction and data caches, as well as an MMU.
  • System 140 does not include general purpose SRAM, in contrast to integrated circuit 100.
  • FIGURE 13 is a more detailed functional block diagram of processor 141, in particular for those embodiments based on an ARM 920T core.
  • The available cache comprises both an instruction cache 1301 and a data cache 1302.
  • Separate instruction and data MMUs 1303 and 1304 are used.
  • The instruction modified virtual address (IMVA), instruction physical address (IPA) and instruction data (ID) buses are each 32 bits wide.
  • The data modified virtual address (DMVA), data physical address (DPA) and data data (DD) buses are each 32 bits wide.
  • Physical addresses and data are exchanged with AHB bus 142 through AMBA bus interface 1305.
  • A write buffer 1306 allows for the parallel exchange of data through interface 1305 during processor core operations.
  • Data from data cache 1302 can be output through write-back physical address (PTAG) RAM 1307.
  • Integral to the processor core is a coprocessor which includes a register for translating virtual addresses issued by the CPU into the modified instruction and data virtual addresses (MVA) transmitted on IMVA and DMVA shown in FIGURE
  • System 140 is based on an internal AHB (Advanced Microcontroller Bus
  • AHB/APB bridge 144 interfaces AHB 142 and APB 143.
  • a second bridge 145 interfaces processor 141 with AHB 142.
  • Graphics Engine 146 and Raster Engine 147 are devices operating off AHB 142.
  • Graphics Engine 146 off-loads such functions as block transfers and line draws from processor 141 to improve system graphics performance.
  • Graphics Engine 146 uses a standard Device Independent Bitmap (DIB) format for supporting Windows CE.
  • Raster Engine 147 is provided to raster data from an external display buffer, through synchronous DRAM interface 148, to drive an external LCD, CRT or TV display unit.
  • Additional on-chip interfaces to the internal AHB include an interface 149 for coupling system 140 to an external system bus, a PCMCIA interface for interfacing with an external PC card, and a Test Interface Controller (TIC) interface 151 for testing such on-chip circuit blocks as the DMA controller and the raster system.
  • Memory interface 152 provides for the exchange of control signals and data with external SRAM, Flash or ROM in a manner similar to that discussed above.
  • Boot of the system, which will be discussed further below, is effectuated, at least in part, using Boot ROM 153. In this example, boot ROM 153 is operating off AHB
  • System 140 includes an 8-channel DMA engine 154, which prioritizes and services requests by on-chip resources, such as the UARTs, for accesses to external memory.
  • The Joint Test Action Group (JTAG) port 155 supports debugging of the on-chip processor and related circuitry. Additionally, a Universal Serial Bus (USB) controller 156 and Ethernet port 157 operate directly from the AHB.
  • A number of peripheral devices are provided on-chip and operate off of APB 143.
  • System 140 includes three UARTs 158, 159 and 160. Additionally, a pair of SPI interfaces 161 and 162 and an AC97 interface 163 are included in the illustrated embodiment.
  • A real time clock (RTC) 165, general timer set 166 and watchdog timer 167 are also provided in this embodiment.
  • An additional memory interface, EEPROM interface 168, also couples to the APB. Manual input of data can be made through an external key matrix coupled to Key Matrix Interface 169, or a Touchscreen interfacing with Touchscreen ADC 171 and Touchscreen Interface 170.
  • LED outputs 172 are also included in the system 140 user interface.
  • FIGURE 1C is a high level functional block diagram of another exemplary system-on-a-chip 180 to which the principles of the present invention can be suitably applied.
  • The CPU core 181, which could be, for example, an ARM7TDMI controller, does not utilize an MMU or on-chip cache.
  • CPU 181 operates in conjunction with the AHB bus 142 via a local AHB bus and Local/Main AHB interface 182.
  • CPU 181 is supported by memory 183, security gates 184 and security/reset circuitry 185. Security will be discussed in further detail below.
  • System 180 additionally includes a digital signal processor (DSP) 186 supported by global memory 189, data memory 190 and program memory 191.
  • DSP Peripheral Bus 196 These devices also operate off the APB.
  • peripheral devices also operating from the APB include a USB Slave Port 197, SPI for Serial Media Input 198 and
  • The Moving Picture Experts Group (MPEG) audio compression standard defines the syntax for a coded stream of digitized audio data, along with a process for decoding that stream.
  • Three layers, Layers I - III respectively, are defined.
  • Layer III, which provides the highest quality audio reproduction, will be considered.
  • The encoding process begins with the sampling of one or more audio channels at a given sampling rate, which may be 32, 44.1 or 48 kHz.
  • The resulting digitized stream is passed through a polyphase filter bank which divides the received time-domain stream into 32 frequency subbands.
  • The filter bank operates on 64 input samples at a time with 50% overlap such that 32 output frequency-domain samples are produced for every 32 input time-domain samples.
  • A psychoacoustic model is used to remove those parts of the audio signal which cannot be heard by the human ear due to auditory masking. Auditory masking is the characteristic of the human auditory system wherein a strong audio signal renders a temporally or spatially close weaker audio signal imperceptible. Moreover, the ability of the human ear to distinguish sounds is frequency dependent. Within certain critical bands, the ear does not precisely delineate between various in-band audio components. The processing subbands, which approximate these critical hearing bands, are quantized as a function of the audibility of the quantization noise within that subband.
  • The psychoacoustic model engine determines the available noise masking for a given frequency component and a given loudness. From this information, the data stream output from the polyphase filter is quantized and coded. In Layer III, each of the 32 subbands output from the polyphase filter is passed through a window which parses the stream into long blocks of 18 samples or short blocks of 6 samples, with 50% overlap such that the window lengths are respectively 36 and 12 samples wide. Long blocks are used to achieve better frequency resolution for the relatively constant components of the audio signals, while short blocks are used for improved time resolution of transients. The blocks for each subband are then processed with a Modified Discrete Cosine Transform (MDCT). The subbands are further divided in frequency to improve spectral resolution such that some of the aliasing caused by the polyphase filter can be canceled.
  • In Layer III, the quantization is non-uniform to make the signal-to-noise ratio over the range of quantization values more consistent. Additionally, Layer III utilizes scale factor bands which approximate critical bandwidths and cover several MDCT coefficients. The scale factors are used during noise allocation to vary the frequency-dependent masking threshold, and essentially set the gain for each subband. Moreover, Huffman encoding is performed on the quantized MDCT coefficients for improving data compression. Finally, a "bit reservoir" is employed, to which bits can be donated when less than the average number of bits are required to code a frame and from which bits can be borrowed when more than the average number of bits are required to code a frame.
  • Frames are formed from a header, a CRC value, side information and main data, although the relative positions of these components of the frame are not necessarily always in the same sequence, or even adjacent in the stream.
  • The header includes a set of frame sync bits, MPEG version and layer identifiers, a CRC protection bit, a bitrate index indicating the bitrate at which the frame was created, and a sampling rate frequency index indicating the frequency at which audio data was sampled, along with additional information about the transported data.
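  • The header fields named above can be pulled out of a 32-bit header word with simple shifts and masks. The following sketch assumes the commonly used layout with 11 sync bits followed by a 2-bit version field; the bit positions and lookup tables come from the MPEG audio frame format rather than from this description, and the function names are hypothetical. Only the MPEG-1 Layer III tables are included.

```python
# Bitrate (kbps) and sampling rate (Hz) tables for MPEG-1 Layer III only.
# Index 0 of the bitrate table means "free format"; index 15 is not allowed.
MPEG1_L3_BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                          128, 160, 192, 224, 256, 320, None]
MPEG1_SAMPLE_RATES_HZ = [44_100, 48_000, 32_000, None]


def parse_frame_header(header):
    """Split a 32-bit MPEG audio frame header into the fields named in the text."""
    if (header >> 21) & 0x7FF != 0x7FF:
        raise ValueError("frame sync bits not found")
    return {
        "version_id": (header >> 19) & 0x3,         # 3 means MPEG-1
        "layer_id": (header >> 17) & 0x3,           # 1 means Layer III
        "crc_protected": ((header >> 16) & 0x1) == 0,
        "bitrate_index": (header >> 12) & 0xF,
        "sample_rate_index": (header >> 10) & 0x3,
    }


def describe_mpeg1_layer3(header):
    """Return (bitrate_kbps, sample_rate_hz) for an MPEG-1 Layer III header."""
    fields = parse_frame_header(header)
    if fields["version_id"] != 3 or fields["layer_id"] != 1:
        raise ValueError("not an MPEG-1 Layer III header")
    return (MPEG1_L3_BITRATES_KBPS[fields["bitrate_index"]],
            MPEG1_SAMPLE_RATES_HZ[fields["sample_rate_index"]])


print(describe_mpeg1_layer3(0xFFFB9064))  # (128, 44100)
```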
  • An MPEG-1, Layer III bitstream can then be decoded generally as follows. Data is input to the decoder in a predetermined number of frames per second. The frame sync bits in the header portion of each frame are detected. Next, the scale factors are extracted and decoded. This is followed by decoding of the Huffman encoded main data representing the frequency energies. The scale factors are applied and the data requantized. At this point, if stereo data is being processed, the stereo channels are recovered and aliasing reduction performed. An inverse MDCT operation is performed, followed by an overlapping inverse Discrete Cosine Transform (DCT), to return the data to the time domain. A low pass filter is applied to recover the PCM samples, each of which is essentially a weighted average of the adjacent 512 time-domain samples.
  • a stereo DAC such as a Cirrus Logic CS43Lxx Stereo Audio DAC
  • Digital Audio Port 128 for driving a set of headphones.
  • An analog to digital converter such as a Cirrus Logic CS53L32 Audio A/D Converter may also be coupled to this port for the input of data from a microphone.
  • This embodiment of FIGURE 1D includes an on-chip PWM circuit that can drive headphones at CD quality levels without an external stereo audio DAC. It is often necessary to prevent tampering, copying or logic analyzer examination of the software and firmware bundled with an electronic product.
  • Some level of security must be provided, for example through the use of encrypted passwords, which allow manufacturer-authorized end users access to the system memory assets for purposes of downloading, debugging, and upgrading the software or firmware, but deny that same level of access to unauthorized end users.
  • This will give online music distributors the confidence to allow end users who have paid the royalty and received the requisite passwords to download songs, with the knowledge that unauthorized downloads will at least be deterred to some degree.
  • Secure information such as encrypted passwords, security code, and the information concerning locations in memory where the secure information resides, must not be readily accessible outside the system. Notwithstanding, this secure information must be checked during production test procedures to guarantee acceptable end user system quality with regards to normal manufacturing defects. Finally, if security measures are not provided or not invoked, normal operation of the system should proceed in the expected fashion.
  • The principles of the present invention provide security techniques which allow integrated circuit 100 to meet each of these criteria. In accordance with one such technique, the capability of processor 101, in response to either certain default conditions or the dynamic assertion of certain instructions, to reverse the Chip Select signal decoding discussed above is employed. By reversing the chip select decoding on power-on reset, the security code can be run from a normally inaccessible memory space.
  • FIGURE 12 is a flow chart illustrating a preferred procedure 1200 for booting integrated circuit 100 in accordance with the inventive concepts. It will be assumed that processor 101 is an ARM720T or ARM920T processor, and signal names will be in reference to the signals and/or instructions thereof.
  • The procedure begins with the power-on reset of integrated circuit 100 by the assertion of the power-on reset (NPOR) signal, at Step 1201. Circuits within the system immediately disable all hardware and debug features and hide all security elements (e.g. firmware, registers, passwords) from external probing (Step 1202). This step ensures that the system is secure, at least up until Step 1203, where a check is made to determine if security firmware routines are in place and enabled. In the preferred embodiment, this is accomplished by reading the programmable fuse registers.
  • At Step 1204, a determination is made as to whether boot is to continue from an internal ROM or if an external memory will be used.
  • The NMEDCHG bit is used to select between internal and external boot memory options. If at Step 1204 the signal at pin NMEDCHG is clear (i.e. in an active low state), then boot of integrated circuit 100 will be from internal ROM. In this case, the address mapping to internal boot ROM is reversed by default at Step 1205. After reversal of the address mapping, execution is from current boot ROM location 0 (Step 1207). In this illustrated embodiment, the power-on reset signal NPOR must be asserted to return the address mapping to its normal state.
  • Otherwise, boot will be from external memory (ROM/EPROM/Flash).
  • The chip select mapping is set as shown in TABLE 5A with the external Chip Select 0 being selected as the boot memory.
  • Integrated circuit 100 can be configured to respond to different sets of boot and/or security code. This advantageously allows integrated circuit 100 to operate using the boot/security firmware from multiple vendors, even though the secure information of each vendor may only be accessible by that vendor's own boot/security procedures.
  • The boot memory is programmed with multiple boot code sets or options. This can be done using the internal boot ROM or one or more chips of external memory (ROM/RAM/Flash). With multiple boot options, the end user will be able to select between security firmware available from different vendors. Consequently, at Step 1209 a first one of the booting options in boot memory is identified and at Step 1210 aliased to the reset vector, typically location 0x00 for the first option.
  • All necessary security elements (registers, firmware, I/O devices) required for the given implementation are enabled by the current boot option while all other security options (implementations) are kept hidden (Step 1211).
  • The selected boot code is then run by the processor at Step 1212 to attempt to initialize for the selected security firmware/software.
  • If at Step 1213 the proper security firmware/software is found in memory as called by the boot code, then integrated circuit 100 completes boot and runs in the selected secured environment at Step 1214 under supervising control of the security firmware/software. On the other hand, if the required security firmware/software is not found, another boot option must be tried.
  • If the last security option has not been reached at Step 1215, then the next security option in boot code is selected (Step 1216). An instruction is issued which dynamically forces the processor to the new reset vector. In this instance, the reset vector jumps to point to the second security option in boot code. At Step 1218, the processing returns to Step 1211 and the boot process is attempted again.
  • The instruction pipeline has three stages. Consequently, the instruction resetting the program counter to 0 has already been loaded from internal boot ROM before execution of the instruction that changes the chip selects.
  • The MOV pc, #0 instruction causes the processor pipeline to be flushed, thereby allowing several cycles to occur before the change of chip selection must occur. During this process, no other accesses are allowed to those memory resources whose chip select signals will change during the execution of the remap command.
  • This process repeats itself until either a security option is found which causes integrated circuit 100 to enter secure operation at Step 1214 or the last security option is reached at Step 1215.
  • The last or default option returns the boot procedure at Step 1219 back to a normal (unsecure) boot.
  • All the debug features are enabled and the security features are hidden at Step 1220.
  • At Step 1221, a default boot ROM is selected and at Step 1222 the processor is dynamically forced to the reset vector.
  • Default security code may be provided in order that integrated circuit 100 can still run in a secure environment even though all of the primary options are unavailable.
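  • The option-selection loop of procedure 1200 can be sketched as ordinary control flow. This is a simplified model, not the boot ROM code: the function name, the `firmware_present` callback and the returned labels are hypothetical, and the behavior when the fuse check at Step 1203 finds security disabled is assumed here to be a normal (unsecure) boot.

```python
def select_boot_environment(security_enabled, boot_options, firmware_present):
    """Try each boot option in order; fall back to a normal boot if none succeeds.

    boot_options     -- ordered list of option identifiers held in boot memory
    firmware_present -- callable returning True if the option's security
                        firmware/software is found (Step 1213)
    """
    if not security_enabled:              # assumed outcome of the Step 1203 check
        return ("unsecure", None)
    for option in boot_options:           # Steps 1209/1216: pick first/next option
        # Steps 1210-1212: alias the option to the reset vector, enable only
        # its own security elements, and run its boot code.
        if firmware_present(option):      # Step 1213
            return ("secure", option)     # Step 1214
    # Last option reached (Step 1215): default path, Steps 1219-1222.
    return ("unsecure", None)


result = select_boot_environment(True, ["vendor_a", "vendor_b"],
                                 lambda opt: opt == "vendor_b")
print(result)  # ('secure', 'vendor_b')
```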
  • Instructions and data can be locked into the corresponding instruction and data caches, such that they are not chosen as victims for replacement by the replacement algorithm on a cache miss. Locked in data/instructions guarantee a cache hit with the corresponding information being fetched directly from cache and the favorable cache access latency. Moreover, the locked encached information is inaccessible outside of integrated circuit 100, except through the JTAG port or other test-debug modes that allow visibility to the cache or TLB memories.
  • The JTAG port, used primarily during product development and testing, can be disabled before integrated circuit 100 leaves the manufacturing floor.
  • Many devices, such as the ARM 920T used in the present examples, include both data and instruction translation look-aside buffers (TLBs).
  • For a given instruction or field of data, the CPU generates a virtual address. A modified virtual address is then presented to the corresponding TLB and a comparison is performed between fields of the modified virtual address and the comparison (tag) registers in the TLB.
  • The physical address bits returned from the corresponding TLB entry are used, along with index bits from the modified virtual address, to generate a physical address, accessing cache or external memory, as required. If a miss occurs, the process discussed below is invoked to translate the virtual address into a physical address in hardware.
  • TLB entries are locked by writing identifiers for the specific entries in the data and instruction TLBs being locked into the TLB Lock Down field of the System Control Processor register C15.
  • TLB Lockdown procedure 1300 of FIGURE 13 is one method of locking entries in either an instruction or data TLB.
  • A page table is set up including physical address bits and permissions corresponding to the protected data or instructions.
  • At least some of the entries in the target TLB are then flushed or cleaned to ensure that the code to be locked is not already in the TLB registers (Step
  • Both the data and instruction TLBs are organized in a single segment of 64 lines.
  • A replacement (victim) counter points to the entry being replaced. Therefore, at Step 1303, the replacement counter is updated to point to the next entry to which locked information is to be written. In the preferred embodiment, the process begins at entry 0.
  • A Prefetch instruction is used to generate a modified virtual address to force a TLB miss (Step 1304).
  • Alternatively, a Load instruction can be used to force the miss.
  • A page table walk must be performed to generate the descriptor (e.g. physical address and permissions) to be loaded into the TLB (Step 1305).
  • The descriptor generated from the page table walk, using physical address bits from the accessed page table entry and index bits from the modified virtual address, is loaded into the given TLB at the entry pointed to by the current replacement counter contents.
  • The loaded TLB entry is locked at Step 1307 by setting a bit in a corresponding TLB Lockdown register. If the last entry has been reached at Step 1308, the procedure ends; otherwise, at Step 1309, the procedure loops back to Step 1303 and the replacement counter is updated in preparation to load the next entry.
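  • The lockdown loop of procedure 1300 can be modeled in software to make the ordering of the steps concrete. This is a toy model under stated assumptions: the `ToyTLB` class, its methods and the dictionary-based page table are all hypothetical, entries are tagged at 1 MByte granularity for simplicity, and real hardware performs the miss and table walk itself.

```python
TLB_ENTRIES = 64        # single segment of 64 lines, from the text
SECTION_SHIFT = 20      # assumption: tag entries at 1 MByte granularity


class ToyTLB:
    """Very small software model of a lockable TLB with a victim counter."""

    def __init__(self):
        self.entries = [None] * TLB_ENTRIES   # each entry: (tag, descriptor)
        self.locked = [False] * TLB_ENTRIES
        self.victim = 0

    def flush_unlocked(self):
        for i in range(TLB_ENTRIES):
            if not self.locked[i]:
                self.entries[i] = None

    def hit(self, mva):
        tag = mva >> SECTION_SHIFT
        return any(e is not None and e[0] == tag for e in self.entries)

    def fill_on_miss(self, mva, page_table):
        tag = mva >> SECTION_SHIFT
        # The "page table walk": look up the descriptor, load it at the victim.
        self.entries[self.victim] = (tag, page_table[tag])


def lock_tlb_entries(tlb, page_table, addresses):
    """Load and lock one TLB entry per address, starting at entry 0."""
    if len(addresses) > TLB_ENTRIES:
        raise ValueError("more addresses than TLB entries")
    tlb.flush_unlocked()                      # flush step
    for index, mva in enumerate(addresses):
        tlb.victim = index                    # Step 1303
        tlb.fill_on_miss(mva, page_table)     # Steps 1304 to 1306
        tlb.locked[index] = True              # Step 1307
    tlb.victim = len(addresses)               # later misses replace unlocked entries


tlb = ToyTLB()
table = {0x001: (0x800, "rw"), 0x002: (0x801, "ro")}
lock_tlb_entries(tlb, table, [0x00100000, 0x00200040])
print(tlb.entries[:2], tlb.victim)
```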
  • FIGURE 14 illustrates a cache lockdown procedure 1400 for locking secure code into cache.
  • A cache miss must be forced in the illustrated embodiment.
  • A preferred method of forcing a cache miss is discussed later in conjunction with FIGURE 15.
  • An actual or emulated page table is set up with the physical addresses to the locations in memory where the data or instructions to be locked in cache reside.
  • This table is used to update the corresponding TLB, preferably using procedure 1300.
  • The given cache is flushed or cleaned of at least some cache lines to ensure that the code to be locked in is not already encached.
  • The replacement (victim) counter associated with the cache is forced to point to the first cache line (cache line 0) at Step 1403.
  • Each of the data and instruction caches is partitioned into eight 64-line segments, each indexed by index fields in the modified virtual address.
  • Cache lines will be filled sequentially with, for example, all cache lines 0 of all segments filled in sequence first, followed by the sequential fill of all cache lines 1, and so on.
  • The data or instructions to be encached are generated, possibly requiring decryption, and stored at corresponding locations in an alternative memory, such as internal SRAM or external SRAM/DRAM/Flash at
  • At Step 1404, a Prefetch Cache Line operation is performed for an instruction encache to invoke a look-up at the pointed-to cache entry.
  • For a data encache, the LOAD instruction can be used. This causes a cache miss, thereby requiring the processor to access the alternate memory containing the necessary data or instructions. It can do this by referring to the TLB for the necessary bits for the physical address, if the TLB is current and accurate, or by walking through the page tables set up at Step 1401 directly.
  • The physical address itself is generated from base address bits in the entry accessed in the TLB and index bits from the virtual address.
  • At Step 1405, the generated code or data is placed where the cache miss is to be processed, and a line fill is performed at Step 1406 to the cache line at the current replacement pointer entry.
  • The cache segment is indexed by cache segment index bits from the virtual address causing the cache miss.
  • The processor increments the cache segment index bits at Step 1408 to force the next cache access to the next cache segment at the current replacement counter value.
  • The procedure returns to Step 1404 and continues from there. However, if the just completed operation was to the last cache segment, and more cache operations are to follow (i.e. the last cache line to be filled has not been reached at Step 1409), then at Step 1410, the procedure jumps back to Step 1403, the replacement counter value is updated, and the procedure continues from that point.
  • The counter base is set to a value one higher than the base to the locked cache lines (Step 1411). This ensures that the private data (now decrypted) will not be overwritten on a cache miss or become accessible by an unauthorized party.
  • The code can then be executed from cache at Step 1412.
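  • The fill order used by procedure 1400 (line 0 of every segment first, then line 1 of every segment, and so on) and the resulting counter base can be sketched as follows. The function names are hypothetical, and the model only tracks indices; it does not move any data.

```python
SEGMENTS = 8             # eight segments per cache, from the text
LINES_PER_SEGMENT = 64   # 64 lines per segment, from the text


def lockdown_fill_order(num_cache_lines):
    """Return the (line, segment) pairs in the order they would be filled."""
    if not 0 <= num_cache_lines <= SEGMENTS * LINES_PER_SEGMENT:
        raise ValueError("cannot lock more lines than the cache holds")
    # The segment index advances fastest; the line (victim counter) value
    # advances only after all segments have been filled at that line.
    return [divmod(n, SEGMENTS) for n in range(num_cache_lines)]


def counter_base_after_lock(num_cache_lines):
    """First line index left available for normal replacement (Step 1411)."""
    return -(-num_cache_lines // SEGMENTS)   # ceiling division


order = lockdown_fill_order(10)
print(order)
print(counter_base_after_lock(10))
```

  • In this model, locking 10 cache lines occupies line 0 of all eight segments and line 1 of two segments, so the counter base moves to line 2.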
  • One means of creating locked, encached data without using memory locations for the entire region to be locked is to use a cache line's length of registers to emulate the region.
  • Cache miss emulation can also be used to remedy hardware limitations on the cache locking granularity. For example, in the ARM 920T embodiment, cache can be locked in 64 word blocks (256 bytes). Each cache line however is only 8 words (32 bytes) long, and therefore can be mapped to different locations within the 64 word block, depending on the address bits.
  • ECLINE emulated cache line
  • ECOFFSET comparison (offset) register
  • An emulated cache miss procedure 1500 is then set forth in the flow chart of FIGURE 15.
  • At Step 1501, the contents to be encached (in either the instruction or data cache) are written into the ECLINE registers.
  • An offset to the Lockable cache space to which the data are to be written is then programmed into the
  • An operation is performed to cause a cache miss. For the instruction cache, this can be done through a Prefetch Instruction, and for the data cache, through a Load.
  • The virtual address generated to this location causes a miss to the given cache, and the corresponding physical address is then generated using index bits from the virtual address and base bits retrieved from the appropriate TLB or alternatively, through a page table walk.
  • The information in the corresponding ECLINE registers is retrieved and at Step 1506, loaded into the cache at the addressed entry. This entry is now prepared to be locked using procedure 1400.
  • The procedure has allowed the locked portion of cache to be loaded without resort to either internal or external SRAM.
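  • The idea behind procedure 1500 can be illustrated with a small model in which a line fill is served from a register bank rather than from memory. This is a sketch under assumptions: the class and method names are hypothetical, the offset is modeled as a line position (0 to 7) within the 64 word lockable block, and the comparison against the offset is one plausible reading of the "comparison (offset) register" described above.

```python
WORDS_PER_LINE = 8          # 8-word cache line, from the text
LINES_PER_LOCK_BLOCK = 8    # 64-word lockable block divided by 8-word lines


class EmulatedCacheLine:
    """Register bank (ECLINE) plus offset register (ECOFFSET) serving one line fill."""

    def __init__(self):
        self.ecline = [0] * WORDS_PER_LINE
        self.ecoffset = None

    def program(self, words, line_offset):
        if len(words) != WORDS_PER_LINE:
            raise ValueError("exactly one cache line of words is required")
        if not 0 <= line_offset < LINES_PER_LOCK_BLOCK:
            raise ValueError("offset must select a line inside the lockable block")
        self.ecline = list(words)        # contents to be encached
        self.ecoffset = line_offset      # where in the block they belong

    def line_fill(self, miss_line_offset, read_memory_line):
        """Return the 8 words for a missing line.

        If the miss matches the programmed offset, the data comes from the
        ECLINE registers; otherwise the normal memory path is used.
        """
        if miss_line_offset == self.ecoffset:
            return list(self.ecline)
        return read_memory_line(miss_line_offset)


emu = EmulatedCacheLine()
emu.program([0xA0 + i for i in range(8)], line_offset=3)
from_registers = emu.line_fill(3, lambda off: [0] * WORDS_PER_LINE)
from_memory = emu.line_fill(2, lambda off: [off] * WORDS_PER_LINE)
print(from_registers, from_memory)
```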
  • page table walks are required during cache and TLB locking operations in order to generate addresses to physical memory from where the data or instructions are to be retrieved.
  • the present inventive concepts allow for the creation of streamlined page tables which save on the amount of memory which must be dedicated to page table support. Moreover, even in view of a TLB miss, the inventive concepts also protect data and instruction code against tampering, copying or electronic analysis through secure operation of MMU 104 during address translation by section/page table walks.
  • an ARM 920T processor core will be considered for illustrative purposes, although the inventive principles can be applied to the memory management schemes of other processors and memory management units.
  • a conventional page table walk for this embodiment generally proceeds as follows.
  • a section descriptor (Level 1), course page table or fine page table base address is retrieved from a 4096-entry Translation Base Table (TBT).
  • TBT Translation Base Table
  • the TBT is accessed using a TBT base address from the Translation Base Register and a Table Index field from the modified virtual address. If the output from the TBT is a section descriptor, that descriptor includes a Section Base Address and access permissions.
  • a physical address to a 1MByte section of memory is then generated using the Section Base Address bits from the Level 1 descriptor and Section Index bits from the modified virtual address. (Assuming that the permissions contained in the Level 1 Section Descriptor are favorable).
  • the Course Page Table returns either a Large or Small Base Address along with access permissions. Depending on the state of the permissions, the Large or Small
  • Page Base Address bits are combined with Page Index bits from the modified virtual address to produce a physical address either a 64 Kbyte large page or 4 Kbyte small page from memory.
  • Level 2 Descriptor which includes either a large, small or tiny base address along with access permissions.
  • Large pages are 64 KBytes, small pages 4 KBytes and tiny pages 1 KByte.
  • the page base address is concatenated with Page Index bits from the modified virtual address to generate a physical address to either the large or small pages in memory already discussed, or 1 KByte tiny pages in memory.
  • the memory accessed as a result of the page table walk can be either cache, internal memory or external memory.
  • the physical addresses and permissions are used to update the TLB. Any secure information is then locked, as described above, into the TLB.
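
The address arithmetic of the conventional walk described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, and the bit positions (table index in bits 31:20 of the modified virtual address, a 16 KByte-aligned translation table base, 20-, 16-, 12- and 10-bit section/page indices) are assumed from the standard ARM920T two-level translation format.

```python
# Hypothetical helper names; field widths assume the ARM920T translation format.

def level1_descriptor_address(translation_base: int, mva: int) -> int:
    """Address of the Level 1 entry selected by the Table Index bits."""
    table_index = (mva >> 20) & 0xFFF           # 4096 entries, one per 1 MByte
    return (translation_base & 0xFFFFC000) | (table_index << 2)

def section_physical_address(section_descriptor: int, mva: int) -> int:
    """Section Base Address bits 31:20 joined with Section Index bits 19:0."""
    return (section_descriptor & 0xFFF00000) | (mva & 0x000FFFFF)

# Number of Page Index bits taken from the modified virtual address.
PAGE_INDEX_BITS = {"large": 16, "small": 12, "tiny": 10}   # 64 KB, 4 KB, 1 KB

def page_physical_address(level2_descriptor: int, mva: int, kind: str) -> int:
    """Page base address bits joined with the Page Index bits for this size."""
    index_mask = (1 << PAGE_INDEX_BITS[kind]) - 1
    base = level2_descriptor & ~index_mask & 0xFFFFFFFF
    return base | (mva & index_mask)
```

Permission checking is deliberately left out of this sketch; in the walk described above it takes place before the physical address is used.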
  • the table walking process can be significantly simplified and the amount of memory required for the translation tables greatly reduced. Not only is this important in terms of increased operating efficiency, but it also ensures that resort to unsecure external memory is not required.
  • the memory space is divided up into 256 MByte regions, each of which is associated with a common set of access characteristics (e.g. access permissions, cacheability, bufferability). In only one of these regions does any memory require a second-level page table, and there only 1 MByte does. Thus, since large regions of memory have common access characteristics, much smaller translation tables can be created within the available SRAM space.
  • the access permissions indicate whether given information can be accessed from the corresponding memory block.
  • the cacheability and bufferability attribute bits are used to determine if an accessed piece of information can be stored in cache or transferred through the write buffer. For example, the contents of the real hardware registers controlling the UARTs and other peripherals and I/O devices are generally not allowed to be cached or buffered by the CPU subsystem. Caching or buffering such accesses would cause incorrect behavior of these peripherals because it changes the timing of when the accesses actually occur.
  • the page/section table information must be kept within the confines of the private area such that this information cannot pass from memory to the device pins, where it could be examined by a logic analyzer.
  • a 32-bit register is created for storing the Level 1 AP bits, each two-bit pair mapping to a 256 MByte memory region. For example, bits [1:0] map to Region 1, bits [3:2] to Region 2, and so on.
  • a 16-bit register is set up for holding a set of bits indicating the bufferability of each region for Level 1.
  • Another 16-bit register is set up for holding a set of bits indicating the cacheability of each region.
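
As an illustration of the three global Level 1 registers just described, the sketch below packs two AP bits and one cacheability and one bufferability bit per 256 MByte region and reads them back for an address. The class and method names are hypothetical, and regions are numbered from 0 here (so index 0 corresponds to the text's "Region 1"); neither choice comes from the source.

```python
from dataclasses import dataclass

@dataclass
class GlobalAccessRegisters:
    """Assumed model: one 32-bit AP register and two 16-bit attribute registers."""
    ap: int = 0          # two AP bits per region, region r in bits [2r+1:2r]
    cacheable: int = 0   # one bit per region
    bufferable: int = 0  # one bit per region

    def set_region(self, region: int, ap: int, cacheable: bool, bufferable: bool) -> None:
        shift = 2 * region
        self.ap = ((self.ap & ~(0b11 << shift)) | ((ap & 0b11) << shift)) & 0xFFFFFFFF
        self.cacheable = ((self.cacheable & ~(1 << region)) | (int(cacheable) << region)) & 0xFFFF
        self.bufferable = ((self.bufferable & ~(1 << region)) | (int(bufferable) << region)) & 0xFFFF

    def lookup(self, address: int):
        """Return (AP bits, cacheable, bufferable) for the region holding address."""
        region = (address >> 28) & 0xF   # 2**28 bytes = 256 MByte per region
        return ((self.ap >> (2 * region)) & 0b11,
                bool((self.cacheable >> region) & 1),
                bool((self.bufferable >> region) & 1))
```

Sixteen regions of 256 MByte cover the full 4 GByte address space, which is why 32 AP bits and 16 bits per attribute are sufficient.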
  • For a given 256 MByte region, a determination is made at Step 1601 as to whether it has a common set of access characteristics. If the determination is affirmative, then at Step 1602, the corresponding entry in the global Level 1 AP register is loaded with the appropriate AP bits. The corresponding entries in the global Level 1 bufferability and cacheability registers are similarly updated at Steps 1603 and 1604.
  • At Step 1605, the procedure returns for the update of the register entries for the next memory region (block) requiring update.
  • Initialization/update of the global access control registers is preferably done in a loop.
  • the values in general do not change but can be updated if necessary during system processing.
  • the full register values for entries that are not to be synthesized are updated as appropriate during system operation. For example, they will need to be updated when a page of memory is substituted for another when it is "swapped" out to disk or similar mass storage devices.
  • For those memory blocks or registers which have a unique set of access characteristics at Step 1601, including access permissions, bufferability and cacheability bits, and physical address bits, a full 32-bit register is loaded with a complete Level 1 descriptor at Steps 1606 and 1607. The procedure again loops back at Step 1608.
  • This descriptor can include a coarse or fine page (Level 2) table address.
  • a constant is pointed to in hardwired gates at Step 1608.
  • the stored constant can be a fixed value or a base address to a Level 2 table. If a walk to
  • a corresponding register in the Level 2 synthesized table is set up at Step 1611.
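
The Level 1 set-up loop described above might be sketched as follows. This is only an illustration under stated assumptions: the input encoding (a tuple tagged "common", "descriptor" or "constant" per region), the function name and the returned `overrides` dictionary are all hypothetical, and the hardwired-gate constants of the text are modeled simply as stored values.

```python
def synthesize_level1(regions):
    """Build the global Level 1 registers from one entry per 256 MByte region.

    Each entry is ("common", ap, cacheable, bufferable) for a region whose
    attributes can be synthesized, or ("descriptor", value) / ("constant", value)
    for a region that needs a full 32-bit register or a fixed value.
    """
    ap_reg = c_reg = b_reg = 0
    overrides = {}                       # region -> (kind, 32-bit value)
    for region, entry in enumerate(regions):
        if entry[0] == "common":
            _, ap, cacheable, bufferable = entry
            ap_reg |= (ap & 0b11) << (2 * region)
            c_reg |= int(cacheable) << region
            b_reg |= int(bufferable) << region
        else:
            overrides[region] = (entry[0], entry[1] & 0xFFFFFFFF)
    return ap_reg, c_reg, b_reg, overrides
```

With sixteen regions the synthesized state is three small registers plus a handful of overrides, rather than a 4096-entry table in memory.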
  • A similar process is used to synthesize Level 2. Specifically, for each Level 2 page, a register is pointed to by the Level 2 base address bits from the Level 1 registers. A global Level 2 AP register, along with Level 2 bufferability and cacheability registers, is set up as before, for pages and sub-blocks having common characteristics.
  • For a given page or set of pages, a determination is made at Step 1612 as to whether it has a common set of access characteristics. If the determination is affirmative, then at Step 1613, the corresponding entry in the global Level 2 AP register is loaded with the appropriate AP bits. The corresponding entries in the global Level 2 bufferability and cacheability registers are similarly updated at Steps
  • At Step 1616, the procedure returns to Step 1601 for the update of the register entries for the next memory region (block) requiring update.
  • For those Level 2 pages, sets of pages, blocks or registers which have a unique set of Level 2 access characteristics at Step 1612, including access permissions, bufferability and cacheability bits, and physical address bits, a full 32-bit register is loaded with a complete Level 2 descriptor at Step 1618. Otherwise, a constant is pointed to in hardwired gates at Step 1618.
  • the stored constant can be a fixed value, base address, or the like.
  • the procedure again loops back at Step 1619 to Step 1601.
  • An exemplary synthesized page table walk is illustrated in FIGURE 12B.
  • the table walk is requested. This request could be in response to a TLB and/or cache miss.
  • a second level of Table walk is not required at Step 1621.
  • the Level 1 registers discussed above are then pointed-to by the translation base register in the MMU at Step 1622.
  • the Level 1 register entries are indexed using the table index bits from the virtual address (Step 1623).
  • At Steps 1624 and 1625, a determination is made as to whether the return from the indexed entry in the Level 1 registers is either a full descriptor or a constant. The case in which the return is neither a constant nor a full descriptor will be considered first.
  • the access control bits in the first level global access registers are retrieved.
  • the table index bits from the virtual address are then transformed into physical address bits at Step 1627 by shifting their bit positions relative to the virtual address.
  • for section entries, the table index bits (bits 13:2 of the lookup word index into the 4096-entry Level 1 page table) become bits 31:20 of the result for the entry (a 1 MByte memory region).
  • the domain of the section will be defined by bits (13:10) of the memory location.
  • several bits in the page table entries are always a constant 0 or 1.
  • the Level 1 descriptor is formed at Step 1628 by merging the transformed address bits and the retrieved access control bits.
  • the synthesized descriptor is returned at Step 1629 for updating the TLB and/or cache.
  • the Level 1 entry can also be a full descriptor (Step 1630), or a constant (Step 1631).
  • the descriptor or constant can be used immediately at Step 1632.
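
The Level 1 synthesis path (Steps 1626 to 1629) can be pictured with the sketch below, which merges the shifted table index bits with the globally stored access control bits. Several points are assumptions rather than statements from the text: the function name is hypothetical, the field positions follow the ARM920T section descriptor layout (AP in bits 11:10, domain in bits 8:5, bit 4 set, C in bit 3, B in bit 2, type 0b10), and the domain is read as the top four table index bits, which is one interpretation of "bits (13:10) of the memory location".

```python
def synthesize_section_descriptor(mva: int, ap_reg: int, c_reg: int, b_reg: int) -> int:
    """Form a Level 1 section descriptor without reading a table in memory."""
    table_index = (mva >> 20) & 0xFFF        # bits 13:2 of the lookup address
    region = table_index >> 8                # bits 13:10: the 256 MByte region
    ap = (ap_reg >> (2 * region)) & 0b11     # global Level 1 AP register
    cacheable = (c_reg >> region) & 1        # global Level 1 cacheability bit
    bufferable = (b_reg >> region) & 1       # global Level 1 bufferability bit

    descriptor = table_index << 20           # index bits become address bits 31:20
    descriptor |= ap << 10                   # access permissions
    descriptor |= region << 5                # domain (assumed equal to the region)
    descriptor |= 1 << 4                     # bit that is always a constant 1
    descriptor |= cacheable << 3
    descriptor |= bufferable << 2
    descriptor |= 0b10                       # section descriptor type
    return descriptor
```

Because the table index bits are reused directly as the section base, this sketch describes an identity mapping of each 1 MByte section; only the attribute bits vary from region to region.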
  • Assume next that a Level 2 table reference is required at Step 1621.
  • the Level 2 translation is similar to that performed when only a Level 1 reference is required.
  • At Step 1633, the Level 2 registers, set up as described above, are pointed to by a base address in the MMU.
  • the specific register or entry is indexed using the table index bits from the virtual address at Step 1634.
  • a determination is made at Steps 1635 and 1636 as to whether the indexed register (entry) contains a full descriptor or a constant. If a descriptor is found, then that descriptor is retrieved at Step 1637 and if a constant is found, that constant is retrieved at Step 1638.
  • the descriptor or constant can then be immediately used at Step 1639.
  • At Step 1640, the second level access control registers are accessed and the corresponding access control bits retrieved at Step 1641 using the page index bits from the virtual address.
  • At Step 1642, the page index bits from the virtual address are transformed into physical address bits by shifting bit positions. These physical address bits, along with the retrieved access control bits, are merged at Step 1643 to form a synthesized descriptor.
  • the synthesized descriptor is returned at Step 1643.
  • the inventive concepts also advantageously allow for address translation and TLB update upon a cache miss through register emulation of memory similar to the cache miss emulation. Subsequently, the cache and/or TLB entries can be locked as described above for security.
  • the preferred emulation process employs an alternate, emulated memory, such that the integrated circuit 100 internal memory can be spared for other tasks.
  • the memory addresses of the page tables are preferably mapped inside the integrated circuit.
  • a preferred procedure embodying these concepts is the Emulated Table Walk / TLB Update procedure
  • an emulated Level 1 Translation Register (table) (EL1TR) containing either Level 1 Descriptors or Level 2 base addresses is created at Step 1701. Additionally, an emulated Level 1 Index Register (EL1IR), maintaining indices to the entries in the EL1TR, is set up in the alternate memory space (Step 1702).
  • TTB Translation Base Table
  • EL2TR Level 2 Translation Register
  • Step 1705 an emulated Level 2 Index Register holding the corresponding indices
  • At Step 1706, a virtual address is generated by prompting CPU 101 or through the use of an external address generator. If the cache and TLB have been flushed or cleaned, a cache/TLB miss will occur, and therefore, at Step 1707, the table walk procedure is invoked using the emulated Level 1 table pointed to by the MMU. Level 1 Table Index bits in the virtual address are compared with those in EL1IR and the corresponding Level 1 information is returned from EL1TR (Step 1708).
  • If the information is a descriptor (i.e. no Level 2 translation is required) at Step 1708, then a Level 1 access is performed (Step 1709) wherein the permissions in the descriptor are examined (Step 1710). If permission is not granted, then the operation aborts at Step 1711. Otherwise, the physical address is generated from the Section address bits in the Level 1 descriptor, along with the Section Index from the virtual address, at Step 1712. The physical address can then be loaded into the TLB at Step 1713 to await locking and the corresponding data or instruction loaded into the appropriate cache. If it is determined at Step 1714 that the current entry in the TLB is not the last to be loaded, then at Step 1715 the procedure loops back to Step 1706 to initiate the next table walk. Otherwise, the TLB locking procedure is executed at Step 1716.
  • the Level 2 page walk is invoked at Step 1717.
  • the EL2TR registers are accessed (Step 1718) using the base address from EL1TR.
  • the specific register is indexed using the contents of the corresponding EL2IR register by comparison against the index bits from the virtual address (Step 1719).
  • the permissions in the returned Level 2 descriptor are examined at Step 1720. If the access is not allowed, the access is aborted at Step 1721; otherwise the physical address is generated at Step 1717 using the physical address bits from the Level 2 descriptor and index bits from the virtual address.
  • the physical address is loaded into the TLB at Step 1723 to await locking.
  • the TLB lock procedure can be invoked at Step 1725; otherwise, at Step 1726, the procedure jumps back to Step 1706 and the table walk for the next TLB entry is initiated.
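
The Level 1 branch of the emulated walk (Steps 1706 to 1712) can be sketched as an index comparison followed by a permission check. The sketch is illustrative only: EL1IR and EL1TR are modeled as parallel Python lists, the function names are hypothetical, and `access_allowed` is a placeholder callback standing in for the permission check, whose details the text does not give.

```python
def emulated_level1_lookup(el1ir, el1tr, virtual_address):
    """Compare the Table Index bits against EL1IR and return the EL1TR entry."""
    table_index = (virtual_address >> 20) & 0xFFF
    for stored_index, entry in zip(el1ir, el1tr):
        if stored_index == table_index:
            return entry
    return None                                   # no emulated entry matches

def emulated_section_walk(el1ir, el1tr, virtual_address, access_allowed):
    """Return the physical address for a section entry, or raise on failure."""
    descriptor = emulated_level1_lookup(el1ir, el1tr, virtual_address)
    if descriptor is None:
        raise LookupError("no emulated Level 1 entry for this address")
    if not access_allowed(descriptor):            # Step 1710: examine permissions
        raise PermissionError("access aborted")   # Step 1711
    # Step 1712: Section address bits joined with the Section Index bits.
    return (descriptor & 0xFFF00000) | (virtual_address & 0x000FFFFF)
```

The Level 2 branch follows the same pattern, with EL2IR and EL2TR taking the place of the Level 1 registers and the page index bits replacing the section index.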
  • a bare CPU may be employed which does not include a Memory Management Unit (MMU) or hardware cache.
  • CPU core 101 could be based upon an ARM7tdmi processor 102 alone, without cache 103 or MMU 104.
  • all software must be stored in memory in a flat memory space.
  • this may require the use of external memory (e.g. NOR Flash, SRAM, DRAM).
  • the security code runs in a supervisor mode.
  • in the supervisor mode, access to specific areas of memory and certain registers is subject to a check against supervisor privilege.
  • the security firmware preferably runs from internal memory, such as SRAM.
  • all other software/firmware is interpreted as running in the user mode and is therefore subject to supervisor privilege checking by the secured software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Microcomputers (AREA)
  • Storage Device Security (AREA)

Abstract

The invention relates to a method for synthesizing translation tables. The method comprises creating at least one register for storing information controlling access to a plurality of memory spaces. A virtual address is generated which includes a pointer for selecting information within the register controlling access to a selected one of the memory spaces. The information selected by the pointer is accessed from the register, and a physical address to the selected one of the memory spaces is generated from the information accessed from the register.
PCT/US2001/002560 2000-02-01 2001-01-25 Methods for synthesizing translation tables and systems using the same WO2001057676A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
AU2001231165A AU2001231165A1 (en) 2000-02-01 2001-01-25 Methods for synthesizing translation tables and systems using the same
JP2001556457A JP2003521780A (en) 2000-02-01 2001-01-25 Method for synthesizing translation tables and system using the same
EP01903336A EP1256060A1 (en) 2000-02-01 2001-01-25 Methods for synthesizing translation tables and systems using the same

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US49581300A 2000-02-01 2000-02-01
US09/495,813 2000-02-01
US60807200A 2000-06-30 2000-06-30
US09/608,072 2000-06-30

Publications (1)

Publication Number Publication Date
WO2001057676A1 true WO2001057676A1 (fr) 2001-08-09

Family

ID=27051890

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/002560 WO2001057676A1 (en) 2000-02-01 2001-01-25 Methods for synthesizing translation tables and systems using the same

Country Status (4)

Country Link
EP (1) EP1256060A1 (fr)
JP (1) JP2003521780A (fr)
AU (1) AU2001231165A1 (fr)
WO (1) WO2001057676A1 (fr)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1286268A1 (en) * 2001-08-21 2003-02-26 Alcatel Integrated circuit with allocation of address regions
EP1300762A1 (en) * 2001-08-21 2003-04-09 Alcatel Integrated circuit with external memory
WO2004046738A2 (en) * 2002-11-18 2004-06-03 Arm Limited Virtual to physical memory address mapping within a system having a secure domain and a non-secure domain
WO2005078590A2 (en) * 2004-02-06 2005-08-25 Intel Corporation Address conversion technique in a context switching environment
CN108463811A (en) * 2016-01-20 2018-08-28 Arm有限公司 Recording set indicator

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0008355A1 (en) * 1978-08-25 1980-03-05 Siemens Aktiengesellschaft Device for protecting data stored in computers against unauthorized access
US5606687A (en) * 1993-10-07 1997-02-25 Sun Microsystems, Inc. Method and apparatus for optimizing supervisor mode store operations in a data cache
US5724551A (en) * 1996-05-23 1998-03-03 International Business Machines Corporation Method for managing I/O buffers in shared storage by structuring buffer table having entries include storage keys for controlling accesses to the buffers

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1286268A1 (en) * 2001-08-21 2003-02-26 Alcatel Integrated circuit with allocation of address regions
EP1300762A1 (en) * 2001-08-21 2003-04-09 Alcatel Integrated circuit with external memory
US6963945B2 (en) 2001-08-21 2005-11-08 Alcatel Integrated circuit
WO2004046738A3 (en) * 2002-11-18 2004-10-21 Advanced Risc Mach Ltd Virtual to physical memory address mapping within a system having a secure domain and a non-secure domain
GB2409745A (en) * 2002-11-18 2005-07-06 Advanced Risc Mach Ltd Virtual to physical memory address mapping within a system having a secure domain and a non-secure domain
WO2004046738A2 (en) * 2002-11-18 2004-06-03 Arm Limited Virtual to physical memory address mapping within a system having a secure domain and a non-secure domain
GB2409745B (en) * 2002-11-18 2006-01-11 Advanced Risc Mach Ltd Virtual to physical memory address mapping within a system having a secure domain and a non-secure domain
US7124274B2 (en) 2002-11-18 2006-10-17 Arm Limited Virtual to physical memory address mapping within a system having a secure domain and a non-secure domain
WO2005078590A2 (en) * 2004-02-06 2005-08-25 Intel Corporation Address conversion technique in a context switching environment
WO2005078590A3 (en) * 2004-02-06 2006-03-30 Intel Corp Address conversion technique in a context switching environment
US7519791B2 (en) 2004-02-06 2009-04-14 Intel Corporation Address conversion technique in a context switching environment
KR100895715B1 (en) * 2004-02-06 2009-04-30 인텔 코포레이션 Memory management unit, system including the memory management unit, and address translation method
CN108463811A (en) * 2016-01-20 2018-08-28 Arm有限公司 Recording set indicator
CN108463811B (en) * 2016-01-20 2023-02-28 Arm有限公司 Recording set indicator

Also Published As

Publication number Publication date
AU2001231165A1 (en) 2001-08-14
EP1256060A1 (fr) 2002-11-13
JP2003521780A (ja) 2003-07-15

Similar Documents

Publication Publication Date Title
US6754784B1 (en) Methods and circuits for securing encached information
US20210240637A1 (en) Methods, apparatus, and systems for secure demand paging and paging operations for processor devices
US6622208B2 (en) System and methods using a system-on-a-chip with soft cache
US9747220B2 (en) Methods, apparatus, and systems for secure demand paging and other paging operations for processor devices
US6912557B1 (en) Math coprocessor
US6816750B1 (en) System-on-a-chip
US5515540A (en) Microprocessor with single pin for memory wipe
US5640542A (en) On-chip in-circuit-emulator memory mapping and breakpoint register modules
EP2725517A1 (en) System-on-chip processing secure contents and mobile device comprising the same
US5737579A (en) System and method for emulating computer architectures
US6779125B1 (en) Clock generator circuitry
JP2000276397A (en) Simple high-performance memory management unit
EP1256056B1 (en) Methods and circuits for selectively operating a system in a secure environment
EP1256060A1 (en) Methods for synthesizing translation tables and systems using the same
JP4603753B2 (en) Methods and circuits for securing encached information
WO2001057872A1 (en) Portable audio decoder
US6484227B1 (en) Method and apparatus for overlapping programmable address regions
TW514830B (en) Circuits, systems and methods for information privatization in personal electronic appliances

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
ENP Entry into the national phase

Ref country code: JP

Ref document number: 2001 556457

Kind code of ref document: A

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: 2001903336

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2001903336

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWW Wipo information: withdrawn in national office

Ref document number: 2001903336

Country of ref document: EP