WO2001086432A2 - Cryptographic data processing systems, computer program products, and methods of operating same enabling multiple cryptographic execution units to execute commands from a host processor in parallel

Publication number
WO2001086432A2
Authority
WO
WIPO (PCT)
Prior art keywords
local memory
command
cryptographic
relative position
Prior art date
Application number
PCT/US2001/015176
Other languages
English (en)
Other versions
WO2001086432A3 (fr
Inventor
David Blaker
Raymond Savarda
Original Assignee
Netoctave, Inc.
Hanna, Michael
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Netoctave, Inc. and Hanna, Michael
Priority to AU2001266571A
Publication of WO2001086432A2
Publication of WO2001086432A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/12 Protecting executable software
    • G06F21/121 Restricting unauthorised execution of programs
    • G06F21/123 Restricting unauthorised execution of programs by using dedicated hardware, e.g. dongles, smart cards, cryptographic processors, global positioning systems [GPS] devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/70 Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer
    • G06F21/71 Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information
    • G06F21/72 Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information in cryptographic circuits
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3877 Concurrent instruction execution, e.g. pipeline, look ahead using a slave processor, e.g. coprocessor
    • G06F9/3879 Concurrent instruction execution, e.g. pipeline, look ahead using a slave processor, e.g. coprocessor for non-native instruction execution, e.g. executing a command; for Java instruction set

Definitions

  • the present invention relates generally to the field of data processing systems, and, more particularly, to cryptographic data processing systems, computer program products, and methods of operating same.
  • Signal processors and integrated circuit chips have been developed to accelerate cryptographic operations, such as public key operations. Examples of such chips include, but are not limited to, the Hif 6500 available from Hifn, Inc., the SafeNet ADSP 2141 available from SafeNet, Inc., and the Rainbow Mykotronx
  • conventional cryptographic data processing systems generally use two main methods for issuing a command to a cryptographic accelerator:
  • the first method involves the provision of a command register on the cryptographic accelerator that a host processor uses to issue a single command. Once the cryptographic accelerator completes executing a command, the host processor may issue a new command. After completing a command, the cryptographic accelerator is generally idle until the host processor issues a new command. Unfortunately, the host processor may spend much time interacting directly with the cryptographic accelerator to download data and issue commands. This may reduce the amount of time available to the host processor for attending to other tasks.
  • the second method allows the host processor to download one or more command sequences to the cryptographic accelerator and then to instruct the cryptographic accelerator to execute one or more of the downloaded command sequences.
  • the cryptographic accelerator is generally idle until the host processor issues a new command.
  • the size of the command sequences may be limited based on the amount of memory that may be placed on the cryptographic accelerator.
  • the host processor may spend much time interacting directly with the cryptographic accelerator to download data and issue command sequences. This may reduce the amount of time available to the host processor for attending to other tasks.
  • the size of the operands will always be less than or equal to the register size. As a result, some of the memory in the registers may be wasted. This reduces the number of operands that may be stored on a chip in a given amount of space.
  • the cryptographic accelerator is redesigned to accommodate larger operands, then each of the registers may need to be modified. More registers may be designed into a cryptographic accelerator; however, adding more memory to a cryptographic accelerator may reduce the amount of other functionality that may be included and/or increase the cost.
  • Cryptographic processors and/or other types of signal processors and integrated circuits may use a hardware-based random number generator.
  • Various conventional methods may be used to retrieve random numbers from an integrated circuit incorporating a random number generator.
  • One method is for the random number generator to provide one or more data registers that a host processor may read to obtain random numbers.
  • the host processor may tell the random number generator to provide more random data before or after retrieving random data from the registers.
  • the random number generator may generate the random data in the background so that random data may be available when needed by the host processor.
  • Another method for obtaining random data is for the host processor to request a block of random data from the random number generator.
  • the host processor may provide the random number generator with a request that specifies an amount of random data and a location in memory where the random data should be placed.
  • the random number generator may then generate the random data and transfer the random data to the requested location in the background.
  • any buffer management that may be desired is generally performed by the host processor.
  • the bus that connects the host processor with the random number generator may be used inefficiently because single data reads are typically used instead of block reads. If a host processor requests a block of random data, however, then the host processor may initiate the data transfers, and any desired buffer management is generally performed by the host processor. The foregoing operations may be performed in the background and/or a fast host processor may be used; however, a faster host processor may increase system costs.
  • Embodiments of the present invention provide cryptographic data processing systems, computer program products, and methods of operating same.
  • cryptographic data processing systems comprise a host processor, a system memory coupled to the host processor, and a cryptographic processor integrated circuit that comprises a local memory.
  • One or more operands are downloaded into the local memory from the system memory and the cryptographic processor executes an instruction that references one of the downloaded operands using a first relative position in the local memory.
  • a result is generated based on the operand referenced when executing the instruction and this result is stored at a second relative position in the local memory.
  • the first and second relative positions may comprise first and second offsets from a base address in the local memory.
  • operands and results may be packed together in the local memory, which may conserve storage space.
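The relative-position addressing described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `LocalMemory` class, the `add_operands` instruction, the memory size, and the 4-byte operand width are all assumptions made for the example.

```python
class LocalMemory:
    """Toy model of the accelerator's local memory (size and base are assumptions)."""

    def __init__(self, size: int = 1024, base: int = 0) -> None:
        self.cells = bytearray(size)
        self.base = base  # base address that every offset is measured from

    def write(self, offset: int, data: bytes) -> None:
        start = self.base + offset
        if start < 0 or start + len(data) > len(self.cells):
            raise ValueError("operand does not fit in local memory")
        self.cells[start:start + len(data)] = data

    def read(self, offset: int, length: int) -> bytes:
        start = self.base + offset
        return bytes(self.cells[start:start + length])


def add_operands(mem: LocalMemory, off_a: int, off_b: int,
                 off_result: int, length: int) -> None:
    """Hypothetical instruction: operands are named by offset, not by register."""
    a = int.from_bytes(mem.read(off_a, length), "big")
    b = int.from_bytes(mem.read(off_b, length), "big")
    total = (a + b) % (1 << (8 * length))
    mem.write(off_result, total.to_bytes(length, "big"))


mem = LocalMemory(size=64, base=16)
mem.write(0, (1000).to_bytes(4, "big"))  # operand A at the first relative position
mem.write(4, (234).to_bytes(4, "big"))   # operand B packed directly after A
add_operands(mem, 0, 4, 8, 4)            # result packed directly after B
result = int.from_bytes(mem.read(8, 4), "big")
```

Because each operand occupies only its own length, a wider operand simply takes more bytes at its offset; no fixed-width register has to be resized.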
  • the performance of cryptographic data processing systems may be improved by providing separate command interfaces that are respectively associated with execution units in the cryptographic processor.
  • a plurality of execution units may be provided in the cryptographic processor.
  • Command blocks may be respectively provided to the execution units and these command blocks may be executed simultaneously by the plurality of execution units.
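As a rough software analogy for separate command interfaces, the sketch below gives each execution unit its own queue and worker thread. The `ExecutionUnit` class, the unit names, and the sample operations are assumptions for illustration; Python threads only model the structure and do not reproduce hardware parallelism.

```python
import threading
from queue import Queue


class ExecutionUnit(threading.Thread):
    """Hypothetical execution unit with its own command interface (a queue)."""

    def __init__(self, name: str) -> None:
        super().__init__(name=name)
        self.commands: Queue = Queue()  # command interface private to this unit
        self.results: list = []

    def run(self) -> None:
        while True:
            block = self.commands.get()
            if block is None:  # sentinel: no more command blocks
                break
            func, args = block
            self.results.append(func(*args))


units = {name: ExecutionUnit(name) for name in ("EA", "PK")}
for unit in units.values():
    unit.start()

# Each unit receives its own command block; neither waits on the other.
units["EA"].commands.put((bytes.upper, (b"packet",)))  # stand-in for an E/A operation
units["PK"].commands.put((pow, (4, 13, 497)))          # modular exponentiation

for unit in units.values():
    unit.commands.put(None)
for unit in units.values():
    unit.join()
```

With a single shared interface, the modular exponentiation would have to queue behind the other command; per-unit interfaces remove that serialization point.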
  • FIG. 1 is a block diagram that illustrates cryptographic data processing systems, computer program products, and methods of operating same in accordance with embodiments of the present invention
  • FIG. 2 is a flowchart that illustrates operations of cryptographic data processing systems and computer program products in accordance with embodiments of the present invention
  • FIGS. 3 - 5 are block diagrams that illustrate functional execution units of a cryptographic accelerator processor in accordance with embodiments of the present invention
  • FIG. 6 is a flowchart that illustrates operations of cryptographic data processing systems and computer program products in accordance with further embodiments of the present invention
  • FIGS. 7 - 8 are block diagrams that illustrate an encryption/authentication command queue and a public key command queue, respectively, in accordance with embodiments of the present invention
  • FIGS. 9 - 11 are flowcharts that illustrate operations of cryptographic data processing systems and computer program products in accordance with further embodiments of the present invention.
  • FIGS. 12A - 12D are block diagrams that illustrate command blocks in accordance with embodiments of the present invention.
  • FIG. 13 is a flowchart that illustrates operations of cryptographic data processing systems and computer program products in accordance with further embodiments of the present invention.
  • FIGS. 14A, 14B, and 15 are block diagrams that illustrate command blocks in accordance with further embodiments of the present invention.
  • FIGS. 16 and 17 are flowcharts that illustrate operations of cryptographic data processing systems and computer program products in accordance with further embodiments of the present invention.
  • FIG. 18 is a block diagram that illustrates a random number generator data queue in accordance with embodiments of the present invention.
  • FIG. 19 is a flowchart that illustrates operations of cryptographic data processing systems and computer program products in accordance with further embodiments of the present invention.
  • FIG. 20 is a block diagram that illustrates a command interface for a conventional application specific integrated circuit
  • FIG. 21 is a block diagram that illustrates parallel command interfaces for an application specific integrated circuit in accordance with embodiments of the present invention.
  • FIG. 22 is a block diagram of a cryptographic accelerator processor in which command interface managers are respectively associated with functional execution units in accordance with embodiments of the present invention
  • FIG. 23 is a flowchart that illustrates operations of cryptographic data processing systems and computer program products in accordance with further embodiments of the present invention.
  • the present invention may be embodied as methods, data processing systems, and/or computer program products. Accordingly, the present invention may be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.). Furthermore, the present invention may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with an instruction execution system.
  • a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • the computer-usable or computer-readable medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. More specific examples (a nonexhaustive list) of the computer-readable medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, and a portable compact disc read-only memory (CD-ROM).
  • the computer-usable or computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
  • an exemplary cryptographic data processing system 12 in accordance with embodiments of the present invention, comprises a cryptographic accelerator processor 14, a host processor 16, a cache memory 18, a system memory 22, and a system bus controller 24, such as a north-bridge system controller.
  • the system bus controller 24 couples the host processor 16 to the cache memory 18 and the system memory 22, and also couples the host processor 16 and the system memory 22 to the cryptographic accelerator processor 14 via a system bus 26, which may be, for example, a peripheral component interconnect (PCI) bus.
  • the host processor 16 may be, for example, a commercially available or custom microprocessor.
  • the system memory 22 is representative of an overall hierarchy of memory devices containing the software and data used to implement the functionality of the cryptographic data processing system 12.
  • the system memory 22 may include, but is not limited to, the following types of devices: ROM, PROM, EPROM, EEPROM, flash, SRAM, and DRAM.
  • the cryptographic accelerator processor 14 comprises a random number generator (RNG) execution unit 28, an encryption/authentication (E/A) execution unit 32, and a public key (PK) engine execution unit 34, which are coupled to a local memory 36 via a local bus 38.
  • the system memory 22 contains a random number (RN) data queue 42, an E/A command queue 44, a PK command queue 46, and data buffer(s) 47.
  • Although FIG. 1 illustrates an exemplary cryptographic data processing system architecture, it will be understood that the present invention is not limited to such a configuration, but is intended to encompass any configuration capable of carrying out operations described herein.
  • Computer program code for carrying out operations of embodiments of the cryptographic data processing system 12 may be written in a high-level programming language, such as C or C++, for development convenience. Nevertheless, some modules or routines may be written in assembly language or even micro-code to enhance performance and/or memory usage. It will be further appreciated that the functionality of any or all of the program modules may also be implemented using discrete hardware components, a single application specific integrated circuit (ASIC), or a programmed digital signal processor or microcontroller.
  • These computer program instructions may also be stored in a computer usable or computer-readable memory that may direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer usable or computer-readable memory produce an article of manufacture including instructions that implement the function/act specified in the flowchart and/or block diagram block or blocks.
  • the computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions that execute on the computer or other programmable apparatus provide steps for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • the host processor 16 loads a command block into one of the command queues 44 and 46 at block 52.
  • the cryptographic accelerator processor 14 may be notified by the host processor 16 that the command block is available for processing or may periodically access the command queues 44 and/or 46 to determine if a command block is available for processing.
  • the cryptographic accelerator processor 14 downloads the command block from one of the command queues 44 and 46 and executes the command block at block 54. Once the cryptographic accelerator processor 14 completes execution of the command block, the host processor 16 is notified at block 56.
  • the host processor 16 need not spend time interacting directly with the cryptographic accelerator processor 14 (e.g., issuing a command to the cryptographic accelerator processor 14, waiting for that command to complete, and then issuing another command). Instead, the host processor 16 may load commands into command queues 44 and 46, which may then be processed in background by the cryptographic accelerator processor 14. Moreover, the size and number of command block sequences may be less constrained because the availability of system memory is generally more abundant.
  • the RNG execution unit 28, the E/A execution unit 32, and the PK engine execution unit 34 may use various registers that facilitate communication with the RN data queue 42 and the command queues 44 and 46.
  • a control/status register 62, a RN data queue base address register 64, a RN data queue size register 66, and a RN data queue pointer register 68 may be defined for use by the RNG execution unit 28.
  • the control/status register 62 may include a self-test error field, which may be set if the RNG execution unit 28 generates two successive random number samples that are the same, and/or an error flag field, which may be used to notify the host processor 16 of an error on the system bus 26.
  • the RN data queue base address register 64 may be used to hold the base address of the RN data queue 42 in the system memory 22. If the RN data queue 42 does not have a fixed size, then the RN data queue size register 66 may be used to hold the size of the RN data queue 42.
  • the RN data queue pointer register 68 may comprise a read pointer 72 portion and a write pointer 74 portion, which may be used by the RNG execution unit 28 and the host processor 16 as will be discussed in more detail hereinafter.
  • a control/status register 82, an E/A command queue base address register 84, an E/A command queue size register 86, and an E/A command queue pointer register 88 may be defined for use by the E/A execution unit 32.
  • the control/status register 82 may include an interrupt flag field, which may be set if the host processor 16 requests an interrupt upon completion of a command block and/or if execution of a command block fails and/or an error flag field, which may be used to notify the host processor 16 of an error on the system bus 26.
  • the E/A command queue base address register 84 may be used to hold the base address of the E/A command queue 44 in the system memory 22.
  • the E/A command queue size register 86 may be used to hold the size of the E/A command queue 44.
  • the E/A command queue pointer register 88 may comprise a read pointer 92 portion and a write pointer 94 portion, which may be used by the E/A execution unit 32 and the host processor 16, respectively, as will be discussed in more detail hereinafter.
  • a control/status register 102, a PK command queue base address register 104, a PK command queue size register 106, and a PK command queue pointer register 108 may be defined for use by the PK engine execution unit 34.
  • the control/status register 102 may include an interrupt flag field, which may be set if the host processor 16 requests an interrupt upon completion of a command block and/or if execution of a command block fails and/or an error flag field, which may be used to notify the host processor 16 of an error on the system bus 26.
  • the PK command queue base address register 104 may be used to hold the base address of the PK command queue 46 in the system memory 22.
  • the PK command queue size register 106 may be used to hold the size of the PK command queue 46.
  • the PK command queue pointer register 108 may comprise a read pointer 112 portion and a write pointer 114 portion, which may be used by the PK engine execution unit 34 and the host processor 16, respectively, as will be discussed in more detail hereinafter.
  • the host processor 16 writes commands into the command queues 44 and 46 beginning at write address locations stored in the write pointers for the respective command queues (e.g., write pointers 94 and 114). Before writing a command block into a command queue, however, the host processor determines at block 122 whether the write address plus the command block size equals the read address stored in the corresponding read pointer 92 or 112.
  • If the result determined at block 122 is "Yes," then the host processor 16 postpones loading a new command block into the command queue until the cryptographic accelerator processor 14 has incremented the read address. If, however, the result determined at block 122 is "No," then the host processor 16 loads a command block into the command queue at block 124 at the write address associated with the command queue and then increments the write address at block 126 by an amount corresponding to the size of the loaded command block.
  • the host processor 16 need not check the current read address every time a new command block is loaded. Instead, the host processor 16 may check the read address when the write address is getting close to the last value the host processor 16 has for the read address. Checking the read address may be expensive in terms of processor cycles consumed.
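The host-side write path, including the cached read address, might look like the sketch below. It assumes slot-number addressing with wrap-around and treats reading the accelerator's read pointer as the expensive step; `CommandQueue`, `Host`, `try_enqueue`, and the refresh counter are hypothetical names.

```python
class CommandQueue:
    """Circular command queue in system memory, addressed by slot number."""

    def __init__(self, slots: int) -> None:
        self.slots = [None] * slots
        self.read = 0   # advanced by the accelerator
        self.write = 0  # advanced by the host


class Host:
    def __init__(self, queue: CommandQueue) -> None:
        self.queue = queue
        self.cached_read = queue.read  # last read address the host has seen
        self.read_refreshes = 0        # counts the costly pointer reads

    def try_enqueue(self, block) -> bool:
        m = len(self.queue.slots)
        next_write = (self.queue.write + 1) % m
        if next_write == self.cached_read:
            # Queue looks full per the cached value: only now re-read the pointer.
            self.cached_read = self.queue.read
            self.read_refreshes += 1
            if next_write == self.cached_read:
                return False  # block 122 "Yes": postpone loading
        self.queue.slots[self.queue.write] = block  # block 124: load the block
        self.queue.write = next_write               # block 126: advance write address
        return True


queue = CommandQueue(4)
host = Host(queue)
accepted = [host.try_enqueue(b) for b in ("a", "b", "c", "d")]
queue.read = 1                   # simulate the accelerator finishing one block
retry = host.try_enqueue("d")
```

The first three writes never touch the real read pointer; the host consults it only when its cached copy suggests the queue is full.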
  • FIGS. 7 and 8 show embodiments of the E/A command queue 44 and the PK command queue 46, respectively.
  • both the E/A command queue 44 and the PK command queue 46 are configured to hold m command blocks, which each comprise eight, thirty-two bit words.
  • the host processor 16 has written a single command block into the first command block position (i.e., the "0" position) and the write address has been incremented to point to the next empty command block slot.
  • the read and write addresses shown in FIGS. 7 and 8 are based on command block slot numbers for purposes of illustration. These addresses may be converted into absolute addresses by multiplying the command block slot number by 256 and adding the resulting product to the respective base addresses for the command queues, which are stored in the E/A command queue base address register 84 and the PK command queue base address register 104, respectively. Note that the test used at block 122 of FIG. 6 to determine whether a new command block may be loaded into a command queue implies that if a command queue may hold up to m command blocks, then only m - 1 command blocks may be stored in the command queue at the same time.
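The slot-to-address conversion and the m - 1 capacity limit can be checked with a few lines. The multiplier 256 is the one quoted in the text (it equals the size of eight thirty-two bit words in bits); the function names and the wrap-around form of the full test are assumptions for this sketch.

```python
BLOCK_STRIDE = 256  # multiplier quoted in the text for one command block slot


def absolute_address(base: int, slot: int) -> int:
    """Convert a command block slot number into an absolute address."""
    return base + slot * BLOCK_STRIDE


def blocks_pending(read: int, write: int, m: int) -> int:
    """Number of command blocks waiting in a circular queue of m slots."""
    return (write - read) % m


def is_full(read: int, write: int, m: int) -> bool:
    """Wrap-around form of the block 122 test: one more write would hit read."""
    return (write + 1) % m == read


example_address = absolute_address(0x1000, 3)
max_pending = max(blocks_pending(r, w, 8) for r in range(8) for w in range(8))
```

One slot always stays empty because equal read and write addresses are reserved to mean "queue empty"; otherwise a full queue and an empty queue would be indistinguishable.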
  • the cryptographic accelerator processor 14 determines, at block 132, whether the write address is equal to the read address. Specifically, the E/A execution unit 32 determines whether the write address is equal to the read address for the E/A command queue 44 and the PK engine execution unit 34 determines whether the write address is equal to the read address for the PK command queue 46. If the result determined at block 132 is "Yes," then the cryptographic accelerator processor 14 waits until the host processor 16 loads a new command block into the command queue.
  • If the result determined at block 132 is "No," then the cryptographic accelerator processor 14 downloads the command block at the read address associated with the command queue and executes the command block at block 134. In particular embodiments of the present invention, multiple command blocks may be downloaded for execution on the cryptographic accelerator processor 14 at the same time, which may further improve performance. The cryptographic accelerator processor 14 then increments the read address at block 136 by an amount corresponding to the size of the executed command block. Returning to FIGS. 7 and 8, the read addresses are set to point to the first command block slot, which has been loaded with a command block by the host processor 16.
  • the E/A execution unit 32 and the PK engine execution unit 34 may read the command blocks loaded in the E/A command queue 44 and the PK command queue 46, respectively, with only minimal interaction with the host processor 16, e.g., maintenance of the read pointers 92 and 112, and the write pointers 94 and 114.
  • the cryptographic accelerator processor 14 may continue to execute commands located in a circular command queue in system memory until the read address equals the write address for that command queue.
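The accelerator-side loop can be sketched in the same slot-number terms. The `drain` function and the use of `str.upper` as a stand-in for command execution are assumptions; the block numbers in the comments refer to the flowchart steps named in the text.

```python
def drain(slots: list, read: int, write: int, execute):
    """Execute queued command blocks until the read address catches the write address."""
    m = len(slots)
    executed = []
    while read != write:                 # block 132: equal addresses mean nothing to do
        block = slots[read]              # block 134: download the block at the read address
        executed.append(execute(block))  # block 134: execute it
        read = (read + 1) % m            # block 136: advance, wrapping at the queue end
    return read, executed


# The write address has wrapped past the end of a four-slot queue.
slots = ["c0", "c1", None, "c3"]
new_read, outputs = drain(slots, read=3, write=2, execute=str.upper)
```

The loop needs no host involvement beyond the two pointers, which is why it can run in the background.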
  • interaction between the host processor 16 and the cryptographic accelerator processor 14 may be further reduced and overall system performance improved by including load and store commands in the cryptographic accelerator processor's command set.
  • a load command loads one or more operands from the system memory 22 (e.g., the data buffer(s) 47) to the local memory 36 at block 142.
  • the cryptographic accelerator processor 14 then performs one or more operations on the operand(s) at block 144 to generate a result that is stored in the local memory 36.
  • a store command then stores the result in the system memory 22 at block 146.
  • the host processor 16 need not consume processing time downloading operands to the cryptographic accelerator processor 14 and/or uploading results from the cryptographic accelerator processor 14 into the system memory 22.
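A load, operate, store sequence might be modeled as a tiny interpreter. The command names (`LOAD`, `MODEXP`, `STORE`), the tuple encoding, and the dictionary-based memories are assumptions chosen for brevity, not the accelerator's actual command set.

```python
def run_block(commands, system_memory: dict, local_memory: dict) -> None:
    """Interpret a hypothetical command sequence on the accelerator side."""
    for cmd in commands:
        op = cmd[0]
        if op == "LOAD":      # block 142: system memory -> local memory
            _, sys_key, local_off = cmd
            local_memory[local_off] = system_memory[sys_key]
        elif op == "MODEXP":  # block 144: operate on operands held locally
            _, base_off, exp_off, mod_off, result_off = cmd
            local_memory[result_off] = pow(
                local_memory[base_off], local_memory[exp_off], local_memory[mod_off]
            )
        elif op == "STORE":   # block 146: local memory -> system memory
            _, local_off, sys_key = cmd
            system_memory[sys_key] = local_memory[local_off]
        else:
            raise ValueError(f"unknown command: {op}")


system_memory = {"base": 5, "exp": 3, "mod": 13}
local_memory: dict = {}
run_block(
    [
        ("LOAD", "base", 0),
        ("LOAD", "exp", 1),
        ("LOAD", "mod", 2),
        ("MODEXP", 0, 1, 2, 3),
        ("STORE", 3, "result"),
    ],
    system_memory,
    local_memory,
)
```

The host only writes the command sequence; every data movement afterwards is driven by the accelerator.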
  • At least a portion of the operands downloaded from the system memory 22 may be stored in the local memory 36.
  • the result generated in the local memory 36 may also be stored in a result field of a command block, which is located in one of the command queues 44 and 46 in the system memory 22.
  • operands and results may be packed together into the local memory 36, which may conserve storage space. Because there is no wasted space in storing the operands and results in the local memory 36, memory utilization may be improved. If the cryptographic accelerator processor 14 needs to be redesigned to handle larger operands, then the local memory 36 may be easier to resize than resizing several registers.
  • interaction between the host processor 16 and the cryptographic accelerator processor 14 may be further reduced and overall system performance improved by allowing the cryptographic accelerator processor 14 to inform the host processor 16 when command blocks have been executed.
  • the host processor 16 loads a command block into one of the command queues 44 and 46 at block 152.
  • the command block may include an interrupt field, which may be set by the host processor 16 to turn an interrupt request on or off.
  • the cryptographic accelerator processor 14 downloads the command block from one of the command queues 44 and 46 and executes the command block at block 154.
  • the cryptographic accelerator processor 14 may optionally store error information in the command block as shown in FIG. 12B at block 156.
  • the error information may comprise information that is associated with downloading the command block to the cryptographic accelerator processor 14 and/or executing the command block on the cryptographic accelerator processor 14.
  • the cryptographic accelerator processor 14 invokes an interrupt to notify the host processor 16 that the command block has completed.
  • the cryptographic accelerator processor 14 may update a completion field in the command block as shown in FIG. 12C.
  • a periodic interrupt may be defined that upon each occurrence triggers the host processor 16 to check one or more of the command queues 44 and 46 to determine whether any of the command blocks stored therein have been executed by examining their completion fields.
  • the cryptographic accelerator processor 14 may store the results from executing a command block in the command block as shown in FIG. 12D.
  • the host processor 16 may set a timer when storing a command block into a command queue 44, 46. Upon expiration of the timer, the host processor 16 may check to determine whether the command block has been executed.
  • the status of a command block may be determined by the host processor 16 without the need to process an interrupt from the cryptographic accelerator processor 14.
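The two notification paths, an interrupt requested per command block and a completion field that the host polls, can be sketched together. The `CommandBlock` fields, the `accelerator_execute` function, and the "empty payload" failure condition are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommandBlock:
    payload: bytes
    interrupt: bool = False      # set by the host to request an interrupt
    completed: bool = False      # completion field updated by the accelerator
    error: Optional[str] = None  # error information written back by the accelerator


def accelerator_execute(block: CommandBlock, raise_interrupt) -> None:
    """Hypothetical execution step: record status, then notify if required."""
    if not block.payload:
        block.error = "empty payload"  # assumed failure condition
    block.completed = True
    if block.interrupt or block.error:
        raise_interrupt(block)         # interrupt on request or on failure


def poll_completed(queue: list) -> list:
    """Host-side check, e.g. from a periodic interrupt or an expired timer."""
    return [i for i, block in enumerate(queue) if block.completed]


interrupts: list = []
queue = [CommandBlock(b"encrypt", interrupt=True), CommandBlock(b"sign"), CommandBlock(b"")]
for block in queue[:2]:
    accelerator_execute(block, interrupts.append)
done = poll_completed(queue)
accelerator_execute(queue[2], interrupts.append)
```

Polling the completion field lets the host batch its status checks, while the interrupt field remains available for command blocks that need prompt attention.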
  • improved utilization of the system memory 22 may be attained by re-using at least a portion of a command block that contains input data to store a result or output that is generated by an adjunct processor, such as the cryptographic accelerator processor 14, upon executing the command block.
  • exemplary operations begin at block 162 where the host processor loads a command block that includes input data into one of the command queues 44 or 46 in the system memory 22.
  • the command block may include pointers to input data that reside, for example, in the data buffer(s) 47 in the system memory 22.
  • An adjunct processor such as the cryptographic accelerator processor 14, may download the command block and perform one or more operations on the input data to generate a result at block 164. If the command block includes pointers to input data, then the data are separately downloaded to the cryptographic accelerator processor 14 using the input data pointers. The result is then stored in the command block in the system memory 22 at block 166 such that at least a portion of the input data is overwritten.
  • the memory reserved for the command block in the system memory 22 may be reduced because additional storage space need not be reserved to store the result of executing the command block either in the command block or elsewhere in the system memory 22.
  • FIGS. 14A and 14B show an exemplary command block for decrypting an encrypted packet.
  • a command block is shown that comprises a field that contains a hash key for the encrypted packet and another field that contains input information.
  • the cryptographic accelerator processor 14 downloads the command block of FIG. 14A and performs hash operations using the hash key and input information to generate a hash value.
  • this hash value is then stored in the command block in the system memory 22 by overwriting the input information, which is no longer needed once the hash value has been computed.
  • the input information may be one or more pointers to input data stored, for example, in the data buffer(s) 47 in the system memory 22.
  • the command block may include an input pointer field and/or an output pointer field, which are used to identify the location of the encrypted packet in the system memory 22 and the location where the decrypted packet is to be stored in the system memory 22.
  • the cryptographic accelerator processor 14 may use the input pointer to download the encrypted packet from the system memory 22 and may then decrypt the encrypted packet using the hash key and input information to generate a hash value as discussed hereinabove.
  • the input information may be one or more pointers to input data stored, for example, in the data buffer(s) 47 in the system memory 22.
  • the hash value may be attached to the decrypted packet and the decrypted packet with the attached hash value may be stored in the system memory 22 at the address identified by the output pointer field in the command block.
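The in-place reuse of a command block described above can be illustrated with a minimal sketch. The field sizes, the function names, and the use of HMAC-SHA-256 as the "hash operations" are assumptions made only for illustration; the text does not specify a block layout or a hash algorithm.

```python
import hashlib
import hmac

KEY_LEN = 32    # assumed size of the hash key field, in bytes
INPUT_LEN = 64  # assumed size of the input information field, in bytes


def build_command_block(hash_key: bytes, input_info: bytes) -> bytearray:
    """Host side: lay out the hash key field followed by the input field."""
    if len(hash_key) != KEY_LEN or len(input_info) != INPUT_LEN:
        raise ValueError("fields must match the assumed layout")
    return bytearray(hash_key + input_info)


def execute_in_place(block: bytearray) -> int:
    """Adjunct-processor side: hash the input, then overwrite it with the result.

    Returns the number of input bytes that were overwritten.
    """
    hash_key = bytes(block[:KEY_LEN])
    input_info = bytes(block[KEY_LEN:KEY_LEN + INPUT_LEN])
    # HMAC-SHA-256 is a stand-in for the unspecified hash operations.
    digest = hmac.new(hash_key, input_info, hashlib.sha256).digest()
    # The result reuses the space of the input, which is no longer needed,
    # so the block does not grow and no separate result buffer is reserved.
    block[KEY_LEN:KEY_LEN + len(digest)] = digest
    return len(digest)
```

After `execute_in_place` returns, the block has the same length as before, the hash key field is unchanged, and the start of the input field holds the hash value.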
  • Cryptographic processors and/or other types of signal processors and integrated circuits may use a hardware-based random number generator.
  • the cryptographic accelerator processor 14 may include an RNG execution unit 28 that may be used to generate random numbers for use by other execution units of the cryptographic accelerator processor 14 and/or the host processor 16. Exemplary operations that may be used to reduce interaction between the host processor 16 and the cryptographic accelerator processor 14 and to improve overall system performance will be described hereafter. Referring now to FIG.
  • operations begin at block 172 where the cryptographic accelerator processor 14 loads a random number sample into the RN data queue 42 beginning at the write address stored in the write pointer field 74 of the RN data queue pointer register 68 (see FIG. 3).
  • the host processor 16 reads the random number sample in the RN data queue 42 beginning at the read address stored in the read pointer field 72 of the RN data queue pointer register 68 (see FIG. 3).
  • the host processor 16 need not spend time interacting directly with the cryptographic accelerator processor 14 to request blocks of random data and/or reading random data from, for example, one or more registers on the cryptographic accelerator processor 14 chip.
  • the cryptographic accelerator processor 14 determines at block 182 whether the write address plus the random number sample size equals the read address stored in the read pointer field 72. If the result determined at block 182 is "Yes," then the cryptographic processor 14 postpones loading a new random number sample into the RN data queue 42 until the host processor 16 has incremented the read address.
  • the cryptographic processor 14 loads a random number sample into the RN data queue 42 at block 184 at the write address stored in the write pointer field 74 and then increments the write address at block 186 by an amount corresponding to the size of the loaded random number sample.
  • the cryptographic processor 14 may include a register and/or may recognize a command block that may be written to the cryptographic processor 14 that allows the host processor 16 to, for example, provide the cryptographic processor 14 with a random number seed and/or instruct the cryptographic processor 14 to begin generating random numbers.
  • the foregoing operations are illustrated, for example, in FIG. 18, which shows an exemplary embodiment of the RN data queue 42. As shown in FIG. 18, the RN data queue 42 is configured to hold 512 random number samples, each of which comprises 64 bits.
  • the cryptographic processor 14 has written four random number samples into addresses 1 through 4 and the write address has been incremented to point to the next available address, which is empty or contains data that have already been read by the host processor 16.
  • the addresses shown in FIG. 18 are based on random number sample units for purposes of illustration. These addresses may be converted into absolute addresses by multiplying the random number sample number by 64 and adding the resulting product to the respective base address for the RN data queue 42, which is stored in the RN data queue base address register 64. Note that the test used at block 182 of FIG. 17 to determine whether a new random number sample may be loaded into the RN data queue 42 implies that if the RN data queue 42 may hold up to m random number samples, then only m - 1 random number samples may be stored in the RN data queue 42 at the same time. Thus, if the RN data queue 42 is filled to its capacity, then it may hold 32,704 bits (511 64-bit random number samples), which exceeds the 20,000 bits required by the Federal Information Processing Standard (FIPS) 140-1, Security Requirements for Cryptographic Modules, issued January 11, 1994.
  • the host processor 16 determines whether the write address is equal to the read address. If the result determined at block 192 is "Yes," then the host processor 16 waits until the cryptographic accelerator processor 14 loads a new random number sample into the RN data queue 42. If, however, the result determined at block 192 is "No," then the host processor 16 reads the random number sample at the read address stored in the read pointer field 72 at block 194. The host processor 16 then increments the read address at block 196 by an amount corresponding to the size of the random number sample. The host processor 16 need not check the current write address every time a new random number sample is read. Instead, the host processor 16 may check the write address when the read address is getting close to the last value the host processor 16 has for the write address.
  • a cryptographic accelerator processor 14 may provide random number samples for use by a host processor 16 with reduced interaction between the host processor 16 and the cryptographic accelerator processor 14.
  • the host processor 16 need only interact with the cryptographic accelerator processor 14 to update the read address and to check the value of the write address when the read address approaches the last value the host processor 16 has for the write address.
  • the cryptographic accelerator processor 14 may manage the buffering of the random number samples, which may conserve processor cycles of the host processor 16 and may reduce transactions on the system bus 26, which may improve overall system performance.
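The RN data queue protocol described above (the full test at block 182 and the empty test at block 192) behaves like a single-producer, single-consumer ring buffer. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, indices are kept in sample units as in FIG. 18, and wraparound at the end of the queue is assumed because the text does not spell it out.

```python
class RNDataQueue:
    """Ring of fixed-size random number samples shared by two parties."""

    def __init__(self, slots: int = 512, sample_bits: int = 64) -> None:
        self.slots = slots              # m in the text
        self.sample_bits = sample_bits
        self.samples = [0] * slots
        self.read_index = 0             # stands in for read pointer field 72
        self.write_index = 0            # stands in for write pointer field 74

    def try_load(self, sample: int) -> bool:
        """Accelerator side (blocks 182-186): refuse to load when full."""
        next_write = (self.write_index + 1) % self.slots  # assumed wraparound
        if next_write == self.read_index:
            return False                # postpone until the host reads
        self.samples[self.write_index] = sample
        self.write_index = next_write
        return True

    def try_read(self):
        """Host side (blocks 192-196): return None when the queue is empty."""
        if self.write_index == self.read_index:
            return None                 # wait for the accelerator
        sample = self.samples[self.read_index]
        self.read_index = (self.read_index + 1) % self.slots
        return sample

    def capacity_bits(self) -> int:
        """Only m - 1 samples can be stored at the same time."""
        return (self.slots - 1) * self.sample_bits

    def absolute_address(self, base_address: int, sample_index: int) -> int:
        """Follows the text's formula: sample number times 64, plus the base."""
        return base_address + sample_index * self.sample_bits
```

With the default 512 slots of 64 bits, `capacity_bits()` evaluates to 511 × 64 = 32,704, matching the figure given in the text. Keeping one slot unused is what lets "write equals read" mean empty without a separate counter.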
  • ASICs such as the ASIC 202 shown in FIG. 20.
  • the ASIC 202 comprises a plurality of functional units 204, 206, and 208, which are configured to perform specific operations.
  • input commands are provided to the ASIC 202 serially and then routed to the appropriate functional unit 204, 206, and/or 208.
  • the outputs and/or results of executing the input commands are provided serially as command outputs from the ASIC 202.
  • the ASIC 202 typically processes commands sequentially such that a first command must finish before a subsequent command may be processed even if the commands are executed by different functional units.
  • an ASIC 212 includes a plurality of functional units 214, 216, and 218, each of which receives command inputs through its own command interface and generates outputs and/or results that may be communicated to another processor through the command interface.
  • the functional units 214, 216, and 218 may operate independently and in parallel, thereby improving the performance of a cryptographic data processing system.
  • the functional units 214, 216, and 218 may comprise the E/A execution unit 32, the RNG execution unit 28, and the PK engine execution unit 34.
  • the E/A execution unit 32 comprises a command interface manager 222, the RNG execution unit 28 comprises a command interface manager 224, and the PK engine execution unit 34 comprises a command interface manager 226.
  • These respective command interface managers 222, 224, and 226 may be used to receive input command blocks from the E/A command queue 44, to transmit random number samples to the RN data queue 42, and to receive input command blocks from the PK command queue 46, respectively, and to allow the respective execution units 28, 32, and 34 to perform operations in parallel.
  • Operations begin at block 232 where one or more command blocks are provided to each of the functional units, such as, for example, by providing command blocks in the E/A command queue 44 and the PK command queue 46 for the E/A execution unit 32 and the PK engine execution unit 34, respectively.
  • the command blocks are simultaneously executed by the functional units by accessing the command blocks in parallel through, for example, the command interface manager 222 and the command interface manager 226, which are associated with the E/A execution unit 32 and the PK engine execution unit 34, respectively.
  • command blocks may be provided to the cryptographic processor 14 in serial fashion over the system bus 24. Nevertheless, the cryptographic processor 14 may distribute command blocks to the command interface managers 222, 224, and 226 associated with the execution units 32, 28, and 34, which may then process the command blocks in parallel.
  • exemplary embodiments of the present invention have been discussed hereinabove in which operations related to random number generation, encryption/authentication, and public key generation are performed in parallel based on functional units defined therefor. It will be understood that the operations that may be performed in parallel may be adjusted based on requirements and/or needs.
  • commands may be provided to the command interface managers in a variety of ways.
  • a processor may write commands directly to the command interface managers or, alternatively, commands may be stored in a memory and the command interface managers may be provided with the addresses where they may retrieve the stored commands for execution.
  • the total number of operations that may be performed may be increased and the average latency for completing operations may be reduced.
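The arrangement described above, in which commands arrive serially but are distributed to per-unit command interfaces and executed in parallel, can be sketched as follows. This is an illustration only: the function name, the unit names, and the use of threads with one inbox per unit are assumptions, not the hardware design in the text.

```python
import queue
import threading

_STOP = object()  # sentinel telling a unit that no more commands will arrive


def run_units(commands, handlers):
    """Route serially received commands to per-unit inboxes and run them.

    commands: iterable of (unit_name, payload) pairs in arrival order.
    handlers: dict mapping unit_name to a callable that executes a payload.
    Returns a dict mapping unit_name to its results in per-unit order.
    """
    commands = list(commands)
    for name, _ in commands:
        if name not in handlers:
            raise KeyError(f"no functional unit named {name!r}")

    inboxes = {name: queue.Queue() for name in handlers}
    results = {name: [] for name in handlers}

    def interface_manager(name):
        # Each unit drains only its own inbox, so a long command on one
        # unit does not hold up commands queued for another unit.
        while True:
            payload = inboxes[name].get()
            if payload is _STOP:
                break
            results[name].append(handlers[name](payload))

    workers = [threading.Thread(target=interface_manager, args=(name,))
               for name in handlers]
    for worker in workers:
        worker.start()
    for name, payload in commands:      # serial arrival, parallel execution
        inboxes[name].put(payload)
    for name in handlers:
        inboxes[name].put(_STOP)
    for worker in workers:
        worker.join()
    return results
```

A call such as `run_units([("EA", b"abc"), ("PK", 3)], {"EA": lambda b: b[::-1], "PK": lambda x: x * x})` uses placeholder handlers in place of real cryptographic operations. In CPython, threads show the structure rather than a true speedup for CPU-bound work; the sketch is meant to mirror the routing, not the performance.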
  • each block may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the blocks may occur out of the order noted in FIGS. 2, 6, 9-11, 13, 16, 17, 19, and 23.
  • two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending on the functionality involved.

Abstract

Embodiments of cryptographic data processing systems, computer program products, and methods of operating same are provided. For example, cryptographic data processing systems include a host processor, a system memory coupled to the host processor, and a cryptographic processor integrated circuit having a local memory. At least one operand is downloaded into the local memory from the system memory, and the cryptographic processor executes an instruction that references one of the downloaded operands by a first relative position in the local memory. Operands and results may be packed together in the local memory, which may conserve storage space. In other embodiments, separate command interfaces are respectively associated with the execution units in the cryptographic processor. The execution units are respectively provided with command blocks, which are executed simultaneously by the plurality of execution units. The total number of operations that may be performed may be increased, and the average latency for completing those operations may be reduced, by performing the operations in parallel using multiple functional units.
PCT/US2001/015176 2000-05-11 2001-05-10 Cryptographic data processing systems, computer program products, and methods of operating same in which multiple cryptographic execution units execute commands from a host processor in parallel WO2001086432A2 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001266571A AU2001266571A1 (en) 2000-05-11 2001-05-10 Cryptographic data processing systems, computer program products, and methods of operating same, using parallel execution units

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US20346500P 2000-05-11 2000-05-11
US20340900P 2000-05-11 2000-05-11
US60/203,409 2000-05-11
US60/203,465 2000-05-11

Publications (2)

Publication Number Publication Date
WO2001086432A2 true WO2001086432A2 (fr) 2001-11-15
WO2001086432A3 WO2001086432A3 (fr) 2002-07-18

Family

ID=26898582

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/US2001/015176 WO2001086432A2 (fr) 2000-05-11 2001-05-10 Cryptographic data processing systems, computer program products, and methods of operating same in which multiple cryptographic execution units execute commands from a host processor in parallel
PCT/US2001/015180 WO2001086430A2 (fr) 2000-05-11 2001-05-10 Cryptographic data processing systems, computer program products, and methods of operating same that use system memory to transfer information between a host processor and an adjunct processor

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/US2001/015180 WO2001086430A2 (fr) 2000-05-11 2001-05-10 Cryptographic data processing systems, computer program products, and methods of operating same that use system memory to transfer information between a host processor and an adjunct processor

Country Status (2)

Country Link
AU (2) AU2001266572A1 (fr)
WO (2) WO2001086432A2 (fr)

Also Published As

Publication number Publication date
WO2001086430A2 (fr) 2001-11-15
AU2001266571A1 (en) 2001-11-20
WO2001086430A3 (fr) 2002-10-17
WO2001086432A3 (fr) 2002-07-18
AU2001266572A1 (en) 2001-11-20

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
121 EP: the EPO has been informed by WIPO that EP was designated in this application
AK Designated states

Kind code of ref document: A3

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase in:

Ref country code: JP