CN104012031B - Instruction for performing JH keyed hash - Google Patents
- Publication number: CN104012031B
- Application number: CN201180075719.6A
- Authority: CN (China)
- Prior art keywords: instruction, register, states, stored, box
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/64—Protecting data integrity, e.g. using checksums, certificates or signatures
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/06—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
- H04L9/0643—Hash functions, e.g. MD5, SHA, HMAC or f9 MAC
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2209/00—Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
- H04L2209/12—Details relating to cryptographic hardware or logic circuitry
Abstract
A method is described. The method includes executing one or more JH_SBOX_L instructions to perform S-Box mappings and a linear (L) transformation on a JH state and, once the S-Box mappings and the L transformation have been performed, executing one or more JH_Permute instructions to perform a permutation function on the JH state.
Description
Technical field
This disclosure relates to cryptographic algorithms, and in particular to the JH hashing algorithm.
Background
Cryptography is a tool that relies on an algorithm and a key to protect information. The algorithm is a complex mathematical algorithm and the key is a string of bits. There are two basic types of cryptographic systems: secret-key systems and public-key systems. A secret-key system, also referred to as a symmetric system, has a single key (the "secret key") that is shared by two or more parties. The single key is used both to encrypt and to decrypt information.
The JH hash function (JH) is a cryptographic function submitted to the National Institute of Standards and Technology (NIST) hash function competition, which was held to develop a new SHA-3 function to replace the older SHA-1 and SHA-2. JH comprises four variants (JH-224, JH-256, JH-384 and JH-512), which produce digests of different sizes. However, every variant of JH implements the same compression function.
Currently, JH can be performed on a general-purpose processor using instructions in the Streaming SIMD Extensions (SSE) or the Advanced Vector Extensions (AVX). However, such an implementation requires the execution of up to 30 instructions to perform the JH algorithm.
Brief description of the drawings
A better understanding of the invention can be obtained from the following detailed description in conjunction with the drawings, in which:
Fig. 1 is a block diagram illustrating one embodiment of a system;
Fig. 2 is a block diagram illustrating one embodiment of a processor;
Fig. 3 is a block diagram illustrating one embodiment of packed data registers;
Fig. 4 illustrates one embodiment of a resulting nibble permutation;
Fig. 5 is a flow diagram illustrating one embodiment of a process performed by an instruction;
Fig. 6 is a flow diagram illustrating one embodiment of a process performed by an instruction;
Fig. 7 illustrates one embodiment of two rounds of JH using the instructions;
Fig. 8 is a block diagram of a register architecture according to one embodiment of the invention;
Fig. 9A is a block diagram of a single CPU core, along with its connection to the on-die interconnect network and its local subset of the level 2 (L2) cache, according to embodiments of the invention;
Fig. 9B is an exploded view of part of the CPU core according to embodiments of the invention;
Fig. 10 is a block diagram illustrating an exemplary out-of-order architecture according to embodiments of the invention;
Fig. 11 is a block diagram of a system according to one embodiment of the invention;
Fig. 12 is a block diagram of a second system according to an embodiment of the invention;
Fig. 13 is a block diagram of a third system according to an embodiment of the invention;
Fig. 14 is a block diagram of a system on a chip (SoC) according to an embodiment of the invention;
Fig. 15 is a block diagram of a single-core processor and a multi-core processor with an integrated memory controller and graphics according to embodiments of the invention; and
Fig. 16 is a block diagram contrasting the use of a software instruction converter to convert binary instructions in a source instruction set to binary instructions in a target instruction set according to embodiments of the invention.
Detailed description
In the following description, for purposes of explanation, numerous specific details are set forth to provide a thorough understanding of the invention. It will be apparent to one skilled in the art, however, that the invention may be practiced without some of these specific details. In other instances, well-known structures and devices are shown in block diagram form to avoid obscuring the underlying principles of the invention.
Reference in this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" in various places in this specification are not necessarily all referring to the same embodiment.
A mechanism including instructions to process the JH hashing algorithm is described. According to one embodiment, the JH hashing algorithm is implemented via instructions in the AVX instruction set. The AVX instruction set is an extension to the x86 instruction set architecture (ISA) that widens the register file beyond 128 bits.
Fig. 1 is a block diagram of one embodiment of a system 100 that includes an AVX instruction set extension for performing JH encryption and decryption in a general-purpose processor.
System 100 includes a processor 101, a memory controller hub (MCH) 102 and an input/output (I/O) controller hub (ICH) 104. The MCH 102 includes a memory controller 106 that controls communication between the processor 101 and memory 108. The processor 101 and the MCH 102 communicate over a system bus 116.
The processor 101 may be any one of a plurality of processors, such as a single-core processor, a single-core Intel Celeron processor, an XScale processor, or a multi-core processor such as a Pentium D, a Core i3, i5, i7, 2 Duo or Quad processor, or any other type of processor.
The memory 108 may be dynamic random access memory (DRAM), static random access memory (SRAM), synchronous dynamic random access memory (SDRAM), double data rate 2 (DDR2) RAM, Rambus dynamic random access memory (RDRAM), or any other type of memory.
The ICH 104 may be coupled to the MCH 102 using a high-speed chip-to-chip interconnect 114 such as Direct Media Interface (DMI). DMI supports concurrent transfer rates of 2 gigabits/second via two unidirectional lanes.
The ICH 104 may include a storage I/O controller 110 for controlling communication with at least one storage device 112 coupled to the ICH 104. The storage device may be, for example, a disk drive, a digital versatile disc (DVD) drive, a compact disc (CD) drive, a redundant array of independent disks (RAID), a tape drive or another storage device. The ICH 104 may communicate with the storage device 112 over a storage protocol interconnect 118 using a serial storage protocol such as Serial Attached Small Computer System Interface (SAS) or Serial Advanced Technology Attachment (SATA).
In one embodiment, the processor 101 includes a JH function 103 for performing JH encryption and decryption operations. The JH function 103 may be used to encrypt or decrypt information stored in the memory 108 and/or in the storage device 112.
Fig. 2 is a block diagram illustrating one embodiment of the processor 101. The processor 101 includes a fetch and decode unit 202 for decoding processor instructions received from a level 1 (L1) instruction cache 202. Data to be used in executing an instruction may be stored in a register file 208. In one embodiment, the register file 208 includes a plurality of registers that AVX instructions use to store the data they operate on.
Fig. 3 is a block diagram of an example embodiment of a suitable set of packed data registers in the register file 208. The illustrated packed data registers include thirty-two 512-bit packed data or vector registers, labeled ZMM0 through ZMM31. In the illustrated embodiment, the low-order 256 bits of the lower sixteen of these registers (namely ZMM0-ZMM15) are aliased or overlaid on respective 256-bit packed data or vector registers (labeled YMM0-YMM15), although this is not required.
Likewise, in the illustrated embodiment, the low-order 128 bits of YMM0-YMM15 are aliased or overlaid on respective 128-bit packed data or vector registers (labeled XMM0-XMM15), although this is not required either. The 512-bit registers ZMM0 through ZMM31 can hold 512-bit, 256-bit or 128-bit packed data.
The 256-bit registers YMM0-YMM15 can hold 256-bit or 128-bit packed data. The 128-bit registers XMM0-XMM15 can hold 128-bit packed data. Each register may be used to store either packed floating-point data or packed integer data. Different data element sizes are supported, including at least 8-bit byte data, 16-bit word data, 32-bit doubleword or single-precision floating-point data, and 64-bit quadword or double-precision floating-point data. Alternative embodiments of packed data registers may include different numbers of registers and different sizes of registers, and may or may not alias larger registers on smaller registers.
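The aliasing described above can be modeled in a few lines. This is only an illustrative sketch: it treats a register value as a Python integer and assumes, as in the illustrated embodiment, that YMMn and XMMn are the low-order 256 and 128 bits of ZMMn. The helper names `ymm_view` and `xmm_view` are hypothetical.

```python
# Illustrative model of register aliasing; not an emulator.
YMM_BITS, XMM_BITS = 256, 128

def ymm_view(zmm: int) -> int:
    """Return the low-order 256 bits of a 512-bit register value."""
    return zmm & ((1 << YMM_BITS) - 1)

def xmm_view(zmm: int) -> int:
    """Return the low-order 128 bits of a 512-bit register value."""
    return zmm & ((1 << XMM_BITS) - 1)

# A value with bits set in the upper, middle and lower regions of a ZMM register.
zmm0 = (0xAA << 500) | (0xBB << 200) | 0xCC
assert ymm_view(zmm0) == (0xBB << 200) | 0xCC  # upper 256 bits are not visible
assert xmm_view(zmm0) == 0xCC                  # only the low 128 bits remain
```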
Referring back to Fig. 2, the fetch and decode unit 202 fetches macroinstructions from the L1 instruction cache 202, decodes them, and breaks them into simple operations called micro-operations (μops). The execution unit 210 schedules and executes the micro-operations. In the embodiment shown, the JH function 103 in the execution unit 210 includes the micro-operations for the AVX instructions. The retirement unit 212 writes the results of executed instructions to registers or memory.
The JH function 103 performs a compression function that consists of three functions run for 42 rounds. The first function is an S-Box function, which applies one of two transformations (S0 and S1) to adjacent 4-bit nibbles. Table 1 shows one embodiment of the S-Box transformations S0(x) and S1(x).
Table 1
x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
S0(x) | 9 | 0 | 4 | 11 | 13 | 12 | 3 | 15 | 1 | 10 | 2 | 6 | 7 | 5 | 8 | 14 |
S1(x) | 3 | 12 | 6 | 13 | 5 | 7 | 1 | 9 | 15 | 2 | 0 | 4 | 11 | 10 | 14 | 8 |
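As a small illustration, the two S-Boxes in Table 1 can be written as lookup tables. The sketch below copies the values from Table 1; the helper `sbox` and its `select` argument (0 for S0, 1 for S1) are hypothetical names used only for illustration.

```python
# S-Box tables copied from Table 1.
S0 = [9, 0, 4, 11, 13, 12, 3, 15, 1, 10, 2, 6, 7, 5, 8, 14]
S1 = [3, 12, 6, 13, 5, 7, 1, 9, 15, 2, 0, 4, 11, 10, 14, 8]

def sbox(nibble: int, select: int) -> int:
    """Apply S0 when select is 0 and S1 when select is 1 (hypothetical helper)."""
    return (S1 if select else S0)[nibble & 0xF]

# Both tables are permutations of 0..15, so each S-Box is invertible.
assert sorted(S0) == list(range(16))
assert sorted(S1) == list(range(16))
```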
The second function is a linear transformation (L), which implements a (4, 2, 3) maximum distance separable (MDS) code over GF(2^4), where multiplication in GF(2^4) is defined as multiplication of binary polynomials modulo the irreducible polynomial X^4 + X + 1. The linear transformation is performed on adjacent 8-bit bytes (that is, on two adjacent S-Box outputs). Let A, B, C and D denote 4-bit words; L then transforms (A, B) into (C, D), i.e. (C, D) = L(A, B) = (5A + 2B, 2A + B). The function (C, D) = L(A, B) is therefore computed as:
D0=B0 ⊕ A1;D1=B1 ⊕ A2;
D2=B2 ⊕ A3 ⊕ A0;D3=B3 ⊕ A0;
C0=A0 ⊕ D1;C1=A1 ⊕ D2;
C2=A2 ⊕ D3 ⊕ D0;C3=A3 ⊕ D0.
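The bit equations above can be checked against the field formula (5A + 2B, 2A + B). The following is a minimal sketch, assuming that bit index 0 in the equations denotes the most significant bit of each 4-bit word (the ordering under which the equations and the field formula agree); the function names are hypothetical.

```python
def gf16_double(x: int) -> int:
    """Multiply a 4-bit value by 2 (i.e. by X) modulo X^4 + X + 1."""
    x <<= 1
    if x & 0x10:      # a degree-4 term appeared, so reduce
        x ^= 0x13     # 0x13 encodes X^4 + X + 1
    return x

def linear_gf(a: int, b: int) -> tuple[int, int]:
    """(C, D) = (5A + 2B, 2A + B), using 5A + 2B = A + 2(2A + B) = A + 2D."""
    d = b ^ gf16_double(a)
    c = a ^ gf16_double(d)
    return c, d

def linear_bits(a: int, b: int) -> tuple[int, int]:
    """The same transform via the bit equations; index 0 is assumed to be the MSB."""
    A = [(a >> (3 - i)) & 1 for i in range(4)]
    B = [(b >> (3 - i)) & 1 for i in range(4)]
    D = [B[0] ^ A[1], B[1] ^ A[2], B[2] ^ A[3] ^ A[0], B[3] ^ A[0]]
    C = [A[0] ^ D[1], A[1] ^ D[2], A[2] ^ D[3] ^ D[0], A[3] ^ D[0]]
    pack = lambda bits: sum(bit << (3 - i) for i, bit in enumerate(bits))
    return pack(C), pack(D)

# The two formulations agree on all 256 input pairs.
assert all(linear_gf(a, b) == linear_bits(a, b)
           for a in range(16) for b in range(16))
```

Computing D first and reusing it for C mirrors the equations, in which every C bit is expressed in terms of D bits.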
The third function is a permutation function (P_d). P_d is a simple permutation on 2^d elements, constructed from π_d (swapping of alternate nibbles), P'_d (permuting the nibbles from the lower and upper halves of the state) and φ_d (swapping of nibbles in the upper half of the state). Fig. 4 shows one embodiment of the resulting nibble permutation P_d(π_d, P'_d, φ_d) for d = 4 in a 64-bit datapath, where d is the dimension of a block. In one embodiment, the JH function uses d = 8, for a data width of 256 4-bit nibbles (or 1024 bits).
In conventional systems, JH is bit-sliced rather than operating on the nibbles within bytes. Bit slicing splits the bits of the nibbles into separate words, which allows all S-Box nibbles to be evaluated in parallel via SSE/AVX instructions. Further, bit slicing in combination with alternating odd and even SBOX registers enables evaluation of the SBOX and L transformations. In a bit-sliced implementation there is no need to perform the full permutation in every round. Specifically, the appropriate odd S-Boxes are permuted into position to operate with the appropriate even S-Boxes in the next round. This is accomplished using 7 swapping permutations, repeated 6 times for the 42 JH rounds.
Although the bit-slicing approach allows all SBOX computations and L transformations to be performed in parallel, it needs 20 instructions to perform the 23 logic functions of the SBOX logic, and 10 instructions for the 10 exclusive-OR (XOR) functions that make up the L transformation (using two-operand XOR). Such performance can be improved.
According to one embodiment, two new instructions and datapaths are defined that operate on 4-bit nibbles and on nibble pairs, using the 512-bit ZMM registers in the register file 208 to perform the SBOX and L transformation functions. In such an embodiment, the 1024-bit state is stored in two ZMM registers, with nibbles 0-127 in the first ZMM register and nibbles 128-255 in the second ZMM register.
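The split of the 1024-bit state across two 512-bit registers can be pictured as follows. This is a sketch only: the text fixes which nibbles go in which register, but the placement of nibbles within each byte is an assumption made here for illustration, and the function names are hypothetical.

```python
def split_state(nibbles: list[int]) -> tuple[list[int], list[int]]:
    """Split 256 nibbles into the halves held by the first and second register."""
    assert len(nibbles) == 256
    return nibbles[:128], nibbles[128:]

def pack_half(half: list[int]) -> bytes:
    """Pack 128 nibbles into 64 bytes (512 bits).
    Assumption: the even-indexed nibble occupies the high 4 bits of each byte."""
    return bytes((half[i] << 4) | half[i + 1] for i in range(0, 128, 2))

state = [i & 0xF for i in range(256)]   # arbitrary example state
low, high = split_state(state)
assert len(pack_half(low)) * 8 == 512   # one half fills one ZMM-sized register
assert len(pack_half(high)) * 8 == 512
```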
The new instruction and datapath JH_SBOX_L is defined as JH_SBOX_L ZMM, ZMMmask. Fig. 5 is a flow diagram illustrating one embodiment of the process performed by the JH_SBOX_L instruction. As described above, the 1024 state bits are organized contiguously from 0 to 1023 (as represented in the JH specification) in two ZMM registers.
In processing block 510, a 512-bit section representing 1/2 of the state bits is retrieved from a ZMM register. In processing block 520, the S-Box and L transformations are performed on the retrieved state bits. In one embodiment, the S-Box function is performed using mask information from ZMMmask. In one embodiment, ZMMmask represents constants from the JH specification (A.2, round constants in the bit-slice implementation of E8). Using the ZMM, the 256 bits can be derived for each round by an odd-even bit interleave.
Once the S-Box operations are complete, the L transformation is performed on each 8-bit nibble pair. In processing block 530, the 512-bit result of the transformations is stored in the destination register. The JH_SBOX_L instruction is executed twice (for the low 512 bits, then for the high 512 bits) to complete one round of S-Box and L transformations for the full JH state.
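The behavior of one JH_SBOX_L execution can be modeled in software. The sketch below is not the hardware datapath: it assumes one S-Box select bit per nibble (how those bits are derived from ZMMmask is not modeled), uses the Table 1 values, and implements L through the field formula (5A + 2B, 2A + B). All function names are hypothetical.

```python
S0 = [9, 0, 4, 11, 13, 12, 3, 15, 1, 10, 2, 6, 7, 5, 8, 14]
S1 = [3, 12, 6, 13, 5, 7, 1, 9, 15, 2, 0, 4, 11, 10, 14, 8]

def gf16_double(x: int) -> int:
    """Multiply by 2 in GF(2^4) modulo X^4 + X + 1."""
    x <<= 1
    return x ^ 0x13 if x & 0x10 else x

def linear(a: int, b: int) -> tuple[int, int]:
    """(C, D) = (5A + 2B, 2A + B)."""
    d = b ^ gf16_double(a)
    return a ^ gf16_double(d), d

def jh_sbox_l(half_state: list[int], select_bits: list[int]) -> list[int]:
    """Model of one JH_SBOX_L execution on 128 nibbles (one 512-bit half)."""
    # Step 1: S-Box each nibble, choosing S0 or S1 per select bit.
    v = [(S1 if s else S0)[n] for n, s in zip(half_state, select_bits)]
    # Step 2: apply L to each adjacent nibble pair.
    out = []
    for i in range(0, len(v), 2):
        out.extend(linear(v[i], v[i + 1]))
    return out

# All-zero input with S0 everywhere: S0(0) = 9 and L(9, 9) = (10, 8).
assert jh_sbox_l([0] * 128, [0] * 128) == [10, 8] * 64
```

A full round would call this model twice, once per half of the state, matching the two executions described above.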
The JH_Permute instruction and datapath is implemented to perform the permutation step P_d on each of the ZMM registers that hold the results of the S-Box and L transformations. In one embodiment, the JH_Permute instruction is defined as JH_Permute ZMM1, ZMM2, imm8, where ZMM1 stores the pre-permutation low 128 nibbles (e.g., 512 bits), ZMM2 stores the pre-permutation high 128 nibbles, and imm8 = 0/1 specifies the low or high nibbles.
Fig. 6 is a flow diagram illustrating one embodiment of the process performed by the JH_PD instruction. In processing block 550, the pre-permutation 1/2 section of the JH state is retrieved from the ZMM register indicated by imm8. In processing block 560, the permutation is performed on the retrieved bits. In processing block 570, the result of the permutation is stored. The JH_Permute instruction is executed twice to complete the permutation for one round. Fig. 7 illustrates two of the 42 JH rounds using the instructions described above.
The JH instructions described above are implemented with a three-cycle pipelined datapath. One round of JH is therefore completed in 8 cycles (e.g., two executions of each of the JH_SBOX_L and JH_Permute instructions). This yields a 2-3x performance improvement over the bit-slicing approach.
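As a rough back-of-the-envelope check, the per-round figure can be scaled to a whole compression. The round count and cycles per round come from the text above; the 64-byte message block size is an assumption taken from the JH specification, and the estimate ignores loads, stores and the message XORs around the rounds.

```python
ROUNDS = 42              # rounds per compression (from the text)
CYCLES_PER_ROUND = 8     # two JH_SBOX_L plus two JH_Permute executions (from the text)
BLOCK_BYTES = 64         # assumption: JH compresses one 512-bit message block per call

cycles_per_block = ROUNDS * CYCLES_PER_ROUND
cycles_per_byte = cycles_per_block / BLOCK_BYTES

print(f"{cycles_per_block} cycles per block, {cycles_per_byte} cycles per byte")
```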
Exemplary register architecture - Fig. 8
Fig. 8 is a block diagram illustrating a register architecture 800 according to one embodiment of the invention. The register files and registers of the register architecture are listed below:
Vector register file 810 - in the embodiment illustrated, there are 32 vector registers that are 512 bits wide; these registers are referenced as zmm0 through zmm31. The low-order 256 bits of the lower 16 zmm registers are overlaid on registers ymm0-15. The low-order 128 bits of the lower 16 zmm registers (the low-order 128 bits of the ymm registers) are overlaid on registers xmm0-15.
Write mask registers 815 - in the embodiment illustrated, there are 8 write mask registers (k0 through k7), each 64 bits in size. In one embodiment of the invention, the vector mask register k0 cannot be used as a write mask; when the encoding that would normally indicate k0 is used for a write mask, it selects a hardwired write mask of 0xFFFF, effectively disabling write masking for that instruction.
Multimedia Extensions Control Status Register (MXCSR) 1020 - in the embodiment illustrated, this 32-bit register provides the status and control bits used in floating-point operations.
General-purpose registers 825 - in the embodiment illustrated, there are sixteen 64-bit general-purpose registers that are used along with the existing x86 addressing modes to address memory operands. These registers are referenced by the names RAX, RBX, RCX, RDX, RBP, RSI, RDI, RSP, and R8 through R15.
Extended flags (EFLAGS) register 830 - in the embodiment illustrated, this 32-bit register is used to record the results of many instructions.
Floating-point control word (FCW) register 835 and floating-point status word (FSW) register 840 - in the embodiment illustrated, these registers are used by the x87 instruction set extensions to set rounding modes, exception masks and flags in the case of the FCW, and to keep track of exceptions in the case of the FSW.
Scalar floating-point stack register file (x87 stack) 845, on which is aliased the MMX packed integer flat register file 850 - in the embodiment illustrated, the x87 stack is an eight-element stack used to perform scalar floating-point operations on 32/64/80-bit floating-point data using the x87 instruction set extension, while the MMX registers are used to perform operations on 64-bit packed integer data and to hold operands for some operations performed between the MMX and XMM registers.
Segment registers 855 - in the embodiment illustrated, there are six 16-bit registers used to store data for segmented address generation.
RIP register 865 - in the embodiment illustrated, this 64-bit register stores the instruction pointer.
Alternative embodiments of the invention may use wider or narrower registers. Additionally, alternative embodiments of the invention may use more, fewer, or different register files and registers.
Exemplary in-order processor architecture - Figs. 9A-9B
Figs. 9A-B illustrate a block diagram of an exemplary in-order processor architecture. These exemplary embodiments are designed around multiple instantiations of an in-order CPU core that is augmented with a wide vector processor (VPU). Depending on the application, the cores communicate through a high-bandwidth interconnect network with some fixed-function logic, memory I/O interfaces, and other necessary I/O logic. For example, an implementation of this embodiment as a stand-alone GPU would typically include a PCIe bus.
Fig. 9A is a block diagram of a single CPU core, along with its connection to the on-die interconnect network 902 and its local subset of the level 2 (L2) cache 904, according to embodiments of the invention. An instruction decoder 900 supports the x86 instruction set with an extension. While in one embodiment of the invention (to simplify the design) a scalar unit 908 and a vector unit 910 use separate register sets (scalar registers 912 and vector registers 914, respectively), and data transferred between them is written to memory and then read back in from a level 1 (L1) cache 906, alternative embodiments may use a different approach (e.g., use a single register set, or include a communication path that allows data to be transferred between the two register files without being written and read back).
The L1 cache 906 gives the scalar and vector units low-latency access to cache memory. Together with load-op instructions in the vector friendly instruction format, this means that the L1 cache 906 can be treated somewhat like an extended register file. This significantly improves the performance of many algorithms.
The local subset of the L2 cache 904 is part of a global L2 cache that is divided into separate local subsets, one per CPU core. Each CPU has a direct access path to its own local subset of the L2 cache 904. Data read by a CPU core is stored in its L2 cache subset 904 and can be accessed quickly, in parallel with other CPU cores accessing their own local L2 cache subsets. Data written by a CPU core is stored in its own L2 cache subset 904 and is flushed from other subsets if necessary. The ring network ensures coherency for shared data.
Fig. 9B is an exploded view of part of the CPU core in Fig. 9A according to embodiments of the invention. Fig. 9B includes an L1 data cache 906A, which is part of the L1 cache 906, as well as more detail regarding the vector unit 910 and the vector registers 914. Specifically, the vector unit 910 is a 16-wide vector processing unit (VPU) (see the 16-wide ALU 928), which executes integer, single-precision floating-point and double-precision floating-point instructions. The VPU supports swizzling of the register inputs by a swizzle unit 920, numeric conversion by numeric convert units 922A-B, and replication of the memory input by a replicate unit 924. Write mask registers 926 allow the resulting vector writes to be predicated.
Register data can be swizzled in a variety of ways, e.g., to support matrix multiplication. Data from memory can be replicated across the VPU lanes. This is a common operation in both graphics and non-graphics parallel data processing, and it significantly increases cache efficiency.
The ring network is bidirectional, which allows agents such as CPU cores, L2 caches and other logic blocks to communicate with each other within the chip. Each ring datapath is 1012 bits wide per direction.
Exemplary out-of-order architecture - Fig. 10
Fig. 10 is a block diagram illustrating an exemplary out-of-order architecture according to embodiments of the invention. Specifically, Fig. 10 illustrates a well-known exemplary out-of-order architecture that has been modified to incorporate the vector friendly instruction format and its execution. In Fig. 10, arrows denote a coupling between two or more units, and the direction of an arrow indicates the direction of data flow between those units. Fig. 10 includes a front end unit 1005 coupled to an execution engine unit 1010 and a memory unit 1015; the execution engine unit 1010 is further coupled to the memory unit 1015.
The front end unit 1005 includes a level 1 (L1) branch prediction unit 1020 coupled to a level 2 (L2) branch prediction unit 1022. The L1 and L2 branch prediction units 1020 and 1022 are coupled to an L1 instruction cache unit 1024. The L1 instruction cache unit 1024 is coupled to an instruction translation lookaside buffer (TLB) 1026, which is further coupled to an instruction fetch and predecode unit 1028. The instruction fetch and predecode unit 1028 is coupled to an instruction queue unit 1030, which is further coupled to a decode unit 1032. The decode unit 1032 comprises a complex decoder unit 1034 and three simple decoder units 1036, 1038 and 1040. The decode unit 1032 includes a microcode ROM unit 1042. The decode unit 1032 may operate as previously described for the decode stage. The L1 instruction cache unit 1024 is further coupled to an L2 cache unit 1048 in the memory unit 1015. The instruction TLB unit 1026 is further coupled to a second-level TLB unit 1046 in the memory unit 1015. The decode unit 1032, the microcode ROM unit 1042 and a loop stream detector (LSD) unit 1044 are each coupled to a rename/allocator unit 1056 in the execution engine unit 1010.
The execution engine unit 1010 includes the rename/allocator unit 1056, which is coupled to a retirement unit 1074 and a unified scheduler unit 1058. The retirement unit 1074 is further coupled to execution units 1060 and includes a reorder buffer unit 1078. The unified scheduler unit 1058 is further coupled to a physical register files unit 1076, which is coupled to the execution units 1060. The physical register files unit 1076 comprises a vector registers unit 1077A, a write mask registers unit 1077B and a scalar registers unit 1077C; these register units may provide the vector registers 810, the vector mask registers 815 and the general-purpose registers 825; and the physical register files unit 1076 may include additional register files not shown (e.g., the scalar floating-point stack register file 845 aliased on the MMX packed integer flat register file 850). The execution units 1060 include three mixed scalar and vector units 1062, 1064 and 1072; a load unit 1066; a store address unit 1068; and a store data unit 1070. The load unit 1066, the store address unit 1068 and the store data unit 1070 are each further coupled to a data TLB unit 1052 in the memory unit 1015.
The memory unit 1015 includes the second-level TLB unit 1046, which is coupled to the data TLB unit 1052. The data TLB unit 1052 is coupled to an L1 data cache unit 1054. The L1 data cache unit 1054 is further coupled to the L2 cache unit 1048. In some embodiments, the L2 cache unit 1048 is further coupled to L3 and higher-level cache units 1050 inside and/or outside of the memory unit 1015.
By way of example, the exemplary out-of-order architecture may implement the process pipeline 8200 as follows: 1) the instruction fetch and predecode unit 1028 performs the fetch and length decoding stages; 2) the decode unit 1032 performs the decode stage; 3) the rename/allocator unit 1056 performs the allocation stage and the renaming stage; 4) the unified scheduler 1058 performs the schedule stage; 5) the physical register files unit 1076, the reorder buffer unit 1078 and the memory unit 1015 perform the register read/memory read stage; the execution units 1060 perform the execute/data transform stage; 6) the memory unit 1015 and the reorder buffer unit 1078 perform the write back/memory write stage 1960; 7) the retirement unit 1074 performs the ROB read stage; 8) various units may be involved in the exception handling stage; and 9) the retirement unit 1074 and the physical register files unit 1076 perform the commit stage.
Exemplary computer systems and processors - Figs. 11-13
Figs. 11-13 illustrate exemplary systems suitable for including the processor 101. Other system designs and configurations known in the art for laptops, desktops, handheld PCs, personal digital assistants, engineering workstations, servers, network devices, network hubs, switches, embedded processors, digital signal processors (DSPs), graphics devices, video game devices, set-top boxes, microcontrollers, cell phones, portable media players, handheld devices and various other electronic devices are also suitable. In general, a wide variety of systems and electronic devices capable of incorporating a processor and/or other execution logic as disclosed herein are suitable.
Referring now to Fig. 11, shown is a block diagram of a system 1100 in accordance with one embodiment of the invention. The system 1100 may include one or more processors 1110, 1115 coupled to a graphics memory controller hub (GMCH) 1120. The optional nature of the additional processors 1115 is denoted in Fig. 11 with broken lines.
Each processor 1110, 1115 may be some version of the processor 101. It should be noted, however, that integrated graphics logic and integrated memory control units might not exist in the processors 1110 and 1115.
Fig. 11 illustrates that the GMCH 1120 may be coupled to a memory 1140 that may be, for example, a dynamic random access memory (DRAM). For at least one embodiment, the DRAM may be associated with a non-volatile cache.
The GMCH 1120 may be a chipset, or a portion of a chipset. The GMCH 1120 may communicate with the processor(s) 1110, 1115 and control interaction between the processor(s) 1110, 1115 and the memory 1140. The GMCH 1120 may also act as an accelerated bus interface between the processor(s) 1110, 1115 and other elements of the system 1100. For at least one embodiment, the GMCH 1120 communicates with the processor(s) 1110, 1115 via a multi-drop bus, such as a front-side bus (FSB) 1195.
Furthermore, the GMCH 1120 is coupled to a display 1145 (such as a flat-panel display). The GMCH 1120 may include an integrated graphics accelerator. The GMCH 1120 is further coupled to an input/output (I/O) controller hub (ICH) 1150, which may be used to couple various peripheral devices to the system 1100. Shown for example in the embodiment of Fig. 11 are an external graphics device 860, which may be a discrete graphics device coupled to the ICH 1150, and another peripheral device 1170.
Alternatively, additional or different processors may also be present in the system 1100. For example, the additional processor(s) 1115 may include additional processor(s) that are the same as the processor 1110, additional processor(s) that are heterogeneous or asymmetric to the processor 1110, accelerators (such as graphics accelerators or digital signal processing (DSP) units), field programmable gate arrays, or any other processor. There can be a variety of differences between the physical resources 1110, 1115 in terms of a spectrum of metrics of merit including architectural, microarchitectural, thermal and power consumption characteristics, and the like. These differences may effectively manifest themselves as asymmetry and heterogeneity among the processing elements 1110, 1115. For at least one embodiment, the various processing elements 1110, 1115 may reside in the same die package.
Referring now to Fig. 12, shown is a block diagram of a second system 1200 according to an embodiment of the present invention. As shown in Figure 12, multiprocessor system 1200 is a point-to-point interconnect system and includes a first processor 1270 and a second processor 1280 coupled via a point-to-point interconnect 1250. As shown in Figure 12, each of processors 1270 and 1280 may be some version of processor 101.
Alternatively, one or more of processors 1270, 1280 may be an element other than a processor, such as an accelerator or a field programmable gate array.
While shown with only two processors 1270, 1280, it is to be understood that the scope of the present invention is not so limited. In other embodiments, one or more additional processing elements may be present in a given processor.
Processor 1270 may also include an integrated memory controller hub (IMC) 1272 and point-to-point (P-P) interfaces 1276 and 1278. Similarly, second processor 1280 may include an IMC 1282 and P-P interfaces 1286 and 1288. Processors 1270, 1280 may exchange data via a point-to-point (PtP) interface 1250 using PtP interface circuits 1278, 1288. As shown in Figure 12, IMCs 1272 and 1282 couple the processors to respective memories, namely memory 1242 and memory 1244, which may be portions of main memory locally attached to the respective processors.
Processors 1270, 1280 may each exchange data with a chipset 1290 via individual P-P interfaces 1252, 1254 using point-to-point interface circuits 1276, 1294, 1286 and 1298. Chipset 1290 may also exchange data with a high-performance graphics circuit 938 via a high-performance graphics interface 1239.
A shared cache (not shown) may be included in either processor, or outside of both processors yet connected with the processors via a P-P interconnect, such that the local cache information of either or both processors may be stored in the shared cache if a processor is placed into a low-power mode. Chipset 1290 may be coupled to a first bus 1216 via an interface 1296. In one embodiment, first bus 1216 may be a Peripheral Component Interconnect (PCI) bus, or a bus such as a PCI Express bus or another third-generation I/O interconnect bus, although the scope of the present invention is not so limited.
As shown in Figure 12, various I/O devices 1214 may be coupled to first bus 1216, along with a bus bridge 1218 that couples first bus 1216 to a second bus 1220. In one embodiment, second bus 1220 may be a low pin count (LPC) bus. In one embodiment, various devices may be coupled to second bus 1220, including, for example, a keyboard and/or mouse 1222, communication devices 1226, and a data storage unit 1228, such as a disk drive or other mass storage device, which may include code 1230. Further, an audio I/O 1224 may be coupled to second bus 1220. Note that other architectures are possible. For example, instead of the point-to-point architecture of Figure 12, a system may implement a multi-drop bus or other such architecture.
Referring now to Figure 13, shown is a block diagram of a third system 1300 according to an embodiment of the present invention. Like elements in Figures 12 and 13 bear like reference numerals, and certain aspects of Figure 12 have been omitted from Figure 13 in order to avoid obscuring other aspects of Figure 13.
Figure 13 shows that processing elements 1270, 1280 may include integrated memory and I/O control logic ("CL") 1272 and 1282, respectively. For at least one embodiment, CL 1272, 1282 may include memory controller hub logic (IMC). In addition, CL 1272, 1282 may also include I/O control logic. Figure 13 shows that not only are memories 1242, 1244 coupled to CL 1272, 1282, but I/O devices 1214 are also coupled to control logic 1272, 1282. Legacy I/O devices 1215 are coupled to chipset 1290.
Referring now to Figure 14, shown is a block diagram of a SoC 1400 according to an embodiment of the present invention. Similar elements in Figure 15 bear like reference numerals. Also, dashed-line boxes are optional features on more advanced SoCs. In Fig. 14, interconnect unit(s) 1402 is coupled to: an application processor 1410, which includes a set of one or more cores 1402A-N and shared cache unit(s) 1406; a system agent unit 1410; bus controller unit(s) 1414; integrated memory controller unit(s) 1414; a set of one or more media processors 1420, which may include integrated graphics logic 1408, an image processor 1424 for providing still and/or video camera functionality, an audio processor 1426 for providing hardware audio acceleration, and a video processor 1428 for providing video encode/decode acceleration; a static random access memory (SRAM) unit 1430; a direct memory access (DMA) unit 1432; and a display unit 1440 for coupling to one or more external displays.
Embodiments of the mechanisms disclosed herein may be implemented in hardware, software, firmware, or a combination of such implementation approaches. Embodiments of the invention may be implemented as computer programs or program code executing on programmable systems comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input data to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this application, a processing system includes any system that has a processor, such as, for example, a digital signal processor (DSP), a microcontroller, an application specific integrated circuit (ASIC), or a microprocessor.
The program code may be implemented in a high-level procedural or object-oriented programming language to communicate with a processing system. The program code may also be implemented in assembly or machine language, if desired. In fact, the mechanisms described herein are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
One or more aspects of at least one embodiment may be implemented by representative instructions stored on a machine-readable medium, which represent various logic within the processor and which, when read by a machine, cause the machine to fabricate logic to perform the techniques described herein. Such representations, known as "IP cores", may be stored on a tangible, machine-readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that actually make the logic or processor.
Such machine-readable storage media may include, without limitation, non-transitory, tangible articles manufactured or formed by a machine or device, including storage media such as: hard disks; any other type of disk, including floppy disks, optical disks, compact disk read-only memories (CD-ROMs), compact disk rewritables (CD-RWs), and magneto-optical disks; semiconductor devices such as read-only memories (ROMs); random access memories (RAMs) such as dynamic random access memories (DRAMs) and static random access memories (SRAMs); erasable programmable read-only memories (EPROMs); flash memories; electrically erasable programmable read-only memories (EEPROMs); magnetic or optical cards; or any other type of media suitable for storing electronic instructions.
Accordingly, embodiments of the invention also include non-transitory, tangible machine-readable media containing instructions in the vector friendly instruction format or containing design data, such as Hardware Description Language (HDL), which defines the structures, circuits, apparatuses, processors and/or system features described herein. Such embodiments may also be referred to as program products.
In some cases, an instruction converter may be used to convert an instruction from a source instruction set to a target instruction set. For example, the instruction converter may translate (e.g., using static binary translation, or dynamic binary translation including dynamic compilation), morph, emulate, or otherwise convert an instruction into one or more other instructions to be processed by the core. The instruction converter may be implemented in software, hardware, firmware, or a combination thereof. The instruction converter may be on the processor, off the processor, or partly on and partly off the processor.
Figure 16 is a block diagram contrasting the use of a software instruction converter to convert binary instructions in a source instruction set to binary instructions in a target instruction set according to an embodiment of the invention. In the illustrated embodiment, the instruction converter is a software instruction converter, although alternatively the instruction converter may be implemented in software, firmware, hardware, or various combinations thereof.
Figure 16 shows that a program in a high-level language 1602 may be compiled using an x86 compiler 1604 to generate x86 binary code 1606 that may be natively executed by a processor with at least one x86 instruction set core 1616 (it is assumed that some of the compiled instructions are in the vector friendly instruction format). The processor with at least one x86 instruction set core 1616 represents any processor that can perform substantially the same functions as an Intel processor with at least one x86 instruction set core by compatibly executing or otherwise processing (1) a substantial portion of the instruction set of the Intel x86 instruction set core or (2) object code versions of applications or other software targeted to run on an Intel processor with at least one x86 instruction set core, in order to achieve substantially the same result as an Intel processor with at least one x86 instruction set core. The x86 compiler 1604 represents a compiler operable to generate x86 binary code 1606 (e.g., object code) that can, with or without additional linkage processing, be executed on the processor with at least one x86 instruction set core 1616. Similarly, Figure 16 shows that the program in the high-level language 1602 may be compiled using an alternative instruction set compiler 1608 to generate alternative instruction set binary code 1610 that may be natively executed by a processor without at least one x86 instruction set core 1614 (e.g., a processor with cores that execute the MIPS instruction set of MIPS Technologies of Sunnyvale, California and/or the ARM instruction set of ARM Holdings of Sunnyvale, California). The instruction converter 1612 is used to convert the x86 binary code 1606 into code that may be natively executed by the processor without an x86 instruction set core 1614. This converted code is not likely to be the same as the alternative instruction set binary code 1610, because an instruction converter capable of this is difficult to make; however, the converted code will accomplish the general operation and be made up of instructions from the alternative instruction set. Thus, the instruction converter 1612 represents software, firmware, hardware, or a combination thereof that, through emulation, simulation, or any other process, allows a processor or other electronic device that does not have an x86 instruction set processor or core to execute the x86 binary code 1606.
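To make the idea of an instruction converter concrete, the following is a minimal sketch of static translation between two toy instruction sets. Everything in it is hypothetical and chosen only for illustration: the source opcodes `LOADI` and `ADD3`, the target opcodes `LI`, `MOV` and `ADD`, and the helper names `convert` and `run_target` do not come from the patent or from any real instruction set. It shows one source instruction expanding into one or more target instructions that accomplish the same general operation.

```python
def convert(source_program):
    """Statically translate toy source instructions into toy target instructions.

    Source ISA (hypothetical): ("LOADI", dst, imm), ("ADD3", dst, a, b)
    Target ISA (hypothetical): ("LI", dst, imm), ("MOV", dst, src), ("ADD", dst, src)
    """
    target = []
    for op, *args in source_program:
        if op == "LOADI":
            dst, imm = args
            target.append(("LI", dst, imm))
        elif op == "ADD3":
            # Three-operand add becomes a two-operand sequence on the target.
            dst, a, b = args
            if dst == b:
                # dst already holds b, so add a into it (addition commutes).
                target.append(("ADD", dst, a))
            else:
                if dst != a:
                    target.append(("MOV", dst, a))
                target.append(("ADD", dst, b))
        else:
            raise ValueError(f"unsupported source opcode: {op}")
    return target


def run_target(program):
    """Tiny interpreter for the toy target ISA; returns the register file."""
    regs = {}
    for op, dst, src in program:
        if op == "LI":
            regs[dst] = src
        elif op == "MOV":
            regs[dst] = regs[src]
        elif op == "ADD":
            regs[dst] = regs[dst] + regs[src]
        else:
            raise ValueError(f"unsupported target opcode: {op}")
    return regs
```

Translating `[("LOADI", "r1", 2), ("LOADI", "r2", 3), ("ADD3", "r3", "r1", "r2")]` and running the result leaves 5 in `r3`. A real converter would additionally have to handle memory, flags, and control flow, which this sketch deliberately leaves out.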
Certain operations of the instruction(s) may be performed by hardware components and may be embodied in machine-executable instructions that are used to cause, or at least result in, a circuit or other hardware component programmed with the instructions performing the operations. The circuit may include a general-purpose or special-purpose processor, or a logic circuit, to name just a few examples. The operations may also optionally be performed by a combination of hardware and software. Execution logic and/or a processor may include specific or particular circuitry or other logic that is responsive to a machine instruction, or to one or more control signals derived from the machine instruction, to store an instruction-specified result operand. For example, embodiments of the instruction(s) disclosed herein may be executed in one or more systems, and embodiments of the instruction(s) in the vector friendly instruction format may be stored in program code to be executed in the systems. Additionally, the processing elements of these figures may utilize one of the pipelines and/or architectures (e.g., the in-order and out-of-order architectures) detailed herein. For example, the decode unit of the in-order architecture may decode the instruction(s), pass the decoded instruction to a vector or scalar unit, etc.
The above description is intended to illustrate preferred embodiments of the present invention. From the discussion above it should also be apparent that, especially in an area of technology where growth is fast and further advancements are not easily foreseen, the invention may be modified in arrangement and detail by those skilled in the art without departing from the principles of the present invention within the scope of the appended claims and their equivalents. For example, one or more operations of a method may be combined or further broken apart.
Alternative embodiment
While embodiments have been described that natively execute the vector friendly instruction format, alternative embodiments of the invention may execute the vector friendly instruction format through an emulation layer running on a processor that executes a different instruction set (e.g., a processor that executes the MIPS instruction set of MIPS Technologies of Sunnyvale, California, or a processor that executes the ARM instruction set of ARM Holdings of Sunnyvale, California). Also, while the flow diagrams in the figures show a particular order of operations performed by certain embodiments of the invention, it should be understood that such order is exemplary (e.g., alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, etc.).
In the above description, for purposes of explanation, numerous specific details have been set forth in order to provide a thorough understanding of the embodiments of the invention. It will be apparent to one skilled in the art, however, that one or more other embodiments may be practiced without some of these specific details. The particular embodiments described are provided not to limit the invention but to illustrate embodiments of the invention. The scope of the invention is not to be determined by the specific examples provided above but only by the appended claims.
Claims (19)
1. A method for executing a process in a computer processor, comprising:
storing JH state bits in a plurality of registers prior to executing an instruction of a first type;
decoding the instruction of the first type and an instruction of a second type;
executing one or more instructions of the first type to perform an S-Box mapping and a linear (L) transformation on a JH state by:
executing the instruction of the first type a first time to perform the S-Box mapping and the L transformation on a first component of the JH state stored in a first register; and
executing the instruction of the first type a second time to perform the S-Box mapping and the L transformation on a second component of the JH state stored in a second register; and
once the S-Box mapping and the L transformation have been performed, executing one or more instructions of the second type to perform a permutation function on the JH state, wherein the format of the instruction of the first type includes a first register operand for storing half of the JH state, and the format of the instruction of the second type includes second and third register operands for holding execution results of the instruction of the first type.
2. The method of claim 1, wherein the plurality of registers are 512-bit registers.
3. The method of claim 2, wherein one register stores the lower 512 bits of the JH state and a different register stores the upper 512 bits of the JH state.
4. The method of claim 1, wherein a mask register is used in executing the instruction of the first type the first time and the second time.
5. The method of claim 1, further comprising:
storing a result of the first execution of the instruction of the first type in a first destination register as a first JH state result; and
storing a result of the second execution of the instruction of the first type in a second destination register as a second JH state result.
6. The method of claim 5, wherein executing the instruction of the second type further comprises:
retrieving the JH state results from the first and second destination registers;
performing a first permutation function on the first JH state result; and
performing a second permutation function on the second JH state result.
7. An instruction processing apparatus, comprising:
a plurality of data registers, wherein the plurality of data registers include a register for storing one half of JH state bits and a register for storing the other half of the JH state bits; and
an execution unit coupled with the plurality of data registers to execute one or more instructions of a first type to perform an S-Box mapping and a linear (L) transformation on a JH state and, once the S-Box mapping and the L transformation have been performed, to execute one or more instructions of a second type to perform a permutation function on the JH state, wherein the format of the instruction of the first type includes a first register operand for storing half of the JH state, and the format of the instruction of the second type includes second and third register operands for holding execution results of the instruction of the first type, and wherein the execution unit is to execute the instruction of the first type a first time to perform the S-Box mapping and the L transformation on a first half of the JH state bits and to execute the instruction of the first type a second time to perform the S-Box mapping and the L transformation on a second half of the JH state bits.
8. The instruction processing apparatus of claim 7, wherein the first register is a 512-bit register.
9. The instruction processing apparatus of claim 7, wherein the execution unit is to use a mask register in executing the instruction of the first type the first time and the second time.
10. The instruction processing apparatus of claim 9, wherein the execution unit is to: store a result of the first execution of the instruction of the first type in a first destination register as a first JH state result, and store a result of the second execution of the instruction of the first type in a second destination register as a second JH state result.
11. The instruction processing apparatus of claim 10, wherein the execution unit is to execute the instruction of the second type by: retrieving the JH state results from the first and second destination registers, performing a first permutation function on the first JH state result, and performing a second permutation function on the second JH state result.
12. An apparatus for performing JH cryptographic hashing, comprising:
first instruction execution means for storing JH state bits in a plurality of registers prior to executing an instruction of a first type, and for executing one or more instructions of the first type to perform an S-Box mapping and a linear (L) transformation on a JH state by:
executing the instruction of the first type a first time to perform the S-Box mapping and the L transformation on a first component of the JH state stored in a first register; and
executing the instruction of the first type a second time to perform the S-Box mapping and the L transformation on a second component of the JH state stored in a second register; and
second instruction execution means for, once the S-Box mapping and the L transformation have been performed, executing one or more instructions of a second type to perform a permutation function on the JH state, wherein the format of the instruction of the first type includes a first register operand for storing half of the JH state, and the format of the instruction of the second type includes second and third register operands for holding execution results of the instruction of the first type.
13. The apparatus of claim 12, wherein the first instruction execution means is to use a mask register in executing the instruction of the first type the first time and the second time.
14. The apparatus of claim 12, wherein the first instruction execution means is further to:
store a result of the first execution of the instruction of the first type in a first destination register as a first JH state result; and
store a result of the second execution of the instruction of the first type in a second destination register as a second JH state result.
15. The apparatus of claim 14, wherein the second instruction execution means is further to:
retrieve the JH state results from the first and second destination registers;
perform a first permutation function on the first JH state result; and
perform a second permutation function on the second JH state result.
16. A computer system, comprising:
an interconnect;
a processor coupled with the interconnect, the processor including:
a plurality of data registers, the plurality of data registers including a register for storing a first half of JH state bits and a register for storing a second half of the JH state bits; and
an execution unit coupled with the plurality of data registers to execute one or more instructions of a first type to perform an S-Box mapping and a linear (L) transformation on a JH state and, once the S-Box mapping and the L transformation have been performed, to execute one or more instructions of a second type to perform a permutation function on the JH state, wherein the format of the instruction of the first type includes a first register operand for storing half of the JH state, and the format of the instruction of the second type includes second and third register operands for holding execution results of the instruction of the first type, and wherein the execution unit is to execute the instruction of the first type a first time to perform the S-Box mapping and the L transformation on the first half of the JH state bits and to execute the instruction of the first type a second time to perform the S-Box mapping and the L transformation on the second half of the JH state bits; and
a dynamic random access memory (DRAM) coupled with the interconnect.
17. The computer system of claim 16, wherein the execution unit is to use a mask register in executing the instruction of the first type the first time and the second time.
18. The computer system of claim 16, wherein the execution unit is to store a result of the first execution of the instruction of the first type in a first destination register as a first JH state result, and to store a result of the second execution of the instruction of the first type in a second destination register as a second JH state result.
19. The computer system of claim 18, wherein the execution unit is to execute the instruction of the second type by: retrieving the JH state results from the first and second destination registers, performing a first permutation function on the first JH state result, and performing a second permutation function on the second JH state result.
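The two-instruction flow recited in the claims (S-Box mapping plus L transformation on each half of the state, then a permutation over the combined result) can be sketched in software as follows. This is an illustrative model under stated assumptions, not the claimed hardware: the function names `sbox_l`, `permute` and `jh_round` are hypothetical; the S-box tables, the GF(2^4) linear transform and the permutation are reproduced from memory of the public JH specification and should be verified against it before any real use; and treating each register as 128 consecutive 4-bit elements, with a per-element selection bit standing in for the mask register, is an assumption made only for this sketch.

```python
# 4-bit S-boxes (assumed values from the public JH specification; verify before use).
S0 = (9, 0, 4, 11, 13, 12, 3, 15, 1, 10, 2, 6, 7, 5, 8, 14)
S1 = (3, 12, 6, 13, 5, 7, 1, 9, 15, 2, 0, 4, 11, 10, 14, 8)


def gf16_double(a):
    """Multiply by 2 in GF(2^4) with reduction polynomial x^4 + x + 1."""
    a <<= 1
    if a & 0x10:
        a ^= 0x13
    return a & 0xF


def l_transform(a, b):
    """Linear transform on a pair of 4-bit values: (5*a + 2*b, 2*a + b) in GF(2^4)."""
    d = b ^ gf16_double(a)
    c = a ^ gf16_double(d)
    return c, d


def sbox_l(nibbles, select_bits):
    """Model of the first-type instruction: S-box each element, then L on adjacent pairs.

    select_bits picks S0 (0) or S1 (1) per element; here it stands in for the
    mask register operand (an assumption of this sketch).
    """
    sub = [(S1 if s else S0)[x] for x, s in zip(nibbles, select_bits)]
    out = []
    for i in range(0, len(sub), 2):
        out.extend(l_transform(sub[i], sub[i + 1]))
    return out


def permute(nibbles):
    """Model of the second-type instruction: the composition phi(P'(pi(state)))."""
    n = len(nibbles)
    a = list(nibbles)
    for i in range(0, n, 4):          # pi: swap elements 4i+2 and 4i+3
        a[i + 2], a[i + 3] = a[i + 3], a[i + 2]
    b = a[0::2] + a[1::2]             # P': even positions first, then odd positions
    for i in range(n // 2, n, 2):     # phi: swap adjacent pairs in the second half
        b[i], b[i + 1] = b[i + 1], b[i]
    return b


def jh_round(lo, hi, sel_lo, sel_hi):
    """One round step: first-type instruction on each half, then the permutation."""
    r_lo = sbox_l(lo, sel_lo)         # first execution, first register
    r_hi = sbox_l(hi, sel_hi)         # second execution, second register
    full = permute(r_lo + r_hi)       # permutation reads both result registers
    half = len(full) // 2
    return full[:half], full[half:]
```

Because the permutation moves elements between the two halves, the second-type instruction needs both intermediate results as inputs, which is consistent with its format carrying two register operands for the first-type results.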
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2011/066733 WO2013095484A1 (en) | 2011-12-22 | 2011-12-22 | Instructions to perform jh cryptographic hashing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104012031A CN104012031A (en) | 2014-08-27 |
CN104012031B true CN104012031B (en) | 2017-07-21 |
Family
ID=48669126
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201180075719.6A Active CN104012031B (en) | 2011-12-22 | 2011-12-22 | Instruction for performing JH keyed hash |
Country Status (4)
Country | Link |
---|---|
US (1) | US9251374B2 (en) |
CN (1) | CN104012031B (en) |
TW (1) | TWI517654B (en) |
WO (1) | WO2013095484A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104011709B (en) | 2011-12-22 | 2018-06-05 | Intel Corporation | Instructions to perform JH cryptographic hashing in a 256-bit data path |
US11032061B2 (en) * | 2018-04-27 | 2021-06-08 | Microsoft Technology Licensing, Llc | Enabling constant plaintext space in bootstrapping in fully homomorphic encryption |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1531820A (en) * | 2001-06-30 | 2004-09-22 | ض� | Multi-level, multi-dimensional content protection |
CN104011709A (en) * | 2011-12-22 | 2014-08-27 | Intel Corporation | Instructions To Perform JH Cryptographic Hashing In A 256 Bit Data Path |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2001269086A1 (en) * | 2000-07-04 | 2002-01-14 | Koninklijke Philips Electronics N.V. | Substitution-box for symmetric-key ciphers |
AU2003257045A1 (en) | 2002-07-29 | 2004-02-16 | Qualcomm Incorporated | Digital image encoding |
US7502470B2 (en) * | 2003-01-13 | 2009-03-10 | Silicon Image, Inc. | Method and apparatus for content protection within an open architecture system |
KR100996023B1 (en) | 2005-10-31 | 2010-11-22 | 삼성전자주식회사 | Apparatsu and method for transmitting/receiving of data in a multiple antenna communication system |
US8036379B2 (en) | 2006-03-15 | 2011-10-11 | Microsoft Corporation | Cryptographic processing |
TWI322613B (en) | 2006-11-15 | 2010-03-21 | Quanta Comp Inc | 3d image adjusting apparatus and method of the same |
US8655939B2 (en) * | 2007-01-05 | 2014-02-18 | Digital Doors, Inc. | Electromagnetic pulse (EMP) hardened information infrastructure with extractor, cloud dispersal, secure storage, content analysis and classification and method therefor |
US8675865B2 (en) * | 2010-09-24 | 2014-03-18 | Intel Corporation | Method and apparatus for a high bandwidth stream cipher |
US20120254591A1 (en) * | 2011-04-01 | 2012-10-04 | Hughes Christopher J | Systems, apparatuses, and methods for stride pattern gathering of data elements and stride pattern scattering of data elements |
CN107133018B (en) * | 2011-12-22 | 2020-12-22 | Intel Corporation | Instruction to perform GROESTL hashing |
2011
- 2011-12-22 CN CN201180075719.6A patent/CN104012031B/en active Active
- 2011-12-22 WO PCT/US2011/066733 patent/WO2013095484A1/en active Application Filing
- 2011-12-22 US US13/992,225 patent/US9251374B2/en not_active Expired - Fee Related
2012
- 2012-12-13 TW TW101146621A patent/TWI517654B/en not_active IP Right Cessation
Non-Patent Citations (1)
Title |
---|
The Hash Function JH; Hongjun Wu; "Submission to NIST"; 20090915; main text, page 4, line 2 through the last paragraph of page 34, Figures 1 to 6 * |
Also Published As
Publication number | Publication date |
---|---|
CN104012031A (en) | 2014-08-27 |
TW201338492A (en) | 2013-09-16 |
US20140053000A1 (en) | 2014-02-20 |
US9251374B2 (en) | 2016-02-02 |
TWI517654B (en) | 2016-01-11 |
WO2013095484A1 (en) | 2013-06-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103975302B (en) | Matrix multiplication accumulated instruction | |
CN104126174B (en) | Perform the instruction of GROESTL hash | |
CN104484284B (en) | For providing instruction and the logic of advanced paging ability for Secure Enclave page cache | |
CN105051743B (en) | For handling the instruction processing unit of Secure Hash Algorithm, method and system | |
CN104025039B (en) | Packaged data operation mask concatenation processor, method, system and instruction | |
TWI525533B (en) | Systems, apparatuses, and methods for performing mask bit compression | |
CN104509026B (en) | Method and apparatus for handling SHA-2 Secure Hash Algorithm | |
CN104137060B (en) | Cache assists processing unit | |
CN104335166B (en) | For performing the apparatus and method shuffled and operated | |
CN104011663B (en) | Broadcast operation on mask register | |
CN103562854B (en) | Systems, devices and methods for the register that aligns | |
CN110233720A (en) | SM4 OverDrive Processor ODP, method and system | |
CN107667499A (en) | Band Keyed-Hash Message authentication code processor, method, system and instruction | |
CN104126167B (en) | Apparatus and method for being broadcasted from from general register to vector registor | |
JP2019207393A (en) | Hardware accelerators and methods for high-performance authenticated encryption | |
CN104011650B (en) | The systems, devices and methods that mask and immediate write setting output masking during mask register writes mask register in destination from source are write using input | |
CN110138541A (en) | Uniform hardware accelerator for symmetric key cipher | |
CN104583940B (en) | For the processor of SKEIN256 SHA3 algorithms, method, data handling system and equipment | |
CN107924308A (en) | Data element comparator processor, method, system and instruction | |
CN104011709B (en) | The instruction of JH keyed hash is performed in 256 bit datapaths | |
CN108027864A (en) | The s- box hardware accelerators mapped through biaffine | |
CN109582283A (en) | Bit matrix multiplication | |
CN104126171B (en) | For writing the systems, devices and methods that mask register generates dependence vector based on two sources | |
CN107111554A (en) | Apparatus and method for considering spatial locality when loading data element for performing | |
CN104012031B (en) | Instruction for performing JH keyed hash |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |