WO2020220743A1 - Computer data processing method and device - Google Patents

Computer data processing method and device

Info

Publication number
WO2020220743A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
multiplication
bit
result
group
Prior art date
Application number
PCT/CN2020/070620
Other languages
English (en)
French (fr)
Inventor
赵原
殷山
Original Assignee
创新先进技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 创新先进技术有限公司
Priority to US16/779,073 (US10782933B2)
Publication of WO2020220743A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/544 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
    • G06F7/5443 Sum of products

Definitions

  • This specification belongs to the field of computer technology, and in particular relates to a computer data processing method and device.
  • the purpose of the embodiments of this specification is to provide a computer data processing method and device, which can realize fast calculation of 256-bit multiplication, improve data processing efficiency, and lay a data foundation for data processing processes such as cryptography and large integer operations.
  • the embodiments of this specification provide a computer data processing method for multiplying two 256-bit data items, and the method includes:
  • the preset rules include:
  • the split multiplier and the split multiplicand are divided into seven groups of data pairs: the first group includes a[0]b[0]; the second group includes a[1]b[0], a[0]b[1]; the third group includes a[2]b[0], a[1]b[1], a[0]b[2]; the fourth group includes a[3]b[0], a[2]b[1], a[1]b[2], a[0]b[3]; the fifth group includes a[3]b[1], a[2]b[2], a[1]b[3]; the sixth group includes a[3]b[2], a[2]b[3]; the seventh group includes a[3]b[3];
  • the intra-group accumulation includes: within a group, each time the multiplication result of a data pair is calculated, it is accumulated with the multiplication results of the previous data pairs in that group; the lower 64 bits of the group's final accumulation result are saved to the memory, leaving the remaining accumulation result of that group, and the corresponding register is released;
  • the multiplication result of the first data pair of each group is accumulated with the remaining accumulation result of the previous group and then with the multiplication result of the next data pair, and so on until the multiplication result of the data pair in the seventh group has been accumulated; the accumulation result of the seventh group is saved in the memory, giving the multiplication processing result of the target data.
  • the method further includes:
  • the register storing the multiplication result of each data pair is released, and the accumulation result is stored in 3 registers.
  • the method is applied to a 64-bit computer operating system.
  • the method further includes:
  • 4 registers are arbitrarily selected from the registers RBX, RBP, R12, R13, R14, and R15 of the 64-bit computer operating system, and the values of the selected registers are stored in the memory;
  • the saved values of the selected registers are obtained from the memory, and the values of the selected registers are restored.
  • the method further includes:
  • the registers RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11 are selected from the 64-bit computer operating system, and the selected registers are used for data storage during data processing.
  • this specification provides a computer data processing device for multiplying two 256-bit data items, the device includes:
  • a data splitting module for splitting the multiplier a and the multiplicand b, from high to low, into four 64-bit numbers each, to obtain the split multiplier and the split multiplicand.
  • the split multiplier includes: a[3], a[2], a[1], a[0], and the split multiplicand includes: b[3], b[2], b[1], b[0];
  • a data processing module for reading the split multiplier and the split multiplicand into registers and, according to a preset rule, multiplying the split multiplier by the split multiplicand to obtain the multiplication processing result of the target data;
  • the preset rules include:
  • the split multiplier and the split multiplicand are divided into seven groups of data pairs: the first group includes a[0]b[0]; the second group includes a[1]b[0], a[0]b[1]; the third group includes a[2]b[0], a[1]b[1], a[0]b[2]; the fourth group includes a[3]b[0], a[2]b[1], a[1]b[2], a[0]b[3]; the fifth group includes a[3]b[1], a[2]b[2], a[1]b[3]; the sixth group includes a[3]b[2], a[2]b[3]; the seventh group includes a[3]b[3];
  • the intra-group accumulation includes: within a group, each time the multiplication result of a data pair is calculated, it is accumulated with the multiplication results of the previous data pairs in that group; the lower 64 bits of the group's final accumulation result are saved to the memory, leaving the remaining accumulation result of that group, and the corresponding register is released;
  • the multiplication result of the first data pair of each group is accumulated with the remaining accumulation result of the previous group and then with the multiplication result of the next data pair, and so on until the multiplication result of the data pair in the seventh group has been accumulated; the accumulation result of the seventh group is saved in the memory, giving the multiplication processing result of the target data.
  • the data processing module is specifically configured to:
  • the register storing the multiplication result of each data pair is released, and the accumulation result is stored in 3 registers.
  • the device is applied to a 64-bit computer operating system.
  • the device further includes a register preparation module for:
  • 4 registers are arbitrarily selected from the registers RBX, RBP, R12, R13, R14, and R15 of the 64-bit computer operating system, and the values of the selected registers are stored in the memory;
  • the saved values of the selected registers are obtained from the memory, and the values of the selected registers are restored.
  • the register preparation module is further used for:
  • the registers RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11 are selected from the 64-bit computer operating system, and the selected registers are used for data storage during data processing.
  • this specification provides a computer device including: at least one 64-bit processor, a memory for storing 64-bit instructions executable by the processor, and at least 13 64-bit registers; when the processor executes the 64-bit instructions, the computer data processing method described above is implemented.
  • The computer data processing method, device, and processing equipment provided in this specification realize 256-bit multiplication: two 256-bit data items are each split into 64-bit data, and the split data is then calculated according to preset rules. Throughout the calculation, the multiplication results of the data pairs are calculated one by one and accumulated group by group. After the accumulation for a group is complete, the lower 64 bits of the accumulated result are stored in the memory and the corresponding register is released. This pattern of multiplying while accumulating, and releasing the low 64-bit register after each group, keeps the number of registers in use small, so that the whole operation only needs to access registers, without accessing the cache and memory, which improves data processing efficiency. It achieves fast 256-bit multiplication, and the resulting product lays a data foundation for data processing such as cryptography and large integer operations.
  • Fig. 1 is a schematic flowchart of a computer data processing method in an embodiment of this specification
  • Figure 2 is a schematic diagram of data pair grouping in an embodiment of this specification
  • Fig. 3 is a schematic flow chart of a 256-bit multiplication operation in an embodiment of this specification
  • FIG. 4 is a schematic diagram of the module structure of an embodiment of the computer data processing device provided in this specification.
  • FIG. 5 is a schematic diagram of the structure of a computer data processing device in another embodiment of this specification.
  • Fig. 6 is a block diagram of the hardware structure of the computer data processing server in an embodiment of this specification.
  • More and more application scenarios require the use of large integer operations, which can utilize computer data processing capabilities to implement large integer operations through computer programming and other methods.
  • public key cryptographic algorithms widely use large integer multiplication.
  • Elliptic curve algorithms such as the Chinese national standard SM2 (the elliptic curve public key cryptographic algorithm issued by the National Cryptographic Administration) and US NIST 256r1 ECDSA (the elliptic curve digital signature algorithm published by US NIST) both use 256-bit modular multiplication. Because SM2 and NIST 256r1 use special pseudo-Mersenne primes, fast reduction can be applied for the 256-bit modulus, so the 256-bit multiplication can be calculated first and the fast reduction then used to take the result modulo the prime.
  • 256-bit multiplication accounts for about half of the calculations of National Secret SM2 and NIST 256r1 ECDSA.
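The "multiply first, then reduce" idea can be illustrated with a minimal Python sketch. It assumes the SM2 prime from the published standard (the source does not quote it), and it uses a generic folding reduction based on 2^256 ≡ c (mod p) for p = 2^256 − c rather than the word-level fast reduction an optimized library would use. The names `reduce_fold` and `modmul` are hypothetical.

```python
# SM2 prime as published in the standard (an assumption here; not quoted in the source).
P_SM2 = 2**256 - 2**224 - 2**96 + 2**64 - 1
MASK256 = (1 << 256) - 1

def reduce_fold(x, p=P_SM2):
    """Reduce x modulo p = 2**256 - c by folding: 2**256 is congruent to c mod p."""
    c = (1 << 256) - p
    while x >> 256:                        # bits above bit 255 remain
        x = (x >> 256) * c + (x & MASK256)  # replace hi * 2**256 by hi * c
    while x >= p:                          # final conditional subtraction
        x -= p
    return x

def modmul(a, b, p=P_SM2):
    """256-bit modular multiplication: full 512-bit product, then reduction."""
    return reduce_fold(a * b, p)

a = P_SM2 - 1
b = P_SM2 - 2
print(modmul(a, b) == (a * b) % P_SM2)
```

The folding loop terminates because each pass strictly shrinks any value that still has bits above bit 255.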
  • 256-bit multiplication can also serve as a building block for wider multiplication and modular multiplication, as used in RSA2048, RSA4096, and similar; RSA2048 and RSA4096 can be understood as two public key encryption algorithms.
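As a hedged illustration of using 256-bit multiplication as a building block, the sketch below composes a 512-bit multiplication from four 256-bit multiplications in schoolbook fashion. The function `mul256` is only a stand-in for a real 256-bit primitive (here it simply uses Python integers), and `mul512` is a hypothetical name.

```python
MASK256 = (1 << 256) - 1

def mul256(x, y):
    """Stand-in for a 256-bit x 256-bit -> 512-bit multiplication primitive."""
    if not (0 <= x <= MASK256 and 0 <= y <= MASK256):
        raise ValueError("operands must fit in 256 bits")
    return x * y

def mul512(x, y):
    """512-bit x 512-bit multiplication built from four 256-bit multiplications."""
    x0, x1 = x & MASK256, x >> 256   # low and high 256-bit halves of x
    y0, y1 = y & MASK256, y >> 256   # low and high 256-bit halves of y
    low = mul256(x0, y0)
    mid = mul256(x1, y0) + mul256(x0, y1)
    high = mul256(x1, y1)
    return low + (mid << 256) + (high << 512)

x = (1 << 512) - 1
y = (1 << 511) + 12345
print(mul512(x, y) == x * y)
```

The same pattern extends to 2048-bit or 4096-bit operands by splitting into more 256-bit blocks.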
  • The embodiment of this specification provides a computer data processing method that realizes 256-bit multiplication by splitting the multiplier and the multiplicand into four 64-bit numbers each and then calculating the split data according to certain rules.
  • The whole process can be implemented in computer program code. During the operation only the computer's internal registers need to be used, without restricting which registers they are and without accessing the cache space, which improves data processing efficiency.
  • The computer data processing method in this specification can be applied to a client or a server.
  • The client can be a smart phone, a tablet computer, a smart wearable device (smart watch, virtual reality glasses, virtual reality helmet, etc.), an in-vehicle smart device, or another electronic device.
  • FIG. 1 is a schematic flowchart of a computer data processing method in an embodiment of this specification.
  • the computer data processing method provided in an embodiment of this specification may include:
  • Step 102: Split the multiplier a and the multiplicand b, from high to low, into four 64-bit numbers each, to obtain the split multiplier and the split multiplicand. The split multiplier includes: a[3], a[2], a[1], a[0], and the split multiplicand includes: b[3], b[2], b[1], b[0].
  • The embodiments of this specification can be used to implement 256-bit multiplication, that is, a large integer multiplication of two 256-bit data items.
  • The multiplier and the multiplicand can be split according to the bit positions of the data, with one division every 64 bits. Refer to Figure 3 for details.
  • From high to low, the multiplier is split into the 64-bit numbers a[3], a[2], a[1], a[0], and the multiplicand is split into the 64-bit numbers b[3], b[2], b[1], b[0].
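The splitting in step 102 can be sketched in Python as follows. This is only an illustration: the function name `split_limbs` and the sample value are hypothetical, and the list is indexed so that element 0 is the least significant word, matching a[0].

```python
MASK64 = (1 << 64) - 1

def split_limbs(x):
    """Split a 256-bit non-negative integer into four 64-bit words, a[0] first."""
    if not 0 <= x < (1 << 256):
        raise ValueError("expected a non-negative integer below 2**256")
    return [(x >> (64 * i)) & MASK64 for i in range(4)]

a = 0x0123456789ABCDEF_0011223344556677_8899AABBCCDDEEFF_FEDCBA9876543210
limbs = split_limbs(a)
# limbs[3] is the most significant word a[3]; limbs[0] is the least significant a[0].
print([hex(w) for w in limbs])
```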
  • Step 104: Read the split multiplier and the split multiplicand into registers, and multiply the split multiplier by the split multiplicand according to a preset rule to obtain the multiplication processing result of the target data;
  • the preset rules include:
  • the split multiplier and the split multiplicand are divided into seven groups of data pairs: the first group includes a[0]b[0]; the second group includes a[1]b[0], a[0]b[1]; the third group includes a[2]b[0], a[1]b[1], a[0]b[2]; the fourth group includes a[3]b[0], a[2]b[1], a[1]b[2], a[0]b[3]; the fifth group includes a[3]b[1], a[2]b[2], a[1]b[3]; the sixth group includes a[3]b[2], a[2]b[3]; the seventh group includes a[3]b[3];
  • the intra-group accumulation includes: within a group, each time the multiplication result of a data pair is calculated, it is accumulated with the multiplication results of the previous data pairs in that group; the lower 64 bits of each group's final accumulation result are saved to the memory, leaving the remaining accumulation result of that group, and the corresponding register is released;
  • the multiplication result of the first data pair of each group is accumulated with the remaining accumulation result of the previous group and then with the multiplication result of the next data pair, and so on until the multiplication result of the data pair in the seventh group has been accumulated; the accumulation result of the seventh group is saved in the memory, giving the multiplication processing result of the target data.
  • A 256-bit multiplication function, built from 64-bit multiplication instructions and 64-bit addition instructions, can be called to calculate the 256-bit multiplication.
  • The split data can first be read from the memory into registers. The split multiplier and the split multiplicand occupy 8 64-bit registers. The split multiplier and the split multiplicand are then multiplied according to the preset rules to obtain the final target multiplication result, which can be stored in the memory.
  • the specific preset rules for multiplication operations may include:
  • Figure 2 is a schematic diagram of the data pair grouping in the embodiment of this specification.
  • The ordinary multiplication rule can be followed: the multiplier and the multiplicand are multiplied word by word, the partial products are arranged in the usual staggered layout, and the arranged data is grouped by column. The two numbers multiplied together form a data pair.
  • The split multiplier and the split multiplicand are divided into 7 groups of data pairs.
  • The dashed boxes from right to left in Figure 2 are the first to the seventh groups of data pairs.
  • the first group of data pairs includes: a[0]b[0]
  • the second group of data pairs includes: a[1]b[0], a[0]b[1]
  • the third group of data pairs includes: a[2]b[0], a[1]b[1], a[0]b[2]
  • the fourth group of data pairs includes: a[3]b[0], a[2]b[1], a[1]b[2], a[0]b[3]
  • the fifth group of data pairs includes: a[3]b[1], a[2]b[2], a[1]b[3]
  • the sixth group of data pairs includes: a[3]b[2], a[2]b[3]
  • the seventh group of data pairs includes: a[3]b[3]. That is, the first group consists of the pair a[0] and b[0], the second group consists of the pair a[1] and b[0] and the pair a[0] and b[1], and so on.
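The grouping listed above puts a[i]b[j] into group i + j + 1, that is, one group per column of the staggered layout. A small Python sketch (the function name `data_pair_groups` is hypothetical) reproduces the seven groups in the order given in the text:

```python
def data_pair_groups(n=4):
    """Return the index pairs (i, j) of a[i]b[j], grouped by column k = i + j."""
    groups = []
    for k in range(2 * n - 1):
        # Walk i from high to low so the order matches the listing in the text.
        pairs = [(i, k - i) for i in range(n - 1, -1, -1) if 0 <= k - i < n]
        groups.append(pairs)
    return groups

groups = data_pair_groups()
for number, pairs in enumerate(groups, start=1):
    print(number, ", ".join(f"a[{i}]b[{j}]" for i, j in pairs))
```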
  • Grouping the split data can be understood as fixing in advance the multiplication of each data pair and the order in which the multiplication results are accumulated. It does not necessarily mean that the data must be grouped first; it only indicates which products are accumulated together. For convenience of description, the embodiment of this specification first groups the data, accumulates the multiplication results of the data pairs within a group, and accumulates the high bits with the multiplication results of the next group of data pairs.
  • The multiplication results of the first to the seventh groups of data pairs can be calculated one by one, accumulating the multiplication results of the data pairs within each group. If a group contains only one data pair, the multiplication result of that pair is taken as the accumulation result of the group. For example, to calculate the multiplication result of the first group, first calculate a[0]×b[0] to obtain a 128-bit multiplication result.
  • The multiplication result can be stored in registers.
  • The multiplication result occupies two 64-bit registers.
  • This multiplication result serves as the accumulation result of the first group. Its lower 64 bits are the lower 64 bits of the final target multiplication result; they are saved from the register to the memory, and the register holding the lower 64 bits is released. Then the multiplication results of the second group are calculated and accumulated, then those of the third group, and so on.
  • Within a group, each time the multiplication result of a data pair is calculated, it is accumulated with the multiplication results of the previous data pairs; then the multiplication result of the next data pair is calculated and accumulated.
  • The lower 64 bits of each group's final accumulation result can be saved from the corresponding register to the memory and that register released, leaving the remaining accumulation result of the group.
  • The multiplication result of the first data pair calculated in each group is first accumulated with the remaining accumulation result of the previous group; the multiplication result of the next data pair is then calculated and added to the running accumulation result, and so on until the seventh group has been calculated. Saving the accumulation result of the seventh group in the memory yields the target multiplication result of the two 256-bit numbers.
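The group-by-group procedure can be sketched in Python as below. This is an illustrative model, not the patented register-level code: the running accumulator `acc` is an ordinary Python integer standing in for the registers, the list `s` stands in for the result area in memory, and the name `mul256_words` is hypothetical.

```python
MASK64 = (1 << 64) - 1

def mul256_words(a, b):
    """Multiply two 256-bit integers group by group; return eight 64-bit words, low first."""
    a_w = [(a >> (64 * i)) & MASK64 for i in range(4)]
    b_w = [(b >> (64 * i)) & MASK64 for i in range(4)]
    s = []
    acc = 0                                  # remaining accumulation result between groups
    for k in range(7):                       # groups 1..7 correspond to k = 0..6
        for i in range(4):
            j = k - i
            if 0 <= j < 4:
                acc += a_w[i] * b_w[j]       # multiply one data pair and accumulate
        s.append(acc & MASK64)               # save the lower 64 bits to "memory"
        acc >>= 64                           # keep only the remaining accumulation result
    s.append(acc)                            # what remains after group 7 is the top word
    return s

a = (1 << 256) - 1
b = (1 << 256) - 1
s = mul256_words(a, b)
product = sum(word << (64 * n) for n, word in enumerate(s))
print(product == a * b)
```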
  • each group of data pairs may include multiple data pairs.
  • The order in which the data pairs within a group are calculated can be chosen according to actual needs; the embodiment of this specification does not specifically limit it.
  • When a group of data pairs is processed, the multiplication result of each data pair is added to the accumulation result, the register that held the multiplication result is released, and the accumulation result is kept in three registers, so that as few registers as possible are occupied during the whole calculation.
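A minimal sketch of a three-word accumulator, assuming each word models one 64-bit register: `mac` (a hypothetical name) adds one 128-bit product into the words (c2, c1, c0) and propagates carries explicitly, which is roughly what a multiply followed by add and add-with-carry instructions would do.

```python
MASK64 = (1 << 64) - 1

def mac(c0, c1, c2, x, y):
    """Add the 128-bit product x*y into the three-word accumulator (c2, c1, c0)."""
    p = x * y
    lo, hi = p & MASK64, p >> 64   # the product occupies two 64-bit words
    c0 += lo
    carry = c0 >> 64
    c0 &= MASK64
    c1 += hi + carry
    carry = c1 >> 64
    c1 &= MASK64
    c2 = (c2 + carry) & MASK64     # third word absorbs carries from up to four products
    return c0, c1, c2

# Accumulate four maximal products, as in the fourth group of data pairs.
c0 = c1 = c2 = 0
for _ in range(4):
    c0, c1, c2 = mac(c0, c1, c2, MASK64, MASK64)
total = c0 | (c1 << 64) | (c2 << 128)
print(total == 4 * MASK64 * MASK64)
```

Three words are enough because the sum of four 128-bit products plus a carried-in remainder stays below 2^192.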
  • The computer data processing method provided in some embodiments of this specification can run in a 64-bit computer operating system, such as the 64-bit operating environment of an x86 CPU (central processing unit).
  • A 64-bit computer operating system supports 64-bit instructions, such as 64-bit multiplication and 64-bit addition, which offer more computing power than 32-bit multiplication and addition.
  • A 64-bit x86 CPU also provides more registers.
  • The computer program of the computer data processing method in the embodiment of this specification can be compiled for 64 bits and run under a 64-bit operating system to perform 256-bit multiplication. The additional registers mean that the cache and memory need not be accessed during data processing, which improves data processing efficiency and makes fast 256-bit multiplication possible.
  • An x86-64 CPU usually has 16 64-bit registers in a 64-bit environment: RAX, RBX, RCX, RDX, RSI, RDI, RSP, RBP, R8, R9, R10, R11, R12, R13, R14, R15.
  • 4 registers can be arbitrarily selected from the following 6 registers of a 64-bit computer operating system: RBX, RBP, R12, R13, R14, R15, and the values of the selected registers are stored in the memory.
  • After the target multiplication result has been calculated, that is, after the 256-bit multiplication operation is complete, the saved values of the 4 selected registers are obtained from the memory and used to restore those registers.
  • Restoring the values ensures that the function operates correctly and improves the accuracy of data processing.
  • The embodiment of this specification uses a 256-bit multiplication method and implementation code based on x86-64 features as the basic module for large-number calculation. The calculated 256-bit multiplication result can be used to construct the MORSE platform of the technical department of the secure computing platform (which can be understood as an elliptic curve cryptographic library and a large-number calculation library for a digital currency platform), laying a foundation for cryptography and large-number operations.
  • In some embodiments, the following 9 registers can be selected from the 64-bit computer operating system: RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11. Together with the 4 registers selected in the above embodiment, they are used for data storage during the 256-bit multiplication operation.
  • Of the 16 64-bit registers of an x86-64 CPU in a 64-bit environment, all except RSP, that is, 15 registers, can be used for the 256-bit multiplication. The embodiment of this specification gives priority to the 9 registers RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11, whose values need not be restored after the calculation; only the 4 registers selected from the 6 registers RBX, RBP, R12, R13, R14, R15 need to be restored. This reduces data processing steps and improves data processing efficiency.
  • The computer data processing method provided by the embodiment of this specification realizes a 256-bit multiplication operation: two 256-bit data items are each split into 64-bit numbers, and the split data is then calculated according to preset rules. Throughout the calculation, the multiplication results of the data pairs are calculated one by one and accumulated group by group. After the accumulation for a group is complete, the lower 64 bits of the accumulated result are stored in the memory and the corresponding register is released.
  • This pattern of multiplying while accumulating, and releasing the low 64-bit register after each group, keeps the number of registers in use small, so that the whole operation only needs to access registers, without accessing the cache and memory, which improves data processing efficiency and achieves fast 256-bit multiplication.
  • Fig. 3 is a schematic diagram of the flow of 256-bit multiplication in an embodiment of this specification. The data processing in this embodiment is described below in conjunction with Fig. 3:
  • The two 256-bit operands are each divided every 64 bits into four 64-bit numbers and stored in the memory.
  • Here "bit" denotes a binary digit of the data,
  • so 256 bits means 256 binary digits.
  • The two 256-bit operands a and b are divided, from high to low, into the 64-bit numbers a[3], a[2], a[1], a[0] and b[3], b[2], b[1], b[0].
  • Steps 2, 3, and 4 can be understood as the execution of the 256-bit multiplication function.
  • The 256-bit multiplication function is called to start the 256-bit multiplication:
  • An x86-64 CPU usually has 16 64-bit registers in a 64-bit environment: RAX, RBX, RCX, RDX, RSI, RDI, RSP, RBP, R8, R9, R10, R11, R12, R13, R14, R15.
  • A 64-bit operating system stipulates which register values may be destroyed in a function, that is, which registers can be used directly for calculation without restoring their data; which register values must hold their original values when the function returns; and which register values must not be destroyed in the function.
  • For a register whose original value must be preserved, the value is first saved in the memory, after which the function can use the register. The values of these registers must be restored before the function returns.
  • 13 64-bit registers need to be prepared for the 256-bit multiplication.
  • The function can use the following 9 registers directly: RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11. For the following 6 registers, the function must ensure that the original values are in place when it returns: RBX, RBP, R12, R13, R14, R15.
  • The value of register RSP is the address of the top of the stack. In the embodiment of this specification, register RSP cannot be used by the function for calculation.
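The register budget described above can be checked with a few lines of Python. The register names and the figure of 13 required registers come from the text; the set names and variable names are hypothetical.

```python
ALL_REGS = {"RAX", "RBX", "RCX", "RDX", "RSI", "RDI", "RSP", "RBP",
            "R8", "R9", "R10", "R11", "R12", "R13", "R14", "R15"}
SCRATCH = {"RAX", "RCX", "RDX", "RSI", "RDI", "R8", "R9", "R10", "R11"}  # usable directly
PRESERVED = {"RBX", "RBP", "R12", "R13", "R14", "R15"}                   # must be restored
STACK_POINTER = {"RSP"}                                                  # not used for data

NEEDED = 13  # 64-bit registers required by the 256-bit multiplication

usable = ALL_REGS - STACK_POINTER
to_save = NEEDED - len(SCRATCH)   # preserved registers that must be saved and restored
spare = len(usable) - NEEDED      # registers left over

print(len(usable), to_save, spare)
```

With these sets, four of the six preserved registers have to be saved before the calculation and restored afterwards, and two registers remain unused.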
  • Steps 3.2 to 3.6 can be adjusted according to actual needs, and the embodiment of this specification does not specifically limit them.
  • a[3]×b[3] gives a 128-bit multiplication result, which is stored in 2 64-bit registers and added to the existing accumulation result.
  • The resulting sum occupies 2 64-bit registers.
  • The 128-bit accumulation result obtained is the final result; it is stored from the registers to the memory and no longer occupies registers.
  • Each step corresponds to the calculation of one group of data pairs in the above embodiment; for the division into groups, refer to Figures 2 and 3.
  • The product of the next group of data can be added to the high bits of the accumulation result of the previous group.
  • Register occupancy for products and accumulation: step 3.1 occupies 2; steps 3.2, 3.3, 3.4, 3.5, and 3.6 occupy 5; step 3.7 occupies 3.
  • The saved values can be read from the memory to restore the values of these 4 registers.
  • s[0] to s[7] represent the accumulation results: s[0] is the low 64 bits of the multiplication result of the first group's data pair a[0]×b[0], s[1] is the low 64 bits of the accumulation result of the second group, and so on; s[6] and s[7] together hold the result of accumulating the multiplication result of the seventh group with the remaining accumulation result of the sixth group.
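In equation form, the relationship between the seven groups and the eight result words can be written as follows, where each s[n] is a 64-bit value (this restates the description above and is not a formula quoted from the source):

```latex
a \cdot b
  \;=\; \sum_{k=0}^{6} 2^{64k} \sum_{\substack{i + j = k \\ 0 \le i,\, j \le 3}} a[i]\, b[j]
  \;=\; \sum_{n=0}^{7} s[n]\, 2^{64n},
\qquad 0 \le s[n] < 2^{64}.
```

The inner sum for a given k is the (k+1)-th group of data pairs; carrying its bits above the low 64 into the next group produces the words s[n].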
  • The 256-bit multiplication in the embodiment of this specification can be implemented with the 64-bit instructions and 64-bit registers of an x86 CPU.
  • Among the 64-bit instructions supported by the 64-bit environment of the x86 CPU, the following can specifically be used:
  • The 64-bit environment of an x86 CPU has 16 64-bit registers: RAX, RBX, RCX, RDX, RSI, RDI, RSP, RBP, R8, R9, R10, R11, R12, R13, R14, R15.
  • In the compiled code, a register's value is saved on the stack for protection and restored before the program returns, so that the register can be used for calculation.
  • Apart from register RSP, which is used as the stack pointer, the other 15 64-bit registers can be used for calculation.
  • The 256-bit multiplication in the embodiment of this specification requires 13 64-bit storage slots, that is, 13 64-bit registers, and the 64-bit environment of an x86 CPU has 15 64-bit registers available for data processing. The computer data processing method provided by the embodiments of this specification can therefore complete its calculations in registers in the 64-bit environment of an x86 CPU, with the two extra registers available as spares.
  • The computer data processing method provided by the embodiment of this specification realizes 256-bit multiplication, can be completed in the 64-bit environment of an x86 CPU, occupies fewer registers, and does not restrict which 64-bit registers, which 64-bit multiplication and addition instructions, or which specific code implementation is used.
  • The entire calculation does not need to access the cache and memory, which improves data processing efficiency, realizes fast 256-bit multiplication, and provides a data basis for cryptography and large integer operations.
  • one or more embodiments of this specification also provide a computer data processing device.
  • the described devices may include systems (including distributed systems), software (applications), modules, components, servers, clients, etc., which use the methods described in the embodiments of this specification, combined with necessary implementation hardware devices.
  • The devices in one or more embodiments provided in this specification are as described in the following embodiments. Since the device solves the problem in a way similar to the method, the implementation of the specific device in the embodiment of this specification can refer to the implementation of the foregoing method, and repeated details are omitted.
  • The term "unit" or "module" can be a combination of software and/or hardware that implements predetermined functions.
  • Although the devices described in the following embodiments are preferably implemented in software, an implementation in hardware or in a combination of software and hardware is also possible and conceivable.
  • FIG. 4 is a schematic diagram of the module structure of an embodiment of the computer data processing device provided in this specification.
  • The computer data processing device provided in this specification is used to multiply two 256-bit numbers.
  • The device may include a data splitting module 41 and a data processing module 42, wherein:
  • the data splitting module 41 can be used to split the multiplier a and the multiplicand b, from high to low, into four 64-bit numbers each, to obtain the split multiplier and the split multiplicand.
  • The split multiplier includes a[3], a[2], a[1], a[0], and the split multiplicand includes b[3], b[2], b[1], b[0];
  • the data processing module 42 may be used to read the split multiplier and the split multiplicand into registers and, according to a preset rule, multiply the split multiplier by the split multiplicand to obtain the multiplication result of the target data;
  • the preset rules include:
  • the split multiplier and the split multiplicand are divided into seven groups of data pairs: the first group includes a[0]b[0]; the second group includes a[1]b[0], a[0]b[1]; the third group includes a[2]b[0], a[1]b[1], a[0]b[2]; the fourth group includes a[3]b[0], a[2]b[1], a[1]b[2], a[0]b[3]; the fifth group includes a[3]b[1], a[2]b[2], a[1]b[3]; the sixth group includes a[3]b[2], a[2]b[3]; the seventh group includes a[3]b[3];
  • the multiplication results of the first through seventh groups of data pairs are computed one by one, and the multiplication results of the data pairs within each group are accumulated within the group. The intra-group accumulation includes: within a group, each time the multiplication result of a data pair is computed, accumulating it with the multiplication result of the previous data pair in the group; saving the lower 64 bits of the group's final accumulated result to memory, which leaves the remaining accumulation result of that group; and releasing the corresponding register;
  • the multiplication result of the first data pair of each group is accumulated with the remaining accumulation result of the previous group and then with the multiplication result of the next data pair, until the multiplication results of the data pairs in the seventh group have been accumulated; the accumulated result for the seventh group is saved to memory, giving the multiplication result of the target data.
  • The computer data processing device implements 256-bit multiplication: two 256-bit numbers are each split into 64-bit numbers, and the split data are then computed according to the preset rule. Throughout the calculation, the multiplication results of the data pairs in each group are computed one by one and accumulated group by group. Once the accumulation for a group is finished, the lower 64 bits of the accumulated result are stored in memory and the corresponding register is released.
  • Accumulating each product as soon as it is computed, and releasing the lower-64-bit register as soon as a group is finished, keeps register usage low, so that the calculation can work in registers alone without accessing the cache or memory. This improves data processing efficiency and achieves fast 256-bit multiplication.
  • The data processing module is specifically used for:
  • when accumulating the multiplication results of the data pairs in each group, releasing the registers that stored the individual multiplication results, and keeping the accumulated result in 3 registers.
  • the operation mode of multiplying and accumulating, and releasing the lower 64-bit register after a group of accumulating is completed ensures that the register occupied by the whole calculation process is minimal.
  • the device is applied to a 64-bit computer operating system.
  • The embodiments of this specification perform 256-bit multiplication on a 64-bit operating system, which makes more registers available, so that data processing does not need to access the cache or memory; this improves data processing efficiency and implements 256-bit multiplication quickly.
  • FIG. 5 is a schematic structural diagram of a computer data processing device in another embodiment of this specification. As shown in FIG. 5, on the basis of the above embodiment, in some embodiments of this specification the device further includes a register preparation module 51, which is used to:
  • select any four registers from the registers RBX, RBP, R12, R13, R14, and R15 of the 64-bit computer operating system, and save the values of the selected registers to memory;
  • after the target multiplication result has been computed, read the saved values of the selected registers from memory and restore the selected registers to those values.
  • Reading the saved values of the four selected registers from memory and using them to restore the registers ensures that the function runs correctly and improves the accuracy of data processing.
  • the register preparation module is also used to:
  • select the registers RAX, RCX, RDX, RSI, RDI, R8, R9, R10, and R11 of the 64-bit computer operating system, which are used together with the previously selected registers for data storage during data processing.
  • The nine registers RAX, RCX, RDX, RSI, RDI, R8, R9, R10, and R11 can be chosen first, because their values need not be restored after the calculation; only the four registers selected from the six registers RBX, RBP, R12, R13, R14, and R15 have to be restored. This reduces the number of data processing steps and improves data processing efficiency.
  • the above-mentioned device may also include other implementation manners according to the description of the method embodiment.
  • For specific implementation manners, reference may be made to the description of the corresponding method embodiments above, which will not be repeated here.
  • The embodiments of this specification also provide computer data processing equipment, including: at least one 64-bit processor, a memory for storing 64-bit instructions executable by the processor, and at least 13 64-bit registers. When the processor executes the 64-bit instructions, the computer data processing method of the above embodiments is implemented, for example:
  • the multiplier a and the multiplicand b are each split, from high to low, into four 64-bit numbers; the split multiplier includes a[3], a[2], a[1], a[0], and the split multiplicand includes b[3], b[2], b[1], b[0];
  • the preset rules include:
  • the split multiplier and the split multiplicand are divided into seven groups of data pairs: the first group includes a[0]b[0]; the second group includes a[1]b[0], a[0]b[1]; the third group includes a[2]b[0], a[1]b[1], a[0]b[2]; the fourth group includes a[3]b[0], a[2]b[1], a[1]b[2], a[0]b[3]; the fifth group includes a[3]b[1], a[2]b[2], a[1]b[3]; the sixth group includes a[3]b[2], a[2]b[3]; the seventh group includes a[3]b[3];
  • the multiplication result of the first data pair of each group is accumulated with the remaining accumulation result of the previous group and then with the multiplication result of the next data pair, until the multiplication results of the data pairs in the seventh group have been accumulated; the accumulated result for the seventh group is saved to memory, giving the target multiplication result.
  • The above-mentioned processing equipment may also include other implementation manners according to the description of the method embodiments.
  • For specific implementation manners, reference may be made to the description of the related method embodiments, which will not be repeated here.
  • The computer data processing device or processing equipment provided in this specification can also be applied in a variety of data analysis and processing systems.
  • the system or device or processing equipment may include any computer data processing device in the foregoing embodiments.
  • The system, device, or processing equipment may be a single server, or it may include a server cluster, a system (including a distributed system), software (an application), an actual operating device, a logic gate circuit device, a quantum computer, or the like that uses one or more of the methods or one or more of the embodiments of this specification, combined with the necessary hardware terminal devices.
  • the detection system for checking difference data may include at least one processor and a memory storing computer-executable instructions.
  • the processor implements the steps of the method in any one or more of the foregoing embodiments when executing the instructions.
  • FIG. 6 is a hardware structural block diagram of a computer data processing server in an embodiment of this specification.
  • The server may be the computer data processing device or the computer data processing equipment of the foregoing embodiments.
  • The server 10 may include one or more processors 100 (only one is shown in the figure; the processor 100 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 200 for storing data, and a transmission module 300 for communication functions.
  • The server 10 may also include more or fewer components than those shown in FIG. 6; for example, it may include other processing hardware such as a database, a multi-level cache, or a GPU, or it may have a configuration different from that shown in FIG. 6.
  • the memory 200 may be used to store software programs and modules of application software, such as program instructions/modules corresponding to the computer data processing method in the embodiments of this specification.
  • By running the software programs and modules stored in the memory 200, the processor 100 executes various functional applications and data processing.
  • the memory 200 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory 200 may further include a memory remotely provided with respect to the processor 100, and these remote memories may be connected to a computer terminal through a network. Examples of the aforementioned networks include but are not limited to the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • the transmission module 300 is used to receive or send data via a network.
  • the foregoing specific examples of the network may include a wireless network provided by a communication provider of a computer terminal.
  • the transmission module 300 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station to communicate with the Internet.
  • the transmission module 300 may be a radio frequency (RF) module, which is used to communicate with the Internet in a wireless manner.
  • the method or device described in the foregoing embodiment provided in this specification can implement business logic through a computer program and record it on a storage medium, and the storage medium can be read and executed by a computer to achieve the effects of the solution described in the embodiment of this specification.
  • the storage medium may include a physical device for storing information, and the information is usually digitized and then stored in an electric, magnetic, or optical medium.
  • The storage medium may include: devices that store information electrically, such as the various types of memory (RAM, ROM, and so on); devices that store information magnetically, such as hard disks, floppy disks, magnetic tapes, magnetic core memory, bubble memory, and USB flash drives; and devices that store information optically, such as CDs or DVDs. Other kinds of readable storage media also exist, such as quantum memory and graphene memory.
  • The foregoing computer data processing method or device provided by the embodiments of this specification can be implemented in a computer by a processor executing the corresponding program instructions, for example in C++ on a PC running Windows or Linux, in the programming languages of the Android or iOS systems on a smart terminal, or as processing logic based on a quantum computer.
  • The device, computer storage medium, and system described above in this specification may also include other implementation manners according to the description of the related method embodiments.
  • For specific implementation manners, please refer to the description of the corresponding method embodiments, which will not be repeated here.
  • Terms used in this part of the specification: programmable logic device (PLD); field-programmable gate array (FPGA); hardware description language (HDL). HDLs named in the specification include ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), HDCal, JHDL, Lava, Lola, MyHDL, PALASM, RHDL, VHDL (Very-High-Speed Integrated Circuit Hardware Description Language), and Verilog.
  • the controller can be implemented in any suitable manner.
  • The controller can take the form of, for example, a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, or of logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, and embedded microcontrollers.
  • Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of the memory.
  • Besides implementing the controller purely as computer-readable program code, it is entirely possible to program the method steps logically so that the controller achieves the same function in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the devices it contains for implementing various functions can be regarded as structures within that hardware component. The devices for implementing various functions can even be regarded both as software modules that implement the method and as structures within the hardware component.
  • The systems, devices, modules, or units described in the above embodiments may be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer.
  • The computer may be, for example, a personal computer, a laptop computer, a vehicle-mounted human-computer interaction device, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or any combination of these devices.
  • For convenience of description, the above device is described with its functions divided into separate modules.
  • When one or more embodiments of this specification are implemented, the functions of the modules can be realized in the same piece or in several pieces of software and/or hardware, and a module that realizes one function can also be realized by a combination of several sub-modules or sub-units.
  • The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction device. The instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps is executed on the computer or other programmable equipment to produce computer-implemented processing. The instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • the computing device includes one or more processors (CPU), input/output interfaces, network interfaces, and memory.
  • the memory may include non-permanent memory in computer readable media, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer readable media.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be realized by any method or technology.
  • the information can be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage, graphene storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined in this specification, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
  • One or more embodiments of this specification can be provided as a method, a system, or a computer program product. One or more embodiments may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, one or more embodiments may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
  • One or more embodiments of this specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules.
  • program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • One or more embodiments of this specification can also be practiced in distributed computing environments. In these distributed computing environments, tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in local and remote computer storage media including storage devices.


Abstract

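A computer data processing method and device that implement 256-bit multiplication: two 256-bit numbers are each split into 64-bit numbers, and the split data are then computed according to a preset rule. Throughout the calculation, the multiplication results of the data pairs in each group are computed one by one and accumulated group by group; once the accumulation for a group is finished, the lower 64 bits of the accumulated result are stored in memory and the corresponding register is released. Accumulating each product as soon as it is computed, and releasing the lower-64-bit register as soon as a group is finished, keeps register usage low, so that the calculation can work in registers alone without accessing the cache or memory. This improves data processing efficiency, achieves fast 256-bit multiplication, and lays a data foundation for data processing such as cryptography and large-integer arithmetic.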

Description

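A computer data processing method and device
Technical Field
This specification belongs to the field of computer technology, and in particular relates to a computer data processing method and device.
Background
As technology advances and computer technology develops, data processing grows more complex, and more and more of it calls for large-number arithmetic; cryptographic algorithms, for example, commonly use 256-bit modular multiplication. Large-number arithmetic refers to computation on very large values or computation that demands very high precision. Because the basic numeric data types of a programming language cover only a limited range of values, they cannot support high-precision numerical computation at larger scale. Such arithmetic is usually tied to the computer's hardware, and how to process it efficiently and conveniently is a technical problem in urgent need of a solution in this field.
Summary of the Invention
The purpose of the embodiments of this specification is to provide a computer data processing method and device that compute 256-bit multiplication quickly, improve data processing efficiency, and lay a data foundation for data processing such as cryptography and large-integer arithmetic.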
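In one aspect, the embodiments of this specification provide a computer data processing method for multiplying two 256-bit numbers, the method including:
splitting the multiplier a and the multiplicand b, from high to low, into four 64-bit numbers each, to obtain a split multiplier and a split multiplicand, where the split multiplier includes a[3], a[2], a[1], a[0] and the split multiplicand includes b[3], b[2], b[1], b[0];
reading the split multiplier and the split multiplicand into registers, and multiplying them according to a preset rule to obtain the multiplication result of the target data;
wherein the preset rule includes:
the split multiplier and the split multiplicand are divided into seven groups of data pairs: the first group includes a[0]b[0]; the second group includes a[1]b[0], a[0]b[1]; the third group includes a[2]b[0], a[1]b[1], a[0]b[2]; the fourth group includes a[3]b[0], a[2]b[1], a[1]b[2], a[0]b[3]; the fifth group includes a[3]b[1], a[2]b[2], a[1]b[3]; the sixth group includes a[3]b[2], a[2]b[3]; the seventh group includes a[3]b[3];
the multiplication results of the first through seventh groups of data pairs are computed one by one, and the multiplication results of the data pairs within each group are accumulated within the group, where the intra-group accumulation includes: within a group, each time the multiplication result of a data pair is computed, accumulating it with the multiplication result of the previous data pair in the group; saving the lower 64 bits of the group's final accumulated result to memory, which leaves the remaining accumulation result of that group; and releasing the corresponding register;
the multiplication result of the first data pair of each group is accumulated with the remaining accumulation result of the previous group and then with the multiplication result of the next data pair, until the multiplication results of the data pairs in the seventh group have been accumulated; the accumulated result for the seventh group is saved to memory, giving the multiplication result of the target data.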
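The preset rule can be restated compactly as a column-wise sum. The notation below is a sketch, not taken from the specification: T_k (the total accumulated for group k+1), C_k (the remaining accumulation result carried to the next group), and s[k] (the 64-bit output words, as named in FIG. 3) are labels assumed here for illustration.

```latex
\[
a=\sum_{i=0}^{3} a[i]\,2^{64i},\qquad
b=\sum_{j=0}^{3} b[j]\,2^{64j},\qquad
a\,b=\sum_{k=0}^{6} 2^{64k}\sum_{\substack{i+j=k\\ 0\le i,j\le 3}} a[i]\,b[j]
\]
\[
T_k = C_{k-1}+\sum_{i+j=k} a[i]\,b[j],\qquad
s[k]=T_k \bmod 2^{64},\qquad
C_k=\left\lfloor T_k/2^{64}\right\rfloor,\qquad C_{-1}=0
\]
\[
\text{for } k=6:\quad s[6]=T_6 \bmod 2^{64},\qquad s[7]=\left\lfloor T_6/2^{64}\right\rfloor
\]
```

Group k+1 of the rule is exactly the set of pairs with i + j = k, which is why the seven groups contain 1, 2, 3, 4, 3, 2, and 1 data pairs.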
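Further, in an embodiment of this specification, the method also includes:
when accumulating the multiplication results of the data pairs in each group, releasing the registers that stored the individual multiplication results, and keeping the accumulated result in 3 registers.
Further, in an embodiment of this specification, the method is applied in a 64-bit computer operating system.
Further, in an embodiment of this specification, the method also includes:
selecting any four registers from the registers RBX, RBP, R12, R13, R14, and R15 of the 64-bit computer operating system, and saving the values of the selected registers to memory;
after the target multiplication result has been computed, reading the saved values of the selected registers from memory and restoring the selected registers to those values.
Further, in an embodiment of this specification, the method also includes:
selecting the registers RAX, RCX, RDX, RSI, RDI, R8, R9, R10, and R11 of the 64-bit computer operating system, which are used together with the previously selected registers for data storage during data processing.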
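In another aspect, this specification provides a computer data processing device for multiplying two 256-bit numbers, the device including:
a data splitting module, used to split the multiplier a and the multiplicand b, from high to low, into four 64-bit numbers each, to obtain a split multiplier and a split multiplicand, where the split multiplier includes a[3], a[2], a[1], a[0] and the split multiplicand includes b[3], b[2], b[1], b[0];
a data processing module, used to read the split multiplier and the split multiplicand into registers and to multiply them according to a preset rule to obtain the multiplication result of the target data;
wherein the preset rule includes:
the split multiplier and the split multiplicand are divided into seven groups of data pairs: the first group includes a[0]b[0]; the second group includes a[1]b[0], a[0]b[1]; the third group includes a[2]b[0], a[1]b[1], a[0]b[2]; the fourth group includes a[3]b[0], a[2]b[1], a[1]b[2], a[0]b[3]; the fifth group includes a[3]b[1], a[2]b[2], a[1]b[3]; the sixth group includes a[3]b[2], a[2]b[3]; the seventh group includes a[3]b[3];
the multiplication results of the first through seventh groups of data pairs are computed one by one, and the multiplication results of the data pairs within each group are accumulated within the group, where the intra-group accumulation includes: within a group, each time the multiplication result of a data pair is computed, accumulating it with the multiplication result of the previous data pair in the group; saving the lower 64 bits of the group's final accumulated result to memory, which leaves the remaining accumulation result of that group; and releasing the corresponding register;
the multiplication result of the first data pair of each group is accumulated with the remaining accumulation result of the previous group and then with the multiplication result of the next data pair, until the multiplication results of the data pairs in the seventh group have been accumulated; the accumulated result for the seventh group is saved to memory, giving the multiplication result of the target data.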
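Further, in an embodiment of this specification, the data processing module is specifically used for:
when accumulating the multiplication results of the data pairs in each group, releasing the registers that stored the individual multiplication results, and keeping the accumulated result in 3 registers.
Further, in an embodiment of this specification, the device is applied in a 64-bit computer operating system.
Further, in an embodiment of this specification, the device also includes a register preparation module used to:
select any four registers from the registers RBX, RBP, R12, R13, R14, and R15 of the 64-bit computer operating system, and save the values of the selected registers to memory;
after the target multiplication result has been computed, read the saved values of the selected registers from memory and restore the selected registers to those values.
Further, in an embodiment of this specification, the register preparation module is also used to:
select the registers RAX, RCX, RDX, RSI, RDI, R8, R9, R10, and R11 of the 64-bit computer operating system, which are used together with the previously selected registers for data storage during data processing.
In yet another aspect, this specification provides computer equipment including: at least one 64-bit processor, a memory for storing 64-bit instructions executable by the processor, and at least 13 64-bit registers, where the processor executes the 64-bit instructions to implement the above computer data processing method.
The computer data processing method, device, and processing equipment provided in this specification implement 256-bit multiplication: two 256-bit numbers are each split into 64-bit numbers, and the split data are then computed according to a preset rule. Throughout the calculation, the multiplication results of the data pairs in each group are computed one by one and accumulated group by group; once the accumulation for a group is finished, the lower 64 bits of the accumulated result are stored in memory and the corresponding register is released. Accumulating each product as soon as it is computed, and releasing the lower-64-bit register as soon as a group is finished, keeps register usage low, so that the calculation can work in registers alone without accessing the cache or memory. This improves data processing efficiency and achieves fast 256-bit multiplication, and the computed multiplication result lays a data foundation for data processing such as cryptography and large-integer arithmetic.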
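Brief Description of the Drawings
To explain the technical solutions in the embodiments of this specification or in the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below are only some of the embodiments recorded in this specification; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a computer data processing method in an embodiment of this specification;
FIG. 2 is a schematic diagram of the grouping of data pairs in an embodiment of this specification;
FIG. 3 is a schematic flowchart of 256-bit multiplication in an embodiment of this specification;
FIG. 4 is a schematic diagram of the module structure of an embodiment of the computer data processing device provided in this specification;
FIG. 5 is a schematic structural diagram of a computer data processing device in another embodiment of this specification;
FIG. 6 is a hardware structural block diagram of a computer data processing server in an embodiment of this specification.
Detailed Description
To help those skilled in the art better understand the technical solutions in this specification, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained from them by a person of ordinary skill in the art without creative effort shall fall within the scope of protection of this specification.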
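More and more application scenarios require large-integer arithmetic, which can be implemented through computer programming using the computer's data processing capability. Public-key cryptographic algorithms, for example, make extensive use of large-integer multiplication. Among elliptic-curve algorithms, both the Chinese national standard SM2 (the elliptic-curve public-key cryptographic algorithm published by the State Cryptography Administration) and the US NIST 256r1 ECDSA (the elliptic-curve digital signature algorithm published by NIST) use 256-bit modular multiplication. Because SM2 and NIST 256r1 use special pseudo-Mersenne primes, reduction modulo the 256-bit modulus can use fast reduction, so a 256-bit modular multiplication can be computed as a 256-bit multiplication followed by a fast reduction. 256-bit multiplication therefore accounts for roughly half of the computation in SM2 and NIST 256r1 ECDSA. It can also serve as a building block for multiplication and modular multiplication at higher bit widths, for use in RSA2048, RSA4096, and so on, where RSA2048 and RSA4096 can be understood as two public-key encryption algorithms.
The embodiments of this specification provide a computer data processing method that implements 256-bit multiplication: the multiplier and the multiplicand are each split into four 64-bit numbers, and the split data are then computed according to a fixed rule. The whole process can be implemented in computer program code. During the calculation, only the computer's internal registers need be used, there is no restriction on which registers are used, and the cache need not be accessed, which improves data processing efficiency.
The computer data processing method in this specification can be applied in a client or a server. The client can be an electronic device such as a smart phone, a tablet computer, a smart wearable device (a smart watch, virtual reality glasses, a virtual reality headset, and so on), or a smart vehicle-mounted device.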
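Specifically, FIG. 1 is a schematic flowchart of a computer data processing method in an embodiment of this specification. As shown in FIG. 1, the computer data processing method provided in an embodiment of this specification may include:
Step 102: split the multiplier a and the multiplicand b, from high to low, into four 64-bit numbers each, to obtain the split multiplier and the split multiplicand. The split multiplier includes a[3], a[2], a[1], a[0], and the split multiplicand includes b[3], b[2], b[1], b[0].
In a specific implementation, the embodiments of this specification can be used to implement 256-bit multiplication, that is, large-integer multiplication of two 256-bit numbers. Before data processing, the multiplier a and the multiplicand b of the 256-bit multiplication are each split into four 64-bit numbers, giving the split multiplier and the split multiplicand. The split follows the bit positions of the data, with one division every 64 bits, as shown in FIG. 3. From high to low, the multiplier is divided in 64-bit units into a[3], a[2], a[1], a[0], and the multiplicand into b[3], b[2], b[1], b[0].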
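The split in step 102 can be sketched in a few lines of Python. This is only an illustration of the indexing convention (index 0 is the least significant 64 bits); the function name `split_256` and the sample value are assumptions made here, not names from the specification.

```python
MASK64 = (1 << 64) - 1

def split_256(x):
    """Split a 256-bit non-negative integer into four 64-bit limbs.

    Returns [x[0], x[1], x[2], x[3]]: index i holds bits 64*i .. 64*i+63,
    so x[3] is the most significant limb and x[0] the least significant.
    """
    if not 0 <= x < (1 << 256):
        raise ValueError("expected a non-negative integer below 2**256")
    return [(x >> (64 * i)) & MASK64 for i in range(4)]

# Hypothetical sample operand, written with one 64-bit limb per underscore group.
a = 0x0123456789ABCDEF_FEDCBA9876543210_0F1E2D3C4B5A6978_8796A5B4C3D2E1F0
limbs = split_256(a)
for i in (3, 2, 1, 0):
    print(f"a[{i}] = {limbs[i]:#018x}")
```

Reading the hexadecimal literal from left to right gives a[3], a[2], a[1], a[0], which matches the "from high to low" order used in the text.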
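Step 104: read the split multiplier and the split multiplicand into registers, and multiply them according to a preset rule to obtain the multiplication result of the target data;
wherein the preset rule includes:
the split multiplier and the split multiplicand are divided into seven groups of data pairs: the first group includes a[0]b[0]; the second group includes a[1]b[0], a[0]b[1]; the third group includes a[2]b[0], a[1]b[1], a[0]b[2]; the fourth group includes a[3]b[0], a[2]b[1], a[1]b[2], a[0]b[3]; the fifth group includes a[3]b[1], a[2]b[2], a[1]b[3]; the sixth group includes a[3]b[2], a[2]b[3]; the seventh group includes a[3]b[3];
the multiplication results of the first through seventh groups of data pairs are computed one by one, and the multiplication results of the data pairs within each group are accumulated within the group, where the intra-group accumulation includes: within a group, each time the multiplication result of a data pair is computed, accumulating it with the multiplication result of the previous data pair in the group; saving the lower 64 bits of each group's final accumulated result to memory, which leaves the remaining accumulation result of each group; and releasing the corresponding register;
the multiplication result of the first data pair of each group is accumulated with the remaining accumulation result of the previous group and then with the multiplication result of the next data pair, until the multiplication results of the data pairs in the seventh group have been accumulated; the accumulated result for the seventh group is saved to memory, giving the multiplication result of the target data.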
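In a specific implementation, once the multiplier and the multiplicand have been split, a 256-bit multiplication function, built for example from 64-bit multiplication and 64-bit addition instructions, can be called to compute the 256-bit product. The split data are first read from memory into registers; the split multiplier and the split multiplicand can occupy eight 64-bit registers. They are then multiplied according to the preset rule to obtain the final target multiplication result, which can be saved in memory. The preset rule for the multiplication can include the following.
FIG. 2 is a schematic diagram of the grouping of data pairs in an embodiment of this specification. As shown in FIG. 2, in an embodiment of this specification the data can be arranged according to the rule of long multiplication, in which the multiplier and the multiplicand are multiplied position by position and the partial products are written with an offset, and the arranged data are then grouped by column; two numbers that are multiplied together form one data pair. The split multiplier and the split multiplicand are thus divided into seven groups of data pairs, shown in FIG. 2 as the dashed boxes from right to left, first group through seventh group. The first group includes a[0]b[0]; the second group includes a[1]b[0], a[0]b[1]; the third group includes a[2]b[0], a[1]b[1], a[0]b[2]; the fourth group includes a[3]b[0], a[2]b[1], a[1]b[2], a[0]b[3]; the fifth group includes a[3]b[1], a[2]b[2], a[1]b[3]; the sixth group includes a[3]b[2], a[2]b[3]; the seventh group includes a[3]b[3]. That is, the first group contains the single data pair a[0] and b[0], the second group contains the two data pairs a[1] and b[0], a[0] and b[1], and so on.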
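The column grouping described above follows one simple rule: a pair a[i]b[j] belongs to group i + j + 1. The short Python sketch below enumerates the groups that way; the function name `column_groups` is an assumption for illustration and does not appear in the specification.

```python
def column_groups(n=4):
    """Return the 2n-1 column groups of index pairs (i, j) with i + j = k.

    Within each group the pairs are listed with i descending, which matches
    the order in which the specification lists them (a[1]b[0], a[0]b[1], ...).
    """
    groups = []
    for k in range(2 * n - 1):
        pairs = [(i, k - i) for i in range(n - 1, -1, -1) if 0 <= k - i < n]
        groups.append(pairs)
    return groups

groups = column_groups()
for number, pairs in enumerate(groups, start=1):
    print(f"group {number}:", ", ".join(f"a[{i}]b[{j}]" for i, j in pairs))
```

The group sizes come out as 1, 2, 3, 4, 3, 2, 1, for 16 data pairs in total, one per combination of a limb of a with a limb of b.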
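It should be noted that grouping the split data can be understood as fixing in advance the order in which the data pairs are multiplied and their products accumulated. It does not necessarily mean that the data must first be physically grouped; it only indicates which products are accumulated together. For convenience of description, the embodiments of this specification first group the data: the products of the data pairs in one group are accumulated together, and the high part is then accumulated with the products of the next group.
After the data pairs have been grouped, the multiplication results of the first through seventh groups can be computed one by one and accumulated group by group. If a group contains only one data pair, the product of that pair is the accumulated result of the group. For example, for the first group, a[0]×b[0] is computed first, giving a 128-bit product that can be kept in registers, where it occupies two 64-bit registers. This product is the accumulated result of the first group; its lower 64 bits are the lower 64 bits of the final target multiplication result, so they are moved from the register to memory and the register that held them is released. The products of the second group are then computed and accumulated, followed by those of the third group, and so on.
Note that when computing products and accumulating data, the embodiments of this specification accumulate each product with the previous one as soon as it is computed, and only then compute and accumulate the product of the next data pair. The lower 64 bits of each group's final accumulated result can be moved from the corresponding register to memory and that register released, leaving the remaining accumulation result of the group. The first product computed in each group is first accumulated with the remaining accumulation result of the previous group; the product of the next data pair is then computed and added to the running total, and so on up to the seventh group. Saving the accumulated result of the seventh group to memory gives the target multiplication result of the two 256-bit numbers.
It should also be noted that a group may contain several data pairs. The order in which the pairs within a group are computed can be chosen as needed; the embodiments of this specification do not restrict it.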
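For example, first compute the product of a[0] and b[0] in the first group, save the lower 64 bits of the product a[0]×b[0] to memory, and release the corresponding register.
Then compute the products of the second group. For instance, compute a[1]×b[0] first and add it to the upper 64 bits of a[0]×b[0] to obtain a running total; then compute a[0]×b[1] and add it to that running total to obtain the accumulated result of the second group. Save the lower 64 bits of the second group's accumulated result to memory, which leaves the remaining accumulation result of the second group, and release the register that held those lower 64 bits.
Then compute the products of the third group. For instance, compute a[2]×b[0] first and add it to the remaining accumulation result of the second group; then compute a[1]b[1] and a[0]b[2] in turn, adding each product to the running total as soon as it is available, following the same rule as for the second group. The products and accumulated results of the fourth through seventh groups are computed in the same way. Once the product a[3]×b[3] of the seventh group is obtained, it is added to the remaining accumulation result of the sixth group, the accumulated result is saved to memory, and the corresponding registers are released. The final multiplication result of the target data, the 256-bit numbers a and b, is then entirely in memory and the calculation is finished. The target data can be understood as the numbers a and b.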
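The worked example above can be modeled end to end in Python. The sketch below is an illustration of the described order of operations, not the patented implementation: Python integers stand in for the registers, the list `s` stands in for the memory that receives the result words, and the names `mul_256`, `split`, and `join` are assumptions made here.

```python
import random

MASK64 = (1 << 64) - 1

def split(x):
    """Four 64-bit limbs of a 256-bit integer, least significant first."""
    return [(x >> (64 * i)) & MASK64 for i in range(4)]

def join(words):
    """Inverse of split for any number of 64-bit words."""
    return sum(w << (64 * k) for k, w in enumerate(words))

def mul_256(a_limbs, b_limbs):
    """Multiply two 4-limb numbers group by group; return eight 64-bit words."""
    s = []      # result words written to "memory"
    acc = 0     # running total; after each group it holds the remaining accumulation result
    for k in range(7):                      # groups 1..7 correspond to k = 0..6
        for i in range(3, -1, -1):
            j = k - i
            if 0 <= j <= 3:
                acc += a_limbs[i] * b_limbs[j]   # multiply, then accumulate at once
        if k < 6:
            s.append(acc & MASK64)          # low 64 bits of the group go to memory
            acc >>= 64                      # what is left carries into the next group
        else:
            s.append(acc & MASK64)          # seventh group: both remaining words
            s.append(acc >> 64)             # are part of the final result
    return s

random.seed(1)
a = random.getrandbits(256)
b = random.getrandbits(256)
s = mul_256(split(a), split(b))
print(join(s) == a * b)
```

The final comparison against Python's built-in multiplication is a convenient check that the group-by-group schedule produces the full 512-bit product.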
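In some embodiments of this specification, when the data pairs of each group are processed, the registers that originally held each product are released as the product is accumulated, and the accumulated result is kept in 3 registers, which keeps the number of registers occupied during the whole calculation to a minimum.
The computer data processing method provided in some embodiments of this specification can run in a 64-bit computer operating system, for example the 64-bit execution environment of an x86 CPU (Central Processing Unit). Such a system supports 64-bit instructions such as 64-bit multiply and 64-bit add, which offer more computing power than 32-bit multiply and add, and a 64-bit x86 CPU also provides more registers. The computer program for the method can likewise be compiled as 64-bit code. Performing 256-bit multiplication on a 64-bit operating system makes more registers available, so that data processing does not need to access the cache or memory, which improves data processing efficiency and implements 256-bit multiplication quickly.
In the 64-bit environment, an x86-64 CPU typically has 16 64-bit registers: RAX, RBX, RCX, RDX, RSI, RDI, RSP, RBP, R8, R9, R10, R11, R12, R13, R14, and R15. In some embodiments of this specification, any four of the following six registers of the 64-bit computer operating system can be selected: RBX, RBP, R12, R13, R14, R15, and the values of the selected registers saved to memory. After the target multiplication result has been computed, that is, after the 256-bit multiplication is complete, the saved values of the four selected registers are read from memory and used to restore the registers, which ensures that the function runs correctly and improves the accuracy of data processing.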
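In the embodiments of this specification, the 256-bit multiplication method and code that exploit the x86-64 features serve as a basic module for large-number computation. The computed products of 256-bit data can be used to build the elliptic-curve cryptography library and the large-number computation library of the MORSE platform of the secure computing platform technology department (which can be understood as a digital currency platform), laying a theoretical foundation for cryptography and large-number arithmetic.
In some embodiments of this specification, the following nine registers of the 64-bit computer operating system can be selected: RAX, RCX, RDX, RSI, RDI, R8, R9, R10, and R11, and used together with the four registers selected in the above embodiment for data storage during the 256-bit multiplication. That is, of the 16 64-bit registers of an x86-64 CPU in the 64-bit environment, all 15 other than RSP can be used for the 256-bit multiplication. Moreover, the nine registers RAX, RCX, RDX, RSI, RDI, R8, R9, R10, and R11 can be chosen first, because their values need not be restored after the calculation; only the four registers selected from the six registers RBX, RBP, R12, R13, R14, and R15 have to be restored. This reduces the number of data processing steps and improves data processing efficiency.
The computer data processing method provided by the embodiments of this specification implements 256-bit multiplication: two 256-bit numbers are each split into 64-bit numbers, and the split data are then computed according to the preset rule. Throughout the calculation, the multiplication results of the data pairs in each group are computed one by one and accumulated group by group; once the accumulation for a group is finished, the lower 64 bits of the accumulated result are stored in memory and the corresponding register is released. Accumulating each product as soon as it is computed, and releasing the lower-64-bit register as soon as a group is finished, keeps register usage low, so that the calculation can work in registers alone without accessing the cache or memory. This improves data processing efficiency and achieves fast 256-bit multiplication.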
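FIG. 3 is a schematic flowchart of 256-bit multiplication in an embodiment of this specification. The data processing in the embodiments is described below with reference to FIG. 3:
1. Preparing the parameters of the 256-bit multiplication
As shown in FIG. 3, the two 256-bit operands are each divided in 64-bit units into four 64-bit numbers, which are stored in memory. In FIG. 3, "bit" denotes the width of the data, so "256bit" denotes a 256-bit number. The two 256-bit operands a and b are divided, from high to low in 64-bit units, into a[3], a[2], a[1], a[0] and b[3], b[2], b[1], b[0].
Steps 2, 3, and 4 can be understood as the execution of the 256-bit multiplication function; the function is called to begin computing the 256-bit product: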
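2. Register preparation
In the 64-bit environment, an x86-64 CPU typically has 16 64-bit registers: RAX, RBX, RCX, RDX, RSI, RDI, RSP, RBP, R8, R9, R10, R11, R12, R13, R14, and R15.
The calling convention of a 64-bit operating system (such as Windows or Unix/Linux) specifies, for a function, which registers may be clobbered, that is, used directly for computation without restoring their contents; which registers must hold their original values when the function returns; and which registers must not be clobbered inside the function. For a register that must hold its original value on return, the value is saved to memory, after which the register is available to the function, and the value is restored before the function returns. The 256-bit multiplication in the embodiments of this specification needs 13 64-bit registers.
In the 64-bit Linux environment, for example, a function can use the following nine registers directly: RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11; the following six registers must hold their original values when the function returns: RBX, RBP, R12, R13, R14, R15. The register RSP holds the address of the top of the stack and, in the embodiments of this specification, cannot be used for computation in the function.
In the embodiments of this specification, four of the six registers RBX, RBP, R12, R13, R14, and R15 are chosen and saved to memory; together with RAX, RCX, RDX, RSI, RDI, R8, R9, R10, and R11, the 256-bit multiplication function can then use 13 64-bit registers.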
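The register budget described above can be written out as a small bookkeeping sketch. The register names are those listed in the text for the 64-bit Linux environment; the function name `register_budget` and the particular choice of four callee-saved registers are assumptions for illustration, since the specification allows any four of the six.

```python
CALLER_SAVED = ["RAX", "RCX", "RDX", "RSI", "RDI", "R8", "R9", "R10", "R11"]
CALLEE_SAVED = ["RBX", "RBP", "R12", "R13", "R14", "R15"]
STACK_POINTER = "RSP"   # holds the stack top, so it is never used for computation

def register_budget(chosen):
    """Return (usable, to_save): the 13 working registers and the 4 to save/restore.

    `chosen` must be four distinct registers out of CALLEE_SAVED.
    """
    if len(set(chosen)) != 4 or any(r not in CALLEE_SAVED for r in chosen):
        raise ValueError("choose exactly four distinct callee-saved registers")
    usable = CALLER_SAVED + list(chosen)   # 9 free registers + 4 saved ones
    to_save = list(chosen)                 # saved before use, restored before return
    return usable, to_save

# Hypothetical choice; any four of the six would do.
usable, to_save = register_budget(["RBX", "RBP", "R12", "R13"])
print(len(usable), "working registers; save and restore:", ", ".join(to_save))
```

Only the four chosen registers cost a save and a restore, which is why the text prefers to fill the budget with the nine caller-saved registers first.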
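3. Multiplication
The values a[3], a[2], a[1], a[0] and b[3], b[2], b[1], b[0] are read from memory into registers, occupying eight 64-bit registers. The "*" in FIG. 3 is the "×" used in the embodiments of this specification and denotes multiplication of two numbers.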
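3.1. Compute a[0]×b[0] to obtain a 128-bit product, divided from high to low into two 64-bit numbers held in two 64-bit registers. The lower 64 bits are part of the final product and are moved from the register to memory; one register then remains occupied.
3.2. Compute a[1]×b[0] to obtain a 128-bit product held in two 64-bit registers, and add it to the upper 64 bits of a[0]×b[0]; the accumulated result occupies three 64-bit registers. Compute a[0]×b[1] to obtain a 128-bit product held in two 64-bit registers, and add it to the accumulated result; the sum occupies three 64-bit registers. The lower 64 bits of the accumulated result are part of the final product and are moved from the register to memory; the accumulated result then occupies two registers.
3.3. Compute a[2]×b[0] to obtain a 128-bit product held in two 64-bit registers, and add it to the accumulated result; the sum occupies three 64-bit registers. Compute a[1]×b[1] in the same way and add it to the accumulated result; the sum occupies three 64-bit registers. Compute a[0]×b[2] in the same way and add it to the accumulated result; the sum occupies three 64-bit registers. The lower 64 bits of the accumulated result are part of the final product and are moved from the register to memory; the accumulated result then occupies two registers.
3.4. Compute a[3]×b[0] to obtain a 128-bit product held in two 64-bit registers, and add it to the accumulated result; the sum occupies three 64-bit registers. Compute a[2]×b[1], a[1]×b[2], and a[0]×b[3] in turn, each giving a 128-bit product held in two 64-bit registers, and add each to the accumulated result; each sum occupies three 64-bit registers. The lower 64 bits of the accumulated result are part of the final product and are moved from the register to memory; the accumulated result then occupies two registers.
3.5. Compute a[3]×b[1] to obtain a 128-bit product held in two 64-bit registers, and add it to the accumulated result; the sum occupies three 64-bit registers. Compute a[2]×b[2] and a[1]×b[3] in turn, each giving a 128-bit product held in two 64-bit registers, and add each to the accumulated result; each sum occupies three 64-bit registers. The lower 64 bits of the accumulated result are part of the final product and are moved from the register to memory; the accumulated result then occupies two registers.
3.6. Compute a[3]×b[2] to obtain a 128-bit product held in two 64-bit registers, and add it to the accumulated result; the sum occupies three 64-bit registers. Compute a[2]×b[3] in the same way and add it to the accumulated result; the sum occupies three 64-bit registers. The lower 64 bits of the accumulated result are part of the final product and are moved from the register to memory; the accumulated result then occupies two registers.
It should be noted that, within each of steps 3.2 to 3.6, the order in which the individual products are computed can be adjusted as needed; the embodiments of this specification do not restrict it.
3.7. Compute a[3]×b[3] to obtain a 128-bit product held in two 64-bit registers, and add it to the accumulated result; the sum occupies two 64-bit registers. The whole 128-bit accumulated result belongs to the final product; it is moved from the registers to memory, and no registers remain occupied.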
步骤3.1到步骤3.7的计算过程,可以参考图3所示,每一步骤可以对应上述实施例中一组数据对的计算,各组数据对的划分可以参考图2以及图3所示。如图3所示,下 一组数据的乘积可以与上一组数据累加结果的高位相加。
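Steps 3.1 to 3.7 can be modelled in software as follows. This is a minimal Python sketch, not the assembly routine the specification has in mind: the names `to_limbs` and `mul_256` are hypothetical, the three-element list `acc` stands in for the three accumulator registers, and appending to `s` stands in for storing a finished low word to memory.

```python
MASK64 = (1 << 64) - 1


def to_limbs(x):
    """Split a 256-bit integer into [x[0], x[1], x[2], x[3]], low word first."""
    return [(x >> (64 * i)) & MASK64 for i in range(4)]


def mul_256(a, b):
    """Multiply two 4-word operands group by group; returns s[0]..s[7]."""
    acc = [0, 0, 0]  # models the three 64-bit accumulator registers
    s = []           # models the result words stored to memory
    for k in range(7):               # k = 0..6 corresponds to steps 3.1..3.7
        for j in range(4):
            i = k - j
            if not 0 <= i <= 3:
                continue             # pair a[i] x b[j] is not in this group
            prod = a[i] * b[j]       # 128-bit product
            lo, hi = prod & MASK64, prod >> 64
            t = acc[0] + lo          # add low half, keep the carry
            acc[0], carry = t & MASK64, t >> 64
            t = acc[1] + hi + carry  # add high half plus carry
            acc[1], carry = t & MASK64, t >> 64
            acc[2] += carry          # third word only absorbs carries
        s.append(acc[0])             # low 64 bits are final: store them
        acc = [acc[1], acc[2], 0]    # the rest carries over to the next group
    s.append(acc[0])                 # after group 7 one more word remains: s[7]
    return s


x = (1 << 256) - 1
y = 0x0123456789ABCDEF << 128
words = mul_256(to_limbs(x), to_limbs(y))
product = sum(w << (64 * k) for k, w in enumerate(words))  # equals x * y
```

The inner loop visits the pairs of each group in the order listed in the steps above (for example a[3]×b[0], a[2]×b[1], a[1]×b[2], a[0]×b[3] for step 3.4); as the text notes, any order within a group gives the same result.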
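It can be seen that, apart from the eight 64-bit registers occupied by the operands a and b, the number of additional 64-bit registers occupied during the multiplication is: two in step 3.1; five in each of steps 3.2, 3.3, 3.4, 3.5 and 3.6; and three in step 3.7.
In summary, over the whole multiplication the two operands a and b need eight 64-bit locations, and the multiplications and additions need at most five 64-bit temporary locations. The whole multiplication can therefore be completed in thirteen 64-bit registers.
4. Register restoration
For the four registers whose values were saved to memory during register preparation in step 2, the values can be read back from memory after the computation to restore the four registers.
After steps 2, 3 and 4 have been executed, the 256-bit multiplication function returns. In FIG. 3, s[0] to s[7] denote the accumulated results: s[0] is the low 64 bits of the product of the first group of data pairs, that is, a[0]×b[0]; s[1] is the low 64 bits of the accumulated result of the second group of data pairs; and so on. s[6] and s[7] are the result of adding the product of the seventh group of data pairs to the remaining accumulated result of the sixth group.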
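The relationship between the seven groups and the result words s[0] to s[7] can be written as a column-wise sum. The following is a sketch in ordinary positional notation; the symbols C_k (accumulated result of group k+1 before its low word is stored) and R_k (remaining accumulated result, with R_{-1} = 0) are introduced here for illustration and do not appear in the original text.

```latex
\[
a \cdot b \;=\; \sum_{k=0}^{6} 2^{64k} \sum_{\substack{i+j=k \\ 0 \le i,\,j \le 3}} a[i]\, b[j]
\]
\[
C_k \;=\; R_{k-1} + \sum_{i+j=k} a[i]\, b[j], \qquad
s[k] \;=\; C_k \bmod 2^{64}, \qquad
R_k \;=\; \left\lfloor C_k / 2^{64} \right\rfloor, \qquad k = 0, \dots, 6
\]
\[
s[7] \;=\; R_6, \qquad a \cdot b \;=\; \sum_{k=0}^{7} s[k]\, 2^{64k}
\]
```

Read this way, step 3.1 is the case k = 0 and step 3.7 is the case k = 6, whose two-word result supplies both s[6] and s[7].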
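The 256-bit multiplication in the embodiments of this specification can be implemented with the 64-bit instructions and 64-bit registers of an x86 CPU. Among the 64-bit instructions supported in the 64-bit mode of an x86 CPU, the following may be used:
(1) The 64-bit multiplication instruction MUL, or MULX, which Intel CPUs have provided since the Haswell architecture of 2013. With MULX the registers that receive the product can be specified, for example by designating five of the registers as product destinations, which reduces the number of mov (data transfer) instructions and improves data processing efficiency.
(2) The 64-bit addition instruction ADD, or the add-with-carry instruction ADC.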
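The effect of these instructions on 64-bit values can be modelled in Python to make the carry handling explicit. This is a behavioural sketch only: `mul64`, `adc` and `accumulate` are hypothetical helper names, and the real instructions additionally fix or select particular registers and flags, which the model ignores.

```python
MASK64 = (1 << 64) - 1


def mul64(x, y):
    """Model of a 64x64 -> 128-bit multiply (MUL/MULX): returns (hi, lo)."""
    p = x * y
    return p >> 64, p & MASK64


def adc(x, y, carry_in=0):
    """Model of ADD (carry_in=0) or ADC: returns (sum mod 2**64, carry_out)."""
    t = x + y + carry_in
    return t & MASK64, t >> 64


def accumulate(acc, hi, lo):
    """Add a 128-bit product (hi, lo) into a three-word accumulator.

    Corresponds to one ADD followed by two ADCs in a carry chain.
    """
    acc0, c = adc(acc[0], lo)
    acc1, c = adc(acc[1], hi, c)
    acc2, _ = adc(acc[2], 0, c)
    return [acc0, acc1, acc2]


hi, lo = mul64(MASK64, MASK64)          # largest possible product
acc = accumulate([MASK64, MASK64, 0], 0, 1)  # carry ripples through two words
```

The second call shows why a third accumulator word is needed: adding 1 to two all-ones words produces a carry out of the second word.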
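In general, the 64-bit mode of an x86 CPU has sixteen 64-bit registers: RAX, RBX, RCX, RDX, RSI, RDI, RSP, RBP, R8, R9, R10, R11, R12, R13, R14 and R15.
Except for RSP, which serves as the stack pointer, any register can be used for computation in code written in assembly, provided that the register is pushed onto the stack for protection and restored before the routine returns. That is, in the 64-bit mode of an x86 CPU, the fifteen 64-bit registers other than RSP can be used for computation.
The 256-bit multiplication in the embodiments of this specification needs thirteen 64-bit locations, that is, thirteen 64-bit registers, while the 64-bit mode of an x86 CPU offers fifteen 64-bit registers for data processing. The computer data processing method provided in the embodiments of this specification can therefore complete its computation in registers in the 64-bit mode of an x86 CPU, and the two remaining registers can serve as spares.
The computer data processing method provided in the embodiments of this specification implements 256-bit multiplication, can be carried out in the 64-bit mode of an x86 CPU, and occupies few registers. It does not restrict (prescribe) which 64-bit registers are used, which 64-bit multiplication and addition instructions are used, or how the code is written. The computation need not access cache or memory for intermediate values, which improves data processing efficiency, achieves fast 256-bit multiplication, and provides a computational basis for cryptography, large-integer arithmetic and similar applications.
The method embodiments above are described in a progressive manner. For parts that are the same or similar, the embodiments may be referred to one another; each embodiment focuses on its differences from the others. For related details, see the corresponding description of the method embodiments.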
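Based on the computer data processing method described above, one or more embodiments of this specification further provide a computer data processing apparatus. The apparatus may include a system (including a distributed system), software (an application), a module, a component, a server, a client and the like that uses the method described in the embodiments of this specification, combined with the necessary implementing hardware. Based on the same inventive concept, the apparatus in one or more embodiments provided in this specification is described in the following embodiments. Because the apparatus solves the problem in a way similar to the method, the implementation of the apparatus may refer to the implementation of the foregoing method, and repeated details are not described again. As used below, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Specifically, FIG. 4 is a schematic diagram of the module structure of one embodiment of the computer data processing apparatus provided in this specification. As shown in FIG. 4, the apparatus is used to multiply two 256-bit numbers and may include a data splitting module 41 and a data processing module 42, where:
the data splitting module 41 may be configured to split the multiplier a and the multiplicand b, from high-order to low-order bits, into four 64-bit numbers each, to obtain a split multiplier and a split multiplicand, the split multiplier including a[3], a[2], a[1], a[0] and the split multiplicand including b[3], b[2], b[1], b[0];
the data processing module 42 may be configured to read the split multiplier and the split multiplicand into registers and to multiply them according to a preset rule, to obtain the multiplication result of the target data;
where the preset rule includes:
the split multiplier and the split multiplicand are divided into seven groups of data pairs: the first group includes a[0]b[0]; the second group includes a[1]b[0] and a[0]b[1]; the third group includes a[2]b[0], a[1]b[1] and a[0]b[2]; the fourth group includes a[3]b[0], a[2]b[1], a[1]b[2] and a[0]b[3]; the fifth group includes a[3]b[1], a[2]b[2] and a[1]b[3]; the sixth group includes a[3]b[2] and a[2]b[3]; and the seventh group includes a[3]b[3];
the products of the first to seventh groups of data pairs are computed one by one, and the products of the data pairs within each group are accumulated within the group, where the intra-group accumulation includes: within one group, each time the product of a data pair is computed, accumulating it with the product of the previous data pair in the group; saving the low 64 bits of the final accumulated result of the group to memory, to obtain the remaining accumulated result of that group; and releasing the corresponding register;
the product of the first data pair of each group is accumulated with the remaining accumulated result of the previous group and then with the product of the next data pair, until the accumulation of the product of the data pair in the seventh group is finished; the accumulated result corresponding to the data pair in the seventh group is saved to memory, to obtain the multiplication result of the target data.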
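The seven groups in the preset rule follow a simple pattern: group k (counting from zero) contains every pair a[i]b[j] with i + j = k. A short Python sketch, in which the function name `column_groups` is hypothetical, reproduces the grouping listed above:

```python
def column_groups(n_words=4):
    """Return the groups of index pairs (i, j) with i + j = k, for each k.

    Within a group the pairs are ordered by increasing j, which matches the
    order in which the groups are written out in the text.
    """
    groups = []
    for k in range(2 * n_words - 1):
        group = [(k - j, j) for j in range(n_words) if 0 <= k - j < n_words]
        groups.append(group)
    return groups


groups = column_groups()
for number, group in enumerate(groups, start=1):
    pairs = ", ".join(f"a[{i}]b[{j}]" for i, j in group)
    print(f"group {number}: {pairs}")
```

Grouping by i + j puts together exactly the products that share the same weight 2^(64k), which is why the low 64 bits of a group's accumulated result are final once the group is finished.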
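The computer data processing apparatus provided in the embodiments of this specification implements 256-bit multiplication: two 256-bit values are each split into 64-bit numbers, and the split data is then processed according to a preset rule. Throughout the computation, the products of the data pairs in each group are computed one by one and accumulated group by group; as soon as the accumulation for a group is finished, the low 64 bits of the accumulated result are stored to memory and the corresponding register is released. Accumulating each product as soon as it is computed, and releasing the low 64-bit register as soon as a group is finished, keeps the number of occupied registers small, so that, apart from loading the operands and storing the result words, the computation can work on registers only, without accessing cache or memory for intermediate values. This improves data processing efficiency and makes fast 256-bit multiplication possible.
On the basis of the foregoing embodiments, in some embodiments of this specification the data processing module is specifically configured to:
when accumulating the products of the data pairs of each group, release the registers that hold the product of each data pair and keep the accumulated result in three registers.
In the embodiments of this specification, accumulating each product as soon as it is computed and releasing the low 64-bit register as soon as a group is finished keeps the number of registers occupied during the whole computation to a minimum.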
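Why three registers are enough for the accumulated result, and why two would not be, can be checked with a worst-case bound. The Python sketch below is an illustration added here, not part of the original method: it assumes every 64-bit word of both operands is all ones, so that each product takes its largest possible value, and records how many bits the accumulated result needs in each group.

```python
M = (1 << 64) - 1                      # largest 64-bit word
PAIRS_PER_GROUP = [1, 2, 3, 4, 3, 2, 1]

peaks = []                             # bits needed by each group's accumulator
remainder = 0                          # remaining accumulated result carried over
for n_pairs in PAIRS_PER_GROUP:
    total = remainder + n_pairs * M * M  # worst case: every product is M*M
    peaks.append(total.bit_length())
    remainder = total >> 64              # low 64 bits are stored and dropped

print(peaks)       # bit widths per group
print(max(peaks))  # 130: more than two 64-bit registers, well within three
```

Under this bound the accumulated result never needs more than 130 bits, so it does not fit in two 64-bit registers (128 bits) but fits comfortably in three (192 bits).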
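On the basis of the foregoing embodiments, in some embodiments of this specification the apparatus is applied in a 64-bit computer operating system.
The embodiments of this specification perform the 256-bit multiplication in a 64-bit operating system, which makes more registers available, so that cache and memory need not be accessed during data processing; this improves data processing efficiency and achieves fast 256-bit multiplication.
FIG. 5 is a schematic structural diagram of a computer data processing apparatus in another embodiment of this specification. As shown in FIG. 5, on the basis of the foregoing embodiments, in some embodiments of this specification the apparatus further includes a register preparation module 51 configured to:
select any four of the registers RBX, RBP, R12, R13, R14 and R15 of the 64-bit computer operating system, and save the values of the selected registers to memory;
after the target multiplication result has been computed, obtain the saved values of the selected registers from memory and restore the values of the selected registers.
In the embodiments of this specification, after the target multiplication result has been computed, that is, after the 256-bit multiplication is complete, the saved values of the four selected registers are obtained from memory and used to restore those registers, which ensures that the function runs correctly and improves the accuracy of data processing.
On the basis of the foregoing embodiments, in some embodiments of this specification the register preparation module is further configured to:
select the registers RAX, RCX, RDX, RSI, RDI, R8, R9, R10 and R11 in the 64-bit computer operating system, which, together with the selected registers, are used to store data during data processing.
In the embodiments of this specification, the nine registers RAX, RCX, RDX, RSI, RDI, R8, R9, R10 and R11 may be chosen preferentially, because their values need not be restored after the computation; only the four registers selected from the six registers RBX, RBP, R12, R13, R14 and R15 have to be restored. This removes data processing steps and improves data processing efficiency.
It should be noted that the apparatus described above may also include other implementations according to the description of the method embodiments. For specific implementations, refer to the description of the corresponding method embodiments; they are not described one by one here.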
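The embodiments of this specification further provide a computer data processing device, including at least one 64-bit processor, a memory for storing 64-bit instructions executable by the processor, and at least thirteen 64-bit registers. When executing the 64-bit instructions, the processor implements the computer data processing method of the foregoing embodiments, for example:
splitting the multiplier a and the multiplicand b, from high-order to low-order bits, into four 64-bit numbers each, to obtain a split multiplier and a split multiplicand, the split multiplier including a[3], a[2], a[1], a[0] and the split multiplicand including b[3], b[2], b[1], b[0];
reading the split multiplier and the split multiplicand into registers and multiplying them according to a preset rule, to obtain a target multiplication result;
where the preset rule includes:
the split multiplier and the split multiplicand are divided into seven groups of data pairs: the first group includes a[0]b[0]; the second group includes a[1]b[0] and a[0]b[1]; the third group includes a[2]b[0], a[1]b[1] and a[0]b[2]; the fourth group includes a[3]b[0], a[2]b[1], a[1]b[2] and a[0]b[3]; the fifth group includes a[3]b[1], a[2]b[2] and a[1]b[3]; the sixth group includes a[3]b[2] and a[2]b[3]; and the seventh group includes a[3]b[3];
computing the product of the first group of data pairs, saving the low 64 bits of that product to memory, and releasing the corresponding register;
computing the products of the second to seventh groups of data pairs one by one and accumulating them group by group: each time the product of a data pair is computed, accumulating it with the product of the previous data pair; saving the low 64 bits of the final accumulated result of each group to memory, to obtain the remaining accumulated result of each group; and releasing the corresponding register;
where the product of the first data pair of each group is accumulated with the remaining accumulated result of the previous group and then with the product of the next data pair, until the accumulation of the product of the data pair in the seventh group is finished; the accumulated result corresponding to the data pair in the seventh group is saved to memory, to obtain the target multiplication result.
It should be noted that the processing device described above may also include other implementations according to the description of the method embodiments. For specific implementations, refer to the description of the related method embodiments; they are not described one by one here.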
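The computer data processing apparatus or processing device provided in this specification can also be applied in a variety of data analysis and processing systems. The system, apparatus or processing device may include any one of the computer data processing apparatuses in the foregoing embodiments. The system, apparatus or processing device may be a standalone server, or may include a server cluster, a system (including a distributed system), software (an application), a physical operating apparatus, a logic gate circuit apparatus, a quantum computer or the like that uses one or more of the methods or one or more of the embodiment apparatuses of this specification, combined with a terminal apparatus of the necessary implementing hardware. The system may include at least one processor and a memory storing computer-executable instructions; when executing the instructions, the processor implements the steps of the method in any one or more of the foregoing embodiments.
The method embodiments provided in this specification may be executed in a mobile terminal, a computer terminal, a server or a similar computing apparatus. Taking execution on a server as an example, FIG. 6 is a block diagram of the hardware structure of a computer data processing server in one embodiment of this specification; the server may be the computer data processing apparatus or the computer data processing device of the foregoing embodiments. As shown in FIG. 6, the server 10 may include one or more processors 100 (only one is shown in the figure; the processor 100 may include, but is not limited to, a processing apparatus such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 200 for storing data, and a transmission module 300 for communication. A person of ordinary skill in the art will understand that the structure shown in FIG. 6 is only illustrative and does not limit the structure of the electronic apparatus described above. For example, the server 10 may include more or fewer components than shown in FIG. 6, may include other processing hardware such as a database, a multi-level cache or a GPU, or may have a configuration different from that shown in FIG. 6.
The memory 200 may be used to store software programs and modules of application software, such as the program instructions/modules corresponding to the computer data processing method in the embodiments of this specification. The processor 100 performs various functional applications and data processing by running the software programs and modules stored in the memory 200. The memory 200 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage apparatuses, flash memory or other non-volatile solid-state memory. In some examples, the memory 200 may further include memory located remotely from the processor 100, and such remote memory may be connected to the computer terminal through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
The transmission module 300 is used to receive or send data over a network. A specific example of such a network is a wireless network provided by the communication provider of the computer terminal. In one example, the transmission module 300 includes a network adapter (Network Interface Controller, NIC), which can connect to other network devices through a base station and thereby communicate with the Internet. In one example, the transmission module 300 may be a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.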
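Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
The methods or apparatuses described in the foregoing embodiments of this specification can implement their business logic through a computer program recorded on a storage medium; the storage medium can be read and executed by a computer to achieve the effects of the solutions described in the embodiments of this specification.
The storage medium may include a physical apparatus for storing information, in which information is usually digitized and then stored on media using electrical, magnetic, optical or other means. The storage medium may include: apparatuses that store information electrically, such as various memories, for example RAM, ROM and USB flash drives; apparatuses that store information magnetically, such as hard disks, floppy disks, magnetic tapes, magnetic core memories and bubble memories; and apparatuses that store information optically, such as CDs or DVDs. There are of course other kinds of readable storage media, such as quantum memories and graphene memories.
The computer data processing method or apparatus provided in the embodiments of this specification can be implemented by a processor executing corresponding program instructions in a computer, for example implemented on a PC in C++ under the Windows operating system, implemented on a Linux system, implemented on a smart terminal in the programming languages of, for example, the Android or iOS systems, or implemented on the basis of the processing logic of a quantum computer.
It should be noted that the apparatus, computer storage medium and system described above in this specification may also include other implementations according to the description of the related method embodiments. For specific implementations, refer to the description of the corresponding method embodiments; they are not described one by one here.
The embodiments in this specification are described in a progressive manner. For parts that are the same or similar, the embodiments may be referred to one another; each embodiment focuses on its differences from the others. In particular, the hardware-plus-program embodiments are described relatively briefly because they are basically similar to the method embodiments; for related details, see the corresponding description of the method embodiments.
The embodiments of this specification are not limited to cases that must comply with industry communication standards, standard computer data processing and data storage rules, or the situations described in one or more embodiments of this specification. Implementations that are slightly modified on the basis of certain industry standards, of custom approaches, or of the implementations described in the embodiments can also achieve effects that are the same as, equivalent to, or similar to those of the foregoing embodiments, or that are predictable after the modification. Embodiments obtained by applying such modified or varied ways of acquiring, storing, judging and processing data still fall within the scope of the optional implementations of the embodiments of this specification.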
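In the 1990s, an improvement to a technology could be clearly classified as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, a transistor or a switch) or a software improvement (an improvement to a method flow). With the development of technology, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be implemented by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of making integrated circuit chips by hand, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compiler used when developing and writing programs. The source code to be compiled must also be written in a particular programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. A person skilled in the art should also understand that a hardware circuit implementing a logical method flow can easily be obtained merely by describing the method flow in one of the above hardware description languages and programming it into an integrated circuit.
A controller may be implemented in any suitable manner. For example, a controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by that (micro)processor, or the form of logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. A person skilled in the art also knows that, besides implementing a controller purely as computer-readable program code, it is entirely possible to program the method steps logically so that the controller performs the same functions in the form of logic gates, switches, an ASIC, a programmable logic controller, an embedded microcontroller and the like. Such a controller can therefore be regarded as a hardware component, and the apparatuses included in it for implementing various functions can be regarded as structures within the hardware component. Alternatively, the apparatuses for implementing various functions can even be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, apparatuses, modules or units described in the foregoing embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementing device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, an in-vehicle human-machine interaction device, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.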
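Although one or more embodiments of this specification provide method operation steps as described in the embodiments or flowcharts, more or fewer operation steps may be included on the basis of conventional or non-inventive means. The order of steps listed in the embodiments is only one of many possible execution orders and is not the only one. When an actual apparatus or terminal product executes the method, the steps may be executed sequentially or in parallel according to the method shown in the embodiments or drawings (for example, in an environment with parallel processors or multithreaded processing, or even a distributed data processing environment). The terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, product or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, product or device. Without further limitation, the presence of additional identical or equivalent elements in the process, method, product or device that includes the elements is not excluded. Words such as "first" and "second" are used to denote names and do not denote any particular order.
For convenience of description, the above apparatus is described in terms of modules divided by function. Of course, when implementing one or more embodiments of this specification, the functions of the modules may be implemented in one or more pieces of software and/or hardware, or a module implementing one function may be implemented by a combination of several submodules or subunits. The apparatus embodiments described above are only illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, several units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses or units, and may be electrical, mechanical or of another form.
The present invention is described with reference to flowcharts and/or block diagrams of the method, apparatus (system) and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, the instruction apparatus implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps is executed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.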
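In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces and memory.
The memory may include non-persistent memory in computer-readable media, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage, graphene storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
A person skilled in the art should understand that one or more embodiments of this specification may be provided as a method, a system or a computer program product. Accordingly, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM and optical storage) that contain computer-usable program code.
One or more embodiments of this specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules. In general, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. One or more embodiments of this specification may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner. For parts that are the same or similar, the embodiments may be referred to one another; each embodiment focuses on its differences from the others. In particular, the system embodiments are described relatively briefly because they are basically similar to the method embodiments; for related details, see the corresponding description of the method embodiments. In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a particular feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this specification. In this specification, such references do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided that they do not contradict one another, a person skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
The foregoing is merely illustrative of one or more embodiments of this specification and is not intended to limit them. For a person skilled in the art, various modifications and variations of one or more embodiments of this specification are possible. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of this specification shall fall within the scope of the claims.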

Claims (11)

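  1. A computer data processing method for multiplying two 256-bit numbers, the method comprising:
    splitting a multiplier a and a multiplicand b, from high-order to low-order bits, into four 64-bit numbers each, to obtain a split multiplier and a split multiplicand, the split multiplier comprising a[3], a[2], a[1], a[0] and the split multiplicand comprising b[3], b[2], b[1], b[0];
    reading the split multiplier and the split multiplicand into registers, and multiplying the split multiplier and the split multiplicand according to a preset rule, to obtain a multiplication result of target data;
    wherein the preset rule comprises:
    the split multiplier and the split multiplicand are divided into seven groups of data pairs, the first group comprising a[0]b[0]; the second group comprising a[1]b[0] and a[0]b[1]; the third group comprising a[2]b[0], a[1]b[1] and a[0]b[2]; the fourth group comprising a[3]b[0], a[2]b[1], a[1]b[2] and a[0]b[3]; the fifth group comprising a[3]b[1], a[2]b[2] and a[1]b[3]; the sixth group comprising a[3]b[2] and a[2]b[3]; and the seventh group comprising a[3]b[3];
    computing the products of the first to seventh groups of data pairs one by one, and accumulating the products of the data pairs within each group, the intra-group accumulation comprising: within one group, each time the product of a data pair is computed, accumulating it with the product of the previous data pair in the group; saving the low 64 bits of the final accumulated result of the group to memory, to obtain the remaining accumulated result of that group; and releasing the corresponding register;
    accumulating the product of the first data pair of each group with the remaining accumulated result of the previous group and then with the product of the next data pair, until the accumulation of the product of the data pair in the seventh group is finished; and saving the accumulated result corresponding to the data pair in the seventh group to memory, to obtain the multiplication result of the target data.
  2. The method according to claim 1, further comprising:
    when accumulating the products of the data pairs of each group, releasing the registers that hold the product of each data pair, and keeping the accumulated result in three registers.
  3. The method according to claim 1, wherein the method is applied in a 64-bit computer operating system.
  4. The method according to claim 3, further comprising:
    selecting any four of the registers RBX, RBP, R12, R13, R14 and R15 of the 64-bit computer operating system, and saving the values of the selected registers to memory;
    after the target multiplication result has been computed, obtaining the saved values of the selected registers from memory, and restoring the values of the selected registers.
  5. The method according to claim 4, further comprising:
    selecting the registers RAX, RCX, RDX, RSI, RDI, R8, R9, R10 and R11 in the 64-bit computer operating system, which, together with the selected registers, are used to store data during data processing.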
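  6. A computer data processing apparatus for multiplying two 256-bit numbers, the apparatus comprising:
    a data splitting module, configured to split a multiplier a and a multiplicand b, from high-order to low-order bits, into four 64-bit numbers each, to obtain a split multiplier and a split multiplicand, the split multiplier comprising a[3], a[2], a[1], a[0] and the split multiplicand comprising b[3], b[2], b[1], b[0];
    a data processing module, configured to read the split multiplier and the split multiplicand into registers, and to multiply the split multiplier and the split multiplicand according to a preset rule, to obtain a multiplication result of target data;
    wherein the preset rule comprises:
    the split multiplier and the split multiplicand are divided into seven groups of data pairs, the first group comprising a[0]b[0]; the second group comprising a[1]b[0] and a[0]b[1]; the third group comprising a[2]b[0], a[1]b[1] and a[0]b[2]; the fourth group comprising a[3]b[0], a[2]b[1], a[1]b[2] and a[0]b[3]; the fifth group comprising a[3]b[1], a[2]b[2] and a[1]b[3]; the sixth group comprising a[3]b[2] and a[2]b[3]; and the seventh group comprising a[3]b[3];
    computing the products of the first to seventh groups of data pairs one by one, and accumulating the products of the data pairs within each group, the intra-group accumulation comprising: within one group, each time the product of a data pair is computed, accumulating it with the product of the previous data pair in the group; saving the low 64 bits of the final accumulated result of the group to memory, to obtain the remaining accumulated result of that group; and releasing the corresponding register;
    accumulating the product of the first data pair of each group with the remaining accumulated result of the previous group and then with the product of the next data pair, until the accumulation of the product of the data pair in the seventh group is finished; and saving the accumulated result corresponding to the data pair in the seventh group to memory, to obtain the multiplication result of the target data.
  7. The apparatus according to claim 6, wherein the data processing module is specifically configured to:
    when accumulating the products of the data pairs of each group, release the registers that hold the product of each data pair, and keep the accumulated result in three registers.
  8. The apparatus according to claim 6, wherein the apparatus is applied in a 64-bit computer operating system.
  9. The apparatus according to claim 8, further comprising a register preparation module configured to:
    select any four of the registers RBX, RBP, R12, R13, R14 and R15 of the 64-bit computer operating system, and save the values of the selected registers to memory;
    after the target multiplication result has been computed, obtain the saved values of the selected registers from memory, and restore the values of the selected registers.
  10. The apparatus according to claim 8, wherein the register preparation module is further configured to:
    select the registers RAX, RCX, RDX, RSI, RDI, R8, R9, R10 and R11 in the 64-bit computer operating system, which, together with the selected registers, are used to store data during data processing.
  11. A computer device, comprising at least one 64-bit processor, a memory for storing 64-bit instructions executable by the processor, and at least thirteen 64-bit registers, wherein the processor, when executing the 64-bit instructions, implements the method according to any one of claims 1 to 5.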
PCT/CN2020/070620 2019-04-28 2020-01-07 Computer data processing method and apparatus WO2020220743A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/779,073 US10782933B2 (en) 2019-04-28 2020-01-31 Computer data processing method and apparatus for large number operations

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910349581.2 2019-04-28
CN201910349581.2A CN110262773B (zh) 2019-04-28 2019-04-28 Computer data processing method and apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/779,073 Continuation US10782933B2 (en) 2019-04-28 2020-01-31 Computer data processing method and apparatus for large number operations

Publications (1)

Publication Number Publication Date
WO2020220743A1 true WO2020220743A1 (zh) 2020-11-05

Family

ID=67913934

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/070620 WO2020220743A1 (zh) 2019-04-28 2020-01-07 Computer data processing method and apparatus

Country Status (3)

Country Link
CN (1) CN110262773B (zh)
TW (1) TWI731543B (zh)
WO (1) WO2020220743A1 (zh)


Also Published As

Publication number Publication date
CN110262773B (zh) 2020-08-04
TW202040354A (zh) 2020-11-01
TWI731543B (zh) 2021-06-21
CN110262773A (zh) 2019-09-20


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20798191

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20798191

Country of ref document: EP

Kind code of ref document: A1