CN113704174A - Chip and data processing method

Info

Publication number: CN113704174A
Application number: CN202111242097.3A
Authority: CN
Inventors: 王雪强; 李艺
Original assignee: Huakong Tsingjiao Information Technology Beijing Co Ltd
Current assignee: Huakong Tsingjiao Information Technology Beijing Co Ltd
Priority date: 2021-10-25
Filing date: 2021-10-25
Publication date: 2021-11-26

Abstract

The embodiment of the invention provides a chip and a data processing method, wherein the chip comprises an input control module, a composite computation module and at least two operation cores; the input control module is used for receiving and parsing an input data frame from the host side and sending the operation type and the operation data obtained by parsing to the operation core; the operation core is used for executing a modular operation on the received operation data according to the received operation type to obtain a core operation result, and the core operation result comprises a modular multiplication result or a modular exponentiation result; when the operation type received by the operation core is a composite type, the operation core is further used for sending the core operation result and the operation type to the composite computation module; and the composite computation module is used for performing an accumulation operation or a multiplication operation on the received core operation results according to the received operation type to obtain a modular multiplication sum result or a modular exponentiation product result. The embodiment of the invention can greatly improve the execution efficiency of composite-operator modular operations.

Description

Chip and data processing method

Technical Field

The invention relates to the technical field of computers, in particular to a chip and a data processing method.

Background

In the field of privacy computing, modular exponentiation and modular multiplication are the basic operations used to implement cryptographic protocols. Composite operations built on these basic operations, such as the sum of modular multiplications (cumulative summation of a plurality of modular multiplication results) and the product of modular exponentiations (cumulative multiplication of a plurality of modular exponentiation results), are important operations in privacy computing. For example, updating the gradient of a machine learning algorithm in federated learning requires a large number of modular multiplication sum operations and a large number of modular exponentiation product operations.

At present, composite operations are implemented by writing software programs. However, modular multiplication and modular exponentiation are usually large-integer operations (such as 1024-bit, 2048-bit, or 4096-bit), and their computational overhead is large. Performing composite operations such as the sum of modular multiplications and the product of modular exponentiations on top of these two basic operations further increases the load on the Central Processing Unit (CPU) and reduces the overall performance of the system.

Disclosure of Invention

Embodiments of the present invention provide a chip, a data processing method, and a machine-readable medium, which can improve the execution efficiency of basic-operator modular operations, greatly improve the execution efficiency of composite-operator modular operations, reduce CPU load, and improve the performance of a computing system.

In order to solve the above problem, an embodiment of the present invention discloses a chip, which includes an input control module, a composite computation module, and at least two computation cores, wherein,

the input control module is used for receiving and parsing an input data frame from the host side and sending the operation type and the operation data obtained by parsing to the operation core; the input data frame comprises an operation type and operation data, the operation type comprises a single type or a composite type, the single type comprises modular multiplication or modular exponentiation, and the composite type comprises a sum of modular multiplications or a product of modular exponentiations;

the operation core is used for executing modular operation on the received operation data according to the received operation type to obtain a core operation result, and the core operation result comprises a modular multiplication result or a modular exponentiation result; when the operation type received by the operation core is a composite type, the operation core is further used for sending the core operation result and the operation type to the composite computing module;

and the composite calculation module is used for performing an accumulation operation or a multiplication operation on the received core operation result according to the received operation type to obtain a modular multiplication sum result or a modular exponentiation product result.

Optionally, the composite computation module includes a multiplexer, an accumulation unit, and a multiplication unit, wherein

the multiplexer is configured to receive a modular multiplication result output by the operation core when the operation type is a modular multiplication sum, and send the received modular multiplication result to the accumulation unit; or, to receive a modular exponentiation result output by the operation core when the operation type is a product of modular exponentiation, and send the received modular exponentiation result to the multiplication unit;

the accumulation unit is used for receiving the modular multiplication result transmitted by the multiplexer and executing accumulation operation on the received modular multiplication result to obtain a modular multiplication sum result;

and the multiplication unit is used for receiving the modular exponentiation result transmitted by the multiplexer and executing multiplication operation on the received modular exponentiation result to obtain a product result of the modular exponentiation.

Optionally, the operation core is further configured to send a handshake request signal to the composite computation module when it receives that the operation type is a composite type and the execution of the modular operation of the operation core is completed, and send a core operation result obtained by the computation to the composite computation module when it receives a handshake response signal returned by the composite computation module.

Optionally, the operation type further includes outputting a modular multiplication sum result or outputting a product result of a modular exponentiation, and the composite calculation module is further configured to output the calculated modular multiplication sum result or the product result of a modular exponentiation when the operation type outputting the modular multiplication sum result or the product result of a modular exponentiation is received.

Optionally, the chip further includes an arbitration circuit, configured to select one of the operation cores in the valid state at the current time according to an arbitration algorithm to output, where the operation core in the valid state refers to the operation core that has sent the handshake request signal.

Optionally, the composite computing module further includes a state machine, configured to poll the states of the operation cores, and start the arbitration circuit when it is determined that there is an operation core in a valid state.
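The arbitration algorithm itself is not specified in the text. As a minimal software sketch, assuming a round-robin policy (one common choice, not necessarily the one used in the chip), the polling state machine and the arbitration circuit could be modeled as follows. The names `RoundRobinArbiter` and `poll_and_arbitrate` are hypothetical.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Hypothetical round-robin arbiter over per-core handshake requests."""

    def __init__(self, num_cores: int) -> None:
        self.num_cores = num_cores
        # Start so that core 0 has the highest priority on the first grant.
        self.last_granted = num_cores - 1

    def grant(self, requests: List[bool]) -> Optional[int]:
        """Return the index of the next requesting core, or None if idle."""
        for offset in range(1, self.num_cores + 1):
            idx = (self.last_granted + offset) % self.num_cores
            if requests[idx]:
                self.last_granted = idx
                return idx
        return None


def poll_and_arbitrate(arbiter: RoundRobinArbiter,
                       requests: List[bool]) -> Optional[int]:
    """Model of the state machine: start arbitration only if a core is valid.

    A core is "valid" when it has sent a handshake request signal.
    """
    if not any(requests):
        return None
    return arbiter.grant(requests)
```

With four cores where cores 1 and 3 are requesting, successive calls grant core 1, then core 3, then core 1 again, so no requesting core is starved.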

Optionally, the chip further includes a first output control module and a second output control module, wherein

the first output control module is configured to receive a core operation result output by the operation core, and output the received core operation result to the second output control module;

the second output control module is configured to transmit the core operation result output by the first output control module to the host side, or transmit the modular multiplication sum result or the product result of the modular exponentiation output by the composite computation module to the host side.

Optionally, data is exchanged between the operation core and the first output control module, between the operation core and the composite computation module, between the composite computation module and the second output control module, and between the first output control module and the second output control module through handshake signals.

Optionally, the operation core is further configured to send a handshake request signal to the first output control module when it receives that the operation type is a single type and the execution of the modulo operation of the operation core is completed, and output a core operation result obtained by calculation to the first output control module when it receives a handshake response signal returned by the first output control module.

Optionally, the composite computing module is further configured to send a handshake request signal to the second output control module when the accumulation operation or the multiplication operation is completed, and output the calculated modular multiplication sum result or the calculated modular exponentiation product result to the second output control module when a handshake response signal returned by the second output control module is received.

Optionally, the chip comprises a field programmable gate array FPGA chip or an application specific integrated circuit ASIC chip.

On the other hand, the embodiment of the invention discloses a data processing method, which is applied to a chip, wherein the chip comprises an input control module, a composite calculation module and at least two operation cores, and the method comprises the following steps:

receiving and parsing an input data frame from a host side through the input control module, and sending an operation type and operation data obtained through parsing to an operation core; the input data frame comprises an operation type and operation data, the operation type comprises a single type or a composite type, the single type comprises modular multiplication or modular exponentiation, and the composite type comprises a sum of modular multiplications or a product of modular exponentiations;

performing modular operation on the received operation data through the operation core according to the received operation type to obtain a core operation result, wherein the core operation result comprises a modular multiplication result or a modular exponentiation result; when the operation type received by the operation core is a composite type, the operation core also sends the core operation result and the operation type to the composite computation module;

and performing accumulation operation or multiplication operation on the received core operation result through the composite calculation module according to the received operation type to obtain a modular multiplication sum result or a modular exponentiation product result.

Optionally, the composite computation module includes a multiplexer, an accumulation unit, and a multiplication unit, and the method further includes:

receiving a modular multiplication result output by the operation core when the operation type is the modular multiplication sum through the multiplexer, and sending the received modular multiplication result to the accumulation unit; or, receiving a modular exponentiation result output by the operation core when the operation type is a product of modular exponentiation through a multiplexer, and sending the received modular exponentiation result to the multiplication unit;

receiving the modular multiplication result transmitted by the multiplexer through the accumulation unit, and performing accumulation operation on the received modular multiplication result to obtain a modular multiplication sum result;

and receiving the modular exponentiation result transmitted by the multiplexer through the multiplication unit, and performing a multiplication operation on the received modular exponentiation result to obtain a product result of the modular exponentiation.

Optionally, the method further comprises:

and sending a handshake request signal to the composite computing module when the operation core receives that the operation type is the composite type and the modular operation execution of the operation core is completed, and sending a core operation result obtained by calculation to the composite computing module when a handshake response signal returned by the composite computing module is received.

Optionally, the operation type further includes outputting a result of a modular multiplication sum or outputting a result of a product of modular exponentiation, and the method further includes:

and outputting the modular multiplication sum result or the product result of the modular exponentiation obtained by calculation when the composite calculation module receives the operation type of outputting the modular multiplication sum result or the product result of the modular exponentiation.

Optionally, the chip further comprises an arbitration circuit, and the method further comprises:

and selecting one of the operation cores in the effective state at the current moment according to an arbitration algorithm through the arbitration circuit for outputting, wherein the operation core in the effective state refers to the operation core which sends a handshake request signal.

Optionally, the composite computing module further comprises a state machine, and the method further comprises:

and polling the state of each operation core through the state machine, and starting an arbitration circuit when the operation core with the effective state is determined to exist.

Optionally, the chip further includes a first output control module and a second output control module, and the method further includes:

receiving, by the first output control module, a core operation result output by the operation core, and outputting the received core operation result to the second output control module;

and transmitting the core operation result output by the first output control module to the host side through the second output control module, or transmitting a modular multiplication sum result or a modular exponentiation product result output by the composite calculation module to the host side.

Optionally, the method further comprises:

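and sending a handshake request signal to the first output control module when the operation core receives that the operation type is a single type and the modular operation execution of the operation core is completed, and outputting a core operation result obtained by calculation to the first output control module when a handshake response signal returned by the first output control module is received.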

Optionally, the method further comprises:

and when the composite calculation module finishes the execution of the accumulation operation or the multiplication operation, sending a handshake request signal to the second output control module, and when receiving a handshake response signal returned by the second output control module, outputting the modular multiplication sum result or the modular exponentiation product result obtained by calculation to the second output control module.

In yet another aspect, an embodiment of the present invention discloses a machine-readable medium having stored thereon instructions, which, when executed by one or more processors of an apparatus, cause the apparatus to perform a data processing method as described in one or more of the preceding.

The embodiment of the invention has the following advantages:

the embodiment of the invention provides a chip which comprises an input control module, a composite calculation module and at least two operation cores, wherein each operation core can independently realize modular multiplication operation or modular exponentiation operation. The input control module can receive and analyze an input data frame from the host side, and send the operation type and the operation data obtained through analysis to the operation core to execute modular multiplication or modular exponentiation. The operation core executes modular operation on the received operation data according to the received operation type to obtain a core operation result; and when the operation type received by the operation core is a composite type, the operation core is also used for sending the core operation result and the operation type to the composite computing module. And the composite calculation module performs accumulation operation or multiplication operation on the received core operation result according to the received operation type to obtain a modular multiplication sum result or a modular exponentiation product result. 
Through the multi-core array comprising at least two operation cores and the composite computation module, the chip of the embodiment of the invention can realize large-scale parallel execution of basic-operator modular operations (such as modular multiplication or modular exponentiation) and high-speed execution of composite-operator modular operations with a larger amount of computation (such as the sum of modular multiplications or the product of modular exponentiations). Because the modular operations with a larger amount of computation are carried out in hardware (the chip), the execution efficiency of basic-operator modular operations can be improved, the execution efficiency of composite-operator modular operations can be greatly improved, the CPU load can be further reduced, the performance of the computing system can be improved, and computing scenarios with higher requirements can be met.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.

FIG. 1 is a block diagram of a chip 100 according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of an input data frame according to the present invention;

FIG. 3 is a schematic diagram of a chip 100 and host side 200 of the present invention communicating over a PCIe bus;

FIG. 4 is a block diagram of another embodiment of a chip 100 according to the present invention;

FIG. 5 is a block diagram of an embodiment of a chip 100 according to the invention;

FIG. 6 is a flow chart of the steps of an embodiment of a data processing method of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terms first, second and the like in the description and in the claims of the present invention are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that embodiments of the invention may be practiced other than those illustrated or described herein, and that the objects identified as "first," "second," etc. are generally a class of objects and do not limit the number of objects, e.g., a first object may be one or more. Furthermore, the term "and/or" in the specification and claims is used to describe an association relationship of associated objects, meaning that three relationships may exist, e.g., A and/or B may mean: A exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. The term "plurality" in the embodiments of the present invention means two or more, and other terms are similar thereto.

Referring to fig. 1, a block diagram of an embodiment of a chip 100 according to the present invention is shown, where the chip 100 includes an input control module 101, a composite computation module 102, and at least two operation cores 103, where,

the input control module 101 is configured to receive and parse an input data frame from a host side, and send an operation type and operation data obtained through parsing to an operation core; the input data frame comprises an operation type and operation data, the operation type comprises a single type or a composite type, the single type comprises modular multiplication or modular exponentiation, and the composite type comprises a sum of modular multiplications or a product of modular exponentiations;

the operation core 103 is configured to perform a modular operation on the received operation data according to the received operation type to obtain a core operation result, where the core operation result includes a modular multiplication result or a modular exponentiation result; when the operation type received by the operation core is a composite type, the operation core is further used for sending the core operation result and the operation type to the composite computing module;

the composite computation module 102 is configured to perform an accumulation operation or a multiplication operation on the received core operation result according to the received operation type to obtain a modular multiplication sum result or a modular exponentiation product result.

In the application of privacy calculation based on cryptography, a large number of modular operations, such as modular multiplication, modular exponentiation, sum of modular multiplication, product of modular exponentiation and the like, are required to be performed on a data center or a server, and the modular operations are basic operations in the privacy calculation.

Modular operations are computationally intensive and generally time-consuming. To improve the efficiency of privacy computing, the Montgomery algorithm can be adopted to accelerate modular operations. However, when the amount of operation data is large, even with Montgomery acceleration, the performance of the privacy computing system in an actual application scenario still struggles to meet the requirements of privacy computing.

In order to solve this problem, the invention provides a chip for implementing modular operation instructions, and the chip can be a multi-core heterogeneous chip. Multi-core means that the chip comprises at least two operation cores, and each operation core can independently implement a modular multiplication operation or a modular exponentiation operation. Heterogeneous means that the chip has a massively parallel processing structure, which differs from the traditional CPU-memory-hard disk architecture of a computer/server host.
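For readers unfamiliar with the Montgomery algorithm mentioned above, the following is a minimal software sketch of Montgomery modular multiplication. It is an illustration only and says nothing about how the operation cores implement it in hardware. The function names and the choice of R = 2**bits are assumptions made for this example; the modulus p must be odd.

```python
def montgomery_setup(p: int, bits: int):
    """Precompute constants for an odd modulus p with R = 2**bits > p."""
    if p % 2 == 0 or p >= (1 << bits):
        raise ValueError("p must be odd and smaller than R = 2**bits")
    r = 1 << bits
    p_inv_neg = (-pow(p, -1, r)) % r  # -p^(-1) mod R (Python 3.8+)
    r2 = (r * r) % p                  # R^2 mod p, used to enter Montgomery form
    return p_inv_neg, r2


def redc(t: int, p: int, bits: int, p_inv_neg: int) -> int:
    """Montgomery reduction: return t * R^(-1) mod p for 0 <= t < R * p."""
    mask = (1 << bits) - 1
    m = ((t & mask) * p_inv_neg) & mask
    u = (t + m * p) >> bits  # exact division by R
    return u - p if u >= p else u


def mont_mul(a: int, b: int, p: int, bits: int) -> int:
    """Compute (a * b) mod p using Montgomery form internally."""
    p_inv_neg, r2 = montgomery_setup(p, bits)
    a_m = redc((a % p) * r2, p, bits, p_inv_neg)  # a * R mod p
    b_m = redc((b % p) * r2, p, bits, p_inv_neg)  # b * R mod p
    c_m = redc(a_m * b_m, p, bits, p_inv_neg)     # a * b * R mod p
    return redc(c_m, p, bits, p_inv_neg)          # leave Montgomery form
```

The benefit is that each reduction uses only masks, shifts, and multiplications instead of a division by p, which is why the technique suits both software and hardware implementations for large moduli.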

Further, the chip may include, but is not limited to, an FPGA (Field Programmable Gate Array) chip or an ASIC (Application Specific Integrated Circuit) chip.

In a specific implementation, the number of operation cores included in each chip varies with the size of the chip. For example, an FPGA can accommodate hundreds of operation cores, and a customized ASIC chip can reach thousands of operation cores.

In the embodiment of the present invention, the input control module 101 in the chip 100 may be configured to receive and store a modulo operation instruction from the host side. The embodiment of the invention sets a uniform data frame format for the modular arithmetic instruction. On the host side, an application layer calls a background software program to generate a modular operation instruction in an input data frame format, a background scheduling algorithm is started to realize data transmission of a communication process, and a driving layer communicates with a chip through a Peripheral Component Interconnect Express (PCIe) bus.

The input control module 101 receives an input data frame from the host side, and parses each received input data frame to obtain an operation type and operation data therein. And the input data frame is a data frame for uniformly encapsulating the modular arithmetic command at the host side. The modular arithmetic instruction may comprise a modular arithmetic instruction of a base operator or a modular arithmetic instruction of a compound operator. The modular operation instruction of the basic operator comprises a modular multiplication operation instruction or a modular exponentiation operation instruction. The modular arithmetic instruction of the composite operator comprises a modular multiplication sum operation instruction or a modular exponentiation product operation instruction.

Referring to fig. 2, a schematic diagram of a structure of an input data frame of the present invention is shown. As shown in fig. 2, one input data frame may include a header (Head) and a data body (Data). The header may include information such as an operation code (OpCode), a data bit width of an operation (PBitCount), a data bit width of a power exponent (EbitCount), and the like. Different operation codes are used to represent different operation types. The data body (Data) contains the operation data participating in the modular operation instruction, such as d1 to dn.
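The text names the header fields but does not give their widths or byte order. The sketch below therefore assumes a purely hypothetical layout (1-byte OpCode, 2-byte PBitCount, 2-byte EbitCount, a 2-byte operand count that the text does not mention, big-endian, fixed-width operands) just to illustrate how a host might pack such a frame and how the input control module might parse it.

```python
import struct
from dataclasses import dataclass
from typing import List

# Hypothetical header layout: OpCode (1 byte), PBitCount (2), EbitCount (2),
# operand count n (2). None of these widths come from the source text.
HEADER = struct.Struct(">BHHH")


@dataclass
class InputFrame:
    opcode: int
    p_bit_count: int
    e_bit_count: int
    data: List[int]  # operation data d1..dn


def pack_frame(frame: InputFrame) -> bytes:
    """Host side: serialize header followed by fixed-width operands."""
    width = (frame.p_bit_count + 7) // 8
    head = HEADER.pack(frame.opcode, frame.p_bit_count,
                       frame.e_bit_count, len(frame.data))
    body = b"".join(d.to_bytes(width, "big") for d in frame.data)
    return head + body


def parse_frame(raw: bytes) -> InputFrame:
    """Chip side: recover the operation type and the operation data."""
    opcode, p_bits, e_bits, n = HEADER.unpack_from(raw, 0)
    width = (p_bits + 7) // 8
    body = raw[HEADER.size:]
    if len(body) != n * width:
        raise ValueError("data body length does not match header")
    data = [int.from_bytes(body[i * width:(i + 1) * width], "big")
            for i in range(n)]
    return InputFrame(opcode, p_bits, e_bits, data)
```

Fixing the operand width from PBitCount keeps the data body self-describing, so the parser needs no per-operand length prefix.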

The host side may send an input data frame to the chip 100 through the high-speed transmission bus, where the input data frame includes an operation type corresponding to the modulo operation instruction and operation data required to execute the modulo operation instruction. The host side also receives the calculation result returned by the modulo operation instruction executed by the chip 100 through the high-speed transmission bus. The high speed transport bus may be a PCIe bus.

Referring to fig. 3, a schematic diagram of a chip 100 and host side 200 of the present invention communicating over a PCIe bus is shown. As shown in fig. 3, the chip 100 receives and stores an input data frame from the host side through the input control module 101, and parses each received input data frame to obtain an operation type and operation data therein. The input control module 101 sends the operation type and the operation data obtained by the analysis to the operation core. It should be noted that the input control module 101 sends the operation type and the operation data obtained by the parsing to the operation core participating in executing the current modulo operation instruction. For example, 500 operation cores are provided in the chip 100, and when the modulo operation instruction needs to use 300 operation cores, the input control module 101 may send the operation type and the operation data obtained by the parsing to the 300 operation cores. The operation cores participating in executing the current modular operation instruction can be determined by a scheduling algorithm. The operation core in the embodiment of the invention refers to an operation core which participates in executing the current modular operation instruction.

If the current modular operation instruction is used to execute batch modular operations and includes n pieces of operation data, the input control module 101 may generate a corresponding control signal based on a multi-core scheduling algorithm, and allocate the received n pieces of operation data to n operation cores to perform parallel modular operations.
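The multi-core scheduling algorithm is not described further. As a rough software analogy, assuming one operand pair per core and a thread pool standing in for the operation cores, batch dispatch of n modular multiplications might look like this. The names `mod_mul` and `dispatch_batch` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def mod_mul(a: int, b: int, p: int) -> int:
    """Work done by one operation core: a single modular multiplication."""
    return (a * b) % p


def dispatch_batch(operands: List[Tuple[int, int]], p: int,
                   num_cores: int) -> List[int]:
    """Distribute n operand pairs over at most num_cores workers.

    Results are returned in the same order as the operands.
    """
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        return list(pool.map(lambda ab: mod_mul(ab[0], ab[1], p), operands))
```

In hardware the cores run truly in parallel; the thread pool here only models the one-operand-set-per-core assignment, not the speedup.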

Referring to table 1, an example of a correspondence relationship between operation codes and operation types in an input data frame according to an embodiment of the present invention is shown.

TABLE 1

Operation code (OpCode)	Operation type
0001	Modular multiplication
0010	Modular exponentiation
0011	Sum of modular multiplications
0100	Product of modular exponentiations
0101	Output the modular multiplication sum result
0110	Output the modular exponentiation product result

In the embodiment of the present invention, the operation type may include a single type or a composite type, the single type may include, but is not limited to, a modular multiplication (0001) or a modular exponentiation (0010), and the composite type may include, but is not limited to, a sum of modular multiplications (0011) or a product of modular exponentiations (0100).
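The operation codes of Table 1 and the single/composite classification can be captured in a small lookup, sketched here for illustration. The enum member names and the helper `classify` are hypothetical; only the 4-bit code values and their meanings come from Table 1.

```python
from enum import IntEnum


class OpCode(IntEnum):
    """Operation codes from Table 1 (member names are illustrative)."""
    MOD_MUL = 0b0001
    MOD_EXP = 0b0010
    SUM_OF_MOD_MUL = 0b0011
    PROD_OF_MOD_EXP = 0b0100
    OUTPUT_SUM_OF_MOD_MUL = 0b0101
    OUTPUT_PROD_OF_MOD_EXP = 0b0110


SINGLE_TYPES = frozenset({OpCode.MOD_MUL, OpCode.MOD_EXP})
COMPOSITE_TYPES = frozenset({OpCode.SUM_OF_MOD_MUL, OpCode.PROD_OF_MOD_EXP})


def classify(code: int) -> str:
    """Return 'single', 'composite', or 'output' for a raw 4-bit code."""
    op = OpCode(code)  # raises ValueError for codes not in Table 1
    if op in SINGLE_TYPES:
        return "single"
    if op in COMPOSITE_TYPES:
        return "composite"
    return "output"
```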

The chip 100 provided by the embodiment of the present invention may be a multi-core heterogeneous chip, and the chip 100 at least includes an input control module, a composite computation module, and at least two computation cores. The input control module is used for receiving and analyzing an input data frame from the host side, and sending the operation type and the operation data obtained through analysis to an operation core participating in executing the current modular operation instruction. Each arithmetic core performs a modular operation on the received operation data according to the received operation type, for example, if the operation type received by the arithmetic core is modular multiplication, the arithmetic core performs the modular multiplication operation on the received operation data; if the operation type received by the operation core is modular exponentiation, the operation core performs modular exponentiation on the received operation data. And executing respective modular operation by the plurality of operation cores in parallel to obtain respective core operation results, wherein the core operation results comprise modular multiplication results or modular exponentiation results.

Further, at least two operation cores in the chip 100 may form a multi-core array, and each operation core in the array can independently implement a modular multiplication operation:

$$a \cdot b \bmod p$$

and independently implement a modular exponentiation operation:

$$a^{e} \bmod p$$

The host side may simultaneously send a plurality of modulo operation instructions to the chip 100 in batch to execute the plurality of modulo operation instructions in batch through the chip 100.

If parsing the input data frame from the host side shows that the operation type of the current modular operation instruction is a single type, the plurality of operation cores obtain their respective core operation results after executing their modular operations. For example, assuming that the chip 100 receives n modular operation instructions and the operation type is modular multiplication, n operation cores may be allocated to execute n modular multiplication operations in parallel:

$$a_i \cdot b_i \bmod p, \quad i = 1, \dots, n$$

so that the n operation cores obtain n modular multiplication results. For another example, assuming that the chip 100 receives n modular operation instructions and the operation type is modular exponentiation, n operation cores may be allocated to perform n modular exponentiations in parallel:

$$a_i^{e_i} \bmod p, \quad i = 1, \dots, n$$

so that the n operation cores obtain n modular exponentiation results. When the operation type of the current modular operation instruction is a single type, after the plurality of operation cores execute their modular operations, the respective core operation results can be directly output to the output control module, so that the core operation results are output to the host side through the output control module.

If parsing the input data frame from the host side shows that the operation type of the current modular operation instruction is a composite type, then after the plurality of operation cores execute their modular operations, each operation core needs to send its core operation result and the operation type it received to the composite computation module 102, so that the composite computation module 102 continues the computation.

The composite computation module 102 performs an accumulation operation or a multiplication operation on the received core operation result according to the received operation type, so as to obtain a modular multiplication sum result or a modular exponentiation product result.

In one example, assume that the operation type of the current modular operation instruction is a composite type, namely the sum of modular multiplications, and that the operation data include $a_i$, $b_i$ and $p$, where i takes values in [1, n]. The multi-core array first computes the n individual modular multiplication results $a_i \cdot b_i \bmod p$. The composite computing module 102 then accumulates the n individual results to realize the sum of modular multiplications and obtain the final modular multiplication sum result S.

In another example, assume that the operation type of the current modular operation instruction is a composite type, namely the product of modular exponentiations, and that the operation data include $a_i$, $e_i$ and $p$, where i takes values in [1, n]. The multi-core array first computes the n individual modular exponentiation results $a_i^{e_i} \bmod p$. The composite computing module 102 then multiplies the n individual results together to realize the product of modular exponentiations and obtain the final product result x of the modular exponentiations.
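The two composite operations can be sketched as a two-stage computation. The function names and the `reduce_final` flag are hypothetical; in particular, whether the composite stage reduces the accumulated value modulo p is an assumption here, since the text only states that the individual results are accumulated or multiplied together.

```python
def modmul_sum(a, b, p, reduce_final=True):
    """Sum of modular multiplications over operands a_i, b_i and modulus p."""
    partial = [(ai * bi) % p for ai, bi in zip(a, b)]  # multi-core array stage
    s = sum(partial)                                   # composite stage: accumulate
    return s % p if reduce_final else s                # final reduction is an assumption


def modexp_product(a, e, p, reduce_final=True):
    """Product of modular exponentiations over operands a_i, e_i and modulus p."""
    partial = [pow(ai, ei, p) for ai, ei in zip(a, e)]  # multi-core array stage
    x = 1
    for r in partial:                                   # composite stage: multiply
        x *= r
        if reduce_final:                                # final reduction is an assumption
            x %= p
    return x


S = modmul_sum([3, 5, 7], [4, 6, 8], 7)           # partial results 5, 2, 0
S_raw = modmul_sum([3, 5, 7], [4, 6, 8], 7, reduce_final=False)
x = modexp_product([2, 3], [10, 4], 13)           # partial results 10, 3
x_raw = modexp_product([2, 3], [10, 4], 13, reduce_final=False)
print(S, S_raw, x, x_raw)
```

With the reduction enabled the intermediate values never exceed the width of p, which is the usual reason a hardware accumulator would reduce at every step.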

The composite computing module 102 outputs the calculated modular multiplication sum result or product result of the modular exponentiations to the output control module, which in turn outputs that result to the host side.

Thus, the chip 100 provided by the embodiment of the present invention can accelerate both the modular operations of basic operators (such as modular multiplication or modular exponentiation) and the modular operations of composite operators (such as the modular multiplication sum or the product of modular exponentiations). By the embodiment of the invention, modular operations with a large amount of computation are carried out in hardware (the chip), exploiting the high operation speed of hardware, so that semi-homomorphic encryption privacy computation can be implemented efficiently on hardware and its efficiency is greatly improved. In particular, for the modular operations of composite operators such as the modular multiplication sum and the product of modular exponentiations, which place high demands on the operation environment, executing them in the hardware environment of a multi-core heterogeneous chip can greatly increase the speed of the modular multiplication sum operation and the modular exponentiation product operation. Furthermore, in a federated learning scenario, the gradient update speed of the machine learning algorithm can be increased.

In addition, the chip provided by the embodiment of the invention can be a multi-core heterogeneous chip. Through the multi-core array and the composite computing module, it can execute the modular operations of basic operators (such as modular multiplication or modular exponentiation) in parallel on a large scale and execute the modular operations of composite operators (such as the modular multiplication sum or the product of modular exponentiations), which involve a larger amount of computation, at high speed. This improves the execution efficiency of the modular operations of basic operators, greatly improves the execution efficiency of the modular operations of composite operators, further reduces the CPU load, improves the performance of the computing system, and meets more demanding computing scenarios.

Further, the composite computing module may include a multiplexer, an accumulation unit, and a multiplication unit.

In an embodiment of the present invention, the composite computing module may further include an arbitration circuit, which determines which operation core's core operation result (modular multiplication result or modular exponentiation result) the multiplexer selects and receives at a given time.

If the operation type of the current modular operation instruction is the composite type of modular multiplication sum, the composite computing module can accumulate the modular multiplication results output by the plurality of operation cores in its accumulation unit, and then output the resulting modular multiplication sum result to the output control module. If the operation type of the current modular operation instruction is the composite type of product of modular exponentiations, the composite computing module can multiply together the modular exponentiation results output by the plurality of operation cores in its multiplication unit, and then output the resulting product result of the modular exponentiations to the output control module.

In an optional embodiment of the present invention, the output control module in the chip 100 may include a first output control module and a second output control module. Referring to fig. 4, a block diagram of another embodiment of the chip 100 of the present invention is shown. As shown in fig. 4, the chip 100 includes an input control module 101, a composite computing module 102, and at least two operation cores 103, where the at least two operation cores 103 form a multi-core array, and the chip 100 further includes a first output control module 104 and a second output control module 105.

The first output control module 104 is configured to receive, when the operation type is a single type, the core operation result output by each operation core, and to output the received core operation results to the second output control module.

The second output control module 105 is configured to receive, when the operation type is a single type, the core operation result of each operation core output by the first output control module, and to transmit the received core operation results (such as modular multiplication results or modular exponentiation results) to the host side. The second output control module is further configured to receive, when the operation type is a composite type, the modular multiplication sum result or the modular exponentiation product result output by the composite computing module, and to transmit the received result to the host side.

In an optional embodiment of the invention, the operation type may further include outputting the modular multiplication sum result or outputting the product result of the modular exponentiations. The composite computing module is further configured to output the calculated modular multiplication sum result when the received operation type is outputting the modular multiplication sum result, or to output the calculated product result of the modular exponentiations when the received operation type is outputting the product result of the modular exponentiations.

As shown in Table 1, the operation code 0101 indicates that the operation type is outputting the modular multiplication sum result. A modular operation instruction containing this operation code triggers the composite computing module to output the calculated modular multiplication sum result: when the composite computing module receives a modular operation instruction containing the operation code 0101, it outputs the modular multiplication sum result to the second output control module. The operation code 0110 indicates that the operation type is outputting the product result of the modular exponentiations. A modular operation instruction containing this operation code triggers the composite computing module to output the calculated product result of the modular exponentiations: when the composite computing module receives a modular operation instruction containing the operation code 0110, it outputs the product result of the modular exponentiations to the second output control module.
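The effect of the two output operation codes can be sketched as a small decode step. Only the codes 0101 and 0110 are taken from the text; the constant names, the function `handle_output_opcode`, and the idea of holding the running results in two registers are hypothetical, and the remaining operation codes of Table 1 are not modeled.

```python
OUTPUT_MODMUL_SUM = 0b0101      # from the text: output the modular multiplication sum result
OUTPUT_MODEXP_PRODUCT = 0b0110  # from the text: output the product result of the modular exponentiations


def handle_output_opcode(opcode, sum_register, product_register):
    """Return the value forwarded to the second output control module, or None.

    sum_register and product_register are hypothetical names for the values
    currently held by the accumulation unit and the multiplication unit.
    """
    if opcode == OUTPUT_MODMUL_SUM:
        return sum_register
    if opcode == OUTPUT_MODEXP_PRODUCT:
        return product_register
    return None  # any other operation code does not trigger an output here


out_sum = handle_output_opcode(0b0101, sum_register=10, product_register=4)
out_prod = handle_output_opcode(0b0110, sum_register=10, product_register=4)
out_none = handle_output_opcode(0b0000, sum_register=10, product_register=4)
print(out_sum, out_prod, out_none)
```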

In an optional embodiment of the present invention, data may be exchanged between the operation core and the first output control module, between the operation core and the composite computation module, between the composite computation module and the second output control module, and between the first output control module and the second output control module through handshake signals.

In an embodiment of the present invention, a handshake signal group may include a valid signal, a ready signal, and a data signal. The valid signal is the handshake request signal, the ready signal is the handshake response signal, and the data signal is the data transmission signal. For an operation core, the data signal it sends can carry a modular multiplication result or a modular exponentiation result. For the composite computing module, the data signal it sends can carry a modular multiplication sum result or a modular exponentiation product result. In the embodiment of the invention, the modules exchange data through handshake signals to ensure that data exchange is orderly and accurate.

In one example, when an operation core (assumed to be core1) completes its modular operation and obtains its core operation result, it may send a valid signal. Taking the case where the operation type of the current modular operation instruction is a single type as an example, core1 may send a valid signal to the first output control module. When core1 receives the ready signal returned by the first output control module, the first output control module is ready to receive data, and core1 may then send a data signal carrying the core operation result of core1 to the first output control module. Taking the case where the operation type of the current modular operation instruction is a composite type as an example, core1 may send a valid signal to the composite computing module. When core1 receives the ready signal returned by the composite computing module, the composite computing module is ready to receive data, and core1 may then send a data signal carrying the core operation result of core1 to the composite computing module.

In an optional embodiment of the present invention, the operation core is further configured to send a handshake request signal to the composite computing module when the operation type it received is a composite type and its modular operation has completed, and to send the calculated core operation result to the composite computing module when it receives the handshake response signal returned by the composite computing module.

In the embodiment of the present invention, the outputs of the operation cores may be interconnected with the next stage (e.g., the first output control module or the composite computing module) through the handshake signal group.

When an operation core sends the handshake request signal (valid signal), the operation core has completed its modular operation and obtained a core operation result (such as a modular multiplication result or a modular exponentiation result), and can send that core operation result to the receiving party.

When the operation type is a composite type, the next stage of the operation core is the composite computing module. When the operation core completes its modular operation and obtains a core operation result, it may send a handshake request signal (valid signal) to the composite computing module. When it receives the handshake response signal (ready signal) returned by the composite computing module, it may send a data signal carrying its core operation result to the composite computing module.

Further, when the valid signal sent by a certain operation core and the received ready signal are simultaneously valid, the core operation result of the operation core can be transmitted to the next stage.
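The valid/ready rule can be sketched cycle by cycle. The `Channel` class and the `transfer` function are hypothetical names, and the model is deliberately minimal: a transfer happens only in a cycle where valid and ready are both asserted.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """One handshake signal group between an operation core and its next stage."""
    valid: bool = False  # handshake request, driven by the sender
    ready: bool = False  # handshake response, driven by the receiver
    data: int = 0        # core operation result carried by the data signal


def transfer(ch):
    """Return the data if valid and ready are both asserted this cycle, else None."""
    if ch.valid and ch.ready:
        value = ch.data
        ch.valid = False  # the sender deasserts valid once the result is handed over
        return value
    return None


ch = Channel()
log = []
ch.valid, ch.data = True, 42  # the core has finished and requests a transfer
log.append(transfer(ch))      # receiver not ready yet: nothing moves
ch.ready = True               # receiver answers the request
log.append(transfer(ch))      # both asserted: the result is transferred
log.append(transfer(ch))      # valid was deasserted: nothing moves
print(log)
```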

In an optional embodiment of the present invention, the operation core is further configured to send a handshake request signal to the first output control module when the operation type it received is a single type and its modular operation has completed, and to output the calculated core operation result to the first output control module when it receives the handshake response signal returned by the first output control module.

In the embodiment of the present invention, each operation core may include two groups of handshake control signals, one interconnected with the composite computing module as the next stage and the other interconnected with the first output control module as the next stage.

Taking core1 of the n operation cores as an example, when the operation type received by core1 is a single type and the modular operation of core1 has completed, core1 may send a handshake request signal (e.g., core1_valid_1) to the first output control module, and when core1 receives the handshake response signal (e.g., core1_ready_1) returned by the first output control module, core1 may send a core1_data signal carrying the core operation result of core1 to the first output control module. When the operation type received by core1 is a composite type and the modular operation of core1 has completed, core1 may send a handshake request signal (e.g., core1_valid_2) to the composite computing module, and when core1 receives the handshake response signal (e.g., core1_ready_2) returned by the composite computing module, core1 may send a core1_data signal carrying the core operation result of core1 to the composite computing module.

In an optional embodiment of the present invention, the composite computing module is further configured to send a handshake request signal to the second output control module when the accumulation operation or the multiplication operation is completed, and to output the calculated modular multiplication sum result or modular exponentiation product result to the second output control module when it receives the handshake response signal returned by the second output control module.

Similarly, the composite computing module and the second output control module can be interconnected through a handshake signal group. The input of the second output control module comprises two handshake signal groups. One path is from the first output control module, and the other path is from the composite calculation module.

If the operation type of the current modulo operation instruction is a single type, the second output control module receives a handshake request signal (valid signal) from the first output control module, and in response to the handshake request signal (valid signal), the second output control module can return a handshake response signal (ready signal) to the first output control module, and after receiving the handshake response signal (ready signal), the first output control module can send a data signal carrying a core operation result to the second output control module.

If the operation type of the current modulo operation instruction is the composite type, the second output control module receives a handshake request signal (valid signal) from the composite calculation module. In response to the handshake request signal (valid signal), the second output control module may return a handshake response signal (ready signal) to the composite computing module, and after receiving the handshake response signal (ready signal), the composite computing module may send a data signal carrying a sum result of the modular multiplications or a product result of the modular exponentiation to the second output control module.

The second output control module may store the core operation result of each operation core output by the first output control module, or the second output control module may store a modular multiplication sum result or a modular exponentiation product result output by the composite calculation module. And the second output control module waits for the host side to read the saved calculation result through the PCIe bus.

In an optional embodiment of the present invention, the chip may further include an arbitration circuit, configured to select, according to an arbitration algorithm, one of the operation cores in the valid state at the current time for output, where an operation core in the valid state is an operation core that has sent the handshake request signal.

In the embodiment of the present invention, the chip 100 may execute a batch of modular operation instructions from the host side in parallel through a plurality of operation cores. Since the operation cores may complete their operations out of order, when the valid signals of several operation cores (e.g., core1_valid_1, core2_valid_1, ..., core_n_valid_1) are asserted at the same time, the arbitration circuit determines which operation core to select for output and generates the corresponding core1_ready_1, core2_ready_1, ..., core_n_ready_1 signals. Only one of the n handshake response signals is asserted, and the signals are returned to the respective operation cores. An operation core can output its core operation result only when the valid signal it sent and the ready signal it received are asserted at the same time.

It should be noted that the embodiments of the present invention do not limit the arbitration algorithm adopted by the arbitration circuit. For example, a round-robin scheme may be used, or operation cores may be selected in order from the highest core number to the lowest.
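The two example arbitration schemes can be sketched as functions that turn a vector of valid signals into a one-hot vector of ready signals. The names `fixed_priority_grant` and `RoundRobinArbiter` are hypothetical, and cores are indexed from 0 here for convenience.

```python
def fixed_priority_grant(valid):
    """Grant the valid core with the highest index (selection from high to low)."""
    ready = [False] * len(valid)
    for i in range(len(valid) - 1, -1, -1):
        if valid[i]:
            ready[i] = True  # exactly one ready signal is asserted
            break
    return ready


class RoundRobinArbiter:
    """Grant valid cores in rotating order, starting after the last granted core."""

    def __init__(self, n):
        self.n = n
        self.last = n - 1  # so that core 0 is examined first

    def grant(self, valid):
        ready = [False] * self.n
        for offset in range(1, self.n + 1):
            i = (self.last + offset) % self.n
            if valid[i]:
                ready[i] = True
                self.last = i
                break
        return ready


arb = RoundRobinArbiter(3)
g1 = arb.grant([True, True, True])     # core 0 is granted
g2 = arb.grant([True, True, True])     # core 1 is granted
g3 = arb.grant([True, False, True])    # core 2 is granted
g4 = arb.grant([False, False, False])  # no valid core, no grant
fp = fixed_priority_grant([True, True, False])
print(g1, g2, g3, g4, fp)
```

Round-robin prevents a core with a high index from starving the others when many cores finish at once, while fixed priority needs no state.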

In an optional embodiment of the present invention, the composite computing module may further comprise a state machine configured to poll the state of each operation core and to enable the arbitration circuit when it determines that an operation core in the valid state exists.

Referring to fig. 5, a block diagram of a further embodiment of the chip 100 of the present invention is shown. As shown in fig. 5, the chip 100 includes an input control module 101, a composite computing module 102, at least two operation cores 103, a first output control module 104, and a second output control module 105, where the at least two operation cores 103 may form a multi-core array. The composite computing module 102 includes an accumulation unit and a multiplication unit, and further includes an arbitration circuit, a multiplexer, a state machine, and an output control logic circuit. The first output control module 104 includes an arbitration circuit, a multiplexer, a state machine, and an output control logic circuit.

In the composite computing module 102, the state machine polls the state of each operation core. When it determines that an operation core in the valid state exists, it enables the arbitration circuit in the composite computing module 102, which selects one of the operation cores in the valid state. The output control logic circuit in the composite computing module 102 outputs to the second output control module, and the output is the modular multiplication sum result or the product result of the modular exponentiations.

In the first output control module 104, the state machine likewise polls the state of each operation core. When it determines that an operation core in the valid state exists, it enables the arbitration circuit in the first output control module 104, which selects one of the operation cores in the valid state. The output control logic circuit in the first output control module 104 outputs to the second output control module, and the output is the modular multiplication result or the modular exponentiation result.
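The poll, arbitrate and accumulate flow inside the composite computing module can be sketched as a per-cycle loop. Everything named here is hypothetical: `composite_module_run`, the `arrivals` input (which cores assert valid in which cycle), the choice of the lowest core index as a stand-in arbitration rule, and the reduction modulo p after every step.

```python
def composite_module_run(arrivals, op_type, p):
    """Consume out-of-order core results, one per cycle, and return (result, order).

    arrivals[k] maps core index -> core operation result for the cores whose
    valid signal becomes asserted in cycle k.
    """
    if op_type not in ("modmul_sum", "modexp_product"):
        raise ValueError(f"unsupported composite operation type: {op_type}")
    acc = 0 if op_type == "modmul_sum" else 1
    pending = {}  # cores currently in the valid state
    order = []    # order in which cores were selected
    cycle = 0
    while cycle < len(arrivals) or pending:
        if cycle < len(arrivals):
            pending.update(arrivals[cycle])
        if pending:                   # polling found a valid core: enable arbitration
            chosen = min(pending)     # stand-in arbitration rule (assumption)
            result = pending.pop(chosen)
            if op_type == "modmul_sum":
                acc = (acc + result) % p  # accumulation unit
            else:
                acc = (acc * result) % p  # multiplication unit
            order.append(chosen)
        cycle += 1
    return acc, order


# Core 2 finishes first; cores 0 and 1 finish together one cycle later.
total, order = composite_module_run([{2: 1}, {0: 1, 1: 8}], "modmul_sum", 11)
prod, _ = composite_module_run([{1: 3}, {0: 10}], "modexp_product", 13)
print(total, order, prod)
```

Because addition and multiplication modulo p are commutative, the order in which the arbitration circuit selects the cores does not change the final result.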

To sum up, the embodiment of the present invention provides a chip, which includes an input control module, a composite computation module, and at least two operation cores, where each operation core can independently implement a modular multiplication operation or a modular exponentiation operation. The input control module can receive and analyze an input data frame from the host side, and send the operation type and the operation data obtained through analysis to the operation core to execute modular multiplication or modular exponentiation. The operation core executes modular operation on the received operation data according to the received operation type to obtain a core operation result; and when the operation type received by the operation core is a composite type, the operation core is also used for sending the core operation result and the operation type to the composite computing module. And the composite calculation module performs accumulation operation or multiplication operation on the received core operation result according to the received operation type to obtain a modular multiplication sum result or a modular exponentiation product result. 
Through the multi-core array comprising at least two operation cores and the composite computing module, the chip of the embodiment of the invention can execute the modular operations of basic operators (such as modular multiplication or modular exponentiation) in parallel on a large scale and execute the modular operations of composite operators (such as the modular multiplication sum or the product of modular exponentiations), which involve a larger amount of computation, at high speed. Because these modular operations are carried out in hardware (the chip), the execution efficiency of the modular operations of basic operators is improved, the execution efficiency of the modular operations of composite operators is greatly improved, the CPU load is further reduced, the performance of the computing system is improved, and more demanding computing scenarios are met.

Referring to fig. 6, a flow chart of the steps of an embodiment of a data processing method of the present invention is shown. The method is applicable to a chip that includes an input control module, a composite computing module, and at least two operation cores, and the method may include the following steps:

Step 601, receiving and parsing an input data frame from a host side through the input control module, and sending the operation type and the operation data obtained by parsing to an operation core; the input data frame comprises an operation type and operation data, the operation type comprises a single type or a composite type, the single type comprises modular multiplication or modular exponentiation, and the composite type comprises a modular multiplication sum or a product of modular exponentiations;

Step 602, performing a modular operation on the received operation data through the operation core according to the received operation type to obtain a core operation result, where the core operation result includes a modular multiplication result or a modular exponentiation result; when the operation type received by the operation core is a composite type, the operation core also sends the core operation result and the operation type to the composite computing module;

Step 603, performing an accumulation operation or a multiplication operation on the received core operation results through the composite computing module according to the received operation type to obtain a modular multiplication sum result or a modular exponentiation product result.
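Steps 601 to 603 can be sketched end to end as follows. The dictionary frame layout, the field names `op_type`, `modulus` and `operands`, the string operation types, and the final reduction modulo p in step 603 are all assumptions made for illustration; the actual input data frame format is defined elsewhere in the specification.

```python
def process_frame(frame):
    """Parse a frame, run the core operations, and apply the composite stage if needed."""
    # Step 601: parse the operation type and the operation data from the frame.
    op_type, p, operands = frame["op_type"], frame["modulus"], frame["operands"]

    single_ops = {
        "modmul": lambda x, y: (x * y) % p,
        "modexp": lambda x, y: pow(x, y, p),
    }
    core_op_for = {"modmul_sum": "modmul", "modexp_product": "modexp"}

    # Step 602: every operand pair is handled by one operation core.
    core_op = single_ops[core_op_for.get(op_type, op_type)]
    core_results = [core_op(x, y) for x, y in operands]
    if op_type in single_ops:
        return core_results  # single type: the core results go straight to output

    # Step 603: the composite stage accumulates or multiplies the core results.
    if op_type == "modmul_sum":
        return sum(core_results) % p  # reduction modulo p is an assumption
    product = 1
    for r in core_results:
        product = (product * r) % p   # reduction modulo p is an assumption
    return product


single = process_frame({"op_type": "modmul", "modulus": 11, "operands": [(3, 4), (5, 6)]})
summed = process_frame({"op_type": "modmul_sum", "modulus": 11,
                        "operands": [(3, 4), (5, 6), (7, 8)]})
multiplied = process_frame({"op_type": "modexp_product", "modulus": 13,
                            "operands": [(2, 10), (3, 4)]})
print(single, summed, multiplied)
```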

Optionally, the method further comprises:

It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.

Since the method embodiment is basically similar to the apparatus embodiment, it is described briefly; for the relevant points, refer to the corresponding description of the apparatus embodiment.

The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.

A non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an apparatus (server or terminal), enable the apparatus to perform the data processing method shown in fig. 6.

A non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an apparatus (server or terminal), enable the apparatus to perform a data processing method applied to a chip including an input control module, a composite computing module, and at least two operation cores, the method comprising: receiving and parsing an input data frame from a host side through the input control module, and sending the operation type and the operation data obtained by parsing to an operation core; the input data frame comprises an operation type and operation data, the operation type comprises a single type or a composite type, the single type comprises modular multiplication or modular exponentiation, and the composite type comprises a modular multiplication sum or a product of modular exponentiations; performing a modular operation on the received operation data through the operation core according to the received operation type to obtain a core operation result, wherein the core operation result comprises a modular multiplication result or a modular exponentiation result; when the operation type received by the operation core is a composite type, the operation core also sends the core operation result and the operation type to the composite computing module; and performing an accumulation operation or a multiplication operation on the received core operation results through the composite computing module according to the received operation type to obtain a modular multiplication sum result or a modular exponentiation product result.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

The chip and the data processing method provided by the invention are described in detail, and the principle and the implementation mode of the invention are explained by applying specific examples, and the description of the examples is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. A chip comprising an input control module, a composite computation module, and at least two computational cores, wherein,

2. The chip of claim 1, wherein the composite computing module comprises a multiplexer, an accumulation unit, and a multiplication unit; wherein,

3. The chip according to claim 1, wherein the operation core is further configured to send a handshake request signal to the composite computation module when it is received that the operation type is a composite type and execution of modulo operation of the operation core is completed, and send a core operation result obtained by computation to the composite computation module when it receives a handshake response signal returned by the composite computation module.

4. The chip of claim 1, wherein the operation type further includes outputting a modular multiplication sum result or outputting a product result of modular exponentiation, and the composite calculation module is further configured to output the calculated modular multiplication sum result or product result of modular exponentiation when receiving the operation type outputting the modular multiplication sum result or the product result of modular exponentiation.

5. The chip of claim 1, further comprising an arbitration circuit configured to select, according to an arbitration algorithm, one of the operation cores in the valid state at the current time for output, where an operation core in the valid state refers to an operation core that has sent the handshake request signal.

6. The chip of claim 5, wherein the composite computing module further comprises a state machine configured to poll the state of each operation core and enable the arbitration circuit upon determining that an operation core in the valid state exists.

7. The chip of claim 1, further comprising a first output control module and a second output control module; wherein,

8. The chip of claim 7, wherein data is exchanged between the computational core and the first output control module, between the computational core and the composite computational module, between the composite computational module and the second output control module, and between the first output control module and the second output control module via handshake signals.

9. The chip according to claim 7, wherein the operation core is further configured to send a handshake request signal to the first output control module when it is received that the operation type is a single type and execution of a modulo operation of the operation core is completed, and output a calculated core operation result to the first output control module when it receives a handshake response signal returned by the first output control module.

10. The chip of claim 7, wherein the composite computing module is further configured to send a handshake request signal to the second output control module when the cumulative addition operation or the cumulative multiplication operation is completed, and to output the computed modular multiplication sum result or modular exponentiation product result to the second output control module upon receiving a handshake response signal returned by the second output control module.

11. The chip of claim 1, wherein the chip comprises a Field Programmable Gate Array (FPGA) chip or an Application Specific Integrated Circuit (ASIC) chip.

12. A data processing method, applied to a chip, wherein the chip comprises an input control module, a composite computing module and at least two operation cores, the method comprising:

13. The method of claim 12, wherein the composite computing module comprises a multiplexer, a cumulative addition unit, and a cumulative multiplication unit, the method further comprising:

14. The method of claim 12, further comprising:

15. The method of claim 12, wherein the operation type further comprises outputting a modular multiplication sum result or outputting a modular exponentiation product result, the method further comprising:

16. The method of claim 12, wherein the chip further comprises an arbitration circuit, the method further comprising:

17. The method of claim 16, wherein the composite computing module further comprises a state machine, the method further comprising:

18. The method of claim 12, wherein the chip further comprises a first output control module and a second output control module, the method further comprising:

19. The method of claim 18, wherein data is exchanged via handshake signals between the operation core and the first output control module, between the operation core and the composite computing module, between the composite computing module and the second output control module, and between the first output control module and the second output control module.

20. The method of claim 18, further comprising:

21. The method of claim 18, further comprising:

22. The method of claim 12, wherein the chip comprises a Field Programmable Gate Array (FPGA) chip or an Application Specific Integrated Circuit (ASIC) chip.

23. A machine-readable medium having stored thereon instructions which, when executed by one or more processors of an apparatus, cause the apparatus to perform the data processing method of any one of claims 12 to 22.