WO2024078428A1

WO2024078428A1 - Acceleration device, computing system, and acceleration method

Info

Publication number: WO2024078428A1
Application number: PCT/CN2023/123473
Authority: WO
Inventors: 何倩雯; 蒋佳立; 邬贵明
Original assignee: 杭州阿里云飞天信息技术有限公司
Priority date: 2022-10-11
Filing date: 2023-10-09
Publication date: 2024-04-18
Also published as: CN115801221A

Abstract

Embodiments of the present application provide an acceleration device, a computing system, and an acceleration method. The acceleration device comprises: a first storage component, and a first acceleration component and a second acceleration component that are connected to the first storage component. The first storage component is connected to a first host processing component by means of a bus. The first storage component is used for storing multiple pieces of ciphertext data corresponding to multiple objects and sent by the first host processing component. The second acceleration component is used for acquiring the multiple pieces of ciphertext data from the first storage component, for any feature, performing bucketing processing on the multiple pieces of ciphertext data to obtain multiple bucketing results, and storing the multiple bucketing results in the first storage component. The first acceleration component is used for acquiring the multiple bucketing results from the first storage component, performing computing processing on the ciphertext data in a same bucketing result to obtain a ciphertext processing result, and storing, in the first storage component, the ciphertext processing results respectively corresponding to the multiple bucketing results. The technical solution provided in the embodiments of the present application improves the processing efficiency.

Description

Acceleration device, computing system and acceleration method

This application claims priority to the Chinese patent application filed with the China Patent Office on October 11, 2022, with application number 202211241151.7 and application name “Acceleration device, computing system and acceleration method”, the entire contents of which are incorporated by reference in this application.

Technical Field

The embodiments of the present application relate to the field of computer technology, and in particular, to an acceleration device, a computing system, and an acceleration method.

Background

With the development of science and technology, the value of data has received increasing attention, and different data providers often need to fuse their data. However, because of privacy protection and similar factors, data cannot be shared between different data providers, which creates data islands. To solve the problem of data islands, privacy computing based on homomorphic encryption has emerged. It aims to break down data islands and use multi-party data for computation and modeling without leaking data privacy.

Homomorphic encryption is a class of encryption algorithms with a special property: processing homomorphically encrypted data yields output data which, once decrypted, equals the result of applying the same processing to the unencrypted original data. In other words, computing first and then decrypting is equivalent to decrypting first and then computing. This property is of great significance for protecting data security.
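The property can be illustrated with a toy additively homomorphic scheme. The sketch below assumes Paillier encryption with deliberately tiny, insecure primes; the present application does not name a particular scheme. Multiplying two Paillier ciphertexts modulo n^2 yields a ciphertext whose decryption is the sum of the two plaintexts, so computing on ciphertexts and then decrypting gives the same answer as decrypting and then computing.

```python
import math
import random

# Toy Paillier parameters (assumption: tiny primes for illustration, NOT secure).
P, Q = 293, 433
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)   # Carmichael function lambda(n)
MU = pow(LAM, -1, N)           # valid because the generator is g = n + 1


def encrypt(m: int) -> int:
    """Enc(m) = g^m * r^n mod n^2 with g = n + 1 and random r coprime to n."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """Dec(c) = L(c^lambda mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return c1 * c2 % N2


a, b = 17, 25
compute_then_decrypt = decrypt(add_encrypted(encrypt(a), encrypt(b)))
decrypt_then_compute = decrypt(encrypt(a)) + decrypt(encrypt(b))
print(compute_then_decrypt, decrypt_then_compute)  # 42 42
```

Paillier only supports addition (and multiplication by a plaintext constant) on ciphertexts, which is exactly what the ciphertext accumulation discussed below needs.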

In a practical application, when multiple data providers hold the same objects but different features, there is a need for the following joint data processing: the data initiator computes target data from the feature values of each object, homomorphically encrypts the target data to obtain ciphertext data, and provides the ciphertext data corresponding to the multiple objects to the data receiver. For each feature it holds, the data receiver buckets the ciphertext data corresponding to the multiple objects according to different feature values, performs computation on the ciphertext data in each bucket to obtain a ciphertext processing result, and returns the ciphertext processing results of the buckets for each feature to the data initiator. The data initiator then decrypts them to obtain the plaintext processing result of each bucket and can perform subsequent processing based on these results. In this way, the data initiator uses the features of the data receiver to process its data while the data security of both parties is protected.

As can be seen from the above description, the multiple pieces of ciphertext data must be bucketed for every feature and the ciphertext data in every bucket must be processed, so the amount of computation is very large, which reduces processing efficiency.

Summary of the invention

The embodiments of the present application provide an acceleration device, a computing system, and an acceleration method to solve the technical problem of low processing efficiency in the prior art.

In a first aspect, an acceleration device is provided in an embodiment of the present application, comprising: a first storage component, a first acceleration component connected to the first storage component, and a second acceleration component; the first storage component is connected to a first host processing component via a bus;

The first storage component is used to store multiple ciphertext data corresponding to multiple objects sent by the first host processing component;

The second acceleration component is used to obtain the multiple ciphertext data from the first storage component, and for any feature, perform bucket processing on the multiple ciphertext data to obtain multiple bucket results; and store the multiple bucket results in the first storage component;

The first acceleration component is used to obtain the multiple bucket results from the first storage component; perform calculations on the ciphertext data in the same bucket result to obtain a ciphertext processing result; and store the ciphertext processing results corresponding to the multiple bucket results respectively in the first storage component;

The first storage component is used to provide the ciphertext processing results corresponding to the multiple bucket results to the first host processing component.

In a second aspect, an embodiment of the present application provides a computing system, including a first computing device and a second computing device, wherein the first computing device includes a first host processing component and the acceleration device according to any implementation of the first aspect above;

The second computing device includes a second host processing component and a second acceleration device; the second acceleration device includes a second storage component and at least one third acceleration component; the second storage component is connected to the second host processing component via a bus;

The second storage component is used to store a plurality of to-be-processed data sent by the second host processing component; the to-be-processed data is target data to be encrypted or a ciphertext processing result to be decrypted;

The third acceleration component is used to obtain at least one to-be-processed data from the second storage component; for any to-be-processed data, encrypt or decrypt the to-be-processed data to obtain a calculation result, and store the calculation result in the second storage component;

The second host processing component is used to obtain a calculation result corresponding to any data to be processed from the second storage component.

In a third aspect, an embodiment of the present application provides a computing device, including a host processing component, a host storage component, and an acceleration device as described in the first aspect above.

In a fourth aspect, an embodiment of the present application provides an acceleration method applied to an acceleration device. The acceleration device includes a first storage component, and a first acceleration component and a second acceleration component connected to the first storage component; the first storage component is connected to a first host processing component via a bus and stores multiple pieces of ciphertext data corresponding to multiple objects sent by the first host processing component. The method includes:

Acquire the plurality of ciphertext data from the first storage component;

For any feature, the plurality of ciphertext data are bucketed to obtain a plurality of bucketing results;

The multiple bucket results are stored in the first storage component; the first acceleration component is used to obtain the multiple bucket results from the first storage component; the ciphertext data in the same bucket result is calculated and processed to obtain the ciphertext processing result; the ciphertext processing results corresponding to the multiple bucket results are respectively stored in the first storage component; the first storage component is used to provide the ciphertext processing results corresponding to the multiple bucket results to the first host processing component.

The acceleration device provided in the embodiments of the present application includes a first storage component, and a first acceleration component and a second acceleration component connected to the first storage component; the first storage component is connected to the first host processing component through a bus. The first host processing component stores the multiple pieces of ciphertext data in the first storage component, and the second acceleration component performs bucket processing, so multiple features can share the same ciphertext data for bucketing. The first acceleration component then obtains the bucket results from the first storage component and computes on the ciphertext data in each bucket result to obtain the ciphertext processing results, which are provided to the first host processing component via the first storage component. Because the first host processing component only needs to transmit the data once while both the bucketing and the computation are performed by the acceleration device, the computational load on the host processing component is reduced, processing efficiency is improved, and I/O overhead is reduced, which preserves the acceleration performance of the acceleration device.

These and other aspects of the present application will be more clearly understood in the description of the following embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a schematic structural diagram of an embodiment of an acceleration device provided by the present application;

FIG. 2 is a schematic structural diagram of an embodiment of a second acceleration component provided by the present application;

FIG. 3 shows a schematic structural diagram of an embodiment of a first acceleration component provided by the present application;

FIG. 4 shows a schematic structural diagram of an embodiment of a first computing unit provided by the present application;

FIG. 5 is a schematic diagram showing the operation structure of a first operation unit in a practical application of an embodiment of the present application;

FIG. 6a shows a schematic structural diagram of an embodiment of a computing system provided by the present application;

FIG. 6b is a schematic diagram showing an interaction scenario of a computing system provided by the present application in a practical application;

FIG. 7a shows a schematic structural diagram of an embodiment of a second acceleration device provided by the present application;

FIG. 7b shows a schematic structural diagram of an embodiment of a third acceleration component provided by the present application;

FIG. 8 shows a flow chart of an embodiment of an acceleration method provided by the present application;

FIG. 9 shows a flow chart of an embodiment of an acceleration method provided by the present application;

FIG. 10 shows a schematic structural diagram of an embodiment of a computing device provided by the present application.

Detailed Description

In order to enable those skilled in the art to better understand the solution of the present application, the technical solution in the embodiments of the present application will be clearly and completely described below in conjunction with the drawings in the embodiments of the present application.

Some of the processes described in the specification, the claims, and the above drawings include multiple operations that appear in a specific order. It should be clearly understood, however, that these operations may be executed out of the order in which they appear herein, or in parallel. Operation numbers such as 101 and 102 only distinguish different operations and do not themselves imply any execution order. In addition, these processes may include more or fewer operations, which may be executed sequentially or in parallel. It should be noted that the terms "first", "second", and the like herein are used to distinguish different messages, devices, modules, and so on; they do not imply an order of precedence, nor do they require the "first" and "second" items to be of different types.

The technical solution of the embodiment of the present application can be applied to scenarios of joint data processing by multiple parties, such as scenarios of joint modeling by multiple parties, etc. Of course, the present application is not limited to this.

From the description in the background technology, it can be seen that there is currently a demand for joint data processing: the data initiator will calculate the target data based on the feature value of each object, perform homomorphic encryption, obtain ciphertext data, and then provide the ciphertext data corresponding to multiple objects to the data receiver; the data receiver will bucket the ciphertext data corresponding to multiple objects according to different feature values for each feature it possesses; then calculate and process the ciphertext data in each bucket result to obtain the ciphertext processing result, and then return the ciphertext processing results of each bucket result corresponding to each feature to the data initiator, and the data initiator can decrypt and obtain the plaintext processing results of each bucket result, and can perform subsequent processing operations based on the plaintext processing results of each bucket result.

In practical applications, the above-mentioned data joint processing requirements may exist in scenarios where federated learning is used for multi-party joint modeling. Take multi-party joint modeling as an example. Federated learning is a distributed machine learning method that can use data from multiple parties for joint modeling while protecting data privacy. Vertical federated learning is a commonly used federated learning method, which refers to multi-party joint modeling when the feature data and label information of the sample object are distributed among different data providers. Multiple data providers have the same sample object but different feature data. For example, data provider A and data provider B have the same user C, but data provider A has the educational background data of user C, and data provider B has the age data of user C. The educational background data and age data are feature data. When performing joint modeling, usually only one party has the label data of the sample object. The data provider with the label data is also called the data initiator (active party), and the data provider without label data is also called the data receiver (passive party). Through vertical federated learning, the active party can use the feature data of the passive party to improve the capabilities of the machine learning model while protecting the data privacy of each participant.

In vertical federated learning, the decision tree model is a commonly used machine learning model. The key step in training a decision tree model is finding the optimal split point, where a split point is a specific value of a certain feature. For example, if the label data indicates whether user C belongs to a target group, a split point may be "age less than 20" or "age less than 30", and so on.

When training a decision tree model, the following method is usually used: the active party first determines the gradient information for the model based on the feature values and label data of its sample objects, encrypts the gradient information into ciphertext gradient information using homomorphic encryption, and transmits it to the passive party. Based on the ciphertext gradient information, the passive party calculates the accumulated ciphertext gradient of the split space corresponding to each feature and sends these accumulated values to the active party. The active party decrypts them to obtain the accumulated gradient values and finally determines the optimal split point based on the accumulated gradient values of multiple features. The passive party therefore has to accumulate the homomorphically encrypted gradient information in ciphertext form. To improve training efficiency, bucketing can be used: for each feature, the passive party buckets the ciphertext gradient information of the different sample objects according to their feature values, accumulates the ciphertext gradient information within each bucket result, and sends the accumulated ciphertext gradient of each bucket result to the active party, which decrypts them and determines the optimal split point based on the per-bucket accumulated values.
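The passive party's step can be sketched end to end. This is a minimal sketch under assumptions not taken from the present application: Paillier with tiny, insecure primes stands in for the homomorphic scheme, gradients are already scaled to integers (negative values are encoded modulo n), and bucket boundaries are left-closed. The active party encrypts per-user gradients, the passive party buckets the ciphertexts by its own feature (age) and multiplies the ciphertexts within each bucket, and the active party decrypts one gradient sum per bucket.

```python
import bisect
import math
import random

# Toy Paillier (assumption: tiny primes, NOT secure).
P, Q = 293, 433
N, N2 = P * Q, (P * Q) ** 2
LAM = math.lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid for generator g = n + 1


def encrypt(m: int) -> int:
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    # m % N encodes negative gradients as large residues
    return pow(N + 1, m % N, N2) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m  # decode back to a signed integer


# --- Passive party: bucket ciphertexts by its own feature and accumulate ---
def bucket_sums(enc_grads, feature_values, split_values):
    """One ciphertext accumulator per bucket; 1 is a valid encryption of 0."""
    acc = [1] * (len(split_values) + 1)
    for c, v in zip(enc_grads, feature_values):
        b = bisect.bisect_right(split_values, v)  # bucket [s_i, s_{i+1})
        acc[b] = acc[b] * c % N2                   # homomorphic addition
    return acc


# Active party: hypothetical integer-scaled gradients for six users.
gradients = [3, -2, 5, 1, -4, 2]
enc_grads = [encrypt(g) for g in gradients]

# Passive party: hypothetical ages of the same six users, split at 10/20/30.
ages = [8, 15, 22, 35, 18, 41]
enc_bucket_sums = bucket_sums(enc_grads, ages, [10, 20, 30])

# Active party decrypts one gradient sum per bucket.
print([decrypt(c) for c in enc_bucket_sums])  # [3, -6, 5, 3]
```

The passive party never sees a plaintext gradient: it only groups ciphertexts by its own feature values and multiplies them, which is the ciphertext accumulation described above.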

From the above description, the data receiver needs to perform bucketing for every feature and carry out the corresponding computation on the ciphertext gradient information in each bucket result. These operations are usually performed by the host processing component of each computing device, which must also handle its other work, so the computational load on the host processing component is large, degrading processing performance and reducing processing efficiency.

To improve processing performance and efficiency, the inventors found in their research that computation on ciphertext data encrypted with a homomorphic encryption algorithm essentially consists of large-integer multiplications and additions, which consume substantial processing resources. They therefore considered using a dedicated accelerator to perform the computation on the ciphertext data for better processing performance. However, the inventors also found that with a dedicated accelerator alone, the host processing component still has to perform the bucketing. In practice, the number of objects is often large; in joint modeling scenarios in particular, there are usually hundreds of thousands or even millions of sample objects, and the number of features is also very large. Because bucketing is performed per feature, the bucket results for every feature must be transmitted to the accelerator, so the volume of transferred data is on the order of (number of objects) × (number of features). This in turn causes a large I/O overhead and creates a bottleneck for acceleration performance.
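A back-of-the-envelope comparison makes the order-of-magnitude argument concrete. The figures are hypothetical assumptions, not values from the present application: one million objects, 100 features, and 512-byte ciphertexts (the size of a ciphertext modulo n^2 for a 2048-bit Paillier modulus).

```python
def transfer_bytes(num_objects: int, num_features: int, ciphertext_bytes: int):
    """Return (host-side bucketing, device-side bucketing) transfer volumes."""
    # Host buckets each feature and ships every bucketed ciphertext set.
    per_feature = num_objects * num_features * ciphertext_bytes
    # Device buckets on-board; the host ships the ciphertexts only once.
    once = num_objects * ciphertext_bytes
    return per_feature, once


per_feature, once = transfer_bytes(1_000_000, 100, 512)
print(per_feature / 1e9, once / 1e9)  # 51.2 0.512 (GB)
```

The ratio between the two volumes is exactly the number of features, which is the overhead that moving the bucketing onto the acceleration device removes.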

Based on this, the inventors conducted a series of studies and proposed the technical solution of the present application. An embodiment of the present application provides an acceleration device composed of a first storage component, and a first acceleration component and a second acceleration component connected to the first storage component; the first storage component is connected to the first host processing component through a bus. The second acceleration component performs the bucketing, so the first host processing component only needs to send the multiple pieces of ciphertext data corresponding to the multiple objects once for storage in the first storage component, and multiple features can share these ciphertext data for bucketing. The first acceleration component then obtains the bucket results from the first storage component and computes on the ciphertext data in each bucket result to obtain the ciphertext processing results, which are provided to the first host processing component via the first storage component. The host processing component thus performs only one data transmission while the acceleration device carries out both the bucketing and the computation. This reduces the computational load on the host processing component; performing the computation on a dedicated acceleration device improves processing efficiency and reduces the I/O overhead of the acceleration device, which preserves its acceleration performance.

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative work fall within the protection scope of the present application.

FIG. 1 is a schematic structural diagram of an embodiment of an acceleration device provided by an embodiment of the present application. The acceleration device may include a first storage component 101, and a first acceleration component 102 and a second acceleration component 103 that are each connected to the first storage component 101. The first storage component 101 is connected to the first host processing component 100 via a bus. The bus may be, for example, PCIe (Peripheral Component Interconnect Express, a high-speed serial computer expansion bus standard); other high-speed interconnects such as Ethernet may also be used, and this application does not limit this.

The acceleration device can be implemented by an application-specific integrated circuit (ASIC) or a field programmable gate array (FPGA), or by a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a controller, a microcontroller, a microprocessor, or another form of integrated circuit (IC). This application does not limit this.

The acceleration device can be deployed in a first computing device. Relative to the acceleration device, the first computing device can be referred to as a host device of the acceleration device. The first host processing component can be, for example, a central processing unit (CPU) in the first computing device, which is responsible for traditional processing tasks in the first computing device.

The first storage component 101 is used to store multiple ciphertext data corresponding to multiple objects sent by the first host processing component 100;

The second acceleration component 103 is used to obtain multiple ciphertext data from the first storage component 101, and for any feature, perform bucket processing on the multiple ciphertext data to obtain multiple bucket results; and store the multiple bucket results in the first storage component 101;

The first acceleration component 102 is used to obtain multiple bucket results from the first storage component; perform calculations on the ciphertext data in the same bucket result to obtain a ciphertext processing result; and store the ciphertext processing results corresponding to the multiple bucket results in the first storage component;

The first storage component 101 is used to provide the ciphertext processing results corresponding to the multiple bucket results to the first host processing component 100.

Each object may correspond to a ciphertext data, and thus multiple objects may correspond to multiple ciphertext data. The ciphertext data may be obtained by encrypting the target data using a homomorphic encryption algorithm. In a practical application, such as in a multi-party joint modeling scenario, the ciphertext data may refer to ciphertext gradient information, which is obtained by the data initiator by encrypting the gradient data using a homomorphic encryption algorithm.

The first host processing component 100 can transmit the multiple pieces of ciphertext data corresponding to the multiple objects, sent by the data initiator, to the first storage component 101 in the acceleration device for storage.

The first host processing component 100 may send corresponding instruction information to the first storage component 101, the first acceleration component 102, and the second acceleration component 103 to start or trigger each component to perform corresponding operations. For example, after the first host processing component 100 stores multiple ciphertext data in the first storage component 101, it may send corresponding instruction information to the second acceleration component 103, and the second acceleration component 103 may obtain the multiple ciphertext data from the first storage component 101 based on the instruction information. Of course, the first host processing component 100 may also notify the first storage component 101, the first acceleration component 102, and the second acceleration component 103 to start after receiving the multiple ciphertext data sent by the data initiator, and the first storage component 101, the first acceleration component 102, and the second acceleration component 103 may trigger the execution of their respective operations in real time or periodically.

The second acceleration component 103 is responsible for the bucket processing operation corresponding to each feature owned by the data receiver. It can bucket multiple ciphertext data for each feature to obtain multiple bucket results corresponding to each feature. The bucket results corresponding to different features can then be stored in the first storage component 101. The second acceleration component 103 can also send a bucket end notification to the first host processing component 100. After receiving the bucket end notification, the first host processing component 100 can notify the first acceleration component 102 to obtain multiple bucket results and perform calculation processing.
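The division of labor between the host and the two acceleration components can be modeled in software. In the sketch below, all class and method names are hypothetical, plain integers stand in for ciphertexts, and integer addition stands in for homomorphic ciphertext accumulation; a real device would exchange instruction information and notifications over the bus rather than through Python method calls.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from functools import reduce
import operator


@dataclass
class FirstStorage:
    """Models the first storage component: memory shared by both accelerators."""
    data: dict = field(default_factory=dict)
    host_transfers: int = 0  # counts host -> device writes over the bus

    def write_from_host(self, key, value):
        self.host_transfers += 1
        self.data[key] = value


class SecondAccel:
    """Models the second acceleration component: per-feature bucketing."""

    def run(self, storage, features):
        cts = storage.data["ciphertexts"]  # shared by every feature
        for name, (values, splits) in features.items():
            buckets = [[] for _ in range(len(splits) + 1)]
            for c, v in zip(cts, values):
                buckets[bisect_right(splits, v)].append(c)
            storage.data[("buckets", name)] = buckets
        return "bucket_end"  # stands in for the bucket end notification


class FirstAccel:
    """Models the first acceleration component: per-bucket computation."""

    def __init__(self, combine=operator.add, identity=0):
        self.combine, self.identity = combine, identity

    def run(self, storage):
        keys = [k for k in storage.data if isinstance(k, tuple) and k[0] == "buckets"]
        for _, name in keys:
            storage.data[("results", name)] = [
                reduce(self.combine, bucket, self.identity)
                for bucket in storage.data[("buckets", name)]
            ]


def host_flow(ciphertexts, features):
    """Models the first host processing component orchestrating one job."""
    storage = FirstStorage()
    storage.write_from_host("ciphertexts", ciphertexts)  # the only transfer
    if SecondAccel().run(storage, features) == "bucket_end":
        FirstAccel().run(storage)
    results = {name: storage.data[("results", name)] for name in features}
    return results, storage.host_transfers


results, transfers = host_flow(
    [3, -2, 5, 1],
    {"age": ([8, 15, 22, 35], [20]), "income": ([5, 7, 1, 9], [6])},
)
print(results, transfers)
```

Both features are bucketed from the same stored ciphertext list, so the transfer counter stays at one regardless of how many features are processed.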

Optionally, to improve processing efficiency, the second acceleration component 103 may bucket the multiple pieces of ciphertext data for multiple features in parallel. The multiple features may be specified by the first host processing component 100, for example. The first host processing component 100 may divide the features to be processed into multiple groups, each containing multiple features, and send the features of the next group only after the bucketing of all features in the current group is complete.
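The grouped scheduling can be sketched with a thread pool. The grouping policy, the function names, and the use of Python threads are assumptions made for illustration; the sketch buckets object indices rather than ciphertexts, and a hardware implementation would instead run the features of a group on parallel bucketing units.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor


def bucket_feature(values, splits):
    """Bucket object indices by one feature's values (left-closed intervals)."""
    buckets = [[] for _ in range(len(splits) + 1)]
    for obj_id, v in enumerate(values):
        buckets[bisect_right(splits, v)].append(obj_id)
    return buckets


def bucket_in_groups(features, group_size):
    """Issue features group by group; a group must finish before the next starts."""
    names = list(features)
    results, issued_groups = {}, []
    with ThreadPoolExecutor(max_workers=group_size) as pool:
        for start in range(0, len(names), group_size):
            group = names[start:start + group_size]
            futures = {n: pool.submit(bucket_feature, *features[n]) for n in group}
            for n, fut in futures.items():
                results[n] = fut.result()  # blocks until this feature is done
            issued_groups.append(group)
    return results, issued_groups


features = {f"f{i}": ([1, 5, 9, 12], [4, 10]) for i in range(5)}
results, groups = bucket_in_groups(features, group_size=2)
print(groups)  # [['f0', 'f1'], ['f2', 'f3'], ['f4']]
```

Waiting on every future of a group before submitting the next one mirrors the host sending the next group of features only after the current group's bucketing has finished.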

After the first acceleration component 102 obtains the multiple bucket results from the first storage component 101, it can perform computation on the ciphertext data in each bucket result to obtain a ciphertext processing result, and store the ciphertext processing results corresponding to the multiple bucket results in the first storage component. Optionally, the first acceleration component 102 may perform this computation according to a target calculation processing mode: it determines the operation method corresponding to the target calculation processing mode and carries out the computation using that operation method.

The target calculation processing mode or the operation method can be notified to the first acceleration component 102 by the first host processing component 100.

The target calculation processing mode may include, for example, ciphertext accumulation, and may also include ciphertext multiplication, ciphertext subtraction, etc. In a multi-party joint modeling scenario, the target calculation processing mode may specifically refer to ciphertext accumulation.

The operation method corresponding to ciphertext accumulation may be point addition. For example, in ECC (Elliptic Curve Cryptography), accumulating ciphertexts is converted into adding two points on the elliptic curve. In an elliptic-curve-based homomorphic encryption algorithm, point addition is in turn executed as arithmetic operations such as modular addition and modular multiplication.
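Point addition and its reduction to modular arithmetic can be shown on a toy curve. The sketch assumes the textbook curve y^2 = x^3 + 2x + 2 over GF(17) with generator (5, 1), and it assumes EC-ElGamal with messages encoded as m*G as the elliptic-curve homomorphic scheme; the present application specifies neither. Ciphertexts are pairs of points, accumulating two ciphertexts is component-wise point addition, and each point addition uses only modular addition, subtraction, multiplication, and inversion. Decryption recovers a small m by brute-force search, which is only practical for small sums.

```python
# Toy curve y^2 = x^3 + 2x + 2 over GF(17) (textbook example; NOT secure).
P_MOD, A = 17, 2
G = (5, 1)    # generator of the 19-element point group
ORDER = 19
INF = None    # point at infinity (group identity)


def point_add(p1, p2):
    """Affine point addition built only from modular add/sub/mul/inverse."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF  # P + (-P) = O
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD  # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD  # chord slope
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mul(k, p):
    """k * p by double-and-add."""
    result = INF
    while k:
        if k & 1:
            result = point_add(result, p)
        p = point_add(p, p)
        k >>= 1
    return result


def neg(p):
    return INF if p is INF else (p[0], -p[1] % P_MOD)


# EC-ElGamal with the message encoded as m*G (assumed scheme for illustration).
def encrypt(m, pub, r):
    return scalar_mul(r, G), point_add(scalar_mul(m, G), scalar_mul(r, pub))


def add_ciphertexts(c1, c2):
    """Ciphertext accumulation = component-wise point addition."""
    return point_add(c1[0], c2[0]), point_add(c1[1], c2[1])


def decrypt(c, priv):
    m_point = point_add(c[1], neg(scalar_mul(priv, c[0])))  # equals m*G
    return next(m for m in range(ORDER) if scalar_mul(m, G) == m_point)


priv = 7
pub = scalar_mul(priv, G)
total = add_ciphertexts(encrypt(3, pub, r=4), encrypt(5, pub, r=9))
print(decrypt(total, priv))  # 8
```

Each call to point_add costs one modular inversion plus a handful of modular multiplications, which is why hardware that speeds up modular arithmetic speeds up ciphertext accumulation directly.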

After the first storage component 101 stores the ciphertext processing results corresponding to the multiple bucket results, it can notify the first host processing component 100, so that the first host processing component 100 can obtain these ciphertext processing results from the first storage component 101. The first storage component may be implemented by a high-bandwidth external memory.

The first host processing component 100 can send the ciphertext processing results corresponding to the multiple bucket results to the data initiator to facilitate the data initiator to perform subsequent operations. For example, the data initiator can first decrypt to obtain the plaintext processing results corresponding to the multiple bucket results corresponding to each feature, and then calculate the plaintext processing results according to the target calculation processing mode; or the data initiator can first calculate the ciphertext processing results corresponding to the multiple bucket results corresponding to each feature according to the target calculation processing mode, and then decrypt the processing results.
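One possible follow-up computation at the data initiator is to turn the per-bucket sums into the left and right totals of every candidate split point, since the buckets are ordered by feature value. The sketch below assumes this use case (it is not spelled out here) and works on already decrypted, integer-scaled per-bucket gradient sums; with an additively homomorphic scheme, the same prefix sums could instead be formed on the ciphertexts before decryption.

```python
from itertools import accumulate


def candidate_split_sums(bucket_sums):
    """For each boundary between adjacent buckets, return (left_sum, right_sum)."""
    total = sum(bucket_sums)
    left_sums = list(accumulate(bucket_sums))[:-1]  # drop the all-buckets prefix
    return [(left, total - left) for left in left_sums]


# Hypothetical per-bucket gradient sums for age buckets [<10, 10-20, 20-30, >=30].
print(candidate_split_sums([3, -6, 5, 3]))  # [(3, 2), (-3, 8), (2, 3)]
```

How the left and right totals are scored to pick the optimal split point depends on the model's split criterion, which is outside this sketch.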

Through the acceleration device provided in this embodiment, bucket operations and computing processing operations can be performed by the acceleration device. The host processing component only needs to transmit the ciphertext data once, which can be shared by multiple features for bucket operations, thereby reducing the amount of computation of the host processing component, improving processing efficiency, and reducing I/O overhead, thereby ensuring the acceleration performance of the acceleration device.

In some embodiments, as shown in FIG. 1, the acceleration device may further include a bus interface 104 for connecting to the first computing device, so that the first acceleration component 102, the second acceleration component 103, and the first storage component 101 are connected through a bus to the first host processing component 100 in the first computing device. The bus interface 104 also allows the acceleration device to be pluggably installed in the first computing device.

In some embodiments, as shown in FIG. 1, the acceleration device may further include a substrate 105, on which the first storage component 101, the first acceleration component 102, and the second acceleration component 103 are soldered, so that the first acceleration component 102 and the second acceleration component 103 are each electrically connected to the first storage component 101.

Bucket processing divides the multiple ciphertext data into multiple data intervals. Each data interval acts as a bucket, and the ciphertext data contained in one data interval constitutes one bucket result.

As one optional manner, the second acceleration component 103 performing bucket processing on the multiple ciphertext data for any feature to obtain multiple bucket results may include: for any feature, bucketing the multiple ciphertext data according to at least one feature value corresponding to the feature, to obtain the multiple bucket results.

Specifically, the bucketing operation can first divide the multiple objects according to the at least one feature value, and then divide the ciphertext data of the multiple objects according to how the objects were divided, so that the ciphertext data of objects falling in the same feature value interval is placed in the same bucket result.

For example, if the object is a user, the feature is age, and the feature values are 10, 20, and 30, these three values divide age into four intervals: 0-10, 10-20, 20-30, and 30-∞ (infinity). Each user is assigned to one of the four intervals, and the ciphertext data of users in the same age interval is placed in the same bucket, thereby obtaining multiple bucket results.
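The age example can be sketched in Python as follows. This is a minimal illustration, not the accelerator's implementation: it assumes left-closed, right-open intervals (so an age of exactly 10 lands in the 10-20 bucket), which the description does not specify, and it represents ciphertexts with placeholder strings. The function name `bucket_by_thresholds` is hypothetical.

```python
from bisect import bisect_right


def bucket_by_thresholds(feature_values, ciphertexts, thresholds):
    """Group ciphertexts by the feature-value interval of their object.

    thresholds: sorted feature values such as [10, 20, 30], defining
    len(thresholds) + 1 intervals: [0,10), [10,20), [20,30), [30, inf).
    Assumption: intervals are left-closed and right-open.
    """
    buckets = [[] for _ in range(len(thresholds) + 1)]
    for value, ct in zip(feature_values, ciphertexts):
        # bisect_right gives the index of the interval that contains value
        buckets[bisect_right(thresholds, value)].append(ct)
    return buckets


ages = [5, 12, 10, 27, 45, 31]
cts = ["ct_u0", "ct_u1", "ct_u2", "ct_u3", "ct_u4", "ct_u5"]
print(bucket_by_thresholds(ages, cts, [10, 20, 30]))
# [['ct_u0'], ['ct_u1', 'ct_u2'], ['ct_u3'], ['ct_u4', 'ct_u5']]
```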

The at least one feature value corresponding to each feature can be stored in the first storage component 101 by the first host processing component 100 and obtained from the first storage component 101 by the second acceleration component 103. Alternatively, because this amount of data is small, the first host processing component 100 can send the at least one feature value corresponding to each feature directly to the second acceleration component 103.

In addition, as another optional manner, the first storage component 101 is also used to store the bucket information, sent by the first host processing component 100, of the multiple objects for different features;

In this case, the second acceleration component 103 performing bucket processing on the multiple ciphertext data for any feature to obtain multiple bucket results includes: for any feature, determining the bucket information of each of the multiple objects for that feature; and placing the ciphertext data of the objects that share the same bucket information into the same bucket, so as to obtain the multiple bucket results.

The bucket information may refer to a bucket identifier, which is used to uniquely identify a bucket and may be implemented in the form of any one or more characters (such as a combination of numbers, letters, etc.), which is not limited in this application. Bucket information corresponding to different features of multiple objects may be determined by the first host processing component 100.

After the first host processing component 100 obtains the ciphertext data corresponding to each object, it can, for each of the multiple features held by the data recipient itself, divide the multiple objects according to the at least one feature value of that feature to determine the feature value interval of each object. Objects in the same feature value interval are given the same bucket information, and different feature value intervals are given different bucket information. The first host processing component 100 can store the bucket information of each object for each feature in the first storage component 101, from which the second acceleration component 103 can obtain it. Alternatively, since the bucket information is small, the first host processing component 100 can send the bucket information of the multiple objects for the different features directly to the second acceleration component 103.
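A minimal sketch of this division of work, assuming integer bucket IDs (the description allows any characters) and using the hypothetical helpers `assign_bucket_ids` for the host side and `group_by_bucket_id` for the second acceleration component. The ciphertext list is built once and reused for every feature, mirroring the single transmission described above.

```python
from bisect import bisect_right
from collections import defaultdict


def assign_bucket_ids(feature_values, thresholds):
    """Host side (hypothetical): objects in the same interval share an ID."""
    return [bisect_right(thresholds, v) for v in feature_values]


def group_by_bucket_id(bucket_ids, ciphertexts):
    """Accelerator side: compares bucket IDs only, never raw feature values."""
    groups = defaultdict(list)
    for bid, ct in zip(bucket_ids, ciphertexts):
        groups[bid].append(ct)
    return dict(groups)


cts = ["ct0", "ct1", "ct2", "ct3"]  # transmitted once, shared by all features
bucket_info = {
    "age": assign_bucket_ids([5, 15, 25, 12], [10, 20]),
    "income": assign_bucket_ids([300, 800, 150, 900], [500]),
}
for feature, ids in bucket_info.items():
    print(feature, group_by_bucket_id(ids, cts))
```

Because the host only ships small ID lists per feature, the accelerator never needs the data recipient's plaintext feature values.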

In some embodiments, as shown in FIG. 2, the second acceleration component 103 may include a data loading unit 201, a plurality of bucketing units 202, and a data storage unit 203.

The data loading unit 201 is used to obtain the multiple ciphertext data from the first storage component 101 and provide them to each of the multiple bucketing units 202; to assign the features to be processed to the multiple bucketing units 202; and to control the multiple bucketing units 202 to process their assigned features in parallel;

The bucketing unit 202 is used to bucket the multiple ciphertext data according to the feature assigned to it, obtain multiple bucketing results, and send the multiple bucketing results to the data storage unit 203;

The data storage unit 203 is used to store the multiple bucketing results sent by each bucketing unit 202 into the first storage component 101. The data storage unit 203 can be implemented by RAM (Random Access Memory).

Using multiple bucketing units 202 allows multiple features to be processed in parallel. Each bucketing unit 202 can be assigned at least one feature and can bucket its assigned features in a pipelined manner.

Optionally, each bucketing unit 202 may be assigned one feature, and the first host processing component 100 may determine the number of features to be processed in parallel in one round according to the number of bucketing units 202; that number of features may be less than or equal to the number of bucketing units. The first host processing component 100 can select at least one feature according to that number and provide the bucket information of the multiple objects for the selected feature(s) to the acceleration device. The data loading unit 201 then assigns the bucket information of each selected feature to a different bucketing unit 202, so that each bucketing unit 202 obtains the bucket information of one feature and, for that feature, places the ciphertext data of objects that share the same bucket information into the same bucket result. Alternatively, the first host processing component 100 can select at least one feature according to that number and provide the at least one feature value of each selected feature to the acceleration device. The data loading unit 201 then assigns the feature values of each selected feature to a different bucketing unit 202, so that each bucketing unit 202 obtains the at least one feature value of one feature and buckets the multiple ciphertext data according to those feature values.
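The round-based dispatch can be approximated in software as follows. This is a sketch under stated assumptions: `NUM_BUCKETING_UNITS`, `bucketing_unit`, and `data_loading_unit` are hypothetical names, bucket information is given as integer IDs, and Python threads only model the parallel hardware units (they do not reproduce hardware-level parallelism).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_BUCKETING_UNITS = 4  # hypothetical number of bucketing units 202


def bucketing_unit(bucket_ids, ciphertexts):
    """One bucketing unit: group ciphertexts that share a bucket ID."""
    groups = defaultdict(list)
    for bid, ct in zip(bucket_ids, ciphertexts):
        groups[bid].append(ct)
    return dict(groups)


def data_loading_unit(ciphertexts, bucket_info, num_units=NUM_BUCKETING_UNITS):
    """Dispatch features in rounds of at most num_units, one feature per unit."""
    features = list(bucket_info)
    results = {}  # stands in for the data storage unit 203
    with ThreadPoolExecutor(max_workers=num_units) as pool:
        for start in range(0, len(features), num_units):
            batch = features[start:start + num_units]
            # Every unit in the round reads the same shared ciphertext list.
            futures = {f: pool.submit(bucketing_unit, bucket_info[f], ciphertexts)
                       for f in batch}
            for f, fut in futures.items():
                results[f] = fut.result()
    return results


cts = ["a", "b", "c"]
info = {f"f{i}": [i % 2, 0, 1] for i in range(6)}  # 6 features -> 2 rounds of 4
out = data_loading_unit(cts, info)
print(out["f0"], out["f1"])
```

With six features and four units, the first round handles four features and the second round the remaining two, matching the rule that each round processes no more features than there are bucketing units.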

Since the computing processing performed by the first acceleration component 102 on the ciphertext data is also computationally heavy, the first acceleration component 102 may include at least one first acceleration unit to further improve processing efficiency and acceleration performance;

Each first acceleration unit can be used to obtain at least one bucket result from the first storage component 101; for any bucket result, perform computing processing on the multiple ciphertext data in that bucket result according to the target calculation processing mode to obtain a ciphertext processing result; and store the ciphertext processing result corresponding to that bucket result in the first storage component 101.

Optionally, the first acceleration component 102 may be provided with a plurality of first acceleration units, which improves parallel processing capability, processing efficiency, and acceleration performance.

In some embodiments, as shown in FIG. 3, each first acceleration unit may include a first control unit 301 and a plurality of first computing units 302.

The first control unit 301 is used to obtain at least one bucket result from the first storage component 101 and dispatch the at least one bucket result to at least one first computing unit 302;

The first computing unit 302 is used to, for any bucket result assigned to it, perform computing processing on the multiple ciphertext data in that bucket result according to the target calculation processing mode to obtain a ciphertext processing result;

The first control unit 301 is used to store the ciphertext processing result corresponding to any bucket result in the first storage component 101.

Using multiple first computing units 302 allows multiple bucket results to be computed in parallel, thereby improving processing efficiency and further ensuring acceleration performance.
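The dispatch performed by the first control unit 301 might look like the sketch below. The round-robin policy is an assumption (the description only says bucket results are dispatched to computing units), every bucket result is assumed to be non-empty, and integer addition stands in for ciphertext point addition. All function names are hypothetical.

```python
import operator
from functools import reduce


def first_computing_unit(bucket, add):
    """Accumulate every ciphertext in one (non-empty) bucket result."""
    return reduce(add, bucket)


def first_control_unit(bucket_results, num_units, add):
    """Round-robin dispatch of bucket results to num_units computing units."""
    queues = [[] for _ in range(num_units)]
    for idx, bucket in enumerate(bucket_results):
        queues[idx % num_units].append((idx, bucket))
    storage = {}  # stands in for the first storage component 101
    for queue in queues:  # in hardware these queues would run concurrently
        for idx, bucket in queue:
            storage[idx] = first_computing_unit(bucket, add)
    # Return results in the original bucket order.
    return [storage[idx] for idx in range(len(bucket_results))]


# Integer addition is a stand-in for ciphertext (point) addition.
print(first_control_unit([[1, 2, 3], [4], [5, 6]], num_units=2, add=operator.add))
# [6, 4, 11]
```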

In some embodiments, as shown in FIG. 3, each first acceleration unit 300 may further include a first storage unit 303;

The first computing unit 302 may also be used to save the ciphertext processing result corresponding to any bucket result to the first storage unit 303;

In this case, the first control unit 301 storing the ciphertext processing result corresponding to any bucket result in the first storage component 101 may specifically be: storing, in the first storage component 101, the ciphertext processing result of that bucket result that has been saved in the first storage unit 303.

In some embodiments, as shown in FIG. 3, each first acceleration unit 300 may further include a first loading unit 304.

The first control unit 301 obtaining at least one bucket result from the first storage component 101 may specifically be: controlling the first loading unit 304 to obtain the at least one bucket result from the first storage component 101.

Optionally, the first control unit 301 can perform corresponding operations according to instructions from the first host processing component 100. Therefore, in some embodiments, the first control unit 301 can also be used to receive first control information sent by the first host processing component 100, and to control the operation of the plurality of first computing units 302, the first storage unit 303, and the first loading unit 304 according to the first control information.

The first control information may include a first total data amount, which is the total amount of data of the at least one bucket result that the first acceleration unit 300 needs to obtain, and a second total data amount, which is the total amount of data of the ciphertext processing results obtained after the at least one bucket result is processed. It may also include a first storage address, at which the at least one bucket result to be obtained is stored, and a second storage address, at which the at least one ciphertext processing result is to be stored. The first control unit 301 can then obtain the at least one bucket result from the first storage component 101 according to the first total data amount and the first storage address, specifically by controlling the first loading unit 304 to do so; and it can control the first storage unit 303 to store the at least one ciphertext processing result in the first storage component 101 according to the second total data amount and the second storage address.
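A minimal sketch of how the control information could drive the data movement, modelling the first storage component 101 as a flat `bytearray`. The 8-byte word size, the fixed `values_per_bucket` field, and integer summation as a stand-in for ciphertext accumulation are all assumptions made for illustration; the class and field names are hypothetical.

```python
from dataclasses import dataclass

WORD = 8  # assumption: every value is an 8-byte little-endian integer


@dataclass
class FirstControlInfo:
    first_total_bytes: int   # total size of the bucket results to load
    first_address: int       # start address of the bucket results
    second_total_bytes: int  # total size of the ciphertext processing results
    second_address: int      # start address for writing the results
    values_per_bucket: int   # hypothetical: fixed bucket size for this sketch


def read_words(memory, address, nbytes):
    return [int.from_bytes(memory[a:a + WORD], "little")
            for a in range(address, address + nbytes, WORD)]


def write_words(memory, address, words):
    for i, w in enumerate(words):
        memory[address + i * WORD:address + (i + 1) * WORD] = w.to_bytes(WORD, "little")


def first_acceleration_unit(memory, info):
    # First loading unit: fetch exactly first_total_bytes from first_address.
    values = read_words(memory, info.first_address, info.first_total_bytes)
    n = info.values_per_bucket
    # Computing units: integer sum stands in for ciphertext accumulation.
    results = [sum(values[i:i + n]) for i in range(0, len(values), n)]
    if len(results) * WORD != info.second_total_bytes:
        raise ValueError("result size does not match second_total_bytes")
    # First storage unit -> first storage component, at second_address.
    write_words(memory, info.second_address, results)


mem = bytearray(256)
write_words(mem, 0, [1, 2, 3, 4, 5, 6])  # three bucket results of two values
info = FirstControlInfo(first_total_bytes=48, first_address=0,
                        second_total_bytes=24, second_address=128,
                        values_per_bucket=2)
first_acceleration_unit(mem, info)
print(read_words(mem, 128, 24))  # [3, 7, 11]
```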

In addition, the first control information may further include the target calculation processing mode or the operation method corresponding to it, and the first control unit 301 may notify the first computing unit 302 of the corresponding operation method according to the first control information.

In this case, the first computing unit 302 performing computing processing on the multiple ciphertext data in any bucket result assigned to it to obtain a ciphertext processing result includes: for any bucket result assigned to it, performing computing processing on the multiple ciphertext data in that bucket result according to the operation method to obtain a ciphertext processing result.

In one or more of the above embodiments, the operation method corresponding to each target calculation processing mode can be pre-configured as one or more corresponding operation instructions, and the computing processing of the multiple ciphertext data in each bucket result is achieved by executing those operation instructions.

In practical applications, each first computing unit 302 can be implemented by a programmable processor (PC) that stores the corresponding instructions and executes them to perform the corresponding operations. In some embodiments, as shown in FIG. 4, the first computing unit 302 may include a first storage subunit 401, a first parsing subunit 402, a first control subunit 403, and a first calculation subunit 404;

The first storage subunit 401 is used to store one or more operation instructions corresponding to the target computing processing mode;

The first parsing subunit 402 is used to parse one or more operation instructions;

The first control subunit 403 is used to send calculation indication information to the first calculation subunit 404 based on the parsing result of the first parsing subunit 402;

The first calculation subunit 404 is used to perform calculation processing on multiple ciphertext data based on the calculation indication information to obtain a ciphertext processing result.

After being parsed, the one or more operation instructions are converted into corresponding calculation indication information that controls the operation of the first calculation subunit 404.

The first storage subunit may be implemented by RAM, etc.
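The store, parse, control, and calculate chain can be illustrated with a toy interpreter. The instruction format `OP dst src1 src2`, the register names, and the `in` port fed by the input data stream are hypothetical, and integer addition stands in for point addition.

```python
import operator

# Hypothetical instruction format "OP dst src1 src2"; "in" is the input-stream port.
INSTRUCTIONS = ["ADD acc acc in"] * 3  # contents of the first storage subunit


def parse(line):  # first parsing subunit
    op, dst, src1, src2 = line.split()
    return op, dst, (src1, src2)


def control(parsed):  # first control subunit: build calculation indication info
    op, dst, srcs = parsed
    return {"op": op, "dst": dst, "srcs": srcs}


def calculate(indication, regs, ops):  # first calculation subunit
    x, y = (regs[s] for s in indication["srcs"])
    regs[indication["dst"]] = ops[indication["op"]](x, y)


def run(instructions, stream, ops, identity):
    regs = {"acc": identity}  # holds the intermediate (previous) result
    data = iter(stream)
    for line in instructions:
        regs["in"] = next(data)  # next ciphertext from the input data stream
        calculate(control(parse(line)), regs, ops)
    return regs["acc"]


print(run(INSTRUCTIONS, [5, 6, 7], {"ADD": operator.add}, identity=0))  # 18
```

Keeping the operation table (`ops`) separate from the instruction stream mirrors the idea that one datapath can serve several target calculation processing modes by changing only the stored instructions.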

In practical applications, the target calculation processing mode can be ciphertext accumulation, for which the corresponding operation method is point addition. For example, the ciphertext data may be encrypted with an elliptic-curve-based homomorphic encryption algorithm, such as the EC-ElGamal partially homomorphic encryption algorithm. EC-ElGamal is a form of ECC (elliptic curve cryptography): it is the ElGamal scheme ported to elliptic curves. Its main computations include elliptic curve point addition, point subtraction, point multiplication, modular inversion, and discrete logarithm. ElGamal is an asymmetric encryption algorithm based on Diffie-Hellman key exchange.

Taking the EC-ElGamal partially homomorphic encryption algorithm as an example, the encryption formula is:

$$\mathrm{Enc}(P, m) = (C_1 = kG,\; C_2 = kP + mG)$$

where P is the public key, a point on the elliptic curve; G is the base point of the elliptic curve; k is a random number; m is the plaintext to be encrypted, that is, the target data; and Enc(P, m) is the resulting ciphertext, which consists of the point pair $C_1$ and $C_2$.

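The ciphertext addition formula is:

$$\mathrm{Enc}(P, m_1) + \mathrm{Enc}(P, m_2) = \big(k_1G + k_2G,\; (k_1P + m_1G) + (k_2P + m_2G)\big) = \big((k_1 + k_2)G,\; (k_1 + k_2)P + (m_1 + m_2)G\big)$$

That is, the sum of two ciphertexts is itself a valid encryption of $m_1 + m_2$ with random number $k_1 + k_2$.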

The decryption formula is:

$$M = C_2 - sC_1 = mG$$

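where M is the decryption result and s is the private key. Because the public key is the private key multiplied by the base point ($P = sG$), $sC_1 = s \cdot kG = kP$, and therefore $C_2 - sC_1 = mG$. Recovering m itself from mG requires a discrete logarithm computation, which is one of the main computations listed above.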
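The formulas above can be checked with a toy implementation. This sketch assumes the textbook curve y^2 = x^3 + 2x + 2 over F_17 with base point G = (5, 1), whose group has prime order 19; these parameters are far too small to be secure and serve only to make the arithmetic visible. Decryption recovers m from mG by brute-force search, which works only for small plaintext values. It requires Python 3.8 or later for `pow(x, -1, p)`. The helper names are hypothetical.

```python
import random

# Toy textbook curve: y^2 = x^3 + 2x + 2 over F_17; G = (5, 1) has order 19.
P_MOD, A = 17, 2
G = (5, 1)
N = 19      # order of G
INF = None  # point at infinity


def point_add(p1, p2):
    """Elliptic-curve point addition in affine coordinates."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF  # p2 is the inverse of p1
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)


def point_neg(p):
    return INF if p is INF else (p[0], (-p[1]) % P_MOD)


def scalar_mult(k, p):
    """Point multiplication k*p by double-and-add."""
    result, addend = INF, p
    while k > 0:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def keygen():
    s = random.randrange(1, N)  # private key
    return s, scalar_mult(s, G)  # public key P = sG


def encrypt(pub, m):
    k = random.randrange(1, N)  # C1 = kG, C2 = kP + mG
    return scalar_mult(k, G), point_add(scalar_mult(k, pub), scalar_mult(m, G))


def ciphertext_add(c, d):
    """Homomorphic addition: component-wise point addition."""
    return point_add(c[0], d[0]), point_add(c[1], d[1])


def decrypt(s, c):
    M = point_add(c[1], point_neg(scalar_mult(s, c[0])))  # C2 - s*C1 = mG
    for m in range(N):  # brute-force discrete log, small m only
        if scalar_mult(m, G) == M:
            return m
    raise ValueError("no discrete logarithm found")


s, pub = keygen()
c_sum = ciphertext_add(encrypt(pub, 3), encrypt(pub, 4))
print(decrypt(s, c_sum))  # 7
```

Note that encryption uses two point multiplications and one point addition, ciphertext addition uses only point additions, and decryption uses a point multiplication, a point subtraction, and the discrete-logarithm step, matching the operation breakdown given in the description.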

It can be seen that encryption essentially requires point multiplications on the elliptic curve followed by the addition of the two point-multiplication results (point addition); ciphertext addition is essentially point addition on the elliptic curve; and decryption requires point multiplication on the elliptic curve. Each point multiplication takes a scalar and a point: for example, the point multiplication kP involves the scalar k and the point P, and the point multiplication mG involves the scalar m and the point G.

Ciphertext accumulation means adding multiple ciphertext data together. In some embodiments, the first calculation subunit 404 performing calculation processing on the multiple ciphertext data based on the calculation indication information to obtain the ciphertext processing result may be: sequentially taking one ciphertext data from the multiple ciphertext data, performing a point addition between it and the previous point addition result, and determining whether the current number of accumulations has reached the preset number. If so, the latest point addition result is output as the ciphertext processing result; if not, the point addition result is saved in the first storage subunit 401. The first control unit 301 can provide the multiple ciphertext data to the first calculation subunit 404 as an input data stream.
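The accumulation loop can be sketched as follows, assuming that the first ciphertext in the stream counts as the first accumulation and that ciphertexts are pairs (C_1, C_2). Component-wise integer addition is used as a stand-in for elliptic-curve point addition, and `accumulate_stream` and `pair_add` are hypothetical names.

```python
def accumulate_stream(stream, preset_count, point_add):
    """Sequential ciphertext accumulation with a preset count.

    data_storage models the first data storage subunit (previous result);
    preset_count models the value held in the count storage.
    """
    data_storage = None  # no previous point-addition result yet
    count = 0
    for ciphertext in stream:
        if data_storage is None:
            result = ciphertext  # first item: nothing to add to yet
        else:
            result = point_add(ciphertext, data_storage)
        count += 1
        if count == preset_count:
            return result  # output as the ciphertext processing result
        data_storage = result  # not done yet: save the intermediate result
    raise ValueError("input stream ended before preset_count items")


def pair_add(a, b):
    # Stand-in for point addition on (C1, C2) pairs: component-wise integer sum.
    return (a[0] + b[0], a[1] + b[1])


print(accumulate_stream(iter([(1, 10), (2, 20), (3, 30)]), 3, pair_add))  # (6, 60)
```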

In one implementation, the first storage subunit 401 may include a first instruction storage subunit, a first data storage subunit, and a first count storage subunit.

The first instruction storage subunit is used to store the one or more operation instructions; the first data storage subunit is used to store intermediate results of the calculation, such as the previous point addition result; and the first count storage subunit is used to store the preset number of accumulations, etc.

In addition, for a target computing processing mode involving a point multiplication operation, the first storage subunit may further include a first scalar storage subunit for storing scalar data.

In an actual application, the operation of the first computing unit 302 can be as shown in FIG. 5. The first instruction storage subunit, first parsing subunit, first control subunit, first data storage subunit, first calculation subunit, first count storage subunit, and first scalar storage subunit shown in FIG. 5 have been described in detail above and are not repeated here. As shown in FIG. 5, the first calculation subunit can have a basic calculation logic with a first input A, a second input B, a third input C, and a fourth input D. The first input A can come from the input data stream or from the first data storage subunit, and the second input B, third input C, and fourth input D can come from the first data storage subunit; any input may also be empty. Taking the point addition operation for ciphertext accumulation as an example: the ciphertext data from the input data stream enters the first input A, the previous point addition result serves as the second input B, and the third input C and fourth input D are empty. The first calculation subunit performs point addition on the first input A and the second input B to obtain a point addition result, which is either stored in the first data storage subunit or output as the ciphertext processing result.

In practical applications of the embodiments of the present application, the first computing device may be the computing device of the data recipient, which is responsible for computing and processing the multiple ciphertext data. The data initiator corresponds to a second computing device, which is used to transmit the ciphertext data corresponding to the multiple objects to the first computing device.

As shown in FIG. 6a, the embodiment of the present application further provides a computing system, which may include a first computing device 60 and a second computing device 70.

The first computing device 60 may include a first host processing component 100 and a first acceleration device 601. The specific structure of the first acceleration device 601 is described in detail in the embodiments shown in FIG. 1 to FIG. 5 above and is not repeated here.

The second computing device 70 may include a second host processing component 700 and a second acceleration device 602.

That is, the second computing device 70 may also be configured with a second acceleration device 602 for accelerating encryption and decryption operations. The second acceleration device 602 can be used to obtain multiple data to be processed and, for any data to be processed, encrypt or decrypt it to obtain a calculation processing result.

The data to be processed may be target data to be encrypted or a ciphertext processing result to be decrypted; correspondingly, the calculation processing result may be ciphertext data or a plaintext processing result.

For the ciphertext data, the second host processing component 700 may obtain the ciphertext data corresponding to the multiple objects from the second acceleration device 602 and send it to the first computing device 60.

For the plaintext processing result, the second host processing component 700 can obtain the plaintext processing result from the second acceleration device 602 and perform subsequent processing operations.

For example, the technical solution of the embodiments of the present application can be applied to multi-party joint modeling using vertical federated learning. In the interaction shown in FIG. 6b, the second acceleration device 602 in the second computing device 70 of the data initiator first encrypts the gradient information of different sample objects to obtain the ciphertext gradient information of multiple sample objects. The gradient information is calculated with a decision tree model from the feature values and label data of the sample objects provided by the data initiator.

Afterwards, the second acceleration device 602 sends the ciphertext gradient information of the multiple sample objects to the second host processing component 700, and the second host processing component 700 sends the ciphertext gradient information of the multiple sample objects to the first computing device 60 corresponding to the data recipient.

After receiving the ciphertext gradient information of the multiple sample objects, the first host processing component 100 in the first computing device 60 can send it to the first acceleration device 601. The first acceleration device 601 first buckets the ciphertext gradient information of the multiple sample objects according to each of the different features to obtain multiple bucket results for each feature, then uses the technical solution of the present application to calculate the ciphertext gradient accumulated value of each bucket result of each feature, and sends these ciphertext gradient accumulated values to the first host processing component 100. The first host processing component 100 then sends the ciphertext gradient accumulated values of the multiple bucket results of each feature to the second computing device 70 of the data initiator.

The second host processing component 700 in the second computing device 70 receives the ciphertext gradient accumulation values corresponding to the multiple bucket results of each feature, and can send them to the second acceleration device 602.

The second acceleration device 602 can decrypt the ciphertext gradient accumulated values of the multiple bucket results of each feature to obtain the corresponding gradient accumulated values, and then accumulate the gradient accumulated values of the multiple bucket results of each feature to obtain the gradient accumulated value of that feature. Alternatively, it can first accumulate the ciphertext gradient accumulated values of the multiple bucket results to obtain the ciphertext gradient accumulated value of the feature, and then decrypt that value to obtain the gradient accumulated value of the feature.
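Both orders give the same result because EC-ElGamal is additively homomorphic, as the ciphertext addition formula earlier shows. A short derivation, writing the ciphertext gradient accumulated value of bucket b (b = 1, ..., B) as $c_b = (k_bG,\; k_bP + m_bG)$ and using $P = sG$:

$$\sum_{b=1}^{B} c_b = \Big(\big(\textstyle\sum_{b} k_b\big)G,\; \big(\textstyle\sum_{b} k_b\big)P + \big(\textstyle\sum_{b} m_b\big)G\Big)$$

$$C_2 - sC_1 = \big(\textstyle\sum_{b} k_b\big)P + \big(\textstyle\sum_{b} m_b\big)G - s\big(\textstyle\sum_{b} k_b\big)G = \big(\textstyle\sum_{b} m_b\big)G = \sum_{b} m_bG$$

The right-hand side is the sum of the individual decryption results $M_b = m_bG$, so decrypting the accumulated ciphertext equals accumulating the decrypted values. Recovering the plaintext sum still requires that it stay within the range the discrete-logarithm step can search.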

The second acceleration device 602 may send the gradient accumulated values corresponding to the plurality of features to the second host processing component 700.

Specifically, the second host processing component 700 can determine the optimal split point of the decision tree model based on the gradient accumulated values corresponding to the multiple features, and then construct the decision tree model according to the optimal split point.

The decision tree model may be an XGBoost (eXtreme Gradient Boosting) model, etc. Of course, it may also be other types of decision tree models, such as GBDT (Gradient Boosting Decision Tree), GBM (Gradient Boosting Machine), etc.

The gradient information may include the first-order gradient and second-order gradient of each sample object, obtained by differentiating the loss function of the decision tree model. Specifically, the feature values of a sample object are input into the decision tree model to obtain predicted data; the loss function measures the degree of inconsistency between the predicted data and the label data; and the first-order and second-order gradients are the first and second derivatives of the loss function with respect to the prediction.
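For illustration, the sketch below computes first- and second-order gradients for the logistic (log) loss and then scans per-bucket gradient sums for the best split using the split-gain formula from XGBoost. The description does not specify the loss function or the gain formula, so both are assumptions; `lam` and `gamma` correspond to XGBoost's L2 regularization and minimum split loss, and the function names are hypothetical. In the federated flow, the per-bucket sums would be the decrypted gradient accumulated values.

```python
import math


def logistic_grad_hess(raw_scores, labels):
    """First/second-order gradients of log loss w.r.t. the raw prediction."""
    g, h = [], []
    for f, y in zip(raw_scores, labels):
        p = 1.0 / (1.0 + math.exp(-f))  # predicted probability
        g.append(p - y)                 # first-order gradient
        h.append(p * (1.0 - p))         # second-order gradient
    return g, h


def best_split(bucket_g, bucket_h, lam=1.0, gamma=0.0):
    """Scan bucket boundaries using per-bucket gradient sums (G_b, H_b)."""
    G, H = sum(bucket_g), sum(bucket_h)
    best_gain, best_idx = float("-inf"), None
    gl = hl = 0.0
    for i in range(len(bucket_g) - 1):  # candidate split after bucket i
        gl += bucket_g[i]
        hl += bucket_h[i]
        gr, hr = G - gl, H - hl
        gain = 0.5 * (gl ** 2 / (hl + lam) + gr ** 2 / (hr + lam)
                      - G ** 2 / (H + lam)) - gamma
        if gain > best_gain:
            best_gain, best_idx = gain, i
    return best_idx, best_gain


g, h = logistic_grad_hess([0.0, 0.0, 0.0, 0.0], [1, 1, 0, 0])
idx, gain = best_split(g, h)  # here each sample object is its own bucket
print(idx, round(gain, 4))    # 1 0.6667
```

In this small case the best split falls after the second bucket, separating the two positive labels from the two negative ones.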

It can be seen from the above description that, for the second computing device, when the data to be processed is the target data to be encrypted, the target data may be the gradient information corresponding to the decision tree model calculated based on the feature values and label data corresponding to the sample object provided by the data initiator;

When the data to be processed is a ciphertext processing result to be decrypted, it may be a ciphertext processing result computed by the data recipient for any of its features. That ciphertext processing result may be a ciphertext gradient accumulated value, either the one corresponding to each feature or the one corresponding to each bucket result, and the calculation processing result obtained by decrypting it is the corresponding gradient accumulated value.

The first computing device 60 and the second computing device 70 are connected via a network, which provides the medium for the communication link between them. The network may include various connection types, such as wired links, wireless links, or optical fiber cables. Optionally, the wireless connection may be implemented via a mobile network, whose network standard may be any one of 2G (GSM), 2.5G (GPRS), 3G (WCDMA, TD-SCDMA, CDMA2000, UMTS), 4G (LTE), 4G+ (LTE+), 5G, WiMAX, etc. Optionally, a communication connection may also be established via Bluetooth, Wi-Fi, infrared, etc.

The first computing device 60 and the second computing device 70 may also include other components, such as input/output interfaces, display components, communication components for implementing the above communication connections, and host storage components that store computer instructions for the host processing components to call and execute to implement the corresponding operations; these are not described in detail in this application.

As shown in FIG. 7a, to further improve processing efficiency and acceleration performance, the second acceleration device may include a second storage component 701 and at least one third acceleration component 702; the second storage component 701 is connected to the second host processing component 700 via a bus;

The second storage component 701 is used to store a plurality of data to be processed sent by the second host processing component 700; the data to be processed is target data to be encrypted or a ciphertext processing result to be decrypted;

The third acceleration component 702 is used to obtain at least one data to be processed from the second storage component 701; for any data to be processed, encrypt or decrypt it to obtain a calculation processing result, and store the calculation processing result in the second storage component 701;

The second host processing component 700 obtains the calculation processing result corresponding to any data to be processed from the second storage component 701.

Optionally, the second acceleration device may be provided with a plurality of third acceleration components 702, thereby improving parallel processing capability, improving processing efficiency, and improving acceleration performance.

In some embodiments, as shown in FIG. 7b, each third acceleration component 702 may include a second control unit 7021 and a plurality of second computing units 7022;

The second control unit 7021 is used to obtain at least one data to be processed from the second storage component 701 and dispatch the at least one data to be processed to at least one second computing unit 7022;

The second computing unit 7022 is used to encrypt or decrypt any data to be processed assigned to it to obtain a computing result;

The second control unit 7021 is used to store the calculation results corresponding to any data to be processed into the second storage component 701.

In some embodiments, each third acceleration component 702 may further include a second storage unit 7023; the second computing unit 7022 is further configured to save a calculation result corresponding to any to-be-processed data to the second storage unit 7023;

The second control unit 7021 stores the calculation processing result corresponding to any data to be processed in the second storage component 701, including: storing the calculation processing result corresponding to any data to be processed stored in the second storage unit 7023 in the second storage component 701.

In some embodiments, as shown in FIG. 7b, each third acceleration component 702 may further include a second loading unit 7024. The second control unit 7021 may specifically control the second loading unit 7024 to obtain at least one data to be processed from the second storage component 701.

In some embodiments, the second control unit 7021 is further used to receive second control information sent by the second host processing component 700, and control the operation of the plurality of second computing units 7022 and the second storage unit 7023 according to the second control information;

The second control unit 7021 is further used to notify the second computing unit 7022 of the corresponding computing method according to the second control information; wherein the computing methods corresponding to encryption are point addition and point multiplication, and the computing method corresponding to decryption is point multiplication.

The second control information may include a first total data amount, which is the total amount of the at least one data to be processed that the third acceleration component 702 needs to obtain, and a second total data amount, which is the total amount of the calculation processing results obtained after the at least one data to be processed is processed. It may also include a first storage address, at which the at least one data to be processed is stored, and a second storage address, at which the at least one calculation processing result is to be stored. The second control unit 7021 can then obtain the at least one data to be processed from the second storage component 701 according to the first total data amount and the first storage address, specifically by controlling the second loading unit 7024 to do so; and it can control the second storage unit 7023 to store the at least one calculation processing result in the second storage component 701 according to the second total data amount and the second storage address.

In addition, the second control information may include the operation method corresponding to encryption or decryption, and the second control unit 7021 may notify the second computing unit 7022 of that operation method according to the second control information.

For any piece of data to be processed that is assigned to it, the second computing unit 7022 processes that data according to the notified computing method to obtain a calculation result.

The operation method corresponding to encryption or decryption may be preconfigured as one or more corresponding operation instructions, so that each piece of data to be processed is calculated and processed by executing those instructions. In some embodiments, the second computing unit 7022 may in practice be implemented by a programmable processor (PC) that stores the corresponding instructions and executes them to perform the corresponding operations. The second computing unit 7022 may include a second storage subunit, a second parsing subunit, a second calculation subunit, and a second control subunit;

The second storage subunit is used to store one or more operation instructions corresponding to encryption or decryption;

The second parsing subunit is used to parse one or more operation instructions;

The second control subunit is used to send calculation indication information to the second calculation subunit based on the parsing result of the second parsing subunit;

The second calculation subunit is used to perform calculation processing on the data to be processed based on the calculation indication information to obtain a calculation processing result.
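The four subunits described above can be modeled as a small fetch, parse, control, and execute loop: the storage subunit holds the program, the parsing subunit splits each instruction, the control subunit turns the parse result into calculation indication information, and the calculation subunit applies it. Everything concrete in this sketch is hypothetical: the three-operand text instruction format, the `PADD` opcode, the register names, and the use of integer addition modulo 97 in place of real point addition.

```python
from typing import Callable, Dict, List, Tuple


def parse(instruction: str) -> Tuple[str, List[str]]:
    """Parsing subunit: split 'OP dst src1 src2' into an opcode and operand names."""
    opcode, *operands = instruction.split()
    if len(operands) != 3:
        raise ValueError(f"expected 3 operands: {instruction!r}")
    return opcode, operands


def control(parsed, datapath: Dict[str, Callable]):
    """Control subunit: turn a parse result into calculation indication information."""
    opcode, (dst, src1, src2) = parsed
    if opcode not in datapath:
        raise ValueError(f"unsupported opcode {opcode}")
    return datapath[opcode], dst, src1, src2


def execute(program: List[str], registers: Dict[str, int],
            datapath: Dict[str, Callable]) -> Dict[str, int]:
    """Storage subunit holds `program`; the calculation subunit applies each indication."""
    for instruction in program:
        fn, dst, src1, src2 = control(parse(instruction), datapath)
        registers[dst] = fn(registers[src1], registers[src2])
    return registers
```

Passing the datapath in as a table keeps the loop independent of the operation. The same controller can therefore run an encryption, decryption, or accumulation program, which is one reading of why the operation methods are preconfigured as instructions.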

The second storage subunit may be implemented by RAM, etc.

In one implementation, the second storage subunit may include a second instruction storage subunit, a second data storage subunit, and a second count storage subunit.

The second instruction storage subunit is used to store the one or more operation instructions. The second data storage subunit is used to store intermediate results produced during calculation, and the second count storage subunit is used to store a preset number of times and similar parameters.

In addition, since the encryption operation involves a point multiplication operation, the second storage subunit may further include a first scalar storage subunit for storing scalar data.
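Point multiplication needs the stored scalar, which is why a scalar storage subunit is useful. A standard way to compute it from repeated point additions is left-to-right double-and-add, in which the bit length of the scalar fixes the number of loop iterations. The application does not say which multiplication algorithm the hardware uses, so this is one possible technique, not the described one. The group operation is passed in as a function to keep the sketch self-contained. The check uses integers modulo 101 under addition as a stand-in group, where k times P is simply k * P mod 101, not real elliptic-curve points.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def scalar_multiply(scalar: int, point: T, add: Callable[[T, T], T], identity: T) -> T:
    """Left-to-right double-and-add over any additive group.

    `scalar` plays the role of the value held in the scalar storage subunit;
    its bit length is the number of loop iterations.
    """
    if scalar < 0:
        raise ValueError("scalar must be non-negative")
    acc = identity
    for bit in format(scalar, "b"):
        acc = add(acc, acc)          # point doubling
        if bit == "1":
            acc = add(acc, point)    # point addition
    return acc
```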

It should be noted that the specific structure of the second computing unit may be the same as that of the first computing unit 302 described in the corresponding embodiment above. For its specific implementation, refer to the description of the first computing unit, which is not repeated here.

The technical solution of the embodiments of the present application improves the processing efficiency of encryption and decryption in the second computing device and of ciphertext accumulation in the first computing device. It also reduces the computational load on the host processing components, which improves overall processing and acceleration performance and enables efficient, high-performance joint data processing.

The first computing device and the second computing device may be physical machines, which may be physical machines providing cloud computing capabilities, etc.

In addition, an embodiment of the present application further provides an acceleration method that can be applied to the acceleration device shown in FIG. 1. The acceleration device includes a first storage component, and a first acceleration component and a second acceleration component that are connected to the first storage component. The first storage component is connected to the first host processing component through a bus and stores multiple pieces of ciphertext data, corresponding to multiple objects, sent by the first host processing component. The specific structure of the acceleration device is described in detail in the corresponding embodiments and is not repeated here. The method may be executed by the second acceleration component of the acceleration device. As shown in FIG. 8, the method may include the following steps:

801: Obtain the multiple pieces of ciphertext data from the first storage component.

802: For any feature, perform bucketing on the multiple pieces of ciphertext data to obtain multiple bucketing results.

803: Store the multiple bucketing results in the first storage component.
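Steps 801 to 803 can be sketched as a single function over a dictionary that stands in for the first storage component. The key names and the per-feature list of bucket ids are assumptions. The claims describe the host as computing those bucket ids and placing them in storage. Plain strings stand in for ciphertexts, because bucketing only moves ciphertexts around and never inspects them.

```python
from collections import defaultdict


def bucketing_step(storage: dict) -> None:
    """Steps 801-803 against a dict that stands in for the first storage component."""
    # 801: obtain the ciphertexts (index i belongs to object i).
    ciphertexts = storage["ciphertexts"]
    # Assumed layout: feature -> bucket id of each object, written by the host.
    bucket_info = storage["bucket_info"]
    results = {}
    for feature, bucket_ids in bucket_info.items():
        # 802: objects with the same bucket id fall into the same bucketing result.
        buckets = defaultdict(list)
        for obj, bucket_id in enumerate(bucket_ids):
            buckets[bucket_id].append(ciphertexts[obj])
        results[feature] = dict(buckets)
    # 803: store the bucketing results back.
    storage["bucket_results"] = results
```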

The first acceleration component is used to obtain the multiple bucketing results from the first storage component. It performs calculation on the ciphertext data within each bucketing result to obtain a ciphertext processing result, and it stores the ciphertext processing results corresponding to the respective bucketing results in the first storage component. The first storage component is used to provide these ciphertext processing results to the first host processing component.

In addition, an embodiment of the present application further provides an acceleration method that can be applied to the acceleration device shown in FIG. 1. The acceleration device includes a first storage component, and a first acceleration component and a second acceleration component that are connected to the first storage component. The first storage component is connected to the first host processing component through a bus and stores multiple pieces of ciphertext data, corresponding to multiple objects, sent by the first host processing component. The specific structure of the acceleration device is described in detail in the corresponding embodiments and is not repeated here. The method may be executed by the first acceleration component of the acceleration device. As shown in FIG. 9, the method may include the following steps:

901: Obtain multiple bucketing results from the first storage component.

The multiple bucketing results may be produced by the second acceleration component, which obtains the multiple pieces of ciphertext data from the first storage component and buckets them according to each of multiple features.

902: Perform calculation on the ciphertext data within each bucketing result to obtain a ciphertext processing result.

903: Store the ciphertext processing results corresponding to the respective bucketing results in the first storage component.
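Steps 901 to 903 amount to reducing each bucketing result with the scheme's homomorphic addition. In this sketch, `ct_add` is a placeholder for whatever ciphertext addition the scheme provides (point addition in the embodiments), and integer addition stands in for it in the check. The dictionary layout is assumed and does not correspond to any documented interface.

```python
from functools import reduce


def accumulation_step(storage: dict, ct_add) -> None:
    """Steps 901-903; `ct_add` is the homomorphic ciphertext addition."""
    bucket_results = storage["bucket_results"]          # 901: obtain the bucketing results
    storage["ciphertext_results"] = {                   # 903: store the processing results
        feature: {bucket: reduce(ct_add, cts)           # 902: combine one bucket's ciphertexts
                  for bucket, cts in buckets.items()}
        for feature, buckets in bucket_results.items()
    }
```

Because the addition is homomorphic, each reduced value decrypts to the sum of the plaintexts in that bucket. The accelerator therefore never needs the decryption key.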

It should be noted that, for the acceleration methods shown in FIG. 8 and FIG. 9, the specific manner in which each step is performed has been described in detail in the related device embodiments and is not elaborated here.

In addition, an embodiment of the present application further provides a computing device. As shown in FIG. 10, the computing device may include a host processing component 1001, a host storage component 1002, and an acceleration device 1003. The acceleration device may adopt the structure described in any of the embodiments of FIG. 1 to FIG. 5 or FIG. 7a, which is not repeated here.

The host storage component 1002 may store one or more computer instructions for the host processing component 1001 to call and execute to implement corresponding operations.

Of course, the computing device may also include other components, such as input/output interfaces, display components, communication components, etc.

The input/output interface provides an interface between the processing component and the peripheral interface module, which may be an output device, an input device, etc. The communication component is configured to facilitate wired or wireless communication between the computing device and other devices.

The host processing component may include one or more processors to execute computer instructions to complete all or part of the steps in the above method. Of course, the host processing component may also be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components to perform the above method.

The host storage component is configured to store various types of data to support operations in the computing device. The host storage component can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.

The acceleration device can be implemented by using an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field programmable gate array (FPGA), a controller, a microcontroller, a microprocessor or other electronic components. It can be connected to the host processing component through a bus and deployed in a computing device in a hot-swappable manner.

The embodiment of the present application also provides a computer-readable storage medium storing a computer program which, when executed by a computer, can implement the acceleration method of the embodiment shown in FIG. 8 or FIG. 9. The computer-readable medium may be included in the computing device described in the above embodiments, or it may exist independently without being assembled into the computing device.

The embodiment of the present application also provides a computer program product, which includes a computer program carried on a computer-readable storage medium. When executed by a computer, the computer program can implement the acceleration method of the embodiment shown in FIG. 8 or FIG. 9. In such an embodiment, the computer program may be downloaded and installed from a network and/or installed from a removable medium. When the computer program is executed by a processor, the various functions defined in the system of the present application are performed.

Those skilled in the art can clearly understand that, for the convenience and brevity of description, the specific working processes of the systems, devices and units described above can refer to the corresponding processes in the aforementioned method embodiments and will not be repeated here.

The device embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of this embodiment. Those of ordinary skill in the art may understand and implement it without creative work.

Through the description of the above implementations, those skilled in the art can clearly understand that each implementation can be realized by software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the above technical solution in essence, or the part of it that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk. It includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments or in some parts of the embodiments.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application rather than to limit them. Although the present application has been described in detail with reference to the aforementioned embodiments, those skilled in the art should understand that they may still modify the technical solutions described in those embodiments or replace some of their technical features with equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. An acceleration device, characterized in that it comprises: a first storage component, and a first acceleration component and a second acceleration component that are connected to the first storage component; the first storage component is connected to a first host processing component via a bus;

The first storage component is used to store multiple ciphertext data corresponding to multiple objects sent by the first host processing component;

The second acceleration component is used to obtain the multiple ciphertext data from the first storage component, and for any feature, perform bucket processing on the multiple ciphertext data to obtain multiple bucket results; and store the multiple bucket results in the first storage component;

The first acceleration component is used to obtain the multiple bucket results from the first storage component; perform calculations on the ciphertext data in the same bucket result to obtain a ciphertext processing result; and store the ciphertext processing results corresponding to the multiple bucket results respectively in the first storage component;

The first storage component is used to provide the ciphertext processing results corresponding to the multiple bucket results to the first host processing component.
2. The device according to claim 1, characterized in that the second acceleration component performing bucket processing on the multiple ciphertext data for any feature to obtain multiple bucket results comprises: for any feature, determining the bucket information of each of the multiple objects corresponding to that feature; and dividing the ciphertext data of the objects that have the same bucket information into the same bucket result, to obtain the multiple bucket results; wherein the bucket information of the multiple objects corresponding to different features is determined by the first host processing component;

The first storage component is also used to store bucket information of the multiple objects corresponding to different features sent by the first host processing component.
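The claims leave open how the first host processing component derives the bucket information. One common choice is equal-frequency (quantile) binning on the plaintext feature values; it is assumed here and not taken from the application. The function names and the output layout (feature name mapped to one bucket id per object) are hypothetical.

```python
import bisect
from typing import Dict, List


def quantile_thresholds(values: List[float], num_buckets: int) -> List[float]:
    """Equal-frequency cut points; duplicates are merged so equal values share a bucket."""
    ordered = sorted(values)
    n = len(ordered)
    return sorted({ordered[k * n // num_buckets] for k in range(1, num_buckets)})


def build_bucket_info(features: Dict[str, List[float]],
                      num_buckets: int) -> Dict[str, List[int]]:
    """Host side: bucket id of every object for every feature."""
    info = {}
    for name, values in features.items():
        cuts = quantile_thresholds(values, num_buckets)
        info[name] = [bisect.bisect_right(cuts, v) for v in values]
    return info
```

Keeping equal values in one bucket matters for split finding, because a tree split cannot separate objects that share a feature value.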
3. The device according to claim 1, characterized in that the second acceleration component comprises a data loading unit, a plurality of bucketing units, and a data storage unit;

The data loading unit is used to obtain the plurality of ciphertext data from the first storage component, and provide the plurality of ciphertext data to the plurality of bucketing units respectively; the data loading unit is also used to assign features to be processed to the plurality of bucketing units respectively, and control the plurality of bucketing units to process the assigned features to be processed in parallel;

The bucketing unit is used to perform bucketing processing on the plurality of ciphertext data according to the feature assigned to it, to obtain a plurality of bucket results, and to send the plurality of bucket results to the data storage unit;

The data storage unit is used to store the bucket results sent by each bucketing unit in the first storage component.
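The division of labor in claim 3 can be modeled with a thread pool in which each submitted task plays one bucketing unit working on one feature. The loading side hands out features, and the storage side collects every unit's output. The function names are hypothetical. Python threads do not give true CPU parallelism for this kind of work, so the sketch models the dataflow only, not the hardware speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def bucketing_unit(ciphertexts, bucket_ids):
    """One bucketing unit: group the ciphertexts of a single feature by bucket id."""
    buckets = defaultdict(list)
    for ct, bucket_id in zip(ciphertexts, bucket_ids):
        buckets[bucket_id].append(ct)
    return dict(buckets)


def second_acceleration_component(ciphertexts, bucket_info, num_units=4):
    """Data loading unit assigns one feature per task to a pool of bucketing units."""
    with ThreadPoolExecutor(max_workers=num_units) as units:
        pending = {feature: units.submit(bucketing_unit, ciphertexts, ids)
                   for feature, ids in bucket_info.items()}
        # Data storage unit: collect every unit's bucketing results.
        return {feature: job.result() for feature, job in pending.items()}
```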
4. The device according to claim 1, characterized in that it further comprises a substrate, and the first storage component, the first acceleration component, and the second acceleration component are soldered onto the substrate.
5. The device according to claim 1, characterized in that the first acceleration component comprises at least one first acceleration unit;

The first acceleration unit is used to obtain at least one bucket result from the first storage component; for any bucket result, perform calculation processing on the multiple ciphertext data in the bucket result according to a target calculation processing mode to obtain a ciphertext processing result; and store the ciphertext processing result corresponding to that bucket result in the first storage component.
6. The device according to claim 5, characterized in that the first acceleration unit comprises a first control unit and a plurality of first computing units;

The first control unit is used to obtain at least one bucket result from the first storage component and to dispatch the at least one bucket result to at least one first computing unit;

The first computing unit is used to perform computing processing on a plurality of ciphertext data in any bucket result assigned to it according to a target computing processing mode to obtain a ciphertext processing result;

The first control unit is used to store the ciphertext processing result corresponding to any bucket result in the first storage component.
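Claim 6 does not specify how the first control unit spreads bucket results over the first computing units. The sketch below assumes round-robin dispatch. The parameter `process` stands in for the target computing processing mode (ciphertext accumulation in claim 10), and the check uses Python's built-in `sum` on plaintext integers as a stand-in. All names are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Tuple


def dispatch(bucket_results: Dict[Hashable, list], num_units: int) -> List[List[Tuple]]:
    """First control unit: deal bucket results to computing units round-robin."""
    queues: List[List[Tuple]] = [[] for _ in range(num_units)]
    for i, item in enumerate(bucket_results.items()):
        queues[i % num_units].append(item)
    return queues


def first_acceleration_unit(bucket_results, num_units: int,
                            process: Callable[[list], object]):
    results = {}
    for queue in dispatch(bucket_results, num_units):  # each queue = one first computing unit
        for key, ciphertexts in queue:
            results[key] = process(ciphertexts)        # target computing processing mode
    return results                                     # written back by the first control unit
```

Round-robin is the simplest policy. When bucket sizes vary widely, a scheduler that balances by ciphertext count would keep the computing units more evenly loaded.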
7. The device according to claim 6, characterized in that the first acceleration unit further comprises a first storage unit; the first computing unit is further configured to save the ciphertext processing result corresponding to any bucket result to the first storage unit;

The first control unit storing the ciphertext processing result corresponding to any bucket result in the first storage component includes: storing, in the first storage component, the ciphertext processing result corresponding to that bucket result that has been saved in the first storage unit.
8. The device according to claim 7, characterized in that the first control unit is further used to receive first control information sent by the first host processing component, and control the operation of the plurality of first computing units and the first storage unit according to the first control information;

The first control unit is further configured to notify the first computing unit of a corresponding computing mode according to the first control information;

The first computing unit performing calculation processing on the multiple ciphertext data in any bucket result assigned to it to obtain a ciphertext processing result includes: for any bucket result assigned to it, processing the multiple ciphertext data in the bucket result according to the computing mode to obtain a ciphertext processing result.
9. The device according to claim 6, characterized in that the first computing unit comprises a first storage subunit, a first parsing subunit, a first calculation subunit, and a first control subunit;

The first storage subunit is used to store one or more operation instructions corresponding to the target computing processing mode;

The first parsing subunit is used to parse the one or more operation instructions;

The first control subunit is used to send calculation indication information to the first calculation subunit based on the parsing result of the first parsing subunit;

The first calculation subunit is used to perform calculation processing on the multiple ciphertext data based on the calculation indication information to obtain a ciphertext processing result.
10. The device according to claim 9, characterized in that the target calculation processing mode is ciphertext accumulation, and the corresponding operation is point addition;

The first calculation subunit performing calculation processing on the plurality of ciphertext data based on the calculation indication information to obtain a ciphertext processing result includes: taking the ciphertext data one at a time from the plurality of ciphertext data and performing a point addition with the previous point addition result; determining whether the current cumulative number of additions has reached the preset number of times; if so, outputting the latest point addition result as the ciphertext processing result; if not, saving the point addition result to the first storage subunit.
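The accumulation loop of claim 10 can be sketched with the point addition passed in as a function. Two details are assumptions, since the claim does not fix either one: the running result starts at the group identity, and the preset number of times equals the number of ciphertexts in the bucket. Integer addition modulo 97 stands in for real point addition in the check.

```python
def accumulate(ciphertexts, preset_count, point_add, identity):
    """Claim-10 style loop: one ciphertext per step, counter compared with the preset count."""
    scratch = identity   # first storage subunit: previous point-addition result
    count = 0            # cumulative number of additions
    for ct in ciphertexts:
        result = point_add(scratch, ct)
        count += 1
        if count == preset_count:
            return result      # output the latest point-addition result
        scratch = result       # not done yet: save the result back
    raise ValueError("fewer ciphertexts than the preset number of additions")
```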
11. A computing system, characterized in that it comprises a first computing device and a second computing device, wherein the first computing device comprises a first host processing component and the acceleration device according to any one of claims 1 to 10;

The second computing device includes a second host processing component and a second acceleration device; the second acceleration device includes a second storage component and at least one third acceleration component; the second storage component is connected to the second host processing component via a bus;

The second storage component is used to store a plurality of to-be-processed data sent by the second host processing component; the to-be-processed data is target data to be encrypted or a ciphertext processing result to be decrypted;

The third acceleration component is used to obtain at least one to-be-processed data from the second storage component; for any to-be-processed data, encrypt or decrypt the to-be-processed data to obtain a calculation result, and store the calculation result in the second storage component;

The second host processing component is used to obtain a calculation result corresponding to any data to be processed from the second storage component.
12. The system according to claim 11, characterized in that, when the data to be processed is target data to be encrypted, the target data is gradient information of a decision tree model, calculated based on the feature values and label data of the sample objects provided by the data initiator;

or,

When the data to be processed is a ciphertext processing result to be decrypted, the data to be processed is specifically a ciphertext processing result calculated based on any feature provided by the data recipient, and the calculation result corresponding to that ciphertext processing result is a gradient accumulation value; the second host processing component is further used to determine the optimal split point of the decision tree model based on the gradient accumulation values corresponding to multiple features.
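Claim 12 does not specify the loss function or the split criterion. A common pairing in gradient-boosted trees, assumed here, is the logistic loss, with per-object gradients g = p - y and h = p(1 - p), together with the XGBoost-style split gain: gain = 1/2 [G_L^2/(H_L + lam) + G_R^2/(H_R + lam) - G^2/(H + lam)] - gamma. Here G and H are sums of g and h, and `lam` and `gamma` are regularization parameters of that assumed criterion. In a deployment, the gradients would typically be fixed-point encoded as integers before encryption. This sketch works on decrypted plaintext bucket sums ordered by bucket id.

```python
import math


def logistic_grad_hess(scores, labels):
    """First- and second-order gradients of the (assumed) logistic loss."""
    g, h = [], []
    for s, y in zip(scores, labels):
        p = 1.0 / (1.0 + math.exp(-s))
        g.append(p - y)
        h.append(p * (1.0 - p))
    return g, h


def best_split(bucket_sums, lam=1.0, gamma=0.0):
    """bucket_sums: [(G_b, H_b), ...] ordered by bucket id (decrypted accumulations).

    Returns (best_gain, i): buckets 0..i go to the left child.
    """
    G = sum(gb for gb, _ in bucket_sums)
    H = sum(hb for _, hb in bucket_sums)
    best = (float("-inf"), None)
    gl = hl = 0.0
    for i, (gb, hb) in enumerate(bucket_sums[:-1]):
        gl += gb
        hl += hb
        gr, hr = G - gl, H - hl
        gain = 0.5 * (gl**2 / (hl + lam) + gr**2 / (hr + lam) - G**2 / (H + lam)) - gamma
        if gain > best[0]:
            best = (gain, i)
    return best
```

Running `best_split` once per feature and keeping the overall maximum gives one way to choose the optimal split point across multiple features.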
13. A computing device, characterized in that it comprises a host processing component, a host storage component, and the acceleration device according to any one of claims 1 to 10.
14. An acceleration method, characterized in that it is applied to an acceleration device, the acceleration device comprising a first storage component, and a first acceleration component and a second acceleration component that are connected to the first storage component; the first storage component is connected to a first host processing component via a bus; wherein the first storage component stores a plurality of ciphertext data corresponding to a plurality of objects sent by the first host processing component; the method comprising:

Acquire the plurality of ciphertext data from the first storage component;

For any feature, performing bucketing on the plurality of ciphertext data to obtain a plurality of bucket results;

The multiple bucket results are stored in the first storage component; the first acceleration component is used to obtain the multiple bucket results from the first storage component; the ciphertext data in the same bucket result is calculated and processed to obtain the ciphertext processing result; the ciphertext processing results corresponding to the multiple bucket results are respectively stored in the first storage component; the first storage component is used to provide the ciphertext processing results corresponding to the multiple bucket results to the first host processing component.