CN112328962A - Matrix operation optimization method, device and equipment and readable storage medium - Google Patents

Matrix operation optimization method, device and equipment and readable storage medium Download PDF

Info

Publication number
CN112328962A
Authority
CN
China
Prior art keywords
matrix
result
multiplication
matrix element
calculation engine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011357215.0A
Other languages
Chinese (zh)
Other versions
CN112328962B (en)
Inventor
董扬辉
王玮
胡水海
陈天健
黄启军
黄铭毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Shenzhen Zhixing Technology Co Ltd
Original Assignee
WeBank Co Ltd
Shenzhen Zhixing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd, Shenzhen Zhixing Technology Co Ltd filed Critical WeBank Co Ltd
Priority to CN202011357215.0A priority Critical patent/CN112328962B/en
Publication of CN112328962A publication Critical patent/CN112328962A/en
Application granted granted Critical
Publication of CN112328962B publication Critical patent/CN112328962B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Abstract

The application discloses a matrix operation optimization method, a device, equipment and a readable storage medium, wherein the matrix operation optimization method comprises the following steps: acquiring matrix element sequence data and target matrix dimension information corresponding to the matrix element sequence data, extracting matrix elements from the matrix element sequence data, distributing the matrix elements to a preset calculation engine array, performing element multiplication and element addition on each matrix element through the preset calculation engine array to obtain a result matrix, and taking the result matrix as a target matrix operation result if the accumulation frequency information corresponding to the result matrix is matched with the target matrix dimension information. The method and the device solve the technical problem of low calculation efficiency in matrix operation.

Description

Matrix operation optimization method, device and equipment and readable storage medium
Technical Field
The present application relates to the field of big data technology of privacy computing, and in particular, to a matrix operation optimization method, apparatus, device, and readable storage medium.
Background
With the continuous development of financial technology, especially internet technology finance, more and more technologies (such as distributed computing, blockchain, artificial intelligence and the like) are applied to the financial field. The financial industry, however, also places higher requirements on these technologies, such as higher requirements on the distribution of the financial industry's backlog.
With the continuous development of computer software and artificial intelligence, the requirements on the computing performance of computers are growing. At present, federated learning scenarios generally involve multiplication between homomorphically encrypted matrices, and the computational complexity of such operations is high. When matrix multiplication is performed on a CPU, the matrices usually have to be partitioned first to obtain a matrix partitioning result, and the data in the partitioning result is then read for the matrix operation. However, when the homomorphically encrypted matrices are large, the CPU can only read the data one by one to perform the data operation, so the data operation takes a long time and, in turn, the matrix operation takes a long time. In addition, a large amount of memory copying is required during matrix partitioning, which further increases the time consumed by the matrix operation, so the computing efficiency of the matrix operation is low.
Disclosure of Invention
The present application mainly aims to provide a matrix operation optimization method, device, apparatus, and readable storage medium, and aims to solve the technical problem of low computation efficiency in matrix operation in the prior art.
In order to achieve the above object, the present application provides a matrix operation optimization method, which is applied to a matrix operation optimization device, and includes:
acquiring matrix element sequence data and target matrix dimension information corresponding to the matrix element sequence data, extracting matrix elements from the matrix element sequence data, and distributing the matrix elements to a preset calculation engine array;
performing element multiplication and element addition on each matrix element through the preset calculation engine array to obtain a result matrix;
and if the accumulation count information corresponding to the result matrix matches the dimension information of the target matrix, taking the result matrix as the target matrix operation result.
The present application also provides a matrix operation optimization apparatus. The matrix operation optimization apparatus is a virtual apparatus and is applied to matrix operation optimization equipment. The matrix operation optimization apparatus includes:
the distribution module is used for acquiring matrix element sequence data and target matrix dimension information corresponding to the matrix element sequence data, extracting matrix elements from the matrix element sequence data and distributing the matrix elements to a preset calculation engine array;
the calculation module is used for performing element multiplication and element addition on each matrix element through the preset calculation engine array to obtain a result matrix;
and the output control module is used for taking the result matrix as a target matrix operation result if the accumulation count information corresponding to the result matrix matches the dimension information of the target matrix.
The present application further provides a matrix operation optimization device, the matrix operation optimization device is an entity device, the matrix operation optimization device includes: a memory, a processor and a program of the matrix operation optimization method stored on the memory and executable on the processor, the program of the matrix operation optimization method being executable by the processor to implement the steps of the matrix operation optimization method as described above.
The present application also provides a readable storage medium having stored thereon a program for implementing a matrix operation optimization method, the program implementing the steps of the matrix operation optimization method as described above when executed by a processor.
Compared with the technical means of performing matrix multiplication on a CPU (Central Processing Unit) adopted in the prior art, the matrix operation optimization method, apparatus, device and readable storage medium of the present application first acquire matrix element sequence data and the target matrix dimension information corresponding to the matrix element sequence data, extract matrix elements from the matrix element sequence data, and distribute the matrix elements to a preset calculation engine array. Because the matrix elements of the matrices to be operated on are converted into matrix element sequence data by ordering them before the calculation, the matrices do not need to be partitioned before the calculation, which reduces the time spent on copying large amounts of memory. The preset calculation engine array then performs element multiplication on the matrix elements in parallel to obtain matrix element products and performs element addition on those products, so that the result matrix can be obtained. In other words, the multiplication between two matrices is decomposed into multiplications and additions between matrix elements, so that multiple pairs of matrix elements can be multiplied in parallel by the preset calculation engine array, which reduces the time consumed by the data operation. Whether the result matrix is taken as the target matrix operation result is then controlled by judging whether the counted accumulation count information matches the target matrix dimension information, which guarantees the accuracy of the matrix operation result during the pipelined operation and thus realizes a pipelined matrix operation process with extremely low time consumption. This overcomes the problems that, when the homomorphically encrypted matrices are large, the CPU can only read the data one by one for the data operation, so the data operation and hence the matrix operation take a long time, and that a large amount of memory copying is required when partitioning the matrices, which further increases the time consumed by the matrix operation and results in the technical defect of low matrix operation efficiency. The computing efficiency of the matrix operation is therefore improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below; obviously, those skilled in the art can obtain other drawings from these drawings without inventive effort.
FIG. 1 is a schematic flow chart diagram illustrating a first embodiment of a matrix operation optimization method according to the present application;
FIG. 2 is a schematic diagram illustrating an interaction process between a CPU and a heterogeneous processor in an embodiment of the matrix operation optimization method of the present application;
FIG. 3 is a schematic flow chart diagram illustrating a second embodiment of the matrix operation optimization method of the present application;
fig. 4 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present application.
The objectives, features, and advantages of the present application will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In a first embodiment of the matrix operation optimization method of the present application, referring to fig. 1, the matrix operation optimization method includes:
step S10, acquiring matrix element sequence data and target matrix dimension information corresponding to the matrix element sequence data, extracting matrix elements from the matrix element sequence data and distributing the matrix elements to a preset calculation engine array;
In this embodiment, the matrix operation optimization method is applied to a heterogeneous processor, such as an FPGA (Field Programmable Gate Array) or an ASIC, and can be used to perform multiplication between matrices. The target matrix dimension information is the matrix dimension of the multiplication result matrix generated by the matrix multiplication, and the matrix element sequence data is sequence data formed by alternately arranging the matrix elements of the two matrices being multiplied. For example, assuming that a matrix A is (K11, K12, K21, K22) and a matrix B is (C11, C12, C21, C22), the matrix element sequence data is (K11, C11, K12, C12, K21, C21, K22, C22).
Matrix element sequence data and the target matrix dimension information corresponding to the matrix element sequence data are acquired, matrix elements are extracted from the matrix element sequence data, and the matrix elements are distributed to a preset calculation engine array. Specifically, a first matrix to be operated on and a second matrix to be operated on are acquired, and the matrix elements of the first matrix to be operated on and the matrix elements of the second matrix to be operated on are alternately arranged to obtain the matrix element sequence data, where a matrix element is a basic element constituting a matrix, namely the numerical value at each position in the matrix. Matrix elements are then extracted from the matrix element sequence data step by step according to the arrangement position information of each matrix element in the matrix element sequence data and are distributed to the preset calculation engine array. For example, assume that matrix A is (K11, K12, K21, K22), matrix B is (C11, C12, C21, C22), the matrix element sequence data is (K11, C11, K12, C12, K21, C21, K22, C22), and the preset calculation engine array includes 2 parallel multiplication calculation engines a and b. It can be arranged that at the first distribution K11 and C11 are distributed to a and K12 and C21 are distributed to b, at the second distribution K11 and C12 are distributed to a and K12 and C22 are distributed to b, at the third distribution K21 and C11 are distributed to a and K22 and C21 are distributed to b, and at the fourth distribution K21 and C12 are distributed to a and K22 and C22 are distributed to b, which completes the matrix element distribution process for calculating the product of matrix A and matrix B, as illustrated in the sketch below.
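The following minimal Python sketch is an illustration only: the two-engine array size, the engine names and the round-by-round schedule are taken from the worked example above, not from any fixed specification of this application.

```python
# Illustrative sketch of the interleaving and distribution described above.
# Two 2x2 matrices in row-major order are interleaved into one element
# sequence, and the four distribution rounds for two multiplication engines
# are listed explicitly, matching the worked example in the text.

def interleave(first, second):
    """Alternate the elements of two equally sized row-major matrices."""
    sequence = []
    for x, y in zip(first, second):
        sequence.extend([x, y])
    return sequence

A = ["K11", "K12", "K21", "K22"]   # first matrix to be operated on
B = ["C11", "C12", "C21", "C22"]   # second matrix to be operated on

print(interleave(A, B))
# ['K11', 'C11', 'K12', 'C12', 'K21', 'C21', 'K22', 'C22']

# Distribution schedule for the 2x2 product with two multiplication engines:
# each round hands one (first-matrix, second-matrix) element pair to each engine.
schedule = [
    {"engine_a": ("K11", "C11"), "engine_b": ("K12", "C21")},  # round 1 -> result element (1,1)
    {"engine_a": ("K11", "C12"), "engine_b": ("K12", "C22")},  # round 2 -> result element (1,2)
    {"engine_a": ("K21", "C11"), "engine_b": ("K22", "C21")},  # round 3 -> result element (2,1)
    {"engine_a": ("K21", "C12"), "engine_b": ("K22", "C22")},  # round 4 -> result element (2,2)
]
```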
Further, in step S10, the step of acquiring the matrix element sequence data includes:
step S11, acquiring a first matrix to be operated and a second matrix to be operated;
in this embodiment, a first matrix to be operated and a second matrix to be operated are obtained, and specifically, the first matrix to be operated and the second matrix to be operated are extracted from a CPU memory.
Step S12, alternately arranging each first matrix element in the first matrix to be operated and each second matrix element in the second matrix to be operated to obtain the matrix element sequence data.
In this embodiment, each first matrix element in the first matrix to be operated on and each second matrix element in the second matrix to be operated on are alternately arranged to obtain the matrix element sequence data. Specifically, the first matrix elements and the second matrix elements are alternately arranged by a preset CPU end to obtain the matrix element sequence data, and the matrix element sequence data is sent to the memory of the heterogeneous processor. It should be noted that, since the matrix elements of the first matrix to be operated on and the second matrix to be operated on have already been alternately arranged, the two matrices are sent to the memory of the heterogeneous processor directly in the form of matrix element sequence data, and it is not necessary to read the matrix elements in the matrices one by one. This reduces the number of memory interactions and the interaction time between the CPU end and the heterogeneous processor end, and therefore improves the computing efficiency of the matrix operation.
Further, in step S10, the preset calculation engine array at least includes a multiplication engine, the matrix element sequence data includes a first matrix element data and a second matrix element data,
the step of extracting matrix elements from the matrix element sequence data and distributing the matrix elements to a preset calculation engine array comprises the following steps:
step A10, determining the position information of the sequence element to be calculated corresponding to each multiplication engine;
in this embodiment, it should be noted that the multiplication calculation engine is a calculation engine for calculating the product between matrix elements.
Determining position information of sequence elements to be calculated corresponding to each multiplication engine, specifically determining a matrix element position of a matrix element required by each multiplication engine for the current calculation in an original matrix, and further determining a sequence element position corresponding to the matrix element position corresponding to each multiplication engine in the matrix element sequence data, so as to obtain the position information of the sequence elements to be calculated corresponding to each multiplication engine.
Step a20, based on the position information of each sequence element to be calculated, selecting a matrix element from the first matrix element data and the second matrix element data, and distributing the matrix element to each multiplication engine.
In this embodiment, it should be noted that the sequence element position information to be calculated is a sequence position of an element required by the multiplication engine for the current calculation in the matrix element sequence data.
Based on the position information of each sequence element to be calculated, matrix elements are selected from the first matrix element data and the second matrix element data and distributed to each multiplication calculation engine. Specifically, based on the sequence position, in the matrix element sequence data, of the elements required by each multiplication calculation engine for the current calculation, one matrix element is selected from the first matrix element data and one matrix element is selected from the second matrix element data, and both are distributed to that multiplication calculation engine. A multiplication calculation engine needs one matrix element from the first matrix element data and one matrix element from the second matrix element data when performing a multiplication, where the first matrix element data is the data in the matrix element sequence data corresponding to the first matrix to be operated on, and the second matrix element data is the data in the matrix element sequence data corresponding to the second matrix to be operated on.
Step S20, element multiplication and element addition are carried out on each matrix element through the preset calculation engine array to obtain a result matrix;
in this embodiment, it should be noted that the preset calculation engine array includes a multiplication calculation engine array and an addition calculation engine array, where the multiplication calculation engine array includes at least one multiplication calculation engine, and the addition calculation engine array includes at least one addition calculation engine.
In this embodiment, element multiplication and element addition are performed on the matrix elements through the preset calculation engine array to obtain a result matrix. Specifically, the products of the matrix elements received by the multiplication calculation engines are calculated in parallel to obtain the matrix element product output by each multiplication calculation engine, and each matrix element product is then distributed to an addition calculation engine so as to be accumulated into the corresponding result matrix element, thereby obtaining the result matrix. The result matrix is the matrix generated while calculating the product of the first matrix to be operated on and the second matrix to be operated on; after the calculation is completed, the result matrix is that product. Each result matrix element in the result matrix may be set to correspond to a multiplication calculation engine, so as to ensure that the matrix element products output by the multiplication calculation engines are accurately accumulated into the corresponding result matrix elements. A software mock of this flow is sketched below.
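The following Python mock is an illustration under the assumption of plain integer matrices, not the hardware itself: every required element pair is multiplied, each product carries a tag naming its result matrix element, and the tagged product stream is then accumulated into the result matrix.

```python
# Software mock of the multiply-then-accumulate flow described above.

def engine_array_matmul(first, second, n):
    """first, second: n x n matrices as nested lists; returns their n x n product."""
    # "Multiplication engine array": produce every matrix element product,
    # each tagged with the (row, col) of the result matrix element it feeds.
    # In hardware these products are computed in parallel; here they are
    # simply collected as a serialized stream.
    products = [((i, j), first[i][k] * second[k][j])
                for i in range(n) for j in range(n) for k in range(n)]

    # "Addition engine array": accumulate each streamed product into the
    # result matrix element identified by its tag.
    result = [[0] * n for _ in range(n)]
    for (i, j), p in products:
        result[i][j] += p
    return result

print(engine_array_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], 2))
# [[19, 22], [43, 50]]
```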
Further, in step S20, the preset calculation engine array includes a multiplication calculation engine array and an addition calculation engine array,
the step of obtaining a result matrix by performing element multiplication and element addition on each matrix element through the preset calculation engine array comprises:
step S21, based on the multiplication engine array, carrying out element multiplication on each matrix element in parallel to obtain a matrix element product result;
in this embodiment, it should be noted that the matrix element product result includes a matrix element product output by parallel computation of each of the multiplication engines.
And performing element multiplication on each matrix element in parallel based on the multiplication engine array to obtain a matrix element product result, specifically, performing parallel computation on products between matrix elements received by each multiplication engine in the multiplication engine array to obtain matrix element products output by each multiplication engine.
Step S22, serializing and outputting the matrix element product result to obtain a serialized output result;
in this embodiment, the matrix element product result is serialized and output, so as to obtain a serialized output result, specifically, each matrix element product obtained by parallel computation is serialized and output, so as to obtain each serialized matrix element product, that is, obtain a serialized output result.
Step S23, based on the addition engine array, respectively accumulating each element in the serialized output result into a corresponding result matrix element, so as to obtain the result matrix.
In this embodiment, it should be noted that the addition engine array at least includes an addition engine, and is configured to accumulate matrix element products calculated by the multiplication engine for multiple times until a product between the first matrix to be operated and the second matrix to be operated is obtained.
Based on the addition calculation engine array, each element in the serialized output result is accumulated into the corresponding result matrix element to obtain the result matrix. Specifically, based on the product identifier corresponding to each matrix element product, each matrix element product in the serialized output result is accumulated into the corresponding result matrix element to obtain the result matrix, where the product identifier identifies the correspondence between a matrix element product and its result matrix element and includes a position identifier within the serialized output result and a tag identifier of the multiplication calculation engine. It should be noted that, after the heterogeneous processor starts to calculate the matrix product of the first matrix to be operated on and the second matrix to be operated on, the calculation of the matrix product is performed in a pipelined manner: the multiplication calculation engines continuously compute matrix element products, these products are passed to the addition calculation engines in a pipelined manner, and the addition calculation engines continuously accumulate them into the corresponding result matrix elements until the matrix product has been calculated.
And step S30, if the accumulation count information corresponding to the result matrix matches the target matrix dimension information, taking the result matrix as the target matrix operation result.
In this embodiment, it should be noted that the result matrix at least includes one result matrix element, the accumulation count information at least includes an accumulation count corresponding to a result matrix element, and the target matrix dimension information includes a target matrix dimension.
If the accumulation count information corresponding to the result matrix matches the target matrix dimension information, the result matrix is taken as the target matrix operation result. Specifically, the value of the preset addition count counter corresponding to each result matrix element is read in real time to obtain the accumulation count of each result matrix element, where the accumulation count is the number of times the addition calculation engine has performed an addition for that result matrix element. It is then judged whether each accumulation count is consistent with the target matrix dimension. If so, the result matrix element corresponding to that accumulation count has been fully calculated; after every result matrix element has been fully calculated, and a matrix operation end identifier is detected, the result matrix has been fully calculated and is taken as the target matrix operation result. If not, the calculation process continues, and matrix element products continue to be accumulated into the result matrix elements of the result matrix until the accumulation count corresponding to each result matrix element is consistent with the target matrix dimension.
Further, in step S30, the target matrix dimension information includes a target matrix dimension, the result matrix includes at least one result matrix element, and the accumulation count information includes at least one accumulation count corresponding to a result matrix element,
if the accumulation count information corresponding to the result matrix matches the target matrix dimension information, the step of taking the result matrix as the target matrix operation result includes the following steps:
step S31, reading the value of a preset addition count counter in real time to obtain the accumulation count corresponding to each result matrix element;
In this embodiment, it should be noted that an addition count counter is provided in the heterogeneous processor. When the heterogeneous processor starts to calculate the matrix product of the first matrix to be operated on and the second matrix to be operated on, the addition count counter is started; its value is initially 0, and each time an addition calculation engine performs an addition, the value of the counter is increased by 1. The accumulation count information includes the accumulation count corresponding to every result matrix element, and the target matrix dimension information is the target matrix dimension of the product matrix of the first matrix to be operated on and the second matrix to be operated on, where the target matrix dimension may be determined from the matrix dimensions of the two matrices. For example, assuming that the first matrix to be operated on is a 2 x 2 matrix with dimension 2 and the second matrix to be operated on is a 2 x 2 matrix with dimension 2, the target matrix dimension is 2.
Step S32, judging whether each accumulated number is consistent with the dimension of the target matrix;
and step S33, if the two are consistent, the result matrix is used as the operation result of the target matrix.
In this embodiment, specifically, it is determined whether each accumulation count is consistent with the target matrix dimension. If so, every result matrix element has been calculated, and after a matrix operation end identifier is detected, the result matrix is taken as the target matrix operation result. If not, the calculation process continues, and matrix element products continue to be accumulated into the result matrix elements of the result matrix until every accumulation count is consistent with the target matrix dimension. The matrix operation end identifier is a tag identifying the matrix element result and may be stored in an idle bit of the data. A small sketch of this termination check follows.
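The following small Python sketch is an illustration only; the counter and flag names are assumptions, since the embodiment describes the check but not a concrete interface. It expresses the termination condition: every per-element addition count equals the target matrix dimension and the end identifier has been seen.

```python
# Termination check sketch for the pipelined accumulation described above.
# add_counters maps each result matrix element position to the number of
# additions performed for it so far; target_dim is the target matrix dimension.

def result_ready(add_counters, target_dim, end_flag_seen):
    """Return True once every result element has been accumulated target_dim
    times and the matrix operation end identifier has been detected."""
    all_done = all(count == target_dim for count in add_counters.values())
    return all_done and end_flag_seen

# For a 2x2 by 2x2 product the target matrix dimension is 2, so each of the
# four result elements must have been accumulated exactly twice.
counters = {(0, 0): 2, (0, 1): 2, (1, 0): 2, (1, 1): 2}
assert result_ready(counters, target_dim=2, end_flag_seen=True)
```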
Additionally, in an implementation manner, as shown in fig. 2, an interaction process between a CPU end and a heterogeneous processor is schematically illustrated, where "CPU" is the CPU end, "task management" is a task management module, and is used to execute step S30, "distribution unit" is used to extract matrix elements from the matrix element sequence data and distribute the matrix elements to a preset calculation engine array, "calculation units 1" to "calculation unit n" are the multiplication calculation engine array, "merging unit" is used to serialize and output matrix element products output by each multiplication calculation engine, and "adding unit" is an addition calculation engine array.
Compared with the technical means of performing matrix multiplication on a CPU (Central Processing Unit) adopted in the prior art, this embodiment of the present application first acquires matrix element sequence data and the target matrix dimension information corresponding to the matrix element sequence data, extracts matrix elements from the matrix element sequence data, and distributes the matrix elements to a preset calculation engine array. Because the matrix elements of the matrices to be operated on are converted into matrix element sequence data before the calculation, the matrices do not need to be partitioned before the calculation, which reduces the time spent on copying large amounts of memory. The preset calculation engine array performs element multiplication on the matrix elements in parallel to obtain matrix element products and performs element addition on those products to obtain the result matrix; that is, the multiplication between two matrices is decomposed into multiplications and additions between matrix elements, so that multiple pairs of matrix elements can be multiplied in parallel by the preset calculation engine array, which reduces the time consumed by the data operation. Whether the result matrix is taken as the target matrix operation result is then controlled by judging whether the counted accumulation count information matches the target matrix dimension information, which guarantees the accuracy of the matrix operation result during the pipelined operation and thus realizes a pipelined matrix operation process with extremely low time consumption. This overcomes the problems that, when the homomorphically encrypted matrices are large, the CPU can only read the data one by one for the data operation, so the data operation and hence the matrix operation take a long time, and that a large amount of memory copying is required when partitioning the matrices, which further increases the time consumed by the matrix operation and results in the technical defect of low matrix operation efficiency. The computing efficiency of the matrix operation is therefore improved.
Further, referring to fig. 3, in another embodiment of the present application, based on the first embodiment of the present application, the matrix elements comprise dense state matrix elements,
the step of obtaining a result matrix by performing element multiplication and element addition on each matrix element through the preset calculation engine array comprises:
step B10, converting each dense state matrix element to a Montgomery domain, and performing modular exponentiation operation and modular multiplication operation through the preset calculation engine array to obtain a Montgomery domain result matrix;
in this embodiment, it should be noted that the dense matrix elements are matrix elements in a homomorphic encryption matrix, the first matrix to be operated is a first dense matrix encrypted homomorphically, the second matrix to be operated is a second dense matrix encrypted homomorphically, and the dense matrix elements are matrix elements of the first dense matrix and the second dense matrix.
In addition, it should be noted that, by converting a calculation in the real number domain into a calculation in the Montgomery domain, a calculation on a large data bit width can be converted into calculations on multiple small data bit widths, which reduces the complexity of the data operation. For the same modular multiplication, the time consumed in the Montgomery domain is less than the time consumed in the real number domain, thereby reducing the time consumed by the data operation and further improving the efficiency of the data operation. A generic sketch of Montgomery-form arithmetic is given below.
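For concreteness, the following Python sketch shows textbook Montgomery-form multiplication: entering the Montgomery domain, Montgomery reduction (REDC), and de-Montgomerizing the result. It is a generic illustration only, not the implementation of this application; the radix R, the helper names, and the toy modulus are assumptions (Python 3.8+ is assumed for pow(x, -1, N)).

```python
# Generic Montgomery-form multiplication sketch (textbook REDC), illustrating
# the "convert, compute, de-Montgomerize" flow described above.

def montgomery_params(N, R):
    """For odd modulus N and radix R = 2**k > N, return N' such that
    R*R_inv - N*N_prime == 1 (so REDC below divides exactly by R)."""
    R_inv = pow(R, -1, N)                      # modular inverse, Python 3.8+
    N_prime = (R * R_inv - 1) // N
    return N_prime

def to_mont(x, N, R):
    """Enter the Montgomery domain: x -> x*R mod N."""
    return (x * R) % N

def redc(T, N, R, N_prime):
    """Montgomery reduction: return T * R^-1 mod N without dividing by N."""
    m = (T * N_prime) % R
    t = (T + m * N) // R
    return t - N if t >= N else t

def mont_mul(a_bar, b_bar, N, R, N_prime):
    """Multiply two Montgomery-form operands; the result stays in the domain."""
    return redc(a_bar * b_bar, N, R, N_prime)

# Toy usage: compute 7 * 11 mod 97 entirely in the Montgomery domain.
N, R = 97, 1 << 8
N_prime = montgomery_params(N, R)
a_bar, b_bar = to_mont(7, N, R), to_mont(11, N, R)
c_bar = mont_mul(a_bar, b_bar, N, R, N_prime)
assert redc(c_bar, N, R, N_prime) == (7 * 11) % N   # de-Montgomerize and check
```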
Each dense state matrix element is converted into the Montgomery domain, and modular exponentiation and modular multiplication are performed through the preset calculation engine array to obtain a Montgomery domain result matrix. Specifically, Montgomery parameters are generated, where the Montgomery parameters are used for converting data from the real number domain into the Montgomery domain; the method of generating the Montgomery parameters is known and is not described here. Based on the Montgomery parameters, the element multiplication between the dense state matrix elements is converted into modular exponentiation on the modular exponentiation calculation engine array of the preset calculation engine array and is executed in parallel, so as to obtain the modular exponentiation result corresponding to each dense state matrix element, where the modular exponentiation result is the multiplication result, in the Montgomery domain, of the dense state matrix elements corresponding to each dense state matrix element. The modular exponentiation result is then de-Montgomerized, that is, converted from the Montgomery domain back to the real number domain, to obtain the homomorphically encrypted products of the dense state matrix elements, which are output in parallel, thereby obtaining each dense state matrix element product. Then the addition operation of accumulating each dense state matrix element product into the corresponding result matrix element is converted into a modular multiplication in the Montgomery domain to obtain a modular multiplication result, namely, the Montgomery domain result matrix.
Further, in step B10, the preset calculation engine array comprises a modular exponentiation engine array and a modular multiplication engine array,
the step of converting each dense matrix element into a Montgomery domain, and performing modular exponentiation and modular multiplication operation through the preset calculation engine array to obtain a Montgomery domain result matrix comprises:
step B11, converting each dense state matrix element to Montgomery domain, and performing modular exponentiation operation in parallel through the modular exponentiation engine array to obtain a modular exponentiation operation result;
in this embodiment, it should be noted that the modular exponentiation engine array at least includes a modular exponentiation engine for performing modular exponentiation, and the secret state matrix element is a matrix element in a homomorphic encryption state.
Each dense state matrix element is converted into the Montgomery domain, and modular exponentiation is performed in parallel through the modular exponentiation engine array to obtain a modular exponentiation result. Specifically, each modular exponentiation engine receives a corresponding dense state matrix element group, where the dense state matrix element group includes at least 2 dense state matrix elements. Based on the Montgomery parameters, the multiplication producing the dense state matrix element product of the matrix elements in the homomorphic encryption state within the group is converted into the Montgomery domain, and the modular exponentiation is performed in parallel by each modular exponentiation engine to obtain the modular exponentiation value output by each engine; the modular exponentiation values together constitute the modular exponentiation result. A textbook-style sketch of how such a Montgomery-form exponentiation can be carried out is given below.
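The following Python sketch is a purely illustrative, generic square-and-multiply exponentiation built from Montgomery multiplications; it is not the engine design of this embodiment. The redc helper from the earlier sketch is repeated so the snippet stays self-contained, and the radix R is again an assumed parameter.

```python
# Square-and-multiply modular exponentiation using Montgomery multiplications.

def redc(T, N, R, N_prime):
    """Montgomery reduction: return T * R^-1 mod N."""
    m = (T * N_prime) % R
    t = (T + m * N) // R
    return t - N if t >= N else t

def mont_pow(base, exponent, N, R):
    """Compute base**exponent mod N with all multiplications done in the
    Montgomery domain; the final redc de-Montgomerizes the accumulator."""
    R_inv = pow(R, -1, N)                      # Python 3.8+
    N_prime = (R * R_inv - 1) // N
    x_bar = (base * R) % N                     # base in Montgomery form
    acc_bar = R % N                            # 1 in Montgomery form
    while exponent:
        if exponent & 1:
            acc_bar = redc(acc_bar * x_bar, N, R, N_prime)
        x_bar = redc(x_bar * x_bar, N, R, N_prime)
        exponent >>= 1
    return redc(acc_bar, N, R, N_prime)

assert mont_pow(7, 13, 97, 1 << 8) == pow(7, 13, 97)
```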
Step B12, performing de-Montgomerization on the modular exponentiation result to convert the modular exponentiation result into the real number domain to obtain a dense state matrix element product result;
In this embodiment, the modular exponentiation result is de-Montgomery-transformed to convert it into the real number domain and obtain a dense state matrix element product result. Specifically, each modular exponentiation value in the modular exponentiation result is de-Montgomery-transformed to convert it into the real number domain, so as to obtain the dense state matrix element product corresponding to each modular exponentiation value, and the dense state matrix element products are serialized and output to obtain the dense state matrix element product result, where the conversion formula between the multiplication that computes the dense state matrix element product and the corresponding modular exponentiation is as follows:
[[K1*C1]] = C1^K1 mod N

wherein [[K1*C1]] is the dense state matrix element product, i.e. the product of a plaintext matrix element and a matrix element of the homomorphic encryption matrix, where K1 is a matrix element in the plaintext state and C1 is a matrix element of the homomorphic encryption matrix, i.e. the ciphertext C1; C1^K1 mod N is the modular exponentiation calculation formula carried out in the Montgomery domain, and N is the homomorphic encryption key, i.e., the Montgomery parameter.
And step B13, converting the dense state matrix element product result to a Montgomery domain, and performing modular multiplication operation through the modular multiplication engine array to obtain a Montgomery domain result matrix.
In this embodiment, it should be noted that the dense state matrix element product is a matrix element product in a homomorphic encryption state.
The dense state matrix element product result is converted into the Montgomery domain, and modular multiplication is performed through the modular multiplication engine array to obtain a Montgomery domain result matrix. Specifically, each dense state matrix element product is received through the modular multiplication engine array, and, based on the Montgomery parameters, the addition operation of accumulating each matrix element product in the homomorphic encryption state into the corresponding result matrix element is converted into a modular multiplication in the Montgomery domain, so as to obtain the Montgomery domain result matrix, where the conversion formula between the addition of a matrix element product and a result matrix element in the homomorphic encryption state and the corresponding modular multiplication is as follows:
[[C1 + C2]] = (C1 * C2) mod N

wherein [[C1 + C2]] is the homomorphically encrypted sum of the matrix element product and the result matrix element, where C1 is the dense state matrix element product, C2 is the dense state result matrix element, i.e. an element of the result matrix in the homomorphic encryption state, C1 and C2 are both in the ciphertext state, (C1 * C2) mod N is the modular multiplication formula in the Montgomery domain, and N is the homomorphic encryption key, i.e., the Montgomery parameter.
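Read together, the two conversion formulas can be exercised with the Python sketch below. It is an illustration only: a Paillier-style homomorphic scheme is assumed, the reduction is taken modulo N exactly as the formulas above are written, and plain pow() and % stand in for the Montgomery domain engines of the embodiment.

```python
# Ciphertext-domain counterparts of the two plaintext operations above.
# N plays the role of the homomorphic encryption key / Montgomery parameter.

def scale_ciphertext(C1, K1, N):
    """[[K1 * C1]]: multiply an encrypted matrix element C1 by a plaintext
    factor K1 via modular exponentiation of the ciphertext."""
    return pow(C1, K1, N)

def add_ciphertexts(C1, C2, N):
    """[[C1 + C2]]: add two encrypted values by modular multiplication of
    their ciphertexts, as used when accumulating a dense state matrix element
    product into a dense state result matrix element."""
    return (C1 * C2) % N
```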
And step B20, performing de-Montgomerization on the Montgomery domain result matrix to obtain the result matrix.
In this embodiment, the Montgomery domain result matrix is subjected to a de-Montgomery transformation to obtain the result matrix. Specifically, the Montgomery domain result matrix is de-Montgomery-transformed to convert it into the real number domain and obtain the result matrix, where the result matrix is a matrix in the homomorphic encryption state.
This embodiment provides a method for performing the matrix operation in the Montgomery domain. That is, after the matrix elements are extracted from the matrix element sequence data and distributed to the preset calculation engine array, each dense state matrix element is converted into the Montgomery domain, modular exponentiation and modular multiplication are performed by the preset calculation engine array to obtain a Montgomery domain result matrix, and the Montgomery domain result matrix is then de-Montgomery-transformed to obtain the result matrix. Since a calculation performed in the Montgomery domain converts a large-bit-width calculation in the real number domain into multiple small calculations in the Montgomery domain, the complexity of the data calculation is reduced and the efficiency of the matrix operation is improved. Since the multiplication between matrices is split into modular exponentiations and modular multiplications in the Montgomery domain, the matrix operation can be performed in parallel by the preset calculation engine array, reducing the time consumed by the data operation. Whether the result matrix is taken as the target matrix operation result is further controlled by judging whether the counted accumulation count information matches the target matrix dimension information, which ensures the accuracy of the matrix operation result during the pipelined operation and thus realizes a pipelined matrix operation process with extremely low time consumption. This lays a foundation for overcoming the defects that, when the homomorphically encrypted matrices are large, the CPU can only read the data one by one for the data operation, so the data operation and hence the matrix operation take a long time, and that a large amount of memory copying is required when partitioning the matrices, which further increases the time consumed by the matrix operation, resulting in the technical defect of low matrix operation efficiency.
Referring to fig. 4, fig. 4 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present application.
As shown in fig. 4, the matrix operation optimizing device may include: a processor 1001, such as a heterogeneous processor, a memory 1005, and a communication bus 1002. The communication bus 1002 is used for realizing connection communication between the processor 1001 and the memory 1005. The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a memory device separate from the processor 1001 described above.
Optionally, the matrix operation optimization device may further include a rectangular user interface, a network interface, a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, a WiFi module, and the like. The rectangular user interface may comprise a Display screen (Display), an input sub-module such as a Keyboard (Keyboard), and the optional rectangular user interface may also comprise a standard wired interface, a wireless interface. The network interface may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface).
It will be appreciated by those skilled in the art that the matrix operation optimization device configuration shown in fig. 4 does not constitute a limitation of the matrix operation optimization device, and may include more or less components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 4, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, and a matrix operation optimization program. The operating system is a program for managing and controlling hardware and software resources of the matrix operation optimization device, and supports the operation of the matrix operation optimization program and other software and/or programs. The network communication module is used for realizing communication among the components in the memory 1005 and communication with other hardware and software in the matrix operation optimization system.
In the matrix operation optimization apparatus shown in fig. 4, the processor 1001 is configured to execute a matrix operation optimization program stored in the memory 1005, and implement the steps of the matrix operation optimization method according to any one of the above.
The specific implementation of the matrix operation optimization device of the present application is substantially the same as that of each embodiment of the matrix operation optimization method, and is not described herein again.
The embodiment of the present application further provides a matrix operation optimization device, where the matrix operation optimization device is applied to a matrix operation optimization device, and the matrix operation optimization device includes:
the distribution module is used for acquiring matrix element sequence data and target matrix dimension information corresponding to the matrix element sequence data, extracting matrix elements from the matrix element sequence data and distributing the matrix elements to a preset calculation engine array;
the calculation module is used for performing element multiplication and element addition on each matrix element through the preset calculation engine array to obtain a result matrix;
and the output control module is used for taking the result matrix as a target matrix operation result if the accumulation count information corresponding to the result matrix matches the dimension information of the target matrix.
Optionally, the distribution module is further configured to:
acquiring a first matrix to be operated and a second matrix to be operated;
and alternately arranging each first matrix element in the first matrix to be operated and each second matrix element in the second matrix to be operated to obtain the matrix element sequence data.
Optionally, the output control module is further configured to:
reading the value of a preset addition count counter in real time to obtain the accumulation count corresponding to each result matrix element;
judging whether each accumulation count is consistent with the target matrix dimension;
and if the two are consistent, taking the result matrix as the operation result of the target matrix.
Optionally, the computing module is further configured to:
based on the multiplication engine array, carrying out element multiplication on each matrix element in parallel to obtain a matrix element product result;
serializing and outputting the matrix element product result to obtain a serialized output result;
and respectively accumulating each element in the serialized output result to a corresponding result matrix element based on the addition calculation engine array to obtain the result matrix.
Optionally, the distribution module is further configured to:
determining the position information of the sequence element to be calculated corresponding to each multiplication engine;
and based on the position information of each sequence element to be calculated, selecting a matrix element from the first matrix element data and the second matrix element data respectively and distributing the matrix element to each multiplication engine.
Optionally, the computing module is further configured to:
converting each dense matrix element into a Montgomery domain, and performing modular exponentiation operation and modular multiplication operation through the preset calculation engine array to obtain a Montgomery domain result matrix;
and carrying out Montgomerization removal on the Montgomery domain result matrix to obtain the result matrix.
Optionally, the computing module is further configured to:
converting each dense-state matrix element into a Montgomery domain, and performing modular exponentiation operation in parallel through the modular exponentiation engine array to obtain a modular exponentiation operation result;
performing de-Montgomerization on the modular exponentiation operation result to convert the modular exponentiation operation result into a real number domain to obtain a dense-state matrix element product result;
and converting the dense matrix element product result into a Montgomery domain, and performing modular multiplication operation through the modular multiplication engine array to obtain a Montgomery domain result matrix.
The specific implementation of the matrix operation optimization device of the present application is substantially the same as that of each embodiment of the matrix operation optimization method, and is not described herein again.
The embodiment of the present application provides a readable storage medium, and the readable storage medium stores one or more programs, and the one or more programs are further executable by one or more processors for implementing the steps of the matrix operation optimization method described in any one of the above.
The specific implementation of the readable storage medium of the present application is substantially the same as that of each embodiment of the above matrix operation optimization method, and is not described herein again.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings, or which are directly or indirectly applied to other related technical fields, are included in the scope of the present application.

Claims (10)

1. A matrix operation optimization method is characterized by comprising the following steps:
acquiring matrix element sequence data and target matrix dimension information corresponding to the matrix element sequence data, extracting matrix elements from the matrix element sequence data, and distributing the matrix elements to a preset calculation engine array;
performing element multiplication and element addition on each matrix element through the preset calculation engine array to obtain a result matrix;
and if the accumulation count information corresponding to the result matrix matches the dimension information of the target matrix, taking the result matrix as the operation result of the target matrix.
2. The method for optimizing matrix operations according to claim 1, wherein the step of obtaining the sequence data of the matrix elements comprises:
acquiring a first matrix to be operated and a second matrix to be operated;
and alternately arranging each first matrix element in the first matrix to be operated and each second matrix element in the second matrix to be operated to obtain the matrix element sequence data.
3. The method for optimizing matrix operations according to claim 1, wherein the target matrix dimension information includes a target matrix dimension, the result matrix includes at least one result matrix element, and the accumulation count information includes at least one accumulation count corresponding to a result matrix element,
if the accumulation count information corresponding to the result matrix matches the dimension information of the target matrix, the step of taking the result matrix as the operation result of the target matrix comprises the following steps:
reading the value of a preset addition count counter in real time to obtain the accumulation count corresponding to each result matrix element;
judging whether each accumulation count is consistent with the target matrix dimension;
and if the two are consistent, taking the result matrix as the operation result of the target matrix.
4. The method for optimizing matrix operations according to claim 1, wherein the predetermined calculation engine arrays include a multiplication calculation engine array and an addition calculation engine array,
the step of obtaining a result matrix by performing element multiplication and element addition on each matrix element through the preset calculation engine array comprises:
based on the multiplication engine array, carrying out element multiplication on each matrix element in parallel to obtain a matrix element product result;
serializing and outputting the matrix element product result to obtain a serialized output result;
and respectively accumulating each element in the serialized output result to a corresponding result matrix element based on the addition calculation engine array to obtain the result matrix.
5. The method for optimizing matrix operations according to claim 1, wherein the predetermined calculation engine array comprises at least one multiplication engine, the matrix element sequence data comprises a first matrix element data and a second matrix element data,
the step of extracting matrix elements from the matrix element sequence data and distributing the matrix elements to a preset calculation engine array comprises the following steps:
determining the position information of the sequence element to be calculated corresponding to each multiplication engine;
and based on the position information of each sequence element to be calculated, selecting a matrix element from the first matrix element data and the second matrix element data respectively and distributing the matrix element to each multiplication engine.
6. The method for optimizing matrix operations of claim 1, wherein the matrix elements comprise dense state matrix elements,
the step of obtaining a result matrix by performing element multiplication and element addition on each matrix element through the preset calculation engine array comprises:
converting each dense matrix element into a Montgomery domain, and performing modular exponentiation operation and modular multiplication operation through the preset calculation engine array to obtain a Montgomery domain result matrix;
and carrying out Montgomerization removal on the Montgomery domain result matrix to obtain the result matrix.
7. The method for optimizing matrix operations according to claim 6, wherein the predetermined calculation engine arrays include an array of modular exponentiation engines and an array of modular multiplication engines,
the step of converting each dense matrix element into a Montgomery domain, and performing modular exponentiation and modular multiplication operation through the preset calculation engine array to obtain a Montgomery domain result matrix comprises:
converting each dense-state matrix element into a Montgomery domain, and performing modular exponentiation operation in parallel through the modular exponentiation engine array to obtain a modular exponentiation operation result;
performing de-Montgomerization on the modular exponentiation operation result to convert the modular exponentiation operation result into a real number domain to obtain a dense-state matrix element product result;
and converting the dense matrix element product result into a Montgomery domain, and performing modular multiplication operation through the modular multiplication engine array to obtain a Montgomery domain result matrix.
8. A matrix operation optimization device, characterized in that the matrix operation optimization device comprises:
the distribution module is used for acquiring matrix element sequence data and target matrix dimension information corresponding to the matrix element sequence data, extracting matrix elements from the matrix element sequence data and distributing the matrix elements to a preset calculation engine array;
the calculation module is used for performing element multiplication and element addition on each matrix element through the preset calculation engine array to obtain a result matrix;
and the output control module is used for taking the result matrix as a target matrix operation result if the accumulation frequency information corresponding to the result matrix matches the target matrix dimension information.
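As a rough illustration of the output control module, the check below treats the accumulation frequency information as a per-element counter and compares it against the shared dimension implied by the target matrix dimension information; the counter layout and all names are assumptions made for this sketch.

```python
def ready_to_output(accumulation_counts, target_dims):
    # accumulation_counts[i][j]: partial products accumulated so far into result element (i, j)
    m, k, n = target_dims            # result is m x n; each element needs k accumulations
    return all(accumulation_counts[i][j] == k
               for i in range(m) for j in range(n))
```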
9. A matrix operation optimization equipment, characterized in that the matrix operation optimization equipment comprises: a memory, a processor, and a program that is stored on the memory and implements the matrix operation optimization method, wherein
the memory is used for storing the program for implementing the matrix operation optimization method;
the processor is configured to execute the program for implementing the matrix operation optimization method, so as to implement the steps of the matrix operation optimization method according to any one of claims 1 to 7.
10. A readable storage medium having stored thereon a program for implementing a matrix operation optimization method, the program being executed by a processor to implement the steps of the matrix operation optimization method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011357215.0A CN112328962B (en) 2020-11-27 2020-11-27 Matrix operation optimization method, device and equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN112328962A (en) 2021-02-05
CN112328962B (en) 2021-12-31

Family

ID=74308657

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011357215.0A Active CN112328962B (en) 2020-11-27 2020-11-27 Matrix operation optimization method, device and equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN112328962B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101080874B (en) * 2004-12-15 2012-11-14 日本电气株式会社 Error correction encoding apparatus and error correction encoding method used therein
CN101169743A (en) * 2007-11-27 2008-04-30 南京大学 Method for implementing parallel power flow calculation based on multi-core computer in electric grid
US8626815B1 (en) * 2008-07-14 2014-01-07 Altera Corporation Configuring a programmable integrated circuit device to perform matrix multiplication
CN109445752B (en) * 2018-10-10 2019-10-15 西安交通大学 A kind of system of parallel computation
US10838851B2 (en) * 2019-02-28 2020-11-17 International Business Machines Corporation Multi-dimensional accesses in memory

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1525307A (en) * 2003-02-26 2004-09-01 上海华园微电子技术有限公司 Modulus multiply operation circuit and encrypt method of applying said modulus multiply operation circuit
CN1702613A (en) * 2004-03-02 2005-11-30 三星电子株式会社 Montgomery modular multiplier
CN101136882A (en) * 2006-10-25 2008-03-05 中兴通讯股份有限公司 Wireless communication baseband processed system matrix computing method and device
CN104462023A (en) * 2014-12-31 2015-03-25 合一网络技术(北京)有限公司 Super-large scale sparse matrix multiplication method based on mapreduce frame
CN108369666A (en) * 2015-11-26 2018-08-03 福满代谢组技术有限公司 Data analysis set-up, method and program
CN111316261A (en) * 2017-11-01 2020-06-19 苹果公司 Matrix calculation engine
CN109885406B (en) * 2019-02-27 2020-01-24 上海燧原智能科技有限公司 Operator calculation optimization method, device, equipment and storage medium
CN110188869A (en) * 2019-05-05 2019-08-30 北京中科汇成科技有限公司 A kind of integrated circuit based on convolutional neural networks algorithm accelerates the method and system of calculating
CN110377876A (en) * 2019-07-19 2019-10-25 广东省新一代通信与网络创新研究院 Matrix multiplication operation method, apparatus and computer readable storage medium
CN111339490A (en) * 2020-02-18 2020-06-26 三星(中国)半导体有限公司 Matrix multiplication computing method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZENG YONGTAO: "Design and Implementation of Key Algorithms for MIMO-OFDM Systems Based on YHFT-Matrix", China Master's Theses Full-text Database, Information Science and Technology Series (Monthly) *
LIANG YANG: "Research on Large-Scale Matrix Algorithms Based on Spark", China Master's Theses Full-text Database, Information Science and Technology Series (Monthly) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113204502A (en) * 2021-04-20 2021-08-03 深圳致星科技有限公司 Heterogeneous accelerated computing optimization method, device and equipment and readable storage medium
CN113296733A (en) * 2021-04-25 2021-08-24 阿里巴巴新加坡控股有限公司 Data processing method and device
WO2022228222A1 (en) * 2021-04-25 2022-11-03 阿里巴巴(中国)有限公司 Data processing method and apparatus
CN113485798A (en) * 2021-06-16 2021-10-08 曙光信息产业(北京)有限公司 Kernel function generation method, apparatus, device and storage medium
CN113485798B (en) * 2021-06-16 2023-10-31 曙光信息产业(北京)有限公司 Nuclear function generation method, device, equipment and storage medium
CN116107636A (en) * 2023-04-06 2023-05-12 之江实验室 Hardware acceleration method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN112328962B (en) 2021-12-31

Similar Documents

Publication Publication Date Title
CN112328962B (en) Matrix operation optimization method, device and equipment and readable storage medium
US11698773B2 (en) Accelerated mathematical engine
Roy et al. FPGA-based high-performance parallel architecture for homomorphic computing on encrypted data
US11451370B2 (en) Secure probabilistic analytics using an encrypted analytics matrix
CN109155763B (en) Digital signal processing on data stream
EP3330880B1 (en) Secure computation system, secure computation device, secure computation method, and program
Massolino et al. A compact and scalable hardware/software co-design of SIKE
Chen et al. Resource-efficient FPGA architecture and implementation of Hough transform
CN113177225B (en) Block chain-based data storage certification method, device, equipment and storage medium
Çavuşoğlu et al. A novel parallel image encryption algorithm based on chaos
CN113627085A (en) Method, apparatus, medium, and program product for optimizing horizontal federated learning modeling
Fang et al. Secure function evaluation using an fpga overlay architecture
KR20220118560A (en) Resource management and control method and apparatus, device and storage medium
Longa et al. The cost to break SIKE: A comparative hardware-based analysis with AES and SHA-3
Chaharlang et al. A novel quantum audio steganography–steganalysis approach using LSFQ-based embedding and QKNN-based classifier
CN112286752A (en) Algorithm verification method and system for federated learning heterogeneous processing system
Wang et al. HE-Booster: an efficient polynomial arithmetic acceleration on GPUs for fully homomorphic encryption
US10454680B2 (en) RSA decryption processor and method for controlling RSA decryption processor
Liao et al. Efficient privacy-preserving outsourcing of large-scale convex separable programming for smart cities
Henry et al. Solving discrete logarithms in smooth-order groups with CUDA
CN116488788A (en) Hardware accelerator of full homomorphic encryption algorithm, homomorphic encryption method and electronic equipment
CN116306030A (en) New energy prediction dynamic scene generation method considering prediction error and fluctuation distribution
CN112149834A (en) Model training method, device, equipment and medium
CN114036581A (en) Privacy calculation method based on neural network model
Brumley et al. Batch binary weierstrass

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant