CN112328962A - Matrix operation optimization method, device and equipment and readable storage medium - Google Patents

Matrix operation optimization method, device and equipment and readable storage medium Download PDF

Info

Publication number
CN112328962A
Authority
CN
China
Prior art keywords
matrix
result
multiplication
matrix element
calculation engine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011357215.0A
Other languages
Chinese (zh)
Other versions
CN112328962B (en)
Inventor
董扬辉
王玮
胡水海
陈天健
黄启军
黄铭毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Shenzhen Zhixing Technology Co Ltd
Original Assignee
WeBank Co Ltd
Shenzhen Zhixing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd, Shenzhen Zhixing Technology Co Ltd filed Critical WeBank Co Ltd
Priority to CN202011357215.0A priority Critical patent/CN112328962B/en
Publication of CN112328962A publication Critical patent/CN112328962A/en
Application granted granted Critical
Publication of CN112328962B publication Critical patent/CN112328962B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Abstract

The application discloses a matrix operation optimization method, a device, equipment and a readable storage medium, wherein the matrix operation optimization method comprises the following steps: acquiring matrix element sequence data and target matrix dimension information corresponding to the matrix element sequence data, extracting matrix elements from the matrix element sequence data, distributing the matrix elements to a preset calculation engine array, performing element multiplication and element addition on each matrix element through the preset calculation engine array to obtain a result matrix, and taking the result matrix as a target matrix operation result if the accumulation frequency information corresponding to the result matrix is matched with the target matrix dimension information. The method and the device solve the technical problem of low calculation efficiency in matrix operation.

Description

Matrix operation optimization method, device and equipment and readable storage medium
Technical Field
The present application relates to the field of big data technology of privacy computing, and in particular, to a matrix operation optimization method, apparatus, device, and readable storage medium.
Background
With the continuous development of financial technology, especially internet technology finance, more and more technologies (such as distributed computing, blockchain, artificial intelligence and the like) are applied to the financial field. The financial industry, however, also places higher requirements on these technologies, such as higher requirements on the distribution of the financial industry's backlog.
With the continuous development of computer software and artificial intelligence, the requirements on the computing performance of computers are growing. At present, federated learning scenarios generally involve multiplication between homomorphically encrypted matrices, and the computational complexity of such operations is high. When matrix multiplication is performed on a CPU, the matrices usually have to be partitioned first to obtain a matrix partitioning result, and the data in the partitioning result is then read for the matrix operation. However, when the homomorphically encrypted matrices are large, the CPU can only read the data one by one to perform the data operation, so the data operation takes a long time and, in turn, the matrix operation takes a long time. In addition, a large amount of memory copying is required during matrix partitioning, which further increases the time consumed by the matrix operation, so the computing efficiency of the matrix operation is low.
Disclosure of Invention
The present application mainly aims to provide a matrix operation optimization method, device, apparatus, and readable storage medium, and aims to solve the technical problem of low computation efficiency in matrix operation in the prior art.
In order to achieve the above object, the present application provides a matrix operation optimization method, which is applied to a matrix operation optimization device, and includes:
acquiring matrix element sequence data and target matrix dimension information corresponding to the matrix element sequence data, extracting matrix elements from the matrix element sequence data, and distributing the matrix elements to a preset calculation engine array;
performing element multiplication and element addition on each matrix element through the preset calculation engine array to obtain a result matrix;
and if the accumulation count information corresponding to the result matrix matches the dimension information of the target matrix, taking the result matrix as the target matrix operation result.
The present application also provides a matrix operation optimization apparatus. The matrix operation optimization apparatus is a virtual apparatus and is applied to matrix operation optimization equipment. The matrix operation optimization apparatus includes:
the distribution module is used for acquiring matrix element sequence data and target matrix dimension information corresponding to the matrix element sequence data, extracting matrix elements from the matrix element sequence data and distributing the matrix elements to a preset calculation engine array;
the calculation module is used for performing element multiplication and element addition on each matrix element through the preset calculation engine array to obtain a result matrix;
and the output control module is used for taking the result matrix as a target matrix operation result if the accumulation count information corresponding to the result matrix matches the dimension information of the target matrix.
The present application further provides a matrix operation optimization device, the matrix operation optimization device is an entity device, the matrix operation optimization device includes: a memory, a processor and a program of the matrix operation optimization method stored on the memory and executable on the processor, the program of the matrix operation optimization method being executable by the processor to implement the steps of the matrix operation optimization method as described above.
The present application also provides a readable storage medium having stored thereon a program for implementing a matrix operation optimization method, the program implementing the steps of the matrix operation optimization method as described above when executed by a processor.
Compared with the technical means of performing matrix multiplication on a CPU (Central Processing Unit) adopted in the prior art, the matrix operation optimization method, apparatus, device and readable storage medium of the present application first acquire matrix element sequence data and the target matrix dimension information corresponding to the matrix element sequence data, extract matrix elements from the matrix element sequence data, and distribute the matrix elements to a preset calculation engine array. Because the matrix elements of the matrices to be operated on are converted into matrix element sequence data by ordering them before the calculation, the matrices do not need to be partitioned before the calculation, which reduces the time spent on copying large amounts of memory. The preset calculation engine array then performs element multiplication on the matrix elements in parallel to obtain matrix element products and performs element addition on those products, so that the result matrix can be obtained. In other words, the multiplication between two matrices is decomposed into multiplications and additions between matrix elements, so that multiple pairs of matrix elements can be multiplied in parallel by the preset calculation engine array, which reduces the time consumed by the data operation. Whether the result matrix is taken as the target matrix operation result is then controlled by judging whether the counted accumulation count information matches the target matrix dimension information, which guarantees the accuracy of the matrix operation result during the pipelined operation and thus realizes a pipelined matrix operation process with extremely low time consumption. This overcomes the problems that, when the homomorphically encrypted matrices are large, the CPU can only read the data one by one for the data operation, so the data operation and hence the matrix operation take a long time, and that a large amount of memory copying is required when partitioning the matrices, which further increases the time consumed by the matrix operation and results in the technical defect of low matrix operation efficiency. The computing efficiency of the matrix operation is therefore improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below; obviously, those skilled in the art can obtain other drawings from these drawings without inventive effort.
FIG. 1 is a schematic flow chart diagram illustrating a first embodiment of a matrix operation optimization method according to the present application;
FIG. 2 is a schematic diagram illustrating an interaction process between a CPU and a heterogeneous processor in an embodiment of the matrix operation optimization method of the present application;
FIG. 3 is a schematic flow chart diagram illustrating a second embodiment of the matrix operation optimization method of the present application;
fig. 4 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present application.
The objectives, features, and advantages of the present application will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In a first embodiment of the matrix operation optimization method of the present application, referring to fig. 1, the matrix operation optimization method includes:
step S10, acquiring matrix element sequence data and target matrix dimension information corresponding to the matrix element sequence data, extracting matrix elements from the matrix element sequence data and distributing the matrix elements to a preset calculation engine array;
In this embodiment, the matrix operation optimization method is applied to a heterogeneous processor, such as an FPGA (Field Programmable Gate Array) or an ASIC, and can be used to perform multiplication between matrices. The target matrix dimension information is the matrix dimension of the multiplication result matrix generated by the matrix multiplication, and the matrix element sequence data is sequence data formed by alternately arranging the matrix elements of the two matrices being multiplied. For example, assuming that a matrix A is (K11, K12, K21, K22) and a matrix B is (C11, C12, C21, C22), the matrix element sequence data is (K11, C11, K12, C12, K21, C21, K22, C22).
Matrix element sequence data and the target matrix dimension information corresponding to the matrix element sequence data are acquired, matrix elements are extracted from the matrix element sequence data, and the matrix elements are distributed to a preset calculation engine array. Specifically, a first matrix to be operated on and a second matrix to be operated on are acquired, and the matrix elements of the first matrix to be operated on and the matrix elements of the second matrix to be operated on are alternately arranged to obtain the matrix element sequence data, where a matrix element is a basic element constituting a matrix, namely the numerical value at each position in the matrix. Matrix elements are then extracted from the matrix element sequence data step by step according to the arrangement position information of each matrix element in the matrix element sequence data and are distributed to the preset calculation engine array. For example, assume that matrix A is (K11, K12, K21, K22), matrix B is (C11, C12, C21, C22), the matrix element sequence data is (K11, C11, K12, C12, K21, C21, K22, C22), and the preset calculation engine array includes 2 parallel multiplication calculation engines a and b. It can be arranged that at the first distribution K11 and C11 are distributed to a and K12 and C21 are distributed to b, at the second distribution K11 and C12 are distributed to a and K12 and C22 are distributed to b, at the third distribution K21 and C11 are distributed to a and K22 and C21 are distributed to b, and at the fourth distribution K21 and C12 are distributed to a and K22 and C22 are distributed to b, which completes the matrix element distribution process for calculating the product of matrix A and matrix B, as illustrated in the sketch below.
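The following minimal Python sketch is an illustration only: the two-engine array size, the engine names and the round-by-round schedule are taken from the worked example above, not from any fixed specification of this application.

```python
# Illustrative sketch of the interleaving and distribution described above.
# Two 2x2 matrices in row-major order are interleaved into one element
# sequence, and the four distribution rounds for two multiplication engines
# are listed explicitly, matching the worked example in the text.

def interleave(first, second):
    """Alternate the elements of two equally sized row-major matrices."""
    sequence = []
    for x, y in zip(first, second):
        sequence.extend([x, y])
    return sequence

A = ["K11", "K12", "K21", "K22"]   # first matrix to be operated on
B = ["C11", "C12", "C21", "C22"]   # second matrix to be operated on

print(interleave(A, B))
# ['K11', 'C11', 'K12', 'C12', 'K21', 'C21', 'K22', 'C22']

# Distribution schedule for the 2x2 product with two multiplication engines:
# each round hands one (first-matrix, second-matrix) element pair to each engine.
schedule = [
    {"engine_a": ("K11", "C11"), "engine_b": ("K12", "C21")},  # round 1 -> result element (1,1)
    {"engine_a": ("K11", "C12"), "engine_b": ("K12", "C22")},  # round 2 -> result element (1,2)
    {"engine_a": ("K21", "C11"), "engine_b": ("K22", "C21")},  # round 3 -> result element (2,1)
    {"engine_a": ("K21", "C12"), "engine_b": ("K22", "C22")},  # round 4 -> result element (2,2)
]
```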
Further, in step S10, the step of acquiring the matrix element sequence data includes:
step S11, acquiring a first matrix to be operated and a second matrix to be operated;
in this embodiment, a first matrix to be operated and a second matrix to be operated are obtained, and specifically, the first matrix to be operated and the second matrix to be operated are extracted from a CPU memory.
Step S12, alternately arranging each first matrix element in the first matrix to be operated and each second matrix element in the second matrix to be operated to obtain the matrix element sequence data.
In this embodiment, each first matrix element in the first matrix to be operated on and each second matrix element in the second matrix to be operated on are alternately arranged to obtain the matrix element sequence data. Specifically, the first matrix elements and the second matrix elements are alternately arranged by a preset CPU end to obtain the matrix element sequence data, and the matrix element sequence data is sent to the memory of the heterogeneous processor. It should be noted that, since the matrix elements of the first matrix to be operated on and the second matrix to be operated on have already been alternately arranged, the two matrices are sent to the memory of the heterogeneous processor directly in the form of matrix element sequence data, and it is not necessary to read the matrix elements in the matrices one by one. This reduces the number of memory interactions and the interaction time between the CPU end and the heterogeneous processor end, and therefore improves the computing efficiency of the matrix operation.
Further, in step S10, the preset calculation engine array at least includes a multiplication engine, the matrix element sequence data includes a first matrix element data and a second matrix element data,
the step of extracting matrix elements from the matrix element sequence data and distributing the matrix elements to a preset calculation engine array comprises the following steps:
step A10, determining the position information of the sequence element to be calculated corresponding to each multiplication engine;
in this embodiment, it should be noted that the multiplication calculation engine is a calculation engine for calculating the product between matrix elements.
Determining position information of sequence elements to be calculated corresponding to each multiplication engine, specifically determining a matrix element position of a matrix element required by each multiplication engine for the current calculation in an original matrix, and further determining a sequence element position corresponding to the matrix element position corresponding to each multiplication engine in the matrix element sequence data, so as to obtain the position information of the sequence elements to be calculated corresponding to each multiplication engine.
Step a20, based on the position information of each sequence element to be calculated, selecting a matrix element from the first matrix element data and the second matrix element data, and distributing the matrix element to each multiplication engine.
In this embodiment, it should be noted that the sequence element position information to be calculated is a sequence position of an element required by the multiplication engine for the current calculation in the matrix element sequence data.
Based on the position information of each sequence element to be calculated, matrix elements are selected from the first matrix element data and the second matrix element data and distributed to each multiplication calculation engine. Specifically, based on the sequence position, in the matrix element sequence data, of the elements required by each multiplication calculation engine for the current calculation, one matrix element is selected from the first matrix element data and one matrix element is selected from the second matrix element data, and both are distributed to that multiplication calculation engine. A multiplication calculation engine needs one matrix element from the first matrix element data and one matrix element from the second matrix element data when performing a multiplication, where the first matrix element data is the data in the matrix element sequence data corresponding to the first matrix to be operated on, and the second matrix element data is the data in the matrix element sequence data corresponding to the second matrix to be operated on.
Step S20, element multiplication and element addition are carried out on each matrix element through the preset calculation engine array to obtain a result matrix;
in this embodiment, it should be noted that the preset calculation engine array includes a multiplication calculation engine array and an addition calculation engine array, where the multiplication calculation engine array includes at least one multiplication calculation engine, and the addition calculation engine array includes at least one addition calculation engine.
In this embodiment, element multiplication and element addition are performed on the matrix elements through the preset calculation engine array to obtain a result matrix. Specifically, the products of the matrix elements received by the multiplication calculation engines are calculated in parallel to obtain the matrix element product output by each multiplication calculation engine, and each matrix element product is then distributed to an addition calculation engine so as to be accumulated into the corresponding result matrix element, thereby obtaining the result matrix. The result matrix is the matrix generated while calculating the product of the first matrix to be operated on and the second matrix to be operated on; after the calculation is completed, the result matrix is that product. Each result matrix element in the result matrix may be set to correspond to a multiplication calculation engine, so as to ensure that the matrix element products output by the multiplication calculation engines are accurately accumulated into the corresponding result matrix elements. A software mock of this flow is sketched below.
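The following Python mock is an illustration under the assumption of plain integer matrices, not the hardware itself: every required element pair is multiplied, each product carries a tag naming its result matrix element, and the tagged product stream is then accumulated into the result matrix.

```python
# Software mock of the multiply-then-accumulate flow described above.

def engine_array_matmul(first, second, n):
    """first, second: n x n matrices as nested lists; returns their n x n product."""
    # "Multiplication engine array": produce every matrix element product,
    # each tagged with the (row, col) of the result matrix element it feeds.
    # In hardware these products are computed in parallel; here they are
    # simply collected as a serialized stream.
    products = [((i, j), first[i][k] * second[k][j])
                for i in range(n) for j in range(n) for k in range(n)]

    # "Addition engine array": accumulate each streamed product into the
    # result matrix element identified by its tag.
    result = [[0] * n for _ in range(n)]
    for (i, j), p in products:
        result[i][j] += p
    return result

print(engine_array_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], 2))
# [[19, 22], [43, 50]]
```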
Further, in step S20, the preset calculation engine array includes a multiplication calculation engine array and an addition calculation engine array,
the step of obtaining a result matrix by performing element multiplication and element addition on each matrix element through the preset calculation engine array comprises:
step S21, based on the multiplication engine array, carrying out element multiplication on each matrix element in parallel to obtain a matrix element product result;
in this embodiment, it should be noted that the matrix element product result includes a matrix element product output by parallel computation of each of the multiplication engines.
And performing element multiplication on each matrix element in parallel based on the multiplication engine array to obtain a matrix element product result, specifically, performing parallel computation on products between matrix elements received by each multiplication engine in the multiplication engine array to obtain matrix element products output by each multiplication engine.
Step S22, serializing and outputting the matrix element product result to obtain a serialized output result;
in this embodiment, the matrix element product result is serialized and output, so as to obtain a serialized output result, specifically, each matrix element product obtained by parallel computation is serialized and output, so as to obtain each serialized matrix element product, that is, obtain a serialized output result.
Step S23, based on the addition engine array, respectively accumulating each element in the serialized output result into a corresponding result matrix element, so as to obtain the result matrix.
In this embodiment, it should be noted that the addition engine array at least includes an addition engine, and is configured to accumulate matrix element products calculated by the multiplication engine for multiple times until a product between the first matrix to be operated and the second matrix to be operated is obtained.
Based on the addition calculation engine array, each element in the serialized output result is accumulated into the corresponding result matrix element to obtain the result matrix. Specifically, based on the product identifier corresponding to each matrix element product, each matrix element product in the serialized output result is accumulated into the corresponding result matrix element to obtain the result matrix, where the product identifier identifies the correspondence between a matrix element product and its result matrix element and includes a position identifier within the serialized output result and a tag identifier of the multiplication calculation engine. It should be noted that, after the heterogeneous processor starts to calculate the matrix product of the first matrix to be operated on and the second matrix to be operated on, the calculation of the matrix product is performed in a pipelined manner: the multiplication calculation engines continuously compute matrix element products, these products are passed to the addition calculation engines in a pipelined manner, and the addition calculation engines continuously accumulate them into the corresponding result matrix elements until the matrix product has been calculated.
And step S30, if the accumulation count information corresponding to the result matrix matches the target matrix dimension information, taking the result matrix as the target matrix operation result.
In this embodiment, it should be noted that the result matrix at least includes one result matrix element, the accumulation count information at least includes an accumulation count corresponding to a result matrix element, and the target matrix dimension information includes a target matrix dimension.
If the accumulation count information corresponding to the result matrix matches the target matrix dimension information, the result matrix is taken as the target matrix operation result. Specifically, the value of the preset addition count counter corresponding to each result matrix element is read in real time to obtain the accumulation count of each result matrix element, where the accumulation count is the number of times the addition calculation engine has performed an addition for that result matrix element. It is then judged whether each accumulation count is consistent with the target matrix dimension. If so, the result matrix element corresponding to that accumulation count has been fully calculated; after every result matrix element has been fully calculated, and a matrix operation end identifier is detected, the result matrix has been fully calculated and is taken as the target matrix operation result. If not, the calculation process continues, and matrix element products continue to be accumulated into the result matrix elements of the result matrix until the accumulation count corresponding to each result matrix element is consistent with the target matrix dimension.
Further, in step S30, the target matrix dimension information includes a target matrix dimension, the result matrix includes at least one result matrix element, and the accumulation count information includes at least one accumulation count corresponding to a result matrix element,
if the accumulation count information corresponding to the result matrix matches the target matrix dimension information, the step of taking the result matrix as the target matrix operation result includes the following steps:
step S31, reading the value of a preset addition count counter in real time to obtain the accumulation count corresponding to each result matrix element;
In this embodiment, it should be noted that an addition count counter is provided in the heterogeneous processor. When the heterogeneous processor starts to calculate the matrix product of the first matrix to be operated on and the second matrix to be operated on, the addition count counter is started; its value is initially 0, and each time an addition calculation engine performs an addition, the value of the counter is increased by 1. The accumulation count information includes the accumulation count corresponding to every result matrix element, and the target matrix dimension information is the target matrix dimension of the product matrix of the first matrix to be operated on and the second matrix to be operated on, where the target matrix dimension may be determined from the matrix dimensions of the two matrices. For example, assuming that the first matrix to be operated on is a 2 x 2 matrix with dimension 2 and the second matrix to be operated on is a 2 x 2 matrix with dimension 2, the target matrix dimension is 2.
Step S32, judging whether each accumulated number is consistent with the dimension of the target matrix;
and step S33, if the two are consistent, the result matrix is used as the operation result of the target matrix.
In this embodiment, specifically, it is determined whether each accumulation count is consistent with the target matrix dimension. If so, every result matrix element has been calculated, and after a matrix operation end identifier is detected, the result matrix is taken as the target matrix operation result. If not, the calculation process continues, and matrix element products continue to be accumulated into the result matrix elements of the result matrix until every accumulation count is consistent with the target matrix dimension. The matrix operation end identifier is a tag identifying the matrix element result and may be stored in an idle bit of the data. A small sketch of this termination check follows.
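The following small Python sketch is an illustration only; the counter and flag names are assumptions, since the embodiment describes the check but not a concrete interface. It expresses the termination condition: every per-element addition count equals the target matrix dimension and the end identifier has been seen.

```python
# Termination check sketch for the pipelined accumulation described above.
# add_counters maps each result matrix element position to the number of
# additions performed for it so far; target_dim is the target matrix dimension.

def result_ready(add_counters, target_dim, end_flag_seen):
    """Return True once every result element has been accumulated target_dim
    times and the matrix operation end identifier has been detected."""
    all_done = all(count == target_dim for count in add_counters.values())
    return all_done and end_flag_seen

# For a 2x2 by 2x2 product the target matrix dimension is 2, so each of the
# four result elements must have been accumulated exactly twice.
counters = {(0, 0): 2, (0, 1): 2, (1, 0): 2, (1, 1): 2}
assert result_ready(counters, target_dim=2, end_flag_seen=True)
```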
Additionally, in an implementation manner, as shown in fig. 2, an interaction process between a CPU end and a heterogeneous processor is schematically illustrated, where "CPU" is the CPU end, "task management" is a task management module, and is used to execute step S30, "distribution unit" is used to extract matrix elements from the matrix element sequence data and distribute the matrix elements to a preset calculation engine array, "calculation units 1" to "calculation unit n" are the multiplication calculation engine array, "merging unit" is used to serialize and output matrix element products output by each multiplication calculation engine, and "adding unit" is an addition calculation engine array.
Compared with the technical means of performing matrix multiplication on a CPU (Central Processing Unit) adopted in the prior art, this embodiment of the present application first acquires matrix element sequence data and the target matrix dimension information corresponding to the matrix element sequence data, extracts matrix elements from the matrix element sequence data, and distributes the matrix elements to a preset calculation engine array. Because the matrix elements of the matrices to be operated on are converted into matrix element sequence data before the calculation, the matrices do not need to be partitioned before the calculation, which reduces the time spent on copying large amounts of memory. The preset calculation engine array performs element multiplication on the matrix elements in parallel to obtain matrix element products and performs element addition on those products to obtain the result matrix; that is, the multiplication between two matrices is decomposed into multiplications and additions between matrix elements, so that multiple pairs of matrix elements can be multiplied in parallel by the preset calculation engine array, which reduces the time consumed by the data operation. Whether the result matrix is taken as the target matrix operation result is then controlled by judging whether the counted accumulation count information matches the target matrix dimension information, which guarantees the accuracy of the matrix operation result during the pipelined operation and thus realizes a pipelined matrix operation process with extremely low time consumption. This overcomes the problems that, when the homomorphically encrypted matrices are large, the CPU can only read the data one by one for the data operation, so the data operation and hence the matrix operation take a long time, and that a large amount of memory copying is required when partitioning the matrices, which further increases the time consumed by the matrix operation and results in the technical defect of low matrix operation efficiency. The computing efficiency of the matrix operation is therefore improved.
Further, referring to fig. 3, in another embodiment of the present application, based on the first embodiment of the present application, the matrix elements comprise dense state matrix elements,
the step of obtaining a result matrix by performing element multiplication and element addition on each matrix element through the preset calculation engine array comprises:
step B10, converting each dense state matrix element to a Montgomery domain, and performing modular exponentiation operation and modular multiplication operation through the preset calculation engine array to obtain a Montgomery domain result matrix;
in this embodiment, it should be noted that the dense matrix elements are matrix elements in a homomorphic encryption matrix, the first matrix to be operated is a first dense matrix encrypted homomorphically, the second matrix to be operated is a second dense matrix encrypted homomorphically, and the dense matrix elements are matrix elements of the first dense matrix and the second dense matrix.
In addition, it should be noted that, by converting a calculation in the real number domain into a calculation in the Montgomery domain, a calculation on a large data bit width can be converted into calculations on multiple small data bit widths, which reduces the complexity of the data operation. For the same modular multiplication, the time consumed in the Montgomery domain is less than the time consumed in the real number domain, thereby reducing the time consumed by the data operation and further improving the efficiency of the data operation. A generic sketch of Montgomery-form arithmetic is given below.
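For concreteness, the following Python sketch shows textbook Montgomery-form multiplication: entering the Montgomery domain, Montgomery reduction (REDC), and de-Montgomerizing the result. It is a generic illustration only, not the implementation of this application; the radix R, the helper names, and the toy modulus are assumptions (Python 3.8+ is assumed for pow(x, -1, N)).

```python
# Generic Montgomery-form multiplication sketch (textbook REDC), illustrating
# the "convert, compute, de-Montgomerize" flow described above.

def montgomery_params(N, R):
    """For odd modulus N and radix R = 2**k > N, return N' such that
    R*R_inv - N*N_prime == 1 (so REDC below divides exactly by R)."""
    R_inv = pow(R, -1, N)                      # modular inverse, Python 3.8+
    N_prime = (R * R_inv - 1) // N
    return N_prime

def to_mont(x, N, R):
    """Enter the Montgomery domain: x -> x*R mod N."""
    return (x * R) % N

def redc(T, N, R, N_prime):
    """Montgomery reduction: return T * R^-1 mod N without dividing by N."""
    m = (T * N_prime) % R
    t = (T + m * N) // R
    return t - N if t >= N else t

def mont_mul(a_bar, b_bar, N, R, N_prime):
    """Multiply two Montgomery-form operands; the result stays in the domain."""
    return redc(a_bar * b_bar, N, R, N_prime)

# Toy usage: compute 7 * 11 mod 97 entirely in the Montgomery domain.
N, R = 97, 1 << 8
N_prime = montgomery_params(N, R)
a_bar, b_bar = to_mont(7, N, R), to_mont(11, N, R)
c_bar = mont_mul(a_bar, b_bar, N, R, N_prime)
assert redc(c_bar, N, R, N_prime) == (7 * 11) % N   # de-Montgomerize and check
```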
Each dense state matrix element is converted into the Montgomery domain, and modular exponentiation and modular multiplication are performed through the preset calculation engine array to obtain a Montgomery domain result matrix. Specifically, Montgomery parameters are generated, where the Montgomery parameters are used for converting data from the real number domain into the Montgomery domain; the method of generating the Montgomery parameters is known and is not described here. Based on the Montgomery parameters, the element multiplication between the dense state matrix elements is converted into modular exponentiation on the modular exponentiation calculation engine array of the preset calculation engine array and is executed in parallel, so as to obtain the modular exponentiation result corresponding to each dense state matrix element, where the modular exponentiation result is the multiplication result, in the Montgomery domain, of the dense state matrix elements corresponding to each dense state matrix element. The modular exponentiation result is then de-Montgomerized, that is, converted from the Montgomery domain back to the real number domain, to obtain the homomorphically encrypted products of the dense state matrix elements, which are output in parallel, thereby obtaining each dense state matrix element product. Then the addition operation of accumulating each dense state matrix element product into the corresponding result matrix element is converted into a modular multiplication in the Montgomery domain to obtain a modular multiplication result, namely, the Montgomery domain result matrix.
Further, in step B10, the preset calculation engine array comprises a modular exponentiation engine array and a modular multiplication engine array,
the step of converting each dense matrix element into a Montgomery domain, and performing modular exponentiation and modular multiplication operation through the preset calculation engine array to obtain a Montgomery domain result matrix comprises:
step B11, converting each dense state matrix element to Montgomery domain, and performing modular exponentiation operation in parallel through the modular exponentiation engine array to obtain a modular exponentiation operation result;
in this embodiment, it should be noted that the modular exponentiation engine array at least includes a modular exponentiation engine for performing modular exponentiation, and the secret state matrix element is a matrix element in a homomorphic encryption state.
Each dense state matrix element is converted into the Montgomery domain, and modular exponentiation is performed in parallel through the modular exponentiation engine array to obtain a modular exponentiation result. Specifically, each modular exponentiation engine receives a corresponding dense state matrix element group, where the dense state matrix element group includes at least 2 dense state matrix elements. Based on the Montgomery parameters, the multiplication producing the dense state matrix element product of the matrix elements in the homomorphic encryption state within the group is converted into the Montgomery domain, and the modular exponentiation is performed in parallel by each modular exponentiation engine to obtain the modular exponentiation value output by each engine; the modular exponentiation values together constitute the modular exponentiation result. A textbook-style sketch of how such a Montgomery-form exponentiation can be carried out is given below.
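The following Python sketch is a purely illustrative, generic square-and-multiply exponentiation built from Montgomery multiplications; it is not the engine design of this embodiment. The redc helper from the earlier sketch is repeated so the snippet stays self-contained, and the radix R is again an assumed parameter.

```python
# Square-and-multiply modular exponentiation using Montgomery multiplications.

def redc(T, N, R, N_prime):
    """Montgomery reduction: return T * R^-1 mod N."""
    m = (T * N_prime) % R
    t = (T + m * N) // R
    return t - N if t >= N else t

def mont_pow(base, exponent, N, R):
    """Compute base**exponent mod N with all multiplications done in the
    Montgomery domain; the final redc de-Montgomerizes the accumulator."""
    R_inv = pow(R, -1, N)                      # Python 3.8+
    N_prime = (R * R_inv - 1) // N
    x_bar = (base * R) % N                     # base in Montgomery form
    acc_bar = R % N                            # 1 in Montgomery form
    while exponent:
        if exponent & 1:
            acc_bar = redc(acc_bar * x_bar, N, R, N_prime)
        x_bar = redc(x_bar * x_bar, N, R, N_prime)
        exponent >>= 1
    return redc(acc_bar, N, R, N_prime)

assert mont_pow(7, 13, 97, 1 << 8) == pow(7, 13, 97)
```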
Step B12, performing de-Montgomerization on the modular exponentiation result to convert the modular exponentiation result into the real number domain to obtain a dense state matrix element product result;
In this embodiment, the modular exponentiation result is de-Montgomery-transformed to convert it into the real number domain and obtain a dense state matrix element product result. Specifically, each modular exponentiation value in the modular exponentiation result is de-Montgomery-transformed to convert it into the real number domain, so as to obtain the dense state matrix element product corresponding to each modular exponentiation value, and the dense state matrix element products are serialized and output to obtain the dense state matrix element product result, where the conversion formula between the multiplication that computes the dense state matrix element product and the corresponding modular exponentiation is as follows:
[[K1*C1]] = C1^K1 mod N

wherein [[K1*C1]] is the dense state matrix element product, i.e. the product of a plaintext matrix element and a matrix element of the homomorphic encryption matrix, where K1 is a matrix element in the plaintext state and C1 is a matrix element of the homomorphic encryption matrix, i.e. the ciphertext C1; C1^K1 mod N is the modular exponentiation calculation formula carried out in the Montgomery domain, and N is the homomorphic encryption key, i.e., the Montgomery parameter.
And step B13, converting the dense state matrix element product result to a Montgomery domain, and performing modular multiplication operation through the modular multiplication engine array to obtain a Montgomery domain result matrix.
In this embodiment, it should be noted that the dense state matrix element product is a matrix element product in a homomorphic encryption state.
The dense state matrix element product result is converted into the Montgomery domain, and modular multiplication is performed through the modular multiplication engine array to obtain a Montgomery domain result matrix. Specifically, each dense state matrix element product is received through the modular multiplication engine array, and, based on the Montgomery parameters, the addition operation of accumulating each matrix element product in the homomorphic encryption state into the corresponding result matrix element is converted into a modular multiplication in the Montgomery domain, so as to obtain the Montgomery domain result matrix, where the conversion formula between the addition of a matrix element product and a result matrix element in the homomorphic encryption state and the corresponding modular multiplication is as follows:
[[C1 + C2]] = (C1 * C2) mod N

wherein [[C1 + C2]] is the homomorphically encrypted sum of the matrix element product and the result matrix element, where C1 is the dense state matrix element product, C2 is the dense state result matrix element, i.e. an element of the result matrix in the homomorphic encryption state, C1 and C2 are both in the ciphertext state, (C1 * C2) mod N is the modular multiplication formula in the Montgomery domain, and N is the homomorphic encryption key, i.e., the Montgomery parameter.
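Read together, the two conversion formulas can be exercised with the Python sketch below. It is an illustration only: a Paillier-style homomorphic scheme is assumed, the reduction is taken modulo N exactly as the formulas above are written, and plain pow() and % stand in for the Montgomery domain engines of the embodiment.

```python
# Ciphertext-domain counterparts of the two plaintext operations above.
# N plays the role of the homomorphic encryption key / Montgomery parameter.

def scale_ciphertext(C1, K1, N):
    """[[K1 * C1]]: multiply an encrypted matrix element C1 by a plaintext
    factor K1 via modular exponentiation of the ciphertext."""
    return pow(C1, K1, N)

def add_ciphertexts(C1, C2, N):
    """[[C1 + C2]]: add two encrypted values by modular multiplication of
    their ciphertexts, as used when accumulating a dense state matrix element
    product into a dense state result matrix element."""
    return (C1 * C2) % N
```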
And step B20, performing de-Montgomerization on the Montgomery domain result matrix to obtain the result matrix.
In this embodiment, the Montgomery domain result matrix is subjected to a de-Montgomery transformation to obtain the result matrix. Specifically, the Montgomery domain result matrix is de-Montgomery-transformed to convert it into the real number domain and obtain the result matrix, where the result matrix is a matrix in the homomorphic encryption state.
This embodiment provides a method for performing the matrix operation in the Montgomery domain. That is, after the matrix elements are extracted from the matrix element sequence data and distributed to the preset calculation engine array, each dense state matrix element is converted into the Montgomery domain, modular exponentiation and modular multiplication are performed by the preset calculation engine array to obtain a Montgomery domain result matrix, and the Montgomery domain result matrix is then de-Montgomery-transformed to obtain the result matrix. Since a calculation performed in the Montgomery domain converts a large-bit-width calculation in the real number domain into multiple small calculations in the Montgomery domain, the complexity of the data calculation is reduced and the efficiency of the matrix operation is improved. Since the multiplication between matrices is split into modular exponentiations and modular multiplications in the Montgomery domain, the matrix operation can be performed in parallel by the preset calculation engine array, reducing the time consumed by the data operation. Whether the result matrix is taken as the target matrix operation result is further controlled by judging whether the counted accumulation count information matches the target matrix dimension information, which ensures the accuracy of the matrix operation result during the pipelined operation and thus realizes a pipelined matrix operation process with extremely low time consumption. This lays a foundation for overcoming the defects that, when the homomorphically encrypted matrices are large, the CPU can only read the data one by one for the data operation, so the data operation and hence the matrix operation take a long time, and that a large amount of memory copying is required when partitioning the matrices, which further increases the time consumed by the matrix operation, resulting in the technical defect of low matrix operation efficiency.
Referring to fig. 4, fig. 4 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present application.
As shown in fig. 4, the matrix operation optimizing device may include: a processor 1001, such as a heterogeneous processor, a memory 1005, and a communication bus 1002. The communication bus 1002 is used for realizing connection communication between the processor 1001 and the memory 1005. The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a memory device separate from the processor 1001 described above.
Optionally, the matrix operation optimization device may further include a rectangular user interface, a network interface, a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, a WiFi module, and the like. The rectangular user interface may comprise a Display screen (Display), an input sub-module such as a Keyboard (Keyboard), and the optional rectangular user interface may also comprise a standard wired interface, a wireless interface. The network interface may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface).
It will be appreciated by those skilled in the art that the matrix operation optimization device configuration shown in fig. 4 does not constitute a limitation of the matrix operation optimization device, and may include more or less components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 4, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, and a matrix operation optimization program. The operating system is a program for managing and controlling hardware and software resources of the matrix operation optimization device, and supports the operation of the matrix operation optimization program and other software and/or programs. The network communication module is used for realizing communication among the components in the memory 1005 and communication with other hardware and software in the matrix operation optimization system.
In the matrix operation optimization apparatus shown in fig. 4, the processor 1001 is configured to execute a matrix operation optimization program stored in the memory 1005, and implement the steps of the matrix operation optimization method according to any one of the above.
The specific implementation of the matrix operation optimization device of the present application is substantially the same as that of each embodiment of the matrix operation optimization method, and is not described herein again.
The embodiment of the present application further provides a matrix operation optimization device, where the matrix operation optimization device is applied to a matrix operation optimization device, and the matrix operation optimization device includes:
the distribution module is used for acquiring matrix element sequence data and target matrix dimension information corresponding to the matrix element sequence data, extracting matrix elements from the matrix element sequence data and distributing the matrix elements to a preset calculation engine array;
the calculation module is used for performing element multiplication and element addition on each matrix element through the preset calculation engine array to obtain a result matrix;
and the output control module is used for taking the result matrix as a target matrix operation result if the accumulation count information corresponding to the result matrix matches the dimension information of the target matrix.
Optionally, the distribution module is further configured to:
acquiring a first matrix to be operated and a second matrix to be operated;
and alternately arranging each first matrix element in the first matrix to be operated and each second matrix element in the second matrix to be operated to obtain the matrix element sequence data.
Optionally, the output control module is further configured to:
reading the value of a preset addition count counter in real time to obtain the accumulation count corresponding to each result matrix element;
judging whether each accumulation count is consistent with the target matrix dimension;
and if the two are consistent, taking the result matrix as the operation result of the target matrix.
Optionally, the computing module is further configured to:
based on the multiplication engine array, carrying out element multiplication on each matrix element in parallel to obtain a matrix element product result;
serializing and outputting the matrix element product result to obtain a serialized output result;
and respectively accumulating each element in the serialized output result to a corresponding result matrix element based on the addition calculation engine array to obtain the result matrix.
Optionally, the distribution module is further configured to:
determining the position information of the sequence element to be calculated corresponding to each multiplication engine;
and based on the position information of each sequence element to be calculated, selecting a matrix element from the first matrix element data and the second matrix element data respectively and distributing the matrix element to each multiplication engine.
Optionally, the computing module is further configured to:
converting each dense matrix element into a Montgomery domain, and performing modular exponentiation operation and modular multiplication operation through the preset calculation engine array to obtain a Montgomery domain result matrix;
and carrying out Montgomerization removal on the Montgomery domain result matrix to obtain the result matrix.
Optionally, the computing module is further configured to:
converting each dense-state matrix element into a Montgomery domain, and performing modular exponentiation operation in parallel through the modular exponentiation engine array to obtain a modular exponentiation operation result;
performing de-Montgomerization on the modular exponentiation operation result to convert the modular exponentiation operation result into a real number domain to obtain a dense-state matrix element product result;
and converting the dense matrix element product result into a Montgomery domain, and performing modular multiplication operation through the modular multiplication engine array to obtain a Montgomery domain result matrix.
The specific implementation of the matrix operation optimization device of the present application is substantially the same as that of each embodiment of the matrix operation optimization method, and is not described herein again.
The embodiment of the present application provides a readable storage medium, and the readable storage medium stores one or more programs, and the one or more programs are further executable by one or more processors for implementing the steps of the matrix operation optimization method described in any one of the above.
The specific implementation of the readable storage medium of the present application is substantially the same as that of each embodiment of the above matrix operation optimization method, and is not described herein again.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings, or which are directly or indirectly applied to other related technical fields, are included in the scope of the present application.

Claims (10)

1. A matrix operation optimization method is characterized by comprising the following steps:
acquiring matrix element sequence data and target matrix dimension information corresponding to the matrix element sequence data, extracting matrix elements from the matrix element sequence data, and distributing the matrix elements to a preset calculation engine array;
performing element multiplication and element addition on each matrix element through the preset calculation engine array to obtain a result matrix;
and if the accumulation count information corresponding to the result matrix matches the dimension information of the target matrix, taking the result matrix as the operation result of the target matrix.
2. The method for optimizing matrix operations according to claim 1, wherein the step of obtaining the sequence data of the matrix elements comprises:
acquiring a first matrix to be operated and a second matrix to be operated;
and alternately arranging each first matrix element in the first matrix to be operated and each second matrix element in the second matrix to be operated to obtain the matrix element sequence data.
3. The method for optimizing matrix operations according to claim 1, wherein the target matrix dimension information includes a target matrix dimension, the result matrix includes at least one result matrix element, and the accumulation count information includes at least one accumulation count corresponding to a result matrix element,
if the accumulation count information corresponding to the result matrix matches the dimension information of the target matrix, the step of taking the result matrix as the operation result of the target matrix comprises the following steps:
reading the value of a preset addition count counter in real time to obtain the accumulation count corresponding to each result matrix element;
judging whether each accumulation count is consistent with the target matrix dimension;
and if the two are consistent, taking the result matrix as the operation result of the target matrix.
4. The method for optimizing matrix operations according to claim 1, wherein the predetermined calculation engine arrays include a multiplication calculation engine array and an addition calculation engine array,
the step of obtaining a result matrix by performing element multiplication and element addition on each matrix element through the preset calculation engine array comprises:
based on the multiplication engine array, carrying out element multiplication on each matrix element in parallel to obtain a matrix element product result;
serializing and outputting the matrix element product result to obtain a serialized output result;
and respectively accumulating each element in the serialized output result to a corresponding result matrix element based on the addition calculation engine array to obtain the result matrix.
5. The method for optimizing matrix operations according to claim 1, wherein the predetermined calculation engine array comprises at least one multiplication engine, the matrix element sequence data comprises a first matrix element data and a second matrix element data,
the step of extracting matrix elements from the matrix element sequence data and distributing the matrix elements to a preset calculation engine array comprises the following steps:
determining the position information of the sequence element to be calculated corresponding to each multiplication engine;
and based on the position information of each sequence element to be calculated, selecting a matrix element from the first matrix element data and the second matrix element data respectively and distributing the matrix element to each multiplication engine.
6. The method for optimizing matrix operations of claim 1, wherein the matrix elements comprise dense state matrix elements,
the step of obtaining a result matrix by performing element multiplication and element addition on each matrix element through the preset calculation engine array comprises:
converting each dense matrix element into a Montgomery domain, and performing modular exponentiation operation and modular multiplication operation through the preset calculation engine array to obtain a Montgomery domain result matrix;
and carrying out Montgomerization removal on the Montgomery domain result matrix to obtain the result matrix.
7. The method for optimizing matrix operations according to claim 6, wherein the predetermined calculation engine arrays include an array of modular exponentiation engines and an array of modular multiplication engines,
the step of converting each dense matrix element into a Montgomery domain, and performing modular exponentiation and modular multiplication operation through the preset calculation engine array to obtain a Montgomery domain result matrix comprises:
converting each dense-state matrix element into a Montgomery domain, and performing modular exponentiation operation in parallel through the modular exponentiation engine array to obtain a modular exponentiation operation result;
performing de-Montgomerization on the modular exponentiation operation result to convert the modular exponentiation operation result into a real number domain to obtain a dense-state matrix element product result;
and converting the dense matrix element product result into a Montgomery domain, and performing modular multiplication operation through the modular multiplication engine array to obtain a Montgomery domain result matrix.
8. A matrix operation optimization device, characterized in that the matrix operation optimization device comprises:
the distribution module is used for acquiring matrix element sequence data and target matrix dimension information corresponding to the matrix element sequence data, extracting matrix elements from the matrix element sequence data and distributing the matrix elements to a preset calculation engine array;
the calculation module is used for performing element multiplication and element addition on each matrix element through the preset calculation engine array to obtain a result matrix;
and the output control module is used for taking the result matrix as a target matrix operation result if the accumulation frequency information corresponding to the result matrix matches the target matrix dimension information.
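As a rough illustration of the output control module, the check below treats the accumulation frequency information as a per-element counter and compares it against the shared dimension implied by the target matrix dimension information; the counter layout and all names are assumptions made for this sketch.

```python
def ready_to_output(accumulation_counts, target_dims):
    # accumulation_counts[i][j]: partial products accumulated so far into result element (i, j)
    m, k, n = target_dims            # result is m x n; each element needs k accumulations
    return all(accumulation_counts[i][j] == k
               for i in range(m) for j in range(n))
```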
9. A matrix operation optimization equipment, characterized in that the matrix operation optimization equipment comprises: a memory, a processor, and a program that is stored on the memory and implements the matrix operation optimization method, wherein
the memory is used for storing the program for implementing the matrix operation optimization method;
the processor is configured to execute the program for implementing the matrix operation optimization method, so as to implement the steps of the matrix operation optimization method according to any one of claims 1 to 7.
10. A readable storage medium having stored thereon a program for implementing a matrix operation optimization method, the program being executed by a processor to implement the steps of the matrix operation optimization method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011357215.0A CN112328962B (en) 2020-11-27 2020-11-27 Matrix operation optimization method, device and equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN112328962A (en) 2021-02-05
CN112328962B (en) 2021-12-31

Family

ID=74308657

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011357215.0A Active CN112328962B (en) 2020-11-27 2020-11-27 Matrix operation optimization method, device and equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN112328962B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101080874B (en) * 2004-12-15 2012-11-14 日本电气株式会社 Error correction encoding apparatus and error correction encoding method used therein
CN101169743A (en) * 2007-11-27 2008-04-30 南京大学 Method for implementing parallel power flow calculation based on multi-core computer in electric grid
US8626815B1 (en) * 2008-07-14 2014-01-07 Altera Corporation Configuring a programmable integrated circuit device to perform matrix multiplication
CN109445752B (en) * 2018-10-10 2019-10-15 西安交通大学 A kind of system of parallel computation
US10838851B2 (en) * 2019-02-28 2020-11-17 International Business Machines Corporation Multi-dimensional accesses in memory

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1525307A (en) * 2003-02-26 2004-09-01 上海华园微电子技术有限公司 Modulus multiply operation circuit and encrypt method of applying said modulus multiply operation circuit
CN1702613A (en) * 2004-03-02 2005-11-30 三星电子株式会社 Montgomery modular multiplier
CN101136882A (en) * 2006-10-25 2008-03-05 中兴通讯股份有限公司 Wireless communication baseband processed system matrix computing method and device
CN104462023A (en) * 2014-12-31 2015-03-25 合一网络技术(北京)有限公司 Super-large scale sparse matrix multiplication method based on mapreduce frame
CN108369666A (en) * 2015-11-26 2018-08-03 福满代谢组技术有限公司 Data analysis set-up, method and program
CN111316261A (en) * 2017-11-01 2020-06-19 苹果公司 Matrix calculation engine
CN109885406B (en) * 2019-02-27 2020-01-24 上海燧原智能科技有限公司 Operator calculation optimization method, device, equipment and storage medium
CN110188869A (en) * 2019-05-05 2019-08-30 北京中科汇成科技有限公司 A kind of integrated circuit based on convolutional neural networks algorithm accelerates the method and system of calculating
CN110377876A (en) * 2019-07-19 2019-10-25 广东省新一代通信与网络创新研究院 Matrix multiplication operation method, apparatus and computer readable storage medium
CN111339490A (en) * 2020-02-18 2020-06-26 三星(中国)半导体有限公司 Matrix multiplication computing method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZENG YONGTAO: "Design and Implementation of Key Algorithms for MIMO-OFDM Systems Based on YHFT-Matrix", China Master's Theses Full-text Database, Information Science and Technology Series (Monthly) *
LIANG YANG: "Research on Large-Scale Matrix Algorithms Based on Spark", China Master's Theses Full-text Database, Information Science and Technology Series (Monthly) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113204502A (en) * 2021-04-20 2021-08-03 深圳致星科技有限公司 Heterogeneous accelerated computing optimization method, device and equipment and readable storage medium
CN113296733A (en) * 2021-04-25 2021-08-24 阿里巴巴新加坡控股有限公司 Data processing method and device
WO2022228222A1 (en) * 2021-04-25 2022-11-03 阿里巴巴(中国)有限公司 Data processing method and apparatus
CN113485798A (en) * 2021-06-16 2021-10-08 曙光信息产业(北京)有限公司 Kernel function generation method, apparatus, device and storage medium
CN113485798B (en) * 2021-06-16 2023-10-31 曙光信息产业(北京)有限公司 Nuclear function generation method, device, equipment and storage medium
CN116107636A (en) * 2023-04-06 2023-05-12 之江实验室 Hardware acceleration method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN112328962B (en) 2021-12-31

Similar Documents

Publication Publication Date Title
CN112328962B (en) Matrix operation optimization method, device and equipment and readable storage medium
US11698773B2 (en) Accelerated mathematical engine
Roy et al. FPGA-based high-performance parallel architecture for homomorphic computing on encrypted data
US11451370B2 (en) Secure probabilistic analytics using an encrypted analytics matrix
CN109155763B (en) Digital signal processing on data stream
EP3330880B1 (en) Secure computation system, secure computation device, secure computation method, and program
Massolino et al. A compact and scalable hardware/software co-design of SIKE
Chen et al. Resource-efficient FPGA architecture and implementation of Hough transform
CN113177225B (en) Block chain-based data storage certification method, device, equipment and storage medium
Çavuşoğlu et al. A novel parallel image encryption algorithm based on chaos
CN113627085A (en) Method, apparatus, medium, and program product for optimizing horizontal federated learning modeling
Fang et al. Secure function evaluation using an fpga overlay architecture
KR20220118560A (en) Resource management and control method and apparatus, device and storage medium
Longa et al. The cost to break SIKE: A comparative hardware-based analysis with AES and SHA-3
Chaharlang et al. A novel quantum audio steganography–steganalysis approach using LSFQ-based embedding and QKNN-based classifier
CN112286752A (en) Algorithm verification method and system for federated learning heterogeneous processing system
Wang et al. HE-Booster: an efficient polynomial arithmetic acceleration on GPUs for fully homomorphic encryption
US10454680B2 (en) RSA decryption processor and method for controlling RSA decryption processor
Liao et al. Efficient privacy-preserving outsourcing of large-scale convex separable programming for smart cities
Henry et al. Solving discrete logarithms in smooth-order groups with CUDA
CN116488788A (en) Hardware accelerator of full homomorphic encryption algorithm, homomorphic encryption method and electronic equipment
CN116306030A (en) New energy prediction dynamic scene generation method considering prediction error and fluctuation distribution
CN112149834A (en) Model training method, device, equipment and medium
CN114036581A (en) Privacy calculation method based on neural network model
Brumley et al. Batch binary weierstrass

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant