CN115729518A - Information processing method and apparatus, storage medium, and electronic device - Google Patents


Info

Publication number
CN115729518A
Authority
CN
China
Prior art keywords: matrix, input, target, matrices, post
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111015924.5A
Other languages
Chinese (zh)
Inventor
姜曦楠
袁鹏
周飞虎
郭振宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202111015924.5A
Publication of CN115729518A
Legal status: Pending

Landscapes

  • Image Processing (AREA)

Abstract

The application discloses an information processing method and apparatus, a storage medium, and an electronic device, applied to the field of maps. The method includes: acquiring a group of pre-input matrices and a post-input matrix, wherein the group of pre-input matrices comprises a plurality of pre-input matrices; when it is detected that a matrix multiplication operation needs to be performed between each pre-input matrix in the group and the post-input matrix to obtain a first group of output matrices, and that a row merging operation needs to be performed on the first group of output matrices to obtain a first target output matrix, performing the row merging operation on the group of pre-input matrices to obtain a target pre-input matrix; and performing the matrix multiplication operation on the target pre-input matrix and the post-input matrix in a target processing apparatus to obtain the first target output matrix. The method and apparatus solve the technical problem in the related art that the utilization rate of computing resources is low because the amount of computation performed per resource call is too small.

Description

Information processing method and apparatus, storage medium, and electronic device
Technical Field
The present application relates to the field of computers, and in particular, to an information processing method and apparatus, a storage medium, and an electronic device.
Background
At present, in image recognition, an image can be input into a recognition model, which may be a neural network model. The recognition model extracts features from the image to obtain a predicted recognition result. When tensor dataflow analysis is involved, only element-by-element operations are supported within the recognition model. For a fused computation that merges results after matrix multiplication, element-by-element operations are performed separately on a plurality of small tensor inputs to obtain a plurality of small tensor outputs, and a merging operation is then performed on these small tensor outputs to obtain one large tensor output.
In the above information processing flow, because element-by-element operations on a plurality of small tensors are involved, matrix multiplication (where required) is performed many times, and a computing resource, for example a GPU (Graphics Processing Unit), must be called many times by many instructions. Because the matrix multiplication of each small tensor involves only a small amount of computation, the called computing resource cannot be fully utilized on any single call, and the overall utilization rate of the computing resource is low.
Therefore, the information processing method in the related art has the technical problem that the utilization rate of computing resources is low because the amount of computation performed per resource call is too small.
Disclosure of Invention
The embodiments of the present application provide an information processing method and apparatus, a storage medium, and an electronic device, so as to at least solve the technical problem in the related art that the utilization rate of computing resources is low because the amount of computation performed per resource call is too small.
According to an aspect of the embodiments of the present application, there is provided an information processing method, including: acquiring a group of pre-input matrices and a post-input matrix, wherein the group of pre-input matrices comprises a plurality of pre-input matrices; when it is detected that a matrix multiplication operation needs to be performed between each pre-input matrix in the group of pre-input matrices and the post-input matrix to obtain a first group of output matrices, and that a row merging operation needs to be performed on the first group of output matrices to obtain a first target output matrix, performing the row merging operation on the group of pre-input matrices to obtain a target pre-input matrix; and performing the matrix multiplication operation on the target pre-input matrix and the post-input matrix in a target processing apparatus to obtain the first target output matrix.
According to another aspect of the embodiments of the present application, there is also provided an information processing apparatus, including: a first acquisition unit configured to acquire a group of pre-input matrices and a post-input matrix, wherein the group of pre-input matrices comprises a plurality of pre-input matrices; a first execution unit configured to, when it is detected that a matrix multiplication operation needs to be performed between each pre-input matrix in the group of pre-input matrices and the post-input matrix to obtain a first group of output matrices, and that a row merging operation needs to be performed on the first group of output matrices to obtain a first target output matrix, perform the row merging operation on the group of pre-input matrices to obtain a target pre-input matrix; and a second execution unit configured to perform the matrix multiplication operation on the target pre-input matrix and the post-input matrix in a target processing apparatus to obtain the first target output matrix.
As an alternative embodiment, the first execution unit includes a first execution module, and the second execution unit includes a second execution module. The first execution module is configured to, when the first pre-input matrix is a matrix of dimension A1 × B, the second pre-input matrix is a matrix of dimension A2 × B, and the post-input matrix is a matrix of dimension B × C, perform the row merging operation on the matrix of dimension A1 × B and the matrix of dimension A2 × B to obtain the target pre-input matrix, where the group of pre-input matrices includes the first pre-input matrix and the second pre-input matrix, the target pre-input matrix is a matrix of dimension A × B, A is a natural number greater than 1, A = A1 + A2, and A1, A2, B, and C are natural numbers. The second execution module is configured to perform the matrix multiplication operation on the matrix of dimension A × B and the matrix of dimension B × C in the target processing apparatus to obtain the first target output matrix, where the first target output matrix is a matrix of dimension A × C.
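This embodiment can be sketched in NumPy (illustrative shapes A1=2, A2=3, B=4, C=5; the variable names are mine, not the patent's):

```python
import numpy as np

rng = np.random.default_rng(0)
pre1 = rng.standard_normal((2, 4))   # first pre-input matrix, A1 x B
pre2 = rng.standard_normal((3, 4))   # second pre-input matrix, A2 x B
post = rng.standard_normal((4, 5))   # post-input matrix, B x C

# Related-art path: one matrix multiplication per pre-input, then a row merge.
separate = np.concatenate([pre1 @ post, pre2 @ post], axis=0)

# Path of this embodiment: row-merge first, then a single matrix multiplication.
target_pre = np.concatenate([pre1, pre2], axis=0)  # A x B, A = A1 + A2
fused = target_pre @ post                          # A x C

assert target_pre.shape == (5, 4)
assert fused.shape == (5, 5)
assert np.allclose(separate, fused)
```

The final assertion is the operational identity the embodiment relies on: merging first and multiplying once yields the same first target output matrix as multiplying separately and merging afterward.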
As an alternative embodiment, the apparatus further includes: a third execution unit configured to, when it is detected that a first preset operation needs to be performed on at least one output matrix in the first group of output matrices, perform a row splitting operation on the first target output matrix to obtain a first group of split matrices; and a fourth execution unit configured to perform the first preset operation on at least one split matrix in the first group of split matrices.
As an alternative embodiment, the third execution unit includes a third execution module configured to, when the first pre-input matrix is a matrix of dimension A1 × B, the second pre-input matrix is a matrix of dimension A2 × B, the post-input matrix is a matrix of dimension B × C, and the first target output matrix is a matrix of dimension A × C, perform the row splitting operation on the matrix of dimension A × C to obtain a first split matrix and a second split matrix, where the group of pre-input matrices includes the first pre-input matrix and the second pre-input matrix, the first group of split matrices includes the first split matrix and the second split matrix, the first split matrix is a matrix of dimension A1 × C, the second split matrix is a matrix of dimension A2 × C, A is a natural number greater than 1, A = A1 + A2, and A1, A2, B, and C are natural numbers.
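The row splitting step can be sketched the same way: splitting the fused A × C output along its rows recovers exactly the matrices the separate multiplications would have produced (illustrative NumPy, my names):

```python
import numpy as np

rng = np.random.default_rng(1)
pre1 = rng.standard_normal((2, 4))   # A1 x B
pre2 = rng.standard_normal((3, 4))   # A2 x B
post = rng.standard_normal((4, 5))   # B x C

fused = np.concatenate([pre1, pre2], axis=0) @ post  # A x C, A = A1 + A2

# Row splitting operation: cut after the first A1 rows.
split1, split2 = np.split(fused, [2], axis=0)        # A1 x C and A2 x C

assert np.allclose(split1, pre1 @ post)
assert np.allclose(split2, pre2 @ post)
```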
As an alternative embodiment, the apparatus further includes: a second acquisition unit configured to acquire a pre-input matrix and a group of post-input matrices, wherein the group of post-input matrices comprises a plurality of post-input matrices; a fifth execution unit configured to, when it is detected that the matrix multiplication operation needs to be performed between the pre-input matrix and each post-input matrix in the group of post-input matrices to obtain a second group of output matrices, and that a column merging operation needs to be performed on the second group of output matrices to obtain a second target output matrix, perform the column merging operation on the group of post-input matrices to obtain a target post-input matrix; and a sixth execution unit configured to perform the matrix multiplication operation on the pre-input matrix and the target post-input matrix in a target processing apparatus to obtain the second target output matrix.
As an alternative implementation, the fifth execution unit includes a fourth execution module, and the sixth execution unit includes a fifth execution module. The fourth execution module is configured to, when the first post-input matrix is a matrix of dimension B × C1, the second post-input matrix is a matrix of dimension B × C2, and the pre-input matrix is a matrix of dimension A × B, perform the column merging operation on the matrix of dimension B × C1 and the matrix of dimension B × C2 to obtain the target post-input matrix, where the group of post-input matrices includes the first post-input matrix and the second post-input matrix, the target post-input matrix is a matrix of dimension B × C, C is a natural number greater than 1, C = C1 + C2, and A, B, C1, and C2 are natural numbers. The fifth execution module is configured to perform the matrix multiplication operation on the matrix of dimension A × B and the matrix of dimension B × C in the target processing apparatus to obtain the second target output matrix, where the second target output matrix is a matrix of dimension A × C.
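The symmetric, column-wise variant can be sketched as follows (NumPy, illustrative shapes A=2, B=4, C1=3, C2=5; names are mine):

```python
import numpy as np

rng = np.random.default_rng(2)
pre = rng.standard_normal((2, 4))     # pre-input matrix, A x B
post1 = rng.standard_normal((4, 3))   # first post-input matrix, B x C1
post2 = rng.standard_normal((4, 5))   # second post-input matrix, B x C2

# Related-art path: two multiplications, then a column merge.
separate = np.concatenate([pre @ post1, pre @ post2], axis=1)

# Path of this embodiment: column-merge the post-inputs, then one multiplication.
target_post = np.concatenate([post1, post2], axis=1)  # B x C, C = C1 + C2
fused = pre @ target_post                             # A x C

assert fused.shape == (2, 8)
assert np.allclose(separate, fused)
```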
As an alternative embodiment, the apparatus further includes: a seventh execution unit configured to, when it is detected that a second preset operation needs to be performed on at least one output matrix in the second group of output matrices, perform a column splitting operation on the second target output matrix to obtain a second group of split matrices; and an eighth execution unit configured to perform the second preset operation on at least one split matrix in the second group of split matrices.
As an alternative implementation, the seventh execution unit includes a sixth execution module configured to, when the first post-input matrix is a matrix of dimension B × C1, the second post-input matrix is a matrix of dimension B × C2, the pre-input matrix is a matrix of dimension A × B, and the second target output matrix is a matrix of dimension A × C, perform the column splitting operation on the matrix of dimension A × C to obtain a third split matrix and a fourth split matrix, where the group of post-input matrices includes the first post-input matrix and the second post-input matrix, the second group of split matrices includes the third split matrix and the fourth split matrix, the third split matrix is a matrix of dimension A × C1, the fourth split matrix is a matrix of dimension A × C2, C is a natural number greater than 1, C = C1 + C2, and A, B, C1, and C2 are natural numbers.
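A matching sketch of the column splitting, under the same illustrative shapes: splitting the fused A × C output along its columns recovers the individual products.

```python
import numpy as np

rng = np.random.default_rng(3)
pre = rng.standard_normal((2, 4))     # A x B
post1 = rng.standard_normal((4, 3))   # B x C1
post2 = rng.standard_normal((4, 5))   # B x C2

fused = pre @ np.concatenate([post1, post2], axis=1)  # A x (C1 + C2)

# Column splitting operation: cut after the first C1 columns.
split3, split4 = np.split(fused, [3], axis=1)         # A x C1 and A x C2

assert np.allclose(split3, pre @ post1)
assert np.allclose(split4, pre @ post2)
```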
As an alternative embodiment, the second execution unit includes a seventh execution module configured to perform the matrix multiplication operation on the target pre-input matrix and the post-input matrix in a graphics processing unit (GPU) to obtain the first target output matrix, where the target processing apparatus includes the GPU.
As an alternative embodiment, the first acquisition unit includes an acquisition module configured to acquire the group of pre-input matrices from a group of multidimensional matrices to be processed in a target prediction model, and to acquire the post-input matrix from a target multidimensional matrix to be processed in the target prediction model, where the target prediction model is used to determine a predicted target object according to the group of multidimensional matrices and the target multidimensional matrix.
As an optional embodiment, the acquisition module includes an acquisition submodule, the first execution unit includes an eighth execution module, and the second execution unit includes a ninth execution module. The acquisition submodule is configured to, when the group of multidimensional matrices includes a matrix of dimension D × A1 × B and a matrix of dimension D × A2 × B and the target multidimensional matrix is a matrix of dimension D × B × C, acquire D first pre-input matrices of dimension A1 × B and D second pre-input matrices of dimension A2 × B from the group of multidimensional matrices, and acquire D post-input matrices of dimension B × C from the target multidimensional matrix, where D is a natural number greater than 1, A = A1 + A2, and A1, A2, B, and C are natural numbers. The eighth execution module is configured to perform the row merging operation on the D first pre-input matrices of dimension A1 × B and the D second pre-input matrices of dimension A2 × B, respectively, to obtain D target pre-input matrices, where each target pre-input matrix is a matrix of dimension A × B obtained by performing the row merging operation on one first pre-input matrix and one second pre-input matrix. The ninth execution module is configured to sequentially perform the matrix multiplication operation on the D target pre-input matrices and the D post-input matrices, respectively, in the target processing apparatus to obtain D first target output matrices of dimension A × C, where each first target output matrix is obtained by performing the matrix multiplication operation on one target pre-input matrix and one post-input matrix.
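The batched case above, where each of D slice pairs is merged and then multiplied, can be sketched with NumPy's broadcasting matmul (illustrative shapes D=2, A1=2, A2=3, B=4, C=5; names are mine):

```python
import numpy as np

rng = np.random.default_rng(4)
pre1 = rng.standard_normal((2, 2, 4))   # D x A1 x B
pre2 = rng.standard_normal((2, 3, 4))   # D x A2 x B
post = rng.standard_normal((2, 4, 5))   # D x B x C

# Row-merge each of the D slice pairs: D x (A1 + A2) x B.
target_pre = np.concatenate([pre1, pre2], axis=1)

# One batched matmul produces the D first target output matrices (D x A x C).
fused = target_pre @ post

# Equivalent to merging the per-slice products of the related-art path.
separate = np.concatenate([pre1 @ post, pre2 @ post], axis=1)
assert fused.shape == (2, 5, 5)
assert np.allclose(separate, fused)
```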
As an alternative embodiment, the apparatus further includes: a first adjustment unit configured to, when it is detected that a model structure in a first prediction model is used to perform the matrix multiplication operation between each pre-input matrix in the group of pre-input matrices and the post-input matrix to obtain the first group of output matrices, and to perform the row merging operation on the first group of output matrices to obtain the first target output matrix, adjust the model structure in the first prediction model to obtain a second prediction model, where the model structure in the second prediction model is used to perform the row merging operation on the group of pre-input matrices to obtain the target pre-input matrix, and to perform the matrix multiplication operation on the target pre-input matrix and the post-input matrix to obtain the first target output matrix. The target processing apparatus is configured to perform the matrix multiplication operation on the target pre-input matrix and the post-input matrix in the second prediction model to obtain the first target output matrix.
As an alternative embodiment, the apparatus further includes: a second adjustment unit configured to, when it is detected that a model structure in a third prediction model is used to perform the matrix multiplication operation between the pre-input matrix and each post-input matrix in the group of post-input matrices to obtain the second group of output matrices, and to perform the column merging operation on the second group of output matrices to obtain the second target output matrix, adjust the model structure in the third prediction model to obtain a fourth prediction model, where the model structure in the fourth prediction model is used to perform the column merging operation on the group of post-input matrices to obtain the target post-input matrix, and to perform the matrix multiplication operation on the pre-input matrix and the target post-input matrix to obtain the second target output matrix. The target processing apparatus is configured to perform the matrix multiplication operation on the pre-input matrix and the target post-input matrix in the fourth prediction model to obtain the second target output matrix.
According to still another aspect of the embodiments of the present application, there is also provided a computer-readable storage medium in which a computer program is stored, wherein the computer program is configured to execute the above-mentioned information processing method when running.
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including a memory and a processor, where the memory stores therein a computer program, and the processor is configured to execute the information processing method described above through the computer program.
According to yet another aspect of the embodiments of the present application, there is also provided a computer program product or a computer program comprising computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the information processing method described above.
In the embodiments of the present application, the merging operation on the input matrices is performed before the matrix multiplication operation: if it is detected that a matrix multiplication operation needs to be performed between each pre-input matrix in a group of pre-input matrices (i.e., a series of tensors) and the same post-input matrix (i.e., the same tensor), and that a row merging operation needs to be performed on the resulting group of output matrices, the row merging operation is instead performed first on the group of pre-input matrices to obtain a merged pre-input matrix (i.e., the target pre-input matrix), and the matrix multiplication operation is then performed on the merged pre-input matrix and the post-input matrix in a target processing apparatus (e.g., a GPU). In this way, a plurality of small matrix multiplications are replaced by a single larger one, so that each resource call carries a larger amount of computation and the utilization rate of the computing resources is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a schematic diagram of an application environment of an alternative information processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a flow of an alternative information processing method according to an embodiment of the application;
FIG. 3 is a schematic diagram of an alternative information processing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of another alternative information processing method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of yet another alternative information processing method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of yet another alternative information processing method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of yet another alternative information processing method according to an embodiment of the present application;
FIG. 8 is a schematic diagram of yet another alternative information processing method according to an embodiment of the present application;
FIG. 9 is a schematic diagram of yet another alternative information processing method according to an embodiment of the present application;
FIG. 10 is a schematic diagram of yet another alternative information processing method according to an embodiment of the present application;
FIG. 11 is a schematic diagram of yet another alternative information processing method according to an embodiment of the present application;
FIG. 12 is a schematic diagram of yet another alternative information processing method according to an embodiment of the present application;
FIG. 13 is a schematic diagram of an alternative information processing apparatus according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of an alternative electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be implemented in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of the embodiments of the present application, an information processing method is provided. Optionally, as an optional implementation, the information processing method may be applied to, but is not limited to, the environment shown in fig. 1, which includes, but is not limited to: a user device 102, where the user device 102 may include, but is not limited to, a memory 104, a processor 106, and a display 108; a network 110; and a server 112.
Illustratively, the procedure of the above-described information processing method may include the steps of:
step S102, the user device 102 acquires an image to be recognized, where the image to be recognized may contain a recognition target of a preset type;
step S104-step S106, the user equipment 102 sends the image to be recognized to the server 112 through the network 110;
in step S108, the server 112 inputs the image to be recognized into the neural network model for target recognition through the database 114 and the processing engine 116, so as to obtain a recognition result of the target recognition.
The image to be recognized may be an image obtained from a map, in which a specific object (e.g., a landmark building, a vehicle, or another object of interest) needs to be recognized. While the neural network model processes the input image, if it is detected that each matrix in a group of pre-input matrices needs to be multiplied by the same post-input matrix, and that a row merging operation is then to be performed on the resulting group of output matrices, the row merging operation is performed first on the group of pre-input matrices, and the matrix multiplication operation is then performed on the merged input matrix and the post-input matrix in the target processing apparatus to obtain the output matrix.
Matrix multiplication arises in processing the image to be recognized, for example when a convolution kernel is applied to feature maps or when feature maps are fused. For instance, a convolutional layer in the neural network model may perform a matrix multiplication operation on the feature map extracted from the image to be recognized to obtain a multiplication result.
Here, the pre-input matrix and the post-input matrix are the two operands of a matrix multiplication: the pre-input matrix is the left (preceding) operand, the post-input matrix is the right (following) operand, and the number of columns of the pre-input matrix equals the number of rows of the post-input matrix. For example, in the matrix multiplication A × B, A is the pre-input matrix and B is the post-input matrix.
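As a hedged NumPy illustration of this compatibility condition (the shapes are arbitrary examples): the multiplication succeeds only when the pre-input's column count equals the post-input's row count.

```python
import numpy as np

pre = np.ones((2, 4))    # pre-input matrix A, 2 x 4
post = np.ones((4, 3))   # post-input matrix B, 4 x 3

out = pre @ post         # inner dimension 4 matches on both sides
assert out.shape == (2, 3)

# Swapping the operands makes the inner dimensions 3 and 2, which do not match.
raised = False
try:
    post @ pre
except ValueError:
    raised = True
assert raised
```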
In steps S110-S114, the server 112 sends the recognition result of the object recognition to the user device 102 through the network 110, and the processor 106 in the user device 102 identifies the recognized object in the image to be recognized according to the recognition result of the object recognition.
In addition to the example shown in fig. 1, the above steps may be performed by the user equipment 102 independently, that is, the user equipment 102 performs the steps of row merging operation, matrix multiplication operation, and the like of the matrix, thereby relieving the processing pressure of the server. The user equipment 102 includes, but is not limited to, a handheld device (e.g., a mobile phone), a notebook computer, a desktop computer, an intelligent voice interaction device, an intelligent appliance, a vehicle-mounted device, and the like, and the implementation manner of the user equipment 102 is not limited in this application.
Optionally, as an optional implementation manner, fig. 2 is a schematic flowchart of an optional information processing method according to an embodiment of the present application, and as shown in fig. 2, the flow of the information processing method may include the following steps:
step S202, a group of pre-input matrices and a post-input matrix are acquired, where the group of pre-input matrices comprises a plurality of pre-input matrices.
The information processing method in this embodiment may be applied to the field of the Internet of Vehicles, for example the map domain within the Internet of Vehicles, and to tensor dataflow (TensorFlow) analysis scenarios, for example image recognition using a neural network model. The engine applied may be a tensor graph optimization engine, which may be a graph optimization engine for deep learning (for example, the Grappler computational graph optimization engine in TensorFlow) and may be integrated into an AI (Artificial Intelligence) platform or AI framework. At its core, a tensor is a data container, which may be a multidimensional array; scalars, vectors, matrices, and so on are tensors of different orders. In this embodiment, matrices are used as the running example.
The server may acquire a group of pre-input matrices (i.e., a series of tensors) and a post-input matrix (i.e., the same tensor) and inspect the processing logic applied to these inputs. The inspection may include detecting whether a merging operation on output matrices is involved: for example, a merge (concatenation) operation may be located, and it may be checked whether all of its inputs are produced by matrix multiplications, whether it merges along the "row" dimension, and whether the second inputs (post-inputs) of all those matrix multiplications are the same tensor. If so, all of the first inputs (i.e., the group of pre-input matrices) may be merged in the order of the original merging operation, and a single matrix multiplication may then be performed using the merged result (i.e., the target pre-input matrix) as the pre-input and the original post-input (i.e., the post-input matrix) as the post-input.
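The detection described here can be sketched as a small graph-pattern check. The node representation below is entirely hypothetical (it is not Grappler's API); it only encodes the conditions the paragraph lists: a merge along the row dimension whose inputs are all matrix multiplications sharing one post-input.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                           # e.g. "MatMul" or "Concat"
    inputs: list = field(default_factory=list)
    axis: int = 0                     # merge axis for Concat nodes

def fusible_row_merge(node):
    """True if `node` is a row-wise merge of MatMuls sharing one post-input."""
    if node.op != "Concat" or node.axis != 0:
        return False
    if not node.inputs or any(n.op != "MatMul" for n in node.inputs):
        return False
    # All matmuls must take the same tensor as their second (post) input.
    post_inputs = {id(n.inputs[1]) for n in node.inputs}
    return len(post_inputs) == 1

# Toy graph: concat(MatMul(x1, w), MatMul(x2, w)) along the row dimension.
w = Node("Const")
x1, x2 = Node("Const"), Node("Const")
merge = Node("Concat", [Node("MatMul", [x1, w]), Node("MatMul", [x2, w])], axis=0)
assert fusible_row_merge(merge)
```

A real optimizer would additionally rewrite the matched subgraph into a single concat-then-matmul pattern; this sketch covers only the match.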
Here, the merging operation is a common tensor operation that merges a plurality of tensors into one tensor. The matrix multiplication operation (MatMul) is a common tensor operation in which each element of the i-th row of the first tensor (the antecedent) is multiplied element by element with the j-th column of the second tensor (the postcedent) and the products are accumulated, the sum being recorded as the element in the i-th row and j-th column of the output tensor.
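The definition can be written out as explicit loops and checked against a library matmul (a reference sketch, not production code):

```python
import numpy as np

def matmul_loops(a, b):
    """Reference matmul: out[i, j] = sum over k of a[i, k] * b[k, j]."""
    rows, inner = a.shape
    inner2, cols = b.shape
    assert inner == inner2          # columns of a must match rows of b
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                out[i, j] += a[i, k] * b[k, j]
    return out

rng = np.random.default_rng(5)
a, b = rng.standard_normal((2, 3)), rng.standard_normal((3, 4))
assert np.allclose(matmul_loops(a, b), a @ b)
```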
For example, it may be detected that matrix multiplication operations would need to be performed between each pre-input matrix in a group of pre-input matrices and the post-input matrix to obtain a first group of output matrices (without actually performing the matrix multiplications at this point), and that a row merging operation would need to be performed on the first group of output matrices to obtain a first target output matrix (without actually performing the row merging at this point).
Step S204, when it is detected that a matrix multiplication operation needs to be performed between each pre-input matrix in the group of pre-input matrices and the post-input matrix to obtain a first group of output matrices, and that a row merging operation needs to be performed on the first group of output matrices to obtain a first target output matrix, the row merging operation is performed on the group of pre-input matrices to obtain a target pre-input matrix.
If it is detected that a matrix multiplication needs to be performed between each pre-input matrix in the group of pre-input matrices and the post-input matrix to obtain a first group of output matrices, and that a row merging operation needs to be performed on the first group of output matrices to obtain a first target output matrix, then, based on the row merging rule of the group of output matrices, the server may first perform the row merging operation on the group of pre-input matrices to obtain a target pre-input matrix.
The row merging rule may be used to indicate the row merging order of the set of output matrices, and the number of rows of the resulting target pre-input matrix is the sum of the numbers of rows of the pre-input matrices included in the set of pre-input matrices. By the operation rules of matrix arithmetic, the result obtained by performing the row merging operation first and then the matrix multiplication operation is identical to the result obtained by performing the matrix multiplication operations first and then the row merging operation.
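The stated equivalence (row-merge then multiply versus multiply then row-merge) can be checked numerically. The sketch below is illustrative, with arbitrary small dimensions standing in for a1, a2, B, and C:

```python
import numpy as np

rng = np.random.default_rng(0)
pre1 = rng.standard_normal((3, 5))   # first pre-input matrix, a1 x B
pre2 = rng.standard_normal((4, 5))   # second pre-input matrix, a2 x B
post = rng.standard_normal((5, 2))   # post-input matrix, B x C

# Multiply first, then merge the outputs along the row dimension.
merged_after = np.concatenate([pre1 @ post, pre2 @ post], axis=0)

# Merge the pre-inputs first, then multiply once.
merged_before = np.concatenate([pre1, pre2], axis=0) @ post

assert merged_after.shape == (7, 2)          # (a1 + a2) x C
assert np.allclose(merged_after, merged_before)
```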
In step S206, a matrix multiplication operation is performed on the target pre-input matrix and the post-input matrix in the target processing apparatus to obtain the first target output matrix.
After the target pre-input matrix is obtained, the server may control to perform matrix multiplication on the target pre-input matrix and the post-input matrix to obtain a first target output matrix. The matrix multiplication operation described above may be performed within the target processing device. The target processing device may be a processing unit (e.g., a GPU) for performing the matrix multiplication operation, and may be located inside the server or outside the server, which is not limited in this embodiment.
For example, as shown in fig. 3, during tensor dataflow analysis, element-by-element operations may be performed on a plurality of small tensor inputs, and a merging operation may then be performed on the small tensor outputs: element-by-element operation 1 to element-by-element operation N are performed on small tensor input 1 to small tensor input N, respectively, to obtain small tensor output 1 to small tensor output N; a merging operation is then performed on small tensor output 1 to small tensor output N to obtain the large tensor output. Here, element-by-element operation 1 to element-by-element operation N are homogeneous operations, for example, square root operations. This method calls the operation resources many times, and the utilization rate of the operation resources is low.
To overcome the above problem, as shown in fig. 4, when the outputs of the same operation applied to a series of small tensors are then merged, the operations on these multiple small tensors can be aggregated into a single operation on a large tensor; in the same way, matrix multiplications (MatMul) of a series of tensors with the same tensor can be aggregated into one matrix multiplication of a large tensor: a merging operation is first performed on small tensor input 1 to small tensor input N to obtain the large tensor input, and the element-by-element operation is then performed on the large tensor input to obtain the large tensor output. This method calls the operation resources only once, which improves the utilization rate of the operation resources.
According to the embodiment provided by the application, a set of pre-input matrices and a post-input matrix are obtained, wherein the set of pre-input matrices includes a plurality of pre-input matrices. When it is detected that matrix multiplication needs to be performed on each pre-input matrix in the set of pre-input matrices and the post-input matrix respectively to obtain a first set of output matrices, and that a row merging operation needs to be performed on the first set of output matrices to obtain a first target output matrix, the row merging operation is performed on the set of pre-input matrices to obtain a target pre-input matrix; matrix multiplication is then performed on the target pre-input matrix and the post-input matrix in the target processing device to obtain the first target output matrix. This solves the technical problem in the related art that the operation amount of each resource call is too small, resulting in a low utilization rate of the operation resources, and thereby improves the utilization rate of the operation resources.
As an example, suppose a set of pre-input matrices includes two pre-input matrices, each of size 32 × 64, and the post-input matrix is of size 64 × 32. If it is detected that matrix multiplication needs to be performed on each pre-input matrix and the post-input matrix respectively to obtain two matrices of size 32 × 32, and that a row merging operation then needs to be performed on this first set of output matrices to obtain a matrix of size 64 × 32, the row merging operation may instead be performed on the set of pre-input matrices, based on the row merging rule of the set of output matrices, to obtain a matrix of size 64 × 64. A GPU resource is then called by one call instruction, and matrix multiplication is performed in the GPU on the 64 × 64 matrix and the post-input matrix to obtain the 64 × 32 matrix.
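The example above can be traced with the exact sizes it states; this is an illustrative sketch only, with NumPy standing in for the GPU-side multiplication:

```python
import numpy as np

pre = [np.ones((32, 64)), np.ones((32, 64))]   # set of two pre-input matrices
post = np.ones((64, 32))                       # shared post-input matrix

# Unfused path: two small multiplications, then a row merge.
small = [m @ post for m in pre]                # two 32 x 32 output matrices
unfused = np.concatenate(small, axis=0)        # 64 x 32 first target output

# Fused path: one row merge, then a single large multiplication.
target_pre = np.concatenate(pre, axis=0)       # 64 x 64 target pre-input
fused = target_pre @ post                      # 64 x 32 first target output

assert target_pre.shape == (64, 64)
assert fused.shape == (64, 32)
assert np.allclose(fused, unfused)
```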
As an alternative embodiment, performing a row merging operation on a set of pre-input matrices to obtain a target pre-input matrix includes: and when the first front input matrix is a matrix with the dimension of a1 × B, the second front input matrix is a matrix with the dimension of a2 × B and the rear input matrix is a matrix with the dimension of B × C, performing row merging operation on the matrix with the dimension of a1 × B and the matrix with the dimension of a2 × B to obtain a target front input matrix.
If the pre-input matrices and the post-input matrix are both two-dimensional matrices, where the set of pre-input matrices includes a first pre-input matrix and a second pre-input matrix, the first pre-input matrix is a matrix of dimension a1 × B (i.e., a matrix with a1 rows and B columns), the second pre-input matrix is a matrix of dimension a2 × B (i.e., a matrix with a2 rows and B columns), and the post-input matrix is a matrix of dimension B × C (i.e., a matrix with B rows and C columns), the server may perform a row merging operation on the matrix of dimension a1 × B and the matrix of dimension a2 × B to obtain the target pre-input matrix, whose dimension is A × B, where A = a1 + a2. Here, a1, a2, B, and C are natural numbers, and A is a natural number greater than 1.
In the row merging, the server may place the second pre-input matrix before or after the first pre-input matrix, or may interleave the row vectors of the first pre-input matrix and the second pre-input matrix according to a merging rule; in the latter case, the row vectors belonging to the same pre-input matrix are not necessarily all adjacent.
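If the merging rule interleaves rows rather than stacking the two pre-inputs contiguously, the equivalence still holds, provided the same row order is used when interpreting the output. The sketch below uses a hypothetical alternating merging rule for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
pre1 = rng.standard_normal((2, 4))
pre2 = rng.standard_normal((2, 4))
post = rng.standard_normal((4, 3))

# A merging rule that alternates rows of the two pre-inputs:
# positions 0,1 hold pre1's rows, positions 2,3 hold pre2's rows.
order = [0, 2, 1, 3]
stacked = np.concatenate([pre1, pre2], axis=0)
target_pre = stacked[order]

# Row i of the output always corresponds to row i of the merged pre-input,
# so reordering the pre-input rows reorders the output rows identically.
out = target_pre @ post
expected = np.concatenate([pre1 @ post, pre2 @ post], axis=0)[order]
assert np.allclose(out, expected)
```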
Correspondingly, performing a matrix multiplication operation on the target pre-input matrix and the post-input matrix in the target processing device to obtain a first target output matrix includes: and performing matrix multiplication operation on the matrix with the dimension of A multiplied by B and the matrix with the dimension of B multiplied by C in the target processing device to obtain a first target output matrix, wherein the first target output matrix is the matrix with the dimension of A multiplied by C.
The server may perform a matrix multiplication operation on a matrix of dimension a × B and a matrix of dimension B × C in the target processing device. When matrix multiplication is performed, multiplication operations may be performed row by row, that is, multiplication operations are performed on each row of a matrix with a dimension of a × B and each column of a matrix with a dimension of B × C, and a matrix obtained after performing the matrix multiplication operations is a first target output matrix.
Illustratively, as shown in FIG. 5, the set of pre-input matrices consists of pre-input 1 [a1, B] (i.e., a matrix of a1 rows and B columns) and pre-input 2 [a2, B] (i.e., a matrix of a2 rows and B columns), and the post-input is [B, C]. First, matrix multiplication is performed on pre-input 1 and pre-input 2, each processed according to parameter 1 (the first parameter), with the post-input [B, C] processed according to parameter 2 (the second parameter), respectively, to obtain output result 1 [a1, C] and output result 2 [a2, C]; then, a row merging operation is performed on output result 1 [a1, C] and output result 2 [a2, C] to obtain the output result [A, C]. This information processing mode performs two matrix multiplications of small matrices, incurs two copies of overhead on the hardware, and requires two call instructions to be issued; moreover, because two small matrix multiplications are performed, the utilization rate of the operation resources called each time is low.
To optimize the processing flow, as shown in FIG. 6, a merging operation may first be performed on pre-input 1 [a1, B] processed according to parameter 1 and pre-input 2 [a2, B] processed according to parameter N, to obtain the pre-input [A, B] (i.e., a matrix of A rows and B columns); matrix multiplication may then be performed on the pre-input [A, B] processed according to parameter 1 and the post-input [B, C] processed according to parameter 2, to obtain the output result [A, C]. After optimization, only one matrix multiplication of a large matrix is performed, the overhead on hardware is a single copy, and only one call instruction needs to be issued; moreover, because only one large matrix multiplication is performed, the utilization rate of the operation resources called each time can be improved.
By the optional embodiment, the applicability of matrix fusion calculation can be improved and the utilization rate of the called operation resources can be improved by executing the row merging operation on the two-dimensional input matrix first and then executing the matrix multiplication operation of the matrix.
As an alternative embodiment, the method further comprises:
s1, when detecting that a first preset operation needs to be executed on at least one output matrix in a first group of output matrices, executing a line splitting operation on a first target output matrix to obtain a first group of splitting matrices;
and S2, executing a first preset operation on at least one split matrix in the first group of split matrices.
In addition to being merged, the result of an original matrix multiplication may also be an input to other operations. For example, as shown in FIG. 7, in addition to the merging operation performed on output result 1 [a1, C] and output result 2 [a2, C], another operation 1 is performed on output result 1 [a1, C] and another operation 2 is performed on output result 2 [a2, C].
If it is detected that a first preset operation needs to be performed on at least one output matrix in the first set of output matrices (the first preset operations performed on different output matrices may be the same or different), that is, the output of a matrix multiplication performed on a pre-input matrix is also the input of other operations, the result of the new matrix multiplication may be split (Split) to recover the original matrix multiplication tensors: a row splitting operation is performed on the first target output matrix to obtain the first set of split matrices.
After obtaining the first set of split matrices, the server may perform a first preset operation on at least one split matrix in the first set of split matrices. If the number of the split matrixes contained in the at least one split matrix is multiple, the corresponding first preset operation can be respectively executed on each split matrix in the at least one split matrix.
Optionally, in this embodiment, if at least one split matrix is part of the first group of split matrices, a row splitting operation may also be performed on the first target output matrix to obtain at least one split matrix; a first preset operation is performed on at least one split matrix.
For example, as shown in FIG. 8, after the output matrix [A, C] is obtained, a splitting operation may be performed on it to obtain output result 1 [a1, C] and output result 2 [a2, C]; then the other operation 1 is performed on output result 1 [a1, C] and the other operation 2 is performed on output result 2 [a2, C].
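The split-then-operate flow of FIG. 8 can be sketched as follows; the two "other operations" here (a square root and a doubling) are hypothetical stand-ins, and NumPy stands in for the tensor framework:

```python
import numpy as np

a1, a2, B, C = 3, 2, 4, 5
rng = np.random.default_rng(2)
pre1 = rng.random((a1, B))
pre2 = rng.random((a2, B))
post = rng.random((B, C))

# Fused multiplication followed by a row splitting operation.
target_out = np.concatenate([pre1, pre2], axis=0) @ post   # (a1 + a2) x C
split1, split2 = np.split(target_out, [a1], axis=0)        # a1 x C and a2 x C

# First preset operations on each split matrix (hypothetical examples).
other1 = np.sqrt(split1)   # "other operation 1"
other2 = 2.0 * split2      # "other operation 2"

# Each split matrix equals the corresponding original small-matrix product.
assert np.allclose(split1, pre1 @ post)
assert np.allclose(split2, pre2 @ post)
```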
By the optional embodiment, the line splitting operation is performed on the first target output matrix, and other operations are performed on the split matrix obtained by splitting, so that the method can be suitable for scenes with other operation requirements, and the compatibility of matrix fusion calculation is improved.
As an alternative embodiment, performing the row splitting operation on the first target output matrix to obtain the first set of split matrices includes:

when the first pre-input matrix is a matrix of dimension a1 × B, the second pre-input matrix is a matrix of dimension a2 × B, the post-input matrix is a matrix of dimension B × C, and the first target output matrix is a matrix of dimension A × C, performing a row splitting operation on the matrix of dimension A × C to obtain a first split matrix and a second split matrix.
If the first pre-input matrix is a matrix with the dimension of a1 × B, the second pre-input matrix is a matrix with the dimension of a2 × B, and the post-input matrix is a matrix with the dimension of B × C, the output matrix obtained by performing the matrix multiplication operation on the first pre-input matrix and the post-input matrix is a matrix with the dimension of a1 × C, and the output matrix obtained by performing the matrix multiplication operation on the second pre-input matrix and the post-input matrix is a matrix with the dimension of a2 × C.
The first target output matrix is a matrix of dimension A × C, on which the server can perform a row splitting operation; the resulting matrices form the first set of split matrices, which includes at least two split matrices. If the number of split matrices is two, the server may perform the row splitting operation on the matrix of dimension A × C to obtain the first split matrix and the second split matrix. Here, the first split matrix is a matrix of dimension a1 × C, the second split matrix is a matrix of dimension a2 × C, and A = a1 + a2. The first set of split matrices includes the first split matrix and the second split matrix.
Alternatively, when performing the row splitting operation, the a1 rows corresponding to the first split matrix and the a2 rows corresponding to the second split matrix in the first target output matrix may be determined first. The a1 rows corresponding to the first split matrix may be continuous or discontinuous rows, and the a2 rows corresponding to the second split matrix may be continuous or discontinuous rows. This is not limited in this embodiment.
By this optional embodiment, the row splitting operation is performed on the output matrix to obtain the two corresponding split matrices, which can improve the applicability of matrix fusion calculation.
As an alternative embodiment, the method further comprises:
s1, acquiring a front input matrix and a group of rear input matrices, wherein the group of rear input matrices comprises a plurality of rear input matrices;
s2, when detecting that matrix multiplication operation needs to be performed on each post-input matrix in the pre-input matrix and the group of post-input matrices respectively to obtain a second group of output matrices, and performing column merging operation on the second group of output matrices to obtain a second target output matrix, performing column merging operation on the group of post-input matrices to obtain a target post-input matrix;
and S3, performing matrix multiplication operation on the pre-input matrix and the target post-input matrix in the target processing device to obtain a second target output matrix.
the server may obtain a set of post-input matrices (i.e., a series of tensors) and pre-input matrices (i.e., the same tensor), and detect the processing logic of the input matrices. The performed detection may include: detecting whether a merge operation involving the output matrices is involved, e.g. the merge operation can be found, checking whether its inputs all result from a matrix multiplication, and checking whether it is a merge along the dimension "column", and also checking whether the first entries (leading inputs) of all the above matrix multiplications are the same tensor inputs. If so, all the above matrices (i.e., a set of postinput matrices) may be merged in the order of the original merge operation, and then a matrix multiplication operation may be performed using the merged result (i.e., the second target output matrix) as a postinput and the original postinput (i.e., the postinput matrix).
For example, it may be detected whether a matrix multiplication operation needs to be performed on the pre-input matrix and each post-input matrix in the set of post-input matrices, respectively, to obtain a second set of output matrices (at this time, the actual matrix multiplication operations are not performed), and whether a column merging operation needs to be performed on the second set of output matrices to obtain a second target output matrix (at this time, the actual column merging operation is not performed).
If it is detected that the above operations need to be performed, then, based on the column merging rule of the second set of output matrices, the server may first perform a column merging operation on the set of post-input matrices to obtain the target post-input matrix. The column merging rule may be used to indicate the column merging order of the second set of output matrices, and the number of columns of the resulting target post-input matrix is the sum of the numbers of columns of the post-input matrices included in the set of post-input matrices. By the operation rules of matrix arithmetic, the result obtained by performing the column merging operation first and then the matrix multiplication operation is identical to the result obtained by performing the matrix multiplication operations first and then the column merging operation.
After the target post-input matrix is obtained, the server may control to perform matrix multiplication on the pre-input matrix and the target post-input matrix to obtain a second target output matrix. The matrix multiplication operation described above may be performed within the target processing device. The target processing device may be a processing unit (e.g., GPU) for performing the matrix multiplication operation, and may be located inside the server or outside the server, which is not limited in this embodiment.
As an example, suppose a set of post-input matrices includes two post-input matrices, each of size 64 × 32, and the pre-input matrix is of size 32 × 64. If it is detected that matrix multiplication needs to be performed on the pre-input matrix and each post-input matrix respectively to obtain two matrices of size 32 × 32, and that a column merging operation then needs to be performed on this second set of output matrices to obtain a matrix of size 32 × 64, the column merging operation may instead be performed on the set of post-input matrices, based on the column merging rule of the second set of output matrices, to obtain a matrix of size 64 × 64. A GPU resource is then called by one call instruction, and matrix multiplication is performed in the GPU on the pre-input matrix and the 64 × 64 matrix to obtain the 32 × 64 matrix.
By the embodiment provided by the application, performing the column merging operation on the input matrices first and then the matrix multiplication operation can improve the utilization rate of the operation resources.
As an alternative embodiment, performing the column merging operation on a set of post-input matrices to obtain the target post-input matrix includes: when the first post-input matrix is a matrix of dimension B × c1, the second post-input matrix is a matrix of dimension B × c2, and the pre-input matrix is a matrix of dimension A × B, performing a column merging operation on the matrix of dimension B × c1 and the matrix of dimension B × c2 to obtain the target post-input matrix.
If the pre-input matrix and the post-input matrices are two-dimensional matrices, where the set of post-input matrices includes a first post-input matrix and a second post-input matrix, the first post-input matrix is a matrix of dimension B × c1 (i.e., a matrix with B rows and c1 columns), the second post-input matrix is a matrix of dimension B × c2 (i.e., a matrix with B rows and c2 columns), and the pre-input matrix is a matrix of dimension A × B (i.e., a matrix with A rows and B columns), the server may perform a column merging operation on the matrix of dimension B × c1 and the matrix of dimension B × c2 to obtain the target post-input matrix, whose dimension is B × C, where C = c1 + c2. Here, A, B, c1, and c2 are natural numbers, and C is a natural number greater than 1.
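The column-dimension case mirrors the row-dimension case; a small illustrative sketch (NumPy standing in for the tensor framework, dimensions chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(3)
pre = rng.standard_normal((4, 6))     # pre-input matrix, A x B
post1 = rng.standard_normal((6, 2))   # first post-input matrix, B x c1
post2 = rng.standard_normal((6, 3))   # second post-input matrix, B x c2

# Merge the post-inputs along the column dimension, then multiply once.
target_post = np.concatenate([post1, post2], axis=1)   # B x (c1 + c2)
fused = pre @ target_post                              # A x C

# Reference: multiply separately, then column-merge the outputs.
unfused = np.concatenate([pre @ post1, pre @ post2], axis=1)

assert target_post.shape == (6, 5)
assert np.allclose(fused, unfused)
```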
When column merging is performed, the server may merge the second post input matrix before or after the first post input matrix, or mix the column vectors of the first post input matrix and the column vectors of the second post input matrix according to a merging rule, in which case, the column vectors belonging to the same post input matrix are not necessarily all adjacent.
Correspondingly, performing a matrix multiplication operation on the pre-input matrix and the target post-input matrix in the target processing device to obtain a second target output matrix includes: performing a matrix multiplication operation on the matrix of dimension A × B and the matrix of dimension B × C in the target processing device to obtain the second target output matrix, wherein the second target output matrix is a matrix of dimension A × C.
The server may perform a matrix multiplication operation on a matrix of dimension a × B and a matrix of dimension B × C in the target processing device. When matrix multiplication is performed, multiplication operations may be performed row by row, that is, multiplication operations are performed on each row of a matrix with a dimension of a × B and each column of a matrix with a dimension of B × C, respectively, and a matrix obtained after performing the matrix multiplication operations is a second target output matrix.
Illustratively, as shown in FIG. 9, the pre-input is [A, B], and the set of post-input matrices consists of post-input 1 [B, c1] (i.e., a matrix of B rows and c1 columns) and post-input 2 [B, c2] (i.e., a matrix of B rows and c2 columns). First, matrix multiplication is performed on the pre-input [A, B] processed according to parameter 1 with post-input 1 [B, c1] and post-input 2 [B, c2], each processed according to parameter 2, respectively, to obtain output result 1 [A, c1] and output result 2 [A, c2]; then, a column merging operation is performed on output result 1 [A, c1] and output result 2 [A, c2] to obtain the output result [A, C]. This information processing mode performs two matrix multiplications of small matrices, incurs two copies of overhead on the hardware, and requires two call instructions to be issued; moreover, because two small matrix multiplications are performed, the utilization rate of the operation resources called each time is low.
To optimize the processing flow, as shown in FIG. 10, a merging operation may first be performed on post-input 1 [B, c1] processed according to parameter 1 and post-input 2 [B, c2] processed according to parameter N, to obtain the post-input [B, C]; matrix multiplication may then be performed on the pre-input [A, B] processed according to parameter 1 and the post-input [B, C] processed according to parameter 2, to obtain the output result [A, C]. After optimization, only one matrix multiplication of a large matrix is performed, the overhead on hardware is a single copy, and only one call instruction needs to be issued; moreover, because only one large matrix multiplication is performed, the utilization rate of the operation resources called each time can be improved.
By the optional embodiment, the column merging operation is performed on the two-dimensional input matrix first, and then the matrix multiplication operation of the matrix is performed, so that the applicability of matrix fusion calculation can be improved, and the utilization rate of the called operation resources can be improved.
As an alternative embodiment, the method further comprises:
s1, when detecting that a second preset operation needs to be executed on at least one output matrix in a second group of output matrices, executing a column splitting operation on a second target output matrix to obtain a second group of split matrices;
and S2, executing a second preset operation on at least one split matrix in the second group of split matrices.
In addition to being merged, the result of an original matrix multiplication may also be an input to other operations. For example, as shown in FIG. 11, in addition to the merging operation performed on output result 1 [A, c1] and output result 2 [A, c2], another operation 3 is performed on output result 1 [A, c1] and another operation 4 is performed on output result 2 [A, c2].
If it is detected that a second preset operation needs to be performed on at least one output matrix in the second set of output matrices (the second preset operations performed on different output matrices may be the same or different), that is, the output of a matrix multiplication performed with a post-input matrix is also the input of other operations, the result of the new matrix multiplication can be split to recover the original matrix multiplication tensors: a column splitting operation is performed on the second target output matrix to obtain the second set of split matrices.
After obtaining the second set of split matrices, the server may perform a second preset operation on at least one split matrix in the second set of split matrices. If the number of the split matrixes included in the at least one split matrix is multiple, the corresponding second preset operation can be respectively executed on each split matrix in the at least one split matrix.
Optionally, in this embodiment, if at least one split matrix is part of the second group of split matrices, a column splitting operation may also be performed on the second target output matrix to obtain at least one split matrix; and executing a second preset operation on the at least one split matrix.
For example, as shown in FIG. 12, after the output matrix [A, C] is obtained, a splitting operation may be performed on it to obtain output result 1 [A, c1] and output result 2 [A, c2]; then the other operation 3 is performed on output result 1 [A, c1] and the other operation 4 is performed on output result 2 [A, c2].
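The column-splitting flow of FIG. 12 can be sketched as follows; the two "other operations" (an offset and a transpose) are hypothetical stand-ins, and NumPy stands in for the tensor framework:

```python
import numpy as np

A, B, c1, c2 = 3, 4, 2, 3
rng = np.random.default_rng(4)
pre = rng.random((A, B))
post1 = rng.random((B, c1))
post2 = rng.random((B, c2))

# Fused multiplication followed by a column splitting operation.
target_out = pre @ np.concatenate([post1, post2], axis=1)   # A x (c1 + c2)
split3, split4 = np.split(target_out, [c1], axis=1)         # A x c1 and A x c2

# Second preset operations on each split matrix (hypothetical examples).
other3 = split3 + 1.0   # "other operation 3"
other4 = split4.T       # "other operation 4"

# Each split matrix equals the corresponding original small-matrix product.
assert np.allclose(split3, pre @ post1)
assert np.allclose(split4, pre @ post2)
```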
By the optional embodiment, the column splitting operation is performed on the second target output matrix, and other operations are performed on the split matrix obtained through splitting, so that the method can adapt to scenes with other operation requirements, and the compatibility of matrix fusion calculation is improved.
As an alternative embodiment, performing the column splitting operation on the second target output matrix to obtain the second set of split matrices includes:
when the first post-input matrix is a matrix with dimension B x C1, the second post-input matrix is a matrix with dimension B x C2, the pre-input matrix is a matrix with dimension A x B, and the second target output matrix is a matrix with dimension A x C, column splitting operation is performed on the matrix with dimension A x C to obtain a third split matrix and a fourth split matrix.
If the pre-input matrix is a matrix with dimension A multiplied by B, the first post-input matrix is a matrix with dimension B multiplied by c1, and the second post-input matrix is a matrix with dimension B multiplied by c2, the output matrix obtained by performing matrix multiplication operation on the pre-input matrix and the first post-input matrix is a matrix with dimension A multiplied by c1, and the output matrix obtained by performing matrix multiplication operation on the pre-input matrix and the second post-input matrix is a matrix with dimension A multiplied by c2.
The second target output matrix is a two-dimensional matrix of dimension A × C, and the server may perform a column splitting operation on it; the resulting matrices form the second set of split matrices, which includes at least two split matrices. If the number of split matrices is two, the server may perform the column splitting operation on the matrix of dimension A × C to obtain the third split matrix and the fourth split matrix. Here, the third split matrix is a matrix of dimension A × c1, the fourth split matrix is a matrix of dimension A × c2, and C = c1 + c2. The second set of split matrices includes the third split matrix and the fourth split matrix.
Alternatively, when performing the column splitting operation, c1 columns corresponding to the third split matrix and c2 columns corresponding to the fourth split matrix in the second target output matrix may be determined first. The c1 columns corresponding to the third split matrix may be a plurality of continuous columns or a plurality of discontinuous columns, and the c2 columns corresponding to the fourth split matrix may be a plurality of continuous columns or a plurality of discontinuous columns. This is not limited in this embodiment.
By this optional embodiment, the column splitting operation is performed on the two-dimensional output matrix to obtain the two corresponding split matrices, which can improve the applicability of matrix fusion calculation.
As an alternative implementation, performing a matrix multiplication operation on a target pre-input matrix and a post-input matrix in a target processing device to obtain a first target output matrix, includes:
and performing matrix multiplication operation on the target preposed input matrix and the postposed input matrix in the GPU to obtain a first target output matrix, wherein the target processing device comprises the GPU.
The target processing device may include a GPU. The GPU may be disposed on the server and may be a multi-core GPU, and the matrix multiplication operation may be performed by the GPU. In matrix processing, a processor or other control component on the server may issue a call instruction to the GPU, instructing the GPU to perform the corresponding matrix multiplication operation.
When matrix fusion calculation is performed, if matrix multiplication operation of a matrix is performed first and then combination operation of the matrix is performed, a plurality of call instructions need to be issued to call the GPU to perform the matrix multiplication operation respectively. If only one GPU is provided, the matrix multiplication operation needs to be executed in sequence, and the called GPU resources cannot be fully utilized in each execution. If there are multiple GPU cores, although different GPU cores may be used to perform different matrix multiplication operations, it is still not possible to fully utilize the GPU resources per call.
In this embodiment, GPU resources may be invoked by a single call instruction, and the matrix multiplication operation is performed on the target pre-input matrix and the post-input matrix in the invoked GPU to obtain the first target output matrix. Since only one GPU call is made, only one call instruction needs to be issued; in addition, because one GPU completes the matrix multiplication operation on the larger merged matrix, the utilization rate of GPU resources can be improved.
For example, if the matrix multiplication operations are performed first and the merging of the output matrices afterwards, two GPU cores need to be invoked to perform the corresponding matrix multiplication operations, each at a resource utilization rate of 30%. If the merging of the input matrices is performed first and the matrix multiplication afterwards, only one GPU core needs to be invoked to perform the corresponding matrix multiplication operation, at a resource utilization rate of 60%.
By this embodiment, invoking the GPU to perform the matrix multiplication operation exploits the GPU's high computation speed, improving the speed of the matrix fusion calculation while also improving the utilization rate of GPU resources.
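The merge-then-multiply fusion underlying this embodiment can be sketched as follows. NumPy is used here only as a stand-in for the GPU, an assumption of this sketch: in practice each `@` below would correspond to one GPU call, so the fused form needs one call on a larger matrix instead of two calls on smaller ones. All matrix names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
a1, a2, B, C = 3, 2, 4, 5
X1 = rng.standard_normal((a1, B))  # first pre-input matrix
X2 = rng.standard_normal((a2, B))  # second pre-input matrix
Y = rng.standard_normal((B, C))    # shared post-input matrix

# Multiply-then-merge: two separate multiplications (two GPU calls),
# then a row merging operation on the first set of output matrices.
merged_after = np.vstack([X1 @ Y, X2 @ Y])

# Merge-then-multiply: row-merge the pre-input matrices into the target
# pre-input matrix of dimension (a1 + a2) x B, then one multiplication
# (a single GPU call).
merged_before = np.vstack([X1, X2]) @ Y

# Both orders yield the same first target output matrix of dimension A x C.
assert np.allclose(merged_after, merged_before)
```

The equality holds because matrix multiplication acts row by row on the left operand, so concatenating rows before or after the multiplication makes no difference to the result, only to how many calls are issued.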
As an alternative embodiment, obtaining a set of pre-input matrices and post-input matrices includes:
the method comprises the steps of obtaining a set of pre-input matrixes from a set of multi-dimensional matrixes to be processed in a target prediction model, and obtaining a post-input matrix from the set of multi-dimensional matrixes to be processed in the target prediction model, wherein the target prediction model is used for determining a predicted target object according to the set of multi-dimensional matrixes and the target multi-dimensional matrix.
The set of pre-input matrices and the post-input matrix may be matrices obtained from a target prediction model. The target prediction model may be arranged on a server and may be used to determine a predicted target object according to the set of multi-dimensional matrices and the target multi-dimensional matrix. The target prediction model may be a neural network model whose input may be an image to be recognized. The set of multi-dimensional matrices and the target multi-dimensional matrix may be inputs to a target convolutional layer of the neural network model (or to another layer that involves matrix multiplication).
When the set of multi-dimensional matrices and the target multi-dimensional matrix are processed, the set of pre-input matrices may be obtained from the set of multi-dimensional matrices, the post-input matrix may be obtained from the target multi-dimensional matrix, and the set of pre-input matrices and the post-input matrix may then be processed by the information processing method shown in the foregoing embodiments, which has already been described and is not repeated here.
It should be noted that the fusion calculation scheme provided in this embodiment, in which a plurality of matrices are multiplied by the same matrix and then merged, may be applied to tensor calculations of higher dimension, that is, batched-matmul (batch matrix multiplication): when all the other dimensions are batch dimensions, those dimensions may be kept unchanged. The fusion calculation scheme may further be generalized to other operations that can be expressed by the Einstein summation convention, such as the transpose operation and the diagonal operation. Einstein summation is a commonly used tensor operation that multiplies two tensors scalar by scalar and then contracts (accumulates) along some specified dimensions; matrix multiplication is one example of an Einstein summation operation.
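The generalization to Einstein summation can be sketched with NumPy's `einsum`, which is used here as an illustrative stand-in for the Einstein-summation operations the embodiment mentions; the subscript string and all names and sizes are assumptions of this sketch. The subscripts "dab,dbc->dac" express exactly the batched matrix multiplication described above, with d treated as a batch dimension that is kept unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
D, a1, a2, B, C = 2, 3, 4, 5, 6
X1 = rng.standard_normal((D, a1, B))  # D first pre-input matrices
X2 = rng.standard_normal((D, a2, B))  # D second pre-input matrices
Y = rng.standard_normal((D, B, C))    # D post-input matrices

# Separate Einstein summations followed by a row merge along the
# a-dimension (the non-batch, non-contracted dimension)...
separate = np.concatenate(
    [np.einsum("dab,dbc->dac", X1, Y),
     np.einsum("dab,dbc->dac", X2, Y)], axis=1)

# ...equals one merge followed by a single Einstein summation call.
fused = np.einsum("dab,dbc->dac", np.concatenate([X1, X2], axis=1), Y)

assert np.allclose(separate, fused)
```

The same rewriting applies to any einsum whose merged dimension is neither a batch dimension nor a contracted dimension, which is the sense in which the scheme generalizes beyond plain matrix multiplication.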
There may be a plurality of sets of pre-input matrices to be processed in the set of multi-dimensional matrices, and there may also be a plurality of post-input matrices to be processed in the target multi-dimensional matrix. The server may process the different sets of pre-input matrices and post-input matrices in series or in parallel. The processing flows of the different sets of pre-input matrices and post-input matrices may be similar, or may differ based on different processing targets; this is not limited in this embodiment.
By this embodiment, the input matrices are acquired from the multi-dimensional matrices to be processed in the prediction model and the matrix fusion calculation is performed on them, which improves the efficiency of target object prediction and the applicability of the matrix fusion calculation.
As an alternative embodiment, obtaining the set of pre-input matrices from the set of multi-dimensional matrices to be processed in the target prediction model and obtaining the post-input matrix from the target multi-dimensional matrix to be processed in the target prediction model includes: when the set of multi-dimensional matrices includes a matrix of dimension D × a1 × B and a matrix of dimension D × a2 × B, and the target multi-dimensional matrix is a matrix of dimension D × B × C, obtaining D first pre-input matrices of dimension a1 × B and D second pre-input matrices of dimension a2 × B from the set of multi-dimensional matrices, and obtaining D post-input matrices of dimension B × C from the target multi-dimensional matrix, where D is a natural number greater than 1, A = a1 + a2, and a1, a2, B, and C are natural numbers.
The set of multi-dimensional matrices and the target multi-dimensional matrix may each be three-dimensional. The set of multi-dimensional matrices includes a matrix of dimension D × a1 × B (which may be regarded as D matrices of dimension a1 × B) and a matrix of dimension D × a2 × B (which may be regarded as D matrices of dimension a2 × B). Correspondingly, obtaining the set of pre-input matrices from the set of multi-dimensional matrices to be processed in the target prediction model may include: obtaining D first pre-input matrices of dimension a1 × B and D second pre-input matrices of dimension a2 × B from the set of multi-dimensional matrices. In this way, D sets of pre-input matrices are obtained, each set comprising a first pre-input matrix of dimension a1 × B and a second pre-input matrix of dimension a2 × B.
Optionally, performing the row merging operation on the set of pre-input matrices to obtain the target pre-input matrix includes: performing the row merging operation on the D first pre-input matrices of dimension a1 × B and the D second pre-input matrices of dimension a2 × B, respectively, to obtain D target pre-input matrices.
When the merging operation on the input matrices is performed, the server may perform the row merging operation on the D first pre-input matrices of dimension a1 × B and the D second pre-input matrices of dimension a2 × B, respectively, where each first pre-input matrix of dimension a1 × B is row-merged with the corresponding second pre-input matrix of dimension a2 × B, so as to obtain the D target pre-input matrices. Each target pre-input matrix is a matrix of dimension A × B obtained by performing the row merging operation on one first pre-input matrix and one second pre-input matrix.
In this embodiment, performing the matrix multiplication operation on the target pre-input matrices and the post-input matrices in the target processing device to obtain the first target output matrices includes: sequentially performing the matrix multiplication operation on the D target pre-input matrices and the D post-input matrices in the target processing device to obtain D first target output matrices of dimension A × C, where each first target output matrix is obtained by performing the matrix multiplication operation on one target pre-input matrix and one post-input matrix.
The target multi-dimensional matrix may be a matrix of dimension D × B × C, which may be regarded as comprising D two-dimensional matrices of dimension B × C. Correspondingly, obtaining the post-input matrices from the target multi-dimensional matrix to be processed in the target prediction model includes: obtaining D post-input matrices of dimension B × C from the target multi-dimensional matrix.
The D target pre-input matrices and the D post-input matrices may be in a one-to-one correspondence, in which each target pre-input matrix is associated with exactly one post-input matrix and each post-input matrix with exactly one target pre-input matrix; in a one-to-many relationship, in which one target pre-input matrix may be associated with one or more post-input matrices while each post-input matrix is associated with only one target pre-input matrix; or in a many-to-many relationship, in which each target pre-input matrix may be associated with one or more post-input matrices and each post-input matrix with one or more target pre-input matrices.
Optionally, performing a matrix multiplication operation on the target pre-input matrix and the post-input matrix in the target processing apparatus to obtain the first target output matrix includes: and sequentially performing matrix multiplication operation on the D target preposed input matrixes and the D post input matrixes in the target processing device to obtain D first target output matrixes with dimension of A multiplied by C.
For the scenario in which the D target pre-input matrices and the D post-input matrices are in a one-to-one correspondence, the matrix multiplication operation may be sequentially performed in the target processing device on each target pre-input matrix and the corresponding post-input matrix to obtain D first target output matrices of dimension A × C. The different matrix multiplication operations may be performed in series or in parallel, which is not limited in this embodiment.
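The one-to-one batched flow above can be sketched as follows. NumPy again stands in for the target processing device, an assumption of this sketch, and the merge is done with `np.vstack`; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
D, a1, a2, B, C = 3, 2, 4, 5, 6
first = [rng.standard_normal((a1, B)) for _ in range(D)]   # D first pre-input matrices
second = [rng.standard_normal((a2, B)) for _ in range(D)]  # D second pre-input matrices
posts = [rng.standard_normal((B, C)) for _ in range(D)]    # D post-input matrices

# Row-merge each pair of pre-input matrices into a target pre-input
# matrix of dimension (a1 + a2) x B.
targets = [np.vstack([f, s]) for f, s in zip(first, second)]

# Sequentially multiply each target pre-input matrix by its
# corresponding post-input matrix (one-to-one correspondence),
# yielding D first target output matrices of dimension A x C.
outputs = [t @ p for t, p in zip(targets, posts)]

# Each output equals the row merge of the two separate products.
for f, s, p, o in zip(first, second, posts, outputs):
    assert o.shape == (a1 + a2, C)
    assert np.allclose(o, np.vstack([f @ p, s @ p]))
```

Halving the number of multiplications per batch element in this way is what lets each call to the target processing device carry a larger workload.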
Optionally, for the scenario in which every target pre-input matrix is multiplied with every post-input matrix, performing the matrix multiplication operation on the target pre-input matrices and the post-input matrices in the target processing device to obtain the target output matrices includes: sequentially performing the matrix multiplication operation on each of the D target pre-input matrices and each of the D post-input matrices in the target processing device to obtain D × D target output matrices of dimension A × C.
According to this embodiment, a plurality of sets of pre-input matrices are obtained from the set of multi-dimensional matrices to produce a plurality of target pre-input matrices, a plurality of post-input matrices are obtained from the target multi-dimensional matrix, and the matrix multiplication operation is performed on each target pre-input matrix and the corresponding post-input matrix; this simplifies the matrix fusion calculation on multi-dimensional matrices and improves its efficiency.
As an alternative embodiment, the method further comprises:
When it is detected that the model structure in a first prediction model is used to perform the matrix multiplication operation on each pre-input matrix in a set of pre-input matrices and a post-input matrix, respectively, to obtain a first set of output matrices, and to perform the row merging operation on the first set of output matrices to obtain a first target output matrix, the model structure in the first prediction model is adjusted to obtain a second prediction model, where the model structure in the second prediction model is used to perform the row merging operation on the set of pre-input matrices to obtain a target pre-input matrix and to perform the matrix multiplication operation on the target pre-input matrix and the post-input matrix to obtain the first target output matrix.
In this embodiment, if it is detected that the model structure in the first prediction model is used to perform the matrix multiplication operation on each pre-input matrix in the set of pre-input matrices and the post-input matrix, respectively, to obtain the first set of output matrices, and to perform the row merging operation on the first set of output matrices to obtain the first target output matrix, the server may adjust the model structure in the first prediction model to obtain the second prediction model. The adjustment changes the model structure in the first prediction model so that it performs the row merging operation on the set of pre-input matrices to obtain the target pre-input matrix and then performs the matrix multiplication operation on the target pre-input matrix and the post-input matrix to obtain the first target output matrix.
The adjustment may be realized by adjusting the execution logic of the first prediction model: a structure adjustment program for the first prediction model may be preset on the server, and the model structure in the first prediction model may be adjusted by calling this program to obtain the second prediction model. Optionally, the second prediction model may be the target prediction model in the foregoing embodiments. After the model structure adjustment, the target processing device may be configured to perform the matrix multiplication operation on the target pre-input matrix and the post-input matrix in the second prediction model to obtain the first target output matrix.
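One way such a structure adjustment program could work can be sketched as a pattern rewrite on a toy operation graph. The graph representation, the operation names (`matmul`, `concat_rows`), and the rewrite function are hypothetical assumptions of this sketch, not the patent's actual implementation; the sketch only shows the "multiply each pre-input by the same post-input, then row-merge" pattern being rewritten into "row-merge first, then one multiplication".

```python
def rewrite_multiply_then_concat(graph):
    """graph: list of (op, inputs, output) tuples, in execution order.

    Detects a row-concat whose inputs are all matmuls sharing the same
    right operand, and replaces them with concat-then-matmul.
    """
    out = []
    for op, inputs, output in graph:
        if op == "concat_rows":
            producers = {o: (p, ins) for p, ins, o in graph}
            feeds = [producers.get(i) for i in inputs]
            if (all(f and f[0] == "matmul" for f in feeds)
                    and len({f[1][1] for f in feeds}) == 1):
                pre = [f[1][0] for f in feeds]   # pre-input matrices
                post = feeds[0][1][1]            # shared post-input matrix
                # Drop the now-unneeded separate matmuls and fuse.
                out = [n for n in out if n[2] not in inputs]
                out.append(("concat_rows", pre, "merged_pre"))
                out.append(("matmul", ["merged_pre", post], output))
                continue
        out.append((op, inputs, output))
    return out

# Example: two multiplications followed by a row merge...
graph = [("matmul", ["X1", "Y"], "O1"),
         ("matmul", ["X2", "Y"], "O2"),
         ("concat_rows", ["O1", "O2"], "O")]
# ...becomes one merge followed by one multiplication.
rewritten = rewrite_multiply_then_concat(graph)
```

A real structure adjustment program would operate on the prediction model's actual computation graph, but the detection condition is the same: all multiplications feeding the merge share the same post-input matrix.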
By the embodiment, the model structure of the prediction model is adjusted based on the detected operation performed by the model structure of the prediction model, so that the flexibility of model configuration can be improved, and the efficiency of target prediction by the prediction model can be improved.
As an alternative embodiment, the method further comprises:
When it is detected that the model structure in a third prediction model is used to perform the matrix multiplication operation on a pre-input matrix and each post-input matrix in a set of post-input matrices, respectively, to obtain a second set of output matrices, and to perform the column merging operation on the second set of output matrices to obtain a second target output matrix, the model structure in the third prediction model is adjusted to obtain a fourth prediction model, where the model structure in the fourth prediction model is used to perform the column merging operation on the set of post-input matrices to obtain a target post-input matrix and to perform the matrix multiplication operation on the pre-input matrix and the target post-input matrix to obtain the second target output matrix.
In this embodiment, if it is detected that the model structure in the third prediction model is used to perform the matrix multiplication operation on the pre-input matrix and each post-input matrix in the set of post-input matrices, respectively, to obtain the second set of output matrices, and to perform the column merging operation on the second set of output matrices to obtain the second target output matrix, the server may adjust the model structure in the third prediction model to obtain the fourth prediction model. The adjustment changes the model structure in the third prediction model so that it performs the column merging operation on the set of post-input matrices to obtain the target post-input matrix and then performs the matrix multiplication operation on the pre-input matrix and the target post-input matrix to obtain the second target output matrix.
The adjustment may be implemented by adjusting the execution logic of the third prediction model: a structure adjustment program for the third prediction model may be preset on the server, and the fourth prediction model may be obtained by calling this program to adjust the model structure in the third prediction model. Optionally, the fourth prediction model may be the target prediction model in the foregoing embodiments. After the model structure adjustment, the target processing device may be configured to perform the matrix multiplication operation on the pre-input matrix and the target post-input matrix in the fourth prediction model to obtain the second target output matrix.
By the embodiment, the model structure of the prediction model is adjusted based on the detected operation performed by the model structure of the prediction model, so that the flexibility of model configuration can be improved, and the efficiency of target prediction by the prediction model can be improved.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that the acts and modules referred to are not necessarily required in this application.
According to another aspect of the embodiments of the present application, there is also provided an information processing apparatus for implementing the above-described information processing method. As shown in fig. 13, the apparatus includes:
a first obtaining unit 1302, configured to obtain a set of pre-input matrices and post-input matrices, where the set of pre-input matrices includes a plurality of pre-input matrices;
a first executing unit 1304, configured to perform a matrix multiplication operation on each pre-input matrix and a post-input matrix in a set of pre-input matrices to obtain a first set of output matrices, and perform a row merging operation on the first set of output matrices to obtain a first target output matrix, when it is detected that a first target output matrix needs to be obtained;
a second executing unit 1306, configured to perform matrix multiplication on the target pre-input matrix and the post-input matrix in the target processing apparatus, so as to obtain a first target output matrix.
It should be noted that the first obtaining unit 1302 in this embodiment may be configured to execute the step S202, the first executing unit 1304 in this embodiment may be configured to execute the step S204, and the second executing unit 1306 in this embodiment may be configured to execute the step S206.
According to the embodiment provided in this application, a set of pre-input matrices and a post-input matrix are obtained, where the set of pre-input matrices includes a plurality of pre-input matrices; when it is detected that the matrix multiplication operation needs to be performed on each pre-input matrix in the set of pre-input matrices and the post-input matrix, respectively, to obtain a first set of output matrices, and the row merging operation needs to be performed on the first set of output matrices to obtain a first target output matrix, the row merging operation is performed on the set of pre-input matrices to obtain a target pre-input matrix; the matrix multiplication operation is then performed on the target pre-input matrix and the post-input matrix in the target processing device to obtain the first target output matrix. This solves the technical problem in related-art information processing methods that the amount of computation per resource call is too small, resulting in a low utilization rate of computing resources, and thereby improves the utilization rate of computing resources.
As an alternative embodiment, the first execution unit 1304 includes: the first execution module, the second execution unit 1306 includes: a second execution module, wherein,
a first execution module, configured to, when the first pre-input matrix is a matrix with a dimension of a1 × B, the second pre-input matrix is a matrix with a dimension of a2 × B, and the post-input matrix is a matrix with a dimension of B × C, perform a row merge operation on the matrix with the dimension of a1 × B and the matrix with the dimension of a2 × B to obtain a target pre-input matrix, where a set of pre-input matrices includes the first pre-input matrix and the second pre-input matrix, the target pre-input matrix is a matrix with a dimension of a × B, a is a natural number greater than 1, a = a1+ a2, and a1, a2, B, and C are natural numbers;
and the second execution module is used for executing matrix multiplication operation on the matrix with the dimension of A multiplied by B and the matrix with the dimension of B multiplied by C in the target processing device to obtain a first target output matrix, wherein the first target output matrix is the matrix with the dimension of A multiplied by C.
As an alternative example of the present embodiment, reference may be made to the example shown in the information processing method, and details are not described herein in the present embodiment.
As an alternative embodiment, the above apparatus further comprises:
the third execution unit is used for executing a line splitting operation on the first target output matrix to obtain a first group of splitting matrixes when detecting that a first preset operation needs to be executed on at least one output matrix in the first group of output matrixes;
and the fourth execution unit is used for executing a first preset operation on at least one split matrix in the first group of split matrixes.
As an alternative example of the present embodiment, reference may be made to the example shown in the information processing method, and details are not described herein in the present embodiment.
As an alternative embodiment, the third execution unit includes:
and a third execution module, configured to, when the first pre-input matrix is a matrix with a dimension of a1 × B, the second pre-input matrix is a matrix with a dimension of a2 × B, the post-input matrix is a matrix with a dimension of B × C, and the first target output matrix is a matrix with a dimension of a × C, perform a row splitting operation on the matrix with a dimension of a × C to obtain a first split matrix and a second split matrix, where a set of pre-input matrices includes the first pre-input matrix and the second pre-input matrix, the first set of split matrices includes the first split matrix and the second split matrix, the first split matrix is a matrix with a dimension of a1 × C, the second split matrix is a matrix with a dimension of a2 × C, a is a natural number greater than 1, a = a1+ a2, and a1, a2, B, and C are natural numbers.
As an alternative example of this embodiment, reference may be made to the example shown in the information processing method described above, and details are not described herein in this embodiment.
As an alternative embodiment, the above apparatus further comprises:
the second acquisition unit is used for acquiring a front input matrix and a group of rear input matrices, wherein the group of rear input matrices comprises a plurality of rear input matrices;
a fifth execution unit, configured to, when it is detected that the matrix multiplication operation needs to be performed on the pre-input matrix and each post-input matrix in the set of post-input matrices, respectively, to obtain a second set of output matrices, and the column merging operation needs to be performed on the second set of output matrices to obtain a second target output matrix, perform the column merging operation on the set of post-input matrices to obtain a target post-input matrix;
and the sixth execution unit is used for executing matrix multiplication operation on the preposed input matrix and the target postposed input matrix in the target processing device to obtain a second target output matrix.
As an alternative example of the present embodiment, reference may be made to the example shown in the information processing method, and details are not described herein in the present embodiment.
As an alternative embodiment, the fifth execution unit includes: a fourth execution module, the sixth execution unit comprising: a fifth execution module, wherein,
a fourth execution module, configured to, when the first post-input matrix is a matrix with a dimension of B × C1, the second post-input matrix is a matrix with a dimension of B × C2, and the pre-input matrix is a matrix with a dimension of a × B, perform column merging operation on the matrix with the dimension of B × C1 and the matrix with the dimension of B × C2 to obtain a target post-input matrix, where a set of post-input matrices includes the first post-input matrix and the second post-input matrix, the target post-input matrix is a matrix with a dimension of B × C, C is a natural number greater than 1, C = C1+ C2, and a, B, C1, and C2 are natural numbers;
and the fifth execution module is used for executing matrix multiplication operation on the matrix with the dimension of A multiplied by B and the matrix with the dimension of B multiplied by C in the target processing device to obtain a second target output matrix, wherein the second target output matrix is the matrix with the dimension of A multiplied by C.
As an alternative example of the present embodiment, reference may be made to the example shown in the information processing method, and details are not described herein in the present embodiment.
As an alternative embodiment, the above apparatus further comprises:
the seventh execution unit is configured to, when it is detected that a second preset operation needs to be performed on at least one output matrix in the second group of output matrices, perform a column splitting operation on a second target output matrix to obtain a second group of split matrices;
and the eighth execution unit is used for executing a second preset operation on at least one split matrix in the second group of split matrices.
As an alternative example of this embodiment, reference may be made to the example shown in the information processing method described above, and details are not described herein in this embodiment.
As an alternative implementation, the seventh execution unit includes:
a sixth executing module, configured to, when the first post-input matrix is a matrix with a dimension of B × C1, the second post-input matrix is a matrix with a dimension of B × C2, the pre-input matrix is a matrix with a dimension of a × B, and the second target output matrix is a matrix with a dimension of a × C, perform a column splitting operation on the matrix with a dimension of a × C to obtain a third split matrix and a fourth split matrix, where one set of post-input matrices includes the first post-input matrix and the second post-input matrix, the second set of split matrices includes the third split matrix and the fourth split matrix, the third split matrix is a matrix with a dimension of a × C1, the fourth split matrix is a matrix with a dimension of a × C2, C is a natural number greater than 1, C = C1+ C2, and a, B, C1, and C2 are natural numbers.
As an alternative example of the present embodiment, reference may be made to the example shown in the information processing method, and details are not described herein in the present embodiment.
As an alternative embodiment, the second execution unit 1306 includes:
a seventh execution module, configured to perform the matrix multiplication operation on the target pre-input matrix and the post-input matrix in a GPU to obtain the first target output matrix, where the target processing device includes the GPU.
As an alternative example of the present embodiment, reference may be made to the example shown in the information processing method, and details are not described herein in the present embodiment.
As an alternative embodiment, the first obtaining unit 1302 includes:
the device comprises an acquisition module, a prediction module and a prediction module, wherein the acquisition module is used for acquiring the set of front input matrixes from a set of multi-dimensional matrixes to be processed in a target prediction model and acquiring the rear input matrixes from the target multi-dimensional matrixes to be processed in the target prediction model, and the target prediction model is used for determining predicted target objects according to the set of multi-dimensional matrixes and the target multi-dimensional matrixes.
As an alternative example of the present embodiment, reference may be made to the example shown in the information processing method, and details are not described herein in the present embodiment.
As an alternative embodiment, the obtaining module includes: an obtaining submodule; the first execution unit 1304 includes: an eighth execution module; and the second execution unit 1306 includes: a ninth execution module, where,
the obtaining submodule is configured to, when the set of multi-dimensional matrices includes a matrix of dimension D × a1 × B and a matrix of dimension D × a2 × B and the target multi-dimensional matrix is a matrix of dimension D × B × C, obtain D first pre-input matrices of dimension a1 × B and D second pre-input matrices of dimension a2 × B from the set of multi-dimensional matrices, and obtain D post-input matrices of dimension B × C from the target multi-dimensional matrix, where D is a natural number greater than 1, A = a1 + a2, and a1, a2, B, and C are natural numbers;
an eighth execution module, configured to perform row merging on D first pre-input matrices with a1 × B dimensionality and D second pre-input matrices with a2 × B dimensionality, respectively, to obtain D target pre-input matrices, where a target pre-input matrix is a matrix with a × B dimensionality, and each target pre-input matrix is a matrix obtained by performing row merging on one first pre-input matrix and one second pre-input matrix;
and a ninth executing module, configured to sequentially execute matrix multiplication operations on D target pre-input matrices and D post-input matrices in the target processing apparatus, respectively, to obtain D first target output matrices with dimensions of a × C, where each first target output matrix is a matrix obtained by executing matrix multiplication operations on one target pre-input matrix and one post-input matrix.
As an alternative example of this embodiment, reference may be made to the example shown in the information processing method described above, and details are not described herein in this embodiment.
As an alternative embodiment, the above apparatus further comprises:
a first adjusting unit, configured to, when it is detected that a model structure in a first prediction model is used to respectively perform a matrix multiplication operation on each pre-input matrix in a group of pre-input matrices and the post-input matrix to obtain a first group of output matrices, and to perform a row merging operation on the first group of output matrices to obtain a first target output matrix, adjust the model structure in the first prediction model to obtain a second prediction model, wherein the model structure in the second prediction model is used to perform the row merging operation on the group of pre-input matrices to obtain a target pre-input matrix, and to perform the matrix multiplication operation on the target pre-input matrix and the post-input matrix to obtain the first target output matrix;
and the target processing device is configured to perform the matrix multiplication operation on the target pre-input matrix and the post-input matrix in the second prediction model to obtain the first target output matrix.
As an optional example of this embodiment, reference may be made to the examples described above for the information processing method; details are not repeated here.
As an alternative embodiment, the above apparatus further comprises:
a second adjusting unit, configured to, when it is detected that a model structure in a third prediction model is used to respectively perform a matrix multiplication operation on the pre-input matrix and each post-input matrix in the set of post-input matrices to obtain a second set of output matrices, and to perform a column merging operation on the second set of output matrices to obtain a second target output matrix, adjust the model structure in the third prediction model to obtain a fourth prediction model, wherein the model structure in the fourth prediction model is used to perform the column merging operation on the set of post-input matrices to obtain a target post-input matrix, and to perform the matrix multiplication operation on the pre-input matrix and the target post-input matrix to obtain the second target output matrix;
and the target processing device is configured to perform the matrix multiplication operation on the pre-input matrix and the target post-input matrix in the fourth prediction model to obtain the second target output matrix.
As an optional example of this embodiment, reference may be made to the examples described above for the information processing method; details are not repeated here.
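The column-merging counterpart handled by the second adjusting unit admits the same kind of check. The following sketch (NumPy and the names below are assumptions for illustration only) verifies that column-merging the post-input matrices before a single multiplication reproduces the column-merged outputs of the separate multiplications:

```python
import numpy as np

# Assumed illustrative shapes: one pre-input matrix, two post-input matrices.
A, B, C1, C2 = 3, 4, 2, 5
pre = np.random.rand(A, B)       # pre-input matrix of dimension A x B
post1 = np.random.rand(B, C1)    # first post-input matrix of dimension B x C1
post2 = np.random.rand(B, C2)    # second post-input matrix of dimension B x C2

# Third-prediction-model structure: two multiplications, then a column merging operation.
reference = np.hstack([pre @ post1, pre @ post2])   # shape (A, C1 + C2)

# Fourth-prediction-model structure: column merging first, then one multiplication.
target_post = np.hstack([post1, post2])             # target post-input matrix, B x (C1 + C2)
merged = pre @ target_post

assert np.allclose(merged, reference)
```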
According to another aspect of the embodiments of the present application, an electronic device for implementing the information processing method is further provided, where the electronic device may be the terminal device or the server shown in fig. 1. In this embodiment, the electronic device is described by taking a server as an example. As shown in fig. 14, the electronic device includes a memory 1402 and a processor 1404. The memory 1402 stores a computer program, and the processor 1404 is configured to perform the steps in any one of the above method embodiments by running the computer program.
Optionally, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, acquiring a group of pre-input matrices and a post-input matrix, wherein the group of pre-input matrices comprises a plurality of pre-input matrices;
S2, when detecting that a matrix multiplication operation needs to be respectively performed on each pre-input matrix in the group of pre-input matrices and the post-input matrix to obtain a first group of output matrices, and a row merging operation needs to be performed on the first group of output matrices to obtain a first target output matrix, performing the row merging operation on the group of pre-input matrices to obtain a target pre-input matrix;
and S3, performing the matrix multiplication operation on the target pre-input matrix and the post-input matrix in a target processing device to obtain the first target output matrix.
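Steps S1 to S3 can be sketched as follows (a minimal NumPy illustration; the variable names and shapes are assumptions, not part of the application):

```python
import numpy as np

# S1: acquire a group of pre-input matrices (a1 x B and a2 x B) and one post-input matrix (B x C).
a1, a2, B, C = 2, 3, 4, 5
pre1 = np.random.rand(a1, B)
pre2 = np.random.rand(a2, B)
post = np.random.rand(B, C)

# Unoptimized pattern: one multiplication per pre-input matrix, then a row merging operation.
reference = np.vstack([pre1 @ post, pre2 @ post])   # shape (a1 + a2, C)

# S2: row-merge the group of pre-input matrices into a target pre-input matrix (A x B).
target_pre = np.vstack([pre1, pre2])

# S3: one matrix multiplication yields the first target output matrix (A x C).
first_target_output = target_pre @ post
assert np.allclose(first_target_output, reference)
```

A single (a1 + a2) × B by B × C multiplication replaces two smaller calls, which increases the amount of work per invocation of the processing device — the utilization gain the application describes.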
Alternatively, it can be understood by those skilled in the art that the structure shown in fig. 14 is only illustrative, and the electronic device may also be a terminal device such as a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 14 does not limit the structure of the electronic device. For example, the electronic device may further include more or fewer components (e.g., a network interface) than those shown in fig. 14, or have a configuration different from that shown in fig. 14.
The memory 1402 may be configured to store software programs and modules, such as the program instructions/modules corresponding to the information processing method and apparatus in the embodiments of the present application. The processor 1404 runs the software programs and modules stored in the memory 1402 to perform various functional applications and data processing, thereby implementing the information processing method described above. The memory 1402 may include a high-speed random access memory, and may further include a non-volatile memory, such as one or more magnetic storage devices, flash memories, or other non-volatile solid-state memories. In some examples, the memory 1402 may further include memories remotely disposed with respect to the processor 1404, and these remote memories may be connected to a terminal over a network. Examples of the network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof. As an example, as shown in fig. 14, the memory 1402 may include, but is not limited to, the first obtaining unit 1302, the first execution unit 1304, and the second execution unit 1306 of the information processing apparatus. In addition, the memory 1402 may further include, but is not limited to, other module units of the information processing apparatus, which are not described in detail in this example.
Optionally, the transmission device 1406 is configured to receive or send data via a network. Specific examples of the network may include a wired network and a wireless network. In one example, the transmission device 1406 includes a network adapter (NIC), which can be connected to other network devices and a router via a network cable so as to communicate with the Internet or a local area network. In another example, the transmission device 1406 is a radio frequency (RF) module, which is configured to communicate with the Internet in a wireless manner.
In addition, the electronic device further includes: a display 1408, configured to display the image to be recognized and the target recognition result; and a connection bus 1410, configured to connect the module components in the electronic device.
In other embodiments, the terminal device or the server may be a node in a distributed system. The distributed system may be a blockchain system, which may be a distributed system formed by a plurality of nodes connected through network communication. The nodes may form a peer-to-peer (P2P) network, and a computing device in any form, for example, an electronic device such as a server or a terminal, may become a node in the blockchain system by joining the peer-to-peer network.
According to an aspect of the present application, a computer program product or a computer program is provided, including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the method provided in the various optional implementations described above, wherein the computer program is arranged to perform the steps in any one of the above method embodiments when run.
Alternatively, in the present embodiment, the above-mentioned computer-readable storage medium may be configured to store a computer program for executing the steps of:
S1, acquiring a group of pre-input matrices and a post-input matrix, wherein the group of pre-input matrices comprises a plurality of pre-input matrices;
S2, when detecting that a matrix multiplication operation needs to be respectively performed on each pre-input matrix in the group of pre-input matrices and the post-input matrix to obtain a first group of output matrices, and a row merging operation needs to be performed on the first group of output matrices to obtain a first target output matrix, performing the row merging operation on the group of pre-input matrices to obtain a target pre-input matrix;
and S3, performing the matrix multiplication operation on the target pre-input matrix and the post-input matrix in a target processing device to obtain the first target output matrix.
Alternatively, in this embodiment, a person skilled in the art may understand that all or some of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware related to a terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like. The sequence numbers of the foregoing embodiments of the present application are merely for description and do not represent the preference of the embodiments.
If the integrated unit in the foregoing embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the above computer-readable storage medium. Based on such an understanding, the technical solution of the present application essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division of the units is merely a logical function division, and there may be other division manners in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be indirect couplings or communication connections through some interfaces, units, or modules, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or at least two units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing descriptions are merely preferred embodiments of the present application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications shall also fall within the protection scope of the present application.

Claims (15)

1. An information processing method, characterized by comprising:
acquiring a group of pre-input matrices and a post-input matrix, wherein the group of pre-input matrices comprises a plurality of pre-input matrices;
when detecting that a matrix multiplication operation needs to be respectively performed on each pre-input matrix in the group of pre-input matrices and the post-input matrix to obtain a first group of output matrices, and a row merging operation needs to be performed on the first group of output matrices to obtain a first target output matrix, performing the row merging operation on the group of pre-input matrices to obtain a target pre-input matrix;
and performing the matrix multiplication operation on the target pre-input matrix and the post-input matrix in a target processing device to obtain the first target output matrix.
2. The method of claim 1,
the performing the row merging operation on the set of pre-input matrices to obtain a target pre-input matrix includes: when the first pre-input matrix is a matrix with a dimension of a1 × B, the second pre-input matrix is a matrix with a dimension of a2 × B, and the post-input matrix is a matrix with a dimension of B × C, performing the row merging operation on the matrix with the dimension of a1 × B and the matrix with the dimension of a2 × B to obtain the target pre-input matrix, where the set of pre-input matrices includes the first pre-input matrix and the second pre-input matrix, the target pre-input matrix is a matrix with a dimension of A × B, A is a natural number greater than 1, A = a1 + a2, and a1, a2, B, and C are natural numbers;
the performing, in a target processing device, the matrix multiplication operation on the target pre-input matrix and the post-input matrix to obtain the first target output matrix, includes: performing the matrix multiplication operation on the matrix with the dimension of A × B and the matrix with the dimension of B × C in the target processing device to obtain the first target output matrix, wherein the first target output matrix is the matrix with the dimension of A × C.
3. The method of claim 1, further comprising:
when detecting that a first preset operation needs to be executed on at least one output matrix in the first group of output matrices, executing a line splitting operation on the first target output matrix to obtain a first group of split matrices;
performing the first preset operation on at least one split matrix in the first set of split matrices.
4. The method of claim 3, wherein performing the row splitting operation on the first target output matrix to obtain the first set of split matrices comprises:
when a first pre-input matrix is a matrix with a dimension of a1 × B, a second pre-input matrix is a matrix with a dimension of a2 × B, the post-input matrix is a matrix with a dimension of B × C, and the first target output matrix is a matrix with a dimension of A × C, performing the row splitting operation on the matrix with the dimension of A × C to obtain a first split matrix and a second split matrix, wherein the set of pre-input matrices includes the first pre-input matrix and the second pre-input matrix, the first set of split matrices includes the first split matrix and the second split matrix, the first split matrix is a matrix with a dimension of a1 × C, the second split matrix is a matrix with a dimension of a2 × C, A is a natural number greater than 1, A = a1 + a2, and a1, a2, B, and C are natural numbers.
5. The method of claim 1, further comprising:
acquiring a pre-input matrix and a set of post-input matrices, wherein the set of post-input matrices comprises a plurality of post-input matrices;
when detecting that the matrix multiplication operation needs to be respectively performed on the pre-input matrix and each post-input matrix in the set of post-input matrices to obtain a second group of output matrices, and a column merging operation needs to be performed on the second group of output matrices to obtain a second target output matrix, performing the column merging operation on the set of post-input matrices to obtain a target post-input matrix;
and performing the matrix multiplication operation on the pre-input matrix and the target post-input matrix in a target processing device to obtain a second target output matrix.
6. The method of claim 5,
the performing the column merging operation on the set of post-input matrices to obtain a target post-input matrix includes: when the first post-input matrix is a matrix with a dimension of B × C1, the second post-input matrix is a matrix with a dimension of B × C2, and the pre-input matrix is a matrix with a dimension of A × B, performing the column merging operation on the matrix with the dimension of B × C1 and the matrix with the dimension of B × C2 to obtain the target post-input matrix, where the set of post-input matrices includes the first post-input matrix and the second post-input matrix, the target post-input matrix is a matrix with a dimension of B × C, C is a natural number greater than 1, C = C1+ C2, and A, B, C1, and C2 are natural numbers;
the performing, in a target processing device, the matrix multiplication operation on the pre-input matrix and the target post-input matrix to obtain the second target output matrix, includes: performing the matrix multiplication operation on the matrix with the dimension of A × B and the matrix with the dimension of B × C in the target processing device to obtain the second target output matrix, wherein the second target output matrix is the matrix with the dimension of A × C.
7. The method of claim 6, further comprising:
when detecting that a second preset operation needs to be executed on at least one output matrix in the second group of output matrices, executing a column splitting operation on the second target output matrix to obtain a second group of split matrices;
performing the second preset operation on at least one split matrix in the second set of split matrices.
8. The method of claim 7, wherein performing a column splitting operation on the second target output matrix to obtain the second set of split matrices comprises:
when the first post-input matrix is a matrix with a dimension of B × C1, the second post-input matrix is a matrix with a dimension of B × C2, the pre-input matrix is a matrix with a dimension of A × B, and the second target output matrix is a matrix with a dimension of A × C, performing the column splitting operation on the matrix with the dimension of A × C to obtain a third split matrix and a fourth split matrix, wherein the set of post-input matrices includes the first post-input matrix and the second post-input matrix, the second set of split matrices includes the third split matrix and the fourth split matrix, the third split matrix is a matrix with a dimension of A × C1, the fourth split matrix is a matrix with a dimension of A × C2, C is a natural number greater than 1, C = C1 + C2, and A, B, C1, and C2 are natural numbers.
9. The method according to any one of claims 1 to 4, wherein the performing the matrix multiplication operation on the target pre-input matrix and the post-input matrix in the target processing device to obtain the first target output matrix comprises:
performing the matrix multiplication operation on the target pre-input matrix and the post-input matrix in a graphics processing unit (GPU) to obtain the first target output matrix, wherein the target processing device comprises the GPU.
10. The method according to any one of claims 1 to 4, wherein the acquiring a set of pre-input matrices and a post-input matrix comprises:
acquiring the set of pre-input matrices from a set of multi-dimensional matrices to be processed in a target prediction model, and acquiring the post-input matrix from a target multi-dimensional matrix to be processed in the target prediction model, wherein the target prediction model is used for determining a predicted target object according to the set of multi-dimensional matrices and the target multi-dimensional matrix.
11. The method of claim 10,
the acquiring the set of pre-input matrices from a set of multi-dimensional matrices to be processed in a target prediction model and acquiring the post-input matrix from a target multi-dimensional matrix to be processed in the target prediction model includes: when the set of multi-dimensional matrices comprises a matrix with a dimension of D × a1 × B and a matrix with a dimension of D × a2 × B, and the target multi-dimensional matrix is a matrix with a dimension of D × B × C, acquiring D first pre-input matrices with a dimension of a1 × B and D second pre-input matrices with a dimension of a2 × B from the set of multi-dimensional matrices, and acquiring D post-input matrices with a dimension of B × C from the target multi-dimensional matrix, wherein D is a natural number greater than 1, A = a1 + a2, and a1, a2, B, and C are natural numbers;
the performing the row merging operation on the set of pre-input matrices to obtain a target pre-input matrix includes: performing the row merging operation on the D first pre-input matrices with the dimension of a1 × B and the D second pre-input matrices with the dimension of a2 × B, respectively, to obtain D target pre-input matrices, wherein each target pre-input matrix is a matrix with a dimension of A × B obtained by performing the row merging operation on one first pre-input matrix and one second pre-input matrix;
the performing, in a target processing device, the matrix multiplication operation on the target pre-input matrix and the post-input matrix to obtain the first target output matrix includes: sequentially performing the matrix multiplication operation on the D target pre-input matrices and the D post-input matrices in the target processing device to obtain D first target output matrices with a dimension of A × C, wherein each first target output matrix is a matrix obtained by performing the matrix multiplication operation on one target pre-input matrix and one post-input matrix.
12. The method according to any one of claims 1 to 4, further comprising:
when detecting that a model structure in a first prediction model is used for respectively performing the matrix multiplication operation on each pre-input matrix in the group of pre-input matrices and the post-input matrix to obtain the first group of output matrices, and performing the row merging operation on the first group of output matrices to obtain the first target output matrix, adjusting the model structure in the first prediction model to obtain a second prediction model, wherein the model structure in the second prediction model is used for performing the row merging operation on the group of pre-input matrices to obtain the target pre-input matrix, and performing the matrix multiplication operation on the target pre-input matrix and the post-input matrix to obtain the first target output matrix;
the target processing device is configured to perform the matrix multiplication operation on the target pre-input matrix and the post-input matrix in the second prediction model to obtain the first target output matrix.
13. The method according to any one of claims 5 to 8, further comprising:
when detecting that a model structure in a third prediction model is used for respectively executing the matrix multiplication operation on the pre-input matrix and each post-input matrix in the set of post-input matrices to obtain a second set of output matrices and executing column merging operation on the second set of output matrices to obtain a second target output matrix, adjusting the model structure in the third prediction model to obtain a fourth prediction model, wherein the model structure in the fourth prediction model is used for executing the column merging operation on the set of post-input matrices to obtain a target post-input matrix and executing the matrix multiplication operation on the pre-input matrix and the target post-input matrix to obtain the second target output matrix;
the target processing device is configured to perform the matrix multiplication operation on the pre-input matrix and the target post-input matrix in the fourth prediction model to obtain the second target output matrix.
14. A computer-readable storage medium comprising a stored program, wherein the program when executed performs the method of any of claims 1 to 13.
15. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of the claims 1 to 13 by means of the computer program.
CN202111015924.5A 2021-08-31 2021-08-31 Information processing method and apparatus, storage medium, and electronic device Pending CN115729518A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111015924.5A CN115729518A (en) 2021-08-31 2021-08-31 Information processing method and apparatus, storage medium, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111015924.5A CN115729518A (en) 2021-08-31 2021-08-31 Information processing method and apparatus, storage medium, and electronic device

Publications (1)

Publication Number Publication Date
CN115729518A true CN115729518A (en) 2023-03-03

Family

ID=85291679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111015924.5A Pending CN115729518A (en) 2021-08-31 2021-08-31 Information processing method and apparatus, storage medium, and electronic device

Country Status (1)

Country Link
CN (1) CN115729518A (en)

Similar Documents

Publication Publication Date Title
CN111797983A (en) Neural network construction method and device
CN109766949A (en) Convolutional neural networks light weight method, device and electronic equipment
CN109214543B (en) Data processing method and device
CN114281521B (en) Method, system, equipment and medium for optimizing deep learning heterogeneous resource communication efficiency
CN111026063A (en) Digital twin construction method and device, computer equipment and storage medium
CN109859314B (en) Three-dimensional reconstruction method, three-dimensional reconstruction device, electronic equipment and storage medium
CN114626503A (en) Model training method, target detection method, device, electronic device and medium
CN113759338B (en) Target detection method and device, electronic equipment and storage medium
CN111709415B (en) Target detection method, device, computer equipment and storage medium
JP2019194902A (en) Information processing method, device, system, and storage medium
JP7282474B2 (en) Encryption mask determination method, encryption mask determination device, electronic device, storage medium, and computer program
CN114138231B (en) Method, circuit and SOC for executing matrix multiplication operation
JP2023131117A (en) Joint perception model training, joint perception method, device, and medium
CN109800078B (en) Task processing method, task distribution terminal and task execution terminal
CN112836807A (en) Data processing method and device based on neural network
CN111967478A (en) Feature map reconstruction method and system based on weight inversion, storage medium and terminal
CN115729518A (en) Information processing method and apparatus, storage medium, and electronic device
CN115729517A (en) Information processing method and apparatus, storage medium, and electronic device
CN115690845A (en) Motion trail prediction method and device
CN114064125B (en) Instruction analysis method and device and electronic equipment
US20210224632A1 (en) Methods, devices, chips, electronic apparatuses, and storage media for processing data
CN114022721A (en) Image feature point selection method, related device, equipment and storage medium
US11681920B2 (en) Method and apparatus for compressing deep learning model
CN109657523B (en) Driving region detection method and device
CN111901500A (en) Image processing method and apparatus, storage medium, and electronic apparatus

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40083078

Country of ref document: HK