CN117908994B - Method, device and equipment for processing media information and readable storage medium - Google Patents


Info

Publication number: CN117908994B
Application number: CN202410322444.0A
Authority: CN (China)
Prior art keywords: block, matrix, dimension, sub, elements
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Other versions: CN117908994A (en)
Inventors: 张洪健, 容清员, 姚达
Current assignee: Tencent Technology Shenzhen Co Ltd
Original assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202410322444.0A; publication of CN117908994A; application granted; publication of CN117908994B

Classifications

    • G06F9/44521: Dynamic linking or loading; link editing at or after load time, e.g. Java class loading
    • G06F9/44505: Configuring for program initiating, e.g. using registry, configuration files
    • G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06N3/0495: Quantised networks; sparse networks; compressed networks
    • G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections


Abstract

The application discloses a method, a device, equipment and a readable storage medium for processing media information, belonging to the field of computer technology. The method comprises the following steps: acquiring a target weight matrix and a feature matrix, wherein the target weight matrix comprises a plurality of first matrix blocks and the feature matrix comprises a plurality of second matrix blocks; sequentially loading a plurality of first block combinations into at least two first storage areas, each first block combination comprising a first matrix block and a second matrix block; during loading, sequentially performing a matrix operation on each second block combination that has been loaded into a first storage area to obtain an operation result for each second block combination, each second block combination comprising a first matrix block and a second matrix block; and determining the matrix operation result of the target weight matrix and the feature matrix based on the operation results of the second block combinations. By loading matrix blocks in advance, the application keeps the operation unit continuously in an operating state and thereby improves the efficiency of processing media information through the neural network.

Description

Method, device and equipment for processing media information and readable storage medium
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method, a device, equipment and a readable storage medium for processing media information.
Background
In the field of computer technology, tasks such as text translation, information classification, visual recognition and speech synthesis can be performed on media information through a neural network. As the functions of neural networks become more powerful, their network parameters grow larger, so processing media information through a neural network takes longer and longer. Based on this, how to improve the efficiency of processing media information is a problem to be solved.
Disclosure of Invention
The application provides a method, a device, equipment and a readable storage medium for processing media information, which can improve the efficiency of processing the media information through a neural network.
In one aspect, a method for processing media information is provided, the method comprising: acquiring a target weight matrix of a neural network and a feature matrix input to the neural network, wherein the target weight matrix comprises a plurality of first matrix blocks, the feature matrix comprises a plurality of second matrix blocks, and the feature matrix is used for characterizing the content of the media information; sequentially loading a plurality of first block combinations to be loaded into at least two first storage areas, wherein each first block combination comprises a first matrix block and a second matrix block; in the process of loading the plurality of first block combinations into the at least two first storage areas, sequentially performing a matrix operation on each second block combination that has been loaded into the first storage areas to obtain an operation result for each second block combination, wherein each second block combination comprises a first matrix block and a second matrix block; and determining a matrix operation result of the target weight matrix and the feature matrix based on the operation results of the second block combinations, wherein the matrix operation result is used for determining a result of processing the media information through the neural network.
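The loading-and-operating pipeline described above can be sketched in code. Below is a minimal, serial NumPy model of the blocked matrix operation, assuming block parameters bm, bk and bn (names are illustrative, not from the patent); it models only the blocking and accumulation, whereas a real implementation would overlap the loading of one block combination with the operation on another to hide loading latency.

```python
import numpy as np

def blocked_matmul_prefetch(W, X, bm, bk, bn):
    """Serial sketch of the claimed blocked matrix multiply: block
    combinations are staged into (here simulated) storage areas and
    multiplied block by block; slicing truncates automatically when a
    dimension is not an integer multiple of its block parameter."""
    M, K = W.shape
    K2, N = X.shape
    assert K == K2
    out = np.zeros((M, N), dtype=W.dtype)
    for i0 in range(0, M, bm):            # rows of first matrix blocks
        for j0 in range(0, N, bn):        # columns of second matrix blocks
            acc = np.zeros((min(bm, M - i0), min(bn, N - j0)), dtype=W.dtype)
            for k0 in range(0, K, bk):    # shared (inner) dimension
                w_blk = W[i0:i0 + bm, k0:k0 + bk]   # first matrix block
                x_blk = X[k0:k0 + bk, j0:j0 + bn]   # second matrix block
                acc += w_blk @ x_blk      # operation on one block combination
            out[i0:i0 + bm, j0:j0 + bn] = acc       # combine per-combination results
    return out
```

For any valid block parameters, the result equals the unblocked product, which is the property the pipelined scheme relies on.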
In another aspect, there is provided an apparatus for processing media information, the apparatus comprising: an acquisition module, configured to acquire a target weight matrix of a neural network and a feature matrix input to the neural network, wherein the target weight matrix comprises a plurality of first matrix blocks, the feature matrix comprises a plurality of second matrix blocks, and the feature matrix is used for characterizing the content of the media information; a loading module, configured to sequentially load a plurality of first block combinations to be loaded into at least two first storage areas, wherein each first block combination comprises a first matrix block and a second matrix block; an operation module, configured to, in the process of loading the plurality of first block combinations into the at least two first storage areas, sequentially perform a matrix operation on each second block combination that has been loaded into the first storage areas to obtain an operation result for each second block combination, wherein each second block combination comprises a first matrix block and a second matrix block; and a determining module, configured to determine a matrix operation result of the target weight matrix and the feature matrix based on the operation results of the second block combinations, wherein the matrix operation result is used for determining a result of processing the media information through the neural network.
In one possible implementation, the acquisition module is configured to acquire a first weight matrix of the neural network, where the first weight matrix includes a plurality of columns of first elements; for any column of first elements, determine a quantization parameter based on that column of first elements and the number of data bits corresponding to the first data type, and quantize that column of first elements based on the quantization parameter to obtain a column of quantized elements; and determine the target weight matrix based on the plurality of columns of quantized elements.
In one possible implementation, the target weight matrix includes a plurality of columns of target elements; the acquisition module is configured, for any column of target elements, to merge a column of quantized elements into the column of target elements, where each target element comprises at least two consecutive quantized elements; or to rearrange the column of quantized elements to obtain a column of rearranged quantized elements and merge the column of rearranged quantized elements into the column of target elements, where each target element comprises at least two consecutive rearranged quantized elements.
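One common way to merge at least two consecutive quantized elements into one target element is nibble packing, for example packing two INT4 quantized elements into one 8-bit target element. The patent does not fix a bit layout, so the low-nibble/high-nibble convention below is an assumption for illustration:

```python
import numpy as np

def pack_int4_pairs(q):
    """Merge pairs of consecutive INT4 quantized elements into 8-bit
    target elements: low nibble holds the first element of each pair,
    high nibble the second (layout is an assumption)."""
    assert q.shape[0] % 2 == 0
    qq = q.astype(np.uint8)          # reinterpret the signed nibbles as raw bits
    lo = qq[0::2] & 0x0F             # first element of each pair
    hi = (qq[1::2] & 0x0F) << 4      # second element, shifted to the high nibble
    return lo | hi

def unpack_int4_pairs(p):
    """Inverse of pack_int4_pairs, restoring the signed 4-bit values."""
    lo = (p & 0x0F).astype(np.int8)
    hi = ((p >> 4) & 0x0F).astype(np.int8)
    lo = np.where(lo >= 8, lo - 16, lo)   # sign-extend the 4-bit values
    hi = np.where(hi >= 8, hi - 16, hi)
    out = np.empty(p.shape[0] * 2, dtype=np.int8)
    out[0::2], out[1::2] = lo, hi
    return out
```

Packing halves the storage per column; the rearranged variant in the text would simply permute `q` before calling the packer.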
In one possible implementation, the dimensions of the target weight matrix include a first dimension and a second dimension, a block parameter of the first dimension being used to control a number of elements of the first matrix block in the first dimension, and a block parameter of the second dimension being used to control a number of elements of the first matrix block in the second dimension; the dimensions of the feature matrix include a third dimension and a fourth dimension, a block parameter of the third dimension is used for controlling the number of elements of the second matrix block in the third dimension, and a block parameter of the fourth dimension is used for controlling the number of elements of the second matrix block in the fourth dimension.
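As a small worked consequence of these block parameters: the number of matrix blocks along each dimension is the element count divided by the block parameter, rounded up, since the last block in a dimension may be smaller. A minimal sketch (function name illustrative):

```python
import math

def num_blocks(rows, cols, block_rows, block_cols):
    """Number of matrix blocks produced when a rows x cols matrix is
    tiled with the given block parameters; edge blocks may be smaller
    when a dimension is not an integer multiple of its block parameter."""
    return math.ceil(rows / block_rows) * math.ceil(cols / block_cols)
```

For example, a 7 x 9 matrix tiled with block parameters 2 and 4 yields 4 x 3 = 12 blocks, several of them partial.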
In a possible implementation, the loading module is configured to, for any first block combination, determine at least one second loading unit from a plurality of first loading units whose target dimension meets a set condition, and load each element in that first block combination into the corresponding first storage area through the at least one second loading unit; the first loading unit is a unit for loading elements of the target dimension, the second loading unit is a unit for loading elements of the target dimension in that first block combination, the target dimension is at least one of the first dimension, the second dimension, the third dimension and the fourth dimension, and the target dimension meeting the set condition includes the case where the number of elements of the target weight matrix or the feature matrix in the target dimension is not an integer multiple of the block parameter of the target dimension.
In a possible implementation, the acquisition module is further configured to obtain a configuration file, where the configuration file is used to record matrix shapes and the corresponding parameter values; the determining module is further configured to determine, if the matrix shapes recorded in the configuration file include the matrix shape of the target weight matrix and the matrix shape of the feature matrix, the block parameter of the first dimension, the block parameter of the second dimension, the block parameter of the third dimension, and the block parameter of the fourth dimension based on the parameter values corresponding to those matrix shapes.
In a possible implementation, the configuration file is further used to record a plurality of candidate parameter values; the determining module is further configured to determine a performance indicator for each candidate parameter value if the matrix shapes recorded in the configuration file do not include at least one of the matrix shape of the target weight matrix or the matrix shape of the feature matrix, where the performance indicator of a candidate parameter value describes the operation time required to perform the matrix operation on the target weight matrix and the feature matrix based on that candidate parameter value; the determining module is further configured to determine the block parameter of the first dimension, the block parameter of the second dimension, the block parameter of the third dimension, and the block parameter of the fourth dimension based on the candidate parameter value whose performance indicator satisfies the indicator condition.
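The two branches above (reuse recorded parameter values when the matrix shapes appear in the configuration file; otherwise time each candidate parameter value and keep the fastest) can be sketched as follows. All names, and the use of an in-memory dict standing in for the configuration file, are assumptions for illustration; the performance indicator here is simply wall-clock runtime.

```python
import time
import numpy as np

def blocked_matmul_reference(W, X, bm, bk, bn):
    """Any blocked matmul kernel works here; this is a plain reference."""
    out = np.zeros((W.shape[0], X.shape[1]))
    for i in range(0, W.shape[0], bm):
        for j in range(0, X.shape[1], bn):
            for k in range(0, W.shape[1], bk):
                out[i:i+bm, j:j+bn] += W[i:i+bm, k:k+bk] @ X[k:k+bk, j:j+bn]
    return out

def pick_block_params(config, w_shape, x_shape, candidates):
    """If the configuration records this pair of matrix shapes, reuse the
    recorded block parameters; otherwise time each candidate parameter
    set, keep the fastest, and cache it for next time."""
    key = (w_shape, x_shape)
    if key in config:
        return config[key]
    W, X = np.zeros(w_shape), np.zeros(x_shape)
    best, best_t = None, float("inf")
    for params in candidates:
        t0 = time.perf_counter()
        blocked_matmul_reference(W, X, *params)
        t = time.perf_counter() - t0
        if t < best_t:
            best, best_t = params, t
    config[key] = best
    return best
```

In practice the timing would be done once offline and the chosen values written back into the configuration file, so later runs take the fast lookup branch.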
In one possible implementation, the first matrix block includes at least two first sub-blocks, and the second matrix block includes at least two second sub-blocks; the operation module is configured to load a plurality of sub-block combinations into a second storage area for any one of second block combinations that have been loaded into the first storage area, where any one of the second block combinations includes the plurality of sub-block combinations, and any one of the sub-block combinations includes the first sub-block and the second sub-block; performing matrix operation on each sub-block combination in the second storage area to obtain an operation result of each sub-block combination; and determining the operation result of any second block combination based on the operation result of each sub-block combination.
In one possible implementation, the dimensions of the first matrix block include a second dimension, and sub-block parameters of the second dimension are used to control the number of the first sub-blocks in the second dimension; the dimensions of the second matrix block include a third dimension, a sub-block parameter of the third dimension being used to control the number of the second sub-blocks in the third dimension.
In one possible implementation, the first sub-block and the second sub-block are different in data type; the operation module is used for carrying out data type conversion on a first sub-block in any sub-block combination to obtain a target sub-block, and the data types of the target sub-block and the second sub-block are the same; and performing matrix operation on the target sub-block and a second sub-block in any sub-block combination to obtain an operation result of any sub-block combination.
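The data-type conversion step can be illustrated as follows, assuming the first sub-block is stored as INT8 with a per-column quantization parameter and the second sub-block is FP16; converting the first sub-block back to FP16 produces the target sub-block, which is then multiplied with the second sub-block. Names and the choice of data types are illustrative.

```python
import numpy as np

def subblock_matmul_mixed(w_sub_int8, scale, x_sub_fp16):
    """Mixed-precision sub-block operation: dequantize the INT8 first
    sub-block into the feature dtype (the target sub-block), then perform
    the matrix operation with the FP16 second sub-block."""
    target_sub = w_sub_int8.astype(np.float16) * scale.astype(np.float16)
    return target_sub @ x_sub_fp16
```

Multiplying by the identity recovers the (approximately) dequantized weights, which is a convenient sanity check on the conversion.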
In another aspect, there is provided an electronic device including a processor and a memory, where at least one computer program is stored in the memory, where the at least one computer program is loaded and executed by the processor, so that the electronic device implements any one of the above media information processing methods.
In another aspect, there is also provided a computer readable storage medium having stored therein at least one computer program loaded and executed by a processor to cause an electronic device to implement a method for processing media information as described in any of the above.
In another aspect, there is also provided a computer program, where at least one computer program is loaded and executed by a processor to cause an electronic device to implement any one of the above methods for processing media information.
In another aspect, there is provided a computer program product having at least one computer program stored therein, the at least one computer program being loaded and executed by a processor to cause an electronic device to implement a method for processing any of the above media information.
The technical scheme provided by the application has at least the following beneficial effects.
In the technical scheme provided by the application, the target weight matrix comprises a plurality of first matrix blocks, the feature matrix comprises a plurality of second matrix blocks, and each first block combination comprises a first matrix block and a second matrix block. The first block combinations are loaded into at least two first storage areas, and the second block combinations are the first block combinations that have already been loaded into the first storage areas. By loading the first block combinations into at least two first storage areas and performing matrix operations on the second block combinations while loading is still in progress, matrix blocks are loaded in advance of the operations that consume them, and the operation unit stays in an operating state throughout; the latency of loading matrix blocks is thus hidden, and the efficiency of the matrix operation between the target weight matrix and the feature matrix is improved. Because the target weight matrix is a parameter of the neural network and the feature matrix characterizes the content of the media information, improving the efficiency of this matrix operation improves the efficiency of processing the media information through the neural network.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of an implementation environment of a media information processing method according to an embodiment of the present application.
Fig. 2 is a flowchart of a method for processing media information according to an embodiment of the present application.
Fig. 3 is a schematic diagram illustrating generation of a quantization parameter matrix according to an embodiment of the present application.
Fig. 4 is a schematic diagram illustrating determination of a target weight matrix according to an embodiment of the present application.
Fig. 5 is a schematic diagram illustrating determination of block parameters according to an embodiment of the present application.
Fig. 6 is a schematic diagram of a matrix operation according to an embodiment of the present application.
Fig. 7 is a schematic diagram of a matrix multiplication operation according to an embodiment of the present application.
Fig. 8 is a schematic diagram of another matrix operation according to an embodiment of the present application.
FIG. 9 is a schematic diagram of a store and operation provided by an embodiment of the present application.
Fig. 10 is a schematic structural diagram of a media information processing device according to an embodiment of the present application.
Fig. 11 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Fig. 12 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
In the field of computer technology, media information can be processed through a neural network, for example for text translation, information classification, visual recognition and speech synthesis. With the development of computer technology, neural networks have become increasingly powerful and their network parameters increasingly large, so processing media information through a neural network takes longer and longer. Based on this, how to improve the efficiency of processing media information is a problem to be solved.
Based on the above problems, the embodiments of the present application provide a method for processing media information, where the content of the media information is represented by a feature matrix. In addition, parameters of the neural network comprise weight matrixes, and the efficiency of processing media information through the neural network is improved by improving the matrix operation efficiency of the weight matrixes and the feature matrixes.
As shown in fig. 1, fig. 1 is a schematic diagram of an implementation environment of a media information processing method according to an embodiment of the present application, where the implementation environment includes a terminal device 101 and a server 102. The method for processing media information in the embodiment of the present application may be performed by the terminal device 101, by the server 102, or by both the terminal device 101 and the server 102.
The terminal device 101 may be a smart phone, a game console, a desktop computer, a tablet computer, a laptop computer, a smart television, a smart car device, a smart voice interaction device, a smart home appliance, etc. The server 102 may be a server, or a server cluster formed by a plurality of servers, or any one of a cloud computing platform and a virtualization center, which is not limited in this embodiment of the present application. The server 102 may be in communication connection with the terminal device 101 via a wired network or a wireless network. The server 102 may have functions of data processing, data storage, data transceiving, etc., which are not limited in the embodiment of the present application. The number of terminal devices 101 and servers 102 is not limited, and may be one or more.
The various alternative embodiments of the present application are applicable to the field of artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive discipline of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence is thus the study of the design principles and implementation methods of various intelligent machines, enabling machines to perceive, reason and make decisions.
Artificial intelligence is a comprehensive discipline that spans a wide range of fields, covering both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, pre-training model technologies, operation/interaction systems, mechatronics, and the like. A pre-training model, also called a large model or a foundation model, can be fine-tuned and then widely applied to downstream tasks in all major directions of artificial intelligence. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
As shown in fig. 2, fig. 2 is a flowchart of a method for processing media information according to an embodiment of the present application, and for convenience of description, a terminal device 101 or a server 102 that performs the method for processing media information according to the embodiment of the present application is referred to as an electronic device, and the method may be performed by the electronic device. As shown in fig. 2, the method includes the following steps.
Step 201, obtaining a target weight matrix of the neural network and a feature matrix input to the neural network, wherein the target weight matrix comprises a plurality of first matrix blocks, the feature matrix comprises a plurality of second matrix blocks, and the feature matrix is used for characterizing the content of the media information.
The embodiment of the application does not limit the structure, the function and the like of the neural network. The initial network model is illustratively trained multiple times through the media information to obtain a trained network model. The neural network in the embodiment of the present application is the trained network model or any network layer in the trained network model, for example, the neural network includes at least one network layer of a linear layer, an activation layer, a normalization layer, an attention layer, and the like.
In general, the network parameters of the neural network include a target weight matrix, the input of the neural network is a feature matrix of the media information, and the target weight matrix is used for performing matrix operation with the feature matrix to obtain a matrix operation result. Wherein the feature matrix is used to characterize the content of the media information, including but not limited to the semantics expressed by the media information, the semantics of the individual pieces of information included in the media information, and the like. The embodiment of the application does not limit the type of the media information. Illustratively, the media information includes at least one of text, images, audio, or video, etc. The media information comprises a plurality of pieces of information, it being understood that the type of piece of information is related to the type of media information. For example, if the media information is text, the information segment may be any item such as a character, a word or a sentence; the media information is an image, and the information segment can be a pixel point or an image area; the media information is audio, and the information segment can be an audio frame; the media information is video and the information segment may be a frame image.
The embodiment of the application does not limit how the feature matrix is acquired. Illustratively, each information segment included in the media information is mapped to a corresponding feature, and the media feature is obtained by concatenating the features of the information segments. The media feature exists in the form of a matrix, and this media feature may be the feature matrix in the embodiment of the application. Alternatively, the media feature is processed through any network layer, for example linearly mapped through a linear layer to obtain a processed feature, and the processed feature may be the feature matrix in the embodiment of the application.
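A minimal sketch of this mapping-and-concatenation step, assuming a lookup table from information segments to feature vectors (both names are illustrative, not from the patent):

```python
import numpy as np

def feature_matrix(segments, segment_features):
    """Map each information segment of the media information to its
    feature vector and stack the per-segment features row-wise into the
    feature matrix."""
    return np.stack([segment_features[s] for s in segments], axis=0)
```

For text, the segments would be characters, words or sentences and the lookup an embedding table; for images, per-pixel or per-region features would play the same role.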
As mentioned above, the network parameters of the neural network include the target weight matrix, and the embodiment of the present application does not limit how the target weight matrix is acquired. In one exemplary embodiment, the weight parameters of the trained network model, or of any network layer in the trained network model, are used as the target weight matrix. In this case, the data type of the target weight matrix is the same as the data type of the feature matrix. For example, if the data type of the feature matrix is FP16 (half-precision floating point, a floating-point number represented in 16-bit binary), the data type of the target weight matrix is also FP16.
In another exemplary embodiment, the "obtain target weight matrix of neural network" in step 201 includes steps 2011 to 2013 (not shown in the figure).
In step 2011, a first weight matrix of the neural network is obtained, where the first weight matrix includes a plurality of columns of first elements.
As already mentioned above, the electronic device may acquire a neural network. In the embodiment of the application, the network parameters of the neural network include a first weight matrix, and the data type of the first weight matrix is the same as the data type of the feature matrix. Assuming the data type of the feature matrix is the target data type, the data type of the first weight matrix is also the target data type. The first weight matrix is stored at high precision, so the parameters of the neural network occupy a large amount of storage. In this case, the first weight matrix may be quantized to reduce its precision and the parameter size. The first weight matrix is therefore a weight matrix to be quantized and includes a plurality of rows and columns of first elements. Optionally, the first weight matrix includes K rows and N columns of first elements, where K and N are positive integers.
In step 2012, for any column of first elements, a quantization parameter is determined based on the number of bits of data corresponding to any column of first elements and the first data type, and any column of first elements is quantized based on the quantization parameter, resulting in a column of quantized elements.
In the embodiment of the present application, for any column of first elements in the first weight matrix, a reference element may be determined from that column. The embodiment of the application does not limit how the reference element is determined; illustratively, the reference element is the first element with the minimum value, the maximum value, the minimum absolute value, or the maximum absolute value in the column, or the first element located at a set position in the column, for example, the fifth first element in the column of first elements.
The data type of the first elements is the target data type, and the reference element is used to quantize the column of first elements, so that the first elements are quantized from the target data type into the first data type; a quantized first element is a quantized element. Quantization generally refers to mapping floating-point data to integer data. That is, the target data type is a floating-point type, for example FP32 (single-precision floating point, a floating-point number represented in 32-bit binary) or FP16. The first data type is different from the target data type: the first data type is an integer type, for example INT4 (an integer represented in 4-bit binary) or INT8 (an integer represented in 8-bit binary).
Different first data types require different numbers of data bits to represent a quantized element. For example, if the first data type is INT4, 4 data bits are required to represent a quantized element, that is, the number of data bits corresponding to INT4 is 4. As another example, if the first data type is INT8, 8 data bits are required to represent a quantized element, that is, the number of data bits corresponding to INT8 is 8.
A quantization parameter, also referred to as a scaling vector, may be determined based on the reference element and the number of data bits corresponding to the first data type. The quantization parameter is used for quantizing elements of the target data type in the quantization stage of the neural network, and for mapping the quantized elements back to elements of the target data type in the inference stage of the neural network.
Alternatively, the quantization parameter corresponding to the i-th column of first elements may be determined according to the formula s_i = max(abs(W[:, i])) / 2^(b-1), where max denotes the maximum operator, abs denotes the absolute-value operator, and W[:, i] denotes the i-th column of first elements in the first weight matrix. That is, max(abs(W[:, i])) takes the absolute value of each first element in the i-th column and selects the first element with the largest absolute value; this element is the reference element. b denotes the number of data bits corresponding to the first data type. For example, for the first data type INT8, b = 8 and the quantization parameter s_i equals the reference element divided by 2 to the power of 7; for the first data type INT4, b = 4 and the quantization parameter s_i equals the reference element divided by 2 to the power of 3.
As shown in fig. 3, the first weight matrix includes first elements of K rows and N columns. Since each column corresponds to one quantization parameter, a quantization parameter matrix including 1 row and N columns of quantization parameters can be obtained based on the first weight matrix.
Next, for any column of first elements, the column may be quantized based on the number of data bits corresponding to the first data type and the quantization parameter corresponding to the column, to obtain a column of quantized elements. Optionally, each first element in the column is divided by the quantization parameter to obtain a division result, and the quantized element is determined based on the division result and the number of data bits. Illustratively, for the first data type INT8 with 8 data bits, the division result corresponding to the first element is added to 2 to the power of 7 to obtain the quantized element; for the first data type INT4 with 4 data bits, the division result corresponding to the first element is added to 2 to the power of 3 to obtain the quantized element. In this way, each first element is mapped to a corresponding quantized element.
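The per-column quantization described in step 2012 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the column is a plain list of floats, and both the rounding of the division result and the clamping of out-of-range results to the representable range are assumptions the source does not state.

```python
def quantize_column(col, bits=8):
    """Quantize one column of floats to `bits`-bit unsigned integers.

    The quantization parameter is the largest absolute value in the
    column (the reference element) divided by 2^(bits-1); each element
    is divided by it and offset by 2^(bits-1). Rounding and clamping
    are assumed details, not stated in the source.
    """
    ref = max(abs(x) for x in col)       # reference element
    scale = ref / (2 ** (bits - 1))      # quantization parameter
    offset = 2 ** (bits - 1)             # 2^7 for INT8, 2^3 for INT4
    q = [min(2 ** bits - 1, max(0, round(x / scale) + offset)) for x in col]
    return q, scale

q, s = quantize_column([-1.0, 0.5, 0.25, 1.0], bits=8)
```

With bits = 8 the offset is 2 to the power of 7, matching the INT8 example above; with bits = 4 it is 2 to the power of 3.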
Step 2013, determining a target weight matrix based on the multi-column quantization element.
In the embodiment of the application, a plurality of columns of quantization elements can form a quantization weight matrix, and a target weight matrix can be determined based on the quantization weight matrix. In one exemplary embodiment, a quantization weight matrix is used as the target weight matrix. For example, if the data type of the quantization element in the quantization weight matrix is INT4 or INT8, the data type of the element in the target weight matrix is also INT4 or INT8.
In another exemplary embodiment, the target weight matrix includes a plurality of columns of target elements. Step 2013 includes: for a column of target elements, merging a column of quantized elements into the column of target elements, each target element comprising at least two consecutive quantized elements; or rearranging a column of quantized elements to obtain a column of rearranged quantized elements, and merging the column of rearranged quantized elements into a column of target elements, each target element comprising at least two consecutive rearranged quantized elements.
In the embodiment of the present application, assuming that the data type of the target element in the target weight matrix is a set data type and the data type of the quantized element is a first data type, the merging number is determined based on the number of data bits corresponding to the set data type and the number of data bits corresponding to the first data type, where the merging number is the number of quantized elements required to be merged into the target element or the number of quantized elements after rearrangement.
For a column of quantized elements, assuming that the merging number is i (i is a positive integer), the 1st to i-th quantized elements are merged into the 1st target element, the (i+1)-th to 2i-th quantized elements are merged into the 2nd target element, the (2i+1)-th to 3i-th quantized elements are merged into the 3rd target element, and so on. That is, every i consecutive quantized elements are combined into one target element, resulting in a column of target elements. In this way, a plurality of columns of target elements can be obtained, thereby obtaining the target weight matrix.
As shown in fig. 4, fig. 4 is a schematic diagram illustrating determination of a target weight matrix according to an embodiment of the present application. The quantization weight matrix includes K rows and N columns of quantized elements, and the data type of the quantized elements is INT8, that is, one quantized element is represented by 8 data bits, so each quantized element occupies 8 bits. For each column of quantized elements, the 4 INT8 quantized elements of every 4 consecutive rows are combined into one INT32 target element; for example, the first 4 quantized elements of the first column are merged into the first target element of the first column, the first 4 quantized elements of the second column are merged into the first target element of the second column, and so on, obtaining a target weight matrix comprising K/4 rows and N columns of target elements. Similarly, the 8 INT4 quantized elements of every 8 consecutive rows can be combined into one INT32 target element, resulting in a target weight matrix.
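The merging of consecutive quantized elements into one wider target element can be illustrated by bit-packing. The byte order (first row into the least-significant byte) and the function names below are assumptions for illustration, not the patent's prescribed layout.

```python
def pack_int8x4(q0, q1, q2, q3):
    """Pack four unsigned 8-bit values into one 32-bit word,
    with q0 in the least-significant byte (assumed byte order)."""
    return (q3 << 24) | (q2 << 16) | (q1 << 8) | q0

def pack_column(col_vals):
    """Merge a column of 8-bit values, four consecutive rows at a time,
    yielding K/4 target elements from K quantized elements."""
    assert len(col_vals) % 4 == 0
    return [pack_int8x4(*col_vals[i:i + 4]) for i in range(0, len(col_vals), 4)]

packed = pack_column([1, 2, 3, 4, 255, 0, 0, 0])
```

Packing 8 INT4 quantized elements into one 32-bit word follows the same pattern with 4-bit shifts.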
Typically, the hardware device used for matrix operations corresponds to a data type; for example, registers of a GPU (Graphics Processing Unit) chip support the INT32 data type, and the data type corresponding to the hardware device may be the same as or different from the data type of the quantized elements. By combining at least two quantized elements into one target element, the data type of the target element can be adapted to the data type corresponding to the hardware device, thereby improving the operation performance of the hardware device in matrix operations.
Alternatively, a column of quantized elements may be rearranged to obtain a column of rearranged quantized elements. The rearrangement manner is not limited herein; illustratively, it may be set according to manual experience, or set adaptively according to the characteristics of the hardware device.
For example, one thread bundle (warp) of a GPU chip includes 32 threads, and requires that threads 0 through 3 read the elements of rows 0, 1, 8 and 9, threads 4 through 7 read the elements of rows 2, 3, 10 and 11, and so on. Based on this, for a column of 32 rows of INT4 quantized elements, the row order of the 32 rows (noted as rows 0 to 31) can be changed to: 0, 8, 16, 24, 1, 9, 17, 25, 2, 10, 18, 26, 3, 11, 19, 27, 4, 12, 20, 28, 5, 13, 21, 29, 6, 14, 22, 30, 7, 15, 23, 31. By rearranging the quantized elements, the rearranged quantized elements are better adapted to the thread characteristics of the GPU chip, so that the tensor computation cores of the GPU can be better utilized during subsequent matrix operations, the performance loss caused by precision conversion is reduced, and the performance of the GPU chip is improved.
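The listed row order can be generated programmatically; the stride-8 interleaving rule below is inferred from the sequence itself and is an assumption about how it was derived.

```python
def rearrange_rows(column):
    """Reorder a 32-element column into the order
    0, 8, 16, 24, 1, 9, 17, 25, ... (stride-8 interleaving)."""
    assert len(column) == 32
    order = [g + 8 * k for g in range(8) for k in range(4)]
    return [column[i] for i in order]

reordered = rearrange_rows(list(range(32)))
```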
For a column of rearranged quantized elements, assuming that the merging number is i (i is a positive integer), the 1st to i-th rearranged quantized elements are merged into the 1st target element, the (i+1)-th to 2i-th rearranged quantized elements are merged into the 2nd target element, the (2i+1)-th to 3i-th rearranged quantized elements are merged into the 3rd target element, and so on. That is, every i consecutive rearranged quantized elements are combined into one target element, resulting in a column of target elements. In this way, a plurality of columns of target elements can be obtained, thereby obtaining the target weight matrix.
The target weight matrix is a network parameter of the neural network and includes a plurality of first matrix blocks. Any two first matrix blocks may be the same or different in size, e.g., one first matrix block may include 4 rows and 4 columns of target elements and the other first matrix block may include 4 rows and 2 columns of target elements, with the two being different in size. Any two first matrix blocks include target elements that do not coincide, that is, any one of the target elements in the target weight matrix belongs to one first matrix block.
Optionally, the dimensions of the target weight matrix include a first dimension and a second dimension, the block parameter of the first dimension is used to control the number of elements of the first matrix block in the first dimension, and the block parameter of the second dimension is used to control the number of elements of the first matrix block in the second dimension.
In the embodiment of the application, the first dimension is the row dimension of the target weight matrix, the second dimension is the column dimension of the target weight matrix, and the number of rows and columns of the target weight matrix can be the same or different. Assuming that the target weight matrix is a matrix B including K rows and N columns of target elements, the first dimension is a K dimension, the second dimension is an N dimension, and K and N are two identical or different positive integers.
The first dimension and the second dimension each correspond to one block parameter, and the block parameter controls the number of elements of the first matrix block in the corresponding dimension. For example, the target weight matrix includes K rows and N columns of target elements, and the target weight matrix includes SPLIT_K first matrix blocks in the K dimension, where the block size of each first matrix block in the K dimension is K/SPLIT_K = TILE_K. That is, the first dimension (i.e., the K dimension) corresponds to the TILE_K parameter, and the first matrix block includes TILE_K target elements in the K dimension. Similarly, the second dimension (i.e., the N dimension) corresponds to the TILE_N parameter, and the first matrix block includes TILE_N target elements in the N dimension. In summary, the first matrix block includes TILE_K rows and TILE_N columns of target elements.
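The blocking controlled by the TILE_K and TILE_N parameters can be sketched as follows, assuming for simplicity that K and N are exact multiples of the block parameters; the function name and block indexing are illustrative.

```python
def split_into_blocks(matrix, tile_k, tile_n):
    """Partition a K x N matrix (list of row lists) into TILE_K x TILE_N
    blocks, keyed by their (row-block, column-block) indices."""
    K, N = len(matrix), len(matrix[0])
    blocks = {}
    for bk in range(0, K, tile_k):
        for bn in range(0, N, tile_n):
            blocks[(bk // tile_k, bn // tile_n)] = [
                row[bn:bn + tile_n] for row in matrix[bk:bk + tile_k]
            ]
    return blocks

m = [[r * 4 + c for c in range(4)] for r in range(4)]  # a 4 x 4 example matrix
blocks = split_into_blocks(m, tile_k=2, tile_n=2)
```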
Similarly, the feature matrix is input data of the neural network, including a plurality of second matrix blocks. Any two second matrix blocks may be the same or different in size, e.g., one second matrix block may comprise 4 rows and 4 columns of elements and the other second matrix block may comprise 3 rows and 4 columns of elements, the two being different in size. Any two second matrix blocks comprise elements that do not coincide, that is, any one element in the feature matrix belongs to one second matrix block.
Optionally, the dimensions of the feature matrix include a third dimension and a fourth dimension, a block parameter of the third dimension being used to control the number of elements of the second matrix block in the third dimension, and a block parameter of the fourth dimension being used to control the number of elements of the second matrix block in the fourth dimension.
In the embodiment of the application, the third dimension is the row dimension of the feature matrix, the fourth dimension is the column dimension of the feature matrix, and the number of rows and columns of the feature matrix can be the same or different. Assuming that the feature matrix is a matrix a comprising M rows and K columns of elements, the third dimension is the M dimension, the fourth dimension is the K dimension, and M and K are two identical or different positive integers.
The third dimension and the fourth dimension each correspond to one block parameter, and the block parameter controls the number of elements of the second matrix block in the corresponding dimension. For example, the feature matrix includes M rows and K columns of elements, the third dimension (i.e., the M dimension) corresponds to the TILE_M parameter, and the second matrix block includes TILE_M elements in the M dimension. Similarly, the fourth dimension (i.e., the K dimension) corresponds to the TILE_K parameter, and the second matrix block includes TILE_K elements in the K dimension. In summary, the second matrix block includes TILE_M rows and TILE_K columns of elements.
It should be noted that, if the target weight matrix is the first weight matrix or the quantization weight matrix, the number of columns of the feature matrix is the same as the number of rows of the target weight matrix, and in this case the first dimension and the fourth dimension may correspond to the same block parameter. For example, the feature matrix includes M rows and K columns of elements, the target weight matrix includes K rows and N columns of target elements, and the first dimension and the fourth dimension are both the K dimension, corresponding to the same block parameter TILE_K. The second dimension is the N dimension, corresponding to the block parameter TILE_N. The third dimension is the M dimension, corresponding to the block parameter TILE_M.
If the target weight matrix is obtained by merging the quantized elements of the quantization weight matrix, the number of columns of the feature matrix is the same as the number of rows of the quantization weight matrix, but different from the number of rows of the target weight matrix. In this case, the first dimension and the fourth dimension may correspond to different block parameters. Alternatively, the block parameter of the first dimension may be determined based on the block parameter of the fourth dimension, or vice versa. For example, if the target weight matrix is obtained by merging every i consecutive quantized elements in the quantization weight matrix, then the feature matrix includes M rows and K columns of elements, and the target weight matrix includes K/i rows and N columns of target elements. The first dimension is the K/i dimension, the second dimension is the N dimension, the third dimension is the M dimension, and the fourth dimension is the K dimension; the four dimensions correspond to four block parameters respectively. Optionally, the block parameter of the first dimension multiplied by i equals the block parameter of the fourth dimension.
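The optional constraint at the end of the paragraph can be expressed as a small helper; the names are illustrative.

```python
def derive_packed_tile_k(tile_k_feature, merge_count):
    """Block parameter of the packed first dimension: the fourth-dimension
    block parameter divided by the merging number i (exact divisibility assumed)."""
    assert tile_k_feature % merge_count == 0
    return tile_k_feature // merge_count

tile_k_packed = derive_packed_tile_k(tile_k_feature=32, merge_count=4)
```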
The embodiment of the application does not limit the determination mode of the block parameters. In one exemplary embodiment, the electronic device may obtain block parameters for each dimension entered by the user, or the electronic device may determine block parameters for a corresponding dimension based on the number of elements of the matrix in a certain dimension.
In another exemplary embodiment, the method according to an embodiment of the present application further includes steps S1 to S2 (not shown in the drawings). Alternatively, steps S1 to S2 are performed before step 202.
Step S1, a configuration file is obtained, wherein the configuration file is used for recording the matrix shape and the corresponding parameter value.
In the embodiment of the application, the electronic device may acquire the configuration file, and the acquiring mode is not limited herein. For example, the electronic device may obtain a profile entered by the user, or the electronic device may read the profile from a database or storage device, etc.
It has been mentioned above that the target weight matrix corresponds to the block parameters of the first dimension and the block parameters of the second dimension, and the feature matrix corresponds to the block parameters of the third dimension and the block parameters of the fourth dimension. In the embodiment of the application, the configuration file is used for recording at least one corresponding relation, and the corresponding relation comprises a matrix shape and a corresponding parameter value. Any two corresponding relations comprise different matrix shapes, and different matrix shapes can correspond to the same or different parameter values. Typically, a matrix comprises at least one row and at least one column of elements, with matrices of different rows and columns corresponding to different shapes, e.g. matrices of 4 rows and 4 columns corresponding to square matrices and matrices of 4 rows and 6 columns corresponding to rectangular matrices. In some cases, the matrix shape may be represented by the number of matrix rows and the number of matrix columns.
It is understood that the matrix shape includes a shape of a weight matrix, and the parameter values corresponding to the shape of the weight matrix include the value of the block parameter of the first dimension and the value of the block parameter of the second dimension. The matrix shape also comprises a shape of the feature matrix, and the parameter values corresponding to the shape of the feature matrix comprise values of the block parameters of the third dimension and values of the block parameters of the fourth dimension.
Step S2, if the matrix shape recorded by the configuration file comprises the matrix shape of the target weight matrix and the matrix shape of the feature matrix, determining the block parameters of the first dimension, the block parameters of the second dimension, the block parameters of the third dimension and the block parameters of the fourth dimension based on the parameter values corresponding to the matrix shape.
In the embodiment of the application, for any correspondence recorded by the configuration file, the shape of the weight matrix in the correspondence can be compared with the matrix shape of the target weight matrix; if they match, the block parameter of the first dimension is set to the value of the block parameter of the first dimension in the correspondence, and the block parameter of the second dimension is set to the value of the block parameter of the second dimension in the correspondence.
Similarly, the matrix shape of the feature matrix can be compared with the shape of the feature matrix in any correspondence; if they match, the block parameter of the third dimension is set to the value of the block parameter of the third dimension in the correspondence, and the block parameter of the fourth dimension is set to the value of the block parameter of the fourth dimension in the correspondence.
In some cases, at least one of the matrix shape of the target weight matrix and the matrix shape of the feature matrix fails to match any correspondence recorded by the configuration file; in this case, the corresponding block parameters may be set to preset values.
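Steps S1 to S2 amount to a shape-keyed lookup. The sketch below assumes a dictionary-style configuration; the keys and example values are invented for illustration.

```python
config = {
    # (rows, cols) of the weight matrix -> (TILE_K, TILE_N)   [assumed layout]
    "weight": {(4096, 4096): (32, 64)},
    # (rows, cols) of the feature matrix -> (TILE_M, TILE_K)  [assumed layout]
    "feature": {(16, 4096): (16, 32)},
}

def lookup_block_params(cfg, weight_shape, feature_shape):
    """Return (TILE_K, TILE_N, TILE_M, TILE_K_feat), or None when either
    shape is absent from the recorded correspondences."""
    w = cfg["weight"].get(weight_shape)
    f = cfg["feature"].get(feature_shape)
    if w is None or f is None:
        return None
    return (*w, *f)

params = lookup_block_params(config, (4096, 4096), (16, 4096))
```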
In one possible implementation, the configuration file is further used to record a plurality of candidate parameter values. The method according to the embodiment of the present application further includes steps S3 to S4 (not shown in the drawings). Alternatively, steps S3 to S4 are performed after step S1.
And S3, if the matrix shape recorded by the configuration file does not comprise at least one of the matrix shape of the target weight matrix or the matrix shape of the feature matrix, determining the performance index of each candidate parameter value, wherein the performance index of the candidate parameter value is used for describing the operation time required for carrying out matrix operation on the target weight matrix and the feature matrix based on the candidate parameter value.
In the embodiment of the present application, for the block parameters of the first, second, third and fourth dimensions, the configuration file is further used to record the search space (i.e., the value range) of each block parameter. Because a block parameter controls the number of elements of a matrix block in the corresponding dimension, the value of a block parameter is a positive integer. Recording the value range of a block parameter in the configuration file is equivalent to recording each candidate value of that block parameter. One candidate value of each block parameter may be combined to obtain one candidate parameter value; that is, any candidate parameter value includes one candidate value of each block parameter.
If at least one of the matrix shape of the target weight matrix and the matrix shape of the feature matrix fails to match any correspondence recorded by the configuration file, then for any candidate parameter value, the matrix blocks included in the target weight matrix and in the feature matrix can be determined based on the candidate values of the block parameters included in that candidate parameter value. Any matrix block of the target weight matrix together with any matrix block of the feature matrix is called a matrix block combination, and a matrix operation is performed on each matrix block combination to obtain its operation result. The matrix operation result of the target weight matrix and the feature matrix is determined based on the operation results of the matrix block combinations. The operation time required by this operation is measured, and the performance index of the candidate parameter value is determined based on the operation time. The embodiment of the application does not limit the manner of determining the performance index from the operation time; illustratively, the operation time is mapped to the performance index, or the performance index is determined according to the time range to which the operation time belongs.
In this way, the performance index of each candidate parameter value can be determined. Optionally, the performance index of a candidate parameter value is inversely related to the operation time, that is, the shorter the operation time, the higher the performance index. Alternatively, the performance index of a candidate parameter value is the operation time itself.
And S4, determining a block parameter of a first dimension, a block parameter of a second dimension, a block parameter of a third dimension and a block parameter of a fourth dimension based on the candidate parameter values of which the performance indexes meet the index conditions.
The embodiment of the application does not limit the content that the performance index meets the index condition. Illustratively, if the performance index is an operation time, the performance index satisfying the index condition may include: the operation time is the shortest or within a set time range. As another example, if the performance index is inversely related to the operation time, the performance index meeting the index condition may include: the performance index is highest or is within a set numerical range.
When the electronic device determines a candidate parameter value whose performance index meets the index condition, the candidate value of each block parameter is obtained, where the block parameters include the block parameters of the first, second, third and fourth dimensions.
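Steps S3 to S4 amount to a search over candidate parameter values keyed on measured operation time. The sketch below uses a toy timed function as a stand-in for a real GPU matrix operation; the names and the candidate space are illustrative.

```python
import itertools
import time

def autotune(run_matmul, search_space):
    """search_space maps each block-parameter name to its candidate values.
    Returns the candidate combination with the shortest measured run time
    (i.e., here the performance index is the operation time itself)."""
    best, best_time = None, float("inf")
    for combo in itertools.product(*search_space.values()):
        candidate = dict(zip(search_space.keys(), combo))
        start = time.perf_counter()
        run_matmul(candidate)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = candidate, elapsed
    return best

def fake_matmul(params):
    # Toy stand-in: pretend TILE_M = 16 runs much faster.
    time.sleep(0.001 if params["TILE_M"] == 16 else 0.02)

best = autotune(fake_matmul, {"TILE_M": [8, 16], "TILE_N": [32]})
```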
It can be understood that the block parameters affect the blocking of the target weight matrix and the blocking of the feature matrix, so as to affect the operation efficiency of the first matrix block and the second matrix block, and further affect the operation efficiency of the target weight matrix and the feature matrix. Because the target weight matrix is a network parameter of the neural network and the feature matrix is used for representing the content of the media information, the operation efficiency of the target weight matrix and the feature matrix can influence the efficiency of processing the media information through the neural network. That is, the block parameters may affect the efficiency of processing media information through the neural network.
The value of the block parameter is not only related to hardware equipment, but also related to the matrix shape of the target weight matrix and the matrix shape of the feature matrix, and the value of the block parameter is set only by experience, so that different deployment scenes are difficult to meet. Based on the above, the embodiment of the application records the matrix shape and the corresponding parameter values through the configuration file, so as to determine each block parameter which is matched with the shape of the target weight matrix and the shape of the feature matrix based on the configuration file, realize the self-adaptive adjustment of the block parameter values according to the matrix shape, and be beneficial to improving the matrix operation efficiency, thereby improving the processing efficiency of the media information through the neural network.
In addition, for the matrix shape which is not included in the configuration file, the candidate parameter values of which the performance indexes meet the index conditions are selected by determining the performance indexes of the candidate parameter values, so that the block parameters corresponding to the matrix shape are determined, the deployment scene of the matrix shape is enriched, the application scenes of the neural network and the media information are enriched, the matrix operation efficiency is improved, and the processing efficiency of the media information through the neural network is improved.
As shown in fig. 5, fig. 5 is a schematic diagram illustrating determination of block parameters according to an embodiment of the present application. In the embodiment of the application, the electronic device can acquire the configuration file, read in the search space of each block parameter, and read in the matrix shapes and the corresponding parameter values of the block parameters. Next, it determines whether the recorded matrix shapes include the shape of the target weight matrix and the shape of the feature matrix. If so, the parameter value of each block parameter is output. If not, the search space of each block parameter is traversed, and the parameter value of each block parameter is determined so as to minimize the operation time of the target weight matrix and the feature matrix. Then, the matrix shapes, including the shape of the target weight matrix and the shape of the feature matrix, and the parameter values of the corresponding block parameters are written into the configuration file, and the parameter values of the block parameters are output.
Once the electronic device determines the parameter value of each block parameter, on the one hand, this yields the block parameter of the first dimension and the block parameter of the second dimension, thereby determining each first matrix block in the target weight matrix. On the other hand, it yields the block parameter of the third dimension and the block parameter of the fourth dimension, thereby determining each second matrix block in the feature matrix.
Step 202, loading a plurality of first block combinations to be loaded into at least two first storage areas in sequence, wherein the first block combinations comprise a first matrix block and a second matrix block.
In the embodiment of the application, a first block combination comprises a first matrix block and a second matrix block, and matrix operation is performed on the first block combination through a thread block. In practical application, the thread blocks are at least one, and the electronic device allocates at least two first storage areas for each thread block. The first storage area is an arbitrary storage area, for example, the first storage area is a shared memory, and the shared memory is a memory that can be accessed by different processes, where the processes include thread blocks.
For any one thread block, the electronic device can load a first block combination from global memory into a corresponding first storage area that does not store data. The delay of the global memory is higher than that of the first storage area, and the first block combination is loaded to the first storage area, so that when the subsequent matrix operation is performed, the thread reads data from the first storage area, the reading speed of the data is improved, and the processing efficiency of the media information through the neural network is improved. After loading is completed, if there is a corresponding first storage area where data is not stored, the electronic device may load another first block combination into the first storage area. After loading is completed, if the first storage area of the corresponding non-stored data does not exist, the electronic device pauses loading until the first storage area of the corresponding non-stored data exists, at which time the electronic device can load the next first block combination into the first storage area. In this way, the electronic device sequentially loads the plurality of first block combinations into the at least two first storage areas in sequence.
Optionally, the first matrix block includes TILE_K rows and TILE_N columns of target elements, and the second matrix block includes TILE_M rows and TILE_K columns of elements. If the electronic device allocates STAGE_NUM first storage areas for each thread block, the electronic device needs to open up, for each thread block, first storage areas with a total capacity of STAGE_NUM * (TILE_M * TILE_K + TILE_K * TILE_N) elements, since a first block combination comprises a first matrix block whose data volume is TILE_K * TILE_N and a second matrix block whose data volume is TILE_M * TILE_K.
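The per-thread-block buffer budget can be checked with simple arithmetic, under the assumption that the reconstructed expression is STAGE_NUM stages of (TILE_M * TILE_K + TILE_K * TILE_N) elements each.

```python
def shared_mem_elements(stage_num, tile_m, tile_k, tile_n):
    """Elements buffered per thread block: each stage holds one first block
    combination, i.e. a TILE_M x TILE_K second matrix block plus a
    TILE_K x TILE_N first matrix block."""
    per_combination = tile_m * tile_k + tile_k * tile_n
    return stage_num * per_combination

n_elems = shared_mem_elements(stage_num=2, tile_m=16, tile_k=32, tile_n=64)
```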
In one possible implementation, step 202 includes: for any one of the first block combinations, determining at least one second loading unit from the plurality of first loading units, and loading each element in any one of the first block combinations to a corresponding first storage area through the at least one second loading unit when the target dimension meets the set condition.
The first loading unit is a unit for loading elements of a target dimension, the second loading unit is a unit for loading elements of the target dimension in any one of the first block combinations, the target dimension is at least one of the first dimension, the second dimension, the third dimension and the fourth dimension, and the target dimension meets the set condition and comprises that the number of elements of the target weight matrix or the feature matrix in the target dimension is not an integer multiple of block parameters of the target dimension.
In the embodiment of the application, the dimensions of the target weight matrix comprise a first dimension and a second dimension, and the dimensions of the feature matrix comprise a third dimension and a fourth dimension. For any dimension, the electronic device may determine whether the dimension satisfies the set condition. Alternatively, if the number of elements of the target weight matrix or feature matrix in the dimension is not an integer multiple of the block parameter of the dimension, the dimension satisfies the set condition, and the dimension may be regarded as the target dimension.
For example, the dimensions of the target weight matrix include the K dimension and the N dimension, and the dimensions of the feature matrix include the M dimension and the K dimension. The M dimension corresponds to the block parameter TILE_M, the N dimension to TILE_N, and the K dimension to TILE_K. If the number of elements in the M dimension is not divisible by TILE_M, the number of elements in the N dimension is not divisible by TILE_N, and the number of elements in the K dimension is divisible by TILE_K, then the M dimension and the N dimension are target dimensions and the K dimension is not a target dimension.
Since the block parameters control the number of elements of a matrix block in the corresponding dimension, when the number of elements in the target dimension is not divisible by the block parameter of that dimension, some matrix blocks lack elements in that dimension. For example, the target weight matrix includes 9 rows and 9 columns of target elements. If the block parameters for both dimensions are 3, each matrix block should comprise 3 rows and 3 columns of elements; in this case the target weight matrix can be divided into 9 first matrix blocks, each comprising 3 rows and 3 columns of target elements. If the block parameters for both dimensions are 4, each matrix block should comprise 4 rows and 4 columns of elements; in this case the target weight matrix is divided into 4 first matrix blocks of 4 rows and 4 columns, two first matrix blocks of 1 row and 4 columns, two first matrix blocks of 4 rows and 1 column, and one first matrix block of 1 row and 1 column, i.e., 5 first matrix blocks lack part of the target elements.
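The 9×9 example generalizes to a simple ceiling-division count of total and partial blocks; a sketch:

```python
import math

def tile_grid(rows, cols, tile_r, tile_c):
    """Split a rows x cols matrix into tile_r x tile_c blocks and return
    (total_blocks, partial_blocks), where a partial block is one that
    lacks elements because the matrix size is not divisible by the tile."""
    n_r, n_c = math.ceil(rows / tile_r), math.ceil(cols / tile_c)
    full_r, full_c = rows // tile_r, cols // tile_c
    return n_r * n_c, n_r * n_c - full_r * full_c

# 9x9 matrix with 3x3 blocks: 9 blocks, none partial
# 9x9 matrix with 4x4 blocks: 9 blocks, 5 of which lack elements
```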
In general, a loading unit corresponding to each element in a matrix block is required to load the corresponding element, so that the matrix block is loaded through a plurality of loading units. The embodiments of the present application are not limited to a load unit, which may be a thread, for example. Since the matrix block may lack a part of elements in the target dimension, for a loading unit (i.e., a first loading unit) for loading the elements in the target dimension, a loading unit (i.e., a second loading unit) for loading the elements in the target dimension in the matrix block may be determined from each first loading unit, and each element in the matrix block is loaded into a corresponding first storage area through each second loading unit, so that loading of the first block combination into the corresponding first storage area is achieved.
By determining the second loading units from the first loading units, the loading units that load the elements of the target dimension in the matrix block are determined, thereby performing boundary processing on the loading units of the target dimension. Since boundary processing is performed only on the loading units of the target dimension, no boundary processing is required for the loading units of non-target dimensions, which reduces the additional overhead caused by boundary processing.
Optionally, for any one of the first, second, third and fourth dimensions, boundary processing of the loading units of that dimension may or may not be required. After the target weight matrix, the feature matrix, and the block parameters of each dimension are determined, the target dimensions requiring boundary processing can be determined. The kernel function caller invokes a kernel function that includes boundary processing for the loading units of the target dimensions, and the kernel function performs boundary processing on those loading units. For example, if the M dimension and the N dimension are target dimensions and the K dimension is not, the kernel function caller invokes a kernel function that includes boundary processing for the M-dimension and N-dimension loading units, and the kernel function performs boundary processing on those loading units. Through this boundary processing, each first block combination is loaded into the corresponding first storage area.
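A sketch of how the caller might determine which kernel variant to invoke: a dimension needs boundary processing exactly when its element count is not an integer multiple of its block parameter. The dispatch-by-key scheme below is a hypothetical illustration, not the patent's implementation.

```python
def boundary_kernel_key(m, n, k, tile_m, tile_n, tile_k):
    """Return the tuple of target dimensions (those whose size is not an
    integer multiple of the block parameter); this tuple could serve as a
    key into a table of kernel variants, each compiled with boundary
    processing only for its target dimensions."""
    sizes = {"M": (m, tile_m), "N": (n, tile_n), "K": (k, tile_k)}
    return tuple(d for d, (size, tile) in sizes.items() if size % tile != 0)

# M=100 and N=100 are not multiples of 32, K=64 is a multiple of 16,
# so only the M and N loading units need boundary processing
key = boundary_kernel_key(100, 100, 64, 32, 32, 16)
```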
In step 203, in the process of loading the plurality of first block combinations into at least two first storage areas, matrix operation is sequentially performed on each second block combination loaded into the first storage areas, so as to obtain an operation result of each second block combination, where the second block combination includes a first matrix block and a second matrix block.
In the embodiment of the application, while one first block combination is being loaded into its corresponding first storage area, the electronic device can perform a matrix operation on a second block combination that has already been loaded into a first storage area, to obtain the operation result of that second block combination. The second block combination includes a first matrix block and a second matrix block. The embodiment of the application does not limit the manner in which the matrix operation is performed on the second block combination; illustratively, the matrix operation includes at least one of a multiplication operation or an addition operation performed between the first matrix block and the second matrix block, or operations such as multiplication or addition performed between each element of the first matrix block and each element of the second matrix block, followed by multiplication or addition of the element-wise results.
Fig. 6 is a schematic diagram of a matrix operation according to an embodiment of the present application, as shown in fig. 6. In the embodiment of the present application, the thread block may read a first matrix block from the target weight matrix, where the first matrix block includes TILE_K elements in the K dimension and TILE_N elements in the N dimension, i.e. the size of the first matrix block is TILE_K×TILE_N. Similarly, the thread block may also read a second matrix block from the feature matrix, where the second matrix block includes TILE_M elements in the M dimension and TILE_K elements in the K dimension, i.e. the second matrix block has a size of TILE_M×TILE_K. The first matrix block and the second matrix block are read into shared memory, and then the thread block performs a matrix multiplication operation on them to obtain an operation result that includes TILE_M rows and TILE_N columns of elements. The above process continues until the matrix multiplication of every first matrix block and every second matrix block is completed, finally yielding SPLIT_K operation results, each including M rows and N columns of elements. It is understood that the matrix multiplication result of the target weight matrix and the feature matrix may be determined based on these SPLIT_K operation results.
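The split-K idea, summing several partial M×N results taken over slices of the shared K dimension, can be sketched with NumPy (names and sizes below are illustrative):

```python
import numpy as np

def split_k_matmul(a, b, split_k):
    """Compute a @ b by splitting the shared K dimension into split_k
    slices; each slice yields a partial M x N result and the split_k
    partial results are summed into the final operation result."""
    k = a.shape[1]
    bounds = np.linspace(0, k, split_k + 1, dtype=int)
    partials = [a[:, s:e] @ b[s:e, :] for s, e in zip(bounds, bounds[1:])]
    return sum(partials)

a = np.arange(12, dtype=float).reshape(3, 4)  # feature matrix (M=3, K=4)
b = np.arange(8, dtype=float).reshape(4, 2)   # target weight matrix (K=4, N=2)
c = split_k_matmul(a, b, split_k=2)           # equals a @ b
```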
Fig. 7 is a schematic diagram of a matrix multiplication according to an embodiment of the present application, as shown in fig. 7. For a feature matrix comprising M rows and K columns of elements, the element in the mth row and kth column of the feature matrix may be denoted amk. Similarly, for a target weight matrix comprising K rows and N columns of elements, the element in the kth row and nth column of the target weight matrix may be denoted bkn. The result of the matrix multiplication of the target weight matrix and the feature matrix is a matrix comprising M rows and N columns of elements, and the element in the mth row and nth column of that matrix may be denoted cmn. Alternatively, cmn = am1×b1n + am2×b2n + am3×b3n + … + amK×bKn, that is, the element in the mth row and nth column of the operation result is obtained from the mth row of the feature matrix and the nth column of the target weight matrix. Here m takes any value from 1 to M, k any value from 1 to K, and n any value from 1 to N.
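This element-wise definition can be written out directly and checked against a library matrix product:

```python
import numpy as np

def gemm_naive(a, b):
    """Element-wise form of the matrix product: the element in row m,
    column n is cmn = am1*b1n + am2*b2n + ... + amK*bKn."""
    m, k = a.shape
    _, n = b.shape
    c = np.zeros((m, n))
    for i in range(m):           # row of the feature matrix
        for j in range(n):       # column of the target weight matrix
            c[i, j] = sum(a[i, p] * b[p, j] for p in range(k))
    return c

a = np.array([[1., 2.], [3., 4.], [5., 6.]])    # M=3, K=2
b = np.array([[7., 8., 9.], [10., 11., 12.]])   # K=2, N=3
```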
Based on this principle, the operation results of each first matrix block in the target weight matrix and each second matrix block in the feature matrix can be calculated, and the operation result of the target weight matrix and the feature matrix is determined based on these results. As in fig. 7, based on the 1st to 3rd row elements in the feature matrix and the 1st column elements in the target weight matrix, the element c11 in row 1, column 1, the element c21 in row 2, column 1, and the element c31 in row 3, column 1 of the operation result can be obtained. The 1st to 3rd row elements in the feature matrix can be divided into a plurality of second matrix blocks, the 1st column elements in the target weight matrix into a plurality of first matrix blocks, and the operation results of the first matrix blocks and the second matrix blocks combined to obtain c11, c21 and c31 in the operation result.
In one possible implementation, the first matrix block comprises at least two first sub-blocks and the second matrix block comprises at least two second sub-blocks. Step 203 includes steps 2031 to 2033 (not shown in the drawings).
Wherein the first matrix block comprises at least two first sub-blocks. The sizes of any two first sub-blocks are the same or different, and target elements included in any two first sub-blocks are not coincident, that is, any one target element in the target weight matrix belongs to one first sub-block.
Optionally, the dimensions of the first matrix block include a second dimension, and the sub-block parameters of the second dimension are used to control the number of first sub-blocks in the second dimension.
In the embodiment of the present application, the first dimension is a row dimension of the first matrix block, the second dimension is a column dimension of the first matrix block, and the number of rows and columns of the first matrix block may be the same or different. The second dimension corresponds to a sub-block parameter, and the number of the first sub-blocks in the corresponding dimension is controlled through the sub-block parameter.
For example, assuming that the first dimension is the K dimension and the second dimension is the N dimension, the first matrix block may include target elements of TILE_K rows and TILE_N columns, where TILE_K and TILE_N are two identical or different positive integers. The sub-block parameter corresponding to the second dimension is WARP_N_NUM, which controls the number of first sub-blocks in the second dimension. For example, if WARP_N_NUM=4, the first matrix block comprises 4 first sub-blocks in the column dimension.
Similarly, the second matrix block comprises at least two second sub-blocks. Any two second sub-blocks are identical or different in size, and the elements included in any two second sub-blocks are not coincident, that is, any one element in the feature matrix belongs to one second sub-block.
Optionally, the dimensions of the second matrix block include a third dimension, and sub-block parameters of the third dimension are used to control the number of second sub-blocks in the third dimension.
In the embodiment of the present application, the third dimension is a row dimension of the second matrix block, the fourth dimension is a column dimension of the second matrix block, and the number of rows and columns of the second matrix block may be the same or different. The third dimension corresponds to one sub-block parameter, and the number of second sub-blocks in the corresponding dimension is controlled by the sub-block parameter.
For example, assuming that the third dimension is the M dimension and the fourth dimension is the K dimension, the second matrix block may include elements of TILE_M rows and TILE_K columns, where TILE_M and TILE_K are two identical or different positive integers. The sub-block parameter corresponding to the third dimension is WARP_M_NUM, which controls the number of second sub-blocks in the third dimension. For example, if WARP_M_NUM=2, the second matrix block includes 2 second sub-blocks in the row dimension.
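The sub-block shapes implied by these parameters can be sketched as follows (even division is assumed for simplicity):

```python
def sub_block_shapes(tile_m, tile_n, tile_k, warp_m_num, warp_n_num):
    """Split a TILE_K x TILE_N first matrix block into WARP_N_NUM first
    sub-blocks along N, and a TILE_M x TILE_K second matrix block into
    WARP_M_NUM second sub-blocks along M. Returns the shape of one sub-block
    of each kind and the number of (first, second) sub-block pairs."""
    first_sub = (tile_k, tile_n // warp_n_num)    # one first sub-block
    second_sub = (tile_m // warp_m_num, tile_k)   # one second sub-block
    num_pairs = warp_m_num * warp_n_num           # one thread bundle per pair
    return first_sub, second_sub, num_pairs

# TILE_M=128, TILE_N=128, TILE_K=32, WARP_M_NUM=2, WARP_N_NUM=4
# -> first sub-blocks of 32x32, second sub-blocks of 64x32, 8 pairs
```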
The embodiment of the application does not limit the determination mode of the sub-block parameters. In one exemplary embodiment, the electronic device may obtain sub-block parameters for each dimension entered by the user, or the electronic device may determine sub-block parameters for a corresponding dimension based on the number of elements of the matrix block in a certain dimension.
Or the corresponding relation of the configuration file record also comprises the matrix block shape and the corresponding parameter value. The matrix block shape comprises the shape of a weight matrix block, and the parameter value corresponding to the shape of the weight matrix block comprises the value of the sub-block parameter of the second dimension. The matrix block shape also comprises a shape of a feature matrix block, and the parameter value corresponding to the shape of the feature matrix block comprises a value of a subblock parameter of a third dimension. The corresponding relation of the configuration file records is at least one. Any two corresponding relations comprise different matrix block shapes, and the different matrix block shapes can correspond to the same or different parameter values.
If the shape of the matrix block recorded by the configuration file comprises the shape of the first matrix block and the shape of the second matrix block, determining a sub-block parameter of the second dimension based on the value of the parameter corresponding to the shape of the first matrix block, and determining a sub-block parameter of the third dimension based on the value of the parameter corresponding to the shape of the second matrix block.
That is, for any corresponding relationship, the shape of the weight matrix block in the corresponding relationship and the shape of the first matrix block can be compared, and if the comparison is passed, the sub-block parameter of the second dimension is determined to be the value of the sub-block parameter of the second dimension in the corresponding relationship. Similarly, the shape of the second matrix block is compared with the shape of the feature matrix block in the corresponding relation, and if the comparison is passed, the sub-block parameter of the third dimension is determined to be the value of the sub-block parameter of the third dimension in the corresponding relation.
In some cases, at least one of the shape of the first matrix block and the shape of the second matrix block does not match any of the recorded correspondences; in that case the corresponding sub-block parameters may be set to preset values. Alternatively, the configuration file also records the search space (i.e., the value range) of each sub-block parameter.
Since the sub-block parameters control the number of sub-blocks of a matrix block in the corresponding dimension, the values of the sub-block parameters are positive integers. The value range of the sub-block parameters recorded in the configuration file is equivalent to the set of candidate values of the sub-block parameters. Optionally, the candidate parameter values recorded in the configuration file further include candidate values of each sub-block parameter. The sub-block parameter of the second dimension and the sub-block parameter of the third dimension are determined by evaluating the performance index of each candidate parameter value and selecting the candidate parameter values whose performance index satisfies the index condition. The determination process of the sub-block parameters is described in steps S1 to S4; the implementation principles are similar and are not repeated here.
Step 2031, for any one of the second block combinations that has been loaded into the first storage area, loading the plurality of sub-block combinations into the second storage area, any one of the second block combinations including the plurality of sub-block combinations, any one of the sub-block combinations including the first sub-block and the second sub-block.
In the embodiment of the application, the first matrix block and the second matrix block are subjected to matrix operation through the thread block. The thread block includes at least two thread bundles, and a sub-block combination includes a first sub-block and a second sub-block. A combination of sub-blocks is matrix-operated by a thread bundle. In practical applications, the electronic device loads the sub-block combinations into the second storage area, so as to facilitate matrix operation of the sub-block combinations in the second storage area. The second storage area is any storage area different from the first storage area, for example, the second storage area is a register, and the register is an integral part of the central processing unit and is used for high-speed storage with limited capacity and can be used for temporarily storing instructions, data, addresses and the like.
In an exemplary embodiment, for any one of the sub-block combinations, in the case where the target dimension satisfies the set condition, at least one fourth loading unit is determined from the plurality of third loading units, and each element in any one of the sub-block combinations is loaded to the second storage area by the at least one fourth loading unit. The third loading unit is a unit for loading the element of the target dimension, and the fourth loading unit is a unit for loading the element of the target dimension in any sub-block combination. The implementation principle of this step may be described in the "determining the second loading unit from the first loading unit, and loading each element in the first block combination to the corresponding first storage area through the second loading unit", where the implementation principle is similar, and the description is omitted here.
In step 2032, matrix operation is performed on each sub-block combination in the second storage area, so as to obtain an operation result of each sub-block combination.
In the embodiment of the application, the electronic equipment performs matrix operation on the sub-block combination loaded to the second storage area to obtain an operation result of the sub-block combination. The sub-block combination includes a first sub-block and a second sub-block, and the method of performing matrix operation by the sub-block combination is not limited in the embodiment of the present application, and illustratively, the matrix operation includes at least one of multiplication operation or addition operation by the first sub-block and the second sub-block, or the matrix operation includes operations such as multiplication operation or addition by each element in the first sub-block and each element in the second sub-block, and multiplication operation or addition by the operation result of the element.
As shown in fig. 8, fig. 8 is a schematic diagram of another matrix operation according to an embodiment of the present application. In the embodiment of the present application, the size of the first matrix block is TILE_K×TILE_N, and the number of first sub-blocks in the N dimension is controlled using WARP_N_NUM=4, in which case the first matrix block includes 4 first sub-blocks in the N dimension. Similarly, the size of the second matrix block is TILE_M×TILE_K, and the number of second sub-blocks in the M dimension is controlled using WARP_M_NUM=2, in which case the second matrix block includes 2 second sub-blocks in the M dimension. Matrix operations are performed on each first sub-block and each second sub-block through 8 thread bundles to obtain the respective operation results.
For example, one thread bundle performs a matrix operation on the first first sub-block of the first matrix block in the N dimension and the first second sub-block of the second matrix block in the M dimension to obtain one operation result; another thread bundle performs a matrix operation on the second first sub-block of the first matrix block in the N dimension and the first second sub-block of the second matrix block in the M dimension to obtain another operation result; and so on, 8 operation results are obtained in total.
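The enumeration of bundle-to-sub-block assignments can be sketched as a simple index mapping; the row-major ordering here is an assumption of this sketch, not specified by the source:

```python
def bundle_assignment(warp_m_num, warp_n_num):
    """Map each thread bundle index to the (second sub-block index in M,
    first sub-block index in N) pair it multiplies; WARP_M_NUM * WARP_N_NUM
    bundles in total, enumerated in row-major order."""
    return [(w // warp_n_num, w % warp_n_num)
            for w in range(warp_m_num * warp_n_num)]

# WARP_M_NUM=2, WARP_N_NUM=4: bundle 0 -> (0, 0), bundle 1 -> (0, 1), ...
pairs = bundle_assignment(2, 4)
```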
When the first sub-block and the second sub-block are subjected to matrix operation, the data types of the first sub-block and the second sub-block are required to be the same. For example, the data type of the first sub-block and the data type of the second sub-block are FP16, in which case the matrix operation may be directly performed on the first sub-block and the second sub-block.
In an exemplary embodiment, the data types of the first sub-block and the second sub-block are different. In this case, step 2032 includes: for any sub-block combination, performing data type conversion on a first sub-block in any sub-block combination to obtain a target sub-block, wherein the data types of the target sub-block and a second sub-block are the same; and performing matrix operation on the target sub-block and a second sub-block in any sub-block combination to obtain an operation result of any sub-block combination.
In the embodiment of the application, if the data types of the first sub-block and the second sub-block are different, the data type conversion is needed to be carried out on one of the sub-blocks, so that the data type of the converted sub-block is the same as the data type of the other sub-block. The first sub-block is a sub-block of the weight matrix, and the second sub-block is a sub-block of the feature matrix, and in general, the precision of the weight matrix is smaller than that of the feature matrix, so that the sub-block of the weight matrix can be subjected to data type conversion. Based on the data type conversion is carried out on the first sub-block in the sub-block combination to obtain a target sub-block, so that the data types of the target sub-block and the second sub-block in the sub-block combination are the same.
The embodiment of the application does not limit the mode of data type conversion. Illustratively, if the data type of the first sub-block is the set data type, data type conversion is performed directly on the first sub-block. If the data type of the first sub-block is not the set data type, each element in the first sub-block is padded with a bit-filling instruction to obtain an element of the set data type, yielding a third sub-block. The data type of the third sub-block is the set data type, and data type conversion is then performed on the third sub-block. When the target element is already an element of the set data type, the bit-filling instruction can be avoided, which reduces extra instruction overhead and improves the efficiency of data type conversion.
For example, one type of GPU includes at least one register, either of which is 32 bits long, based on which the data type is set to be INT32. If the data type of the first sub-block is INT8 or INT4, the elements of INT8 or INT4 in the first sub-block are complemented into the elements of INT32 by using a complemented bit instruction, so that a third sub-block is obtained, and then the data type of the third sub-block is converted. If the elements of the first sub-block are obtained by merging the elements of INT8 or INT4 and the elements of the first sub-block are the elements of INT32, the data type conversion may be directly performed on the first sub-block.
In the embodiment of the application, the reading instruction and the subtracting instruction can be used for carrying out data type conversion on the sub-block with the set data type to obtain the target sub-block. The data type of the target sub-block is the target data type and is the same as the data type of the second sub-block. The process of data type conversion for the first sub-block will be described below taking the example that the data type of the first sub-block is a set data type.
Optionally, there are multiple second storage areas, and the first sub-block in the sub-block combination includes multiple target elements. In the process of loading the sub-block combination into the second storage areas, for any target element included in the first sub-block of the sub-block combination, a read instruction may be used to extract part of the data from the target element and load that part into a second storage area; in this way the target element is loaded into at least two second storage areas, and thus the first sub-block is loaded into the plurality of second storage areas. Likewise, the second sub-block of the sub-block combination may be loaded into the plurality of second storage areas. In this way, loading of the sub-block combination into the plurality of second storage areas is achieved.
For the partial data loaded into the second storage area, the read instruction also indicates bit filling to be performed on that partial data, producing the bit-filled data. The fill bits indicated by the read instruction may be preset data or may be adaptively adjusted according to the data type of the target element. The partial data loaded into the second storage area is combined with the fill bits indicated by the read instruction to obtain the bit-filled data. Then, a subtraction instruction is used to subtract the data indicated by the subtraction instruction from the bit-filled data to obtain data of the target data type. Since the target element is loaded into at least two second storage areas, each second storage area can perform data type conversion in the above manner, thereby converting the target element into an element of the target data type. The data indicated by the subtraction instruction may likewise be preset data or adaptively adjusted according to the data type of the target element.
The process of data type conversion is described below taking the example that the data type of the quantization element is INT 8. In the embodiment of the present application, as shown in step 2013, for each column of quantized elements in the quantization weight matrix, 4 INT8 data of 4 consecutive rows are combined into a target element of INT32. In this way, a target weight matrix may be obtained, and a first sub-block may be determined based on the target weight matrix, where the data type of the target element included in the first sub-block is INT32, and in the embodiment of the present application, the set data type includes INT32.
Assume that 16 consecutive rows in the first sub-block are rows 0 through 15, each row including one INT32 target element. A GPU includes at least one thread bundle, and one thread bundle has 32 threads in total. The requirement is that threads 0-3 take the target elements corresponding to rows 0, 1, 8 and 9, threads 4-7 take the target elements corresponding to rows 2, 3, 10 and 11, and so on. In this case, each thread reads one INT32 target element; since 4 INT8 data are combined into one INT32 target element, each thread effectively reads 4 INT8 data, denoted w0, w1, w2 and w3 respectively.
First, w0 and w1 are extracted using a prmt instruction (a read instruction) and stored in bits 0-7 and 16-23 of a 32-bit register R0, respectively, where w0 and w1 are part of the data mentioned above. Then, 8-15 bits and 24-31 bits of the register R0 are bit-filled, wherein the bit-filled data is 0b01100100. Then, the sub.f16x2 instruction (a subtracting instruction) is used to subtract the data 0x64806480 indicated by the subtracting instruction from the data (i.e. the data after bit filling) in the register R0 to obtain the data of the FP16 data type (i.e. the target data type), so as to realize the conversion of the binary storage mode of the INT8 data type into the binary storage mode of the FP16 data type.
Similarly, w2 and w3 are fetched using one prmt instruction and are stored in bits 0-7 and 16-23, respectively, of another 32-bit register R1, where w2 and w3 are part of the data mentioned above. Then, bits 8-15 and 24-31 of the register R1 are bit-filled, wherein the fill data is 0b01100100. Then, the sub.f16x2 instruction is used to subtract the data 0x64806480 indicated by the subtraction instruction from the data (i.e. the bit-filled data) in the register R1, so as to implement the conversion of the binary storage format of the INT8 data type into the binary storage format of the FP16 data type.
In the above manner, the conversion of 4 INT8 data into 4 FP16 data is realized using 4 instructions (2 prmt instructions and 2 sub.f16x2 instructions). The target element is INT32 data comprising 4 INT8 data, and the elements in the target sub-block comprise 4 FP16 data.
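The pad-and-subtract trick can be reproduced on one byte with NumPy bit views. This is a sketch of the arithmetic only: writing 0x64 (0b01100100) into bits 8-15 makes the half-precision bit pattern 0x6400 | u, whose value is exactly 1024 + u for any byte u; subtracting the half whose bits are 0x6480 (value 1152.0) then yields u - 128. Recovering the signed weight this way assumes the quantized bytes were stored with a +128 zero-point bias, which is an assumption of this sketch.

```python
import numpy as np

def byte_to_fp16(u):
    """Convert one byte u (0..255) via the pad-and-subtract trick:
    fp16(0x6400 | u) = 1024 + u, and (1024 + u) - 1152.0 = u - 128,
    all exact in half precision (spacing is 1.0 in [1024, 2048))."""
    padded = np.array([0x6400 | u], dtype=np.uint16).view(np.float16)
    magic = np.array([0x6480], dtype=np.uint16).view(np.float16)  # 1152.0
    return float((padded - magic)[0])

# byte 0x85 (assumed to carry a +128 bias) recovers the value 5.0
val = byte_to_fp16(0x85)
```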
The process of data type conversion is described below taking the example that the data type of the quantization element is INT 4. In the embodiment of the present application, as shown in step 2013, for each column of quantization elements in the quantization weight matrix, a column of quantization elements is rearranged, so that the row order of the consecutive 32 rows (marked as 0 to 31 rows) is changed to: 0, 8, 16, 24, 1, 9, 17, 25, 2, 10, 18, 26, 3, 11, 19, 27, 4, 12, 20, 28, 5, 13, 21, 29, 6, 14, 22, 30, 7, 15, 23, 31. Wherein, the consecutive 8 rows of INT4 elements after rearrangement are combined into the target element of INT32, for example, the elements corresponding to 0 row, 8 row, 16 row, 24 row, 1 row, 9 row, 17 row and 25 row are saved as the target element of INT32 data type.
Assuming that a GPU includes at least one thread bundle, with 32 threads in total, the requirement is that threads 0-3 take the elements corresponding to rows 0, 1, 8, 9, 16, 17, 24 and 25, threads 4-7 take the data corresponding to rows 2, 3, 10, 11, 18, 19, 26 and 27, and so on. Taking thread 0 as an example, thread 0 reads the elements of rows 0 and 1; since 8 consecutive rows of INT4 elements after rearrangement are combined into one INT32 target element, the elements of rows 0 and 1 are part of the data in that target element. The two INT4 data of rows 0 and 1 can be converted into FP16 data by 8 instructions and stored in a 32-bit register R0. Similarly, the two INT4 data of rows 8 and 9 are saved to register R1 through 8 instructions; the two INT4 data of rows 16 and 17 to register R2 through 8 instructions; and the two INT4 data of rows 24 and 25 to register R3 through 8 instructions. The manner of converting INT4 data into FP16 data follows the description of converting INT8 data into FP16 data; the implementation principles are similar and are not repeated here.
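The rearranged row order listed above follows a regular pattern and can be generated programmatically, which makes it easy to verify that every group of 8 consecutive rearranged rows packs into one INT32 target element:

```python
def int4_row_order():
    """Rearranged order of 32 consecutive rows (0..31) for INT4 packing:
    0, 8, 16, 24, 1, 9, 17, 25, ... so that each run of 8 successive
    entries forms one INT32 target element (8 INT4 values)."""
    return [g + 8 * j for g in range(8) for j in range(4)]

order = int4_row_order()
# the first INT32 target element packs rows 0, 8, 16, 24, 1, 9, 17, 25
```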
By the method, the first sub-block is converted into the target sub-block, so that the data types of the target sub-block and the second sub-block are the same. And then, the electronic equipment performs matrix operation on the target sub-block and the second sub-block to obtain an operation result of sub-block combination. In this way, the result of the operation of each sub-block combination can be determined.
Step 2033, determining an operation result of any one of the second block combinations based on the operation results of the respective sub-block combinations.
The embodiment of the present application does not limit the manner of determining the operation result of the second block combination based on the operation results of the sub-block combinations. Illustratively, the operation results of the sub-block combinations are spliced to obtain the operation result of the second block combination; alternatively, at least one operation such as matrix multiplication or matrix addition, or an element-wise operation such as element multiplication or element addition, is performed on the operation results of the sub-block combinations to obtain the operation result of the second block combination.
Step 204, determining a matrix operation result of the target weight matrix and the feature matrix based on the operation result of each second block combination, wherein the matrix operation result is used for determining a result of processing the media information through the neural network.
The embodiment of the present application does not limit the manner of determining the matrix operation result based on the operation results of the second block combinations. Illustratively, the operation results of the second block combinations are spliced to obtain the matrix operation result; alternatively, at least one operation such as matrix multiplication or matrix addition, or an element-wise operation such as element multiplication or element addition, is performed on the operation results of the second block combinations to obtain the matrix operation result.
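For the splicing case, a minimal sketch, assuming the block results tile the output in row-major grid order (the patent leaves the exact combination manner open):

```python
def splice_blocks(grid):
    # grid: 2-D list of result blocks; blocks in the same grid row are
    # concatenated column-wise, then the grid rows are stacked
    rows = []
    for block_row in grid:
        for r in range(len(block_row[0])):
            rows.append([x for blk in block_row for x in blk[r]])
    return rows

grid = [[[[1, 2]], [[3]]],
        [[[4, 5]], [[6]]]]
result = splice_blocks(grid)   # [[1, 2, 3], [4, 5, 6]]
```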
The matrix operation result is used for determining a result of processing the media information through the neural network, and the embodiment of the application does not limit the processing. Illustratively, the media information may be classified, semantically partitioned, translated, speech synthesized, etc. by a neural network, with different processes corresponding to different results. For example, if the processing is classification, the result of the processing is the classification result of the media information; if the processing is speech synthesis, the result of the processing is generated speech.
It should be noted that, the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, displayed data, etc.) and signals related to the present application are all authorized by the user or are fully authorized by the parties, and the collection, use and processing of the related data need to comply with the relevant laws and regulations and standards of the relevant region. For example, the weight matrix, the feature matrix, and the like, which are referred to in the present application, are acquired under the condition of sufficient authorization.
In the above method, the target weight matrix includes a plurality of first matrix blocks, the feature matrix includes a plurality of second matrix blocks, and a first block combination includes a first matrix block and a second matrix block. The first block combinations are loaded into at least two first storage areas, and a second block combination is a first block combination that has already been loaded into a first storage area. By loading the first block combinations into at least two first storage areas and performing matrix operations on the second block combinations while the loading is in progress, matrix blocks waiting to be operated on are pre-loaded and the operation unit is kept in a computing state at all times, so that the latency of loading matrix blocks is hidden and the efficiency of the matrix operation between the target weight matrix and the feature matrix is improved. Because the target weight matrix is a parameter of the neural network and the feature matrix characterizes the content of the media information, improving the efficiency of the matrix operation between them improves the efficiency of processing the media information through the neural network.
The foregoing describes the media information processing method according to the embodiment of the present application from the perspective of method steps, and is described in more detail below. The method of the embodiment of the application can be applied to any scene related to the neural network, and the scene comprises but is not limited to a scene related to the tasks of text translation, speech synthesis, intelligent question-answering, image recognition and the like through the neural network. In these scenarios, the network parameters of the neural network include a weight matrix, while the inputs of the neural network include a feature matrix by which text semantics or image content or audio information, etc. can be characterized. In the embodiment of the application, the weight matrix and the feature matrix are stored in the global memory, and the weight matrix is used for performing matrix multiplication operation with the feature matrix. Assuming that the feature matrix a includes elements of M rows and K columns, and the weight matrix B includes elements of K rows and N columns, the operation result C of the matrix multiplication operation satisfies: c=a×b, and C includes elements of M rows and N columns.
When the GPU performs matrix multiplication, the matrix may be divided into matrix blocks, and then multiplication operations are performed on the matrix block level. In the embodiment of the application, the partitioning on the thread block level is used.
At the thread-block level, the M, K and N dimensions are all partitioned. The parameters TILE_M and TILE_N control the partition size of each thread block in the M and N dimensions respectively, and the parameter SPLIT_K controls the number of partitions in the K dimension. On this basis, the number of thread blocks in the x direction is N divided by TILE_N, rounded up; the number of thread blocks in the y direction is M divided by TILE_M, rounded up; the number of thread blocks in the z direction is SPLIT_K; and the partition size of each thread block in the K dimension is TILE_K, where TILE_K equals K divided by SPLIT_K.
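The thread-block layout above reduces to a small calculation, sketched here with the parameter names from the text (K is assumed divisible by SPLIT_K, as the text implies):

```python
import math

def grid_dims(M, K, N, TILE_M, TILE_N, SPLIT_K):
    grid_x = math.ceil(N / TILE_N)   # thread blocks in the x direction
    grid_y = math.ceil(M / TILE_M)   # thread blocks in the y direction
    grid_z = SPLIT_K                 # thread blocks in the z direction
    TILE_K = K // SPLIT_K            # per-block partition size along K
    return (grid_x, grid_y, grid_z), TILE_K

dims, tile_k = grid_dims(M=128, K=64, N=96, TILE_M=32, TILE_N=32, SPLIT_K=4)
# dims == (3, 4, 4), tile_k == 16
```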
Each thread block successively takes TILE_M×TILE_K matrix blocks from the feature matrix A and TILE_K×TILE_N matrix blocks from the weight matrix B, and stores them into shared memory. Each thread block corresponds to at least two shared memories, and while matrix blocks are being stored into one shared memory, the matrix blocks already stored in another are operated on.
The following description takes three shared memories (shared memories 0 to 2) corresponding to a thread block as an example. As shown in fig. 9, fig. 9 is a schematic diagram of storage and operation according to an embodiment of the present application. First, a TILE_M×TILE_K matrix block is taken from the feature matrix A in global memory and a TILE_K×TILE_N matrix block is taken from the weight matrix B in global memory, and the two matrix blocks are stored into shared memory 0. Then, the two matrix blocks in shared memory 0 are operated on while, at the same time, the next TILE_M×TILE_K matrix block is taken from the feature matrix A, the next TILE_K×TILE_N matrix block is taken from the weight matrix B, and the two matrix blocks are stored into shared memory 1. And so on.
That is, while matrix blocks are being stored into one shared memory, the matrix blocks already stored in another shared memory are operated on, so that the operation unit can remain in a computing state at all times and the latency of accessing global memory is hidden, thereby improving the computing efficiency of the GPU.
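The multi-buffer overlap can be sketched as a software pipeline. On a real GPU the load would be asynchronous; here it runs sequentially, so the code only illustrates the buffer rotation, not the actual concurrency:

```python
def pipelined(load, compute, num_iters, n_buffers=3):
    # rotate through n_buffers shared-memory stand-ins: while iteration i
    # is being computed, the blocks for iteration i + 1 are being loaded
    bufs = [None] * n_buffers
    bufs[0] = load(0)                                  # prologue
    results = []
    for i in range(num_iters):
        if i + 1 < num_iters:
            bufs[(i + 1) % n_buffers] = load(i + 1)    # prefetch next pair
        results.append(compute(bufs[i % n_buffers]))   # consume current pair
    return results

out = pipelined(load=lambda i: (i, i), compute=lambda ab: ab[0] + ab[1],
                num_iters=5)
# out == [0, 2, 4, 6, 8]
```

The slot written at iteration i is consumed at iteration i + 1, so with three buffers a prefetch never overwrites a pair that has not yet been computed.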
When operating on the matrix blocks stored in another shared memory, the matrix blocks can be read from the shared memory into registers. Since a thread block performs its operations in units of thread bundles (warps), the matrix blocks can be further partitioned at the warp level.
At the warp level, the warps in a thread block are divided along the two dimensions M and N, with WARP_M_NUM and WARP_N_NUM controlling the number of warps in the M and N directions respectively. That is, for a TILE_M×TILE_K matrix block of the feature matrix, WARP_M_NUM controls the number of sub-blocks in the M dimension; for a TILE_K×TILE_N matrix block of the weight matrix, WARP_N_NUM controls the number of sub-blocks in the N dimension.
In the registers, a sub-block of a matrix block of the feature matrix is operated on with a sub-block of a matrix block of the weight matrix to obtain the operation result of the sub-block combination (comprising the two sub-blocks). Based on the operation results of the plurality of sub-block combinations, the operation result of the matrix block combination (comprising the aforementioned plurality of sub-block combinations) is determined. Based on the operation results of the matrix block combinations, the operation result of the feature matrix and the weight matrix is determined.
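The overall flow, operating on sub-block pairs and then combining the partial results, reduces to accumulating partial products along the K dimension. A small sketch in plain Python (no GPU semantics) that can be checked against C = A×B:

```python
def tiled_matmul(A, B, tile_k):
    # accumulate partial products of TILE_K-wide slices of A and B,
    # mirroring how sub-block results combine into the final C = A x B
    M, K, N = len(A), len(A[0]), len(B[0])
    C = [[0] * N for _ in range(M)]
    for k0 in range(0, K, tile_k):
        for i in range(M):
            for j in range(N):
                C[i][j] += sum(A[i][k] * B[k][j]
                               for k in range(k0, min(k0 + tile_k, K)))
    return C

A = [[1, 2, 3, 4]]
B = [[1], [1], [1], [1]]
assert tiled_matmul(A, B, tile_k=2) == [[10]]
```

Because addition is associative, the tiled accumulation yields the same result as an untiled multiplication regardless of the tile size chosen.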
By partitioning the weight matrix and the feature matrix and executing multiplication operation on the small blocks, the locality is improved, so that the parallelism of the GPU can be better utilized under the condition that the memory resources of the equipment are limited, the operation efficiency is improved, and the delay is reduced. The neural network of the embodiment of the application can be any network, for example, the neural network can be a large model, and the parallelism of the GPU can be better utilized, so that the deployment requirement of the large model is reduced, and the application of the large model in an actual scene is facilitated.
Fig. 10 is a schematic structural diagram of a media information processing device according to an embodiment of the present application. As shown in fig. 10, the device includes the following modules.
The obtaining module 1001 is configured to obtain a target weight matrix of the neural network and a feature matrix of the input neural network, where the target weight matrix includes a plurality of first matrix blocks, and the feature matrix includes a plurality of second matrix blocks, and the feature matrix is used to characterize content of the media information.
The loading module 1002 is configured to sequentially load a plurality of first block combinations to be loaded into at least two first storage areas, where the first block combinations include a first matrix block and a second matrix block.
The operation module 1003 is configured to sequentially perform matrix operation on each second block combination loaded into the first storage area in a process of loading a plurality of first block combinations into at least two first storage areas, to obtain an operation result of each second block combination, where the second block combination includes a first matrix block and a second matrix block.
A determining module 1004, configured to determine a matrix operation result of the target weight matrix and the feature matrix based on the operation result of each second block combination, where the matrix operation result is used to determine a result of processing the media information through the neural network.
In one possible implementation, the obtaining module 1001 is configured to obtain a first weight matrix of the neural network, where the first weight matrix includes a plurality of columns of first elements; for any column of first elements, determining quantization parameters based on any column of first elements and data bits corresponding to the first data type, and quantizing any column of first elements based on the quantization parameters to obtain a column of quantized elements; a target weight matrix is determined based on the columns of quantized elements.
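A minimal sketch of such per-column quantization. The symmetric max-based scale below is an assumed scheme; the patent only states that the quantization parameter is determined from the column and the data bits of the first data type:

```python
def quantize_column(col, bits=8):
    # quantization parameter: a scale mapping the largest |value| in the
    # column onto the signed range of the target data bits
    qmax = (1 << (bits - 1)) - 1
    scale = (max(abs(v) for v in col) / qmax) or 1.0   # avoid zero scale
    return [round(v / scale) for v in col], scale

def dequantize_column(q, scale):
    return [v * scale for v in q]

q, s = quantize_column([0.5, -1.0, 0.25], bits=8)
restored = dequantize_column(q, s)
```

The round-trip error per element is bounded by half the scale, which is why the quantization parameter must be kept alongside the quantized column.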
In one possible implementation, the target weight matrix includes multiple columns of target elements.
An obtaining module 1001, configured to, for a column of target elements, merge a column of quantized elements into a column of target elements, where the target elements include at least two continuous quantized elements; or rearranging a row of quantized elements to obtain a row of rearranged quantized elements, merging the row of rearranged quantized elements into a row of target elements, wherein the target elements comprise at least two consecutive rearranged quantized elements.
In one possible implementation, the dimensions of the target weight matrix include a first dimension and a second dimension, the block parameter of the first dimension being used to control the number of elements of the first matrix block in the first dimension, and the block parameter of the second dimension being used to control the number of elements of the first matrix block in the second dimension.
The dimensions of the feature matrix include a third dimension and a fourth dimension, the block parameter of the third dimension being used to control the number of elements of the second matrix block in the third dimension, and the block parameter of the fourth dimension being used to control the number of elements of the second matrix block in the fourth dimension.
In a possible implementation manner, the loading module 1002 is configured to determine, for any one of the first block combinations, at least one second loading unit from the plurality of first loading units, and load, by the at least one second loading unit, each element in any one of the first block combinations into a corresponding first storage area, where the target dimension meets a set condition.
The first loading unit is a unit for loading elements of a target dimension, the second loading unit is a unit for loading elements of the target dimension in any one of the first block combinations, the target dimension is at least one of the first dimension, the second dimension, the third dimension and the fourth dimension, and the target dimension meets the set condition and comprises that the number of elements of the target weight matrix or the feature matrix in the target dimension is not an integer multiple of block parameters of the target dimension.
In one possible implementation, the obtaining module 1001 is further configured to obtain a configuration file, where the configuration file is used to record the matrix shape and the corresponding parameter values.
The determining module 1004 is further configured to determine a block parameter of the first dimension, a block parameter of the second dimension, a block parameter of the third dimension, and a block parameter of the fourth dimension based on the parameter values corresponding to the matrix shape if the matrix shape of the profile record includes the matrix shape of the target weight matrix and the matrix shape of the feature matrix.
In one possible implementation, the configuration file is further used to record a plurality of candidate parameter values.
The determining module 1004 is further configured to determine a performance indicator of each candidate parameter value if the matrix shape recorded in the configuration file does not include at least one of the matrix shape of the target weight matrix or the matrix shape of the feature matrix, where the performance indicator of the candidate parameter value is used to describe an operation time required for performing matrix operation on the target weight matrix and the feature matrix based on the candidate parameter value.
The determining module 1004 is further configured to determine a block parameter of the first dimension, a block parameter of the second dimension, a block parameter of the third dimension, and a block parameter of the fourth dimension based on the candidate parameter values that the performance index satisfies the index condition.
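The candidate-selection step can be sketched as a simple benchmark loop, assuming the index condition is minimum measured operation time (which the text implies but does not mandate):

```python
def pick_block_params(candidates, measure):
    # measure(params) -> operation time of the matrix operation under
    # those candidate block parameters; keep the fastest candidate
    return min(candidates, key=measure)

# hypothetical candidates (TILE_M, TILE_N, SPLIT_K) and measured times
candidates = [(64, 64, 2), (128, 64, 4), (128, 128, 2)]
fake_times = {(64, 64, 2): 3.1, (128, 64, 4): 1.7, (128, 128, 2): 2.4}
best = pick_block_params(candidates, fake_times.get)
# best == (128, 64, 4)
```

In practice `measure` would time real matrix operations on the target device, and the chosen parameter values could then be written back into the configuration file for reuse.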
In one possible implementation, the first matrix block comprises at least two first sub-blocks and the second matrix block comprises at least two second sub-blocks.
An operation module 1003, configured to load, for any one of second block combinations that have been loaded into the first storage area, a plurality of sub-block combinations into the second storage area, any one of the second block combinations including a plurality of sub-block combinations, any one of the sub-block combinations including a first sub-block and a second sub-block; performing matrix operation on each sub-block combination in the second storage area to obtain an operation result of each sub-block combination; the operation result of any one of the second block combinations is determined based on the operation result of each of the sub-block combinations.
In one possible implementation, the dimensions of the first matrix block include a second dimension, and the sub-block parameters of the second dimension are used to control the number of first sub-blocks in the second dimension.
The dimensions of the second matrix block include a third dimension, and sub-block parameters of the third dimension are used to control the number of second sub-blocks in the third dimension.
In one possible implementation, the first sub-block and the second sub-block are different in data type.
The operation module 1003 is configured to perform data type conversion on a first sub-block in any sub-block combination to obtain a target sub-block, where the data types of the target sub-block and the second sub-block are the same; and performing matrix operation on the target sub-block and a second sub-block in any sub-block combination to obtain an operation result of any sub-block combination.
In the above device, the target weight matrix includes a plurality of first matrix blocks, the feature matrix includes a plurality of second matrix blocks, and a first block combination includes a first matrix block and a second matrix block. The first block combinations are loaded into at least two first storage areas, and a second block combination is a first block combination that has already been loaded into a first storage area. By loading the first block combinations into at least two first storage areas and performing matrix operations on the second block combinations while the loading is in progress, matrix blocks waiting to be operated on are pre-loaded and the operation unit is kept in a computing state at all times, so that the latency of loading matrix blocks is hidden and the efficiency of the matrix operation between the target weight matrix and the feature matrix is improved. Because the target weight matrix is a parameter of the neural network and the feature matrix characterizes the content of the media information, improving the efficiency of the matrix operation between them improves the efficiency of processing the media information through the neural network.
It should be understood that, in implementing the functions of the apparatus provided in fig. 10, only the division of the functional modules is illustrated, and in practical application, the functional modules may be allocated to different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus and the method embodiments provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the apparatus and the method embodiments are detailed in the method embodiments and are not repeated herein.
Fig. 11 shows a block diagram of a terminal device 1100 according to an exemplary embodiment of the present application. The terminal device 1100 includes: a processor 1101 and a memory 1102.
The processor 1101 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. The processor 1101 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array). The processor 1101 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1101 may be integrated with a GPU (Graphics Processing Unit) for rendering and drawing content required to be displayed by the display screen. In some embodiments, the processor 1101 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 1102 may include one or more computer-readable storage media, which may be non-transitory. Memory 1102 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1102 is used to store at least one computer program for execution by processor 1101 to implement the method of processing media information provided by a method embodiment of the present application.
In some embodiments, the terminal device 1100 may further optionally include: a peripheral interface 1103 and at least one peripheral. The processor 1101, memory 1102, and peripheral interface 1103 may be connected by a bus or signal lines. The individual peripheral devices may be connected to the peripheral device interface 1103 by buses, signal lines or circuit boards. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1104, a display screen 1105, a camera assembly 1106, audio circuitry 1107, and a power supply 1108.
A peripheral interface 1103 may be used to connect I/O (Input/Output) related at least one peripheral device to the processor 1101 and memory 1102. In some embodiments, the processor 1101, memory 1102, and peripheral interface 1103 are integrated on the same chip or circuit board; in some other embodiments, any one or both of the processor 1101, memory 1102, and peripheral interface 1103 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 1104 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 1104 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 1104 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1104 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1104 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: the world wide web, metropolitan area networks, intranets, mobile communication networks of various generations (2G, 3G, 4G and 5G), wireless local area networks and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1104 may further include NFC (Near Field Communication) related circuits, which is not limited by the present application.
The display screen 1105 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 1105 is a touch display, the display 1105 also has the ability to collect touch signals at or above the surface of the display 1105. The touch signal may be input to the processor 1101 as a control signal for processing. At this time, the display screen 1105 may also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards. In some embodiments, there may be one display 1105, disposed on the front panel of the terminal device 1100; in other embodiments, there may be at least two displays 1105, disposed on different surfaces of the terminal device 1100 or in a folded design; in other embodiments, the display 1105 may be a flexible display disposed on a curved surface or a folded surface of the terminal device 1100. The display 1105 may even be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The display screen 1105 may be made of materials such as an LCD (Liquid Crystal Display) and an OLED (Organic Light-Emitting Diode).
The camera assembly 1106 is used to capture images or video. Optionally, the camera assembly 1106 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth camera, a wide-angle camera and a telephoto camera, so as to realize a background blurring function by fusing the main camera and the depth camera, panoramic and VR (Virtual Reality) shooting by fusing the main camera and the wide-angle camera, or other fusion shooting functions. In some embodiments, the camera assembly 1106 may also include a flash. The flash may be a single-color temperature flash or a dual-color temperature flash. A dual-color temperature flash refers to a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
The audio circuit 1107 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and environments, converting the sound waves into electric signals, and inputting the electric signals to the processor 1101 for processing, or inputting the electric signals to the radio frequency circuit 1104 for voice communication. For the purpose of stereo acquisition or noise reduction, a plurality of microphones may be provided at different portions of the terminal device 1100, respectively. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 1101 or the radio frequency circuit 1104 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 1107 may also include a headphone jack.
A power supply 1108 is used to power the various components in terminal device 1100. The power supply 1108 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power source 1108 comprises a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, terminal device 1100 also includes one or more sensors 1109. The one or more sensors 1109 include, but are not limited to: acceleration sensor 1111, gyroscope sensor 1112, pressure sensor 1113, optical sensor 1114, and proximity sensor 1115.
The acceleration sensor 1111 can detect the magnitudes of accelerations on three coordinate axes of the coordinate system established in the terminal apparatus 1100. For example, the acceleration sensor 1111 may be configured to detect components of gravitational acceleration in three coordinate axes. The processor 1101 may control the display screen 1105 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 1111. Acceleration sensor 1111 may also be used for the acquisition of motion data of a game or a user.
The gyro sensor 1112 may detect a body direction and a rotation angle of the terminal device 1100, and the gyro sensor 1112 may collect a 3D motion of the user on the terminal device 1100 in cooperation with the acceleration sensor 1111. The processor 1101 may implement the following functions based on the data collected by the gyro sensor 1112: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
The pressure sensor 1113 may be disposed at a side frame of the terminal device 1100 and/or at a lower layer of the display screen 1105. When the pressure sensor 1113 is provided at a side frame of the terminal apparatus 1100, a grip signal of the terminal apparatus 1100 by a user can be detected, and the processor 1101 performs left-right hand recognition or quick operation based on the grip signal collected by the pressure sensor 1113. When the pressure sensor 1113 is disposed at the lower layer of the display screen 1105, the processor 1101 realizes control of the operability control on the UI interface according to the pressure operation of the user on the display screen 1105. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The optical sensor 1114 is used to collect the ambient light intensity. In one embodiment, the processor 1101 may control the display brightness of the display screen 1105 based on the intensity of ambient light collected by the optical sensor 1114. Specifically, when the intensity of the ambient light is high, the display luminance of the display screen 1105 is turned up; when the ambient light intensity is low, the display luminance of the display screen 1105 is turned down. In another embodiment, the processor 1101 may also dynamically adjust the shooting parameters of the camera assembly 1106 based on the intensity of ambient light collected by the optical sensor 1114.
A proximity sensor 1115, also referred to as a distance sensor, is typically provided on the front panel of the terminal device 1100. The proximity sensor 1115 is used to collect the distance between the user and the front surface of the terminal device 1100. In one embodiment, when the proximity sensor 1115 detects that the distance between the user and the front surface of the terminal device 1100 gradually decreases, the processor 1101 controls the display 1105 to switch from the bright screen state to the off screen state; when the proximity sensor 1115 detects that the distance between the user and the front surface of the terminal apparatus 1100 gradually increases, the processor 1101 controls the display screen 1105 to switch from the off-screen state to the on-screen state.
It will be appreciated by those skilled in the art that the structure shown in fig. 11 is not limiting, and that the terminal device 1100 may include more or fewer components than shown, combine certain components, or adopt a different arrangement of components.
Fig. 12 is a schematic structural diagram of a server according to an embodiment of the present application. The server 1200 may include one or more processors 1201 and one or more memories 1202, where the one or more memories 1202 store at least one computer program that is loaded and executed by the one or more processors 1201 to implement the method for processing media information provided by the foregoing method embodiments, and the processor 1201 is a CPU. Of course, the server 1200 may also have a wired or wireless network interface, a keyboard, an input/output interface, and the like for performing input and output, and the server 1200 may further include other components for implementing device functions, which are not described herein.
In an exemplary embodiment, a computer-readable storage medium is also provided, in which at least one computer program is stored; the at least one computer program is loaded and executed by a processor to cause an electronic device to implement any of the above methods for processing media information.
Alternatively, the above computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, a computer program is also provided, comprising at least one program; the at least one program is loaded and executed by a processor to cause an electronic device to implement any of the above methods for processing media information.
In an exemplary embodiment, a computer program product is also provided, in which at least one computer program is stored; the at least one computer program is loaded and executed by a processor to cause an electronic device to implement any of the above methods for processing media information.
It should be understood that references herein to "a plurality" mean two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the associated objects.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
The above embodiments are merely exemplary embodiments of the present application and are not intended to limit it. Any modifications, equivalent substitutions, improvements, and the like made within the principles of the present application shall fall within its scope of protection.

Claims (19)

1. A method for processing media information, the method comprising:
Acquiring a target weight matrix of a neural network and a feature matrix input to the neural network, wherein the target weight matrix comprises a plurality of first matrix blocks, the feature matrix comprises a plurality of second matrix blocks, and the feature matrix is used for representing the content of media information;
Sequentially loading a plurality of first block combinations to be loaded into at least two first storage areas, wherein the first block combinations comprise the first matrix blocks and the second matrix blocks, the first matrix blocks comprise at least two first sub-blocks, and the second matrix blocks comprise at least two second sub-blocks;
In the process of loading the plurality of first block combinations into the at least two first storage areas, for any one second block combination which is loaded into the first storage areas, loading a plurality of sub-block combinations included in the any one second block combination into a second storage area, wherein the second block combination comprises the first matrix block and the second matrix block, and any one sub-block combination comprises the first sub-block and the second sub-block, and the data types of the first sub-block and the second sub-block are different;
For any one sub-block combination, performing data type conversion on a first sub-block in the any one sub-block combination to obtain a target sub-block, wherein the data types of the target sub-block and the second sub-block are the same; performing matrix operation on the target sub-block and a second sub-block in any sub-block combination to obtain an operation result of any sub-block combination;
Determining the operation result of any second block combination based on the operation result of each sub-block combination;
And determining a matrix operation result of the target weight matrix and the feature matrix based on operation results of the second block combinations, wherein the matrix operation result is used for determining a result of processing the media information through the neural network.
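For illustration only (this sketch is not part of the claims), the blocked, mixed-precision matrix multiplication described in claim 1 can be expressed roughly as follows in Python. The block and sub-block sizes, the dtypes, and the function name are all assumptions, and the loading into first and second storage areas is simplified to plain array slicing:

```python
import numpy as np

def blocked_mixed_matmul(W, X, block=4, sub=2):
    """Sketch of claim 1: multiply W (e.g. int8 weights) by X (float32
    features) block by block, converting each weight sub-block to the
    feature dtype before the sub-block matrix operation."""
    M, K = W.shape
    K2, N = X.shape
    assert K == K2
    out = np.zeros((M, N), dtype=X.dtype)
    for i in range(0, M, block):
        for j in range(0, N, block):
            for k in range(0, K, block):
                Wb = W[i:i+block, k:k+block]   # first matrix block
                Xb = X[k:k+block, j:j+block]   # second matrix block
                acc = np.zeros((Wb.shape[0], Xb.shape[1]), dtype=X.dtype)
                # Split the block pair into sub-block combinations along
                # the shared dimension.
                for s in range(0, Wb.shape[1], sub):
                    # Data type conversion yields the "target sub-block".
                    Ws = Wb[:, s:s+sub].astype(X.dtype)
                    Xs = Xb[s:s+sub, :]
                    acc += Ws @ Xs   # operation result of one sub-block combination
                # Combine sub-block results into the block-combination result.
                out[i:i+block, j:j+block] += acc
    return out
```

The result equals an ordinary matrix product after up-casting the weights, which is what makes the per-sub-block conversion a pure memory/throughput optimization rather than a change in semantics.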
2. The method of claim 1, wherein the obtaining the target weight matrix for the neural network comprises:
acquiring a first weight matrix of the neural network, wherein the first weight matrix comprises a plurality of columns of first elements;
For any column of first elements, determining quantization parameters based on data bits corresponding to the any column of first elements and the first data type, and quantizing the any column of first elements based on the quantization parameters to obtain a column of quantized elements;
the target weight matrix is determined based on a plurality of columns of quantization elements.
3. The method of claim 2, wherein the target weight matrix comprises a plurality of columns of target elements; the determining the target weight matrix based on the multi-column quantization element includes:
for a column of target elements, merging a column of quantized elements into the column of target elements, the target elements comprising at least two consecutive quantized elements;
Or rearranging the column of quantized elements to obtain a column of rearranged quantized elements, and merging the column of rearranged quantized elements into the column of target elements, wherein the target elements comprise at least two consecutive rearranged quantized elements.
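For illustration only (not part of the claims), the per-column quantization of claim 2 might look roughly like the following. Symmetric signed quantization, the 8-bit default, and the function name are assumptions, and the merging of consecutive quantized elements into target elements (claim 3) is omitted:

```python
import numpy as np

def quantize_columns(W, bits=8):
    """Sketch of claim 2: quantize each column of a float weight matrix
    to signed integers using a per-column scale (the 'quantization
    parameter'), derived from the column's range and the data bits."""
    qmax = 2 ** (bits - 1) - 1               # range implied by the data bits
    scales = np.abs(W).max(axis=0) / qmax    # one quantization parameter per column
    scales[scales == 0] = 1.0                # guard all-zero columns
    Q = np.round(W / scales).astype(np.int8) # a column of quantized elements each
    return Q, scales
```

Dequantizing with `Q * scales` recovers the original weights up to half a quantization step per entry, which is the usual accuracy/size trade-off such schemes make.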
4. The method of claim 1, wherein the dimensions of the target weight matrix include a first dimension and a second dimension, wherein a block parameter of the first dimension is used to control a number of elements of the first matrix block in the first dimension, and wherein a block parameter of the second dimension is used to control a number of elements of the first matrix block in the second dimension;
the dimensions of the feature matrix include a third dimension and a fourth dimension, a block parameter of the third dimension is used for controlling the number of elements of the second matrix block in the third dimension, and a block parameter of the fourth dimension is used for controlling the number of elements of the second matrix block in the fourth dimension.
5. The method of claim 4, wherein sequentially loading the plurality of first block combinations to be loaded into the at least two first storage areas comprises:
For any one of the first block combinations, determining at least one second loading unit from the plurality of first loading units, and loading each element in the any one of the first block combinations to a corresponding first storage area through the at least one second loading unit when the target dimension meets the set condition;
The first loading unit is a unit for loading elements of the target dimension, the second loading unit is a unit for loading elements of the target dimension in any one of the first block combinations, the target dimension is at least one of the first dimension, the second dimension, the third dimension and the fourth dimension, and the target dimension meets a set condition and comprises that the number of elements of the target weight matrix or the feature matrix in the target dimension is not an integer multiple of block parameters of the target dimension.
6. The method according to claim 4, wherein the method further comprises:
acquiring a configuration file, wherein the configuration file is used for recording the shape of a matrix and the corresponding parameter value;
And if the matrix shape of the configuration file record comprises the matrix shape of the target weight matrix and the matrix shape of the feature matrix, determining the block parameters of the first dimension, the block parameters of the second dimension, the block parameters of the third dimension and the block parameters of the fourth dimension based on the parameter values corresponding to the matrix shape.
7. The method of claim 6, wherein the configuration file is further configured to record a plurality of candidate parameter values; the method further comprises the steps of:
if the matrix shape of the configuration file record does not comprise at least one of the matrix shape of the target weight matrix or the matrix shape of the feature matrix, determining performance indexes of the candidate parameter values, wherein the performance indexes of the candidate parameter values are used for describing operation time required by performing matrix operation on the target weight matrix and the feature matrix based on the candidate parameter values;
And determining the block parameters of the first dimension, the block parameters of the second dimension, the block parameters of the third dimension and the block parameters of the fourth dimension based on the candidate parameter values of which the performance indexes meet the index conditions.
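For illustration only (not part of the claims), the configuration-file lookup of claim 6 and the benchmark-based fallback of claim 7 can be sketched as follows. The dictionary-shaped configuration, the `benchmark` callable, and all names are assumptions; a real implementation would parse an actual configuration file and time real matrix operations:

```python
def pick_block_params(shape, config, candidates, benchmark):
    """Sketch of claims 6-7: if the matrix shape is recorded in the
    configuration, use its recorded block parameters; otherwise measure
    each candidate parameter value and keep the fastest.
    `benchmark(params)` is assumed to run the blocked matrix operation
    with those parameters and return the elapsed time in seconds."""
    if shape in config:                  # shape recorded in the configuration file
        return config[shape]
    best, best_time = None, float("inf")
    for params in candidates:            # candidate parameter values
        t = benchmark(params)            # performance index: operation time
        if t < best_time:                # index condition: minimal time
            best, best_time = params, t
    return best
```

This mirrors the common autotuning pattern in GEMM libraries: known shapes hit a lookup table, unknown shapes trigger a one-off search whose winner can then be recorded.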
8. The method of claim 1, wherein the dimensions of the first matrix block include a second dimension, a sub-block parameter of the second dimension being used to control the number of the first sub-blocks in the second dimension;
The dimensions of the second matrix block include a third dimension, a sub-block parameter of the third dimension being used to control the number of the second sub-blocks in the third dimension.
9. A device for processing media information, the device comprising:
The acquisition module is used for acquiring a target weight matrix of the neural network and a feature matrix input to the neural network, wherein the target weight matrix comprises a plurality of first matrix blocks, the feature matrix comprises a plurality of second matrix blocks, and the feature matrix is used for representing the content of the media information;
the loading module is used for sequentially loading a plurality of first block combinations to be loaded into at least two first storage areas, wherein the first block combinations comprise the first matrix blocks and the second matrix blocks, the first matrix blocks comprise at least two first sub-blocks, and the second matrix blocks comprise at least two second sub-blocks;
An operation module, configured to, in a process of loading the plurality of first block combinations into the at least two first storage areas, load, for any one second block combination that has been loaded into the first storage areas, a plurality of sub-block combinations included in the any one second block combination into a second storage area, the second block combination including the first matrix block and the second matrix block, any one sub-block combination including the first sub-block and the second sub-block, the data types of the first sub-block and the second sub-block being different;
The operation module is further configured to perform data type conversion on a first sub-block in the any sub-block combination to obtain a target sub-block, where the data types of the target sub-block and the second sub-block are the same; performing matrix operation on the target sub-block and a second sub-block in any sub-block combination to obtain an operation result of any sub-block combination;
the operation module is further used for determining the operation result of any second block combination based on the operation result of each sub-block combination;
And the determining module is used for determining a matrix operation result of the target weight matrix and the feature matrix based on operation results of the second block combinations, and the matrix operation result is used for determining a result of processing the media information through the neural network.
10. The apparatus of claim 9, wherein the means for obtaining is configured to obtain a first weight matrix of the neural network, the first weight matrix comprising a plurality of columns of first elements; for any column of first elements, determining quantization parameters based on data bits corresponding to the any column of first elements and the first data type, and quantizing the any column of first elements based on the quantization parameters to obtain a column of quantized elements; the target weight matrix is determined based on a plurality of columns of quantization elements.
11. The apparatus of claim 10, wherein the target weight matrix comprises a plurality of columns of target elements;
the acquisition module is used for merging a column of quantized elements into a column of target elements for the column of target elements, wherein the target elements comprise at least two consecutive quantized elements; or rearranging the column of quantized elements to obtain a column of rearranged quantized elements, and merging the column of rearranged quantized elements into the column of target elements, wherein the target elements comprise at least two consecutive rearranged quantized elements.
12. The apparatus of claim 9, wherein the dimensions of the target weight matrix include a first dimension and a second dimension, a block parameter of the first dimension being used to control a number of elements of the first matrix block in the first dimension, and a block parameter of the second dimension being used to control a number of elements of the first matrix block in the second dimension;
the dimensions of the feature matrix include a third dimension and a fourth dimension, a block parameter of the third dimension is used for controlling the number of elements of the second matrix block in the third dimension, and a block parameter of the fourth dimension is used for controlling the number of elements of the second matrix block in the fourth dimension.
13. The apparatus according to claim 12, wherein the loading module is configured to determine, for any one of the first block combinations, at least one second loading unit from among the plurality of first loading units, through which each element in the any one of the first block combinations is loaded into the corresponding first storage area, in a case where the target dimension satisfies the set condition;
The first loading unit is a unit for loading elements of the target dimension, the second loading unit is a unit for loading elements of the target dimension in any one of the first block combinations, the target dimension is at least one of the first dimension, the second dimension, the third dimension and the fourth dimension, and the target dimension meets a set condition and comprises that the number of elements of the target weight matrix or the feature matrix in the target dimension is not an integer multiple of block parameters of the target dimension.
14. The apparatus of claim 12, wherein the obtaining module is further configured to obtain a configuration file, the configuration file being configured to record a matrix shape and a corresponding parameter value;
the determining module is further configured to determine, if the matrix shape of the profile record includes a matrix shape of the target weight matrix and a matrix shape of the feature matrix, a block parameter of the first dimension, a block parameter of the second dimension, a block parameter of the third dimension, and a block parameter of the fourth dimension based on a parameter value corresponding to the matrix shape.
15. The apparatus of claim 14, wherein the configuration file is further configured to record a plurality of candidate parameter values;
The determining module is further configured to determine a performance indicator of each candidate parameter value if the matrix shape recorded in the configuration file does not include at least one of the matrix shape of the target weight matrix or the matrix shape of the feature matrix, where the performance indicator of the candidate parameter value is used to describe an operation time required for performing matrix operation on the target weight matrix and the feature matrix based on the candidate parameter value;
The determining module is further configured to determine a block parameter of the first dimension, a block parameter of the second dimension, a block parameter of the third dimension, and a block parameter of the fourth dimension based on the candidate parameter values that the performance index satisfies the index condition.
16. The apparatus of claim 9, wherein the dimensions of the first matrix block include a second dimension, a sub-block parameter of the second dimension being used to control the number of the first sub-blocks in the second dimension;
The dimensions of the second matrix block include a third dimension, a sub-block parameter of the third dimension being used to control the number of the second sub-blocks in the third dimension.
17. An electronic device comprising a processor and a memory, wherein the memory stores at least one computer program, the at least one computer program being loaded and executed by the processor to cause the electronic device to implement a method of processing media information according to any one of claims 1 to 8.
18. A computer readable storage medium, wherein at least one computer program is stored in the computer readable storage medium, and the at least one computer program is loaded and executed by a processor, so as to cause an electronic device to implement the method for processing media information according to any one of claims 1 to 8.
19. A computer program product, characterized in that at least one computer program is stored in the computer program product, which is loaded and executed by a processor to cause an electronic device to implement a method of processing media information according to any of claims 1 to 8.
CN202410322444.0A 2024-03-20 2024-03-20 Method, device and equipment for processing media information and readable storage medium Active CN117908994B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410322444.0A CN117908994B (en) 2024-03-20 2024-03-20 Method, device and equipment for processing media information and readable storage medium


Publications (2)

Publication Number Publication Date
CN117908994A (en) 2024-04-19
CN117908994B (en) 2024-06-11

Family

ID=90682317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410322444.0A Active CN117908994B (en) 2024-03-20 2024-03-20 Method, device and equipment for processing media information and readable storage medium

Country Status (1)

Country Link
CN (1) CN117908994B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109308194A (en) * 2018-09-29 2019-02-05 北京字节跳动网络技术有限公司 Method and apparatus for storing data
CN114358252A (en) * 2021-12-31 2022-04-15 浙江大华技术股份有限公司 Operation execution method and device in target neural network model and storage medium
CN116842307A (en) * 2023-08-28 2023-10-03 腾讯科技(深圳)有限公司 Data processing method, device, equipment, chip and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190180183A1 (en) * 2017-12-12 2019-06-13 Amazon Technologies, Inc. On-chip computational network
US11829439B2 (en) * 2019-12-30 2023-11-28 Qualcomm Incorporated Methods and apparatus to perform matrix multiplication in a streaming processor


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design and Implementation of an FPGA-Based Convolutional Neural Network Accelerator Using the Winograd Algorithm; Niu Zhaoxu et al.; Chinese Journal of Liquid Crystals and Displays (《液晶与显示》); 2023-11-30; Vol. 38, No. 11; pp. 1521-1530 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant