CN117332197A - Data calculation method and related equipment - Google Patents

Data calculation method and related equipment

Info

Publication number
CN117332197A
CN117332197A
Authority
CN
China
Prior art keywords
matrix
data
column vector
network device
sparse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210736691.6A
Other languages
Chinese (zh)
Inventor
傅光宁
谢星华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202210736691.6A priority Critical patent/CN117332197A/en
Priority to PCT/CN2023/101090 priority patent/WO2024001841A1/en
Publication of CN117332197A publication Critical patent/CN117332197A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Stored Programmes (AREA)
  • Complex Calculations (AREA)

Abstract

The present application provides a data calculation method and related device. A network device obtains a first matrix and a second matrix, where the first matrix is the weight matrix after pruning in a target model and the second matrix is the data input into the target model. The network device receives a configuration instruction, where the configuration instruction is used to set a sparse ratio. The network device compresses the second matrix according to the sparse ratio to obtain a third matrix, and then calculates the product of the first matrix and the third matrix. Because the network device can flexibly set the value of the sparse ratio according to the configuration instruction, the computing performance of the hardware can be fully exploited for different models, giving the method wider applicability and stronger compatibility.

Description

Data calculation method and related equipment
Technical Field
The present application relates to the field of computers, further to the application of artificial intelligence (AI) technology in the field of computer networks, and in particular to a data computing method and related devices.
Background
In the current AI field, models come in a wide variety and are gradually becoming more complex. According to the lottery ticket hypothesis, a sparse model can have the same or even better learning ability than a complex model. Moreover, training a complex model is costly and time-consuming, whereas training a sparse model is cheap and fast. How to sparsify a complex model while ensuring that the sparse model still runs well is the key to accelerated model training.
To sparsify a complex model, the weight matrices in the model can generally be pruned, and the pruned weight matrices then take part in the computation together with the data. Currently, the dominant sparse acceleration technique is 2:4 fine-grained structured matrix multiplication acceleration. By applying 2:4 structured pruning to a weight matrix in an AI model, that is, retaining 2 elements out of every 4 in the weight matrix, the weight matrix is sparsified; this technique accelerates AI model computation and saves model training time.
However, the conventional model acceleration technique can only sparsify the weight matrix with a fixed 2:4 sparse structure. The sparse ratio of the weight matrix cannot be changed, so the technique suits only a narrow range of scenarios and has poor compatibility.
Disclosure of Invention
The present application provides a data calculation method and related device. In the method, the network device can flexibly set the value of the sparse ratio according to a configuration instruction, so the computing performance of the hardware can be fully exploited for different models, giving the method wider applicability and stronger compatibility.
A first aspect of the present application provides a data calculation method, in which a network device obtains a first matrix and a second matrix, where the first matrix represents a weight matrix after pruning in a target model and the second matrix represents data input into the target model; the network device receives a configuration instruction used to set a sparse ratio; the network device compresses the second matrix according to the sparse ratio to obtain a third matrix; and the network device calculates a product of the first matrix and the third matrix.
In the present application, the network device obtains a first matrix and a second matrix, where the first matrix is the weight matrix after pruning in the target model and the second matrix is the data input into the target model. The network device receives a configuration instruction used to set the sparse ratio, compresses the second matrix according to the sparse ratio to obtain a third matrix, and then calculates the product of the first matrix and the third matrix. Because the network device can flexibly set the value of the sparse ratio according to the configuration instruction, the computing performance of the hardware can be fully exploited for different models, giving the method wider applicability and stronger compatibility.
In a possible implementation manner of the first aspect, the network device compressing the second matrix according to the sparse ratio to obtain a third matrix includes: the network device compresses a first column vector according to the sparse ratio to obtain a second column vector, where the first column vector belongs to the second matrix and the second column vector belongs to the third matrix; and the network device calculating a product of the first matrix and the third matrix includes: the network device calculates a product of a first row vector and the second column vector, where the first row vector belongs to the first matrix.
In this possible implementation, the first column vector belongs to the second matrix and the second column vector belongs to the third matrix. In other words, the process of the network device compressing the second matrix according to the sparse ratio to obtain the third matrix includes the process of compressing the first column vector according to the sparse ratio to obtain the second column vector. Likewise, because the first row vector belongs to the first matrix, the process of calculating the product of the first matrix and the third matrix includes the process of calculating the product of the first row vector and the second column vector. This possible implementation explains the specific calculation process for the product of the first matrix and the third matrix, improving the feasibility of the scheme.
In a possible implementation manner of the first aspect, the method further includes: the network device calculates a first moving number according to a non-zero data index, where the non-zero data index represents the distribution, within the weight matrix before pruning, of the data in the first matrix; the first moving number represents the number of steps by which selected data are moved; and the selected data are the data in the first column vector that correspond to the non-zero data index. The network device compressing the first column vector according to the non-zero data index to obtain the second column vector includes: the network device compresses the first column vector according to the first moving number to obtain the second column vector.
In this possible implementation, the non-zero data index represents the distribution, within the weight matrix before pruning, of the data in the first matrix. Because the first row vector is part of the first matrix, the non-zero data index includes the distribution of the first row vector in the weight matrix before pruning. The network device may therefore obtain the selected data from the first column vector according to the non-zero data index, derive from that distribution the number of steps by which the selected data must move, that is, the first moving number, and then compress the first column vector according to the first moving number to obtain the second column vector. In this way the first column vector is compressed, and the computing efficiency of the network device is improved.
In a possible implementation manner of the first aspect, the network device compressing the first column vector according to the first moving number to obtain a second column vector includes: the network device selects the selected data from the first column vector according to the non-zero data index; and the network device shifts the selected data according to the first moving number to obtain the second column vector.
In this possible implementation manner, the network device obtains the selected data in the first column vector according to the non-zero data index, where the selected data follow from the distribution of the first row vector in the weight matrix before pruning. The network device also obtains the first moving number from that distribution, and then shifts the selected data according to the first moving number to obtain the second column vector.
In a possible implementation manner of the first aspect, the method further includes: the network device selects multiple groups of selected data from the first column vector according to the non-zero data index, where each group of selected data includes one or more selected data; the network device shifts the multiple groups of selected data according to the first moving number to obtain multiple groups of selected vectors; and the network device compresses the multiple groups of selected vectors according to the sparse ratio to obtain the second column vector.
In this possible implementation manner, if the data size of the first matrix is large, the network device may obtain multiple groups of selected data in the first column vector according to the non-zero data index, where each group of selected data follows from the distribution of the first row vector in the weight matrix before pruning. The network device may further obtain a first moving number for each group of selected data from that distribution, shift each group accordingly to obtain multiple groups of selected vectors, and compress the multiple groups of selected vectors according to the sparse ratio to obtain the second column vector. In this way, matched to the computing power of the processing element (PE), the network device can compress multiple groups of selected vectors into one second column vector, so that the product of the first row vector and the second column vector is completed within a single processing element, making full use of the processing element's computing power, avoiding wasted computation, and improving computing efficiency.
In a possible implementation manner of the first aspect, the sparse ratio is determined by the computing power of a processing element in the network device and/or by the weight matrix before pruning.
In this possible implementation manner, when the sparse ratio is configured, its value may be determined according to factors such as the computing power of the processing element, the complexity of the weight matrix before pruning, and the accuracy required when training the weight matrix before pruning. This possible implementation identifies several factors to consider when determining the sparse ratio; the network device can set different values according to different factors, improving the flexibility of the scheme.
A second aspect of the present application provides a network device comprising at least one processor, a memory, and a communication interface. The processor is coupled with the memory and the communication interface. The memory is used for storing instructions, the processor is used for executing the instructions, and the communication interface is used for communicating with other network devices under the control of the processor. The instructions, when executed by a processor, cause the network device to perform the method of the first aspect or any of the possible implementations of the first aspect.
A third aspect of the present application provides a computer readable storage medium storing a program for causing a terminal device to perform the method of the first aspect or any possible implementation of the first aspect.
A fourth aspect of the present application provides a computer program product storing one or more computer-executable instructions which, when executed by a processor, perform the method of the first aspect or any one of the possible implementations of the first aspect.
A fifth aspect of the present application provides a chip comprising a processor and a communication interface, the processor being coupled to the communication interface, the processor being configured to read instructions to perform a method according to the first aspect or any one of the possible implementations of the first aspect.
A sixth aspect of the present application provides a network system comprising a network device on which the method of the first aspect or any one of the possible implementations of the first aspect is performed.
Drawings
FIG. 1 is a schematic diagram of a data computing system according to the present application;
FIG. 2 is a schematic flow chart of a data calculation method provided in the present application;
FIG. 3 is a schematic diagram of an embodiment of a data computing method provided herein;
FIG. 4 is a schematic diagram of another embodiment of a data calculation method provided herein;
FIG. 5 is a schematic diagram of another embodiment of a data calculation method provided herein;
FIG. 6 is a schematic diagram of another embodiment of a data calculation method provided herein;
FIG. 7 is a schematic diagram of another embodiment of a data calculation method provided herein;
FIG. 8 is a schematic diagram of another embodiment of a data calculation method provided herein;
FIG. 9 is a schematic diagram of another embodiment of a data calculation method provided herein;
FIG. 10 is a schematic structural diagram of a network device provided in the present application;
FIG. 11 is a schematic structural diagram of another network device provided in the present application.
Detailed Description
Examples provided in the present application are described below with reference to the accompanying drawings. It is apparent that the described examples are only some, not all, of the examples of the present application. As those of ordinary skill in the art will appreciate, with the development of technology and the emergence of new scenarios, the technical solutions provided in the present application are likewise applicable to similar technical problems.
The terms "first", "second", and the like in the description, the claims, and the drawings of the present application are used to distinguish between similar objects and do not necessarily describe a particular order or sequence. It is to be understood that data so termed may be interchanged where appropriate, so that the examples described here may be implemented in orders other than those illustrated or described. Furthermore, the terms "comprises", "comprising", and "having", and any variations thereof, are intended to cover a non-exclusive inclusion: a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such a process, method, article, or apparatus.
The term "and/or" in this application merely describes an association between associated objects and indicates that three relationships may exist; for example, "a and/or b" may mean: a alone, both a and b, or b alone, where a and b may be singular or plural. In the description of the present application, unless otherwise indicated, "a plurality" means two or more. "At least one of" a list of items means any combination of those items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where a, b, and c may each be single or plural.
In the current AI field, models come in a wide variety and are gradually becoming more complex. According to the lottery ticket hypothesis, a sparse model can have the same or even better learning ability than a complex model. Moreover, training a complex model is costly and time-consuming, whereas training a sparse model is cheap and fast. How to sparsify a complex model while ensuring that the sparse model still runs well is the key to accelerated model training.
To sparsify a complex model, the weight matrices in the model can generally be pruned, and the pruned weight matrices then take part in the computation together with the data. Currently, the dominant sparse acceleration technique is 2:4 fine-grained structured matrix multiplication acceleration. By applying 2:4 structured pruning to a weight matrix in an AI model, that is, retaining 2 elements out of every 4 in the weight matrix, the weight matrix is sparsified; this technique accelerates AI model computation and saves model training time.
However, the conventional model acceleration technique can only sparsify the weight matrix with a fixed 2:4 sparse structure. The sparse ratio of the weight matrix cannot be changed, so the technique suits only a narrow range of scenarios and has poor compatibility.
To solve the problems in the above scheme, the present application provides a data computing method, a data computing system, and a network device. In the present application, the network device obtains a first matrix and a second matrix, where the first matrix is the weight matrix after pruning in a target model and the second matrix is the data input into the target model. The network device receives a configuration instruction used to set the sparse ratio, compresses the second matrix according to the sparse ratio to obtain a third matrix, and then calculates the product of the first matrix and the third matrix. Because the network device can flexibly set the value of the sparse ratio according to the configuration instruction, the computing performance of the hardware can be fully exploited for different models, giving the method wider applicability and stronger compatibility.
The structure of a data computing system provided herein is first described.
Fig. 1 is a schematic structural diagram of a data computing system provided in the present application.
In the present application, the data computing system includes a computation engine. The computation engine obtains a first matrix and a second matrix from a register, where the first matrix is the weight matrix after pruning in the target model. As in fig. 1, assume the size of the first matrix is M×K, that is, the first matrix has M rows and K columns of data. The second matrix is the data input into the target model; assume its size is WK×N, that is, the second matrix has WK rows and N columns of data, where W is the sparse ratio. The computation engine can flexibly set the value of the sparse ratio according to the configuration instruction, and can then compress the second matrix according to the sparse ratio to obtain the third matrix. The third matrix has a size of K×N, that is, K rows and N columns of data. The matrix obtained after the computation engine calculates the product of the first matrix and the third matrix has a size of M×N, that is, M rows and N columns of data.
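To make these shape relationships concrete, the following is a minimal NumPy sketch (an illustration added for this description, not part of the patent text); the symbols M, K, N, and W follow the description above, and the slice used for the compression step is only a placeholder for the bitmap-driven compression detailed later.

    import numpy as np

    # Illustrative shapes only. W is the sparse ratio factor: with a 4:8
    # scheme, half of the input rows survive compression, i.e. W = 2.
    M, K, N, W = 4, 8, 3, 2

    first_matrix = np.zeros((M, K))        # pruned weight matrix, M x K
    second_matrix = np.zeros((W * K, N))   # input data, WK x N

    # The compute engine compresses the WK x N second matrix down to a
    # K x N third matrix (this slice is a stand-in for the real compression).
    third_matrix = second_matrix[:K, :]

    product = first_matrix @ third_matrix  # the result is M x N
    assert product.shape == (M, N)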
In the present application, the data computing system includes a computation engine that can flexibly set the value of the sparse ratio according to the configuration instruction, so the computing performance of the hardware can be fully exploited for different models, giving the system wider applicability and stronger compatibility.
The data computing method provided in the present application is described based on the data computing system described in fig. 1.
Fig. 2 is a schematic calculation flow chart of a data calculation method provided in the present application.
101. The network device obtains a first matrix and a second matrix.
In the application, the first matrix is used for representing the weight matrix after pruning in the target model, and the second matrix is used for representing the data input into the target model.
Illustratively, as in fig. 1, assume that the first matrix has a size of M×K, that is, M rows and K columns of data. The first matrix may be obtained by pruning a weight matrix, where the weight matrix before pruning has a size of M×WK. The second matrix is the data input into the target model; assume that it has a size of WK×N, that is, WK rows and N columns of data, where W is the sparse ratio.
102. The network device receives the configuration instruction.
In the present application, the configuration instruction is used to set the sparse ratio, and the sparse ratio can be changed. The network device may configure the sparse ratio according to the configuration instruction in several ways, described in detail below.
Mode one: the configuration instruction includes the value of the sparse ratio.
In the present application, the configuration instruction may include the value of the sparse ratio, which may be configured by operation and maintenance personnel as required. Optionally, the computing power of the processing element may be considered when configuring the sparse ratio: if the processing element (PE) can complete an 8×8 data computation, then, without exceeding the available computing power, the sparse ratio may be chosen from several schemes such as 8:16, 4:16, 4:8, and 2:8 according to the complexity of the weight matrix and the training accuracy required by the model. Optionally, where the computing power of the processing element permits, the training accuracy required by the unpruned weight matrix may be considered when configuring the value of the sparse ratio, and the value may be adjusted flexibly according to that accuracy. Optionally, where the computing power of the processing element permits, the complexity of the unpruned weight matrix may be considered, and the value of the sparse ratio may be adjusted flexibly according to that complexity. Alternatively, the operator may configure the value of the sparse ratio according to other factors, which is not specifically limited here.
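As a rough illustration of this selection logic (the function name and the admissibility rule below are assumptions made for this sketch, not taken from the patent), candidate N:M schemes could be screened against the PE's computing power as follows:

    # Hypothetical screening of candidate N:M sparse ratios against a PE
    # that can complete an 8x8 data computation per pass.
    PE_WIDTH = 8

    def fits_pe(kept: int, pe_width: int = PE_WIDTH) -> bool:
        # Assumed rule: the retained elements per group must not exceed the
        # PE's native width, so each group multiplies in a single pass.
        return kept <= pe_width

    candidates = [(8, 16), (4, 16), (4, 8), (2, 8)]
    print([f"{n}:{m}" for n, m in candidates if fits_pe(n)])
    # ['8:16', '4:16', '4:8', '2:8'] -- all four schemes from the text fit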
Mode two: the sparse ratio is generated according to the configuration instruction.
In the present application, the configuration instruction may instead carry no sparse ratio value, in which case the network device generates the value after receiving the configuration instruction. Optionally, after receiving the configuration instruction, the network device may determine the value of the sparse ratio according to the data amount of the unpruned weight matrix, generating different values for unpruned weight matrices of different sizes. For example, when the network device confirms that the data amount of the unpruned weight matrix is greater than a first threshold A, the sparse ratio may be set to a ratio A; when the data amount is less than or equal to the first threshold A, the sparse ratio may be set to a ratio B, where ratio A is smaller than ratio B. With this setting, when the unpruned weight matrix is large, the sparse ratio is small, which improves the training efficiency of the weight matrix. Optionally, after receiving the configuration instruction, the network device may determine the value of the sparse ratio according to the training accuracy required by the unpruned weight matrix. For example, an unpruned weight matrix may carry an identifier describing its required training accuracy, from which the network device generates a corresponding sparse ratio value; the network device can thus generate different sparse ratio values for different accuracies. Alternatively, the network device may generate the value of the sparse ratio according to other factors, which is not specifically limited here.
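A minimal sketch of this mode-two logic follows; the threshold value, the two concrete ratios, and the function name are placeholders chosen for illustration (the description only requires that ratio A be smaller than ratio B):

    # Hypothetical device-side generation of the sparse ratio ("mode two").
    def generate_sparse_ratio(weight_elements: int,
                              first_threshold: int = 1_000_000):
        if weight_elements > first_threshold:
            return (2, 8)  # ratio A: smaller, speeds up large-model training
        return (4, 8)      # ratio B: larger, for smaller weight matrices

    print(generate_sparse_ratio(5_000_000))  # (2, 8)
    print(generate_sparse_ratio(10_000))     # (4, 8)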
103. The network device compresses the second matrix according to the sparse ratio to obtain a third matrix.
In the present application, the computation engine may compress the second matrix according to the sparse ratio to obtain the third matrix. Assuming that the second matrix has a size of WK×N, that is, WK rows and N columns of data, where W is the sparse ratio, the third matrix then has a size of K×N, that is, K rows and N columns of data.
104. The network device calculates a product of the first matrix and the third matrix.
In the present application, the network device may use the processing element PE to calculate the product of the first matrix and the third matrix, thereby obtaining the convolution result.
In the present application, the network device obtains a first matrix and a second matrix, where the first matrix is the weight matrix after pruning in the target model and the second matrix is the data input into the target model. The network device compresses the second matrix according to the sparse ratio to obtain a third matrix, and then calculates the product of the first matrix and the third matrix. Because the network device can flexibly set the value of the sparse ratio according to the configuration instruction, the computing performance of the hardware can be fully exploited for different models, giving the method wider applicability and stronger compatibility.
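The following NumPy sketch is a software reference for steps 101 to 104 (an analogue of the PE pipeline, not the hardware design itself); the 4:8 magnitude-based pruning rule used to build the example bitmap is an assumption made for illustration:

    import numpy as np

    def sparse_matmul(first_matrix, bitmap, second_matrix):
        # first_matrix:  (M, K)  pruned weights; bitmap: (M, WK) non-zero
        # data index of the weights before pruning; second_matrix: (WK, N).
        M, K = first_matrix.shape
        out = np.zeros((M, second_matrix.shape[1]))
        for i in range(M):
            # Gather the rows of the second matrix that correspond to the
            # retained weights of row i: this is the column-vector
            # compression that yields the third matrix.
            third = second_matrix[bitmap[i], :]          # (K, N)
            out[i] = first_matrix[i] @ third
        return out

    # Usage: 4:8 sparse ratio (W = 2), keeping the 4 largest-magnitude
    # weights in every group of 8.
    rng = np.random.default_rng(0)
    M, K, N, W = 2, 4, 3, 2
    dense = rng.standard_normal((M, W * K))
    bitmap = np.zeros_like(dense, dtype=bool)
    for i in range(M):
        for g in range(0, W * K, 8):
            top = np.argsort(-np.abs(dense[i, g:g + 8]))[:4]
            bitmap[i, g + top] = True
    first = dense[bitmap].reshape(M, K)
    second = rng.standard_normal((W * K, N))
    # The compressed product equals multiplying the pruned dense weights.
    assert np.allclose(sparse_matmul(first, bitmap, second),
                       (dense * bitmap) @ second)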
In the present application, step 103, in which the network device compresses the second matrix according to the sparse ratio to obtain the third matrix, has specific implementations that are described in detail in the following examples.
In the present application, the process of the network device compressing the second matrix according to the sparse ratio to obtain the third matrix includes the process of the network device compressing a first column vector according to the sparse ratio to obtain a second column vector, where the first column vector belongs to the second matrix and the second column vector belongs to the third matrix. In addition, the process of the network device calculating the product of the first matrix and the third matrix may include the process of the network device calculating the product of a first row vector and the second column vector, where the first row vector belongs to the first matrix.
Fig. 3 is a schematic diagram of another calculation flow of a data calculation method provided in the present application.
For example, referring to fig. 3, in the present application the network device may dynamically configure the sparse ratio according to the computing power of the computing element and the complexity of the unpruned weight matrix, ensuring a fine-grained structured sparse acceleration process when the model adopts an N:M sparse-ratio configuration. The bitmap (non-zero data index) of matrix A (the first matrix) is input to the Prefix Sum module, which calculates the local shift step count (the first moving number) for the selected data in the column vectors of matrix B (the second matrix) on each PE row; the calculation result and the sparse ratio are then input to the non-zero data compressor, which completes the data compression of the column vectors of matrix B on one PE through local non-zero data compression and global data compression. By adding a Prefix Sum unit and a None-Zero Shift unit to each PE module, the sparse matrix accelerator can analyze the bitmap of matrix A and move the non-zero data of matrix B's column vectors according to the result, thereby accelerating the matrix operation.
The process by which the network device compresses the first column vector according to a sparse ratio to obtain a second column vector is described in detail below.
In the present application, the network device first calculates the first moving number according to the non-zero data index, where the non-zero data index represents the distribution, within the weight matrix before pruning, of the data in the first matrix; the first moving number represents the number of steps by which the selected data are moved; and the selected data are the data in the first column vector that correspond to the non-zero data index.
For example, as shown in fig. 3, the network device may introduce Prefix Sum and None-Zero Shift modules into the matrix computation engine, thereby implementing a fine-grained structured sparse acceleration process when the model adopts an N:M sparse-ratio configuration. A specific operation flow of the Prefix Sum module is shown in fig. 3: each bit of the bitmap (non-zero data index) is inverted, and the inverted bits before each position are accumulated. This yields the number of zero-valued elements preceding each position, which serves as the distance (the first moving number) by which that element is subsequently moved.
Fig. 4 is a schematic diagram of another calculation flow of a data calculation method provided in the present application.
For example, suppose the non-zero data index in fig. 4 is 1, 0, 1, 0, 1, 0, 1, 0. Inverting each bit yields 0, 1, 0, 1, 0, 1, 0, 1, and accumulating the inverted bits before each position yields 0, 0, 1, 1, 2, 2, 3, 3; that is, the data at the first through eighth bits of the first column vector need to be shifted by 0, 0, 1, 1, 2, 2, 3, 3 positions respectively, so the four selected data are shifted by 0, 1, 2, and 3 positions.
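A small software model of this Prefix Sum step is sketched below (assuming, for a self-consistent example, the 8-bit alternating bitmap discussed above):

    def prefix_sum_shifts(bitmap):
        # For each position, count the zero (pruned) elements before it:
        # that count is the distance the element must later shift left.
        shifts, zeros_before = [], 0
        for bit in bitmap:
            shifts.append(zeros_before)
            zeros_before += 1 - bit  # accumulate the inverted bit
        return shifts

    print(prefix_sum_shifts([1, 0, 1, 0, 1, 0, 1, 0]))
    # [0, 0, 1, 1, 2, 2, 3, 3] -> the four selected data shift by 0, 1, 2, 3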
In the present application, after the network device obtains the first moving number, it may compress the first column vector into the second column vector; a specific compression process is described in detail in the following examples.
In the present application, the non-zero data movement control signals indicating the required movement are obtained from the Prefix Sum unit and sent to the Local None-Zero Shift unit. The network device first selects the selected data from the first column vector according to the non-zero data index, and then shifts the selected data according to the first moving number to obtain the second column vector; that is, each selected data element is moved to the appropriate position according to the first moving number.
Fig. 5 is a schematic diagram of another calculation flow of a data calculation method provided in the present application.
For example, referring to fig. 5, the None-Zero Shift unit includes at least a Local None-Zero Shift unit, which completes the data compression of matrix B's column vectors on a PE according to the non-zero local shift count (the first moving number) calculated by the Prefix Sum unit. As shown in fig. 5, if the sparse ratio is 4:8, then according to the non-zero data index the selected data are the 1s at the first, fourth, fifth, and seventh bits, and the selected data within each group of 8 elements are moved to the first 4 positions of the queue. Similarly, as shown in fig. 6, if the sparse ratio is 2:8, the selected data within each group of 8 elements are moved to the first 2 positions of the queue. As shown in fig. 7, if the sparse ratio is 1:8, the selected data within each group of 8 elements are moved to the first position of the queue.
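Continuing the sketch, the Local None-Zero Shift step can be modeled as below, reusing prefix_sum_shifts from the previous sketch; the bit positions and the 4:8 ratio follow the fig. 5 example as described in the text, while the data values are arbitrary:

    def local_shift(column_group, bitmap_group, kept):
        # Move the selected data of one group into the first `kept` slots.
        shifts = prefix_sum_shifts(bitmap_group)
        queue = [0] * len(column_group)
        for pos, (value, bit) in enumerate(zip(column_group, bitmap_group)):
            if bit:
                queue[pos - shifts[pos]] = value  # shift left past zeros
        return queue[:kept]

    # Selected data at the 1st, 4th, 5th, and 7th bits; sparse ratio 4:8.
    bm = [1, 0, 0, 1, 1, 0, 1, 0]
    col = [10, 11, 12, 13, 14, 15, 16, 17]
    print(local_shift(col, bm, kept=4))  # [10, 13, 14, 16]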
In the present application, after the network device obtains the first moving number, there are other implementations for compressing the first column vector into the second column vector; one such compression process is described in detail in the following examples.
In the present application, the non-zero data movement control signals indicating the required movement are obtained from the Prefix Sum unit and sent to the Local None-Zero Shift unit. The network device selects multiple groups of selected data from the first column vector according to the non-zero data index, where each group of selected data includes one or more selected data; the network device shifts the multiple groups of selected data according to the first moving number to obtain multiple groups of selected vectors; and the network device compresses the multiple groups of selected vectors according to the sparse ratio to obtain the second column vector.
Fig. 8 is a schematic diagram of another calculation flow of a data calculation method provided in the present application.
Referring to fig. 8, the None-Zero Shift unit may be divided into two sub-modules: a Local None-Zero Shift unit and a Global None-Zero Shift unit. According to the non-zero local shift count (the first moving number) calculated by the Prefix Sum unit and the sparse ratio, the None-Zero Shift unit completes the data compression of the column vectors of matrix B on one PE through two stages: local non-zero data compression and global data compression. The Local None-Zero Shift completes the local compression (a specific implementation is similar to the one in the example above), and the Global None-Zero Shift takes the local compression result as input, performs global data compression, and transmits the compressed data to the PE.
Fig. 9 is a schematic diagram of another calculation flow of a data calculation method provided in the present application.
Referring to fig. 9, a specific implementation of the global data compression module compressing the second matrix is described in detail below.
In the present application, the non-zero data movement control signals indicating the required movement are obtained from the Prefix Sum unit and sent to the Local None-Zero Shift unit, and the non-zero data of each data unit are moved to the appropriate positions. If the sparse ratio is 4:8, the non-zero data within each group of 8 elements are moved to the first 4 positions of the queue; if the sparse ratio is 2:8, to the first 2 positions; and if the sparse ratio is 1:8, to the first position. Then, according to the queue data from the Local None-Zero Shift and the sparse ratio parameter W, the non-zero data in the queues are assembled to form 16 data inputs for subsequent calculation. If the sparse ratio is 4:8, the first 4 non-zero data in each of 4 queues are moved into a non-zero data vector of size 16; if the sparse ratio is 2:8, the first 2 non-zero data in each of 8 queues are moved into a non-zero data vector of size 16; and if the sparse ratio is 1:8, the first non-zero datum in each of 16 queues is moved into a non-zero data vector of size 16.
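The global assembly step can be modeled in the same style (a sketch; the 16-wide input vector matches the description above, and the queue contents are arbitrary):

    def global_compress(local_queues, kept, width=16):
        # Assemble the first `kept` entries of each locally compressed queue
        # into one `width`-wide non-zero data vector for the PE.
        vector = []
        for queue in local_queues:
            vector.extend(queue[:kept])
        return vector[:width]

    # 2:8 ratio: eight queues each contribute their first 2 non-zero values,
    # filling the 16-wide input vector.
    queues = [[q * 10, q * 10 + 1] for q in range(8)]
    assert len(global_compress(queues, kept=2)) == 16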
In the present application, the network device obtains a first matrix and a second matrix, where the first matrix is the weight matrix after pruning in the target model and the second matrix is the data input into the target model. The network device receives a configuration instruction used to set the sparse ratio, compresses the second matrix according to the sparse ratio to obtain a third matrix, and then calculates the product of the first matrix and the third matrix. Because the network device can flexibly set the value of the sparse ratio according to the configuration instruction, the computing performance of the hardware can be fully exploited for different models, giving the method wider applicability and stronger compatibility.
The foregoing examples provide different embodiments of a data computing method. The following provides a network device 20, as shown in fig. 10, where the network device 20 is configured to perform the data computing method of the foregoing examples; the steps performed and the corresponding beneficial effects can be understood with reference to the corresponding examples above and are not repeated here. The network device 20 includes:
the processing unit 201 is configured to obtain a first matrix and a second matrix, where the first matrix is used to represent a weight matrix after pruning in a target model, and the second matrix is used to represent data input into the target model;
A receiving unit 202, configured to receive a configuration instruction, where the configuration instruction is used to set a sparse ratio;
the processing unit 201 is configured to:
compress the second matrix according to the sparse ratio to obtain a third matrix; and
calculate a product of the first matrix and the third matrix.
In one possible implementation,
the processing unit 201 is configured to:
compress a first column vector according to the sparse ratio to obtain a second column vector, where the first column vector belongs to the second matrix and the second column vector belongs to the third matrix; and
calculate a product of a first row vector and the second column vector, where the first row vector belongs to the first matrix.
In one possible implementation,
the processing unit 201 is further configured to calculate a first moving number according to a non-zero data index, where the non-zero data index is used to represent the distribution, within the weight matrix before pruning, of the data in the first matrix, the first moving number is used to represent the number of steps by which selected data are moved, and the selected data are the data in the first column vector that correspond to the non-zero data index;
the processing unit 201 is configured to compress the first column vector according to the first moving number to obtain a second column vector.
In one possible implementation,
the processing unit 201 is configured to:
select selected data from the first column vector according to the non-zero data index; and
shift the selected data according to the first moving number to obtain the second column vector.
In one possible implementation,
the processing unit 201 is configured to:
select multiple groups of selected data from the first column vector according to the non-zero data index, where each group of selected data includes one or more selected data;
shift the multiple groups of selected data according to the first moving number to obtain multiple groups of selected vectors; and
compress the multiple groups of selected vectors according to the sparse ratio to obtain the second column vector.
In one possible implementation,
the sparse ratio is determined by the computing power of a processing element in the network device and/or by the weight matrix before pruning.
It should be noted that, because the information exchanged between the modules of the network device 20 and the execution processes are based on the same concept as the method examples of the present application, the execution steps are consistent with the method steps described above; refer to the descriptions in the method examples for details.
The foregoing examples provide different embodiments of the network device 20. The following provides a network device 30, as shown in fig. 11, where the network device 30 is configured to perform the data calculation method of the foregoing examples; the steps performed and the corresponding beneficial effects can be understood with reference to the corresponding examples above and are not repeated here.
Referring to fig. 11, a schematic structural diagram of a network device is provided. The network device 30 includes a processor 302, a communication interface 303, and a memory 301. Optionally, a bus 304 may be included, through which the communication interface 303, the processor 302, and the memory 301 are interconnected. The bus 304 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 11, but this does not mean that there is only one bus or one type of bus. The network device 30 may implement the functionality of the network device in the foregoing examples; the processor 302 and the communication interface 303 may perform the operations corresponding to the network device in the method examples described above.
The following describes the components of the network device in detail with reference to fig. 11:
The memory 301 may be a volatile memory, such as a random-access memory (RAM); or a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or a combination thereof. It is used to store program code, configuration files, or other content that can be used to implement the methods of the present application.
The processor 302 is the control center of the network device and may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the examples provided in the present application, such as one or more digital signal processors (DSP) or one or more field-programmable gate arrays (FPGA).
The communication interface 303 is used to communicate with other network devices.
The processor 302 may perform the operations performed by the network device in the foregoing examples, which are not described in detail here.
It should be noted that, because the information exchanged between the modules of the network device 30 and the execution processes are based on the same concept as the method examples of the present application, the execution steps are consistent with the method steps described above; refer to the descriptions in the method examples for details.
The present application provides a chip comprising a processor and a communication interface, the processor being coupled to the communication interface, the processor being configured to read instructions to perform operations performed by a network device in the embodiments described above with reference to fig. 1-11.
The present application provides a network system comprising a network device as described in the embodiment described above with respect to fig. 10.
It will be clear to those skilled in the art that for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing examples, and are not repeated herein.
In the several examples provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus examples described above are merely illustrative, e.g., the division of the elements is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the examples.
In addition, each functional unit in each example of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the examples of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, or an optical disc.
The foregoing examples are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing examples, those of ordinary skill in the art will understand that the technical solutions recorded in the examples may still be modified, or some of their technical features may be replaced by equivalents; such modifications and replacements do not cause the corresponding technical solutions to depart from the scope of the technical solutions of the examples of the present application.

Claims (15)

1. A data computing method, comprising:
a network device obtains a first matrix and a second matrix, wherein the first matrix is used to represent a weight matrix after pruning in a target model, and the second matrix is used to represent data input into the target model;
the network device receives a configuration instruction, wherein the configuration instruction is used to set a sparse ratio;
the network device compresses the second matrix according to the sparse ratio to obtain a third matrix; and
the network device calculates a product of the first matrix and the third matrix.
2. The data computing method of claim 1, wherein the network device compressing the second matrix according to the sparse ratio to obtain a third matrix comprises:
the network device compresses a first column vector according to the sparse ratio to obtain a second column vector, wherein the first column vector belongs to the second matrix, and the second column vector belongs to the third matrix;
and the network device calculating a product of the first matrix and the third matrix comprises:
the network device calculates a product of a first row vector and the second column vector, wherein the first row vector belongs to the first matrix.
3. The data computing method of claim 2, wherein the method further comprises:
the network device calculates a first moving number according to a non-zero data index, wherein the non-zero data index is used to represent the distribution, within the weight matrix before pruning, of the data in the first matrix, the first moving number is used to represent the number of steps by which selected data are moved, and the selected data are the data in the first column vector that correspond to the non-zero data index;
and the network device compressing the first column vector according to the non-zero data index to obtain a second column vector comprises:
the network device compresses the first column vector according to the first moving number to obtain the second column vector.
4. The data computing method of claim 3, wherein the network device compressing the first column vector according to the first moving number to obtain a second column vector comprises:
the network device selects selected data from the first column vector according to the non-zero data index; and
the network device shifts the selected data according to the first moving number to obtain the second column vector.
5. The data computing method of claim 3, wherein the method further comprises:
the network device selects multiple groups of selected data from the first column vector according to the non-zero data index, wherein each group of selected data comprises one or more selected data;
the network device shifts the multiple groups of selected data according to the first moving number to obtain multiple groups of selected vectors; and
the network device compresses the multiple groups of selected vectors according to the sparse ratio to obtain the second column vector.
6. The data computing method according to any one of claims 1 to 5, characterized in that the sparse ratio is determined by the computing power of a processing element in the network device and a weight matrix before pruning.
7. A network device, comprising:
a processing unit, configured to obtain a first matrix and a second matrix, wherein the first matrix is used to represent a weight matrix after pruning in a target model, and the second matrix is used to represent data input into the target model; and
a receiving unit, configured to receive a configuration instruction, wherein the configuration instruction is used to set a sparse ratio;
wherein the processing unit is further configured to:
compress the second matrix according to the sparse ratio to obtain a third matrix; and
calculate a product of the first matrix and the third matrix.
8. The network device of claim 7, wherein the network device,
the processing unit is configured to:
compress a first column vector according to the sparse ratio to obtain a second column vector, wherein the first column vector belongs to the second matrix, and the second column vector belongs to the third matrix; and
calculate a product of a first row vector and the second column vector, wherein the first row vector belongs to the first matrix.
9. The network device of claim 8, wherein the network device,
the processing unit is further configured to calculate a first moving number according to a non-zero data index, wherein the non-zero data index is used to represent the distribution, within the weight matrix before pruning, of the data in the first matrix, the first moving number is used to represent the number of steps by which selected data are moved, and the selected data are the data in the first column vector that correspond to the non-zero data index; and
the processing unit is configured to compress the first column vector according to the first moving number to obtain a second column vector.
10. The network device of claim 9, wherein the network device,
the processing unit is configured to:
select selected data from the first column vector according to the non-zero data index; and
shift the selected data according to the first moving number to obtain the second column vector.
11. The network device of claim 9, wherein the network device,
the processing unit is configured to:
select multiple groups of selected data from the first column vector according to the non-zero data index, wherein each group of selected data comprises one or more selected data;
shift the multiple groups of selected data according to the first moving number to obtain multiple groups of selected vectors; and
compress the multiple groups of selected vectors according to the sparse ratio to obtain the second column vector.
12. The network device according to any one of claims 7 to 11, characterized in that the sparse ratio is determined by the computing power of a processing element in the network device and/or a weight matrix before pruning.
13. A network device, comprising:
a processor and a memory;
the processor is configured to execute instructions stored in the memory such that the method of any one of claims 1 to 6 is performed.
14. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed on a computer or processor, causes the method of any one of claims 1 to 6 to be performed.
15. A computer program product, characterized in that it, when run on a computer or processor, causes the method of any one of claims 1 to 6 to be performed.
CN202210736691.6A 2022-06-27 2022-06-27 Data calculation method and related equipment Pending CN117332197A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210736691.6A CN117332197A (en) 2022-06-27 2022-06-27 Data calculation method and related equipment
PCT/CN2023/101090 WO2024001841A1 (en) 2022-06-27 2023-06-19 Data computing method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210736691.6A CN117332197A (en) 2022-06-27 2022-06-27 Data calculation method and related equipment

Publications (1)

Publication Number Publication Date
CN117332197A 2024-01-02

Family

ID=89290739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210736691.6A Pending CN117332197A (en) 2022-06-27 2022-06-27 Data calculation method and related equipment

Country Status (2)

Country Link
CN (1) CN117332197A (en)
WO (1) WO2024001841A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944555B (en) * 2017-12-07 2021-09-17 广州方硅信息技术有限公司 Neural network compression and acceleration method, storage device and terminal
US20210065005A1 (en) * 2019-08-29 2021-03-04 Alibaba Group Holding Limited Systems and methods for providing vector-wise sparsity in a neural network
US11763156B2 (en) * 2019-11-15 2023-09-19 Microsoft Technology Licensing, Llc Neural network compression based on bank-balanced sparsity
CN113762493A (en) * 2020-06-01 2021-12-07 阿里巴巴集团控股有限公司 Neural network model compression method and device, acceleration unit and computing system
CN112732222B (en) * 2021-01-08 2023-01-10 苏州浪潮智能科技有限公司 Sparse matrix accelerated calculation method, device, equipment and medium

Also Published As

Publication number Publication date
WO2024001841A1 (en) 2024-01-04

Similar Documents

Publication Publication Date Title
CN111062472B (en) Sparse neural network accelerator based on structured pruning and acceleration method thereof
CN111382867B (en) Neural network compression method, data processing method and related devices
US20200117700A1 (en) Sparse matrix vector multiplication with a matrix vector multiplication unit
CN110262773B (en) Computer data processing method and device
CN102171682B (en) Computing module for efficient FFT and FIR hardware accelerator
CN109937418B (en) Waveform-based reconstruction for simulation
CN111542839A (en) Hardware acceleration method and device of deconvolution neural network and electronic equipment
CN109801693B (en) Medical records grouping method and device, terminal and computer readable storage medium
CN112286864B (en) Sparse data processing method and system for accelerating operation of reconfigurable processor
CN112200300A (en) Convolutional neural network operation method and device
CN113741858B (en) Memory multiply-add computing method, memory multiply-add computing device, chip and computing equipment
CN114138231B (en) Method, circuit and SOC for executing matrix multiplication operation
CN110554854B (en) Data processor, method, chip and electronic equipment
CN110728351A (en) Data processing method, related device and computer storage medium
CN108159694B (en) Flexible body flutter simulation method, flexible body flutter simulation device and terminal equipment
CN117332197A (en) Data calculation method and related equipment
CN111047025B (en) Convolution calculation method and device
CN113065663A (en) Data access method, device, equipment and storage medium
CN116009889A (en) Deep learning model deployment method and device, electronic equipment and storage medium
Sartin et al. Approximation of hyperbolic tangent activation function using hybrid methods
CN117407640A (en) Matrix calculation method and device
CN114662689A (en) Pruning method, device, equipment and medium for neural network
CN110688087B (en) Data processor, method, chip and electronic equipment
CN113496228A (en) Human body semantic segmentation method based on Res2Net, TransUNet and cooperative attention
Sartin et al. ANN in Hardware with Floating Point and Activation Function Using Hybrid Methods.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination