CN117332197A - Data calculation method and related equipment - Google Patents

Data calculation method and related equipment

Info

Publication number
CN117332197A
CN117332197A
Authority
CN
China
Prior art keywords
matrix
data
column vector
network device
sparse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210736691.6A
Other languages
Chinese (zh)
Inventor
傅光宁
谢星华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202210736691.6A priority Critical patent/CN117332197A/en
Priority to PCT/CN2023/101090 priority patent/WO2024001841A1/en
Publication of CN117332197A publication Critical patent/CN117332197A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Stored Programmes (AREA)
  • Complex Calculations (AREA)

Abstract

The present application provides a data calculation method and related device. A network device obtains a first matrix and a second matrix, where the first matrix is the weight matrix after pruning in a target model and the second matrix is the data input into the target model. The network device receives a configuration instruction, where the configuration instruction is used to set a sparse ratio. The network device compresses the second matrix according to the sparse ratio to obtain a third matrix, and then calculates the product of the first matrix and the third matrix. Because the network device can flexibly set the value of the sparse ratio according to the configuration instruction, the computing performance of the hardware can be fully exploited for different models, giving the method wider applicability and stronger compatibility.

Description

Data calculation method and related equipment
Technical Field
The present application relates to the field of computers, further to the application of artificial intelligence (AI) technology in the field of computer networks, and in particular to a data computing method and related devices.
Background
In the current AI field, models come in a wide variety and are gradually becoming more complex. According to the lottery ticket hypothesis, a sparse model can have the same or even better learning ability than a complex model. Moreover, training a complex model is costly and time-consuming, whereas training a sparse model is cheap and fast. How to sparsify a complex model while ensuring that the sparse model still runs well is the key to accelerated model training.
To sparsify a complex model, the weight matrices in the model can generally be pruned, and the pruned weight matrices then take part in the computation together with the data. Currently, the dominant sparse acceleration technique is 2:4 fine-grained structured matrix multiplication acceleration. By applying 2:4 structured pruning to a weight matrix in an AI model, that is, retaining 2 elements out of every 4 in the weight matrix, the weight matrix is sparsified; this technique accelerates AI model computation and saves model training time.
However, the conventional model acceleration technique can only sparsify the weight matrix with a fixed 2:4 sparse structure. The sparse ratio of the weight matrix cannot be changed, so the technique suits only a narrow range of scenarios and has poor compatibility.
Disclosure of Invention
The present application provides a data calculation method and related device. In the method, the network device can flexibly set the value of the sparse ratio according to a configuration instruction, so the computing performance of the hardware can be fully exploited for different models, giving the method wider applicability and stronger compatibility.
A first aspect of the present application provides a data calculation method, in which a network device obtains a first matrix and a second matrix, where the first matrix represents a weight matrix after pruning in a target model and the second matrix represents data input into the target model; the network device receives a configuration instruction used to set a sparse ratio; the network device compresses the second matrix according to the sparse ratio to obtain a third matrix; and the network device calculates a product of the first matrix and the third matrix.
In the present application, the network device obtains a first matrix and a second matrix, where the first matrix is the weight matrix after pruning in the target model and the second matrix is the data input into the target model. The network device receives a configuration instruction used to set the sparse ratio, compresses the second matrix according to the sparse ratio to obtain a third matrix, and then calculates the product of the first matrix and the third matrix. Because the network device can flexibly set the value of the sparse ratio according to the configuration instruction, the computing performance of the hardware can be fully exploited for different models, giving the method wider applicability and stronger compatibility.
In a possible implementation manner of the first aspect, the network device compressing the second matrix according to the sparse ratio to obtain a third matrix includes: the network device compresses a first column vector according to the sparse ratio to obtain a second column vector, where the first column vector belongs to the second matrix and the second column vector belongs to the third matrix; and the network device calculating a product of the first matrix and the third matrix includes: the network device calculates a product of a first row vector and the second column vector, where the first row vector belongs to the first matrix.
In this possible implementation, the first column vector belongs to the second matrix and the second column vector belongs to the third matrix. In other words, the process of the network device compressing the second matrix according to the sparse ratio to obtain the third matrix includes the process of compressing the first column vector according to the sparse ratio to obtain the second column vector. Likewise, because the first row vector belongs to the first matrix, the process of calculating the product of the first matrix and the third matrix includes the process of calculating the product of the first row vector and the second column vector. This possible implementation explains the specific calculation process for the product of the first matrix and the third matrix, improving the feasibility of the scheme.
In a possible implementation manner of the first aspect, the method further includes: the network device calculates a first moving number according to a non-zero data index, where the non-zero data index represents the distribution, within the weight matrix before pruning, of the data in the first matrix; the first moving number represents the number of steps by which selected data are moved; and the selected data are the data in the first column vector that correspond to the non-zero data index. The network device compressing the first column vector according to the non-zero data index to obtain the second column vector includes: the network device compresses the first column vector according to the first moving number to obtain the second column vector.
In this possible implementation, the non-zero data index represents the distribution, within the weight matrix before pruning, of the data in the first matrix. Because the first row vector is part of the first matrix, the non-zero data index includes the distribution of the first row vector in the weight matrix before pruning. The network device may therefore obtain the selected data from the first column vector according to the non-zero data index, derive from that distribution the number of steps by which the selected data must move, that is, the first moving number, and then compress the first column vector according to the first moving number to obtain the second column vector. In this way the first column vector is compressed, and the computing efficiency of the network device is improved.
In a possible implementation manner of the first aspect, the network device compressing the first column vector according to the first moving number to obtain a second column vector includes: the network device selects the selected data from the first column vector according to the non-zero data index; and the network device shifts the selected data according to the first moving number to obtain the second column vector.
In this possible implementation manner, the network device obtains the selected data in the first column vector according to the non-zero data index, where the selected data follow from the distribution of the first row vector in the weight matrix before pruning. The network device also obtains the first moving number from that distribution, and then shifts the selected data according to the first moving number to obtain the second column vector.
In a possible implementation manner of the first aspect, the method further includes: the network device selects multiple groups of selected data from the first column vector according to the non-zero data index, where each group of selected data includes one or more selected data; the network device shifts the multiple groups of selected data according to the first moving number to obtain multiple groups of selected vectors; and the network device compresses the multiple groups of selected vectors according to the sparse ratio to obtain the second column vector.
In this possible implementation manner, if the data size of the first matrix is large, the network device may obtain multiple groups of selected data in the first column vector according to the non-zero data index, where each group of selected data follows from the distribution of the first row vector in the weight matrix before pruning. The network device may further obtain a first moving number for each group of selected data from that distribution, shift each group accordingly to obtain multiple groups of selected vectors, and compress the multiple groups of selected vectors according to the sparse ratio to obtain the second column vector. In this way, matched to the computing power of the processing element (PE), the network device can compress multiple groups of selected vectors into one second column vector, so that the product of the first row vector and the second column vector is completed within a single processing element, making full use of the processing element's computing power, avoiding wasted computation, and improving computing efficiency.
In a possible implementation manner of the first aspect, the sparse ratio is determined by the computing power of a processing element in the network device and/or by the weight matrix before pruning.
In this possible implementation manner, when the sparse ratio is configured, its value may be determined according to factors such as the computing power of the processing element, the complexity of the weight matrix before pruning, and the accuracy required when training the weight matrix before pruning. This possible implementation identifies several factors to consider when determining the sparse ratio; the network device can set different values according to different factors, improving the flexibility of the scheme.
A second aspect of the present application provides a network device comprising at least one processor, a memory, and a communication interface. The processor is coupled with the memory and the communication interface. The memory is used for storing instructions, the processor is used for executing the instructions, and the communication interface is used for communicating with other network devices under the control of the processor. The instructions, when executed by a processor, cause the network device to perform the method of the first aspect or any of the possible implementations of the first aspect.
A third aspect of the present application provides a computer readable storage medium storing a program for causing a terminal device to perform the method of the first aspect or any possible implementation of the first aspect.
A fourth aspect of the present application provides a computer program product storing one or more computer-executable instructions which, when executed by a processor, perform the method of the first aspect or any one of the possible implementations of the first aspect.
A fifth aspect of the present application provides a chip comprising a processor and a communication interface, the processor being coupled to the communication interface, the processor being configured to read instructions to perform a method according to the first aspect or any one of the possible implementations of the first aspect.
A sixth aspect of the present application provides a network system comprising a network device on which the method of the first aspect or any one of the possible implementations of the first aspect is performed.
Drawings
FIG. 1 is a schematic diagram of a data computing system according to the present application;
FIG. 2 is a schematic flow chart of a data calculation method provided in the present application;
FIG. 3 is a schematic diagram of an embodiment of a data computing method provided herein;
FIG. 4 is a schematic diagram of another embodiment of a data calculation method provided herein;
FIG. 5 is a schematic diagram of another embodiment of a data calculation method provided herein;
FIG. 6 is a schematic diagram of another embodiment of a data calculation method provided herein;
FIG. 7 is a schematic diagram of another embodiment of a data calculation method provided herein;
FIG. 8 is a schematic diagram of another embodiment of a data calculation method provided herein;
FIG. 9 is a schematic diagram of another embodiment of a data calculation method provided herein;
FIG. 10 is a schematic structural diagram of a network device provided in the present application;
FIG. 11 is a schematic structural diagram of another network device provided in the present application.
Detailed Description
Examples provided in the present application are described below with reference to the accompanying drawings. It is apparent that the described examples are only some, not all, of the examples of the present application. As those of ordinary skill in the art will appreciate, with the development of technology and the emergence of new scenarios, the technical solutions provided in the present application are likewise applicable to similar technical problems.
The terms "first", "second", and the like in the description, the claims, and the drawings of the present application are used to distinguish between similar objects and do not necessarily describe a particular order or sequence. It is to be understood that data so termed may be interchanged where appropriate, so that the examples described here may be implemented in orders other than those illustrated or described. Furthermore, the terms "comprises", "comprising", and "having", and any variations thereof, are intended to cover a non-exclusive inclusion: a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such a process, method, article, or apparatus.
The term "and/or" in this application merely describes an association between associated objects and indicates that three relationships may exist; for example, "a and/or b" may mean: a alone, both a and b, or b alone, where a and b may be singular or plural. In the description of the present application, unless otherwise indicated, "a plurality" means two or more. "At least one of" a list of items means any combination of those items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where a, b, and c may each be single or plural.
In the current AI field, models come in a wide variety and are gradually becoming more complex. According to the lottery ticket hypothesis, a sparse model can have the same or even better learning ability than a complex model. Moreover, training a complex model is costly and time-consuming, whereas training a sparse model is cheap and fast. How to sparsify a complex model while ensuring that the sparse model still runs well is the key to accelerated model training.
To sparsify a complex model, the weight matrices in the model can generally be pruned, and the pruned weight matrices then take part in the computation together with the data. Currently, the dominant sparse acceleration technique is 2:4 fine-grained structured matrix multiplication acceleration. By applying 2:4 structured pruning to a weight matrix in an AI model, that is, retaining 2 elements out of every 4 in the weight matrix, the weight matrix is sparsified; this technique accelerates AI model computation and saves model training time.
However, the conventional model acceleration technique can only sparsify the weight matrix with a fixed 2:4 sparse structure. The sparse ratio of the weight matrix cannot be changed, so the technique suits only a narrow range of scenarios and has poor compatibility.
To solve the problems in the above scheme, the present application provides a data computing method, a data computing system, and a network device. In the present application, the network device obtains a first matrix and a second matrix, where the first matrix is the weight matrix after pruning in a target model and the second matrix is the data input into the target model. The network device receives a configuration instruction used to set the sparse ratio, compresses the second matrix according to the sparse ratio to obtain a third matrix, and then calculates the product of the first matrix and the third matrix. Because the network device can flexibly set the value of the sparse ratio according to the configuration instruction, the computing performance of the hardware can be fully exploited for different models, giving the method wider applicability and stronger compatibility.
The structure of a data computing system provided herein is first described.
Fig. 1 is a schematic structural diagram of a data computing system provided in the present application.
In the present application, the data computing system includes a computation engine. The computation engine obtains a first matrix and a second matrix from a register, where the first matrix is the weight matrix after pruning in the target model. As in fig. 1, assume the size of the first matrix is M×K, that is, the first matrix has M rows and K columns of data. The second matrix is the data input into the target model; assume its size is WK×N, that is, the second matrix has WK rows and N columns of data, where W is the sparse ratio. The computation engine can flexibly set the value of the sparse ratio according to the configuration instruction, and can then compress the second matrix according to the sparse ratio to obtain the third matrix. The third matrix has a size of K×N, that is, K rows and N columns of data. The matrix obtained after the computation engine calculates the product of the first matrix and the third matrix has a size of M×N, that is, M rows and N columns of data.
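To make these shape relationships concrete, the following is a minimal NumPy sketch (an illustration added for this description, not part of the patent text); the symbols M, K, N, and W follow the description above, and the slice used for the compression step is only a placeholder for the bitmap-driven compression detailed later.

    import numpy as np

    # Illustrative shapes only. W is the sparse ratio factor: with a 4:8
    # scheme, half of the input rows survive compression, i.e. W = 2.
    M, K, N, W = 4, 8, 3, 2

    first_matrix = np.zeros((M, K))        # pruned weight matrix, M x K
    second_matrix = np.zeros((W * K, N))   # input data, WK x N

    # The compute engine compresses the WK x N second matrix down to a
    # K x N third matrix (this slice is a stand-in for the real compression).
    third_matrix = second_matrix[:K, :]

    product = first_matrix @ third_matrix  # the result is M x N
    assert product.shape == (M, N)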
In the present application, the data computing system includes a computation engine that can flexibly set the value of the sparse ratio according to the configuration instruction, so the computing performance of the hardware can be fully exploited for different models, giving the system wider applicability and stronger compatibility.
The data computing method provided in the present application is described based on the data computing system described in fig. 1.
Fig. 2 is a schematic calculation flow chart of a data calculation method provided in the present application.
101. The network device obtains a first matrix and a second matrix.
In the application, the first matrix is used for representing the weight matrix after pruning in the target model, and the second matrix is used for representing the data input into the target model.
Illustratively, as in fig. 1, assume that the first matrix has a size of M×K, that is, M rows and K columns of data. The first matrix may be obtained by pruning a weight matrix, where the weight matrix before pruning has a size of M×WK. The second matrix is the data input into the target model; assume that it has a size of WK×N, that is, WK rows and N columns of data, where W is the sparse ratio.
102. The network device receives the configuration instruction.
In the present application, the configuration instruction is used to set the sparse ratio, and the sparse ratio can be changed. The network device may configure the sparse ratio according to the configuration instruction in several ways, described in detail below.
Mode one: the configuration instruction includes the value of the sparse ratio.
In the present application, the configuration instruction may include the value of the sparse ratio, which may be configured by operation and maintenance personnel as required. Optionally, the computing power of the processing element may be considered when configuring the sparse ratio: if the processing element (PE) can complete an 8×8 data computation, then, without exceeding the available computing power, the sparse ratio may be chosen from several schemes such as 8:16, 4:16, 4:8, and 2:8 according to the complexity of the weight matrix and the training accuracy required by the model. Optionally, where the computing power of the processing element permits, the training accuracy required by the unpruned weight matrix may be considered when configuring the value of the sparse ratio, and the value may be adjusted flexibly according to that accuracy. Optionally, where the computing power of the processing element permits, the complexity of the unpruned weight matrix may be considered, and the value of the sparse ratio may be adjusted flexibly according to that complexity. Alternatively, the operator may configure the value of the sparse ratio according to other factors, which is not specifically limited here.
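As a rough illustration of this selection logic (the function name and the admissibility rule below are assumptions made for this sketch, not taken from the patent), candidate N:M schemes could be screened against the PE's computing power as follows:

    # Hypothetical screening of candidate N:M sparse ratios against a PE
    # that can complete an 8x8 data computation per pass.
    PE_WIDTH = 8

    def fits_pe(kept: int, pe_width: int = PE_WIDTH) -> bool:
        # Assumed rule: the retained elements per group must not exceed the
        # PE's native width, so each group multiplies in a single pass.
        return kept <= pe_width

    candidates = [(8, 16), (4, 16), (4, 8), (2, 8)]
    print([f"{n}:{m}" for n, m in candidates if fits_pe(n)])
    # ['8:16', '4:16', '4:8', '2:8'] -- all four schemes from the text fit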
Mode two: the sparse ratio is generated according to the configuration instruction.
In the present application, the configuration instruction may instead carry no sparse ratio value, in which case the network device generates the value after receiving the configuration instruction. Optionally, after receiving the configuration instruction, the network device may determine the value of the sparse ratio according to the data amount of the unpruned weight matrix, generating different values for unpruned weight matrices of different sizes. For example, when the network device confirms that the data amount of the unpruned weight matrix is greater than a first threshold A, the sparse ratio may be set to a ratio A; when the data amount is less than or equal to the first threshold A, the sparse ratio may be set to a ratio B, where ratio A is smaller than ratio B. With this setting, when the unpruned weight matrix is large, the sparse ratio is small, which improves the training efficiency of the weight matrix. Optionally, after receiving the configuration instruction, the network device may determine the value of the sparse ratio according to the training accuracy required by the unpruned weight matrix. For example, an unpruned weight matrix may carry an identifier describing its required training accuracy, from which the network device generates a corresponding sparse ratio value; the network device can thus generate different sparse ratio values for different accuracies. Alternatively, the network device may generate the value of the sparse ratio according to other factors, which is not specifically limited here.
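A minimal sketch of this mode-two logic follows; the threshold value, the two concrete ratios, and the function name are placeholders chosen for illustration (the description only requires that ratio A be smaller than ratio B):

    # Hypothetical device-side generation of the sparse ratio ("mode two").
    def generate_sparse_ratio(weight_elements: int,
                              first_threshold: int = 1_000_000):
        if weight_elements > first_threshold:
            return (2, 8)  # ratio A: smaller, speeds up large-model training
        return (4, 8)      # ratio B: larger, for smaller weight matrices

    print(generate_sparse_ratio(5_000_000))  # (2, 8)
    print(generate_sparse_ratio(10_000))     # (4, 8)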
103. The network device compresses the second matrix according to the sparse ratio to obtain a third matrix.
In the present application, the computation engine may compress the second matrix according to the sparse ratio to obtain the third matrix. Assuming that the second matrix has a size of WK×N, that is, WK rows and N columns of data, where W is the sparse ratio, the third matrix then has a size of K×N, that is, K rows and N columns of data.
104. The network device calculates a product of the first matrix and the third matrix.
In the present application, the network device may use the processing element PE to calculate the product of the first matrix and the third matrix, thereby obtaining the convolution result.
In the present application, the network device obtains a first matrix and a second matrix, where the first matrix is the weight matrix after pruning in the target model and the second matrix is the data input into the target model. The network device compresses the second matrix according to the sparse ratio to obtain a third matrix, and then calculates the product of the first matrix and the third matrix. Because the network device can flexibly set the value of the sparse ratio according to the configuration instruction, the computing performance of the hardware can be fully exploited for different models, giving the method wider applicability and stronger compatibility.
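The following NumPy sketch is a software reference for steps 101 to 104 (an analogue of the PE pipeline, not the hardware design itself); the 4:8 magnitude-based pruning rule used to build the example bitmap is an assumption made for illustration:

    import numpy as np

    def sparse_matmul(first_matrix, bitmap, second_matrix):
        # first_matrix:  (M, K)  pruned weights; bitmap: (M, WK) non-zero
        # data index of the weights before pruning; second_matrix: (WK, N).
        M, K = first_matrix.shape
        out = np.zeros((M, second_matrix.shape[1]))
        for i in range(M):
            # Gather the rows of the second matrix that correspond to the
            # retained weights of row i: this is the column-vector
            # compression that yields the third matrix.
            third = second_matrix[bitmap[i], :]          # (K, N)
            out[i] = first_matrix[i] @ third
        return out

    # Usage: 4:8 sparse ratio (W = 2), keeping the 4 largest-magnitude
    # weights in every group of 8.
    rng = np.random.default_rng(0)
    M, K, N, W = 2, 4, 3, 2
    dense = rng.standard_normal((M, W * K))
    bitmap = np.zeros_like(dense, dtype=bool)
    for i in range(M):
        for g in range(0, W * K, 8):
            top = np.argsort(-np.abs(dense[i, g:g + 8]))[:4]
            bitmap[i, g + top] = True
    first = dense[bitmap].reshape(M, K)
    second = rng.standard_normal((W * K, N))
    # The compressed product equals multiplying the pruned dense weights.
    assert np.allclose(sparse_matmul(first, bitmap, second),
                       (dense * bitmap) @ second)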
In the present application, step 103, in which the network device compresses the second matrix according to the sparse ratio to obtain the third matrix, has specific implementations that are described in detail in the following examples.
In the present application, the process of the network device compressing the second matrix according to the sparse ratio to obtain the third matrix includes the process of the network device compressing a first column vector according to the sparse ratio to obtain a second column vector, where the first column vector belongs to the second matrix and the second column vector belongs to the third matrix. In addition, the process of the network device calculating the product of the first matrix and the third matrix may include the process of the network device calculating the product of a first row vector and the second column vector, where the first row vector belongs to the first matrix.
Fig. 3 is a schematic diagram of another calculation flow of a data calculation method provided in the present application.
For example, referring to fig. 3, in the present application the network device may dynamically configure the sparse ratio according to the computing power of the computing element and the complexity of the unpruned weight matrix, ensuring a fine-grained structured sparse acceleration process when the model adopts an N:M sparse-ratio configuration. The bitmap (non-zero data index) of matrix A (the first matrix) is input to the Prefix Sum module, which calculates the local shift step count (the first moving number) for the selected data in the column vectors of matrix B (the second matrix) on each PE row; the calculation result and the sparse ratio are then input to the non-zero data compressor, which completes the data compression of the column vectors of matrix B on one PE through local non-zero data compression and global data compression. By adding a Prefix Sum unit and a None-Zero Shift unit to each PE module, the sparse matrix accelerator can analyze the bitmap of matrix A and move the non-zero data of matrix B's column vectors according to the result, thereby accelerating the matrix operation.
The process by which the network device compresses the first column vector according to a sparse ratio to obtain a second column vector is described in detail below.
In the present application, the network device first calculates the first moving number according to the non-zero data index, where the non-zero data index represents the distribution, within the weight matrix before pruning, of the data in the first matrix; the first moving number represents the number of steps by which the selected data are moved; and the selected data are the data in the first column vector that correspond to the non-zero data index.
For example, as shown in fig. 3, the network device may introduce Prefix Sum and None-Zero Shift modules into the matrix computation engine, thereby implementing a fine-grained structured sparse acceleration process when the model adopts an N:M sparse-ratio configuration. A specific operation flow of the Prefix Sum module is shown in fig. 3: each bit of the bitmap (non-zero data index) is inverted, and the inverted bits before each position are accumulated. This yields the number of zero-valued elements preceding each position, which serves as the distance (the first moving number) by which that element is subsequently moved.
Fig. 4 is a schematic diagram of another calculation flow of a data calculation method provided in the present application.
For example, suppose the non-zero data index in fig. 4 is 1, 0, 1, 0, 1, 0, 1, 0. Inverting each bit yields 0, 1, 0, 1, 0, 1, 0, 1, and accumulating the inverted bits before each position yields 0, 0, 1, 1, 2, 2, 3, 3; that is, the data at the first through eighth bits of the first column vector need to be shifted by 0, 0, 1, 1, 2, 2, 3, 3 positions respectively, so the four selected data are shifted by 0, 1, 2, and 3 positions.
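A small software model of this Prefix Sum step is sketched below (assuming, for a self-consistent example, the 8-bit alternating bitmap discussed above):

    def prefix_sum_shifts(bitmap):
        # For each position, count the zero (pruned) elements before it:
        # that count is the distance the element must later shift left.
        shifts, zeros_before = [], 0
        for bit in bitmap:
            shifts.append(zeros_before)
            zeros_before += 1 - bit  # accumulate the inverted bit
        return shifts

    print(prefix_sum_shifts([1, 0, 1, 0, 1, 0, 1, 0]))
    # [0, 0, 1, 1, 2, 2, 3, 3] -> the four selected data shift by 0, 1, 2, 3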
In the present application, after the network device obtains the first moving number, it may compress the first column vector into the second column vector; a specific compression process is described in detail in the following examples.
In the present application, the non-zero data movement control signals indicating the required movement are obtained from the Prefix Sum unit and sent to the Local None-Zero Shift unit. The network device first selects the selected data from the first column vector according to the non-zero data index, and then shifts the selected data according to the first moving number to obtain the second column vector; that is, each selected data element is moved to the appropriate position according to the first moving number.
Fig. 5 is a schematic diagram of another calculation flow of a data calculation method provided in the present application.
For example, referring to fig. 5, the None-Zero Shift unit includes at least a Local None-Zero Shift unit, which completes the data compression of matrix B's column vectors on a PE according to the non-zero local shift count (the first moving number) calculated by the Prefix Sum unit. As shown in fig. 5, if the sparse ratio is 4:8, then according to the non-zero data index the selected data are the 1s at the first, fourth, fifth, and seventh bits, and the selected data within each group of 8 elements are moved to the first 4 positions of the queue. Similarly, as shown in fig. 6, if the sparse ratio is 2:8, the selected data within each group of 8 elements are moved to the first 2 positions of the queue. As shown in fig. 7, if the sparse ratio is 1:8, the selected data within each group of 8 elements are moved to the first position of the queue.
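Continuing the sketch, the Local None-Zero Shift step can be modeled as below, reusing prefix_sum_shifts from the previous sketch; the bit positions and the 4:8 ratio follow the fig. 5 example as described in the text, while the data values are arbitrary:

    def local_shift(column_group, bitmap_group, kept):
        # Move the selected data of one group into the first `kept` slots.
        shifts = prefix_sum_shifts(bitmap_group)
        queue = [0] * len(column_group)
        for pos, (value, bit) in enumerate(zip(column_group, bitmap_group)):
            if bit:
                queue[pos - shifts[pos]] = value  # shift left past zeros
        return queue[:kept]

    # Selected data at the 1st, 4th, 5th, and 7th bits; sparse ratio 4:8.
    bm = [1, 0, 0, 1, 1, 0, 1, 0]
    col = [10, 11, 12, 13, 14, 15, 16, 17]
    print(local_shift(col, bm, kept=4))  # [10, 13, 14, 16]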
In the present application, after the network device obtains the first moving number, there are other implementations for compressing the first column vector into the second column vector; one such compression process is described in detail in the following examples.
In the present application, the non-zero data movement control signals indicating the required movement are obtained from the Prefix Sum unit and sent to the Local None-Zero Shift unit. The network device selects multiple groups of selected data from the first column vector according to the non-zero data index, where each group of selected data includes one or more selected data; the network device shifts the multiple groups of selected data according to the first moving number to obtain multiple groups of selected vectors; and the network device compresses the multiple groups of selected vectors according to the sparse ratio to obtain the second column vector.
Fig. 8 is a schematic diagram of another calculation flow of a data calculation method provided in the present application.
Referring to fig. 8, the None-Zero Shift unit may be divided into two sub-modules: a Local None-Zero Shift unit and a Global None-Zero Shift unit. According to the non-zero local shift count (the first moving number) calculated by the Prefix Sum unit and the sparse ratio, the None-Zero Shift unit completes the data compression of the column vectors of matrix B on one PE through two stages: local non-zero data compression and global data compression. The Local None-Zero Shift completes the local compression (a specific implementation is similar to the one in the example above), and the Global None-Zero Shift takes the local compression result as input, performs global data compression, and transmits the compressed data to the PE.
Fig. 9 is a schematic diagram of another calculation flow of a data calculation method provided in the present application.
Referring to fig. 9, a specific implementation of the global data compression module compressing the second matrix is described in detail below.
In the present application, the non-zero data movement control signals indicating the required movement are obtained from the Prefix Sum unit and sent to the Local None-Zero Shift unit, and the non-zero data of each data unit are moved to the appropriate positions. If the sparse ratio is 4:8, the non-zero data within each group of 8 elements are moved to the first 4 positions of the queue; if the sparse ratio is 2:8, to the first 2 positions; and if the sparse ratio is 1:8, to the first position. Then, according to the queue data from the Local None-Zero Shift and the sparse ratio parameter W, the non-zero data in the queues are assembled to form 16 data inputs for subsequent calculation. If the sparse ratio is 4:8, the first 4 non-zero data in each of 4 queues are moved into a non-zero data vector of size 16; if the sparse ratio is 2:8, the first 2 non-zero data in each of 8 queues are moved into a non-zero data vector of size 16; and if the sparse ratio is 1:8, the first non-zero datum in each of 16 queues is moved into a non-zero data vector of size 16.
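The global assembly step can be modeled in the same style (a sketch; the 16-wide input vector matches the description above, and the queue contents are arbitrary):

    def global_compress(local_queues, kept, width=16):
        # Assemble the first `kept` entries of each locally compressed queue
        # into one `width`-wide non-zero data vector for the PE.
        vector = []
        for queue in local_queues:
            vector.extend(queue[:kept])
        return vector[:width]

    # 2:8 ratio: eight queues each contribute their first 2 non-zero values,
    # filling the 16-wide input vector.
    queues = [[q * 10, q * 10 + 1] for q in range(8)]
    assert len(global_compress(queues, kept=2)) == 16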
In the present application, the network device obtains a first matrix and a second matrix, where the first matrix is the weight matrix after pruning in the target model and the second matrix is the data input into the target model. The network device receives a configuration instruction used to set the sparse ratio, compresses the second matrix according to the sparse ratio to obtain a third matrix, and then calculates the product of the first matrix and the third matrix. Because the network device can flexibly set the value of the sparse ratio according to the configuration instruction, the computing performance of the hardware can be fully exploited for different models, giving the method wider applicability and stronger compatibility.
The foregoing examples provide different embodiments of a data computing method. The following provides a network device 20, as shown in fig. 10, where the network device 20 is configured to perform the data computing method of the foregoing examples; the steps performed and the corresponding beneficial effects can be understood with reference to the corresponding examples above and are not repeated here. The network device 20 includes:
the processing unit 201 is configured to obtain a first matrix and a second matrix, where the first matrix is used to represent a weight matrix after pruning in a target model, and the second matrix is used to represent data input into the target model;
A receiving unit 202, configured to receive a configuration instruction, where the configuration instruction is used to set a sparse ratio;
the processing unit 201 is configured to:
compress the second matrix according to the sparse ratio to obtain a third matrix; and
calculate a product of the first matrix and the third matrix.
In one possible implementation,
the processing unit 201 is configured to:
compress a first column vector according to the sparse ratio to obtain a second column vector, where the first column vector belongs to the second matrix and the second column vector belongs to the third matrix; and
calculate a product of a first row vector and the second column vector, where the first row vector belongs to the first matrix.
In one possible implementation,
the processing unit 201 is further configured to calculate a first moving number according to a non-zero data index, where the non-zero data index is used to represent the distribution, within the weight matrix before pruning, of the data in the first matrix, the first moving number is used to represent the number of steps by which selected data are moved, and the selected data are the data in the first column vector that correspond to the non-zero data index;
the processing unit 201 is configured to compress the first column vector according to the first moving number to obtain a second column vector.
In one possible implementation,
the processing unit 201 is configured to:
select selected data from the first column vector according to the non-zero data index; and
shift the selected data according to the first moving number to obtain the second column vector.
In one possible implementation,
the processing unit 201 is configured to:
select multiple groups of selected data from the first column vector according to the non-zero data index, where each group of selected data includes one or more selected data;
shift the multiple groups of selected data according to the first moving number to obtain multiple groups of selected vectors; and
compress the multiple groups of selected vectors according to the sparse ratio to obtain the second column vector.
In one possible implementation,
the sparse ratio is determined by the computing power of a processing element in the network device and/or by the weight matrix before pruning.
It should be noted that, because the information exchanged between the modules of the network device 20 and the execution processes are based on the same concept as the method examples of the present application, the execution steps are consistent with the method steps described above; refer to the descriptions in the method examples for details.
The foregoing examples provide different embodiments of the network device 20. The following provides a network device 30, as shown in fig. 11, where the network device 30 is configured to perform the data calculation method of the foregoing examples; the steps performed and the corresponding beneficial effects can be understood with reference to the corresponding examples above and are not repeated here.
Referring to fig. 11, a schematic structural diagram of a network device is provided. The network device 30 includes a processor 302, a communication interface 303, and a memory 301. Optionally, a bus 304 may be included, through which the communication interface 303, the processor 302, and the memory 301 are interconnected. The bus 304 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 11, but this does not mean that there is only one bus or one type of bus. The network device 30 may implement the functionality of the network device in the foregoing examples; the processor 302 and the communication interface 303 may perform the operations corresponding to the network device in the method examples described above.
The following describes the components of the network device in detail with reference to fig. 11:
The memory 301 may be a volatile memory, such as a random-access memory (RAM); or a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or a combination thereof. It is used to store program code, configuration files, or other content that can be used to implement the methods of the present application.
The processor 302 is the control center of the network device and may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the examples provided in the present application, such as one or more digital signal processors (DSP) or one or more field-programmable gate arrays (FPGA).
The communication interface 303 is used to communicate with other network devices.
The processor 302 may perform the operations performed by the network device in the foregoing examples, which are not described in detail here.
It should be noted that, because the information exchanged between the modules of the network device 30 and the execution processes are based on the same concept as the method examples of the present application, the execution steps are consistent with the method steps described above; refer to the descriptions in the method examples for details.
The present application provides a chip comprising a processor and a communication interface, the processor being coupled to the communication interface, the processor being configured to read instructions to perform operations performed by a network device in the embodiments described above with reference to fig. 1-11.
The present application provides a network system comprising a network device as described in the embodiment described above with respect to fig. 10.
It will be clear to those skilled in the art that for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing examples, and are not repeated herein.
In the several examples provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus examples described above are merely illustrative, e.g., the division of the elements is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the examples.
In addition, each functional unit in each example of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the examples of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, or an optical disc.
The foregoing examples are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing examples, those of ordinary skill in the art will understand that the technical solutions recorded in the examples may still be modified, or some of their technical features may be replaced by equivalents; such modifications and replacements do not cause the corresponding technical solutions to depart from the scope of the technical solutions of the examples of the present application.

Claims (15)

1. A data computing method, comprising:
a network device obtains a first matrix and a second matrix, wherein the first matrix is used to represent a weight matrix after pruning in a target model, and the second matrix is used to represent data input into the target model;
the network device receives a configuration instruction, wherein the configuration instruction is used to set a sparse ratio;
the network device compresses the second matrix according to the sparse ratio to obtain a third matrix; and
the network device calculates a product of the first matrix and the third matrix.
2. The data computing method of claim 1, wherein the network device compressing the second matrix according to the sparse ratio to obtain a third matrix comprises:
the network device compresses a first column vector according to the sparse ratio to obtain a second column vector, wherein the first column vector belongs to the second matrix, and the second column vector belongs to the third matrix;
and the network device calculating a product of the first matrix and the third matrix comprises:
the network device calculates a product of a first row vector and the second column vector, wherein the first row vector belongs to the first matrix.
3. The data computing method of claim 2, wherein the method further comprises:
the network device calculates a first moving number according to a non-zero data index, wherein the non-zero data index is used to represent the distribution, within the weight matrix before pruning, of the data in the first matrix, the first moving number is used to represent the number of steps by which selected data are moved, and the selected data are the data in the first column vector that correspond to the non-zero data index;
and the network device compressing the first column vector according to the non-zero data index to obtain a second column vector comprises:
the network device compresses the first column vector according to the first moving number to obtain the second column vector.
4. The data computing method of claim 3, wherein the network device compressing the first column vector according to the first moving number to obtain a second column vector comprises:
the network device selects selected data from the first column vector according to the non-zero data index; and
the network device shifts the selected data according to the first moving number to obtain the second column vector.
5. The data computing method of claim 3, wherein the method further comprises:
the network device selects multiple groups of selected data from the first column vector according to the non-zero data index, wherein each group of selected data comprises one or more selected data;
the network device shifts the multiple groups of selected data according to the first moving number to obtain multiple groups of selected vectors; and
the network device compresses the multiple groups of selected vectors according to the sparse ratio to obtain the second column vector.
6. The data computing method according to any one of claims 1 to 5, characterized in that the sparse ratio is determined by the computing power of a processing element in the network device and a weight matrix before pruning.
7. A network device, comprising:
a processing unit, configured to obtain a first matrix and a second matrix, wherein the first matrix is used to represent a weight matrix after pruning in a target model, and the second matrix is used to represent data input into the target model; and
a receiving unit, configured to receive a configuration instruction, wherein the configuration instruction is used to set a sparse ratio;
wherein the processing unit is further configured to:
compress the second matrix according to the sparse ratio to obtain a third matrix; and
calculate a product of the first matrix and the third matrix.
8. The network device of claim 7, wherein the network device,
the processing unit is configured to:
compress a first column vector according to the sparse ratio to obtain a second column vector, wherein the first column vector belongs to the second matrix, and the second column vector belongs to the third matrix; and
calculate a product of a first row vector and the second column vector, wherein the first row vector belongs to the first matrix.
9. The network device of claim 8, wherein the network device,
the processing unit is further configured to calculate a first moving number according to a non-zero data index, wherein the non-zero data index is used to represent the distribution, within the weight matrix before pruning, of the data in the first matrix, the first moving number is used to represent the number of steps by which selected data are moved, and the selected data are the data in the first column vector that correspond to the non-zero data index; and
the processing unit is configured to compress the first column vector according to the first moving number to obtain a second column vector.
10. The network device of claim 9, wherein the network device,
the processing unit is configured to:
select selected data from the first column vector according to the non-zero data index; and
shift the selected data according to the first moving number to obtain the second column vector.
11. The network device of claim 9, wherein the network device,
the processing unit is configured to:
select multiple groups of selected data from the first column vector according to the non-zero data index, wherein each group of selected data comprises one or more selected data;
shift the multiple groups of selected data according to the first moving number to obtain multiple groups of selected vectors; and
compress the multiple groups of selected vectors according to the sparse ratio to obtain the second column vector.
12. The network device according to any one of claims 7 to 11, characterized in that the sparse ratio is determined by the computing power of a processing element in the network device and/or a weight matrix before pruning.
13. A network device, comprising:
a processor and a memory;
the processor is configured to execute instructions stored in the memory such that the method of any one of claims 1 to 6 is performed.
14. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed on a computer or processor, causes the method of any one of claims 1 to 6 to be performed.
15. A computer program product, characterized in that it, when run on a computer or processor, causes the method of any one of claims 1 to 6 to be performed.
CN202210736691.6A 2022-06-27 2022-06-27 Data calculation method and related equipment Pending CN117332197A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210736691.6A CN117332197A (en) 2022-06-27 2022-06-27 Data calculation method and related equipment
PCT/CN2023/101090 WO2024001841A1 (en) 2022-06-27 2023-06-19 Data computing method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210736691.6A CN117332197A (en) 2022-06-27 2022-06-27 Data calculation method and related equipment

Publications (1)

Publication Number Publication Date
CN117332197A 2024-01-02

Family

ID=89290739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210736691.6A Pending CN117332197A (en) 2022-06-27 2022-06-27 Data calculation method and related equipment

Country Status (2)

Country Link
CN (1) CN117332197A (en)
WO (1) WO2024001841A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944555B (en) * 2017-12-07 2021-09-17 广州方硅信息技术有限公司 Neural network compression and acceleration method, storage device and terminal
US20210065005A1 (en) * 2019-08-29 2021-03-04 Alibaba Group Holding Limited Systems and methods for providing vector-wise sparsity in a neural network
US11763156B2 (en) * 2019-11-15 2023-09-19 Microsoft Technology Licensing, Llc Neural network compression based on bank-balanced sparsity
CN113762493A (en) * 2020-06-01 2021-12-07 阿里巴巴集团控股有限公司 Neural network model compression method and device, acceleration unit and computing system
CN112732222B (en) * 2021-01-08 2023-01-10 苏州浪潮智能科技有限公司 Sparse matrix accelerated calculation method, device, equipment and medium

Also Published As

Publication number Publication date
WO2024001841A1 (en) 2024-01-04

Similar Documents

Publication Publication Date Title
CN111062472B (en) Sparse neural network accelerator based on structured pruning and acceleration method thereof
CN111382867B (en) Neural network compression method, data processing method and related devices
US20200117700A1 (en) Sparse matrix vector multiplication with a matrix vector multiplication unit
CN110262773B (en) Computer data processing method and device
CN102171682B (en) Computing module for efficient FFT and FIR hardware accelerator
CN109937418B (en) Waveform-based reconstruction for simulation
CN111542839A (en) Hardware acceleration method and device of deconvolution neural network and electronic equipment
CN109801693B (en) Medical records grouping method and device, terminal and computer readable storage medium
CN112286864B (en) Sparse data processing method and system for accelerating operation of reconfigurable processor
CN112200300A (en) Convolutional neural network operation method and device
CN113741858B (en) Memory multiply-add computing method, memory multiply-add computing device, chip and computing equipment
CN114138231B (en) Method, circuit and SOC for executing matrix multiplication operation
CN110554854B (en) Data processor, method, chip and electronic equipment
CN110728351A (en) Data processing method, related device and computer storage medium
CN108159694B (en) Flexible body flutter simulation method, flexible body flutter simulation device and terminal equipment
CN117332197A (en) Data calculation method and related equipment
CN111047025B (en) Convolution calculation method and device
CN113065663A (en) Data access method, device, equipment and storage medium
CN116009889A (en) Deep learning model deployment method and device, electronic equipment and storage medium
Sartin et al. Approximation of hyperbolic tangent activation function using hybrid methods
CN117407640A (en) Matrix calculation method and device
CN114662689A (en) Pruning method, device, equipment and medium for neural network
CN110688087B (en) Data processor, method, chip and electronic equipment
CN113496228A (en) Human body semantic segmentation method based on Res2Net, TransUNet and cooperative attention
Sartin et al. ANN in Hardware with Floating Point and Activation Function Using Hybrid Methods.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination