WO2024001841A1 - Data computing method and related device - Google Patents

Data computing method and related device Download PDF

Info

Publication number
WO2024001841A1
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
network device
data
column vector
ratio
Prior art date
Application number
PCT/CN2023/101090
Other languages
French (fr)
Chinese (zh)
Inventor
Fu Guangning
Xie Xinghua
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2024001841A1 publication Critical patent/WO2024001841A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Definitions

  • This application relates to the field of computers, and further relates to the application of artificial intelligence (AI) technology in the field of computer networks, and in particular, to a data calculation method and related equipment.
  • AI: artificial intelligence
  • the mainstream sparse acceleration technology is 2:4 fine-grained structured matrix multiplication acceleration technology.
  • This application provides a data calculation method and related equipment.
  • the network device can flexibly set the value of the sparsity ratio according to the configuration instructions. Therefore, the computing performance of the hardware can be more fully utilized for different models, with a wide range of applicable scenarios and strong compatibility.
  • a first aspect of this application provides a data calculation method.
  • the network device obtains a first matrix and a second matrix.
  • the first matrix is used to represent the pruned weight matrix in the target model.
  • the second matrix is used to represent the data input to the target model;
  • the network device receives configuration instructions, and the configuration instructions are used to set the sparse ratio;
  • the network device compresses the second matrix according to the sparse ratio to obtain a third matrix;
  • the network device calculates the product of the first matrix and the third matrix.
  • the network device obtains a first matrix and a second matrix, where the first matrix is the pruned weight matrix in the target model, and the second matrix is the data input to the target model.
  • Network devices receive configuration instructions that set the sparsity ratio.
  • the network device compresses the second matrix according to the sparse ratio to obtain a third matrix, and then calculates the product of the first matrix and the third matrix.
  • Network devices can flexibly set the sparsity ratio value according to configuration instructions. Therefore, the computing performance of the hardware can be more fully utilized for different models. The applicable scenarios are wider and the compatibility is stronger.
  • the network device compresses the second matrix according to the sparse ratio to obtain a third matrix, including: the network device compresses the first column vector according to the sparse ratio to obtain a second column vector, where the first column vector belongs to the second matrix and the second column vector belongs to the third matrix; the network device calculates the product of the first matrix and the third matrix, including: the network device calculates the product of a first row vector and the second column vector, where the first row vector belongs to the first matrix.
  • the first column vector belongs to the second matrix
  • the second column vector belongs to the third matrix.
  • the process in which the network device compresses the second matrix according to the sparse ratio to obtain the third matrix includes the process in which the network device compresses the first column vector according to the sparse ratio to obtain the second column vector.
  • the first row vector belongs to the first matrix.
  • the process of the network device calculating the product of the first matrix and the third matrix will also include the process of the network device calculating the product of the first row vector and the second column vector.
  • This possible implementation method illustrates a specific calculation process of multiplying the first matrix and the third matrix, which improves the realizability of the solution.
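The row-by-column calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the point is that multiplying the pruned first row vector by the compressed second column vector gives the same result as the unpruned dot product, because only zero-valued weights are skipped. All variable names and values here are hypothetical.

```python
def dot_compressed(pruned_row, nz_index, full_col):
    """Multiply pruned weights against only the column entries that the
    non-zero data index (bitmap) selects."""
    selected = [x for x, keep in zip(full_col, nz_index) if keep]
    return sum(w * x for w, x in zip(pruned_row, selected))

# Hypothetical 2:4 example: the bitmap marks which weights survived pruning.
nz_index = [1, 0, 0, 1]   # non-zero data index
full_row = [5, 0, 0, 7]   # weight row before pruning
pruned   = [5, 7]         # first row vector (pruned weights)
col      = [1, 2, 3, 4]   # first column vector of the input data

# The compressed product equals the full product: 5*1 + 7*4 = 33.
assert dot_compressed(pruned, nz_index, col) == \
       sum(w * x for w, x in zip(full_row, col))
```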
  • the method further includes: the network device calculates a first move number based on a non-zero data index, where the non-zero data index is used to represent the distribution, in the weight matrix before pruning, of the data in the first matrix, the first move number is used to represent the number of steps by which the selected data is moved, and the selected data is the data in the first column vector corresponding to the non-zero data index; the network device compressing the first column vector according to the non-zero data index to obtain the second column vector includes: the network device compresses the first column vector according to the first move number to obtain the second column vector.
  • the non-zero data index represents the distribution of the data in the first matrix in the weight matrix before pruning. Since the first matrix includes the first row vector, the non-zero data index includes the distribution of the first row vector in the weight matrix before pruning.
  • the network device can obtain the selected data in the first column vector based on the non-zero data index; the selected data is obtained from the distribution of the first row vector in the weight matrix before pruning. The network device can also obtain, from that distribution, the number of steps by which the selected data must move, that is, the first move number, and then compress the first column vector according to the first move number to obtain the second column vector.
  • the network device calculates the first number of moving steps through the non-zero data index, and then compresses the first column vector according to the first number of moving steps to obtain the second column vector. In this way, the compression of the first column vector is achieved, which improves the computing efficiency of the network device.
  • the network device compressing the first column vector according to the first move number to obtain a second column vector includes: the network device selects the selected data from the first column vector according to the non-zero data index; the network device shifts the selected data according to the first move number to obtain the second column vector.
  • the network device can obtain the selected data in the first column vector based on the non-zero data index, and the selected data is obtained based on the distribution of the first row vector in the weight matrix before pruning.
  • the network device can also obtain the first move number based on the distribution of the first row vector in the weight matrix before pruning, and then shift the selected data according to the first move number to obtain the second column vector.
  • the method provides a specific way to obtain the second column vector, which improves the achievability of the solution.
  • the method further includes: the network device selects multiple groups of selected data from the first column vector according to the non-zero data index, each group including one or more selected data; the network device shifts the multiple groups of selected data according to the first move number to obtain multiple groups of selected vectors; and the network device compresses the multiple groups of selected vectors according to the sparsity ratio to obtain the second column vector.
  • the network device can obtain multiple sets of selected data in the first column vector based on the non-zero data index, where each set of selected data is based on the first row Obtained from the distribution of vectors in the weight matrix before pruning.
  • the network device can also obtain the first move number corresponding to each group of selected data based on the distribution of the first row vector in the weight matrix before pruning, and then shift each group of selected data according to its first move number to obtain multiple groups of selected vectors.
  • the network device compresses the multiple groups of selected vectors according to the sparsity ratio to obtain the second column vector.
  • In this way, the product of the first row vector and the second column vector is completed within a single processing component, making full use of the computing power of the processing component, avoiding wasted computing power, and improving calculation efficiency.
  • the sparsity ratio is determined by the computing power of the processing component in the network device and/or the weight matrix before pruning.
  • the value of the sparse ratio can be determined based on the computing power of the processing component, the complexity of the weight matrix before pruning, the training accuracy required for the weight matrix before pruning, and other factors.
  • This possible implementation provides multiple factors that need to be considered when determining the sparse ratio.
  • different values can be set based on different factors, which increases the flexibility of the solution.
  • a second aspect of the present application provides a network device, which includes at least one processor, a memory, and a communication interface.
  • the processor is coupled to the memory and the communication interface.
  • the memory is used to store instructions
  • the processor is used to execute the instructions
  • the communication interface is used to communicate with other network devices under the control of the processor.
  • the instruction causes the network device to execute the method in the above first aspect or any possible implementation of the first aspect.
  • a third aspect of the present application provides a computer-readable storage medium that stores a program that causes a terminal device to execute the method in the above-mentioned first aspect or any possible implementation of the first aspect.
  • a fourth aspect of the present application provides a computer program product that stores one or more computer-executable instructions.
  • when the instructions are executed, the processor performs the method in the above first aspect or any possible implementation of the first aspect.
  • a fifth aspect of the present application provides a chip.
  • the chip includes a processor and a communication interface.
  • the processor is coupled to the communication interface.
  • the processor is configured to read instructions to execute the method in the above first aspect or any possible implementation of the first aspect.
  • a sixth aspect of this application provides a network system.
  • the network system includes a network device, and the network device can execute the method described in the first aspect or any possible implementation of the first aspect.
  • Figure 1 is a schematic structural diagram of a data computing system provided by this application.
  • Figure 2 is a schematic flow chart of a data calculation method provided by this application.
  • Figure 3 is a schematic diagram of an embodiment of a data calculation method provided by this application.
  • Figure 4 is a schematic diagram of another embodiment of a data calculation method provided by this application.
  • Figure 5 is a schematic diagram of another embodiment of a data calculation method provided by this application.
  • Figure 6 is a schematic diagram of another embodiment of a data calculation method provided by this application.
  • Figure 7 is a schematic diagram of another embodiment of a data calculation method provided by this application.
  • Figure 8 is a schematic diagram of another embodiment of a data calculation method provided by this application.
  • Figure 9 is a schematic diagram of another embodiment of a data calculation method provided by this application.
  • Figure 10 is a schematic structural diagram of a network device provided by this application.
  • Figure 11 is a schematic structural diagram of another network device provided by this application.
  • At least one of a, b, or c can mean: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c can each be single or multiple.
  • the mainstream sparse acceleration technology is 2:4 fine-grained structured matrix multiplication acceleration technology.
  • this application provides a data calculation method, a data calculation system and a network device.
  • the network device obtains the first matrix and the second matrix, where the first matrix is the pruned weight matrix in the target model, and the second matrix is the data input to the target model.
  • Network devices receive configuration instructions that set the sparsity ratio.
  • the network device compresses the second matrix according to the sparse ratio to obtain a third matrix, and then calculates the product of the first matrix and the third matrix.
  • Network devices can flexibly set the value of the sparse ratio according to configuration instructions. Therefore, the computing performance of the hardware can be more fully utilized for different models, the applicable scenarios are wider, and the compatibility is stronger.
  • Figure 1 is a schematic structural diagram of a data computing system provided by this application.
  • the data computing system includes a computing engine.
  • the computing engine obtains a first matrix and a second matrix from a register, where the first matrix is the pruned weight matrix in the target model.
  • Assume the scale of the first matrix is M×K, that is, the first matrix has M rows of data and K columns of data.
  • the second matrix is the data input to the target model. It is assumed that the scale of the second matrix is WK×N, that is, the second matrix has WK rows of data and N columns of data, where W is the sparse ratio.
  • the calculation engine can flexibly set the value of the sparse ratio according to the configuration instruction, and further, can compress the second matrix according to the sparse ratio to obtain the third matrix.
  • the size of the third matrix is K×N, that is, the third matrix has K rows of data and N columns of data.
  • the size of the matrix obtained after the calculation engine calculates the product of the first matrix and the third matrix is M×N, that is, the matrix has M rows of data and N columns of data.
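The dimension bookkeeping above can be checked with a short sketch. This is an illustration only: the compression step here is a stand-in (it keeps one row out of every W, whereas the real engine keeps the rows selected by the non-zero data index), and W is the compression factor implied by the sparsity ratio (e.g. W = 2 for a 4:8 ratio).

```python
# First matrix: M×K; second matrix: WK×N; third matrix: K×N; product: M×N.
M, K, N, W = 2, 4, 3, 2

first  = [[1.0] * K for _ in range(M)]       # M×K pruned weight matrix
second = [[1.0] * N for _ in range(W * K)]   # WK×N input data

# Stand-in compression: keep every W-th row (placeholder for index-driven
# selection), yielding the K×N third matrix.
third = second[::W]

# Standard matrix product: (M×K) · (K×N) → M×N.
product = [[sum(first[i][k] * third[k][j] for k in range(K))
            for j in range(N)] for i in range(M)]

assert len(third) == K and len(third[0]) == N
assert len(product) == M and len(product[0]) == N
```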
  • the data computing system includes a computing engine.
  • the computing engine can flexibly set the value of the sparsity ratio according to the configuration instructions. Therefore, the computing performance of the hardware can be more fully utilized for different models, the applicable scenarios are wider, and the compatibility is stronger.
  • Figure 2 is a schematic diagram of the calculation flow of a data calculation method provided by this application.
  • the network device obtains the first matrix and the second matrix.
  • the first matrix is used to represent the pruned weight matrix in the target model
  • the second matrix is used to represent the data input to the target model
  • the size of the first matrix is M×K, that is, the first matrix has M rows of data and K columns of data.
  • the first matrix may be obtained by pruning the weight matrix before pruning, and the scale of the weight matrix before pruning is WM×K.
  • the second matrix is the data input to the target model. It is assumed that the scale of the second matrix is WK×N, that is, the second matrix has WK rows of data and N columns of data, where W is the sparse ratio.
  • the network device receives the configuration instruction.
  • the configuration instruction is used to set the sparse ratio, and the sparse ratio can be changed.
  • There are many ways for network devices to configure the sparsity ratio according to configuration instructions, which are explained in detail below.
  • Method 1: include the sparsity ratio value in the configuration instruction.
  • the configuration instruction may include the value of the sparse ratio, and the sparse ratio may be configured by operation and maintenance personnel according to requirements.
  • the computing power of the processing component can be considered when configuring the sparse ratio. Assuming that the processing component PE can complete 8×8 data calculations, the sparse ratio can be chosen from solutions such as 8:16, 4:16, 4:8, and 2:8, based on the available computing power (without exceeding the computing power of the processing component), the complexity of the weight matrix, and the training accuracy requirements of the model.
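A configuration check along the lines described above could look as follows. This is a hypothetical helper, not part of the patent: the function name, the `pe_width=8` limit (taken from the 8×8 PE assumption in the text), and the acceptance rule are all illustrative.

```python
def ratio_supported(keep, group, pe_width=8):
    """Accept an N:M sparsity ratio (keep `keep` of every `group` elements)
    only if the kept portion fits within the PE's assumed width."""
    return 0 < keep <= min(group, pe_width)

# The solutions listed in the text all fit an 8-wide PE.
assert ratio_supported(8, 16)
assert ratio_supported(4, 16)
assert ratio_supported(4, 8)
assert ratio_supported(2, 8)
# A ratio keeping more data than the PE can compute is rejected.
assert not ratio_supported(16, 16, pe_width=8)
```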
  • the computing power of the processing component supports it, you can also consider the training accuracy required for the unpruned weight matrix when configuring the sparse ratio value, and flexibly adjust the sparse ratio value based on the training accuracy.
  • the computing power of the processing component supports it, the complexity of the unpruned weight matrix can also be considered when configuring the value of the sparse ratio, and the value of the sparse ratio can be flexibly adjusted according to the complexity of the weight matrix.
  • operation and maintenance personnel can also configure the sparsity ratio value based on other factors, which are not limited here.
  • Method 2: generate the sparse ratio according to the configuration instruction.
  • the configuration instruction may not include the sparse ratio value, and the sparse ratio value may be generated by the network device after receiving the configuration instruction.
  • the network device can determine the sparsity ratio value based on the data amount of the unpruned weight matrix, and generate different sparse ratio values for the unpruned weight matrix with different data amounts. For example, when the network device confirms that the data amount of the unpruned weight matrix is greater than the first threshold A, the value of the sparse ratio can be set to ratio A.
  • otherwise, the value of the sparse ratio can be set to ratio B, where the value of ratio A is smaller than the value of ratio B.
  • In this way, the sparsity ratio is small when the data amount of the unpruned weight matrix is large, which can improve the training efficiency of the weight matrix.
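A sketch of this "Method 2" rule follows. The threshold and the two ratio values are hypothetical placeholders for the text's "first threshold A", "ratio A", and "ratio B" (ratio A being the smaller value, used for larger matrices).

```python
def choose_sparsity_ratio(data_amount, threshold=1_000_000):
    """Derive a sparsity ratio from the data amount of the unpruned
    weight matrix, per the thresholding rule described in the text."""
    ratio_a = (2, 8)   # smaller value: keep 2 of every 8 (large matrices)
    ratio_b = (4, 8)   # larger value: keep 4 of every 8
    return ratio_a if data_amount > threshold else ratio_b

assert choose_sparsity_ratio(2_000_000) == (2, 8)
assert choose_sparsity_ratio(10_000) == (4, 8)
```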
  • the network device can determine the sparsity ratio value based on the training accuracy required for the unpruned weight matrix. For example, an unpruned weight matrix can be accompanied by an identifier that describes its required training accuracy.
  • the network device can then generate a corresponding sparsity ratio value based on the identifier, so that different sparsity ratio values are generated for different training accuracies.
  • the network device can also generate the sparsity ratio value based on other factors, which are not limited here.
  • the network device compresses the second matrix according to the sparsity ratio to obtain the third matrix.
  • the calculation engine can compress the second matrix according to the sparsity ratio to obtain the third matrix. Assuming the scale of the second matrix is WK×N, that is, the second matrix has WK rows of data and N columns of data, where W is the sparse ratio, then the scale of the third matrix is K×N, that is, the third matrix has K rows of data and N columns of data.
  • the network device calculates the product of the first matrix and the third matrix.
  • the network device may use the processing component PE to calculate the product of the first matrix and the third matrix to obtain a convolution result.
  • the network device obtains the first matrix and the second matrix.
  • the first matrix is the pruned weight matrix in the target model.
  • the second matrix is the data input to the target model.
  • the network device can compress the second matrix according to the sparse ratio to obtain a third matrix, and the network device calculates the product of the first matrix and the third matrix.
  • the network device compresses the second matrix according to the sparse ratio to obtain a third matrix, and then calculates the product of the first matrix and the third matrix.
  • Network devices can flexibly set the sparsity ratio value according to configuration instructions. Therefore, the computing performance of the hardware can be more fully utilized for different models. The applicable scenarios are wider and the compatibility is stronger.
  • the network device compresses the second matrix according to the sparse ratio to obtain the third matrix.
  • the network device compresses the second matrix according to the sparse ratio to obtain the third matrix.
  • the process of the network device compressing the second matrix according to the sparse ratio to obtain the third matrix includes the process of the network device compressing the first column vector according to the sparse ratio to obtain the second column vector, wherein the first column vector belongs to the second matrix and the second column vector belongs to the third matrix.
  • the process of the network device calculating the product of the first matrix and the third matrix will also include the process of the network device calculating the product of the first row vector and the second column vector, where the first row vector belongs to the first matrix.
  • Figure 3 is a schematic diagram of another calculation flow of a data calculation method provided by this application.
  • the network device can dynamically configure the sparse ratio according to the computing power of the computing component and the complexity of the unpruned weight matrix, ensuring that a fine-grained structured sparse acceleration process is achieved when the model adopts an N:M sparse ratio configuration.
  • Input the bitmap (non-zero data index) data of matrix A (the first matrix) into the Prefix Sum module, calculate the number of local moving steps (the first move number) of the selected data in the column vector of matrix B (the second matrix) for each row of PEs, then input the calculation result and the sparse ratio into the non-zero data compressor, and complete the data compression of the column vector of matrix B on a PE through local non-zero data compression and global data compression.
  • the sparse matrix accelerator can parse and compute the bitmap of matrix A by adding Prefix Sum and None-Zero Shifter units to each PE module, and perform non-zero data movement on the column vectors of matrix B based on the results, thereby accelerating matrix operations.
  • the following describes in detail the process by which the network device compresses the first column vector according to the sparsity ratio to obtain the second column vector.
  • the network device can calculate the first move number based on the non-zero data index.
  • the non-zero data index is used to represent the distribution of the data in the first matrix in the weight matrix before pruning.
  • the first move number is used to represent the number of moving steps of the selected data.
  • the selected data is used to represent the data corresponding to the non-zero data index in the first column vector.
  • the network device can introduce the Prefix Sum and None-Zero Shifter modules in the matrix calculation engine to ensure that the fine-grained structured sparse acceleration process can be achieved when the model adopts the configuration of N:M sparse ratio.
  • the specific operation process of the Prefix-Sum module is shown in Figure 3: invert each bit of data in the bitmap (non-zero data index) and accumulate the data before each bitmap position, so that the number of zero-valued elements preceding each element is obtained as the distance that element will subsequently move (the first move number).
  • Figure 4 is a schematic diagram of another calculation flow of a calculation method provided by this application.
  • the data included in the non-zero data index in Figure 4 are 1, 0, 0, 1, 1, 0, 1, 0 respectively.
  • the inverted results are 0, 1, 1, 0, 0, 1, 0, 1.
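The Prefix-Sum step on this example can be reproduced directly. The sketch below (an illustration, not the hardware implementation) inverts the bitmap and takes the running sum of the bits before each position; at each non-zero position, the result is the count of earlier zeros, i.e. how many steps left that element must shift.

```python
def move_numbers(bitmap):
    """Exclusive prefix sum over the inverted bitmap: moves[i] is the
    number of zero-valued elements strictly before position i."""
    inverted = [1 - b for b in bitmap]
    moves, acc = [], 0
    for bit in inverted:
        moves.append(acc)
        acc += bit
    return moves

bitmap = [1, 0, 0, 1, 1, 0, 1, 0]   # the non-zero data index from Figure 4
assert [1 - b for b in bitmap] == [0, 1, 1, 0, 0, 1, 0, 1]

moves = move_numbers(bitmap)
# Non-zero elements sit at positions 0, 3, 4, 6 and shift left by
# 0, 2, 2, 3 steps, landing in positions 0, 1, 2, 3 of the queue.
assert [moves[i] for i, b in enumerate(bitmap) if b] == [0, 2, 2, 3]
```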
  • after the network device obtains the first move number, it can compress the first column vector into the second column vector.
  • the specific compression process will be described in detail in the following example.
  • the non-zero data movement control signal required to be moved is obtained from Prefix Sum and sent to the Local None-Zero Shifter unit.
  • the network device first selects the selected data from the first column vector according to the non-zero data index. Then, the network device shifts the selected data according to the first move number to obtain the second column vector, that is, it moves each selected data unit to the appropriate position according to the first move number.
  • Figure 5 is a schematic diagram of another calculation flow of a data calculation method provided by this application.
  • the None-Zero Shifter unit at least includes the local None-Zero Shifter unit.
  • the None-Zero Shifter can complete the data compression of the column vector of matrix B on a PE based on the number of local non-zero moving steps (the first move number) calculated by Prefix Sum. As shown in Figure 5, if the sparse ratio is 4:8 and the non-zero data index indicates that the selected data are the 1s in the first, fourth, fifth, and seventh positions, then the selected data in every 8 elements are moved to the first 4 positions of the queue.
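The local compression for this 4:8 case can be sketched as follows. The queue values are hypothetical; what matters is that the entries at the indexed positions end up packed into the first 4 slots.

```python
def local_shift(queue, nz_index, keep):
    """Local None-Zero Shifter sketch: pack the entries selected by the
    non-zero data index into the first `keep` positions, zero-padding
    any unused slots."""
    selected = [x for x, b in zip(queue, nz_index) if b]
    return selected + [0] * (keep - len(selected))

queue    = [11, 12, 13, 14, 15, 16, 17, 18]   # one 8-element column segment
nz_index = [1, 0, 0, 1, 1, 0, 1, 0]           # 1s in positions 1, 4, 5, 7 (1-based)

# With a 4:8 sparse ratio, the four selected entries fill the queue head.
assert local_shift(queue, nz_index, keep=4) == [11, 14, 15, 17]
```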
  • the non-zero data movement control signal required to be moved is obtained from Prefix Sum and sent to the Local None-Zero Shifter unit.
  • the network device selects multiple groups of selected data from the first column vector according to the non-zero data index, where each group The selected data includes one or more selected data.
  • the network device shifts the multiple sets of selected data according to the first move number to obtain multiple sets of selected vectors.
  • the network device compresses the multiple sets of selected vectors according to the sparsity ratio to obtain the second column vector.
  • Figure 8 is a schematic diagram of another calculation flow of a data calculation method provided by this application.
  • the None-Zero Shifter unit can be divided into two sub-modules, namely local None-Zero Shifter and global None-Zero Shifter.
  • the None-Zero Shifter, based on the number of local non-zero moving steps (the first move number) calculated by Prefix Sum and the sparse ratio, completes the compression of the column-vector data of matrix B on a PE through the two modules of local non-zero data compression and global data compression.
  • Local None-Zero Shifter completes local compression.
  • the specific implementation method is similar to the implementation method mentioned in the above example.
  • the Global None-Zero Shifter uses the result of local compression as input to perform global data compression, and sends the compressed data to the PE.
  • Figure 9 is a schematic diagram of another calculation flow of a data calculation method provided by this application.
  • the global data compression module has a specific implementation method for compressing the second matrix, which will be explained in detail in the following example.
  • the non-zero data movement control signal is obtained from Prefix Sum and sent to the Local None-Zero Shifter unit, which moves the non-zero data of each data unit to the appropriate location. If the sparse ratio is 4:8, the non-zero data in every 8 elements are moved to the first 4 positions of the queue; if the sparse ratio is 2:8, the non-zero data in every 8 elements are moved to the first 2 positions of the queue; if the sparse ratio is 1:8, the non-zero data in every 8 elements are moved to the first position of the queue.
  • the non-zero data in multiple queues are then assembled to form the 16 data inputs for subsequent calculations. If the sparse ratio is 4:8, the first 4 non-zero data in 4 queues are moved to form a 16-element non-zero data vector; if the sparse ratio is 2:8, the first 2 non-zero data in 8 queues are moved to form a 16-element non-zero data vector; if the sparse ratio is 1:8, the first non-zero data in 16 queues are moved to form a 16-element non-zero data vector.
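The global assembly step above can be sketched as follows, under the assumption of a 16-wide PE input as stated in the text; the function and its queue contents are illustrative, not the patent's implementation.

```python
def global_assemble(queues, keep, width=16):
    """Global None-Zero Shifter sketch: concatenate the first `keep`
    entries of width//keep locally-compressed queues into one
    `width`-element PE input vector (4:8 -> 4 queues, 2:8 -> 8 queues,
    1:8 -> 16 queues)."""
    needed = width // keep
    vec = []
    for q in queues[:needed]:
        vec.extend(q[:keep])
    return vec

# 4:8 case: the first 4 entries of 4 locally-compressed queues.
queues = [[1, 2, 3, 4, 0, 0, 0, 0],
          [5, 6, 7, 8, 0, 0, 0, 0],
          [9, 10, 11, 12, 0, 0, 0, 0],
          [13, 14, 15, 16, 0, 0, 0, 0]]
assert global_assemble(queues, keep=4) == list(range(1, 17))
```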
  • the network device obtains a first matrix and a second matrix, where the first matrix is the pruned weight matrix in the target model, and the second matrix is the data input to the target model.
  • Network devices receive configuration instructions that set the sparsity ratio.
  • the network device compresses the second matrix according to the sparse ratio to obtain a third matrix, and then calculates the product of the first matrix and the third matrix.
  • Network devices can flexibly set the sparsity ratio value according to configuration instructions. Therefore, the computing performance of the hardware can be more fully utilized for different models. The applicable scenarios are wider and the compatibility is stronger.
  • the above examples provide different implementations of a data calculation method.
  • the following provides a network device 20, as shown in Figure 10.
  • the network device 20 is used to execute the data calculation method involved in the above examples.
  • For the execution steps and the corresponding beneficial effects, please refer to the corresponding examples above; they are not repeated here. The network device 20 includes:
  • the processing unit 201 is configured to obtain a first matrix and a second matrix, the first matrix is used to represent the pruned weight matrix in the target model, and the second matrix is used to represent the data input to the target model;
  • the receiving unit 202 is used to receive configuration instructions, where the configuration instructions are used to set the sparse ratio;
  • the processing unit 201 is used for:
  • calculate the product of the first matrix and the third matrix.
  • the processing unit 201 is used for:
  • the processing unit 201 is also configured to calculate the first move number based on the non-zero data index, where the non-zero data index is used to represent the distribution of the data in the first matrix in the weight matrix before pruning, the first move number is used to represent the number of moving steps of the selected data, and the selected data is used to represent the data corresponding to the non-zero data index in the first column vector;
  • the processing unit 201 is configured to compress the first column vector according to the first move number to obtain a second column vector.
  • the processing unit 201 is used for:
  • the selected data is shifted according to the first move number to obtain the second column vector.
  • the processing unit 201 is used for:
  • the second column vector is obtained by compressing the plurality of selected vectors according to the sparsity ratio.
  • the sparsity ratio is determined by the computing power of the processing element in the network device and/or the weight matrix before pruning.
  • the above examples provide different implementations of the network device 20.
  • the following provides a network device 30, as shown in Figure 11.
  • The network device 30 is used to perform the data calculation method in the above examples.
  • For the execution steps and the specific beneficial effects, please refer to the corresponding examples above; they are not repeated here.
  • the network device 30 includes: a processor 302 , a communication interface 303 , and a memory 301 .
  • bus 304 may be included.
  • the communication interface 303, the processor 302 and the memory 301 can be connected to each other through the bus 304;
  • the bus 304 can be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus or an extended industry standard architecture (EISA) bus etc.
  • the bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one thick line is used in Figure 11, but it does not mean that there is only one bus or one type of bus.
  • the network device 30 can implement the functions of any network device in the example shown in FIG. 11 .
  • the processor 302 and the communication interface 303 can perform corresponding operations of the network device in the above method examples.
  • the memory 301 may be a volatile memory, such as a random-access memory (RAM); or a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD).
  • the processor 302 is the control center of the network device 30, and may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the examples provided in this application, for example, one or more digital signal processors (DSP) or one or more field-programmable gate arrays (FPGA).
  • Communication interface 303 is used to communicate with other network devices.
  • the processor 302 can perform the operations performed by the network device in the example shown in FIG. 10, which will not be described again here.
  • the chip includes a processor and a communication interface.
  • the processor is coupled to the communication interface.
  • the processor is used to read instructions and perform the operations performed by the network device in the embodiments described in Figures 1 to 11.
  • This application provides a network system, which includes the network device described in the embodiment shown in Figure 10.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device examples described above are only illustrative.
  • the division of the units is only a logical function division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of this example.
  • each functional unit in each example of this application can be integrated into one processing unit, or each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • the technical solution of this application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the examples of this application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Stored Programmes (AREA)
  • Complex Calculations (AREA)

Abstract

The present application provides a data computing method and a related device. In the present application, a network device obtains a first matrix and a second matrix, wherein the first matrix is a pruned weight matrix in a target model, and the second matrix is data input to the target model. The network device receives a configuration instruction, the configuration instruction being used for setting a sparsity ratio. The network device compresses the second matrix according to the sparsity ratio to obtain a third matrix, and then calculates the product of the first matrix and the third matrix. The network device can flexibly set the value of the sparsity ratio according to the configuration instruction; therefore, the computing performance of the hardware can be exploited more fully for different models, with a wide range of applicable scenarios and strong compatibility.
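As an illustrative aid (not part of the original application text), the flow described in the abstract can be sketched in plain Python. The function names, the 2:4 example values, and the list representation of the non-zero index are assumptions for illustration only; the patent itself targets hardware, not Python.

```python
# Sketch of the abstract's flow: the second matrix is compressed by keeping
# only the rows that correspond to the surviving (non-zero) positions of the
# pruned weight matrix, and the product is then computed on the compressed
# operands. All names here are illustrative assumptions.

def compress_columns(second_matrix, nonzero_index):
    """Keep only the rows of the input matrix selected by the non-zero
    index of the pruned weight matrix (yielding the 'third matrix')."""
    return [second_matrix[i] for i in nonzero_index]

def matmul(a, b):
    """Plain dense matrix product of the pruned weights and the
    compressed input."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[r][k] * b[k][c] for k in range(inner)) for c in range(cols)]
            for r in range(rows)]

# Pruned 1x2 weight row taken from an original 1x4 row [0, 5, 0, 7]:
first_matrix = [[5, 7]]                # non-zero values only
nonzero_index = [1, 3]                 # where they sat before pruning
second_matrix = [[1], [2], [3], [4]]   # 4x1 input (the "second matrix")

third_matrix = compress_columns(second_matrix, nonzero_index)
result = matmul(first_matrix, third_matrix)
# 5*2 + 7*4 = 38, identical to multiplying the unpruned row by the full input
```

The key point of the sketch is that the result equals the product of the unpruned row and the full input, because the zeroed weight positions contribute nothing.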

Description

A data calculation method and related device
This application claims priority to the Chinese patent application No. CN202210736691.6, filed with the China Patent Office on June 27, 2022 and entitled "A data calculation method and related device", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of computers, further relates to the application of artificial intelligence (AI) technology in the field of computer networks, and in particular relates to a data calculation method and related device.
Background
In the current AI field, there are many types of models, and the models are becoming increasingly complex. According to the lottery ticket hypothesis, a sparse model has the same or even better learning capability than a complex model. Moreover, training a complex model is expensive and time-consuming, while training a sparse model incurs less overhead and takes less time. How to sparsify a complex model while ensuring that the sparse model runs smoothly is the key to accelerating model training.
To sparsify a complex model, the weight matrix in the complex model is usually pruned, and operations are performed on the pruned weight matrix and the data. Currently, the mainstream sparse acceleration technology is the 2:4 fine-grained structured matrix multiplication acceleration technology. The weight matrix in the AI model is pruned with a 2:4 structure, that is, 2 elements are retained out of every 4 elements in the weight matrix, completing the sparsification of the weight matrix. This technology thereby accelerates AI model computation and saves the time required for model training.
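As an illustrative sketch (not part of the original application text), 2:4 structured pruning can be expressed in a few lines of Python; the magnitude-based selection rule shown here is one common choice and is an assumption, since the paragraph above does not prescribe how the 2 surviving elements are picked.

```python
# Illustrative 2:4 structured pruning: within every group of 4 weights,
# keep the 2 with the largest magnitude and zero the other 2, giving a
# 50% structured-sparse row. The magnitude criterion is an assumption.

def prune_2_of_4(row):
    pruned = list(row)
    for g in range(0, len(row), 4):
        group = row[g:g + 4]
        # indices of the two smallest-magnitude elements in this group
        drop = sorted(range(len(group)), key=lambda i: abs(group[i]))[:2]
        for i in drop:
            pruned[g + i] = 0
    return pruned

weights = [0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.1, 0.8]
print(prune_2_of_4(weights))   # exactly two survivors per group of four
```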
However, in traditional model acceleration technology, only a fixed 2:4 sparse structure can be used to sparsify the weight matrix, and the sparsity ratio of the weight matrix cannot be changed; the applicable scenarios are relatively limited and the compatibility is poor.
Summary
This application provides a data calculation method and related device. In this application, a network device can flexibly set the value of the sparsity ratio according to a configuration instruction; therefore, the computing performance of the hardware can be exploited more fully for different models, with a wide range of applicable scenarios and strong compatibility.
A first aspect of this application provides a data calculation method. In the method, a network device obtains a first matrix and a second matrix, where the first matrix represents a pruned weight matrix in a target model and the second matrix represents data input to the target model; the network device receives a configuration instruction, where the configuration instruction is used to set a sparsity ratio; the network device compresses the second matrix according to the sparsity ratio to obtain a third matrix; and the network device calculates the product of the first matrix and the third matrix.
In this application, the network device obtains a first matrix and a second matrix, where the first matrix is the pruned weight matrix in the target model and the second matrix is the data input to the target model. The network device receives a configuration instruction, and the configuration instruction is used to set the sparsity ratio. The network device compresses the second matrix according to the sparsity ratio to obtain a third matrix, and then calculates the product of the first matrix and the third matrix. The network device can flexibly set the value of the sparsity ratio according to the configuration instruction; therefore, the computing performance of the hardware can be exploited more fully for different models, with a wide range of applicable scenarios and strong compatibility.
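The configurability described in the first aspect can be illustrated with a small sketch (not from the application text; the n:m encoding of the ratio and all names are assumptions): the same compression routine serves a 2:4 ratio, a 1:4 ratio, or any other ratio set by the configuration instruction, instead of a hardwired 2:4 structure.

```python
# Sketch of a configurable sparsity ratio n:m: the "configuration
# instruction" is modelled as the (n, m) arguments, and the per-group
# non-zero index of the pruned weight matrix tells which n of every m
# input elements survive compression. All names are illustrative.

def compress_by_ratio(column, index, n, m):
    """Pick the n surviving positions out of every m elements, as
    recorded in the per-group non-zero index."""
    assert all(len(g) == n for g in index), "index must list n picks per group"
    out = []
    for g, picks in enumerate(index):
        base = g * m
        out.extend(column[base + p] for p in picks)
    return out

column = [10, 11, 12, 13, 20, 21, 22, 23]
# ratio 2:4 -> keep 2 of every 4 (positions taken from the index)
print(compress_by_ratio(column, [[0, 2], [1, 3]], 2, 4))
# ratio 1:4 -> same code, different configuration
print(compress_by_ratio(column, [[3], [0]], 1, 4))
```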
In a possible implementation of the first aspect, the network device compressing the second matrix according to the sparsity ratio to obtain the third matrix includes: the network device compresses a first column vector according to the sparsity ratio to obtain a second column vector, where the first column vector belongs to the second matrix and the second column vector belongs to the third matrix; and the network device calculating the product of the first matrix and the third matrix includes: the network device calculates the product of a first row vector and the second column vector, where the first row vector belongs to the first matrix.
In this possible implementation, the first column vector belongs to the second matrix and the second column vector belongs to the third matrix. It can be understood that the process in which the network device compresses the second matrix according to the sparsity ratio to obtain the third matrix includes the process in which the network device compresses the first column vector according to the sparsity ratio to obtain the second column vector. In addition, the first row vector belongs to the first matrix, so the process in which the network device calculates the product of the first matrix and the third matrix also includes the process in which the network device calculates the product of the first row vector and the second column vector. This possible implementation sets out a specific calculation process for the product of the first matrix and the third matrix, which improves the realizability of the solution.
In a possible implementation of the first aspect, the method further includes: the network device calculates a first shift number based on a non-zero data index, where the non-zero data index represents the distribution of the data of the first matrix in the weight matrix before pruning, the first shift number represents the number of shift steps of the selected data, and the selected data is the data in the first column vector corresponding to the non-zero data index; and the network device compressing the first column vector according to the non-zero data index to obtain the second column vector includes: the network device compresses the first column vector according to the first shift number to obtain the second column vector.
In this possible implementation, the non-zero data index represents the distribution of the data of the first matrix in the weight matrix before pruning. Since the first matrix includes the first row vector, the non-zero data index includes the distribution of the first row vector in the weight matrix before pruning. The network device can obtain the selected data in the first column vector based on the non-zero data index; the selected data is obtained based on the distribution of the first row vector in the weight matrix before pruning. In addition, the number of shift steps of the selected data, that is, the first shift number, can also be obtained based on that distribution, and the first column vector is then compressed according to the first shift number to obtain the second column vector. In this possible implementation, the network device calculates the first shift number through the non-zero data index, and then compresses the first column vector according to the first shift number to obtain the second column vector. Implementing the compression of the first column vector in this way improves the computing efficiency of the network device.
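One concrete way to derive a shift number from the non-zero data index is sketched below; this is an assumption for illustration only (the application does not fix a formula), under which each selected element shifts left by the count of pruned positions in front of it, so its shift number is its original position minus its rank among the survivors.

```python
# Sketch (illustrative assumption, not the patent's exact hardware logic):
# derive per-element shift numbers from the non-zero index, then compress
# a column vector by shifting each selected element left by its number.

def shift_numbers(nonzero_index):
    """nonzero_index: sorted original positions of the kept weights."""
    return [pos - rank for rank, pos in enumerate(nonzero_index)]

def compress_with_shifts(column, nonzero_index):
    shifts = shift_numbers(nonzero_index)
    out = [0] * len(nonzero_index)
    for pos, s in zip(nonzero_index, shifts):
        out[pos - s] = column[pos]   # shifting left by s lands at its rank
    return out

index = [1, 3, 4, 6]                  # survivors of an 8-wide group
print(shift_numbers(index))           # shift counts per survivor
print(compress_with_shifts([0, 9, 0, 8, 7, 0, 6, 0], index))
```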
In a possible implementation of the first aspect, the network device compressing the first column vector according to the first shift number to obtain the second column vector includes: the network device selects the selected data from the first column vector according to the non-zero data index; and the network device shifts the selected data according to the first shift number to obtain the second column vector.
In this possible implementation, the network device can obtain the selected data in the first column vector based on the non-zero data index; the selected data is obtained based on the distribution of the first row vector in the weight matrix before pruning. The network device can also obtain the first shift number based on that distribution, and then shift the selected data according to the first shift number to obtain the second column vector. This possible implementation provides a specific way to obtain the second column vector, which improves the realizability of the solution.
In a possible implementation of the first aspect, the method further includes: the network device selects multiple groups of selected data from the first column vector according to the non-zero data index, where each group of selected data includes one or more pieces of selected data; the network device shifts the multiple groups of selected data according to the first shift number to obtain multiple groups of selected vectors; and the network device compresses the multiple groups of selected vectors according to the sparsity ratio to obtain the second column vector.
In this possible implementation, if the amount of data in the first matrix is large, the network device can obtain multiple groups of selected data in the first column vector based on the non-zero data index, where each group of selected data is obtained based on the distribution of the first row vector in the weight matrix before pruning. The network device can also obtain the first shift number corresponding to each group of selected data based on that distribution, shift each group of selected data according to its first shift number to obtain multiple groups of selected vectors, and compress the multiple groups of selected vectors according to the sparsity ratio to obtain the second column vector. In this case, based on the computing power of the processing element (PE), the network device can compress multiple groups of selected vectors into the second column vector, so that the product of the first row vector and the second column vector is completed in a single processing element. This makes full use of the computing power of the processing element, avoids wasting computing power, and improves calculation efficiency.
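The multi-group packing described above can be sketched as follows (an illustrative assumption: the PE width, group layout, and names are not taken from the application text). When each compressed group yields only 2 survivors but the PE can multiply vectors of width 8, four groups are compressed and packed into a single second column vector, so one PE pass replaces four quarter-utilised passes.

```python
# Sketch of packing several compressed groups into one PE-wide vector.
# PE_WIDTH and the 2:4 group layout are illustrative assumptions.

PE_WIDTH = 8

def pack_groups(column, group_indices, group_size):
    """Compress each group of the column by its non-zero index and
    concatenate the survivors into one vector that fills the PE."""
    packed = []
    for g, picks in enumerate(group_indices):
        base = g * group_size
        packed.extend(column[base + p] for p in picks)
    assert len(packed) <= PE_WIDTH, "packed vector must fit the PE"
    return packed

column = list(range(16))                    # 4 groups of 4 input values
indices = [[0, 3], [1, 2], [0, 1], [2, 3]]  # 2 survivors per group (2:4)
print(pack_groups(column, indices, 4))      # one full-width PE operand
```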
In a possible implementation of the first aspect, the sparsity ratio is determined by the computing power of the processing element in the network device and/or the weight matrix before pruning.
In this possible implementation, when configuring the sparsity ratio, the value of the sparsity ratio can be determined based on the computing power of the processing element, the complexity of the weight matrix before pruning, the precision required when training the weight matrix before pruning, and other factors. This possible implementation provides multiple factors to be considered when determining the sparsity ratio, and the network device can set different values based on different factors, which increases the flexibility of the solution.
A second aspect of this application provides a network device. The network device includes at least one processor, a memory, and a communication interface. The processor is coupled to the memory and the communication interface. The memory is used to store instructions, the processor is used to execute the instructions, and the communication interface is used to communicate with other network devices under the control of the processor. When executed by the processor, the instructions cause the network device to perform the method in the above first aspect or any possible implementation of the first aspect.
A third aspect of this application provides a computer-readable storage medium that stores a program, and the program causes a terminal device to perform the method in the above first aspect or any possible implementation of the first aspect.
A fourth aspect of this application provides a computer program product storing one or more computer-executable instructions. When the computer-executable instructions are executed by a processor, the processor performs the method in the above first aspect or any possible implementation of the first aspect.
A fifth aspect of this application provides a chip. The chip includes a processor and a communication interface, the processor is coupled to the communication interface, and the processor is configured to read instructions to perform the method in the above first aspect or any possible implementation of the first aspect.
A sixth aspect of this application provides a network system. The network system includes a network device, and the network device can perform the method described in the above first aspect or any possible implementation of the first aspect.
Description of the Drawings
Figure 1 is a schematic structural diagram of a data computing system provided by this application;
Figure 2 is a schematic flow chart of a data calculation method provided by this application;
Figure 3 is a schematic diagram of an embodiment of a data calculation method provided by this application;
Figure 4 is a schematic diagram of another embodiment of a data calculation method provided by this application;
Figure 5 is a schematic diagram of another embodiment of a data calculation method provided by this application;
Figure 6 is a schematic diagram of another embodiment of a data calculation method provided by this application;
Figure 7 is a schematic diagram of another embodiment of a data calculation method provided by this application;
Figure 8 is a schematic diagram of another embodiment of a data calculation method provided by this application;
Figure 9 is a schematic diagram of another embodiment of a data calculation method provided by this application;
Figure 10 is a schematic structural diagram of a network device provided by this application;
Figure 11 is a schematic structural diagram of another network device provided by this application.
Detailed Description
The examples provided in this application are described below with reference to the accompanying drawings. Obviously, the described examples are only some of the examples of this application, not all of them. Persons of ordinary skill in the art will appreciate that, with the development of technology and the emergence of new scenarios, the technical solutions provided in this application are also applicable to similar technical problems.
The terms "first", "second", and the like in the specification, the claims, and the above drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable under appropriate circumstances, so that the examples described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to those steps or units that are explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to the process, method, product, or device.
"And/or" in this application merely describes an association relationship between associated objects, indicating that three relationships may exist. For example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone, where A and B may be singular or plural. In addition, in the description of this application, unless otherwise specified, "multiple" means two or more than two. "At least one of the following" or a similar expression refers to any combination of these items, including any combination of a single item or multiple items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may be single or multiple.
In the current AI field, there are many types of models, and the models are becoming increasingly complex. According to the lottery ticket hypothesis, a sparse model has the same or even better learning capability than a complex model. Moreover, training a complex model is expensive and time-consuming, while training a sparse model incurs less overhead and takes less time. How to sparsify a complex model while ensuring that the sparse model runs smoothly is the key to accelerating model training.
To sparsify a complex model, the weight matrix in the complex model is usually pruned, and operations are performed on the pruned weight matrix and the data. Currently, the mainstream sparse acceleration technology is the 2:4 fine-grained structured matrix multiplication acceleration technology. The weight matrix in the AI model is pruned with a 2:4 structure, that is, 2 elements are retained out of every 4 elements in the weight matrix, completing the sparsification of the weight matrix. This technology thereby accelerates AI model computation and saves the time required for model training.
However, in traditional model acceleration technology, only a fixed 2:4 sparse structure can be used to sparsify the weight matrix, and the sparsity ratio of the weight matrix cannot be changed; the applicable scenarios are relatively limited and the compatibility is poor.
To solve the problems existing in the above solution, this application provides a data calculation method, a data computing system, and a network device. In this application, the network device obtains a first matrix and a second matrix, where the first matrix is the pruned weight matrix in the target model and the second matrix is the data input to the target model. The network device receives a configuration instruction, and the configuration instruction is used to set the sparsity ratio. The network device compresses the second matrix according to the sparsity ratio to obtain a third matrix, and then calculates the product of the first matrix and the third matrix. The network device can flexibly set the value of the sparsity ratio according to the configuration instruction; therefore, the computing performance of the hardware can be exploited more fully for different models, with a wide range of applicable scenarios and strong compatibility.
The following first introduces the structure of a data computing system provided by this application.
Figure 1 is a schematic structural diagram of a data computing system provided by this application.
In this application, the data computing system includes a computing engine. The computing engine obtains a first matrix and a second matrix from registers, where the first matrix is the pruned weight matrix in the target model. As shown in Figure 1, assume that the size of the first matrix is M×K, that is, the first matrix has M rows and K columns of data. The second matrix is the data input to the target model; assume that the size of the second matrix is WK×N, that is, the second matrix has WK rows and N columns of data, where W is the sparsity ratio. The computing engine can flexibly set the value of the sparsity ratio according to the configuration instruction, and can then compress the second matrix according to the sparsity ratio to obtain a third matrix. The size of the third matrix is K×N, that is, the third matrix has K rows and N columns of data. The size of the matrix obtained after the computing engine calculates the product of the first matrix and the third matrix is M×N, that is, the matrix has M rows and N columns of data.
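The dimension bookkeeping of the Figure 1 walkthrough can be checked with a few lines (an illustrative sketch; the concrete values M = 3, K = 4, N = 5 and writing the 2:4 structure as W = 2 are assumptions, since W here is the ratio of original rows to kept rows).

```python
# Dimension check for the Figure 1 walkthrough (illustrative values):
# first matrix M x K, second matrix (W*K) x N, compressed third matrix
# K x N, and the final product M x N.

M, K, N, W = 3, 4, 5, 2            # W = 2 corresponds to a 2:4 structure

first_shape = (M, K)
second_shape = (W * K, N)
third_shape = (second_shape[0] // W, second_shape[1])   # compression by W
product_shape = (first_shape[0], third_shape[1])

# inner dimensions must agree for the product to be defined
assert first_shape[1] == third_shape[0]
print(second_shape, third_shape, product_shape)
```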
In this application, the data computing system includes a computing engine that can flexibly set the value of the sparsity ratio according to configuration instructions. Therefore, the computing performance of the hardware can be exploited more fully for different models; the method applies to a wide range of scenarios and has strong compatibility.
Based on the data computing system described in Figure 1, the data calculation method provided by this application is introduced below.
Figure 2 is a schematic flowchart of a data calculation method provided by this application.
101. The network device obtains the first matrix and the second matrix.
In this application, the first matrix represents the pruned weight matrix of the target model, and the second matrix represents the data input to the target model.
For example, as shown in Figure 1, assume the size of the first matrix is M×K, that is, the first matrix has M rows and K columns of data. The first matrix may be obtained by pruning a weight matrix whose size before pruning is WM×K. The second matrix is the data input to the target model; assume its size is WK×N, that is, the second matrix has WK rows and N columns of data, where W is the sparsity ratio factor.
102. The network device receives a configuration instruction.
In this application, the configuration instruction is used to set the sparsity ratio, and the sparsity ratio can be changed. The network device can configure the sparsity ratio according to the configuration instruction in several ways, which are described in detail below.
Method 1: The configuration instruction includes the value of the sparsity ratio.
In this application, the configuration instruction may include the value of the sparsity ratio, which can be configured by operation and maintenance personnel as required. Optionally, the computing power of the processing element can be considered when configuring the sparsity ratio. Assuming the processing element (PE) can complete 8×8 data calculations, the sparsity ratio can be chosen, without exceeding the available computing power of the processing element, from options such as 8:16, 4:16, 4:8, or 2:8 according to the complexity of the weight matrix, the training-accuracy requirements of the model, and so on. Optionally, within the limits of the processing element's computing power, the training accuracy required by the unpruned weight matrix can also be considered when configuring the value of the sparsity ratio, and the value can be flexibly adjusted according to that accuracy. Optionally, within the limits of the processing element's computing power, the complexity of the unpruned weight matrix can also be considered, and the value of the sparsity ratio can be flexibly adjusted according to that complexity. Optionally, operation and maintenance personnel can also configure the value of the sparsity ratio according to other factors, which are not limited here.
Method 2: The sparsity ratio is generated according to the configuration instruction.
In this application, the configuration instruction may instead omit the value of the sparsity ratio; in that case, the value is generated by the network device after it receives the configuration instruction. Optionally, after receiving the configuration instruction, the network device can determine the value of the sparsity ratio according to the data volume of the unpruned weight matrix, generating different sparsity-ratio values for unpruned weight matrices with different data volumes. For example, when the network device determines that the data volume of the unpruned weight matrix is greater than a first threshold A, it can set the sparsity ratio to ratio A; when the data volume is less than or equal to the first threshold A, it can set the sparsity ratio to ratio B, where the value of ratio A is smaller than the value of ratio B. With this setting, the sparsity ratio is small when the unpruned weight matrix contains a large amount of data, which can improve the training efficiency of the weight matrix. Optionally, after receiving the configuration instruction, the network device can determine the value of the sparsity ratio according to the training accuracy required by the unpruned weight matrix. For example, the unpruned weight matrix can carry an identifier that describes its required training accuracy, and the network device can generate the corresponding sparsity-ratio value from that identifier, so that different accuracies lead to different sparsity-ratio values. Optionally, the network device can also generate the sparsity-ratio value according to other factors, which are not limited here.
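As a hedged illustration of the threshold rule above, the following sketch picks a sparser ratio (smaller retained fraction) for a large unpruned weight matrix. The concrete threshold and the particular N:M values are assumptions for illustration, not values fixed by this application.

```python
def choose_sparsity_ratio(num_weights, threshold_a=1_000_000):
    """Return an (N, M) sparsity ratio: ratio A (smaller retained fraction)
    when the unpruned weight matrix is large, else ratio B, with A < B."""
    ratio_a = (2, 8)   # retained fraction 2/8
    ratio_b = (4, 8)   # retained fraction 4/8
    return ratio_a if num_weights > threshold_a else ratio_b
```

Under these assumptions, a matrix with four million weights would get the 2:8 ratio, so a large matrix is compressed more aggressively, matching the stated goal of improving training efficiency for large weight matrices.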
103. The network device compresses the second matrix according to the sparsity ratio to obtain the third matrix.
In this application, the computing engine can compress the second matrix according to the sparsity ratio to obtain the third matrix. Assuming the size of the second matrix is WK×N, that is, the second matrix has WK rows and N columns of data, where W is the sparsity ratio factor, the size of the third matrix is K×N, that is, the third matrix has K rows and N columns of data.
104. The network device calculates the product of the first matrix and the third matrix.
In this application, the network device can use the processing element (PE) to calculate the product of the first matrix and the third matrix in order to obtain the convolution result.
In this application, the network device obtains the first matrix and the second matrix, where the first matrix is the pruned weight matrix of the target model and the second matrix is the data input to the target model. The network device compresses the second matrix according to the sparsity ratio to obtain the third matrix, and then calculates the product of the first matrix and the third matrix. Because the network device can flexibly set the value of the sparsity ratio according to the configuration instruction, the computing performance of the hardware can be exploited more fully for different models; the method applies to a wide range of scenarios and has strong compatibility.
In this application, step 103 above, in which the network device compresses the second matrix according to the sparsity ratio to obtain the third matrix, has specific implementations, which are described in detail below with examples.
In this application, the process in which the network device compresses the second matrix according to the sparsity ratio to obtain the third matrix includes a process in which the network device compresses a first column vector according to the sparsity ratio to obtain a second column vector, where the first column vector belongs to the second matrix and the second column vector belongs to the third matrix. In addition, the process in which the network device calculates the product of the first matrix and the third matrix also includes a process in which the network device calculates the product of a first row vector and the second column vector, where the first row vector belongs to the first matrix.
Figure 3 is another schematic flowchart of a data calculation method provided by this application.
For example, referring to Figure 3, in this application the network device can dynamically configure the sparsity ratio according to the computing power of the computing element and the complexity of the unpruned weight matrix, so that a fine-grained structured sparse acceleration process is achieved when the model is configured with an N:M sparsity ratio. The bitmap (non-zero data index) of matrix A (the first matrix) is input to the Prefix Sum module to calculate, for each row of PEs, the local shift distances (the first shift number) of the selected data in the column vectors of matrix B (the second matrix). The calculation result and the sparsity ratio are then input to the non-zero data compressor, which completes the data compression of the column vectors of matrix B on a PE through local non-zero data compression and global data compression. By adding Prefix Sum and Non-Zero Shifter units to each PE module, the sparse matrix accelerator can parse and process the bitmap of matrix A and, based on the results, shift the non-zero data of the column vectors of matrix B, thereby accelerating the matrix operation.
The process by which the network device compresses the first column vector according to the sparsity ratio to obtain the second column vector is described in detail below.
In this application, first, the network device can calculate the first shift number according to the non-zero data index. The non-zero data index represents the distribution of the data of the first matrix within the weight matrix before pruning. The first shift number represents the number of shift steps of the selected data, and the selected data is the data in the first column vector that corresponds to the non-zero data index.
For example, as shown in Figure 3, the network device can introduce Prefix Sum and Non-Zero Shifter modules into the matrix computing engine so that a fine-grained structured sparse acceleration process is achieved when the model is configured with an N:M sparsity ratio. In this application, the specific workflow of the Prefix Sum module is shown in Figure 3: each bit of the bitmap (non-zero data index) is inverted, and the inverted bits preceding each position are accumulated. This yields the number of zero-valued elements before that position, which is used as the distance by which that element is subsequently moved (the first shift number).
Figure 4 is another schematic flowchart of a calculation method provided by this application.
For example, the data in the non-zero data index in Figure 4 are 1, 0, 0, 1, 1, 0, 1, 0. After inverting these bits, the result is 0, 1, 1, 0, 0, 1, 0, 1. Accumulating these values yields 0, 0, 1, 2, 2, 2, 3, 3, which means that the shift distances for the first through eighth positions of the first column vector are 0, 0, 1, 2, 2, 2, 3, 3, respectively.
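The Prefix Sum step above can be sketched as follows. This is an illustrative software model of the hardware module, using the example bitmap from Figure 4; counting the zero bits preceding each position is equivalent to accumulating the inverted bits.

```python
def prefix_sum_shifts(bitmap):
    """For each position, count the zero bits that precede it: this is the
    distance (first shift number) by which that element later moves left."""
    shifts, zeros_seen = [], 0
    for bit in bitmap:
        shifts.append(zeros_seen)
        zeros_seen += 1 - bit   # accumulate the inverted bit
    return shifts

print(prefix_sum_shifts([1, 0, 0, 1, 1, 0, 1, 0]))  # [0, 0, 1, 2, 2, 2, 3, 3]
```

The result matches the worked example: a selected element at position six, for instance, has three zeros before it and therefore moves three steps toward the front.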
In this application, after the network device obtains the first shift number, it can compress the first column vector into the second column vector. The specific compression process is described in detail in the following examples.
In this application, the non-zero data movement control signal obtained from the Prefix Sum module is sent to the Local Non-Zero Shifter unit. The network device first selects the selected data from the first column vector according to the non-zero data index, and then shifts the selected data according to the first shift number to obtain the second column vector; that is, each selected data unit is moved to its proper position according to the first shift number.
Figure 5 is another schematic flowchart of a data calculation method provided by this application.
For example, referring to Figure 5, the Non-Zero Shifter unit includes at least a Local Non-Zero Shifter unit. The Non-Zero Shifter can complete the data compression of a column vector of matrix B on a PE according to the non-zero local shift distances (the first shift number) calculated by the Prefix Sum module. As shown in Figure 5, if the sparsity ratio is 4:8, then, as indicated by the non-zero data index, the selected data are the 1 in the first position, the 1 in the fourth position, the 1 in the fifth position, and the 1 in the seventh position, and the selected data in every 8 elements are moved to the first 4 positions of the queue. Similarly, as shown in Figure 6, if the sparsity ratio is 2:8, the selected data in every 8 elements are moved to the first 2 positions of the queue. As shown in Figure 7, if the sparsity ratio is 1:8, the selected data in every 8 elements are moved to the first position of the queue.
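A software model of the Local Non-Zero Shifter behavior just described might look like the following. It treats shifting each selected element left by its first shift number as equivalent to compacting the selected elements to the front of each m-element queue; zero-padding the tail of the queue is an assumption for illustration.

```python
def local_shift(queue, bitmap, n, m=8):
    """Move the n selected (bitmap == 1) elements of an m-element queue to
    its first n positions, as for an n:m sparsity ratio (4:8, 2:8, 1:8, ...)."""
    assert len(queue) == len(bitmap) == m
    selected = [v for v, bit in zip(queue, bitmap) if bit == 1]
    assert len(selected) == n
    return selected + [0] * (m - n)    # tail padding is illustrative

# 4:8 case as in Figure 5: positions 1, 4, 5 and 7 are selected
print(local_shift([11, 0, 0, 14, 15, 0, 17, 0], [1, 0, 0, 1, 1, 0, 1, 0], 4))
```

Each selected value ends up exactly first-shift-number positions to the left of where it started, which is what the hardware shifter achieves with the Prefix Sum control signals.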
In this application, after the network device obtains the first shift number, there are other ways to compress the first column vector into the second column vector. The specific compression process is described in detail in the following example.
In this application, the non-zero data movement control signal obtained from the Prefix Sum module is sent to the Local Non-Zero Shifter unit. The network device selects multiple groups of selected data from the first column vector according to the non-zero data index, where each group includes one or more selected data items. The network device shifts the multiple groups of selected data according to the first shift number to obtain multiple groups of selected vectors, and then compresses the multiple groups of selected vectors according to the sparsity ratio to obtain the second column vector.
Figure 8 is another schematic flowchart of a data calculation method provided by this application.
Referring to Figure 8, the Non-Zero Shifter unit can be divided into two submodules, a Local Non-Zero Shifter and a Global Non-Zero Shifter. The Non-Zero Shifter uses the non-zero local shift distances (the first shift number) calculated by the Prefix Sum module together with the sparsity ratio, and completes the data compression of the column vectors of matrix B on a PE through the two modules of local non-zero data compression and global data compression. The Local Non-Zero Shifter performs the local compression, implemented in a way similar to that described in the example above; the Global Non-Zero Shifter takes the result of the local compression as input, performs global data compression, and delivers the compressed data to the PE.
Figure 9 is another schematic flowchart of a data calculation method provided by this application.
Referring to Figure 9, in this application the global data compression module has a specific implementation for compressing the second matrix, which is described in detail in the following example.
In this application, the non-zero data movement control signal obtained from the Prefix Sum module is sent to the Local Non-Zero Shifter unit, which moves the non-zero data of each data unit to its proper position. If the sparsity ratio is 4:8, the non-zero data in every 8 elements are moved to the first 4 positions of the queue; if the sparsity ratio is 2:8, the non-zero data in every 8 elements are moved to the first 2 positions of the queue; if the sparsity ratio is 1:8, the non-zero data in every 8 elements are moved to the first position of the queue. Then, according to the queue data moved by the Local Non-Zero Shifter and the sparsity-ratio parameter W, the non-zero data of multiple queues are assembled into 16 data items that are input to the subsequent calculation. If the sparsity ratio is 4:8, the first 4 non-zero data items of each of 4 queues are assembled into a 16-element non-zero data vector; if the sparsity ratio is 2:8, the first 2 non-zero data items of each of 8 queues are assembled into a 16-element non-zero data vector; if the sparsity ratio is 1:8, the first non-zero data item of each of 16 queues is assembled into a 16-element non-zero data vector.
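The global compression step described above can be modeled as follows: a 16-element PE input vector is assembled from locally compacted queues. The 16-wide PE input comes from the text; representing the queues as plain lists is an illustrative assumption.

```python
def global_shift(local_queues, n, width=16):
    """Concatenate the first n (non-zero) entries of width // n locally
    compacted queues into one width-element vector for the PE."""
    assert width % n == 0
    needed = width // n            # 4:8 -> 4 queues, 2:8 -> 8, 1:8 -> 16
    vec = []
    for q in local_queues[:needed]:
        vec.extend(q[:n])          # take the compacted front of each queue
    assert len(vec) == width
    return vec
```

For a 4:8 ratio, four queues each contribute their first four entries; for 2:8, eight queues contribute two entries each; for 1:8, sixteen queues contribute one entry each, always filling the 16-wide PE input.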
In this application, the network device obtains the first matrix and the second matrix, where the first matrix is the pruned weight matrix of the target model and the second matrix is the data input to the target model. The network device receives a configuration instruction, where the configuration instruction is used to set the sparsity ratio. The network device compresses the second matrix according to the sparsity ratio to obtain the third matrix, and then calculates the product of the first matrix and the third matrix. Because the network device can flexibly set the value of the sparsity ratio according to the configuration instruction, the computing performance of the hardware can be exploited more fully for different models; the method applies to a wide range of scenarios and has strong compatibility.
The above examples provide different implementations of a data calculation method. A network device 20 is provided below, as shown in Figure 10. The network device 20 is configured to perform the data calculation method involved in the above examples. For the execution steps and the corresponding beneficial effects, refer to the corresponding examples above; they are not repeated here. The network device 20 includes:
a processing unit 201, configured to obtain a first matrix and a second matrix, where the first matrix represents the pruned weight matrix of the target model and the second matrix represents the data input to the target model; and
a receiving unit 202, configured to receive a configuration instruction, where the configuration instruction is used to set the sparsity ratio.
The processing unit 201 is configured to:
compress the second matrix according to the sparsity ratio to obtain a third matrix; and
calculate the product of the first matrix and the third matrix.
In one possible implementation,
the processing unit 201 is configured to:
compress a first column vector according to the sparsity ratio to obtain a second column vector, where the first column vector belongs to the second matrix and the second column vector belongs to the third matrix; and
calculate the product of a first row vector and the second column vector, where the first row vector belongs to the first matrix.
In one possible implementation,
the processing unit 201 is further configured to calculate a first shift number according to a non-zero data index, where the non-zero data index represents the distribution of the data of the first matrix within the weight matrix before pruning, the first shift number represents the number of shift steps of the selected data, and the selected data is the data in the first column vector that corresponds to the non-zero data index; and
the processing unit 201 is configured to compress the first column vector according to the first shift number to obtain the second column vector.
In one possible implementation,
the processing unit 201 is configured to:
select the selected data from the first column vector according to the non-zero data index; and
shift the selected data according to the first shift number to obtain the second column vector.
In one possible implementation,
the processing unit 201 is configured to:
select multiple groups of selected data from the first column vector according to the non-zero data index, where each group of selected data includes one or more selected data items;
shift the multiple groups of selected data according to the first shift number to obtain multiple groups of selected vectors; and
compress the multiple groups of selected vectors according to the sparsity ratio to obtain the second column vector.
In one possible implementation,
the sparsity ratio is determined by the computing power of the processing element in the network device and/or the weight matrix before pruning.
It should be noted that the information exchange and execution processes between the modules of the above network device 20 are based on the same concept as the method examples of this application, and the execution steps are consistent with the details of the above method steps; refer to the descriptions in the above method examples.
The above examples provide different implementations of the network device 20. A network device 30 is provided below, as shown in Figure 11. The network device 30 is configured to perform the data calculation method in the above examples. For the execution steps and the corresponding beneficial effects, refer to the corresponding examples above; they are not repeated here.
Referring to Figure 11, a schematic structural diagram of a network device is provided for this application. The network device 30 includes a processor 302, a communication interface 303, and a memory 301, and may optionally include a bus 304. The communication interface 303, the processor 302, and the memory 301 can be connected to each other through the bus 304. The bus 304 may be a Peripheral Component Interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like, and can be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used in Figure 11, but this does not mean that there is only one bus or one type of bus. The network device 30 can implement the functions of any network device in the examples described above, and the processor 302 and the communication interface 303 can perform the operations corresponding to the network device in the above method examples.
Each component of the network device is introduced in detail below with reference to Figure 11:
The memory 301 may be a volatile memory, such as a random-access memory (RAM); a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or a combination of the above types of memory, and is used to store program code, configuration files, or other content that can implement the method of this application.
The processor 302 is the control center of the controller and may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the examples provided by this application, for example, one or more digital signal processors (DSP) or one or more field-programmable gate arrays (FPGA).
The communication interface 303 is used to communicate with other network devices.
The processor 302 can perform the operations performed by the network device in the example shown in Figure 10 above, which are not described again here.
It should be noted that the information exchange and execution processes between the modules of the above network device 30 are based on the same concept as the method examples of this application, and the execution steps are consistent with the details of the above method steps; refer to the descriptions in the above method examples.
This application provides a chip. The chip includes a processor and a communication interface, the processor is coupled to the communication interface, and the processor is configured to read instructions to perform the operations performed by the network device in the embodiments described above in Figures 1 to 11.
This application provides a network system, which includes the network device described in the embodiment of Figure 10 above.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, devices, and units described above, reference may be made to the corresponding processes in the foregoing examples; details are not repeated here.
In the several examples provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device examples described above are merely illustrative. The division into units is merely a division by logical function; in actual implementation there may be other divisions. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of this example.
In addition, the functional units in the examples of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the examples of this application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The specific implementations described above further describe the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that different examples can be combined; the above descriptions are merely specific implementations of the present invention and are not intended to limit its protection scope. Any combination, modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within its protection scope. The above examples are merely intended to describe the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing examples, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing examples or make equivalent replacements of some of their technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the examples of this application.

Claims (15)

  1. A data computing method, characterized by comprising:
    obtaining, by a network device, a first matrix and a second matrix, wherein the first matrix represents a pruned weight matrix in a target model, and the second matrix represents data input to the target model;
    receiving, by the network device, a configuration instruction, wherein the configuration instruction is used to set a sparsity ratio;
    compressing, by the network device, the second matrix according to the sparsity ratio to obtain a third matrix; and
    computing, by the network device, a product of the first matrix and the third matrix.
  2. The data computing method according to claim 1, characterized in that the compressing, by the network device, the second matrix according to the sparsity ratio to obtain the third matrix comprises:
    compressing, by the network device, a first column vector according to the sparsity ratio to obtain a second column vector, wherein the first column vector belongs to the second matrix and the second column vector belongs to the third matrix; and
    the computing, by the network device, the product of the first matrix and the third matrix comprises:
    computing, by the network device, a product of a first row vector and the second column vector, wherein the first row vector belongs to the first matrix.
  3. The data computing method according to claim 2, characterized in that the method further comprises:
    computing, by the network device, a first shift count according to a non-zero data index, wherein the non-zero data index indicates the distribution of the data of the first matrix within the weight matrix before pruning, the first shift count indicates the number of shift steps of selected data, and the selected data is the data in the first column vector that corresponds to the non-zero data index; and
    the compressing, by the network device, the first column vector according to the non-zero data index to obtain the second column vector comprises:
    compressing, by the network device, the first column vector according to the first shift count to obtain the second column vector.
  4. The data computing method according to claim 3, characterized in that the compressing, by the network device, the first column vector according to the first shift count to obtain the second column vector comprises:
    selecting, by the network device, the selected data from the first column vector according to the non-zero data index; and
    shifting, by the network device, the selected data according to the first shift count to obtain the second column vector.
  5. The data computing method according to claim 3, characterized in that the method further comprises:
    selecting, by the network device, multiple groups of selected data from the first column vector according to the non-zero data index, wherein each group of selected data includes one or more items of selected data;
    shifting, by the network device, the multiple groups of selected data according to the first shift count to obtain multiple groups of selected vectors; and
    compressing, by the network device, the multiple groups of selected vectors according to the sparsity ratio to obtain the second column vector.
  6. The data computing method according to any one of claims 1 to 5, characterized in that the sparsity ratio is determined by the computing power of a processing component in the network device and by the weight matrix before pruning.
  7. A network device, characterized by comprising:
    a processing unit, configured to obtain a first matrix and a second matrix, wherein the first matrix represents a pruned weight matrix in a target model, and the second matrix represents data input to the target model; and
    a receiving unit, configured to receive a configuration instruction, wherein the configuration instruction is used to set a sparsity ratio;
    wherein the processing unit is further configured to:
    compress the second matrix according to the sparsity ratio to obtain a third matrix; and
    compute a product of the first matrix and the third matrix.
  8. The network device according to claim 7, characterized in that the processing unit is configured to:
    compress a first column vector according to the sparsity ratio to obtain a second column vector, wherein the first column vector belongs to the second matrix and the second column vector belongs to the third matrix; and
    compute a product of a first row vector and the second column vector, wherein the first row vector belongs to the first matrix.
  9. The network device according to claim 8, characterized in that:
    the processing unit is further configured to compute a first shift count according to a non-zero data index, wherein the non-zero data index indicates the distribution of the data of the first matrix within the weight matrix before pruning, the first shift count indicates the number of shift steps of selected data, and the selected data is the data in the first column vector that corresponds to the non-zero data index; and
    the processing unit is configured to compress the first column vector according to the first shift count to obtain the second column vector.
  10. The network device according to claim 9, characterized in that the processing unit is configured to:
    select the selected data from the first column vector according to the non-zero data index; and
    shift the selected data according to the first shift count to obtain the second column vector.
  11. The network device according to claim 9, characterized in that the processing unit is configured to:
    select multiple groups of selected data from the first column vector according to the non-zero data index, wherein each group of selected data includes one or more items of selected data;
    shift the multiple groups of selected data according to the first shift count to obtain multiple groups of selected vectors; and
    compress the multiple groups of selected vectors according to the sparsity ratio to obtain the second column vector.
  12. The network device according to any one of claims 7 to 11, characterized in that the sparsity ratio is determined by the computing power of a processing component in the network device and/or by the weight matrix before pruning.
  13. A network device, characterized by comprising:
    a processor and a memory;
    wherein the processor is configured to execute instructions stored in the memory, so that the method according to any one of claims 1 to 6 is performed.
  14. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed on a computer or a processor, the method according to any one of claims 1 to 6 is performed.
  15. A computer program product, characterized in that, when it is run on a computer or a processor, the method according to any one of claims 1 to 6 is performed.
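The core computation of claims 1 to 4 (a configurable sparsity ratio, a non-zero data index describing the pruned weight layout, shift counts that pack the selected input data, and the final row-by-column product) can be sketched as follows. This is a hypothetical illustration only: the function names, the index encoding, and the 2:4 example values are assumptions for exposition, not taken from the application.

```python
def shift_counts(nz_index):
    """Claim 3: for each selected position, the number of steps it must
    move left so that the surviving entries end up contiguous."""
    return [pos - rank for rank, pos in enumerate(sorted(nz_index))]

def compress_column(col, nz_index):
    """Claims 3-4: select the entries of one input column that line up
    with the non-zero weight positions, then shift each entry left by
    its computed step count to pack the second column vector."""
    shifts = shift_counts(nz_index)
    packed = [0] * len(nz_index)
    for rank, pos in enumerate(sorted(nz_index)):
        packed[pos - shifts[rank]] = col[pos]  # pos - shift == rank
    return packed

def sparse_dot(pruned_row, nz_index, col):
    """Claims 1-2: product of one pruned weight row with one input
    column, compressed to match the sparsity pattern."""
    compressed = compress_column(col, nz_index)
    return sum(w * x for w, x in zip(pruned_row, compressed))

# 2:4 sparsity example: weights [5, 0, 0, 7] prune to [5, 7];
# the non-zero data index records the surviving slots {0, 3}.
nz_index = [0, 3]
pruned_row = [5, 7]
col = [1, 2, 3, 4]                             # input column before compression
print(shift_counts(nz_index))                  # [0, 2]
print(compress_column(col, nz_index))          # [1, 4]
print(sparse_dot(pruned_row, nz_index, col))   # 5*1 + 7*4 = 33
```

Because only the surviving weight positions participate in the multiply, the amount of arithmetic scales with the configured sparsity ratio rather than with the dense matrix size, which matches the stated goal of setting that ratio flexibly per model.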
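The per-group variant of claim 5 (the column is cut into fixed-size groups, survivors are selected and shifted within each group, and the per-group vectors are then compressed together according to the sparsity ratio) could look like the sketch below. The names, the tuple encoding of per-group indices, and the example values are illustrative assumptions.

```python
def compress_by_groups(col, group_indices, group_size):
    """Select and pack survivors group by group (claim 5), then
    concatenate the per-group selected vectors into the final
    second column vector."""
    selected_vectors = []
    for g, idx in enumerate(group_indices):
        base = g * group_size
        # shifting the survivors to the front of their group
        selected_vectors.append([col[base + i] for i in sorted(idx)])
    compressed = []
    for vec in selected_vectors:       # final compression per the ratio
        compressed.extend(vec)
    return compressed

col = [1, 2, 3, 4, 5, 6, 7, 8]
# a 2:4 pattern: slots {0, 3} survive in group 0, slots {1, 2} in group 1
print(compress_by_groups(col, [(0, 3), (1, 2)], 4))  # [1, 4, 6, 7]
```

Grouping keeps each shift short (at most group_size - 1 steps), which is why a hardware implementation might prefer per-group packing over shifting across the whole column.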
PCT/CN2023/101090 2022-06-27 2023-06-19 Data computing method and related device WO2024001841A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210736691.6 2022-06-27
CN202210736691.6A CN117332197A (en) 2022-06-27 2022-06-27 Data calculation method and related equipment

Publications (1)

Publication Number Publication Date
WO2024001841A1 true WO2024001841A1 (en) 2024-01-04

Family

ID=89290739

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/101090 WO2024001841A1 (en) 2022-06-27 2023-06-19 Data computing method and related device

Country Status (2)

Country Link
CN (1) CN117332197A (en)
WO (1) WO2024001841A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944555A (en) * 2017-12-07 2018-04-20 广州华多网络科技有限公司 Method, storage device and the terminal that neutral net is compressed and accelerated
CN112732222A (en) * 2021-01-08 2021-04-30 苏州浪潮智能科技有限公司 Sparse matrix accelerated calculation method, device, equipment and medium
US20210150362A1 (en) * 2019-11-15 2021-05-20 Microsoft Technology Licensing, Llc Neural network compression based on bank-balanced sparsity
CN113762493A (en) * 2020-06-01 2021-12-07 阿里巴巴集团控股有限公司 Neural network model compression method and device, acceleration unit and computing system
CN114341825A (en) * 2019-08-29 2022-04-12 阿里巴巴集团控股有限公司 Method and system for providing vector sparsification in neural networks


Also Published As

Publication number Publication date
CN117332197A (en) 2024-01-02

Similar Documents

Publication Publication Date Title
US11709672B2 (en) Computing device and method
US11630666B2 (en) Computing device and method
US11106598B2 (en) Computing device and method
CN111062472B (en) Sparse neural network accelerator based on structured pruning and acceleration method thereof
EP3651070B1 (en) Computation device and method
CN110489428B (en) Multi-dimensional sparse matrix compression method, decompression method, device, equipment and medium
WO2022041188A1 (en) Accelerator for neural network, acceleration method and device, and computer storage medium
CN107943756B (en) Calculation method and related product
CN107957977B (en) Calculation method and related product
CN110515587B (en) Multiplier, data processing method, chip and electronic equipment
CN108108189B (en) Calculation method and related product
WO2024012180A1 (en) Matrix calculation method and device
WO2024001841A1 (en) Data computing method and related device
JP2024028901A (en) Sparse matrix multiplication in hardware
WO2023065701A1 (en) Inner product processing component, arbitrary-precision computing device and method, and readable storage medium
CN108021393B (en) Calculation method and related product
CN110688087B (en) Data processor, method, chip and electronic equipment
CN113033788B (en) Data processor, method, device and chip
US20220318604A1 (en) Sparse machine learning acceleration
CN114282158A (en) Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium
CN113836481A (en) Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium
CN114416020A (en) Rapid sequencing method and device based on FPGA
CN117457042A (en) Crossbar architecture based on three-port nonvolatile device, working method and lossless compression method
CN114048839A (en) Acceleration method and device for convolution calculation and terminal equipment
CN113961871A (en) Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23830021

Country of ref document: EP

Kind code of ref document: A1