CN111294056B - Data decompression method and coding circuit

Info

Publication number: CN111294056B
Application number: CN201811496284.2A
Authority: CN
Inventors: Not disclosed (inventor requested non-publication)
Original assignee: Shanghai Cambricon Information Technology Co Ltd
Current assignee: Shanghai Cambricon Information Technology Co Ltd
Priority date: 2018-12-07
Filing date: 2018-12-07
Publication date: 2022-03-29
Anticipated expiration: 2038-12-07
Also published as: CN111294056A

Abstract

The application relates to a data decompression method and an encoding circuit. First, compressed data are decomposed to obtain data blocks to be decompressed, each of which comprises a data head and the corresponding data body. Then, each data block to be decompressed is decompressed using a conventional decompression method to obtain a decompressed data block. Finally, the decompressed data are obtained from the decompressed data blocks. The method converts compressed data comprising a header section and a data section into data that can be decompressed by a conventional decompression method, and is simple to implement. Because the method decompresses the compressed data block by block, the blocks can be decompressed in parallel, which improves decompression efficiency.

Description

Data decompression method and coding circuit

Technical Field

The present application relates to the field of information technology, and in particular, to a data decompression method and an encoding circuit.

Background

Data decompression is the inverse process of data compression, and in the conventional technology, the data decompression process generally processes compressed data by selecting a proper coding algorithm so as to restore the compressed data to a state before compression.

However, the conventional data decompression method cannot decompress compressed data including a header section and a data section.

Disclosure of Invention

In view of the above, it is necessary to provide a data decompression method and an encoding circuit capable of decompressing compressed data including a header section and a data section in response to the above-described technical problem.

A method of data decompression comprising:

acquiring compressed data, wherein the compressed data comprises a header section and a data section, the header section comprises a plurality of data heads, the data section comprises a plurality of data bodies corresponding to the data heads, each data head comprises the starting address and the data length of the corresponding data body, and each data body comprises the encoded data of the corresponding data block before compression;

decomposing compressed data to obtain a plurality of data blocks to be decompressed, wherein the data blocks to be decompressed comprise a data head and a corresponding data body;

decompressing each data block to be decompressed by using a preset decompression algorithm to obtain decompressed data blocks;

and placing the decompressed data blocks according to a second preset placing format to obtain decompressed data.

As an optional implementation manner, the second preset placing format is obtained according to a position relationship between data blocks included before compression of the compressed data.

As an optional implementation manner, decomposing the compressed data to obtain a plurality of data blocks to be decompressed includes:

and if the data head and the data body contain identification bits for identifying the corresponding relationship, determining the data head and the data body in each data block to be decompressed according to the numerical values of the identification bits.

As an optional implementation, the preset decompression algorithm includes: Huffman coding, run-length coding, and LZ77.
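The decomposition and per-block decompression steps described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the application: the names `DataHeader`, `decompose`, `decompress_blocks`, and `place` are hypothetical, the data section is laid out one-dimensionally, the placement format is plain concatenation in header order, and Python's `zlib` (DEFLATE, which combines LZ77 with Huffman coding) stands in for the preset decompression algorithm.

```python
import zlib
from dataclasses import dataclass


@dataclass
class DataHeader:
    start: int   # start address of the data body inside the data section
    length: int  # length of the data body in bytes


def decompose(headers, data_section):
    """Pair every data head with the data body it points to."""
    return [(h, data_section[h.start:h.start + h.length]) for h in headers]


def decompress_blocks(headers, data_section):
    """Decompress each block independently (zlib is a stand-in algorithm)."""
    return [zlib.decompress(body) for _, body in decompose(headers, data_section)]


def place(blocks):
    """Assumed placement format: concatenate blocks in header order."""
    return b"".join(blocks)


# Build a small compressed input so the sketch can be exercised.
original_blocks = [b"aaaa" * 8, b"bcbc" * 8, b"dddd" * 8]
bodies = [zlib.compress(block) for block in original_blocks]

headers, offset = [], 0
for body in bodies:
    headers.append(DataHeader(start=offset, length=len(body)))
    offset += len(body)
data_section = b"".join(bodies)

restored = place(decompress_blocks(headers, data_section))
```

Because each data head carries its own start address and length, every block can be located and decoded without reading any other block, which is what makes block-wise decompression possible.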

A method of data decompression comprising:

grouping the obtained data blocks to be decompressed according to the number of the coding circuits to obtain a plurality of data groups to be decompressed;

distributing the obtained data group to be decompressed to a plurality of coding circuits, decompressing the data block to be decompressed in the received data group to be decompressed by the coding circuits according to a preset decompression algorithm to obtain a plurality of decompressed data blocks;

As an optional implementation manner, the grouping the obtained multiple data blocks to be decompressed according to the number of the encoding circuits to obtain multiple data groups to be decompressed includes:

and if the number of the coding circuits is n, dividing the data blocks to be decompressed into m groups, wherein m is an integral multiple of n.
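One possible reading of this grouping rule is sketched below. The round-robin assignment, the `multiple` parameter, and the function names are hypothetical; a thread pool with n workers stands in for the n coding circuits, and `zlib` stands in for the preset decompression algorithm.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def group_blocks(blocks, n_circuits, multiple=1):
    """Split blocks into m = multiple * n_circuits groups, round-robin."""
    m = multiple * n_circuits
    groups = [[] for _ in range(m)]
    for index, block in enumerate(blocks):
        # Keep the original index so the output order can be restored.
        groups[index % m].append((index, block))
    return groups


def decompress_group(group):
    """Work done by one circuit for one group."""
    return [(index, zlib.decompress(block)) for index, block in group]


def parallel_decompress(blocks, n_circuits, multiple=1):
    groups = group_blocks(blocks, n_circuits, multiple)
    # n worker threads play the role of the n coding circuits.
    with ThreadPoolExecutor(max_workers=n_circuits) as pool:
        results = list(pool.map(decompress_group, groups))
    indexed = [item for group in results for item in group]
    return [data for _, data in sorted(indexed, key=lambda pair: pair[0])]


compressed = [zlib.compress(bytes([65 + i]) * 10) for i in range(7)]
decompressed = parallel_decompress(compressed, n_circuits=2, multiple=2)
```

Choosing m as a multiple of n means that every circuit can be handed the same number of groups, m / n each.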

In an optional implementation manner, the layout format of each data body in the data segment of the compressed data is one-dimensional compact, two-dimensional compact, or compact in any dimension.

An encoding circuit, comprising: a data dividing circuit and a compression/decompression circuit connected with each other,

the data dividing circuit is configured to obtain compressed data, where the compressed data includes a header segment and a data segment, the header segment includes multiple data headers, the data segment includes multiple data bodies corresponding to the data headers, each data header includes the start address and the data length of the corresponding data body, and each data body includes the encoded data of the corresponding data block before compression; and to decompose the compressed data to obtain a plurality of data blocks to be decompressed, wherein each data block to be decompressed comprises a data header and the corresponding data body;

the compression and decompression circuit is used for decompressing each data block to be decompressed by using a preset compression and decompression algorithm to obtain a decompressed data block; and placing the decompressed data blocks according to a second preset placing format to obtain decompressed data.

According to the data decompression method and the encoding circuit above, compressed data are first decomposed to obtain data blocks to be decompressed, each of which comprises a data header and the corresponding data body. Each data block to be decompressed is then decompressed using a conventional decompression method to obtain a decompressed data block, and finally the decompressed data are obtained from the decompressed data blocks. The method converts compressed data comprising a header section and a data section into data that can be decompressed by a conventional decompression method, and is simple to implement. Because the method decompresses the compressed data block by block, the blocks can be decompressed in parallel, which improves decompression efficiency.

Drawings

FIG. 1 is a block diagram of a data access circuit in one embodiment;

FIG. 2 is a flow diagram illustrating a data access method according to one embodiment;

FIG. 3 is a diagram illustrating placement of data blocks in data to be accessed according to an embodiment;

FIG. 4 is a block diagram of an embodiment of a computing device;

FIG. 5 is a flow diagram that illustrates a data processing method, in accordance with one embodiment;

FIG. 6 is a block diagram showing the structure of an arithmetic device according to an embodiment;

FIG. 7 is a flow diagram illustrating the steps taken by a master slave arithmetic unit to transfer data in one embodiment;

FIG. 8 is a block diagram showing the structure of a computing device according to another embodiment;

FIG. 9 is a flowchart illustrating the steps of a master-slave arithmetic unit in one embodiment for transferring data;

FIG. 10 is a block diagram of an encoding circuit in one embodiment;

FIG. 11 is a flow diagram that illustrates a method for data compression, according to one embodiment;

FIG. 12 is a flow diagram illustrating a process for obtaining compressed data based on a header portion of the compressed data and a data portion of the compressed data in one embodiment;

FIG. 13 is a flow chart illustrating the process of obtaining compressed data according to the header section and the data section of the compressed data in another embodiment;

FIG. 14 is a block diagram showing the structure of an arithmetic device according to an embodiment;

FIG. 15 is a flow diagram illustrating a data processing method, according to an embodiment;

FIG. 16 is a block diagram showing the structure of an arithmetic device according to an embodiment;

FIG. 17 is a block diagram showing the construction of an arithmetic device according to another embodiment;

FIG. 18 is a flow diagram illustrating a method for neural network operation, according to one embodiment;

FIG. 19 is a flow diagram illustrating a fully connected operation in one embodiment;

FIG. 20 is a flowchart illustrating a data compression method according to another embodiment;

FIG. 21 is a flow diagram illustrating a process for obtaining compressed data based on a header portion of the compressed data and a data portion of the compressed data, according to one embodiment;

FIG. 22 is a diagram illustrating a flow of obtaining compressed data according to a header section of the compressed data and a data section of the compressed data in another embodiment;

FIG. 23 is a flowchart illustrating a data compression method according to another embodiment;

FIG. 24 is a flow diagram that illustrates a data processing method, under an embodiment;

FIG. 25 is a flow diagram illustrating a method for neural network operation, according to one embodiment;

FIG. 26 is a flow diagram that illustrates the processing of a fully connected operation, according to one embodiment;

FIG. 27 is a schematic flow chart diagram illustrating a data decompression method in one embodiment;

FIG. 28 is a schematic flow chart diagram illustrating a data decompression method according to another embodiment;

FIG. 29 is a flowchart illustrating a data decompression method according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

In one embodiment, as shown in FIG. 1, a data access circuit 100 is provided, which includes a read-write control circuit 110, a read-write circuit 120, and a memory 130 coupled to each other. The read-write control circuit 110 is configured to generate data read-write parameters and to generate a read-write control instruction according to those parameters. The read-write control instruction is used to control the read-write circuit 120 to perform a specific read-write operation. Specifically, the data read-write parameters generated by the read-write control circuit 110 include: a read-write start address, a step length, the number of read-write operations, and the number of blocks per single read-write operation. The memory 130 is used to store the data to be accessed. The data to be accessed may be divided into a plurality of data blocks, which may be stored in the memory 130 in a one-dimensional compact or two-dimensional compact form.

Alternatively, when the data to be accessed is partitioned, one or more of the total data size of the data to be accessed, the data distribution characteristics, the importance of the data, and the like may be considered. Wherein the importance of the data can be determined according to the appearance frequency, data size and other characteristics of the data. Optionally, when the data to be accessed is partitioned, a preset value may also be referred to. Alternatively, the data in memory 130 may be input data, intermediate data, or the like.

In one embodiment, as shown in fig. 2, a data access method is provided, where the data access method is executed by the data access circuit in the foregoing embodiment, and the method specifically includes:

step S101: dividing data to be accessed into a plurality of data blocks, and placing each data block in the data to be accessed according to a preset format.

Placing the data blocks according to a preset format means storing each data block at a corresponding position of the storage medium according to the preset format. The preset format may be one-dimensional compact, two-dimensional compact, or compact in other dimensions. Specifically, the read-write control circuit 110 divides the data to be accessed into a plurality of data blocks and places the data blocks according to the preset format, thereby obtaining the storage parameters of each data block in the data to be accessed. The storage parameters of each data block include a start address, a block sequence number, a line number, and the like, and are used for performing read-write operations on the data to be accessed. The block sequence number can be used to distinguish different data blocks. Optionally, the data blocks may be numbered using numeric values.

For example: assuming that the data to be accessed is divided into a plurality of data blocks and arranged according to a preset format, the arrangement result is shown in fig. 3. At this time, the data to be accessed is divided into 9 data blocks, and the 9 values 0 to 8 may be used to configure the block sequence numbers for the respective data blocks in the order from top to bottom and from left to right. The line numbers of the respective data blocks are arranged in order from top to bottom using 3 values of 0 to 2.
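The storage parameters for such a 3 x 3 arrangement can be sketched as follows. This is only an illustration under stated assumptions: block sequence numbers are taken to run row by row, every block is given the same hypothetical size `block_size`, and the blocks are stored contiguously (one-dimensional compact) starting at a hypothetical `base_address`. The application does not fix any of these details.

```python
def storage_parameters(rows, cols, block_size, base_address=0):
    """Return the storage parameters of each block in a rows x cols grid.

    Assumes row-by-row numbering and contiguous, equally sized blocks.
    """
    params = []
    for seq in range(rows * cols):
        params.append({
            "block_seq": seq,                                  # 0 .. rows*cols - 1
            "line": seq // cols,                               # 0 .. rows - 1
            "start_address": base_address + seq * block_size,  # contiguous layout
        })
    return params


params = storage_parameters(rows=3, cols=3, block_size=16)
```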

Step S102: acquiring read-write parameters and obtaining a read-write control instruction according to the read-write parameters. The read-write parameters include: a read-write start address, a step size, and the number of blocks per single read-write operation. The step size is the difference between the block sequence numbers of the starting data blocks of two adjacent read-write operations. The number of blocks per single read-write operation is the number of data blocks that the read-write circuit 120 reads or writes in each operation. Optionally, the read-write parameters may further include the number of read-write operations, that is, the number of times the read-write circuit 120 performs a read-write operation; it may be obtained from the total number of data blocks to be read or written and the number of blocks per single read-write operation. The total number of data blocks to be read or written can be set according to actual requirements. Optionally, the total number of data blocks to be read or written is an integer multiple of the number of blocks per single read-write operation.

Specifically, the read-write control circuit 110 obtains the read-write parameters and obtains the read-write control instruction according to the read-write parameters. For example, suppose that after the data to be accessed is divided into data blocks the layout is as shown in fig. 3, and that the read-write control circuit 110 is to read or write the data blocks with block sequence numbers 4, 5, 7, and 8. In the read-write parameters obtained in this case, the read-write start address may be the start address a of the data block with block sequence number 4, the step length is 3, and the number of blocks per single read-write operation is 2. Since the total number of data blocks to be read or written (4) is twice the number of blocks per single read-write operation (2), the number of read-write operations may be set to 2 in this example.

Step S103: and performing read-write operation on the data to be accessed according to the read-write control instruction.

Specifically, the read-write circuit 120 of the data access circuit performs read-write operations on the data to be accessed according to the read-write control instruction. The read-write circuit 120 determines the target data blocks of each read-write operation and the order of the read-write operations according to the read-write control instruction and the storage parameters of the data to be accessed, and reads or writes the target data blocks in that order. Referring to fig. 3, the read-write circuit 120 proceeds as follows. It first uses the start address a of the data block with block sequence number 4 as the start address of the first read-write operation, and takes the data blocks with block sequence numbers 4 and 5 as the target data blocks of the first read-write operation. It then obtains the start address of the second read-write operation according to the step length 3 set in the read-write parameters, namely the start address of the data block with block sequence number 7, and takes the data blocks with block sequence numbers 7 and 8 as the target data blocks of the second read-write operation. The first operation reads or writes the data blocks with block sequence numbers 4 and 5; the second reads or writes those with block sequence numbers 7 and 8.
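The way the read-write parameters select target data blocks can be sketched in a few lines. The function names are hypothetical, and for simplicity the sketch works with block sequence numbers rather than physical addresses.

```python
def num_operations(total_blocks, blocks_per_op):
    """Number of read-write operations, assuming an exact multiple."""
    if total_blocks % blocks_per_op:
        raise ValueError("total_blocks must be a multiple of blocks_per_op")
    return total_blocks // blocks_per_op


def target_blocks(start_seq, step, blocks_per_op, num_ops):
    """Block sequence numbers touched by each read-write operation.

    Operation i starts at start_seq + i * step and covers
    blocks_per_op consecutive blocks.
    """
    return [
        [start_seq + op * step + k for k in range(blocks_per_op)]
        for op in range(num_ops)
    ]


ops = num_operations(total_blocks=4, blocks_per_op=2)
schedule = target_blocks(start_seq=4, step=3, blocks_per_op=2, num_ops=ops)
```

With the parameters of the example (start block 4, step length 3, two blocks per operation, two operations), `schedule` lists blocks 4 and 5 for the first operation and blocks 7 and 8 for the second.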

With the data access method in this embodiment, data blocks in the data to be accessed can be acquired according to different requirements through the read-write parameters, so that the data blocks in the stored data need not be accessed in storage order. By introducing the step length and the number of blocks per single read-write operation as access parameters, the method realizes two-dimensional access to the stored data, which is convenient for processing data with two-dimensional similarity, such as natural images and feature maps.

The data access circuit in the above embodiments may be arranged in any cluster, processor or arithmetic unit having data access requirements. The data access circuit is applied to a computing device as an example, and the application of the data access circuit to data operation will be described.

As shown in fig. 4, one embodiment of the present application provides an arithmetic device 10, which includes a master arithmetic unit 300 and a plurality of slave arithmetic units 400. The plurality of slave arithmetic units 400 are each connected to the master arithmetic unit 300. Specifically, the master arithmetic unit 300 may be used to preprocess input data and to transfer data to and from the plurality of slave arithmetic units 400. The plurality of slave arithmetic units 400 are configured to perform intermediate operations in parallel on the data transmitted from the master arithmetic unit 300 to obtain a plurality of intermediate results, and to transmit the intermediate results to the master arithmetic unit 300. The master arithmetic unit 300 is also used to perform subsequent processing on the intermediate results transmitted from the slave arithmetic units 400. Further, the master arithmetic unit 300 is provided with the data access circuit 100 of the above embodiment, which is used for data access. Optionally, the master arithmetic unit 300 and the slave arithmetic units 400 may be provided as distinct units at the hardware level according to their difference in function, or a plurality of identical arithmetic units may be provided, from which the master arithmetic unit 300 and the slave arithmetic units 400 are designated while data is actually being processed.

In one embodiment, as shown in fig. 5, a data processing method is provided, which can be executed by the arithmetic device 10 to perform data processing. The method specifically comprises the following steps:

step S201: the main arithmetic unit of the arithmetic device obtains the data to be operated by using the data access method in any one of the embodiments.

Specifically, the master arithmetic unit 300 of the arithmetic device 10 acquires the data to be operated on using the method in the above-described embodiment. More specifically, the data access circuit divides the data to be accessed into a plurality of data blocks, and places each data block according to a preset format. Then, the data access circuit acquires the read-write parameters and obtains a read-write control instruction according to the read-write parameters. Finally, the data access circuit performs read-write operations on the data to be accessed according to the read-write control instruction. The data obtained by the read-write operations are the data to be operated on. Optionally, the data to be operated on include a neuron matrix and/or a weight matrix.

Step S202: the main operation unit broadcasts or distributes the acquired data to be operated on to the slave operation units, so that the slave operation units perform operations on the acquired data to obtain intermediate results and send the intermediate results to the main operation unit. Optionally, an intermediate result may be the output of a slave operation unit after it performs a certain multiplication operation.

Step S203: and after receiving the intermediate result, the main operation unit performs subsequent processing to obtain an operation result.

Optionally, after the main operation unit 300 receives the intermediate result, performing subsequent processing may include: and carrying out accumulation and activation operation by using the intermediate result to obtain an operation result. Alternatively, if the operation result is the final operation result, the operation device 10 may terminate the data processing flow. If the operation result is not the final operation result, the operation device 10 can perform the operation of the next stage using the operation result.
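Steps S201 to S203 can be illustrated with a small software analogy. It is only a sketch under assumptions: the function names are hypothetical, each slave unit is modelled as receiving one row of a weight matrix together with the broadcast input vector, and ReLU is chosen as the activation purely for illustration, since the text above does not name a specific activation function.

```python
def slave_multiply(weight_row, inputs):
    """Slave unit: element-wise products, returned as the intermediate result."""
    return [w * x for w, x in zip(weight_row, inputs)]


def relu(value):
    """Illustrative activation; the actual activation is not specified."""
    return max(0.0, value)


def master_process(weights, inputs, activation=relu):
    # Step S202: broadcast `inputs` and distribute one weight row per slave.
    intermediates = [slave_multiply(row, inputs) for row in weights]
    # Step S203: the master accumulates each intermediate result, then activates.
    return [activation(sum(products)) for products in intermediates]


weights = [[1.0, -2.0], [0.5, 0.5]]
inputs = [2.0, 3.0]
outputs = master_process(weights, inputs)
```

In hardware the slave units would run concurrently; the list comprehension here only models the data flow, not the parallelism.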

In one alternative embodiment, as shown in fig. 6, the plurality of slave arithmetic units 400 of the arithmetic device 10 are distributed in an array of m rows and n columns. Each slave arithmetic unit 400 is connected to the adjacent slave arithmetic units 400, and the master arithmetic unit 300 is connected to K slave arithmetic units 400 among the plurality of slave arithmetic units 400, where the K slave arithmetic units 400 are: the n slave arithmetic units 400 in row 1, the n slave arithmetic units 400 in row m, and the m slave arithmetic units 400 in column 1. In other words, the K slave arithmetic units 400 are the slave arithmetic units 400 that are directly connected to the master arithmetic unit 300. Specifically, the K slave arithmetic units 400 are used for forwarding data between the master arithmetic unit 300 and the plurality of slave arithmetic units 400.

Further, the main operation unit 300 may include an activation operation circuit, an addition operation circuit, and the data access circuit in the above embodiments. The activation operation circuit is used for performing activation operations on data in the main operation unit 300; the addition operation circuit is used for performing addition or accumulation operations; and the data access circuit is mainly used for transferring the data to be operated on to the main operation unit 300 using the data access method in the above embodiment. Specifically, the slave operation unit 400 includes a multiplication operation circuit, which is used for performing a multiplication operation on the received data block to obtain a product result. Optionally, the slave operation unit 400 may further include an addition operation circuit for performing addition or accumulation operations. Optionally, the slave operation unit 400 further comprises a forwarding circuit for forwarding the product result to the main operation unit 300.

In the present embodiment, as shown in fig. 7, step S202 (master-slave operation unit transfer data) includes:

step S2021 a: the main operation unit broadcasts or distributes the acquired data to be operated to the slave operation units through the K slave operation units.

Step S2022 a: and the slave operation unit performs multiplication or addition operation by using the data to be operated according to the corresponding operation instruction to obtain an intermediate result.

Step S2023 a: the slave arithmetic unit transmits the obtained intermediate result to the master arithmetic unit through the K slave arithmetic units.

In another alternative embodiment, as shown in fig. 8, the operation device 10 may further include a branch operation unit 500, the main operation unit 300 is connected to one or more branch operation units 500, and the branch operation unit 500 is connected to one or more slave operation units 400.

Specifically, the branch arithmetic unit 500 is used for forwarding data between the master arithmetic unit 300 and the slave arithmetic unit 400. The main operation unit 300 may include an activation operation circuit, an addition operation circuit, and the data access circuit in the above embodiments. The activation operation circuit is used for performing activation operations on data in the main operation unit 300; the addition operation circuit is used for performing addition or accumulation operations; and the data access circuit is mainly used for transferring the data to be operated on to the main operation unit 300 using the data access method in the above embodiment. Specifically, the slave operation unit 400 includes a multiplication operation circuit, which is used for performing a multiplication operation on the received data block to obtain a product result. Optionally, the slave operation unit 400 may further include an addition operation circuit for performing addition or accumulation operations. Optionally, the slave operation unit 400 further comprises a forwarding circuit for forwarding the product result to the main operation unit 300.

In the present embodiment, as shown in fig. 9, step S202 (master-slave operation unit transfer data) includes:

step S2021 b: the main operation unit broadcasts or distributes the acquired data to be operated to the slave operation units through the branch operation unit.

Step S2022 b: and the slave operation unit performs multiplication or addition operation by using the data to be operated according to the corresponding operation instruction to obtain an intermediate result.

Step S2023 b: the slave arithmetic unit sends the obtained intermediate result to the master arithmetic unit through the branch arithmetic unit.

In one embodiment, as shown in fig. 10, an encoding circuit 200 is also provided. The encoding circuit 200 includes a data dividing circuit 210 and a compression/decompression circuit 220, which are connected to each other. The data dividing circuit 210 may divide the data to be compressed according to a preset rule. Alternatively, the data dividing circuit 210 may divide the data into a plurality of data blocks according to the characteristics of the data. The compression/decompression circuit 220 is used for compressing or decompressing data using a preset encoding scheme. Optionally, the encoding circuit may be disposed on each device in a cluster and configured to compress data transmitted between the devices in the cluster, which may reduce the bandwidth the devices require for data transmission. Alternatively, the encoding circuit may be provided on a processor of a computer device including a plurality of processors. The encoding circuit may also be provided in the arithmetic unit of the arithmetic device or in other devices or components requiring data transmission.

In one embodiment, as shown in fig. 11, a data compression method is proposed, which is performed by the encoding circuit 200 in the above embodiment, and includes:

step S301, dividing the data to be compressed into a plurality of data blocks according to the characteristics of the data to be compressed.

Specifically, the data dividing circuit 210 of the encoding circuit 200 divides the data to be compressed into a plurality of data blocks according to the characteristics of the data to be compressed. Optionally, the characteristics of the data to be compressed may include one or more of the total size of the data, the distribution characteristics of the data, the importance of the data, and the like. The importance of the data can be determined according to the frequency of occurrence of the data, the size of the data, and the like. Optionally, a preset value may also be considered when dividing the data to be compressed into a plurality of data blocks. Optionally, the "0" values in the data to be compressed are filtered out before the data to be compressed is divided into a plurality of data blocks.

Step S302, compressing each data block of the data to be compressed separately to obtain a data header and a data body corresponding to each data block. Each data header includes information such as the start address and the data length of the corresponding data body. Optionally, the data header may further include a correspondence identifier. Optionally, the data body comprises the encoded form of the corresponding uncompressed data block. Optionally, the compression/decompression circuit 220 may compress the data blocks using Huffman coding, run-length coding, LZ77, or any combination thereof. Optionally, before the compression/decompression circuit 220 compresses the individual data blocks, the data to be compressed is preprocessed according to the selected compression algorithm. For example, when Huffman coding is used to compress each data block, the data to be compressed needs to be sorted to obtain a Huffman tree, and each data block is then compressed based on the Huffman tree.

Step S303, obtaining a header section of the compressed data according to each obtained data header, obtaining a data section of the compressed data according to each obtained data body, and obtaining the compressed data according to the header section of the compressed data and the data section of the compressed data.

Specifically, the compression/decompression circuit 220 of the encoding circuit 200 obtains the header section of the compressed data from the obtained data headers, obtains the data section of the compressed data from the obtained data bodies, and obtains the compressed data from the header section and the data section.

In the data compression method of the above embodiment, the data to be compressed is first partitioned into data blocks, and each data block is then compressed separately to obtain a data header and a data body in one-to-one correspondence with each data block. The header section and the data section of the compressed data are then obtained from the data headers and data bodies, yielding the compressed data. By compressing the data to be compressed block by block, the method allows the data blocks to be compressed in parallel, which improves compression efficiency.
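Steps S301 to S303 can be sketched as follows, with run-length coding as the per-block algorithm. The sketch makes simplifying assumptions that are not taken from the application: blocks are cut at a fixed hypothetical `block_size` rather than according to data characteristics, each run is stored as a (count, value) byte pair, and the data bodies are placed back to back in the data section.

```python
def rle_encode(block: bytes) -> bytes:
    """Run-length encode a block as (run length, byte value) pairs."""
    out = bytearray()
    i = 0
    while i < len(block):
        run = 1
        # Extend the run while the byte repeats; cap at 255 to fit one byte.
        while i + run < len(block) and block[i + run] == block[i] and run < 255:
            run += 1
        out += bytes([run, block[i]])
        i += run
    return bytes(out)


def compress(data: bytes, block_size: int):
    """Return (header_section, data_section) for block-wise compressed data."""
    # Step S301 (simplified): fixed-size partitioning.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    header_section, data_section = [], bytearray()
    for block in blocks:
        # Step S302: compress one block into a data body.
        body = rle_encode(block)
        # Data header: (start address, data length) of the body.
        header_section.append((len(data_section), len(body)))
        # Step S303: append the body to the data section.
        data_section += body
    return header_section, bytes(data_section)


header_section, data_section = compress(b"aaaabbbbccccdddd", block_size=8)
```

Each loop iteration depends only on its own block, apart from the running offset, so the encoding of the individual blocks could be carried out in parallel and the offsets assigned afterwards.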

In one optional embodiment, as shown in fig. 12, step S303 includes:

Step S3031a: using identification bits to identify the correspondence between the data header and the data body corresponding to each data block. Specifically, the codec circuit 220 of the encoder circuit 200 uses the identification bits to identify the correspondence between the data header and the data body corresponding to each data block.

Step S3032a: combining the data headers containing the identification bits to obtain a header section of the compressed data, combining the data bodies containing the identification bits to obtain a data section of the compressed data, and combining the header section of the compressed data and the data section of the compressed data to obtain the compressed data.
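A minimal sketch of the identification-bit variant is given below. It assumes that the identification bits are simply a block index stored in both the data header and the data body, and that zlib stands in for the actual coder; the names tag_blocks and match_and_decompress and the dictionary layout are hypothetical.

```python
import zlib

def tag_blocks(blocks):
    """Compress each block and attach the same identifier to its header and body."""
    header_section, data_section = [], []
    for block_id, block in enumerate(blocks):
        body = zlib.compress(block)  # stand-in coder
        header_section.append({"id": block_id, "length": len(body)})
        data_section.append({"id": block_id, "body": body})
    return header_section, data_section

def match_and_decompress(header_section, data_section):
    """Pair headers with bodies through the identifier, regardless of body order."""
    bodies = {entry["id"]: entry["body"] for entry in data_section}
    out = []
    for header in sorted(header_section, key=lambda h: h["id"]):
        body = bodies[header["id"]]
        if len(body) != header["length"]:
            raise ValueError("header and body do not match")
        out.append(zlib.decompress(body))
    return out
```

With identifiers on both sides, the data bodies do not have to be stored in the same order as the data headers.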

In another alternative embodiment, as shown in fig. 13, step S303 includes:

Step S3031b: obtaining the placement format of the data headers according to the positional relationship among the data blocks in the data to be compressed.

Specifically, the compression/decompression circuit 220 of the encoding circuit 200 obtains the placement format of the data headers according to the positional relationship among the data blocks in the data to be compressed. Optionally, the placement format may place the data header of each data block according to the position of that data block relative to the other data blocks.

Step S3032b: placing the data headers corresponding to the data blocks according to the placement format of the data headers to obtain a header section of the compressed data, placing the data bodies corresponding to the data blocks according to a first preset placement format to obtain a data section of the compressed data, and combining the header section of the compressed data and the data section of the compressed data to obtain the compressed data.

Specifically, the compression/decompression circuit 220 of the encoding circuit 200 places the data headers corresponding to the data blocks according to the placement format of the data headers to obtain the header section of the compressed data, places the data bodies corresponding to the data blocks according to the first preset placement format to obtain the data section of the compressed data, and combines the header section of the compressed data and the data section of the compressed data to obtain the compressed data.

Optionally, the codec circuit 220 splices the obtained header section of the compressed data with the data section of the compressed data to obtain the compressed data. Optionally, the first preset placement format may be a one-dimensional compact, two-dimensional compact, or any other dimensional compact placement of the data bodies corresponding to the data blocks. Optionally, the correspondence between each data body in the data section of the compressed data and each data header in the header section of the compressed data may be identified by setting identification bits.
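The placement-format variant can be illustrated with the sketch below. It assumes the data blocks form a two-dimensional grid, that the data headers are kept in a grid of the same shape so that a header's position mirrors its block's position, and that the data bodies use a one-dimensional compact placement. zlib again stands in for the actual coder, and place and read_block are hypothetical names.

```python
import zlib

def place(blocks_2d):
    """blocks_2d is a list of rows of byte blocks.
    Returns (header_grid, data_section)."""
    header_grid, data_section, offset = [], bytearray(), 0
    for row in blocks_2d:
        header_row = []
        for block in row:
            body = zlib.compress(block)              # stand-in coder
            header_row.append((offset, len(body)))   # (start address, length)
            data_section += body                     # one-dimensional compact placement
            offset += len(body)
        header_grid.append(header_row)
    return header_grid, bytes(data_section)

def read_block(header_grid, data_section, r, c):
    """Recover the block at grid position (r, c) using only its header."""
    start, length = header_grid[r][c]
    return zlib.decompress(data_section[start:start + length])
```

Since the header grid preserves the block layout, a single block can be fetched by its position without decompressing its neighbours.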

Optionally, the data to be compressed may be data to be transmitted between devices in a cluster, data to be transmitted between a plurality of processors, or data to be transmitted between the operation units of an operation device, for example input data to be acquired by the operation device. The following takes the data to be transmitted between the operation units of the operation device as an example to describe the application of the data compression method in the above embodiment.

In one embodiment, as shown in fig. 14, another arithmetic device 20 is proposed, the arithmetic device 20 including a master arithmetic unit 300 and a plurality of slave arithmetic units 400 connected to each other. The master arithmetic unit 300 and the plurality of slave arithmetic units 400 are each provided with the encoding circuit in the above-described embodiments. Specifically, the master arithmetic unit 300 is used to pre-process input data and to transfer data to and from the plurality of slave arithmetic units 400. The plurality of slave arithmetic units 400 are configured to perform intermediate operations in parallel using the data transmitted from the master arithmetic unit 300 to obtain a plurality of intermediate results, and to transmit the plurality of intermediate results to the master arithmetic unit 300. The master arithmetic unit 300 is also used for performing subsequent processing on the plurality of intermediate results transmitted from the slave arithmetic units 400.

As an alternative embodiment, as shown in fig. 15, a data processing method is proposed, where the method is executed by the computing device 20, and specifically includes:

In step S401, the main arithmetic unit receives input data, and block-compresses the input data using the data compression method in the above embodiment to obtain compressed data.

Specifically, the encoding circuit 200 of the main operation unit 300 divides the input data into a plurality of data blocks according to the characteristics of the input data. Then, the encoding circuit 200 compresses each data block to obtain a data header and a data body corresponding to each data block. Finally, the encoding circuit 200 obtains a header section of the compressed data from each obtained data header, obtains a data section of the compressed data from each obtained data body, and obtains the compressed data from the header section of the compressed data and the data section of the compressed data.

In step S402, the master operation unit transfers the obtained compressed data to the slave operation unit.

In step S403, the slave arithmetic unit receives the compressed data and decompresses it to obtain decompressed data.

In step S404, the slave arithmetic unit performs a multiplication operation using the decompressed data to obtain an intermediate result, and transmits the intermediate result to the master arithmetic unit.

In step S405, the main arithmetic unit performs an accumulation and activation operation using the intermediate result to obtain an arithmetic result.

Alternatively, if the operation result is the final operation result, the operation device 20 may terminate the data processing flow. If the operation result is not the final operation result, the operation device 20 can perform the operation of the next stage using the operation result.

The computing device in the above embodiment compresses the input data and transmits the compressed input data to the slave computing unit 400, so that the bandwidth requirement of data transmission between the computing units can be reduced.
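Steps S401 to S405 can be sketched in software as follows. This is only an illustration under assumptions: the slave arithmetic units are modelled as plain function calls, the whole input is compressed with zlib in one piece instead of with the block method, each slave is assumed to hold its own weights, and ReLU is assumed as the activation. The names pack, unpack, slave_multiply and master_process are hypothetical.

```python
import struct
import zlib

def pack(values):
    """Serialize floats and compress them (stand-in for the encoding circuit)."""
    return zlib.compress(struct.pack(f"<{len(values)}d", *values))

def unpack(blob):
    """Decompress and deserialize a payload produced by pack()."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))

def slave_multiply(compressed_input, weights):
    x = unpack(compressed_input)                      # S403: decompress
    return [xi * wi for xi, wi in zip(x, weights)]    # S404: multiplication

def master_process(inputs, weights_per_slave):
    compressed = pack(inputs)                         # S401: compress once
    results = []
    for weights in weights_per_slave:                 # S402: send the same payload to each slave
        products = slave_multiply(compressed, weights)
        results.append(max(0.0, sum(products)))       # S405: accumulate, then ReLU (assumed)
    return results
```

Only the compressed payload crosses the boundary between the master and a slave, which is where the bandwidth saving described above would come from.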

In one alternative embodiment, as shown in fig. 16, the plurality of slave arithmetic units 400 of the arithmetic device 20 are distributed in an array of m rows and n columns; each slave arithmetic unit 400 is connected to the adjacent slave arithmetic units 400, and the master arithmetic unit 300 is connected to k slave arithmetic units 400 among the plurality of slave arithmetic units 400, where the k slave arithmetic units 400 are: the n slave arithmetic units 400 in row 1, the n slave arithmetic units 400 in row m, and the m slave arithmetic units 400 in column 1. As shown in fig. 16, the k slave arithmetic units 400 include only the n slave arithmetic units 400 in row 1, the n slave arithmetic units 400 in row m, and the m slave arithmetic units 400 in column 1; in other words, the k slave arithmetic units 400 are the slave arithmetic units 400 that are directly connected to the master arithmetic unit 300. Specifically, the k slave arithmetic units 400 are used for forwarding data between the master arithmetic unit 300 and the plurality of slave arithmetic units 400.
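The set of directly connected slave arithmetic units can be enumerated with the short sketch below. It assumes an array of m rows and n columns with 1-based (row, column) coordinates; the function name directly_connected is hypothetical. The sketch counts each corner unit once, which is an assumption about how k is counted.

```python
def directly_connected(m: int, n: int):
    """Return the (row, col) positions of the slave units wired to the master
    in an m x n array: row 1, row m and column 1."""
    units = set()
    units.update((1, c) for c in range(1, n + 1))   # row 1
    units.update((m, c) for c in range(1, n + 1))   # row m
    units.update((r, 1) for r in range(1, m + 1))   # column 1
    return units
```

All other slave arithmetic units receive their data through one of these positions.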

Further, the main operation unit 300 may include an activation operation circuit, an addition operation circuit, and the encoding circuit in the above embodiments. The activation operation circuit is used for performing the activation operation on data in the main operation unit 300; the addition operation circuit is used for performing an addition operation or an accumulation operation; the encoding circuit is mainly used for compressing data using the data compression method in the above embodiment. Specifically, the slave operation unit 400 includes a multiplication operation circuit. The multiplication operation circuit is used for performing a multiplication operation on the received data block to obtain a product result. Optionally, the slave operation unit 400 may further include an addition operation circuit for performing an addition operation or an accumulation operation. Optionally, the slave operation unit 400 further comprises a forwarding circuit for forwarding the product result to the master operation unit 300.

In this embodiment, step S402 includes: the master arithmetic unit broadcasts or distributes the obtained compressed data to the slave arithmetic units through the k slave arithmetic units 400.

In this embodiment, step S404 includes: the slave arithmetic unit transmits the obtained intermediate result to the master arithmetic unit through the k slave arithmetic units 400.

In another alternative embodiment, as shown in fig. 17, the operation device 20 may further include a branch operation unit 500, the main operation unit 300 is connected to one or more branch operation units 500, and the branch operation unit 500 is connected to one or more slave operation units 400.

Specifically, the branch arithmetic unit 500 is used for forwarding data between the master arithmetic unit 300 and the slave arithmetic unit 400. The main operation unit 300 may include an activation operation circuit, an addition operation circuit, and the encoding circuit in the above embodiments. The activation operation circuit is used for performing the activation operation on data in the main operation unit 300; the addition operation circuit is used for performing an addition operation or an accumulation operation; the encoding circuit is mainly used for compressing data using the data compression method in the above embodiment. Specifically, the slave operation unit 400 includes a multiplication operation circuit. The multiplication operation circuit is used for performing a multiplication operation on the received data block to obtain a product result. Optionally, the slave operation unit 400 may further include an addition operation circuit for performing an addition operation or an accumulation operation.

In this embodiment, step S402 includes: the master arithmetic unit broadcasts or distributes the obtained compressed data to the slave arithmetic units through the branch arithmetic unit.

In this embodiment, step S404 includes: the slave arithmetic unit transmits the obtained intermediate result to the master arithmetic unit through the branch arithmetic unit.

In one embodiment, as shown in fig. 18, a neural network operation method is also provided. The neural network operation method may be performed by the operation device 20 in the above embodiment, the operation device 20 forwards data between the master operation unit 300 and the slave operation unit 400 through the branch operation unit 500, and the method includes:

In step S501, the main arithmetic unit acquires broadcast data and distribution data, and divides the distribution data into a pieces of distribution sub-data, where a is a positive integer. Optionally, the broadcast data may be neuron data, weight data, or the like. Optionally, the distribution data may be neuron data, weight data, or the like.

In step S502, the main arithmetic unit uses the data compression method in the above embodiment to block and compress the broadcast data and each piece of distribution sub-data separately, obtaining the broadcast compressed data and a pieces of distribution compressed data.

Specifically, the encoding circuit of the main arithmetic unit uses the data compression method in the above embodiment to block and compress the broadcast data and each piece of distribution sub-data separately, obtaining the broadcast compressed data and a pieces of distribution compressed data. For the broadcast data, the encoding circuit divides the broadcast data into a plurality of data blocks according to the characteristics of the broadcast data, and then compresses each of these data blocks to obtain a data header and a data body corresponding to each data block of the broadcast data. Finally, the encoding circuit obtains a header section of the broadcast compressed data from the obtained data headers, obtains a data section of the broadcast compressed data from the obtained data bodies, and obtains the broadcast compressed data from the header section and the data section of the broadcast compressed data. For the distribution data, the encoding circuit divides a given piece of distribution sub-data among the a pieces of sub-data into a plurality of data blocks according to the characteristics of the distribution data. Then, the encoding circuit compresses each of these data blocks to obtain a distribution data header and a distribution data body corresponding to each data block of that piece of distribution sub-data. Finally, the encoding circuit obtains a header section of the distribution compressed data from the obtained distribution data headers, obtains a data section of the distribution compressed data from the obtained distribution data bodies, and obtains the distribution compressed data from the header section and the data section of the distribution compressed data. By compressing each of the a pieces of sub-data in this way, a pieces of distribution compressed data are obtained.

In step S503, the master arithmetic unit distributes the obtained a pieces of distribution compressed data to the a pieces of slave arithmetic units, and broadcasts the obtained broadcast compressed data to the a pieces of slave arithmetic units.

Alternatively, the master operation unit 300 distributes the resultant a pieces of distribution compressed data to the a pieces of slave operation units 400 through the branch operation unit 500, and broadcasts the resultant broadcast compressed data to the a pieces of slave operation units 400 through the branch operation unit 500. Alternatively, if the number of slave operation units 400 to which the branch operation unit 500 is connected is equal to or greater than a, the master operation unit 300 may distribute the obtained a pieces of distribution compressed data to the a pieces of slave operation units 400 through one or more branch operation units 500, and broadcast the obtained broadcast compressed data to the a pieces of slave operation units 400 through one or more branch operation units 500. Alternatively, if the number of slave arithmetic units 400 to which the branch arithmetic unit 500 is connected is smaller than a, the master arithmetic unit 300 may distribute the resultant a pieces of distribution compressed data to the a pieces of slave arithmetic units 400 through the plurality of branch arithmetic units 500, and broadcast the resultant broadcast compressed data to the a pieces of slave arithmetic units 400 through the plurality of branch arithmetic units 500.

Optionally, the master operation unit 300 distributes the obtained a pieces of distribution compressed data to the a slave operation units 400 through the k slave operation units 400 connected to the master operation unit 300, and broadcasts the obtained broadcast compressed data to the a slave operation units 400 through the k slave operation units 400 connected to the master operation unit 300.

In a specific application, whether data is transferred between the master operation unit 300 and the plurality of slave operation units 400 through the k slave operation units 400 connected to the master operation unit 300 or through the branch operation unit 500 should be determined according to the specific structure of the operation device, and is not particularly limited in the present application.

In step S504, each slave arithmetic unit decompresses the broadcast compressed data and the corresponding distribution compressed data to obtain broadcast decompressed data and distribution decompressed data. Specifically, the encoding circuit of each slave arithmetic unit decompresses the broadcast compressed data and the corresponding distribution compressed data to obtain the broadcast decompressed data and the distribution decompressed data.

In step S505, each slave arithmetic unit performs an arithmetic operation using the corresponding broadcast decompressed data and distribution decompressed data, and obtains an intermediate result. Alternatively, the identity information of the slave arithmetic units may be used to identify intermediate results obtained from each slave arithmetic unit.

In step S506, the a slave arithmetic units respectively send the obtained intermediate results to the master arithmetic unit.

Optionally, the a slave arithmetic units 400 respectively transmit the obtained intermediate results to the master arithmetic unit 300 through the branch operation unit 500. Optionally, the a slave arithmetic units 400 respectively transmit the obtained intermediate results to the master arithmetic unit 300 through the k slave arithmetic units 400 connected to the master arithmetic unit 300.

In step S507, the master operation unit performs an operation using the intermediate results of the a slave operation units to obtain an operation result.

According to the neural network operation method provided by the embodiment, the broadcast data and the distribution data are compressed and then broadcast or distributed, so that the bandwidth requirement of the operation device during neural network operation can be effectively reduced, and the data transmission efficiency among the operation units is improved.

The above neural network operation process is described in detail below by taking a fully-connected operation as an example. The fully-connected operation computes y = f(wx + b), where x is a neuron matrix, w is a weight matrix, b is a bias scalar, and f is an activation function, which may be a sigmoid, tanh, relu, or softmax function. Here, the neuron matrix is used as the broadcast data and the weight matrix is used as the distribution data. The specific operation process is shown in fig. 19 and includes:

In step S601, the main operation unit divides the weight matrix w into a weight submatrices.

In step S602, the main operation unit compresses the neuron matrix x using the data compression method in the above embodiment to obtain neuron compressed data, and compresses the a weight submatrices using the data compression method in the above embodiment to obtain a pieces of weight submatrix compressed data.

In step S603, the master operation unit distributes the obtained a pieces of weight submatrix compressed data to a slave operation units, and broadcasts the obtained neuron compressed data to the a slave operation units.

In step S604, the coding circuit of each slave arithmetic unit decompresses the neuron compressed data and the corresponding weight compressed data to obtain neuron decompressed data and weight decompressed data.

In step S605, each slave operation unit performs multiplication and accumulation on the weight decompressed data and the neuron decompressed data to obtain an intermediate result, so that a intermediate results are obtained in total, and each slave operation unit transmits its intermediate result to the master operation unit.

In step S606, the main arithmetic unit obtains a final arithmetic result according to the a intermediate results. Specifically, the main operation unit 300 first arranges the intermediate results in order to obtain the result of wx, then adds the bias b to this result, and finally performs the activation operation through the activation circuit of the main operation unit 300 to obtain the final operation result y.

In the embodiment, when the operation device performs the full-connection operation, the distributed weight submatrix and the broadcasted neuron matrix are compressed first, and then data is distributed or broadcasted, so that the data transmission efficiency between the master operation unit 300 and the slave operation unit 400 can be improved, and the bandwidth requirement of data transmission between the master operation unit 300 and the slave operation unit 400 in the operation process of the neural network is reduced.
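The fully-connected flow of steps S601 to S606 can be sketched as follows. The sketch makes several simplifying assumptions: x is a single column (a vector) rather than a general neuron matrix, w is split by rows into a equal submatrices, zlib stands in for the block compression method, ReLU is chosen as the activation f, and the slave operation units are modelled as function calls. All function names are hypothetical.

```python
import struct
import zlib

def pack(values):
    """Serialize floats and compress them (stand-in for the encoding circuit)."""
    return zlib.compress(struct.pack(f"<{len(values)}d", *values))

def unpack(blob):
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))

def split_rows(w, a):
    """S601: split the rows of w into a equal, contiguous submatrices."""
    if len(w) % a != 0:
        raise ValueError("row count must be a multiple of a in this sketch")
    size = len(w) // a
    return [w[i:i + size] for i in range(0, len(w), size)]

def slave(compressed_x, compressed_rows, n_cols):
    x = unpack(compressed_x)                               # S604: decompress
    flat = unpack(compressed_rows)
    rows = [flat[i:i + n_cols] for i in range(0, len(flat), n_cols)]
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in rows]  # S605

def fully_connected(w, x, b, a):
    cx = pack(x)                                           # S602: neuron compressed data
    subs = [pack([v for row in sub for v in row]) for sub in split_rows(w, a)]
    partial = [slave(cx, s, len(x)) for s in subs]         # S603 to S605
    wx = [v for part in partial for v in part]             # S606: arrange in order
    return [max(0.0, v + b) for v in wx]                   # add bias, then ReLU (assumed)
```

For example, with w = [[1, 0], [0, 1], [1, 1], [-1, -1]], x = [2, 3], b = 1 and a = 2, each of the two slaves handles two rows of w and the master reassembles the four outputs.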

In one embodiment, as shown in fig. 20, another data compression method is proposed, which can be executed by the encoding circuit 200 in the above embodiment, and includes:

Step S701, dividing the data to be operated on into a plurality of groups according to the number of encoding circuits to obtain a plurality of data to be compressed.

Specifically, the data dividing circuit 210 of the encoding circuit divides the data to be operated on into a plurality of groups according to the number of encoding circuits to obtain a plurality of data to be compressed. Optionally, the encoding circuit may be provided on a device in a cluster, or on a processor of a computer device. The encoding circuit may also be provided on an arithmetic unit of the arithmetic device. Optionally, the number of groups into which the data to be operated on is divided is an integer multiple of the number of encoding circuits.

Step S702, dividing each data to be compressed in the plurality of data to be compressed into a plurality of data blocks according to the characteristics of the data to be compressed.

Specifically, the data dividing circuit 210 of the encoding circuit divides each of the plurality of data to be compressed into a plurality of data blocks according to the characteristics of the data to be compressed, respectively. Optionally, the characteristics of the data to be compressed may include one or more of a total size of the data, a distribution characteristic of the data, an importance degree of the data, and the like. Wherein the importance of the data can be determined according to the frequency of occurrence of the data, the size of the data, and the like. Optionally, a preset value may also be considered when dividing the data to be compressed into a plurality of data blocks. Optionally, the "0" value in the data to be compressed is sifted out before dividing the data to be compressed into a plurality of data blocks.

Step S703, compressing each data block in each data to be compressed, respectively, to obtain a data header and a data body corresponding to each data block in each data to be compressed. The data header includes information such as the start address and the data length of the corresponding data body. The data body contains the encoded data of the corresponding data block. Optionally, the codec circuit 220 may compress the data blocks using Huffman coding, run-length coding, LZ77, any combination thereof, and the like. Optionally, before the codec circuit 220 compresses the individual data blocks, the data to be compressed is pre-processed according to the selected compression algorithm. For example, when Huffman coding is used to compress each data block, the data to be compressed first needs to be sorted to obtain a Huffman tree, and each data block is then compressed based on the Huffman tree.

Step S704, obtaining a header section of the compressed data according to all the obtained data headers, obtaining a data section of the compressed data according to all the obtained data bodies, and obtaining the compressed data according to the header section of the compressed data and the data section of the compressed data.

Specifically, the codec circuit 220 of the encoder circuit 200 obtains a header section of the compressed data from all the obtained data headers, obtains a data section of the compressed data from all the obtained data bodies, and obtains the compressed data from the header section of the compressed data and the data section of the compressed data.

In the data compression method in the above embodiment, the data is first grouped according to the number of encoding circuits, and each group of data to be compressed is then divided into data blocks according to its characteristics. Each data block is then compressed separately to obtain a data header and a data body in one-to-one correspondence with each data block, and a header section and a data section of the compressed data are obtained from these data headers and data bodies, so as to obtain the compressed data. By grouping the data and compressing it in blocks, the method allows the data blocks to be compressed in parallel and improves the compression efficiency.
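The Huffman pre-processing mentioned in step S703 can be sketched as follows: a single code table is built from the symbol frequencies of all the data to be compressed, and each data block is then encoded with that shared table. This is a textbook Huffman construction using a priority queue, offered as an illustration only; the bit strings are kept as Python text for readability, and the names huffman_table, encode_blocks and decode_block are hypothetical.

```python
import heapq
from collections import Counter

def huffman_table(data: bytes) -> dict:
    """Build a symbol -> bit-string table from the symbol frequencies in data."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:                       # single-symbol edge case
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)      # two least frequent subtrees
        w2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def encode_blocks(data: bytes, block_size: int):
    """Encode every block with one table built over the whole input."""
    table = huffman_table(data)
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return table, ["".join(table[b] for b in block) for block in blocks]

def decode_block(bits: str, table: dict) -> bytes:
    """Decode one block independently, relying on the prefix-free property."""
    inverse = {code: sym for sym, code in table.items()}
    out, cur = bytearray(), ""
    for bit in bits:
        cur += bit
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return bytes(out)
```

Because the table is shared, each encoded block can be decoded on its own once the table is known, which fits the block-parallel scheme described above.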

In one optional embodiment, as shown in fig. 21, step S704 includes:

Step S7041a: using identification bits to identify the correspondence between the data header and the data body corresponding to each data block. Specifically, the codec circuit 220 of the encoder circuit 200 uses the identification bits to identify the correspondence between the data header and the data body corresponding to each data block.

Step S7042a: combining the data headers containing the identification bits to obtain a header section of the compressed data, combining the data bodies containing the identification bits to obtain a data section of the compressed data, and combining the header section of the compressed data and the data section of the compressed data to obtain the compressed data.

Specifically, the codec circuit 220 of the encoder circuit 200 combines the data headers containing the identification bits to obtain a header section of the compressed data, combines the data bodies containing the identification bits to obtain a data section of the compressed data, and combines the header section of the compressed data and the data section of the compressed data to obtain the compressed data.

In another alternative embodiment, as shown in fig. 22, step S704 includes:

Step S7041b: obtaining the placement format of each data header according to the positional relationship among the data to be compressed and the positional relationship between each data block and the other data blocks in each data to be compressed.

Specifically, the codec circuit 220 of the encoder circuit 200 obtains the placement format of each data header according to the positional relationship among the data to be compressed and the positional relationship between each data block and the other data blocks in each data to be compressed. Optionally, the placement format of each data header may be consistent with the position of the corresponding data block in the data to be operated on. Position consistency means that the relative position of each data block within its data to be compressed is preserved, and that the relative position of each data to be compressed with respect to the other data to be compressed is also preserved.

Step S7042b: placing each data header according to the obtained placement format of the data headers to obtain a header section of the compressed data; placing each data body according to a first preset placement format to obtain a data section of the compressed data; and combining the header section of the compressed data and the data section of the compressed data to obtain the compressed data.

Specifically, the compression/decompression circuit 220 of the encoding circuit 200 places each data header according to the obtained placement format of the data headers to obtain a header section of the compressed data, places each data body according to a first preset placement format to obtain a data section of the compressed data, and combines the header section of the compressed data and the data section of the compressed data to obtain the compressed data.

Optionally, the codec circuit 220 splices the obtained header section of each compressed data with the corresponding data section of each compressed data to obtain each compressed data. Optionally, the first preset placement format of the data bodies in the data section of each compressed data may be a one-dimensional compact, two-dimensional compact, or any other dimensional compact placement of the data bodies corresponding to the data blocks. Optionally, the correspondence between each data body in the data section of the compressed data and each data header in the header section of the compressed data may be identified by setting identification bits.

Optionally, the data to be compressed may be data to be transmitted between devices in a cluster, data to be transmitted between a plurality of processors, or data to be transmitted between the operation units of an operation device, for example input data to be acquired by the operation device. The following takes the data to be transmitted between the operation units of the operation device as an example to describe the specific compression process and application of the data compression method in the above embodiment.

As an alternative implementation, as shown in fig. 23, a data compression method is proposed, where the method is executed by the computing device 20, and specifically includes:

In step S801, the main arithmetic unit receives input data and compresses the input data using the data compression method in the above embodiment to obtain a plurality of compressed data.

Specifically, the encoding circuit of the master operation unit 300 receives input data, and then groups the input data by the number of slave operation units 400. It should be clear that the number of slave arithmetic units 400 is the number of slave arithmetic units 400 that perform the data compression operation.

In step S802, the master arithmetic unit distributes the obtained plurality of data to be compressed to the plurality of slave arithmetic units.

Optionally, the master operation unit 300 may attach the identity of a slave operation unit 400 to each of the obtained data to be compressed, and distribute the data to be compressed according to the identity.

In step S803, the encoding circuit of each slave arithmetic unit divides the data to be compressed into blocks according to the characteristics of the received data to be compressed, and obtains a plurality of data blocks of the data to be compressed.

Step S804, the encoding circuit of each slave arithmetic unit compresses the obtained multiple data blocks to obtain a data header and a data body corresponding to each data block in each data to be compressed.

In step S805, each slave arithmetic unit transmits the obtained data headers and data bodies to the master arithmetic unit. The master arithmetic unit obtains a header section of the compressed data according to all the obtained data headers, obtains a data section of the compressed data according to all the obtained data bodies, and obtains the compressed data according to the header section of the compressed data and the data section of the compressed data.

Optionally, the encoding circuit 200 of the main operation unit 300 uses identification bits to identify the correspondence between the data header and the data body corresponding to each data block. Then, the encoding circuit 200 of the main operation unit 300 combines the data headers containing the identification bits to obtain a header section of the compressed data, combines the data bodies containing the identification bits to obtain a data section of the compressed data, and combines the header section of the compressed data and the data section of the compressed data to obtain the compressed data.

Optionally, the encoding circuit 200 of the main operation unit 300 first determines the placement format of each data header according to the positional relationship among the data to be compressed and the positional relationship between each data block and the other data blocks within each data to be compressed. The encoding circuit 200 then places each data header according to that placement format to obtain the header section of the compressed data, places each data body according to a first preset placing format to obtain the data section of the compressed data, and combines the header section and the data section to obtain the compressed data.

The data compression method in the embodiment can perform packet compression on input data in parallel, and improves the data compression efficiency.

As an alternative embodiment, as shown in fig. 24, another data processing method is proposed, where the method is executed by the computing device 20, and specifically includes:

in step S901, the main arithmetic unit receives input data and compresses the input data by using the data compression method in the above embodiment, so as to obtain a plurality of compressed data.

Specifically, after receiving the input data, the encoding circuit of the master operation unit 300 groups the input data according to the number of the slave operation units 400 to obtain a plurality of data to be compressed. The master operation unit 300 distributes the obtained plurality of data to be compressed to a plurality of slave processing units to perform parallel block compression to obtain a plurality of data heads and data bodies. The slave processing unit sends the obtained data heads and the data bodies to the main processing circuit, the main processing circuit obtains a head section of compressed data according to all the obtained data heads, obtains a data section of the compressed data according to all the obtained data bodies, and obtains the compressed data according to the head section of the compressed data and the data section of the compressed data.

In step S902, the master operation unit transmits the obtained compressed data to the plurality of slave operation units.

Optionally, the master operation unit 300 transmits the resulting compressed data to the plurality of slave operation units 400 through the branch operation unit 500. Alternatively, the master operation unit 300 transmits the resulting compressed data to the plurality of slave operation units 400 through the k slave operation units 400 connected to the master operation unit 300. It should be noted that, in a specific application, whether the data transfer between the master operation unit 300 and the plurality of slave operation units 400 is performed through the k slave operation units 400 connected to the master operation unit 300 or through the branch operation unit 500 should be determined according to the specific structure of the operation device, and the present application is not particularly limited in this respect.

In step S903, the plurality of slave encoder circuits decompress the received compressed data to obtain decompressed data.

In step S904, the multiplication unit of each slave unit performs multiplication using the decompressed data to obtain an intermediate result, and transmits the intermediate result to the master unit.

Optionally, each slave arithmetic unit 400 sends the resulting intermediate result to the master arithmetic unit 300 through the branch arithmetic unit 500. Alternatively, each slave arithmetic unit 400 transmits the obtained intermediate result to the master arithmetic unit 300 through the k slave arithmetic units 400 connected to the master arithmetic unit 300. In a specific application, whether the data transfer between the master operation unit 300 and the plurality of slave operation units 400 is performed through the k slave operation units 400 connected to the master operation unit 300 or through the branch operation unit 500 should be determined according to the specific structure of the operation device, and the present application is not particularly limited in this respect.

In step S905, the master operation unit performs an accumulation and activation operation using the intermediate result to obtain an operation result.

In one embodiment, as shown in fig. 25, a neural network operation method is also provided. The neural network operation method may be performed by the operation device 20 in the above embodiment, the operation device 20 forwards data between the master operation unit 300 and the slave operation unit 400 through the branch operation unit 500, and the method includes:

in step S1001, the main arithmetic unit acquires broadcast data and distribution data, and divides the distribution data into a pieces of distribution sub-data. Optionally, the broadcast data may be neuron data, weights, or the like. Optionally, the distribution data may be neuron data, weights, or the like.

In step S1002, the arithmetic device compresses the broadcast data and the distribution sub-data respectively by using the data compression method in any of the embodiments described above, so as to obtain the broadcast compressed data and a pieces of distribution compressed data.

Specifically, the main arithmetic unit 300 of the arithmetic device 20 compresses the broadcast data by using the data compression method in the above embodiment to obtain the broadcast compressed data. In more detail, the encoding circuit of the master operation unit 300 divides the broadcast data into a plurality of groups according to the number of slave operation units 400, obtaining a plurality of data to be compressed. The master operation unit 300 distributes the plurality of data to be compressed to the plurality of slave operation units 400. Each slave arithmetic unit 400 performs block compression on the data to be compressed that it receives, obtaining a plurality of data headers and data bodies, and transmits them to the main processing unit. The main processing unit obtains a header section of the compressed data from all the obtained data headers, obtains a data section of the compressed data from all the obtained data bodies, and obtains the broadcast compressed data from the header section and the data section. The a pieces of distribution sub-data are processed in the same way to obtain a pieces of distribution compressed data.

In step S1003, the master arithmetic unit distributes the obtained a pieces of distribution compressed data to a slave arithmetic units, and broadcasts the obtained broadcast compressed data to the a slave arithmetic units.

Specifically, the master operation unit 300 distributes the resultant a pieces of distribution compressed data to the a pieces of slave operation units 400 through the branch operation unit 500, and the master operation unit 300 broadcasts the resultant broadcast compressed data to the a pieces of slave operation units 400 through the branch operation unit 500. Alternatively, the master operation unit 300 distributes the obtained a pieces of distribution compressed data to the a pieces of slave operation units 400 through the k pieces of slave operation units 400 connected to the master operation unit 300, and the master operation unit 300 broadcasts the obtained broadcast compressed data to the a pieces of slave operation units 400 through the k pieces of slave operation units 400 connected to the master operation unit 300.

In step S1004, the encoding circuit of each slave arithmetic unit decompresses the broadcast compressed data and the corresponding distribution compressed data to obtain broadcast decompressed data and distribution decompressed data.

In step S1005, each slave arithmetic unit 400 performs arithmetic operation using the corresponding broadcast decompressed data and distribution decompressed data to obtain an intermediate result, and transmits the obtained intermediate result to the main processing unit.

In step S1006, the main arithmetic unit 300 performs an arithmetic operation using the received intermediate result, and obtains an arithmetic result.

The operation method of the neural network in the embodiment has high data compression efficiency and low requirement on the bandwidth of data transmission of the operation device.

The above neural network operation process is described below by taking a fully-connected operation as an example. The fully-connected operation computes y = f(wx + b), where x is a neuron matrix, w is a weight matrix, b is a bias scalar, and f is an activation function, which may be a sigmoid, tanh, ReLU, or softmax function. Here, the neuron matrix is used as the broadcast data and the weight matrix is used as the distribution data. The specific operation process is shown in fig. 26 and includes:

in step S1101, the main operation unit divides the weight matrix w into a weight sub-matrices.

Step S1102, the compression/decompression circuit of the main operation unit compresses the neuron matrix x and the a weight sub-matrices respectively by using the data compression method in the above embodiment, so as to obtain neuron compressed data and a pieces of sub-matrix compressed data.

In step S1103, the master operation unit distributes the obtained a pieces of sub-matrix compressed data to a slave operation units, and broadcasts the obtained neuron compressed data to the a slave operation units.

In step S1104, the coding circuit of each slave arithmetic unit decompresses the neuron compressed data and the corresponding sub-matrix compressed data to obtain neuron decompressed data and weight decompressed data.

In step S1105, each slave arithmetic unit performs an operation using the neuron decompressed data and the weight decompressed data to obtain an intermediate result, and transmits the obtained intermediate result to the master arithmetic unit.

In step S1106, the main operation unit obtains a final operation result according to the received intermediate results. Specifically, the main operation unit 300 first orders the a received intermediate results to obtain the result of wx, then applies the bias b to that result, and finally performs the activation operation through the activation circuit of the main operation unit 300 to obtain the final operation result y.
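
The arithmetic of steps S1101 to S1106 can be sketched as follows, with compression and decompression omitted for brevity. The sketch makes several assumptions that the specification does not state: the weight matrix is split by rows, the row count is divisible by the number of sub-matrices, and ReLU is chosen as the activation function. The helper names `matmul`, `split_rows`, and `fully_connected` are illustrative only.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x k) and b (k x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def split_rows(w: Matrix, parts: int) -> List[Matrix]:
    """Split w into `parts` row blocks (assumes len(w) is divisible by parts)."""
    step = len(w) // parts
    return [w[i * step:(i + 1) * step] for i in range(parts)]


def fully_connected(w: Matrix, x: Matrix, b: float, parts: int) -> Matrix:
    # Each simulated slave unit multiplies its weight sub-matrix by the broadcast x.
    intermediate = [matmul(sub, x) for sub in split_rows(w, parts)]
    # The simulated master unit orders the partial results to rebuild wx ...
    wx = [row for block in intermediate for row in block]
    # ... then applies the bias b and the activation (ReLU in this sketch).
    return [[max(0.0, v + b) for v in row] for row in wx]


w = [[1.0, 2.0], [3.0, 4.0], [-1.0, 0.0], [0.0, -1.0]]
x = [[1.0], [1.0]]
y = fully_connected(w, x, b=0.5, parts=2)  # [[3.5], [7.5], [0.0], [0.0]]
```

Splitting by rows means each intermediate result is a contiguous slice of wx, so ordering the results reduces to concatenation.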

In one embodiment, as shown in fig. 27, a data decompression method is provided. The data decompression method can be executed by the encoding circuit in the above embodiment and is used for decompressing compressed data obtained by the data compression method. The method includes:

step S1201, obtaining compressed data, where the compressed data includes a header section and a data section. The header section comprises a plurality of data headers, and the data section comprises a plurality of data bodies corresponding to the data headers.

Specifically, the encoding circuit obtains compressed data, wherein the compressed data includes a header section and a data section. The header section comprises a plurality of data headers, and the data section comprises a plurality of data bodies corresponding to the data headers. Each data header includes information such as the start address and the data length of the corresponding data body. Each data body contains the encoded data of the corresponding data block before compression.

Optionally, the placement format of the data bodies in the data section of the compressed data may be a one-dimensional compact, two-dimensional compact, or any-dimensional compact arrangement of the data bodies corresponding to the data blocks. Optionally, each data header in the header section of the compressed data is placed according to the relative positions of the data blocks contained in the data before compression.

Step S1202, decomposing the compressed data to obtain a plurality of data blocks to be decompressed, where the data blocks to be decompressed include a data header and a corresponding data body.

Specifically, the data dividing circuit 210 of the encoding circuit decomposes the compressed data to obtain a plurality of data blocks to be decompressed, where the data blocks to be decompressed include a data header and a corresponding data body.

Optionally, if the data header and the data body include an identification bit identifying a corresponding relationship, the data header and the data body in each data block to be decompressed are determined according to a numerical value of the identification bit.
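
The decomposition of step S1202 can be illustrated with the sketch below, which pairs each data header with its data body using the start address and data length stored in the header. The byte layout is an assumption made for the example and is not defined by the specification: a 4-byte header count, then fixed-size headers of (start, length), then a one-dimensional compact data section.

```python
import struct
from typing import List, Tuple

# Hypothetical header layout: start offset and length of the data body.
HEADER_FMT = "<II"
HEADER_SIZE = struct.calcsize(HEADER_FMT)


def decompose(compressed: bytes) -> List[Tuple[Tuple[int, int], bytes]]:
    """Return (header, body) pairs, where header = (start, length)."""
    (count,) = struct.unpack_from("<I", compressed, 0)
    data_start = 4 + count * HEADER_SIZE  # the data section follows the headers
    blocks = []
    for i in range(count):
        start, length = struct.unpack_from(
            HEADER_FMT, compressed, 4 + i * HEADER_SIZE)
        body = compressed[data_start + start:data_start + start + length]
        blocks.append(((start, length), body))
    return blocks


# Two headers followed by a compact data section holding both bodies.
sample = (struct.pack("<I", 2)
          + struct.pack(HEADER_FMT, 0, 3) + struct.pack(HEADER_FMT, 3, 2)
          + b"abcde")
blocks = decompose(sample)
```

Each returned pair corresponds to one data block to be decompressed and can be processed independently of the others.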

In step S1203, each data block to be decompressed is decompressed by using a preset compression and decompression algorithm, so as to obtain a decompressed data block.

Specifically, the codec 220 decompresses each data block to be decompressed using the preset compression and decompression algorithm to obtain a decompressed data block. Optionally, the codec 220 may decompress the individual blocks using Huffman coding, run-length coding, LZ77, or any combination thereof. It should be noted that the coding scheme used to decompress the compressed data must be the same as the one used to produce the compressed data.
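
As an illustration of block-wise decompression with a single preset algorithm, the sketch below uses Python's standard `zlib` module as a stand-in. This is a substitution made for the example, not the codec of the specification: zlib implements DEFLATE, which combines LZ77 with Huffman coding. The function name `decompress_blocks` is hypothetical.

```python
import zlib
from typing import List


def decompress_blocks(bodies: List[bytes]) -> List[bytes]:
    """Decompress every data body independently with the same codec."""
    return [zlib.decompress(body) for body in bodies]


# The same codec is used on both sides, as the text above requires.
originals = [b"aaaaaaaabbbb", b"neuron data", b""]
bodies = [zlib.compress(block) for block in originals]
restored = decompress_blocks(bodies)
```

Because each body is a complete stream on its own, no block depends on the decoder state of another block.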

Step S1204, placing the decompressed data blocks according to a second preset placing format to obtain decompressed data.

Optionally, the second preset placing format may be obtained according to data before compression of the compressed data. Further, the second preset placing format may be obtained according to a position relationship between data blocks included before compression of the compressed data.
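
The placement of step S1204 can be sketched for the two-dimensional case as follows. The sketch assumes, purely for illustration, that the second preset placing format is a grid of equally sized tiles listed in row-major order; the specification leaves the format open, and `place_blocks` and `grid_cols` are hypothetical names.

```python
from typing import List

Tile = List[List[int]]


def place_blocks(tiles: List[Tile], grid_cols: int) -> Tile:
    """Stitch equally sized tiles, listed in row-major order, into one matrix."""
    result: Tile = []
    for first in range(0, len(tiles), grid_cols):
        row_of_tiles = tiles[first:first + grid_cols]
        for r in range(len(row_of_tiles[0])):
            # Concatenate row r of every tile in this grid row.
            result.append([v for tile in row_of_tiles for v in tile[r]])
    return result


# Four 1 x 2 decompressed blocks placed on a 2 x 2 grid.
tiles = [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]]
matrix = place_blocks(tiles, grid_cols=2)  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

The grid shape here mirrors the positional relationship the blocks had before compression, which is what the placing format is derived from.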

In the data decompression method of this embodiment, the compressed data is first decomposed to obtain data blocks to be decompressed, each comprising a data header and its corresponding data body; each data block to be decompressed is then decompressed by a conventional decompression method to obtain a decompressed data block; and finally the decompressed data is obtained from the decompressed data blocks. The method converts the compressed data comprising the header section and the data section into data which can be decompressed by the traditional decompression method, and is simple to implement. The method also decompresses the compressed data in blocks, can realize parallel decompression and improve the decompression efficiency.

In one embodiment, as shown in fig. 28, another data decompression method is proposed for decompressing compressed data obtained by the data compression method, and the method includes:

step S1301, obtaining compressed data, where the compressed data includes a header section and a data section. The header section comprises a plurality of data headers, and the data section comprises a plurality of data bodies corresponding to the data headers.

Specifically, the encoding circuit 200 obtains compressed data, wherein the compressed data includes a header section and a data section. The header section comprises a plurality of data headers, and the data section comprises a plurality of data bodies corresponding to the data headers. Each data header includes information such as the start address and the data length of the corresponding data body. Each data body contains the encoded data of the corresponding data block before compression.

Step S1302, decomposing the compressed data to obtain a plurality of data blocks to be decompressed, where the data blocks to be decompressed include a data header and a corresponding data body.

Specifically, the data dividing circuit of the encoding circuit 200 decomposes the compressed data to obtain a plurality of data blocks to be decompressed, where the data blocks to be decompressed include a data header and a corresponding data body.

Step S1303, grouping the obtained multiple data blocks to be decompressed according to the number of the encoding circuits, so as to obtain multiple data groups to be decompressed.

Step S1304, distributing the obtained data groups to be decompressed to the plurality of encoding circuits, where each encoding circuit decompresses the data blocks to be decompressed in its received data group according to a preset compression and decompression algorithm to obtain a plurality of decompressed data blocks.

Step S1305, placing the decompressed data blocks according to a second preset placing format to obtain decompressed data.

In the data decompression method of this embodiment, the compressed data is first decomposed to obtain data blocks to be decompressed, each comprising a data header and its corresponding data body. The data blocks to be decompressed are then grouped according to the number of encoding circuits, each data block to be decompressed is decompressed by a conventional decompression method to obtain a decompressed data block, and finally the decompressed data is obtained from the decompressed data blocks. The method converts the compressed data comprising the header section and the data section into data which can be decompressed by the traditional decompression method, and is simple to implement. The method also decompresses the compressed data in blocks, can realize parallel decompression and improve the decompression efficiency.
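
A software analogue of steps S1303 to S1305 is sketched below. It is a minimal illustration under stated assumptions: worker threads stand in for the encoding circuits, `zlib` stands in for the preset compression and decompression algorithm, grouping is round-robin, and the second preset placing format is simple one-dimensional concatenation in block order. None of these choices is prescribed by the specification.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Block = Tuple[int, bytes]  # (block index, compressed data body)


def decompress_group(group: List[Block]) -> List[Block]:
    """Work done by one simulated encoding circuit."""
    return [(index, zlib.decompress(body)) for index, body in group]


def parallel_decompress(blocks: List[Block], num_circuits: int) -> bytes:
    # Round-robin grouping: circuit i receives blocks i, i + n, i + 2n, ...
    groups = [blocks[i::num_circuits] for i in range(num_circuits)]
    with ThreadPoolExecutor(max_workers=num_circuits) as pool:
        results = list(pool.map(decompress_group, groups))
    # Restore the original block order before concatenating (1-D placement).
    ordered = sorted(item for group in results for item in group)
    return b"".join(data for _, data in ordered)


parts = [b"header-", b"and-", b"body-", b"blocks"]
blocks = [(i, zlib.compress(p)) for i, p in enumerate(parts)]
output = parallel_decompress(blocks, num_circuits=3)
```

Carrying the block index alongside each body lets the groups finish in any order without affecting the final placement.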

The following describes the data decompression method specifically by taking as an example how the arithmetic device 20 executes the steps of the data decompression method, and as shown in fig. 29, the data decompression method includes:

step S1401: the main operation unit acquires compressed data, wherein the compressed data comprises a header section and a data section. The header section comprises a plurality of data headers, and the data section comprises a plurality of data bodies corresponding to the data headers.

Step S1402: the coding circuit of the main operation unit decomposes the compressed data to obtain a plurality of data blocks to be decompressed, wherein the data blocks to be decompressed comprise a data head and a corresponding data body.

Step S1403: the coding circuit of the main operation unit groups the obtained data blocks to be decompressed according to the number of the slave operation units to obtain a plurality of data groups to be decompressed.

Step S1404: the master operation unit distributes the plurality of data groups to be decompressed to the plurality of slave processing units.

Optionally, the master operation unit 300 transmits the resulting plurality of data groups to be decompressed to the plurality of slave operation units 400 through the branch operation unit 500. Alternatively, the master operation unit 300 transmits the resulting plurality of data groups to be decompressed to the plurality of slave operation units 400 through the k slave operation units 400 connected to the master operation unit 300. It should be noted that, in a specific application, whether the data transfer between the master operation unit 300 and the plurality of slave operation units 400 is performed through the k slave operation units 400 connected to the master operation unit 300 or through the branch operation unit 500 should be determined according to the specific structure of the operation device, and the present application is not particularly limited in this respect.

Step S1405: the coding circuit of each slave processing unit decompresses the data blocks to be decompressed in the received data group to be decompressed according to a preset compression and decompression algorithm to obtain a plurality of decompressed data blocks.

Step S1406: each slave processing unit sends the resulting plurality of decompressed data blocks to the master processing circuit.

Optionally, each slave operation unit 400 transmits the resulting plurality of decompressed data blocks to the master operation unit 300 through the branch operation unit 500. Alternatively, each slave arithmetic unit 400 transmits the resulting plurality of decompressed data blocks to the master arithmetic unit 300 through the k slave arithmetic units 400 connected to the master arithmetic unit 300. In a specific application, whether the data transfer between the master operation unit 300 and the plurality of slave operation units 400 is performed through the k slave operation units 400 connected to the master operation unit 300 or through the branch operation unit 500 should be determined according to the specific structure of the operation device, and the present application is not particularly limited in this respect.

Step S1407: the decompressed data blocks are placed according to a second preset placing format to obtain decompressed data. Optionally, the second preset placing format may be obtained according to the data before compression of the compressed data. Further, the second preset placing format may be obtained according to the positional relationship between the data blocks contained in the data before compression.

The data decompression method in the above embodiment uses a plurality of slave processing circuits to decompress compressed data in parallel, thereby improving the data decompression efficiency.

It should be understood that, although the steps in the flowcharts of fig. 2, 5, 7, 9, 11-13, 15, 18-29 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited to the order shown, and they may be performed in other orders. Moreover, at least some of the steps in fig. 2, 5, 7, 9, 11-13, 15, 18-29 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and that are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).

The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above-mentioned embodiments express only several embodiments of the present application. Their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A method of data decompression, comprising:

the method comprises the steps that a main operation unit obtains compressed data, wherein the compressed data comprises a header section and a header section data section, the header section comprises a plurality of data heads, the header section data section comprises a plurality of data bodies corresponding to the data heads, the data heads comprise the initial addresses and the data lengths of the corresponding data bodies, and the data bodies comprise encoded data of corresponding data blocks before compression;

the main operation unit decomposes the compressed data based on at least one of the total size of the data, the data distribution characteristics and the importance degree of the data to obtain a plurality of data blocks to be decompressed, wherein the data blocks to be decompressed comprise a data head and a corresponding data body;

the master operation unit distributes the data blocks to be decompressed to the slave operation units; the slave operation unit decompresses each data block to be decompressed by using a preset compression and decompression algorithm to obtain a decompressed data block; the decompressed data blocks are used for placing data heads in the decompressed data blocks according to a second preset placing format to obtain decompressed head sections, placing data bodies in the decompressed data blocks according to the second preset placing format to obtain decompressed head section data sections, and obtaining decompressed data based on the decompressed head sections and the decompressed head section data sections; the slave operation unit carries out operation based on the decompressed data to obtain an intermediate result, and the intermediate result is sent to the main operation unit;

and the main operation unit operates the intermediate result to obtain an operation result, the operation result is used for determining whether the intermediate result is matched with a preset operation result, and if the intermediate result is matched with the preset operation result, the flow of the data decompression method is stopped.

2. The method according to claim 1, wherein the second predetermined layout format is obtained according to a position relationship between data blocks included before the compression of the compressed data.

3. The method according to claim 1, wherein the decomposing of the compressed data by the main arithmetic unit based on at least one of the total size of the data, the distribution characteristics of the data, and the importance of the data to obtain a plurality of data blocks to be decompressed comprises:

and if the data head and the data body contain identification bits identifying corresponding relations, the main operation unit determines the data head and the data body in each data block to be decompressed according to the numerical value of the identification bits based on at least one of the total size of data, the data distribution characteristics and the importance degree of the data.

4. The method according to claim 1, wherein the preset compression and decompression algorithm comprises: Huffman coding, run-length coding, and LZ77.

5. A method of data decompression, comprising:

the main operation unit groups the obtained data blocks to be decompressed according to the number of the coding circuits to obtain a plurality of data groups to be decompressed;

the main operation unit distributes the obtained data group to be decompressed to coding circuits of a plurality of slave operation units, so that the coding circuits decompress the received data blocks to be decompressed in the data group to be decompressed according to a preset compression and decompression algorithm to obtain a plurality of decompressed data blocks; the decompressed data blocks are used for placing data heads in the decompressed data blocks according to a second preset placing format to obtain decompressed head sections, placing data bodies in the decompressed data blocks according to the second preset placing format to obtain decompressed head section data sections, and obtaining decompressed data based on the decompressed head sections and the decompressed head section data sections; the slave operation unit carries out operation based on the decompressed data to obtain an intermediate result, and the intermediate result is sent to the main operation unit;

6. The method according to claim 5, wherein the main arithmetic unit groups the obtained data blocks to be decompressed according to the number of coding circuits to obtain a plurality of data groups to be decompressed, and comprises:

if the number of the coding circuits is n, the main operation unit divides the data blocks to be decompressed into m groups, wherein m is an integral multiple of n.

7. The method according to claim 5, wherein the decomposing the compressed data into a plurality of data blocks to be decompressed by the main arithmetic unit based on at least one of a total data size, a data distribution characteristic, and a degree of importance of the data comprises:

8. The method of claim 5, wherein the second predetermined layout format is obtained according to a position relationship between data blocks included before the compression of the compressed data.

9. The method of claim 1 or 5, wherein the layout format of each data volume in the header data segment of the compressed data is one-dimensional compact, two-dimensional compact, or compact in any dimension.

10. An encoding circuit, comprising: a data dividing circuit and a compression/decompression circuit connected with each other,

the data dividing circuit is used for acquiring compressed data through a main operation unit, wherein the compressed data comprises a header section and a header section data section, the header section comprises a plurality of data heads, the header section data section comprises a plurality of data bodies corresponding to the data heads, the data heads comprise the initial addresses and the data lengths of the corresponding data bodies, and the data bodies comprise the coded data of the corresponding data blocks before compression; decomposing the compressed data by the main operation unit based on at least one of the total size of the data, the data distribution characteristics and the importance degree of the data to obtain a plurality of data blocks to be decompressed, wherein the data blocks to be decompressed comprise a data head and a corresponding data body; distributing the plurality of data blocks to be decompressed to a slave arithmetic unit through the master arithmetic unit;

the compression and decompression circuit is used for decompressing, through the slave operation unit, each data block to be decompressed by using a preset compression and decompression algorithm to obtain a decompressed data block; the decompressed data blocks are used for placing data heads in the decompressed data blocks according to a second preset placing format to obtain decompressed head sections, placing data bodies in the decompressed data blocks according to the second preset placing format to obtain decompressed head section data sections, obtaining decompressed data based on the decompressed head sections and the decompressed head section data sections, calculating based on the decompressed data through the slave operation unit to obtain intermediate results, sending the intermediate results to the main operation unit, calculating the intermediate results through the main operation unit to obtain operation results, determining whether the operation results are matched with preset operation results, and stopping a data decompression flow if the operation results are matched with the preset operation results.