CN111385580A - Data decompression device and related product - Google Patents

Data decompression device and related product

Publication number
CN111385580A
Authority
CN
China
Prior art keywords
data
decompression
circuit
decoding
pipeline
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811609579.6A
Other languages
Chinese (zh)
Inventor
不公告发明人
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd
Priority to CN201811609579.6A
Priority to PCT/CN2019/121056 (WO2020114283A1)
Publication of CN111385580A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/423 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements

Abstract

The application relates to a data decompression device and a related product, comprising: at least one decompression pipeline, and each decompression pipeline includes at least two stages of pipelined decompression data units, the pipelined decompression data units including: a decoding circuit, a selection circuit and a bypass channel; the decompression pipeline is used for realizing multi-stage decompression processing on input data; the selection circuit is used for determining input data output to a decoding circuit in the next stage of the pipeline data decompression unit according to the input control signal. The data decompression device provided by the application can realize the decompression processing of the data by flexibly configuring the decoding mode, and improves the decompression accuracy of the data.

Description

Data decompression device and related product
Technical Field
The present invention relates to the field of information processing technologies, and in particular, to a data decompression device and a related product.
Background
With the continuous development of information technology, and of machine learning algorithms in particular, data volumes keep growing, and the demands placed on data access speed and data processing speed during data transmission and processing keep rising.
At present, to meet these demands, a common decoding method such as Huffman decoding, run-length decoding, LZW decoding, or arithmetic decoding is usually adopted during data transmission and processing, and a hardware decoding circuit corresponding to that method is designed to decompress the data. The decompressed data is thereby restored to the original data as it was before compression, so that a data transmission circuit or a data processing circuit in each application system can operate on the original data. In practical applications, decompression is usually implemented with just one of the decoding circuits described above.
However, this approach often suffers from an inflexible decoding mode and low decompression accuracy.
Disclosure of Invention
The application provides a data decompression device and related products in which the decoding mode can be flexibly configured to decompress data, thereby improving the accuracy of data decompression.
In a first aspect, an embodiment of the present application provides a data decompression apparatus, where the data decompression apparatus includes at least one decompression pipeline, each decompression pipeline includes at least two stages of pipelined decompression data units, and each pipelined decompression data unit includes: a decoding circuit, a selection circuit and a bypass channel; decoding modes of decoding circuits in the pipeline decompression data units of each stage are different; the output end of the decoding circuit is connected with the input end of the selection circuit in the same-stage pipeline decompression data unit on the current decompression pipeline; the output end of the selection circuit is respectively connected with one end of a bypass channel in the next-stage pipeline data decompression unit on the current decompression pipeline and the input end of a decoding circuit in the next-stage pipeline data decompression unit on the current decompression pipeline, and the other end of the bypass channel is connected with the input end of the selection circuit in the next-stage pipeline data decompression unit on the current decompression pipeline;
the decompression pipeline is used for realizing multi-stage decompression processing on input data;
the selection circuit is used for determining input data output to a decoding circuit in the next stage of the pipeline data decompression unit according to the input control signal.
In a second aspect, an embodiment of the present application provides a computing apparatus for performing machine learning calculation, the computing apparatus including an arithmetic unit and a control unit; the arithmetic unit includes: a master processing circuit and a plurality of slave processing circuits; the main processing circuit includes: the data decompression device according to the first aspect, and a main arithmetic circuit; the slave processing circuit includes: the data decompression device according to the first aspect, and a slave arithmetic circuit;
the control unit is used for acquiring original data, an operation instruction and a control instruction and sending the original data, the operation instruction and the control instruction to the main processing circuit;
the master processing circuit is used for performing compression processing on the original data and transmitting data and operation instructions with the plurality of slave processing circuits;
the plurality of slave processing circuits are used for decompressing the data transmitted by the main processing circuit, executing intermediate operations in parallel according to the decompressed data and the operation instruction to obtain a plurality of intermediate results, and sending the plurality of intermediate results to the main processing circuit.
In a third aspect, an embodiment of the present application provides a machine learning chip, which includes the computing device of the second aspect.
In a fourth aspect, an embodiment of the present application provides a chip packaging structure, where the chip packaging structure includes the machine learning chip described in the third aspect.
In a fifth aspect, an embodiment of the present application provides a board card, where the board card includes the chip packaging structure described in the fourth aspect.
In a sixth aspect, an embodiment of the present application provides an electronic device, which includes the board card described in the fifth aspect.
In the data decompression device and the related products, each decompression pipeline in the data decompression device comprises at least two stages of pipeline decompression data units, and the decoding modes of the decoding circuits in the pipeline decompression data units are different, so the data decompression device can realize the multi-stage decompression processing of the data compressed by the multi-stage different compression modes, meanwhile, the selection circuit in the pipeline decompression data units can select whether to output the data output by each decoding circuit or not by setting different control signals, realize the combination of a plurality of decoding circuits, and decompress the input data by adopting the combined decoding circuit, so that the data decompression device can flexibly configure the corresponding decoding modes to decompress the input compressed data according to the compression modes adopted during data compression, thereby improving the accuracy of the decompression.
In addition, the data decompression device comprises at least one decompression pipeline, and can simultaneously decompress a plurality of input parallel data, so that the data decompression device provided by the application can further improve the speed of parallel data processing.
Drawings
Fig. 1 is a schematic diagram of a data decompression device according to an embodiment;
Fig. 2 is a schematic diagram of a data decompression device according to an embodiment;
Fig. 2A is a schematic diagram of a data decompression device according to an embodiment;
Fig. 3 is a schematic diagram of a data decompression device according to an embodiment;
Fig. 4 is a schematic diagram of a data decompression device according to an embodiment;
Fig. 5 is a schematic diagram of a computing device provided by an embodiment;
Fig. 6 is a schematic diagram of a computing device provided by an embodiment;
Fig. 7 is a schematic diagram of a board card according to an embodiment.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Fig. 1 is a schematic diagram of a data decompression device according to an embodiment. The data decompression device is used to decompress received compressed data. As shown in Fig. 1, the data decompression device includes at least one decompression pipeline (01, 02, 03, ...), each comprising at least two stages of pipelined data decompression units (10, 11, 12, ...). Each stage of pipelined data decompression unit includes a decoding circuit (100, 110, 120, ...), a selection circuit (101, 111, 121, ...), and a bypass channel (for example, 112). The output end of the decoding circuit is connected to an input end of the selection circuit in the same-stage pipelined data decompression unit on the current decompression pipeline. The output end of the selection circuit is connected both to one end of the bypass channel in the next-stage pipelined data decompression unit on the current decompression pipeline and to the input end of the decoding circuit in that next-stage unit; the other end of the bypass channel is connected to an input end of the selection circuit in the next-stage unit. The decompression pipeline performs multi-stage decompression on input data, and the selection circuit determines, according to the input control signal, the data output to the decoding circuit in the next-stage pipelined data decompression unit.
It should be noted that the structure shown in Fig. 1 is one option, suited to the following scenario: when the data decompression device receives several pieces of data at the same time and must decompress them simultaneously, it may include several parallel decompression pipelines. Each decompression pipeline may include multiple stages of pipelined data decompression units, cascaded in sequence to perform multi-stage decompression of the data. In this embodiment, the multi-stage units on the parallel decompression pipelines decompress different input data simultaneously, and each decompression pipeline delivers its decompressed result at its own output end.
Optionally, the data decompression device shown in Fig. 1 may instead include only one decompression pipeline, which performs multi-stage decompression on a single piece of input data; such a device is suitable for applications in which data is transmitted serially. The following embodiments are explained in terms of the structure of such a data decompression device.
In the data decompression device, the decoding circuit is configured to decompress the input data in a preset decoding manner and output the decompressed data. The decoding mode may include multiple decoding modes, and a user may select a corresponding decoding mode, that is, a corresponding decoding circuit, according to an actual application requirement, to implement decompression processing on data. In this embodiment, the decoding modes of the decoding circuits in the pipelined data decompression units of each stage are different, so that the data decompression apparatus provided by the present application can select different decoding circuits to perform decompression processing on compressed data according to application requirements. For example, the decoding modes adopted by the decoding circuit 100, the decoding circuit 110, the decoding circuit 120, and the like in the figure are all different, and a user may select only the decoding circuit 100, only the decoding circuit 110, or both the decoding circuit 100 and the decoding circuit 110 according to actual needs.
Optionally, the bypass channel is a hardware line that provides a direct physical connection; it may be a bypass line or, optionally, a pass-through circuit. The bypass channel 112 in this embodiment directly connects the selection circuit 101 of the previous stage to the selection circuit 111 of the present stage, and transmits the output data of the selection circuit 101 to the selection circuit 111.
Optionally, the selection circuit may be a 2-to-1 selector with two data input ports, a control signal input port, and a data output port. According to the control signal received at the control signal input port, the selector gates one of the two data input ports, so that the data output port outputs the data received on that port. The control signal may be a strobe signal that selects between the two data input ports, for example a high/low level signal in which 1 denotes the high level and 0 the low level. Suppose the two data input ports are a 1# port and a 0# port, where the 1# port corresponds to the high level signal 1 and the 0# port to the low level signal 0. The high level signal 1 then causes the selection circuit to output the data on the 1# port, and the low level signal 0 causes it to output the data on the 0# port.
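The behaviour of such a 2-to-1 selector can be sketched in a few lines of Python. This is a software illustration only, not a description of the circuit itself; the function name `select_2to1` and its parameter names are hypothetical.

```python
def select_2to1(port0, port1, control):
    """Model of a 2-to-1 selector.

    control == 1 (high level) gates the 1# port;
    control == 0 (low level) gates the 0# port.
    """
    if control not in (0, 1):
        raise ValueError("control must be 0 (low) or 1 (high)")
    return port1 if control == 1 else port0


# The 0# port carries the bypassed data, the 1# port the decoder output.
print(select_2to1("compressed", "decompressed", 1))
print(select_2to1("compressed", "decompressed", 0))
```

Rejecting any control value other than 0 or 1 mirrors the fact that the control line carries only a high or a low level.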
In practical applications, each stage of pipelined data decompression unit uses its selection circuit to choose whether the data output by the previous stage's selection circuit is decompressed by the current stage's decoding circuit. If the current stage is to decompress the data, the control signal makes the current stage's selection circuit output the data decompressed by the decoding circuit; if not, the control signal makes it output the data carried on the bypass channel.
For example, consider decompression pipeline 01 in the data decompression device shown in Fig. 1. The decoding circuit 100 in the first-stage pipelined data decompression unit 10 decompresses the acquired compressed data using its decoding mode and sends the result to the 1# port of the selection circuit 101 of the same stage. Meanwhile, the 0# port of the selection circuit 101 receives the compressed data. When the selection circuit 101 receives a control signal gating the 1# port, it outputs the data on the 1# port, that is, the decompressed data output by the decoding circuit 100; when it receives a control signal gating the 0# port, it outputs the data on the 0# port, that is, the compressed data. After the first-stage pipelined data decompression unit 10 completes this operation, its output is sent to both the decoding circuit 110 and the bypass channel 112 in the second-stage pipelined data decompression unit 11. By analogy, whenever a given decoding circuit is required to decompress the data, the control signal makes the selection circuit of that stage output the data from that decoding circuit. As the above procedure shows, the data finally output by the data decompression device may be data decompressed by the decoding circuits of all stages (for example, 100, 110, 120, ...); depending on the control signals, it may also be data decompressed by only some of the decoding circuits.
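The walk through pipeline 01 can be modelled in software as a chain of stages, each choosing between its decoder output and its bypassed input. The following Python sketch is illustrative only: `run_stage`, `run_pipeline`, and `make_tag_decoder` are hypothetical names, and the tag decoders merely record which stage ran; they do not implement any real decoding mode.

```python
def run_stage(data, decode, control):
    """One pipelined data decompression unit (software model).

    In hardware the decoding circuit and the bypass channel are both driven
    and the selector picks one; this model only evaluates the selected path.
    """
    if control == 1:
        return decode(data)  # 1# port: output of this stage's decoding circuit
    if control == 0:
        return data          # 0# port: data forwarded unchanged on the bypass
    raise ValueError("control must be 0 or 1")


def run_pipeline(data, decoders, controls):
    """Cascade the stages: each stage's selected output feeds the next stage."""
    if len(decoders) != len(controls):
        raise ValueError("one control level is needed per stage")
    for decode, control in zip(decoders, controls):
        data = run_stage(data, decode, control)
    return data


def make_tag_decoder(tag):
    """Hypothetical stand-in decoder that appends its circuit number."""
    return lambda data: f"{data}>{tag}"


decoders = [make_tag_decoder(n) for n in ("100", "110", "120")]
print(run_pipeline("in", decoders, [1, 1, 1]))  # every stage decodes
print(run_pipeline("in", decoders, [1, 0, 1]))  # second stage bypassed
print(run_pipeline("in", decoders, [0, 0, 0]))  # input passes straight through
```

Evaluating only the selected path is a software convenience: a real decoder applied to data it was not meant for could fail, whereas in hardware the unselected output is simply ignored.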
Optionally, when the data decompression device needs to decompress several pieces of received compressed data, as shown in Fig. 1, it may include multiple parallel groups of pipelined data decompression units, with the units in each group cascaded. In this embodiment, the parallel groups decompress different input compressed data simultaneously and output the decompressed results.
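As a rough software analogy for several parallel groups working on different inputs, the sketch below runs one independent pipeline per input using a thread pool. The names `run_pipeline` and `decompress_parallel` are hypothetical, the two stand-in "decoders" (`str.upper` and a string reversal) are placeholders rather than real decoding modes, and Python threads only illustrate the independence of the pipelines, not hardware-level parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(data, stages):
    """stages: (decode, control) pairs in cascade order; control 0 bypasses."""
    for decode, control in stages:
        if control == 1:
            data = decode(data)
    return data


def decompress_parallel(inputs, pipelines):
    """Feed input i to pipeline i and collect the results in input order."""
    if len(inputs) != len(pipelines):
        raise ValueError("one pipeline is needed per input")
    with ThreadPoolExecutor(max_workers=max(1, len(pipelines))) as pool:
        return list(pool.map(run_pipeline, inputs, pipelines))


def reverse(text):
    """Placeholder decoder: reverses the string."""
    return text[::-1]


pipelines = [
    [(str.upper, 1), (reverse, 0)],  # pipeline 01: first stage only
    [(str.upper, 0), (reverse, 1)],  # pipeline 02: second stage only
]
print(decompress_parallel(["abc", "xyz"], pipelines))
```

Each pipeline carries its own control levels, so different inputs can be decompressed with different combinations of stages at the same time.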
The data decompression device provided by the above embodiment includes: a decompression pipeline, each of said decompression pipelines comprising at least two stages of pipelined decompressed data units; and the pipelined data decompression unit comprises: the decoding circuit comprises a decoding circuit, a selection circuit and a bypass channel, wherein the decoding modes of the decoding circuit in each stage of the pipeline data decompression unit are different; the output end of the decoding circuit is connected with the input end of the selection circuit in the same stage pipeline decompression data unit on the current decompression pipeline; the output end of the selection circuit is respectively connected with one end of a bypass channel in the next-stage pipeline data decompression unit on the current decompression pipeline and the input end of a decoding circuit in the next-stage pipeline data decompression unit on the current decompression pipeline, and the other end of the bypass channel is connected with the input end of the selection circuit in the next-stage pipeline data decompression unit on the current decompression pipeline; the decompression pipeline is used for realizing multi-stage decompression processing on input data; the selection circuit is used for determining input data output to a decoding circuit in the next stage of the pipeline data decompression unit according to the input control signal. 
In the process of decompressing the data, because each decompressing pipeline in the data decompressing device comprises at least two stages of pipeline decompressing data units, and the decoding modes of the decoding circuits in the pipeline decompressing data units are different, the data decompressing device can perform multi-stage decompressing processing on the data compressed by different multi-stage compression modes, meanwhile, the selecting circuit in the pipeline decompressing data units can select whether to output the data output by each decoding circuit by setting different control signals, so as to realize the combination of a plurality of decoding circuits, and decompress the input data by using the combined decoding circuit, so that the data decompressing device provided by the application can flexibly configure the corresponding decoding mode to perform decompressing processing on the input compressed data according to the compression mode adopted during compressing the data, thereby improving the decompression accuracy.
In addition, the data decompression device comprises at least one decompression pipeline, and can simultaneously decompress a plurality of input parallel data, so that the data decompression device provided by the application can further improve the speed of parallel data processing.
Fig. 2 is a schematic diagram of a data decompression device according to an embodiment. As shown in fig. 2, the data decompression apparatus further includes a control unit 13, and the control unit 13 is connected to the input terminal of the selection circuit (101, 111, 121 in the figure). The control unit 13 is configured to output a control signal.
Optionally, the control unit 13 may be a controller that outputs a high-low level signal, specifically, the control unit 13 may generate a corresponding high-low level signal according to an instruction input by a user, and then send the high-low level signal to a selection circuit connected to the control unit; optionally, the control unit 13 may also receive a control signal sent by another circuit, perform decoding processing on the received control signal, generate a corresponding high-low level signal, and send the high-low level signal to the selection circuit connected thereto.
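One way to picture the first option, in which the control unit turns a user instruction into high and low levels, is a bitmask expanded into one level per selection circuit. The encoding below (bit i of the instruction enables stage i) and the name `control_levels` are assumptions made for illustration; the source does not specify an instruction format.

```python
def control_levels(instruction, num_stages):
    """Expand an integer instruction into one 0/1 level per selection circuit.

    Assumed encoding (not from the source): bit i set means stage i decodes.
    """
    if num_stages < 1:
        raise ValueError("at least one stage is required")
    if not 0 <= instruction < (1 << num_stages):
        raise ValueError("instruction has bits beyond the available stages")
    return [(instruction >> stage) & 1 for stage in range(num_stages)]


# Enable the first and third stages of a three-stage pipeline.
print(control_levels(0b101, 3))
```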
In this embodiment, the data decompression device can flexibly configure the different decoding circuits (100, 110, 120, ... in the figure) through the control unit 13 and the selection circuits (101, 111, 121, ...) in the pipelined data decompression units (10, 11, ...).
The configuration process can be illustrated by an example. As shown in Fig. 2A, the data decompression device comprises three pipelined data decompression units (a, b, and c), and the control unit D is connected to the selection circuit a, the selection circuit b, and the selection circuit c. The control unit D sends high/low level control signals. When it sends a high level signal (1) to the selection circuit a, a low level signal (0) to the selection circuit b, and a high level signal (1) to the selection circuit c, the selection circuit a outputs the data from the decoding circuit a, the selection circuit b outputs the data from its bypass channel so that the decoding circuit b is skipped, and the selection circuit c outputs the data from the decoding circuit c. The device therefore decompresses the input compressed data with the decoding circuit a followed by the decoding circuit c. Different control signals thus correspond to different decoding modes, and a user can flexibly configure the decoding circuits by supplying control signals that suit the application.
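The Fig. 2A example is one point in a larger configuration space: with three stages there are 2 x 2 x 2 = 8 possible combinations of control levels. The Python sketch below enumerates them. The stage labels a, b, and c come from the example; `STAGES` and `active_decoders` are hypothetical names.

```python
from itertools import product

STAGES = ("a", "b", "c")  # decoding circuits a, b, c in cascade order


def active_decoders(levels):
    """Return the decoding circuits whose selection circuit receives a 1."""
    if len(levels) != len(STAGES):
        raise ValueError("one level is needed per stage")
    return [name for name, level in zip(STAGES, levels) if level == 1]


# List every control-signal combination and the decoders it cascades.
for levels in product((0, 1), repeat=len(STAGES)):
    print(levels, active_decoders(levels) or "pass-through")
```

The combination (1, 0, 1) corresponds to the case in the text, where decoding circuit a is followed by decoding circuit c.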
Fig. 3 is a schematic diagram of a data decompression device according to an embodiment. As shown in fig. 3, the data decompression apparatus further includes a storage unit 14, where the storage unit 14 is respectively connected to the input terminals of the decoding circuit 100 and the selection circuit 101 in the first-stage pipeline data decompression unit; and the storage unit 14 is used to store compressed data that needs to be decompressed.
Wherein the compressed data may be stored in the storage unit 14 in advance. The hardware circuit corresponding to the storage unit 14 may be a register, a cache, or a memory RAM, which is not limited in this embodiment.
In this embodiment, the decoding circuit 100 in the first-stage pipelined data decompression unit 10 may acquire compressed data from the storage unit 14, decompress it using its decoding mode, and send the decompressed data to the 1# data input port of the selection circuit 101. In addition, the data received at the 0# data input port of the selection circuit 101 may be the compressed data in the storage unit 14. In this application scenario, depending on the control signal, the data output from the output port of the selection circuit 101 in the first-stage pipelined data decompression unit 10 may be either the compressed data or the decompressed data output by the decoding circuit 100 of this stage. For example, when the control signal is a high/low level signal, one possible scheme is: the high level signal makes the selection circuit 101 output the decompressed data, and the low level signal makes it output the compressed data.
Optionally, the decoding mode of the decoding circuit in each pipelined data decompression unit may be at least one of run-length decoding, Huffman decoding, LZ77 decoding, and JPEG decoding. Alternatively, the decoding circuit may use any other method capable of decompressing encoded data.
Optionally, if the decoding mode of the decoding circuit in a pipelined data decompression unit is Huffman decoding, the decoding circuit may include an address table look-up circuit and a decompressed data table look-up circuit. The input end of the address table look-up circuit is connected to the output end of the selection circuit in the previous-stage pipelined data decompression unit, and the output end of the address table look-up circuit is connected to the input end of the decompressed data table look-up circuit. The output end of the decompressed data table look-up circuit is connected to an input end of the selection circuit in the same-stage pipelined data decompression unit.
The address table look-up circuit is used for outputting an address corresponding to data output by the selection circuit in the previous stage of the pipeline decompression data unit. Specifically, the address table look-up circuit stores an address list, and a plurality of addresses are recorded in the address list. The decompressed data look-up table circuit is used for outputting decompressed data corresponding to the address output by the address look-up table circuit. Specifically, the decompressed data table look-up circuit stores a decompressed data list, and the decompressed data list records a plurality of decompressed data and a plurality of corresponding addresses.
In this embodiment, when the address table look-up circuit receives data output by the selection circuit in the previous-stage pipelined data decompression unit, it may optionally look up the corresponding addresses in the address list in the order in which the data is received and output each address found to the decompressed data table look-up circuit. On receiving an address, the decompressed data table look-up circuit looks up the decompressed data corresponding to that address in the decompressed data list and outputs it to the selection circuit connected to it.
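The two-step lookup described above can be sketched as follows. The codebook, the addresses, and the names `ADDRESS_TABLE`, `DATA_TABLE`, and `huffman_lookup_decode` are invented for illustration; the source does not give table contents, and a hardware implementation would not necessarily match the input one bit at a time as this sketch does.

```python
# Hypothetical address list: prefix-free codeword -> address.
ADDRESS_TABLE = {"0": 0, "10": 1, "110": 2, "111": 3}
# Hypothetical decompressed data list: address -> decompressed symbol.
DATA_TABLE = {0: "a", 1: "b", 2: "c", 3: "d"}


def huffman_lookup_decode(bits):
    """Decode a string of '0'/'1' characters via two table lookups."""
    symbols = []
    code = ""
    for bit in bits:
        code += bit
        address = ADDRESS_TABLE.get(code)        # step 1: address table look-up
        if address is not None:
            symbols.append(DATA_TABLE[address])  # step 2: decompressed data look-up
            code = ""
    if code:
        raise ValueError("trailing bits do not form a complete codeword")
    return "".join(symbols)


print(huffman_lookup_decode("0101100111"))
```

Because the hypothetical codebook is prefix-free, the first codeword that matches is always the right one, so the input can be consumed in arrival order as the text describes.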
In one embodiment, the application also provides a data decompression device based on a run-length decoding circuit and a Huffman decoding circuit. The device combines run-length decoding and Huffman decoding to decompress compressed data. The compressed data may have been compressed by both run-length coding and Huffman coding or, optionally, by only one of the two. The decompression process of this device is explained in the following embodiment with reference to the schematic structural diagram shown in Fig. 4.
As an example, see fig. 4. The data decompression device comprises a pipeline data decompression unit A and a pipeline data decompression unit B. The decoding circuit in pipeline data decompression unit A is a Huffman decoding circuit, which comprises an address lookup circuit and a decompressed data lookup circuit; the decoding circuit in pipeline data decompression unit B is a run-length decoding circuit. In this embodiment, when the data decompression device needs to decompress compressed data (for example, data that was run-length encoded and then Huffman encoded), it decompresses the data first with the Huffman decoding circuit and then with the run-length decoding circuit. The specific process is as follows. The address lookup circuit looks up the received compressed data in an address list and outputs the corresponding address A to the decompressed data lookup circuit. The decompressed data lookup circuit looks up address A in the decompressed data list to obtain decompressed data A, and outputs decompressed data A to selection circuit A. When control signal A is 0, selection circuit A outputs the compressed data; when control signal A is 1, selection circuit A outputs decompressed data A. Selection circuit A then sends its output (the compressed data or decompressed data A) to both the run-length decoding circuit and bypass channel B, which are connected to selection circuit A. The run-length decoding circuit decompresses the received data to obtain decompressed data B, and outputs decompressed data B to selection circuit B. When control signal B is 1, selection circuit B outputs decompressed data B regardless of whether control signal A is 0 or 1; when control signal A is 1 and control signal B is 0, selection circuit B outputs decompressed data A; and when control signal A is 0 and control signal B is 0, selection circuit B outputs the compressed data.
In summary, by setting control signal A and control signal B, the data decompression device can perform four processing operations on compressed data: decompressing with only the Huffman decoding circuit (A = 1, B = 0); decompressing with only the run-length decoding circuit (A = 0, B = 1); decompressing with the Huffman decoding circuit and the run-length decoding circuit in cascade (A = 1, B = 1); and outputting the compressed data directly without decompression (A = 0, B = 0).
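The two-stage behaviour described above can be sketched in software. The following Python sketch is illustrative only and is not part of the disclosed circuit: the code table, the names `ADDRESS_TABLE`, `DATA_TABLE`, `huffman_decode`, `run_length_decode` and `run_pipeline`, and the toy run-length format (a one-digit count followed by a symbol) are all assumptions chosen for the example. Each stage applies its decoder when its control signal is 1 and passes its input through unchanged, as the bypass channel would, when its control signal is 0.

```python
# Toy prefix code (an assumption for illustration): codeword -> address.
ADDRESS_TABLE = {"00": 0, "01": 1, "100": 2, "101": 3, "110": 4, "111": 5}
# Decompressed data list: address -> symbol.
DATA_TABLE = ["a", "b", "c", "1", "2", "3"]


def huffman_decode(bits: str) -> str:
    """Stage A decoder: address lookup followed by decompressed data lookup."""
    out, code = [], ""
    for bit in bits:
        code += bit
        if code in ADDRESS_TABLE:  # address lookup
            out.append(DATA_TABLE[ADDRESS_TABLE[code]])  # decompressed data lookup
            code = ""
    if code:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


def run_length_decode(text: str) -> str:
    """Stage B decoder: expand (count digit, symbol) pairs, e.g. '3a' -> 'aaa'."""
    if len(text) % 2:
        raise ValueError("expected (count, symbol) pairs")
    return "".join(text[i + 1] * int(text[i]) for i in range(0, len(text), 2))


def run_pipeline(data: str, control_a: int, control_b: int) -> str:
    """Apply stage A, then stage B; a control signal of 0 selects the bypass."""
    stages = ((huffman_decode, control_a), (run_length_decode, control_b))
    for decode, control in stages:
        # In hardware the decoder and the bypass channel both feed the selection
        # circuit; here the decoder is only called when it is selected.
        data = decode(data) if control == 1 else data
    return data


compressed = "1110011001101100"  # "aaabbc" run-length encoded, then Huffman encoded
print(run_pipeline(compressed, 1, 1))  # cascade of both decoders
print(run_pipeline(compressed, 1, 0))  # Huffman decoding only
print(run_pipeline("3a2b1c", 0, 1))    # run-length decoding only
print(run_pipeline(compressed, 0, 0))  # bypass, data returned unchanged
```

The sketch calls a decoder only when it is selected, because a software decoder may reject data that is not in its format; a hardware decoder would simply produce an output that the selection circuit ignores.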
The data decompression devices described in all the above embodiments can be applied to different scenarios, for example, they can be applied to all systems that need to perform data transmission, and also can be applied to all systems that need to perform data processing. Next, a computing device is introduced, which comprises the data decompression device according to any of the above embodiments.
Fig. 5 is a schematic diagram of a computing apparatus for performing machine learning calculations according to an embodiment. As shown in fig. 5, the computing apparatus includes an arithmetic unit 20 and a control unit 21; the arithmetic unit 20 includes: a main processing circuit 201 and a plurality of slave processing circuits 202; the main processing circuit 201 includes: a data decompression device 2011 and a main arithmetic circuit 2012; the slave processing circuit 202 includes: a data decompression device 2021 and a slave arithmetic circuit 2022;
the control unit 21 is configured to obtain original data, an operation instruction, and a control instruction, and send the original data, the operation instruction, and the control instruction to the main processing circuit 201;
the main processing circuit 201 is configured to compress the original data and to exchange data and operation instructions with the plurality of slave processing circuits 202;
the plurality of slave processing circuits 202 are configured to decompress data transmitted by the master processing circuit 201, execute intermediate operations in parallel according to the decompressed data and the operation instruction, obtain a plurality of intermediate results, and send the plurality of intermediate results to the master processing circuit 201;
in this application scenario, the main processing circuit 201 is further configured to decompress the plurality of intermediate results and to perform subsequent processing on the decompressed intermediate results to obtain a calculation result.
In the present embodiment, the data decompression device is applied to the arithmetic unit 20 and enables data interaction between the main processing circuit 201 and the slave processing circuits 202 in the arithmetic unit 20. The specific data interaction process is as follows. The main processing circuit 201 acquires original data from the control unit 21, compresses the original data, and transmits the compressed data to a slave processing circuit 202. The data decompression device 2021 in the slave processing circuit decompresses the compressed data to obtain decompressed data, and the slave arithmetic circuit 2022 performs an operation (for example, a product operation) on the decompressed data to obtain an intermediate result. The slave processing circuit then compresses the intermediate result and transmits the compressed intermediate result to the main processing circuit 201. There, the data decompression device 2011 decompresses the intermediate result, and the main arithmetic circuit 2012 performs an operation (for example, an accumulation operation or an activation operation) on the decompressed intermediate result to obtain a calculation result.
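The interaction above can be sketched as a small software model. This is a minimal sketch under stated assumptions, not the disclosed hardware: the run-length codec in `compress` and `decompress` merely stands in for whatever coding the circuits use, the function names and the per-slave `weight` are hypothetical, and the slave processing circuits are executed one after another here although the application describes them as operating in parallel.

```python
def compress(values):
    """Stand-in codec: run-length encode a list into [count, value] pairs."""
    pairs = []
    for v in values:
        if pairs and pairs[-1][1] == v:
            pairs[-1][0] += 1
        else:
            pairs.append([1, v])
    return pairs


def decompress(pairs):
    """Inverse of compress: expand [count, value] pairs back into a list."""
    return [v for count, v in pairs for _ in range(count)]


def slave_process(compressed_input, weight):
    """Model of one slave processing circuit 202."""
    data = decompress(compressed_input)        # data decompression device 2021
    intermediate = [x * weight for x in data]  # slave arithmetic circuit 2022 (product)
    return compress(intermediate)              # compressed before being sent back


def main_process(raw, weights):
    """Model of the main processing circuit 201."""
    compressed = compress(raw)                 # compress the original data
    replies = [slave_process(compressed, w) for w in weights]
    intermediates = [decompress(r) for r in replies]  # data decompression device 2011
    # Main arithmetic circuit 2012: accumulate the intermediate results element-wise.
    return [sum(column) for column in zip(*intermediates)]


print(main_process([2, 2, 2, 5], weights=[1, 10]))
```

Only compressed data crosses the boundary between `main_process` and `slave_process`, which mirrors the role the data decompression devices 2011 and 2021 play on each side of the link.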
Optionally, as shown in the schematic structural diagram of fig. 6, the computing apparatus may further include a storage unit 22 connected to the main processing circuit 201. In this case, the main processing circuit 201 is further configured to send the calculation result to the storage unit 22.
In this embodiment, the main processing circuit 201 may directly obtain the original data from the storage unit 22, and perform corresponding processing on the original data. After the main processing circuit 201 performs the corresponding operation to obtain the final calculation result, the calculation result may be sent to the storage unit 22 for storage, so as to be used by other circuits. Note that the calculation result here may be a calculation result decompressed by the data decompression device 2011, or may alternatively be a calculation result not decompressed by the data decompression device 2011.
In the process of executing the machine learning operation, the computing device according to the above embodiment includes the data decompression device proposed in the present application, and the data decompression device can flexibly configure the decoding circuit, so that the decompression accuracy of the data decompression device is high. Therefore, when the computing device executes machine learning operation to transmit and process data, the data transmission accuracy is improved.
In one embodiment, the present application further provides a machine learning chip including the above-mentioned computing device.
In one embodiment, the present application further provides a chip package structure, which includes the above chip.
In an embodiment, the present application further provides a board card including the above chip package structure. Referring to fig. 7, in addition to the chip package structure 81, the board card may include other components, including but not limited to: a memory device 82, an interface device 83, and a control device 84. The memory device 82 is connected through a bus to the machine learning chip 811 in the chip package structure 81 and is used for storing data. The memory device 82 may include a plurality of groups of memory units 821, and each group of memory units 821 is connected to the machine learning chip 811 through a bus. It is understood that each group of memory units 821 may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
DDR doubles the data rate of SDRAM without increasing the clock frequency, because it allows data to be transferred on both the rising and falling edges of the clock pulse, that is, twice per clock cycle. In one embodiment, the memory device may include four groups of memory units, and each group may include a plurality of DDR4 chips. In one embodiment, the machine learning chip may internally include four 72-bit DDR4 controllers; in each 72-bit DDR4 controller, 64 bits are used for data transmission and 8 bits are used for ECC checking. In one embodiment, each group of memory units includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. A DDR controller is provided in the chip to control data transmission to, and data storage in, each memory unit.
The interface device 83 is electrically connected to the machine learning chip 811 in the chip package structure 81 and is used for data transmission between the machine learning chip 811 and an external device (such as a server or a computer). For example, in one embodiment, the interface device 83 may be a standard PCIe (Peripheral Component Interconnect Express) interface: the server transmits the data to be processed to the machine learning chip through the standard PCIe interface. Preferably, when a PCIe 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device 83 may be another type of interface; the present application does not limit the specific form of the other interface, provided that the interface device can implement the transfer function. In addition, the calculation result of the machine learning chip 811 is transmitted back to the external device (e.g., a server) by the interface device 83.
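The 16000 MB/s figure can be checked with a short calculation. The sketch below assumes the publicly specified PCIe 3.0 parameters (8 GT/s per lane and 128b/130b line encoding), which are not stated in this application; the function name `pcie_bandwidth_mb_per_s` is hypothetical.

```python
def pcie_bandwidth_mb_per_s(gt_per_s: float, lanes: int,
                            payload_bits: int = 1, encoded_bits: int = 1) -> float:
    """Return the one-direction bandwidth in MB/s (1 MB = 10**6 bytes)."""
    bits_per_s = gt_per_s * 1e9 * lanes * payload_bits / encoded_bits
    return bits_per_s / 8 / 1e6


# Raw line rate: 8 GT/s per lane, 16 lanes, encoding overhead ignored.
raw = pcie_bandwidth_mb_per_s(8.0, 16)
# Same link with the 128b/130b encoding overhead taken into account.
effective = pcie_bandwidth_mb_per_s(8.0, 16, payload_bits=128, encoded_bits=130)
print(f"raw: {raw:.0f} MB/s, after 128b/130b encoding: {effective:.0f} MB/s")
```

Under these assumptions the quoted 16000 MB/s corresponds to the raw line rate in one direction; once the line encoding is accounted for, the usable figure is slightly lower, at roughly 15754 MB/s.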
The control device 84 is electrically connected to the machine learning chip 811 and is used to monitor the state of the chip. Specifically, the machine learning chip 811 and the control device 84 may be electrically connected through an SPI (Serial Peripheral Interface). The control device may include a microcontroller unit (MCU). Because the machine learning chip may include a plurality of data processing devices and/or a combination processing device, it may drive a plurality of loads, and it can therefore be in different working states such as multi-load and light-load. The control device 84 can be used to regulate the operating states of the plurality of data processing devices and/or combination processing devices in the machine learning chip.
In some embodiments, an electronic device is provided that includes the above board card. The electronic device comprises a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device. The vehicle comprises an airplane, a ship and/or an automobile; the household appliance comprises a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and/or a range hood; the medical device comprises a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus and/or an electrocardiograph.
Those skilled in the art should also appreciate that the embodiments described in this specification are all alternative embodiments and that the acts and modules involved are not necessarily required for this application. In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative: the division of the units is only a division by logical function, and other divisions are possible in an actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical or take other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.
The integrated unit, if implemented in the form of a software program module and sold or used as a stand-alone product, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned memory includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing program code.
It will be understood by those skilled in the art that all or part of the processing in the above embodiments may be implemented by a program instructing the associated hardware. The program may be stored in a computer-readable memory, and the memory may include: a flash memory disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (11)

1. A data decompression apparatus, characterized in that the data decompression apparatus comprises at least one decompression pipeline, each decompression pipeline comprising at least two stages of pipeline data decompression units, each pipeline data decompression unit comprising: a decoding circuit, a selection circuit and a bypass channel; the decoding modes of the decoding circuits in the pipeline data decompression units of different stages are different; the output end of the decoding circuit is connected with the input end of the selection circuit in the same-stage pipeline data decompression unit on the current decompression pipeline; the output end of the selection circuit is connected with one end of the bypass channel in the next-stage pipeline data decompression unit on the current decompression pipeline and with the input end of the decoding circuit in the next-stage pipeline data decompression unit on the current decompression pipeline, and the other end of the bypass channel is connected with the input end of the selection circuit in the next-stage pipeline data decompression unit on the current decompression pipeline;
the decompression pipeline is used for realizing multi-stage decompression processing on input data;
the selection circuit is used for determining input data output to a decoding circuit in the next stage of the pipeline data decompression unit according to the input control signal.
2. The apparatus of claim 1, wherein the data decompression apparatus further comprises a control unit, the control unit is connected to the input terminal of the selection circuit, and the control unit is configured to output the control signal.
3. The apparatus according to claim 1 or 2, wherein the data decompression apparatus further comprises a storage unit, the storage unit being connected with the input end of the decoding circuit and the input end of the selection circuit in the first-stage pipeline data decompression unit; the storage unit is used for storing data that needs to be decompressed.
4. The apparatus of claim 1, wherein the decoding mode of the decoding circuit comprises at least one of run-length decoding, Huffman decoding, LZ77 decoding, and JPEG decoding.
5. The apparatus of claim 4, wherein if the decoding mode of the decoding circuit is Huffman decoding, the decoding circuit comprises: an address table look-up circuit and a decompressed data table look-up circuit; the output end of the address table look-up circuit is connected with the input end of the decompressed data table look-up circuit; the output end of the decompressed data table look-up circuit is connected with the input end of the selection circuit.
6. A computing device for performing machine learning calculations, the computing device comprising an arithmetic unit and a control unit; the arithmetic unit includes: a master processing circuit and a plurality of slave processing circuits; the main processing circuit includes: a data decompression apparatus according to any one of claims 1 to 5, and a main arithmetic circuit; the slave processing circuit includes: a data decompression apparatus according to any one of claims 1 to 5, and a slave arithmetic circuit;
the control unit is used for acquiring original data, an operation instruction and a control instruction and sending the original data, the operation instruction and the control instruction to the main processing circuit;
the master processing circuit is used for performing compression processing on the original data and transmitting data and operation instructions with the plurality of slave processing circuits;
the plurality of slave processing circuits are used for decompressing the data transmitted by the main processing circuit, executing intermediate operation in parallel according to the decompressed data and the operation instruction to obtain a plurality of intermediate results, and sending the plurality of intermediate results to the main processing circuit;
the main processing circuit is further configured to decompress the plurality of intermediate results, and perform subsequent processing on the plurality of intermediate results after the decompression processing, so as to obtain a calculation result.
7. The apparatus of claim 6, further comprising a storage unit coupled to the main processing circuit, the main processing circuit further configured to send the calculation result to the storage unit.
8. A machine learning chip, comprising the computing device of claim 6 or 7.
9. A chip packaging structure, characterized in that the chip packaging structure comprises the machine learning chip of claim 8.
10. A board card comprising the chip package structure of claim 8.
11. An electronic device, characterized in that it comprises the board card according to claim 10.
CN201811609579.6A 2018-12-07 2018-12-27 Data decompression device and related product Pending CN111385580A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811609579.6A CN111385580A (en) 2018-12-27 2018-12-27 Data decompression device and related product
PCT/CN2019/121056 WO2020114283A1 (en) 2018-12-07 2019-11-26 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811609579.6A CN111385580A (en) 2018-12-27 2018-12-27 Data decompression device and related product

Publications (1)

Publication Number Publication Date
CN111385580A true CN111385580A (en) 2020-07-07

Family

ID=71217889

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811609579.6A Pending CN111385580A (en) 2018-12-07 2018-12-27 Data decompression device and related product

Country Status (1)

Country Link
CN (1) CN111385580A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070250686A1 (en) * 2004-06-25 2007-10-25 Koninklijke Philips Electronics, N.V. Instruction Processing Circuit
CN101800901A (en) * 2009-02-09 2010-08-11 联发科技股份有限公司 Signal processing apparatus and method
CN102915766A (en) * 2011-06-13 2013-02-06 马维尔国际贸易有限公司 Systems and methods for operating on a storage device using a life-cycle dependent coding scheme

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200707