WO2022045448A1 - Method for compressing output data of a hardware accelerator, method for decoding data input to a hardware accelerator, and associated hardware accelerator - Google Patents

Method for compressing output data of a hardware accelerator, method for decoding data input to a hardware accelerator, and associated hardware accelerator

Info

Publication number
WO2022045448A1
WO2022045448A1 (PCT/KR2020/015477)
Authority
WO
WIPO (PCT)
Prior art keywords
uncompressed
compressed
data group
dimension direction
array
Prior art date
Application number
PCT/KR2020/015477
Other languages
English (en)
Korean (ko)
Inventor
양성모
Original Assignee
오픈엣지테크놀로지 주식회사
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 오픈엣지테크놀로지 주식회사
Publication of WO2022045448A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology

Definitions

  • the present invention relates to computing technology, and more particularly to techniques for efficiently compressing the output data of a hardware accelerator and efficiently decoding its input data.
  • the description of the present invention begins with an example of the structure of a neural network accelerator, which is a kind of hardware accelerator that is the subject of the present invention.
  • a neural network is a well-known technology used as one of the technologies to implement artificial intelligence.
  • FIG. 1 is a conceptual diagram illustrating a partial configuration of a neural network presented to aid understanding of the present invention.
  • the neural network 600 may include a plurality of layers.
  • the first layer 610 among the plurality of layers may output output data 611 called a feature map or activation.
  • the output data 611 output from the first layer 610 may be provided as input data of the second layer 620 downstream of the first layer 610 .
  • Each of the above layers may be regarded as a data conversion function module or a data operation unit that converts input data into output data according to a predetermined rule.
  • the first layer 610 may be regarded as a data conversion function module that converts input data 609 input to the first layer 610 into output data 611 .
  • to define the function of the first layer 610, the structure of the first layer 610 must be defined.
  • input variables in which the input data 609 input to the first layer 610 are stored must be defined, as well as output variables representing the output data 611 output from the first layer 610.
  • the first layer 610 may use a set of weights 612 to perform its function.
  • the set of weights 612 may be values multiplied by the input variables to calculate the output variables from the input variables.
  • the set of weights 612 may be one of various parameters of the neural network 600 .
  • the set of weights 612 may be a constant for a determined layer.
  • the set of weights 612 is stored in a memory outside the layer, and when the layer operates, it may be input to the layer from the memory. That is, depending on the implementation, the set of weights 612 may also be treated as data input to the layer.
  • the calculation process for calculating the output data 611 output from the first layer 610 from the input data 609 input to the first layer 610 of the neural network 600 may be implemented in software, in hardware, or in a combination of hardware and firmware.
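  • as a minimal sketch of the layer-as-data-conversion view described above (the function names and the ReLU nonlinearity are illustrative assumptions, not taken from this application; the description only states that the weights are multiplied by the input variables):

```python
import numpy as np

def layer_forward(input_data, weights):
    """Model a layer as a data conversion module: output = f(W @ x).

    The ReLU nonlinearity is an illustrative assumption; the description
    only specifies that the weights are multiplied by the inputs.
    """
    pre_activation = weights @ input_data   # weights multiplied by the input variables
    return np.maximum(pre_activation, 0.0)  # elementwise nonlinearity

# The output (feature map / activation) of one layer feeds the next layer.
x = np.array([1.0, -2.0, 3.0])             # input data of the layer
w1 = np.array([[0.5, 0.0, 1.0],
               [1.0, 1.0, 0.0]])           # set of weights of the layer
y = layer_forward(x, w1)                   # output data of the layer, shape (2,)
```

  • in this sketch, `y` would then serve as the input data of the downstream layer.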
  • FIG. 2 shows a main structure of a neural network computing device including a neural network accelerator in which a function of a neural network is implemented by hardware, and a part of a computing device including the same.
  • the computing device 1 includes a DRAM (Dynamic Random Access Memory) 10, a neural network operating device 100, a bus 700 connecting the DRAM 10 and the neural network operating device 100, and other hardware 99 connected to the bus 700 .
  • the computing device 1 may further include a power supply unit, a communication unit, a main processor, a user interface, a storage unit, and peripheral device units (not shown).
  • the bus 700 may be shared by the neural network computing device 100 and other hardware 99 .
  • the neural network computing device 100 includes a DMA unit (Direct Memory Access part) 20 , a control unit 40 , an internal memory 30 , a compression unit 620 , a decoding unit 630 , and a neural network acceleration unit 60 .
  • the control unit 40 controls the operation of these components.
  • decoding may be referred to herein as decompression.
  • compression may likewise be referred to as encoding.
  • in order for the neural network accelerator 60 to operate, the input array 310 must be provided as input data of the neural network accelerator 60.
  • the neural network accelerator 60 may be configured to perform the function of only one specific layer, rather than performing the functions of all layers of the neural network in a specific operation time period.
  • the input array 310 may be a set of data in the form of a multidimensional array.
  • the input array 310 may include, for example, the input data 609 described in FIG. 1 and a set of weights 612 .
  • the input array may be referred to as input data.
  • the input array 310 provided to the neural network accelerator 60 may be output from the internal memory 30 .
  • the internal memory 30 may be provided on the same wafer as the wafer on which the neural network accelerator 60 is implemented.
  • all of the functional modules in the block represented by the neural network computing device 100 shown in FIG. 2 may be implemented on the same wafer.
  • the internal memory 30 may receive at least some or all of the input array 310 from the DRAM 10 through the bus 700 of the computing device.
  • the controller 40 and the DMA unit 20 may control the internal memory 30 and the DRAM 10 .
  • the output array 330 may be generated based on the input array 310 .
  • the output array 330 may be a set of data in the form of a multidimensional array.
  • the output array may be referred to as output data.
  • the generated output array 330 may first be stored in the internal memory 30 .
  • the output array 330 stored in the internal memory 30 may be written to the DRAM 10 under the control of the controller 40 and the DMA unit 20 .
  • the controller 40 may collectively control the operations of the DMA unit 20 , the internal memory 30 , and the neural network accelerator 60 .
  • the neural network acceleration unit 60 may perform, for example, the function of the first layer 610 shown in FIG. 1 during the first time period, and the function of the second layer 620 shown in FIG. 1 during the second time period.
  • the neural network accelerator 60 and its associated functional modules shown in FIG. 2 may be provided in plurality in the neural network computation device 100 to perform the computations requested by the controller 40 in parallel.
  • the neural network accelerator 60 may output the data of the output array 330 sequentially in a given order over time, rather than all at once.
  • the compression unit 620 may compress the output array 330 to reduce its data volume and provide it to the internal memory 30. As a result, the output array 330 may be stored in the DRAM 10 in a compressed state.
  • the input array 310 input to the neural network accelerator 60 may be read from the DRAM 10 .
  • Data read from the DRAM 10 may be compressed, and the compressed data may be decoded by the decoding unit 630 before being provided to the neural network accelerator 60 and converted to the input array 310 .
  • it is desirable for the internal memory 30 to acquire new data from the DRAM 10.
  • the neural network accelerator 60 may receive the input array 609 and the first set of weights 612 of FIG. 1 during the first time period to perform the function of the first layer 610 .
  • the neural network accelerator 60 may receive the input array 611 and the second set of weights 622 of FIG. 1 during the second time period to perform the function of the second layer 620 .
  • it is preferable that the internal memory 30 obtain the input array 611 and the second set of weights 622 from the DRAM 10.
  • FIG. 3 illustrates the structure of the output array 330 of FIG. 2 .
  • the output array 330 may be a set of data having a multidimensional structure.
  • data having a two-dimensional structure is exemplified for convenience, but the concept of the present invention to be described later can be applied as it is even when the output array 330 has a three-dimensional or more structure.
  • the output array 330 may be divided into a plurality of uncompressed data groups (NCG). Of these, only the first uncompressed data group is first written to the internal memory 30; the first uncompressed data group written to the internal memory 30 may then be moved to the DRAM 10 and deleted from the internal memory 30.
  • next, only the second uncompressed data group of the output array 330 is first written to the internal memory 30; after the second uncompressed data group written to the internal memory 30 is transferred to the DRAM 10, it may be deleted from the internal memory 30.
  • This method may be adopted, for example, when the size of the internal memory 30 is not large enough to store all of the output array 330 in one set.
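  • the group-by-group staging described above can be sketched as follows (the flat list representation, group size, and `dram` list are illustrative assumptions standing in for the actual memories):

```python
def stage_output_to_dram(output_array, group_size, dram):
    """Move an output array to DRAM through an internal memory too small
    to hold the whole array: each uncompressed data group (NCG) is
    written, moved to DRAM, then deleted before the next is staged."""
    for start in range(0, len(output_array), group_size):
        internal_memory = output_array[start:start + group_size]  # write one NCG
        dram.extend(internal_memory)                              # move the NCG to DRAM
        del internal_memory                                       # delete it from internal memory
    return dram

dram = []
stage_output_to_dram(list(range(16)), group_size=8, dram=dram)
# the full 16-element output array reaches DRAM in two 8-element groups
```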
  • for an arbitrary uncompressed data group (NCG), a compressed data group CG obtained by first data-compressing the NCG may be recorded in the internal memory 30. Then, the compressed data group CG recorded in the internal memory 30 may be moved to the DRAM 10.
  • a separate data buffer not shown in FIG. 2 may be provided to compress each uncompressed data group to generate each compressed data group.
  • FIG. 4A is a diagram for explaining some constraints considered in the present invention as constraints that may occur in some embodiments.
  • the neural network accelerator 60 may be configured to perform the function of the layer k 610 in the first time period T1 .
  • the neural network accelerator 60 may output the elements of the output array 330 not all at once but sequentially, in the order of indexes 1, 2, 3, 4, ..., 15, 16 (refer to the zigzag arrow of FIG. 4A (a)).
  • the output array 330 output in the first time period T1 may be stored in the DRAM 10 .
  • the neural network accelerator 60 may be configured to perform the function of the layer k+1 620. To this end, the neural network accelerator 60 may request the output array 330 recorded in the DRAM 10 as input data. At this time, there may be a constraint that the components of the output array 330 must be input to the neural network accelerator 60 in the order of indexes 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15, 4, 8, 12, 16 (refer to the zigzag arrow of FIG. 4A (b)).
  • FIG. 4B is a diagram for explaining data input/output characteristics under conditions different from those of FIG. 4A.
  • when the neural network accelerator 60 outputs the output array 330, it may output the elements in the order of indexes 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15, 4, 8, 12, 16 (refer to the zigzag arrow of FIG. 4B (a)).
  • likewise, the components of the output array 330 may be input to the neural network acceleration unit 60 in the order of indexes 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15, 4, 8, 12, 16 (refer to the zigzag arrow of FIG. 4B (b)).
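  • the two traversal orders in FIGS. 4A and 4B correspond to row-major and column-major scans of a 4x4 array; a small sketch (the function names are assumptions, and the numbering is the 1-based index numbering used in the figures):

```python
def row_major_order(rows, cols):
    # Order in which the zigzag arrow of FIG. 4A (a) emits indexes: 1, 2, 3, 4, ...
    return [r * cols + c + 1 for r in range(rows) for c in range(cols)]

def column_major_order(rows, cols):
    # Order in which the zigzag arrow of FIG. 4A (b) consumes indexes: 1, 5, 9, 13, 2, ...
    return [r * cols + c + 1 for c in range(cols) for r in range(rows)]

assert row_major_order(4, 4) == list(range(1, 17))
assert column_major_order(4, 4) == [1, 5, 9, 13, 2, 6, 10, 14,
                                    3, 7, 11, 15, 4, 8, 12, 16]
```

  • the mismatch between the two orders is exactly why simply replaying the stored output order does not satisfy the input-order constraint of the next layer.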
  • FIGS. 5A to 5D show a method of dividing the output array 330 of the neural network accelerator 60 into several groups, and compressing and storing the data of each divided group, according to an embodiment.
  • the neural network accelerator 60 may be configured to perform the function of the layer k 610 .
  • the neural network accelerator 60 sequentially outputs the components corresponding to indexes 1, 5, 9, 13, 2, 6, 10, and 14 belonging to the first uncompressed data group (NCG1) of the output array 330, thereby completing the output of the first uncompressed data group NCG1.
  • the compression unit 620 may compress the completed first uncompressed data group NCG1 to generate the first compressed data group CG1 .
  • the first compressed data group CG1 may be temporarily stored in the internal memory 30 and then moved to the DRAM 10 .
  • the compression unit 620 may include an output data buffer.
  • the output data buffer may be provided externally to the compression unit 620 as a separate functional module.
  • the neural network accelerator 60 may sequentially output the components corresponding to indexes 3, 7, 11, 15, 4, 8, 12, and 16 belonging to the second uncompressed data group NCG2 of the output array 330, thereby completing the output of the second uncompressed data group NCG2.
  • the compression unit 620 may compress the completed second uncompressed data group NCG2 to generate a second compressed data group CG2 .
  • the second compressed data group CG2 may be temporarily stored in the internal memory 30 and then moved to the DRAM 10 .
  • the neural network accelerator 60 may be configured to perform the function of the layer k+1 620 in the second time period T2 .
  • the neural network acceleration unit 60 performing the function of the layer k+1 620 may be constrained to receive its input in the order of indexes 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15, 4, 8, 12, 16.
  • the first compressed data group CG1 is read from the DRAM 10 and stored in the internal memory 30, and then the first compressed data group CG1 stored in the internal memory 30 is decoded so that the first uncompressed data group NCG1 can be restored.
  • if the elements of the first uncompressed data group NCG1 are generated in the order of indexes 14, 10, 6, 2, 13, 9, 5, 1, then for the data of index 1 to be input to the neural network accelerator 60, a total of 8 clocks must elapse from the time decoding of the first compressed data group CG1 is started. In this case, there is a problem that latency is generated in the process of inputting data to the neural network accelerator 60.
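  • the latency described above is simply the position of the first-needed element in the decode order, assuming one element is restored per clock; a sketch (the function name and 1-based clock count are assumptions):

```python
def clocks_until_first_input(decode_order, first_needed_index):
    """Clocks (one restored element per clock) that must elapse before
    the element the accelerator needs first becomes available."""
    return decode_order.index(first_needed_index) + 1

# Decoding in reverse order: index 1 is restored last, so 8 clocks of latency.
assert clocks_until_first_input([14, 10, 6, 2, 13, 9, 5, 1], 1) == 8

# Decoding in the accelerator's input order: index 1 is ready after 1 clock.
assert clocks_until_first_input([1, 5, 9, 13, 2, 6, 10, 14], 1) == 1
```

  • matching the decode order to the input order, as the present invention proposes, removes this start-up latency.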
  • An object of the present invention is to provide a method of compressing elements of an output array output by a data operation unit of a hardware accelerator and a method of decoding compressed data to be input to a data operation unit in order to solve the above-described problems.
  • when decoding compressed data, decoding should be performed in the order of the indexes to be input to the data operation unit.
  • since decoding is implemented to correspond to the encoding (compression) technique, the technique for compressing the output array output by the data operation unit is also a subject of the present invention.
  • the compression unit may compress the uncompressed elements included in the output array output from the data operation unit.
  • the output array may be divided into one or a plurality of groups, that is, one or a plurality of uncompressed data groups, and data processing may be performed for each uncompressed data group.
  • the compression unit may compress each of the uncompressed data groups to generate respective compressed data groups corresponding thereto.
  • each of the plurality of components constituting the compressed data group may be referred to as a compressed element.
  • each of the plurality of elements constituting the uncompressed data group may be referred to as an uncompressed element.
  • likewise, each component constituting the output array output by the data operation unit may be referred to as an uncompressed element.
  • each compressed element and each uncompressed element may have, for example, a real value or a complex value.
  • Each of the uncompressed data group and the compressed data group may have the form of a multidimensional array including at least a first dimension and a second dimension.
  • the compression unit gives priority to one of the first dimension and the second dimension, and sequentially generates the compressed elements along that dimension direction.
  • the prioritized dimension may be determined by the order in which the data operation unit, which receives the uncompressed elements of the uncompressed data group restored by decoding the completed compressed data group, sequentially receives those uncompressed elements.
  • for example, if the data operation unit receives the uncompressed elements with priority given to the first dimension, the prioritized dimension is determined to be the first dimension.
  • the control unit of the hardware accelerator is configured to acquire in advance the order in which the uncompressed elements of the uncompressed data group, restored by decoding the completed compressed data group, are sequentially input to the data operation unit.
  • the control unit may be configured to control the compression unit so that the compression unit sequentially generates the compressed elements along the prioritized dimension direction according to the acquired input order.
  • to generate the k-th compressed element in the k-th of N clocks, the compression unit uses only the uncompressed elements already output by the data operation unit before the k-th clock.
  • a specific compression algorithm for generating each compression component of the compressed data group from the uncompressed data group may be presented in various ways.
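  • as one illustrative instance of such an algorithm (the delta code below is an assumption; the application deliberately leaves the specific algorithm open), each compressed element can be generated causally, using only elements already emitted, and the matching decoder restores elements in the same order using only elements already restored:

```python
def compress_stream(uncompressed):
    """Causal compression: the k-th compressed element depends only on
    uncompressed elements available at or before clock k (here, a
    simple delta code against the previous element)."""
    compressed = []
    prev = 0
    for value in uncompressed:   # one element per clock, in the prioritized order
        compressed.append(value - prev)
        prev = value
    return compressed

def decode_stream(compressed):
    """Matching causal decoder: restores elements in the same order,
    each from its compressed element plus already-restored elements."""
    restored = []
    prev = 0
    for delta in compressed:
        prev += delta
        restored.append(prev)    # available to the data operation unit this clock
    return restored

data = [5, 7, 7, 10, 9]
assert compress_stream(data) == [5, 2, 0, 3, -1]
assert decode_stream(compress_stream(data)) == data
```

  • any codec with this causal, order-preserving structure would satisfy the clock-by-clock constraints stated above.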
  • the decoding unit included in the hardware accelerator provided according to an aspect of the present invention may decode the completed compressed data group to restore the uncompressed data group.
  • the compression method of the compression unit corresponds to the decoding method.
  • the decoding unit sequentially outputs the elements of the uncompressed data group according to the index order in which the uncompressed elements are to be sequentially provided to the data operation unit.
  • to this end, the decoding unit gives priority to one of the first dimension and the second dimension, and sequentially generates the uncompressed elements along that dimension direction.
  • the prioritized dimension may be determined according to the order in which the data operation unit, which receives the uncompressed elements of the restored uncompressed data group, sequentially receives those elements.
  • for example, when the data operation unit receives the uncompressed elements with priority given to the first dimension, the prioritized dimension may be determined to be the first dimension.
  • when the decoding unit restores a specific uncompressed element of the uncompressed data group, it may be configured to use only the corresponding compressed element of the compressed data group together with the other uncompressed elements already restored before that specific element. In this case, some or all of the already-restored uncompressed elements may be used.
  • the decoding unit may select, as the first uncompressed element to be restored and output among the elements belonging to the uncompressed data group, the uncompressed element that is to be input first to the data operation unit.
  • a specific algorithm for recovering each uncompressed component of the uncompressed data group from the compressed data group may be presented in various ways.
  • as the decoding unit sequentially restores each uncompressed element of the uncompressed data group, the restored element may be input to the data operation unit in real time.
  • the decoding unit sequentially outputs the N uncompressed elements of the uncompressed data group over N consecutive clocks, and the N output elements may be sequentially input to the data operation unit over those N clocks.
  • to restore the element output in the k-th clock, the decoding unit may be configured to use only the uncompressed elements already restored before the k-th clock.
  • a hardware accelerator provided according to an aspect of the present invention includes: a data operation unit; a compression unit for generating a compressed output array by compressing the output array output by the data operation unit in a first operation time period; and a decoding unit that restores an input array from the compressed output array and provides the restored input array to the data operation unit in a second operation time period.
  • the components of the input array 310 are sequentially input to the data operation unit with priority given to a predetermined dimension direction, and when the compression unit compresses the output array, the components constituting the compressed output array are sequentially generated with priority given to the same predetermined dimension direction.
  • the output array is divided into a plurality of uncompressed data groups, and the compression unit compresses each of the uncompressed data groups to generate a compressed data group corresponding to it.
  • the compressed output array is composed of the plurality of generated compressed data groups. When the compression unit compresses each uncompressed data group to generate its compressed data group, the components constituting the compressed data group are sequentially generated with priority given to the predetermined dimension direction.
  • the p-th component generated in the compressed data group CG may be generated using only the components of the uncompressed data group NCG corresponding to the compressed data group CG that have the first through p-th indexes counted with priority given to the predetermined dimension direction 91.
  • the decoding unit 630 decodes the compressed output array 340 to restore the input array 310; the components constituting the input array 310 may be sequentially restored with priority given to the predetermined dimension direction 91.
  • the compressed output array 340 is divided into a plurality of compressed data groups (CG), and the decoding unit 630 decodes each compressed data group (CG) to restore the corresponding uncompressed data group (NCG).
  • when the decoding unit 630 decodes each compressed data group (CG) to restore the uncompressed data group (NCG), the components constituting the uncompressed data group (NCG) may be sequentially restored with priority given to the predetermined dimension direction 91.
  • the p-th component restored with priority given to the predetermined dimension direction 91 may be restored using the components of the compressed data group (CG) having the first through p-th indexes counted with priority given to the predetermined dimension direction 91, together with the elements of the uncompressed data group (NCG) that have already been restored by the time the p-th element is restored.
  • the output array 330 is divided into a plurality of uncompressed data groups (NCG), and the compression unit 620 compresses each uncompressed data group (NCG) to generate a corresponding compressed data group (CG). The compressed output array 340 is composed of the plurality of generated compressed data groups (CG). The compressed output array 340 is provided to the memory so that the plurality of generated compressed data groups (CG) are stored in the memory; to restore the input array 310 from the compressed output array 340, the compressed data groups (CG) may be sequentially read.
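  • the division of the output array into groups along the prioritized dimension, and the sequential group-by-group round trip through memory, can be sketched as follows (the 4x4 array and two-column group width mirror the figures; the function name and the omission of the actual compression step are assumptions made for brevity):

```python
import numpy as np

def split_into_ncgs(output_array, cols_per_group):
    """Divide a 2-D output array into uncompressed data groups (NCGs)
    along the column dimension (the prioritized dimension direction here)."""
    return [output_array[:, c:c + cols_per_group]
            for c in range(0, output_array.shape[1], cols_per_group)]

output_array = np.arange(1, 17).reshape(4, 4)        # 4x4 array, indexes 1..16
ncgs = split_into_ncgs(output_array, cols_per_group=2)

# NCG1 traversed with column priority matches the order 1, 5, 9, 13, 2, 6, 10, 14.
assert list(ncgs[0].ravel(order="F")) == [1, 5, 9, 13, 2, 6, 10, 14]

# Store each group (compression omitted here) and read back group by group.
memory = [ncg.copy() for ncg in ncgs]                # stands in for CG1, CG2 in memory
restored = np.concatenate(memory, axis=1)            # sequential read restores the array
assert np.array_equal(restored, output_array)
```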
  • the output array 330 includes a plurality of uncompressed data groups (NCG) along the predetermined dimension direction 91, and the compressed output array 340 includes a plurality of compressed data groups (CG) along the predetermined dimension direction 91.
  • when reading the plurality of compressed data groups (CG) from the memory to restore the input array 310 from the compressed output array 340, the plurality of compressed data groups (CG) may be read with priority given to the predetermined dimension direction 91.
  • after decoding of all components belonging to the uncompressed data groups (NCGs) of a first column, consisting of all the uncompressed data groups arranged along the predetermined dimension direction 91 at a first point 101 in a dimension direction 92 different from the predetermined dimension direction 91, is completed, decoding of the components belonging to the uncompressed data groups (NCGs) of the next column may be started.
  • after all components of the input array 310 arranged along the predetermined dimension direction 91 at a third point 113 in the dimension direction 92 different from the predetermined dimension direction 91 have been restored, decoding of the components belonging to a fourth point 114 in the other dimension direction 92 may be started.
  • the third point 113 and the fourth point 114 may belong to different uncompressed data groups or compressed data groups.
  • a computing device including the hardware accelerator described above and a DRAM may be provided.
  • according to the present invention, there are provided a method of compressing elements of an output array output by a data operation unit of a hardware accelerator and a method of decoding compressed data to be input to the data operation unit.
  • as the decoding unit restores each uncompressed element to be provided to the data operation unit, the restored element may be input to the data operation unit in real time.
  • FIG. 1 is a conceptual diagram illustrating a partial configuration of a neural network presented to aid understanding of the present invention.
  • FIG. 2 shows a main structure of a neural network computing device including a neural network accelerator in which a function of a neural network is implemented by hardware, and a part of a computing device including the same.
  • FIG. 3 illustrates the structure of the output array 330 of FIG. 2 .
  • FIG. 4A is a diagram for explaining some constraints considered in the present invention as constraints that may occur in some embodiments.
  • FIG. 4B is a view for explaining data input/output characteristics under conditions different from those of FIG. 4A.
  • FIGS. 5A to 5D show a method of dividing the output array 330 of the neural network accelerator 60 into several groups, and compressing and storing the data of each divided group, according to an embodiment.
  • FIG. 6A shows the main structure of a hardware accelerator provided according to an embodiment of the present invention and a part of a computing device including the same.
  • FIG. 6B is a diagram illustrating how a data operation unit operates over time according to an embodiment of the present invention.
  • FIGS. 7A to 7E are provided to explain a data compression method provided according to an embodiment of the present invention.
  • FIGS. 8A and 8B are provided to explain a data decoding method provided according to an embodiment of the present invention.
  • FIG. 8C is an example of restoring the uncompressed elements of the first uncompressed data group NCG101 with priority given to the first dimension direction 91.
  • FIGS. 9A to 9D are provided to explain a data compression method provided according to another embodiment of the present invention.
  • FIGS. 10A to 10D are provided to explain a data decoding method provided according to another embodiment of the present invention.
  • FIG. 11 shows an output array (uncompressed) 330 output by the data operation unit in the first operation time period according to an embodiment of the present invention.
  • FIG. 12 is a conceptual diagram illustrating the form of an input array or an output array of a neural network accelerator provided according to an embodiment of the present invention.
  • FIG. 6A illustrates a main structure of a hardware accelerator provided according to an embodiment of the present invention and a part of a computing device including the hardware accelerator.
  • the computing device 1 includes a memory 11 , a hardware accelerator 110 , a bus 700 connecting the memory 11 and the hardware accelerator 110 , and other hardware 99 connected to the bus 700 .
  • the computing device 1 may further include a power supply unit, a communication unit, a main processor, a user interface, a storage unit, and peripheral device units (not shown).
  • the bus 700 may be shared by the hardware accelerator 110 and other hardware 99 .
  • the hardware accelerator 110 includes a DMA unit (Direct Memory Access part) 20, a control unit 40, an internal memory 30, a compression unit 620, a decoding unit 630, a data operation unit 610, an output buffer ( 640 ), and an input buffer 650 .
  • in FIG. 6A, the decoding unit 630, the compression unit 620, the output buffer 640, the input buffer 650, and the internal memory 30 are shown as separate components, but in a modified embodiment, some or all of them may be provided as one single functional unit.
  • the memory 11 , the hardware accelerator 110 , and the data calculator 610 may be, for example, the DRAM 10 , the neural network computation device 100 , and the neural network accelerator 60 shown in FIG. 2 , respectively.
  • the present invention is not limited thereto.
  • in order for the data operation unit 610 to operate, the input array 310 must be provided to the data operation unit 610.
  • the input array 310 may be a set of data in the form of a multidimensional array.
  • the input array 310 provided to the data operation unit 610 may be output from the internal memory 30 .
  • the internal memory 30 may receive at least some or all of the input array 310 from the memory 11 through the bus 700 .
  • the controller 40 and the DMA unit 20 may control the internal memory 30 and the memory 11 .
  • the output array 330 may be generated based on the input array 310 .
  • the output array 330 may be a set of data in the form of a multidimensional array.
  • the generated output array 330 may first be stored in the internal memory 30 .
  • the output array 330 stored in the internal memory 30 may be written to the memory 11 under the control of the controller 40 and the DMA unit 20 .
  • the controller 40 may collectively control the operations of the DMA unit 20 , the internal memory 30 , and the data operation unit 610 .
  • the data operation unit 610 may perform a first function during the first time period and perform a second function during the second time period.
  • the second function may be different from the first function.
  • the data operation unit 610 may perform, for example, the function of the first layer 610 shown in FIG. 1 during the first time period, and the function of, for example, the second layer 620 shown in FIG. 1 during the second time period.
  • a plurality of data operation units 610 shown in FIG. 6A may be provided to perform operations requested by the control unit 40 in parallel.
  • the data operation unit 610 may output the data of the output array 330 sequentially over time, rather than all at once.
  • the compression unit 620 may compress the output array 330 to reduce the data amount of the output output array 330 and provide it to the internal memory 30 .
  • the output array 330 may be stored in the memory 11 as the array 340 in a compressed state.
  • the output buffer 640 may have a storage space smaller than the size of the output array 330 .
  • Data constituting the output array 330 may be output sequentially over time. First, only the first sub-data, an initially output part of the output array 330, may be stored in the output buffer 640; the first sub-data stored in the output buffer 640 may then be compressed by the compression unit 620 and transferred to the memory 11. After that, the second sub-data, another part of the output array 330 output later, may be transferred to the memory 11 through the same process.
  • the input array 310 input to the data operation unit 610 may be read from the memory 11 .
  • Data read from the memory 11 may be compressed, and may be decoded by the decoding unit 630 before being provided to the data operation unit 610 and converted to the input array 310 .
  • the input buffer 650 may have a storage space smaller than the size of the input array (uncompressed) 310 .
  • Data constituting the input array (compressed) 320 may be provided sequentially over time. First, only the first sub-data, an initially provided part of the input array (compressed) 320, may be stored in the input buffer 650; the first sub-data stored in the input buffer 650 may then be decoded by the decoding unit 630 and input to the data operation unit 610. After that, the second sub-data, another part of the input array (compressed) 320 provided later, may be input to the data operation unit 610 through the same process.
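The buffer-bounded streaming described above (compressed sub-data arrives chunk by chunk, each chunk fitting in a buffer smaller than the whole array) can be sketched as follows. This is an illustrative sketch only; the function names, the generator structure, and the identity "decoder" are our own assumptions, not part of the disclosure:

```python
def stream_decode(compressed_chunks, decode_chunk, buffer_size):
    """Feed compressed sub-data through a small input buffer, one chunk at a time."""
    for chunk in compressed_chunks:          # sub-data arrives sequentially over time
        assert len(chunk) <= buffer_size     # the buffer never holds the whole array
        yield from decode_chunk(chunk)       # decoded elements flow to the data operation unit

# usage: a trivial identity "decoder" over two sub-chunks of a 6-element array
out = list(stream_decode([[1, 2, 3], [4, 5, 6]], lambda c: c, buffer_size=4))
```

The key point mirrored here is that the input buffer 650 only ever holds one sub-data chunk, so its storage space can be smaller than the full input array.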
  • 6B is a diagram illustrating a method in which a data operation unit operates according to time according to an embodiment of the present invention.
  • the data operation unit 610 may be configured to output the first output array 331 based on the first input array 311 input to the data operation unit 610 during the first operation time period T1 .
  • the data operation unit 610 may be configured to output the second output array 332 based on the second input array 312 input to the data operation unit 610 during the second operation time period T2.
  • the second input array 312 may be the same as the first output array 331 .
  • the first output array 331 may be compressed and stored in the memory, then decoded again from the memory and provided as the second input array 312 .
  • 7A to 7E are provided to explain a data compression method provided according to an embodiment of the present invention.
  • FIG 7A shows the structure of an output array output from the data operation unit in the first operation time period according to an embodiment of the present invention.
  • the output array (uncompressed) 330 shown in FIG. 7A is output from the data operation unit 610 in the first operation time period T1 and includes a total of 100 uncompressed elements.
  • Each of the uncompressed elements may have a real value or an imaginary value; each is an uncompressed value.
  • FIGS. 7B and 7C show the order in which the uncompressed elements constituting the output array (uncompressed) 330 are sequentially output from the data operation unit 610.
  • the data operation unit 610 outputs the uncompressed elements by giving priority to the second dimension direction 92 over the first dimension direction 91. That is, in FIG. 7B, the uncompressed elements belonging to the k-th row may be sequentially output from left to right, and then the uncompressed elements belonging to the (k+1)-th row may be sequentially output from left to right.
  • the uncompressed elements are output sequentially in the order of indices 1, 2, 3, ..., and only when the element with index 92 is output can the first uncompressed data group NCG101 be completely prepared.
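For a two-dimensional array, giving priority to one dimension direction over the other amounts to choosing row-major or column-major traversal. The following sketch is our own illustration of that correspondence (the function name and the mapping of `priority_dim` values to the directions 91/92 are assumptions based on our reading of FIG. 7B, not defined in the patent):

```python
def traversal_order(n_rows, n_cols, priority_dim):
    """Return (row, col) coordinates in the order the elements are visited.

    priority_dim=2: the second dimension varies fastest (row-major, as in FIG. 7B).
    priority_dim=1: the first dimension varies fastest (column-major).
    """
    if priority_dim == 2:
        return [(r, c) for r in range(n_rows) for c in range(n_cols)]
    return [(r, c) for c in range(n_cols) for r in range(n_rows)]

# A 2x3 array visited with second-dimension priority:
#   (0,0), (0,1), (0,2), (1,0), (1,1), (1,2)
# and with first-dimension priority:
#   (0,0), (1,0), (0,1), (1,1), (0,2), (1,2)
```

The control unit's choice of priority therefore fully determines the sequential index of every element.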
  • the compression unit 620 may compress the first uncompressed data group NCG101 to generate the first compressed data group CG101.
  • the plurality of compression elements constituting the first compressed data group CG101 may be sequentially generated, rather than generated at once.
  • the control unit 40 may determine whether the compression unit 620 sequentially generates the compressed elements included in the output array (compressed) 340 by giving priority to the first dimension direction 91, or by giving priority to the second dimension direction 92.
  • in the second operation time period T2, a time period after the first operation time period T1, the completed first compressed data group CG101 may be decoded, and the uncompressed elements belonging to the restored first uncompressed data group NCG101 may be input to the data operation unit 610.
  • in one case, the control unit 40 may control the compression unit 620 to sequentially generate the compressed elements by giving priority to the first dimension direction 91.
  • in the other case, the control unit 40 may control the compression unit 620 to sequentially generate the compressed elements by giving priority to the second dimension direction 92.
  • the control unit 40 should be able to distinguish the two cases described above.
  • the control unit 40 may need to obtain, in advance, information on whether the data operation unit 610 sequentially receives the uncompressed elements in the second operation time period T2 by giving priority to the first dimension direction 91 or to the second dimension direction 92. Such information may be preset in the control unit 40 or may be dynamically acquired by the control unit 40 over time.
  • each of the generated compressed data groups may be stored in the memory 11 through the internal memory 30 , respectively.
  • 7E is a diagram illustrating a method of generating a compressed data group by compressing one uncompressed data group according to an embodiment of the present invention.
  • FIG. 7E shows an example in which the compression components of the first compressed data group CG101 are generated by giving priority to the first dimension direction 91 .
  • in order to generate the k-th compressed element, the k-th uncompressed element having the k-th index, counted by giving priority to the first dimension direction 91, is used; in addition, at least some or all of the uncompressed elements of the first uncompressed data group NCG101 that have already been used up to that point (e.g., indexes 1, 11, 21, 31, 41, 51, 61, 71, 81, 91, 2, 12, 22, 32, 42, and 52) may be used.
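The rule that the k-th compressed element may depend only on the k-th uncompressed element and on elements already seen is the defining property of a causal (streaming) encoder. The patent does not fix any particular compression scheme; the delta coding below is purely a stand-in of our own choosing that satisfies this constraint:

```python
def encode_stream(uncompressed):
    """Causal encoder sketch: each output depends only on the current input
    and on inputs already consumed (here, just the previous element)."""
    encoded, prev = [], 0
    for x in uncompressed:        # elements arrive in the chosen priority order
        encoded.append(x - prev)  # delta against the last element already used
        prev = x
    return encoded

# encode_stream([5, 7, 7, 10]) -> [5, 2, 0, 3]
```

Because no output ever depends on a not-yet-arrived element, compression can begin before the whole data group has been produced, which is what allows the output buffer to stay small.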
  • 8A to 8B are provided to explain a data decoding method provided according to an embodiment of the present invention.
  • the input array (compression) 320 shown in FIG. 8A is to be input to the data operation unit 610 in the second operation time period T2, and includes a total of five compressed data groups (CG). Each of the compressed data groups CG may be sequentially provided from the memory 11 to the decoding unit 630 through the internal memory 30 .
  • the input array (compression) 320 shown in FIG. 8A may be the same as the output array (compression) 340 stored in the memory 11 of FIG. 7D .
  • the decoding unit 630 may restore the input array (uncompressed) 310 from the input array (compressed) 320 .
  • the uncompressed elements belonging to the restored input array (uncompressed) 310 may be sequentially input to the data operation unit 610 by giving priority to the first dimension direction 91 over the second dimension direction 92.
  • the first uncompressed data group (NCG101) 310 is restored from the first compressed data group (CG101) 320.
  • the uncompressed elements belonging to the first uncompressed data group (NCG101) 310 may be sequentially input to the data operation unit 610 by giving priority to the first dimension direction.
  • the uncompressed elements constituting the first uncompressed data group (NCG101) 310 may be restored sequentially, rather than all at once.
  • the control unit 40 may cause the decoding unit 630 to sequentially restore the uncompressed elements by giving priority to the first dimension direction 91.
  • FIG. 8C shows an example of restoring the uncompressed constituent elements of the first uncompressed data group NCG101 by giving priority to the first dimension direction 91 .
  • each of the uncompressed elements (e.g., indexes 1, 11, 21, 31, 41, 51, 61, 71, 81, 91, 2, 12, 22, 32, 42, and 52) can be reconstructed based only on the corresponding compressed element and the uncompressed elements already restored before it.
  • the p-th uncompressed element restored by giving priority to the first dimension direction 91 in a specific uncompressed data group can be restored based only on the p-th compressed element having the p-th index, counted by giving priority to the first dimension direction 91 in the specific compressed data group, and the uncompressed elements of the specific uncompressed data group that have already been restored up to the point of restoring the p-th uncompressed element.
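Symmetrically to the encoder constraint, the p-th uncompressed element can be restored from the p-th compressed element plus the elements already restored. Again, the patent does not fix a scheme; this delta-based decoder is only our own illustrative stand-in satisfying that causality rule:

```python
def decode_stream(encoded):
    """Causal decoder sketch: the p-th output uses only the p-th input
    and the outputs already restored (here, just the previous one)."""
    restored, prev = [], 0
    for d in encoded:
        prev = prev + d          # depends only on elements restored so far
        restored.append(prev)
    return restored

# decode_stream([5, 2, 0, 3]) -> [5, 7, 7, 10]
```

This property is what lets the decoding unit feed uncompressed elements to the data operation unit one by one, without waiting for the whole compressed data group to be processed.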
  • Embodiment 1 described above explains the basic idea of the present invention.
  • Example 2 to be described later relates to a preferred embodiment of the present invention.
  • 9A to 9D are provided to explain a data compression method provided according to another embodiment of the present invention.
  • the output array (uncompressed) 330 shown in FIG. 9A is the same as the output array (uncompressed) 330 shown in FIG. 7A.
  • the output array (uncompressed) 330 output from the data operation unit in the first operation time period T1 may be the same as the input array 310 to be input to the data operation unit in the second operation time period T2.
  • the uncompressed elements are sequentially output from the data operation unit in the order of indices 1, 2, 3, ..., and only when the element with index 42 is output can the first uncompressed data group NCG101 be completely prepared. Except for these points, the contents shown in FIG. 9B are the same as those of FIG. 7C.
  • the contents presented in FIG. 9C are the same as those of FIG. 7D except that the sizes of the uncompressed data group and the compressed data group differ from those shown in FIG. 7D.
  • the contents presented in FIG. 9D are the same as those of FIG. 7E except for a difference in the sizes of the uncompressed data group and the compressed data group.
  • FIGS. 9A to 9D and related contents may be clearly understood from the above-described contents related to FIGS. 7A to 7E .
  • 10A to 10D are provided to explain a data decoding method provided according to another embodiment of the present invention.
  • the input array (compression) 320 shown in FIG. 10A is to be input to the data operation unit 610 in the second operation time period T2, and includes a total of 10 compressed data groups CG.
  • the compressed data groups CG may be sequentially provided from the memory 11 to the decoding unit 630 through the internal memory 30 .
  • the input array (compression) 320 shown in FIG. 10A may be the same as the output array (compression) 340 stored in the memory 11 in FIG. 9C .
  • the decoding unit 630 may restore the input array (uncompressed) 310 from the input array (compressed) 320 .
  • the uncompressed elements belonging to the restored input array (uncompressed) 310 may be sequentially input to the data operation unit 610 by giving priority to the first dimension direction 91 over the second dimension direction 92.
  • the compressed data groups CG may be obtained from the memory 11 in the order of CG101, CG106, CG102, CG107, CG103, CG108, CG104, CG109, CG105, and CG110. That is, the compressed data groups CG may also be obtained from the memory 11 by giving priority to the first dimension direction 91 over the second dimension direction 92 .
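The fetch order CG101, CG106, CG102, CG107, ... corresponds to reading a 2x5 grid of compressed data groups column by column, i.e. with first-dimension priority. The helper below is an illustration of our own (the grid layout of two rows of five groups follows our reading of FIG. 10A):

```python
def group_fetch_order(rows, cols):
    """Column-major (first-dimension-priority) order over a grid of group numbers."""
    grid = [[101 + r * cols + c for c in range(cols)] for r in range(rows)]
    # visit each column top to bottom before moving to the next column
    return [grid[r][c] for c in range(cols) for r in range(rows)]

# group_fetch_order(2, 5) -> [101, 106, 102, 107, 103, 108, 104, 109, 105, 110]
```

That is, the memory read order of whole groups follows the same priority as the element order inside each group.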
  • the uncompressed elements belonging to the first uncompressed data group (NCG101) and the sixth uncompressed data group (NCG106) may be sequentially input to the data operation unit 610 by giving priority to the first dimension direction 91.
  • the uncompressed elements constituting the first uncompressed data group NCG101 or the sixth uncompressed data group NCG106 are not restored at once, but may be restored sequentially.
  • the control unit 40 may cause the decoding unit 630 to sequentially restore the uncompressed elements belonging to each uncompressed data group by giving priority to the first dimension direction 91.
  • FIGS. 10C and 10D show an example in which the uncompressed elements of the first uncompressed data group NCG101 are restored by giving priority to the first dimension direction 91, and the uncompressed elements of the sixth uncompressed data group NCG106 are also restored by giving priority to the first dimension direction 91.
  • the principle of restoring the uncompressed elements in FIGS. 10C and 10D is the same as the principle described with reference to FIG. 8C.
  • Another embodiment of the present invention has the following characteristics.
  • the input array (compression) to be input to the data operation unit in the second operation time interval has at least a first dimension and a second dimension.
  • the input array (compression) includes a plurality of compressed data groups (CG) along the first dimension direction.
  • the input array (compression) may include a plurality of compressed data groups (CG) along the second dimension direction.
  • uncompressed elements belonging to the input array (uncompressed) restored from the input array (compressed) are sequentially input to the data operation unit by giving priority to the first dimension direction, and the compressed data groups (CG) may also be sequentially acquired from the memory 11 by giving priority to the first dimension direction.
  • each of the uncompressed data groups restored from the respective compressed data groups has a plurality of uncompressed elements along the first dimension direction, and a plurality of uncompressed elements along the second dimension direction.
  • uncompressed elements belonging to a plurality of uncompressed data groups arranged along the first dimension direction are decoded.
  • the uncompressed constituent elements of a row arranged over a plurality of uncompressed data groups along the first dimension direction are sequentially decoded along the first dimension direction.
  • the p-th uncompressed element restored by giving priority to the first dimension direction 91 can be restored based only on the p-th compressed element having the p-th index, counted by giving priority to the first dimension direction 91 in the specific compressed data group, and the uncompressed elements of the specific uncompressed data group that have already been restored up to the point of restoring the p-th uncompressed element.
  • the p-th compressed element generated by giving priority to the first dimension direction 91 in the specific compressed data group may be generated based only on the p-th uncompressed element having the p-th index, counted by giving priority to the first dimension direction 91 in the specific uncompressed data group, and the uncompressed elements of the specific uncompressed data group that have already been used up to the point of generating the p-th compressed element.
  • an example is shown in which the data operation unit, in the first operation time period, outputs the output array by giving priority to the first dimension direction and, in the second operation time period, receives the input array by giving priority to the second dimension direction. That is, an example in which the output priority at the time of data output differs from the input priority at the time of data input is shown.
  • an example in which the output priority at the time of data output and the input priority at the time of data input are the same will be described in Example 3, which will be described later.
  • FIG. 11 shows an output array (uncompressed) 330 output by the data operation unit in the first operation time period according to an embodiment of the present invention.
  • the output array (uncompressed) 330 shown in FIG. 11 is the same as the output array (uncompressed) 330 shown in FIG. 9A. However, FIG. 11 differs in that, in the first operation time period, the data operation unit sequentially outputs the uncompressed elements of the output array (uncompressed) 330 by giving priority to the first dimension direction.
  • the uncompressed elements are output sequentially from the data operation unit in the order of indices 1, 11, 21, 31, 41, 51, 61, 71, 81, 91, 2, 12, 22, 32, 42, ..., and when the element with index 42 has been output, the first uncompressed data group NCG101 may be completely prepared.
  • the control unit 40 may acquire, in advance, information on whether the data operation unit 610 sequentially receives the uncompressed elements in the second operation time period T2 by giving priority to the first dimension direction 91 or to the second dimension direction 92.
  • in this example, the data operation unit 610 sequentially receives the uncompressed elements in the second operation time period T2 by giving priority to the first dimension direction 91.
  • the compression unit 620 may perform compression with the same priority as that with which the data operation unit 610 receives data in the second operation time period T2. That is, in the example of FIG. 11, since the data operation unit 610 sequentially receives the uncompressed elements by giving priority to the first dimension direction 91 in the second operation time period T2, the compression unit 620 also compresses data by giving priority to the first dimension direction 91. As shown in FIGS. 7E, 10C, and 10D, the p-th compressed element generated by giving priority to the first dimension direction in a given compressed data group (CG) may be generated based only on the p-th uncompressed element having the p-th index, counted by giving priority to the first dimension direction in the given uncompressed data group (NCG), and the uncompressed elements of the given uncompressed data group (NCG) that have already been used up to the point of generating the p-th compressed element.
  • the specific method by which the compression unit 620 compresses data is irrelevant to the order in which the uncompressed elements of the output array (uncompressed) 330 are output to the compression unit 620.
  • the compression unit 620 may compress the output array (uncompressed) 330 output by the data operation unit 610 in the first operation time period T1 to generate the output array (compressed) 340.
  • the compression may be performed for each data group constituting the output array (uncompressed) 330 . That is, the compression unit 620 may compress a specific uncompressed data group (NCG) of the output array (uncompressed) 330 to generate a compressed data group (CG).
  • the decoding unit 630 may restore the input array (uncompressed) 310 by decoding the output array (compressed) 340 .
  • the decoding may be performed for each data group constituting the output array (compressed) 340. That is, the decoding unit 630 may decode a specific compressed data group (CG) of the output array (compressed) 340 to restore a corresponding uncompressed data group (NCG).
  • Uncompressed elements constituting the restored input array (uncompressed) 310 may be sequentially input to the data operation unit according to a predetermined priority.
  • a specific method for the compression unit 620 to compress data is determined according to the priority.
  • if the data operation unit is set to sequentially receive the uncompressed elements constituting the restored input array (uncompressed) 310 by giving priority to the k-th dimension direction, then the compression unit 620, when compressing the given output array (uncompressed) 330 to generate the output array (compressed) 340, may also sequentially generate the compressed elements constituting the output array (compressed) 340 by giving priority to the k-th dimension direction.
  • when the compression unit 620 compresses a specific uncompressed data group (NCG) constituting the output array (uncompressed) 330 to generate a compressed data group (CG) constituting the output array (compressed) 340, the compressed elements constituting the compressed data group (CG) may be sequentially generated by giving priority to the k-th dimension direction.
  • the p-th compressed element generated in the compressed data group (CG) is generated based only on the p-th uncompressed element having the p-th index, counted by giving priority to the k-th dimension direction in the corresponding uncompressed data group (NCG), and the uncompressed elements of the uncompressed data group (NCG) that have already been used up to the point of generating the p-th compressed element.
  • likewise, when the decoding unit 630 decodes the given output array (compressed) 340 to generate the input array (uncompressed) 310, the uncompressed elements constituting the input array (uncompressed) 310 may be sequentially generated by giving priority to the k-th dimension direction.
  • when the decoding unit 630 decodes a specific compressed data group (CG) constituting the output array (compressed) 340 to restore an uncompressed data group (NCG) constituting the input array (uncompressed) 310, the uncompressed elements constituting the uncompressed data group (NCG) may be sequentially generated by giving priority to the k-th dimension direction.
  • the p-th uncompressed element restored by giving priority to the k-th dimension direction may be restored based only on the p-th compressed element having the p-th index, counted by giving priority to the k-th dimension direction in the compressed data group (CG), and the uncompressed elements of the uncompressed data group (NCG) that have already been restored up to the point of restoring the p-th uncompressed element.
  • the first data operation unit for outputting the output array in the first operation time period may be different from the second data operation unit for receiving the input array in the second operation time period. That is, the first data operation unit and the second data operation unit may be hardware functional modules implemented at different positions on the wafer on which the hardware accelerator 110 shown in FIG. 6A is implemented.
  • the hardware accelerator 110 provided according to an embodiment of the present invention may include the data operation unit 610, the compression unit 620 for compressing the output array 330 output by the data operation unit 610 in the first operation time period T1 to generate the compressed output array 340, and the decoding unit 630 for restoring the input array 310 from the compressed output array 340 and providing the restored input array 310 to the data operation unit 610 in the second operation time period T2.
  • the output array 330 may be divided into a plurality of uncompressed data groups (NCG).
  • the compression unit 620 may be configured to compress each of the uncompressed data groups NCG to generate a compressed data group CG corresponding to the uncompressed data group NCG.
  • the compressed output array 340 may be composed of a plurality of the generated compressed data groups.
  • the compression unit 620, when compressing each of the uncompressed data groups (NCG) to generate the corresponding compressed data group (CG), may be configured to sequentially generate the elements constituting the compressed data group (CG) by giving priority to the predetermined dimension direction 91.
  • the p-th element generated in the compressed data group (CG) may be generated based only on the element having the p-th index, counted by giving priority to the predetermined dimension direction 91 in the uncompressed data group (NCG) corresponding to the compressed data group (CG), and the elements of the uncompressed data group (NCG) that have already been used up to the point of generating the p-th element.
  • when the decoding unit 630 decodes the compressed output array 340 to restore the input array 310, the elements constituting the input array 310 may be sequentially restored by giving priority to the predetermined dimension direction 91.
  • the compressed output array 340 may be divided into a plurality of compressed data groups (CG).
  • the decoding unit 630 may be configured to decode each compressed data group CG to restore an uncompressed data group NCG corresponding to the compressed data group CG.
  • the restored input array 310 may be composed of a plurality of the restored uncompressed data groups (NCG).
  • when the decoding unit 630 decodes each of the compressed data groups (CG) to restore the corresponding uncompressed data group (NCG), the elements constituting the uncompressed data group (NCG) may be sequentially restored by giving priority to the predetermined dimension direction 91.
  • the p-th element restored by giving priority to the predetermined dimension direction 91 may be restored based only on the element having the p-th index, counted by giving priority to the predetermined dimension direction 91 in the compressed data group (CG), and the elements of the uncompressed data group (NCG) that have already been restored up to the point of restoring the p-th element.
  • the output array 330 may be divided into a plurality of uncompressed data groups (NCG).
  • the compression unit 620 may be configured to compress each uncompressed data group NCG to generate a compressed data group CG corresponding to the uncompressed data group NCG.
  • the compressed output array 340 may be composed of a plurality of the generated compressed data groups (CG).
  • the generated plurality of compressed data groups CG may be provided to the memory.
  • the plurality of compressed data groups CG may be sequentially read from the memory.
  • the output array 330 may include a plurality of uncompressed data groups (NCG) along the predetermined dimension direction 91 .
  • the compressed output array 340 may include a plurality of compressed data groups (CG) along the predetermined dimension direction 91.
  • a plurality of the compressed data groups (CG) may be read by giving priority to the predetermined dimension direction 91.
  • among the elements of the input array 310, before decoding any element (e.g., index 2, 12, 22, 32, 42, 52, 62, 72, 82, or 92) disposed along the predetermined dimension direction 91 at a given point of the other dimension direction 92, all elements (e.g., indexes 1, 11, 21, 31, 41, 51, 61, 71, 81, and 91) disposed along the predetermined dimension direction 91 at the preceding point may be decoded.
  • after decoding the first set of elements (e.g., indexes 1, 11, 21, 31, 41), which are some of the elements belonging to the first uncompressed data group NCG101 among the uncompressed data groups NCG101 and NCG106 of the first column, the second set of elements (e.g., indexes 51, 61, 71, 81, 91), which are some of the elements belonging to the sixth uncompressed data group NCG106, may be decoded; after the elements of the second set are decoded, the third set of elements (e.g., indexes 2, 12, 22, 32, 42), which are some other elements belonging to the first uncompressed data group NCG101, may be decoded.
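This interleaving, a few elements of NCG101, then a few of NCG106, then back to NCG101, arises naturally when the restored elements of one column of groups are emitted in first-dimension order across group boundaries. The sketch below is our own illustration; the group contents and the 5-elements-per-step granularity are assumptions chosen to mirror the index example:

```python
def column_decode_order(groups, elems_per_step):
    """Alternate between the groups of one column, emitting elems_per_step
    already-decoded elements from each group in turn."""
    order, positions = [], [0] * len(groups)
    while any(pos < len(g) for pos, g in zip(positions, groups)):
        for i, g in enumerate(groups):
            order.extend(g[positions[i]:positions[i] + elems_per_step])
            positions[i] += elems_per_step
    return order

ncg101 = [1, 11, 21, 31, 41, 2, 12, 22, 32, 42]    # elements in first-dimension order
ncg106 = [51, 61, 71, 81, 91, 52, 62, 72, 82, 92]
# column_decode_order([ncg101, ncg106], 5) emits the first set (1..41),
# then the second set (51..91), then the third set (2..42), and so on.
```

The emitted order thus covers the whole first column (indexes 1 through 91) before touching the second column, consistent with the decoding order described above.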
  • the computing device 1 including the above-described hardware accelerator 110, memory (e.g., DRAM 11, 13), a bus 700, and other hardware 99 is provided.
  • FIG. 12 is a conceptual diagram illustrating the form of an input array or an output array of a neural network accelerator provided according to an embodiment of the present invention.
  • the output array and the input array have been presented as having a two-dimensional array form, but the output array or the input array may have a three-dimensional array form as shown in FIG. 12, or an array form having four or more dimensions (not shown). The present invention can be applied even when the output array or the input array is a three-dimensional or higher multidimensional array.
  • the present invention was developed in the course of carrying out the research task "Development of a complex-sensory-based situation-prediction mobile artificial intelligence processor" (task unique number 2020-0-01310, research period 2020.04.01 to 2024.12.31) under the Next-Generation Intelligent Semiconductor Technology Development (Design) - Artificial Intelligence Processor program, supported by the Ministry of Science and ICT and the Institute of Information & Communications Technology Planning & Evaluation, with OpenEdge Technology Co., Ltd. as the task execution organization.


Abstract

The present invention relates to a hardware accelerator comprising: a compression unit for generating a compressed output array by compressing an output array output from a data operation unit; and a decoding unit for reconstructing an input array from the compressed output array and providing the reconstructed input array to the data operation unit. The constituent elements of the input array are input sequentially into the data operation unit, with priority given to a predetermined dimension direction. In addition, the compression unit sequentially generates the constituent elements forming the compressed output array, giving priority to the predetermined dimension direction.
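The abstract does not specify the compression scheme itself. Purely as an illustrative sketch, a compression unit could run-length encode zero elements as they are produced in the fixed dimension order, and a decoding unit could invert that encoding; the pair format `(zero_run_length, nonzero_value)` below is an assumption of this example, not taken from the patent:

```python
def compress(elements):
    """Sketch: encode a stream as (zero_run_length, nonzero_value) pairs."""
    out, run = [], 0
    for v in elements:
        if v == 0:
            run += 1            # count consecutive zeros
        else:
            out.append((run, v))  # zeros seen so far, then the value
            run = 0
    if run:
        out.append((run, None))   # trailing zeros carry no value
    return out

def decode(pairs):
    """Reconstruct the original element stream from the pairs."""
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        if v is not None:
            out.append(v)
    return out

data = [0, 0, 5, 0, 7, 0, 0, 0]
assert decode(compress(data)) == data  # lossless round trip
```

Such zero-run encodings are a natural fit for neural-network feature maps, which are often sparse after activation functions like ReLU, though the patent itself may use a different scheme.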
PCT/KR2020/015477 2020-08-25 2020-11-06 Method for compressing output data of a hardware accelerator, method for decoding data input to a hardware accelerator, and associated hardware accelerator WO2022045448A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020200107051A KR102384587B1 (ko) 2020-08-25 2020-08-25 Method for compressing output data of a hardware accelerator, method for decoding input data to a hardware accelerator, and hardware accelerator therefor
KR10-2020-0107051 2020-08-25

Publications (1)

Publication Number Publication Date
WO2022045448A1 true WO2022045448A1 (fr) 2022-03-03

Family

ID=80353376

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/015477 WO2022045448A1 (fr) 2020-11-06 Method for compressing output data of a hardware accelerator, method for decoding data input to a hardware accelerator, and associated hardware accelerator

Country Status (2)

Country Link
KR (1) KR102384587B1 (fr)
WO (1) WO2022045448A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20180048899A * 2015-09-03 2018-05-10 Qualcomm Incorporated Hardware-accelerated storage compression
KR20190019937A * 2016-06-16 2019-02-27 Texas Instruments Incorporated Radar hardware accelerator
KR20190049593A * 2017-10-31 2019-05-09 Nanjing Horizon Robotics Technology Co., Ltd. Method and apparatus for performing operations in a convolutional neural network
KR20200069477A * 2018-12-07 2020-06-17 Seoul National University R&DB Foundation Neural network hardware accelerator including single-port memory and operation method thereof
JP2020521195A * 2017-05-19 2020-07-16 Google LLC Scheduling of neural network processing

Also Published As

Publication number Publication date
KR102384587B1 (ko) 2022-04-08
KR20220026251A (ko) 2022-03-04

Similar Documents

Publication Publication Date Title
WO2017222140A1 Encoding and decoding methods and devices including a CNN-based in-loop filter
WO2020242057A1 Decompression apparatus and control method therefor
WO2013115431A1 Neural network computing apparatus and system, and method therefor
WO2012044076A2 Video encoding method and device, and decoding method and device
WO2019143026A1 Image processing method and device using feature map compression
WO2020231049A1 Neural network model apparatus and neural network model compression method
WO2016032021A1 Voice command recognition apparatus and method
WO2019143024A1 Super-resolution method and device using linear operation
WO2019143027A1 Method and device for pipelined image processing
WO2018076453A1 Associated application display method, device and mobile terminal
WO2022124607A1 Depth estimation method, device, electronic equipment and computer-readable storage medium
WO2022045448A1 Method for compressing output data of a hardware accelerator, method for decoding data input to a hardware accelerator, and associated hardware accelerator
WO2021125496A1 Electronic device and control method therefor
WO2011105879A2 Frequency-reconfigurable digital filter and equalizer using the same
WO2024111747A1 System and method for region-specific rain shower nowcasting based on an extension of a cyclic generative adversarial network
WO2015119361A1 Cloud streaming service system, cloud streaming service providing method, and device therefor
WO2023229094A1 Method and apparatus for action prediction
WO2024048868A1 Calculation method in a neural network and device therefor
WO2020262825A1 Matrix multiplication method and device based on the Winograd algorithm
WO2021172708A1 Method for processing a cache barrier command for a disk array, and device therefor
WO2017206882A1 Sensor control method and apparatus, storage medium, and electronic device
WO2019143025A1 Image processing method and device using line input and output
WO2024076165A1 Method for generating an instruction set for an artificial neural network operation, and associated computing device
WO2022098056A1 Electronic device for performing convolution calculation and operation method therefor
WO2022045449A1 Method for storing output data of a hardware accelerator in a memory, method for reading input data of a hardware accelerator from a memory, and associated hardware accelerator

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20951678

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20951678

Country of ref document: EP

Kind code of ref document: A1