WO2021196745A1 - Data processing device, integrated circuit, and AI accelerator - Google Patents
Data processing device, integrated circuit, and AI accelerator
- Publication number
- WO2021196745A1 (PCT application PCT/CN2020/136960)
- Authority
- WO
- WIPO (PCT)
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F3/0604—Improving or facilitating administration, e.g. storage management
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0644—Management of space entities, e.g. partitions, extents, pools
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F7/24—Sorting, i.e. extracting data from one or more carriers, rearranging the data in numerical or other ordered sequence, and rerecording the sorted data on the original carrier or on a different carrier or set of carriers; sorting methods in general
Definitions
- the present disclosure relates to the field of data processing technology, and in particular to data processing devices, integrated circuits, and artificial intelligence (AI) accelerators.
- AI artificial intelligence
- Heap sort refers to a sorting method designed using the heap data structure.
- the present disclosure provides data processing devices, integrated circuits, and AI accelerators.
- In a first aspect, a data processing device is provided, comprising: a plurality of heap storage units, each storing the data of a group of nodes of the heap, the group including at least some nodes located in the same layer of the heap; and a plurality of heap adjustment units, each of which accesses at least two heap storage units to sort the input original data together with the data stored in those heap storage units.
- In a second aspect, an integrated circuit is provided, including the data processing device described in the first aspect.
- In a third aspect, an AI accelerator is provided, including the integrated circuit described in the second aspect.
- In the embodiments of the present disclosure, the data of the nodes of the heap is stored in multiple heap storage units that can be read and written independently. While earlier data is being sorted by the multiple heap adjustment units, later data can be pushed into the heap, so sorting proceeds concurrently with heap building, improving sorting efficiency.
- Figure 1A is a schematic diagram of a heap of some embodiments.
- Figure 1B is a schematic diagram of the heap sorting process of some embodiments.
- Fig. 2 is a schematic diagram of a data processing device according to an embodiment of the present disclosure.
- FIG. 3A and FIG. 3B are schematic diagrams of data storage manners according to embodiments of the present disclosure, respectively.
- Fig. 4 is a schematic diagram of a data processing device according to other embodiments of the present disclosure.
- Figures 5A to 5F are schematic diagrams of data changes during the heap sorting process of an embodiment of the present disclosure.
- Fig. 6 is a schematic diagram of a data flow process of an embodiment of the present disclosure.
- Terms such as first, second, and third may be used in this disclosure to describe various information, but the information should not be limited by these terms; the terms serve only to distinguish information of the same type.
- For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information.
- The word "if" as used herein can be interpreted as "when", "while", or "in response to determining".
- Heap sort is widely used for sorting problems. It is a sorting method designed using the heap data structure. As shown in Figure 1A, a heap is an approximately complete binary tree. In a min-heap, the data of each node is always less than or equal to that of its child nodes; in a max-heap, the data of each node is always greater than or equal to that of its child nodes.
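- The heap property described above can be stated compactly in software. The following Python sketch (illustrative only; the disclosure concerns hardware) checks the min-heap and max-heap invariants over an array-encoded binary tree, where the children of index i sit at 2i+1 and 2i+2:

```python
def is_min_heap(a):
    """True if every parent a[i] is <= its children a[2i+1], a[2i+2]."""
    n = len(a)
    return all(a[i] <= a[c]
               for i in range(n)
               for c in (2 * i + 1, 2 * i + 2) if c < n)

def is_max_heap(a):
    """True if every parent a[i] is >= its children a[2i+1], a[2i+2]."""
    n = len(a)
    return all(a[i] >= a[c]
               for i in range(n)
               for c in (2 * i + 1, 2 * i + 2) if c < n)
```

The array encoding mirrors the layer-by-layer node numbering used in the figures.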
- In some related approaches, a single complete storage unit stores the entire heap, that is, the data of every node of the heap is stored in the same storage unit. Because of read-write conflicts, only one node and its child nodes can be sorted at a time.
- Fig. 1B is a schematic diagram of a heap with 5 nodes, where the data of all 5 nodes is stored in the same storage unit, labelled mem in the figure.
- The data is exchanged to obtain the sorted max-heap, as shown in the schematic in the lower-left corner.
- Then the data at the top of the heap (that is, the root node) is written out of the storage unit, and the remaining data repeats the above sorting process until the data of every node in the heap has been written out. The sorting efficiency of this heap sorting method is therefore low.
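- The single-storage-unit flow can be sketched in Python (an illustrative model, not the disclosed device): the heap is built in full before any result is drained, so the two phases cannot overlap, which is the inefficiency the disclosure targets.

```python
import heapq

def serial_heap_sort(data):
    """Conventional heap sort over one storage unit: build the whole
    heap first, then extract the root repeatedly.  Build and drain
    phases are strictly sequential."""
    heap = list(data)
    heapq.heapify(heap)          # phase 1: build (no output yet)
    out = []
    while heap:                  # phase 2: drain the root one at a time
        out.append(heapq.heappop(heap))
    return out
```
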
- As shown in Fig. 2, the device may include multiple heap storage units 201 and multiple heap adjustment units 202.
- Each of the plurality of heap storage units 201 stores the data of a group of nodes of the heap, the group including at least some of the nodes in the same layer of the heap.
- Each of the plurality of heap adjustment units 202 accesses at least two heap storage units to sort the input raw data together with the data stored in those heap storage units.
- In related approaches, data enters the heap from the bottom, and sorting starts from the top only afterwards; heap building and sorting are thus performed separately, and no sorting can run in parallel while data is entering the heap.
- In the embodiments of the present disclosure, the data of the heap's nodes is stored in multiple heap storage units 201 that can be read and written independently. While earlier data is being sorted by the multiple heap adjustment units 202, later data can be pushed into the heap, so sorting runs concurrently with heap building, improving sorting efficiency.
- For the last heap adjustment unit, heap adjustment unit n in Figure 2, two heap storage units are likewise connected (heap storage unit n and heap storage unit n+1), but no adjustment unit writes data to heap storage unit n+1; heap adjustment unit n therefore never actually reads data from heap storage unit n+1.
- Heap storage unit n+1 may thus be a virtual storage unit, or a storage unit like the other heap storage units.
- FIG. 2 schematically illustrates the data flow of the heap adjustment units accessing the heap storage units during sorting. The present disclosure does not limit heap adjustment unit i to writing data only to heap storage unit i, nor to reading data only from heap storage unit i+1.
- FIG. 3A is a schematic diagram of a heap with 4 layers of nodes and the storage layout of each node's data.
- The i-th heap storage unit can store the data of all nodes located in the i-th layer of the heap.
- For example, the first heap storage unit stores the data of the layer-1 node P11, the second heap storage unit stores the data of the layer-2 nodes P21 and P22, and so on.
- the embodiment shown in FIG. 3A is only one possible implementation of the present disclosure, and the present disclosure is not limited thereto.
- Alternatively, the data of all nodes in one layer of the heap may also be stored across multiple heap storage units.
- the heap storage unit that stores the data of the node P31 and the data of the node P32 may be different from the heap storage unit that stores the data of the node P33 and the data of the node P34.
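- Under the layer-per-unit layout of FIG. 3A, the mapping from a node's index (numbered 1, 2, 3, ... layer by layer) to its storage unit is simply the node's layer. A Python sketch of this mapping (the helper names are hypothetical, for illustration only):

```python
def layer_of(index):
    """Layer (1-based) of a node in a 1-based complete binary tree:
    node 1 -> layer 1, nodes 2-3 -> layer 2, nodes 4-7 -> layer 3, ..."""
    return index.bit_length()

def storage_unit_for(index):
    """One possible mapping, as in Fig. 3A: the i-th heap storage unit
    holds every node of layer i."""
    return layer_of(index)
```
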
- Each heap adjustment unit can access two heap storage units, which store the data of some or all nodes in two adjacent layers of the heap.
- For example, heap adjustment unit 1 can access the first and second heap storage units, heap adjustment unit 2 can access the second and third heap storage units, heap adjustment unit 3 can access the third and fourth heap storage units, and so on.
- Each heap adjustment unit may also access more than two heap storage units to sort their data; the data in those units may belong to some or all nodes of two adjacent layers, or of three or more adjacent layers.
- Each heap adjustment unit can also sort at least part of the data of two or more non-adjacent layers of the heap, to meet the sorting requirements of different application scenarios; this is not repeated here.
- At least two of the plurality of heap adjustment units may sort in parallel, improving data processing efficiency. Alternatively, the multiple heap adjustment units may sort the data in the multiple heap storage units serially.
- The heap storage units accessed by at least two heap adjustment units that sort in parallel are different from each other.
- For example, the heap storage units accessed by heap adjustment unit 2 are the second and third heap storage units, and those accessed by heap adjustment unit 3 are the third and fourth; since both sets include the third heap storage unit, heap adjustment units 2 and 3 do not sort in parallel.
- In contrast, heap adjustment unit 1 accesses the first and second heap storage units while heap adjustment unit 3 accesses the third and fourth; the two sets share no heap storage unit, so heap adjustment units 1 and 3 can sort in parallel.
- The heap storage units respectively accessed by two adjacent heap adjustment units among the plurality of heap adjustment units include one identical heap storage unit.
- For example, heap adjustment unit 1 accesses the first and second heap storage units, heap adjustment unit 2 accesses the second and third heap storage units, and so on.
- While heap adjustment unit 1 is accessing the second heap storage unit, heap adjustment unit 2 can access the third heap storage unit; heap adjustment unit 1 can then access the first heap storage unit, thus avoiding data read-write conflicts.
- As another example, heap adjustment unit 1 may access the first through third heap storage units, heap adjustment unit 2 the third through fifth heap storage units, and so on. While heap adjustment unit 1 is accessing the third heap storage unit, heap adjustment unit 2 can access the fourth or fifth heap storage unit.
- At least one heap adjustment unit is spaced between any two heap adjustment units that sort in parallel. For example, heap adjustment unit 1, which accesses the first and second heap storage units, and heap adjustment unit 3, which accesses the third and fourth heap storage units, are separated by heap adjustment unit 2; heap adjustment units 1 and 3 can therefore sort in parallel.
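- The scheduling constraint above (adjacent adjustment units share a storage unit and must not run together, while units spaced by at least one unit may) admits a simple alternating schedule. The following Python sketch shows one possible scheduler (not mandated by the disclosure), grouping odd- and even-numbered units into two phases that each run internally in parallel:

```python
def parallel_groups(num_adjust_units):
    """Split adjustment units 1..n into two conflict-free groups:
    units within a group are spaced by at least one unit, so they
    never share a heap storage unit and may run concurrently."""
    odd = [i for i in range(1, num_adjust_units + 1) if i % 2 == 1]
    even = [i for i in range(1, num_adjust_units + 1) if i % 2 == 0]
    return odd, even
```

Alternating the two groups clock by clock keeps every storage unit single-reader, single-writer.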
- Each time a piece of data is pushed into the heap, the pushed data and the data stored in the plurality of heap storage units can be sorted by the plurality of heap adjustment units.
- Each of the plurality of heap adjustment units can acquire data and sort the acquired data together with the data in at least one of the at least two heap storage units it accesses.
- The following describes the scheme of the embodiment, taking as an example the case where each heap storage unit stores the data of all nodes in one layer of the heap, and the heap storage units accessed by each heap adjustment unit store the data of two adjacent layers. Suppose heap adjustment unit i accesses the i-th and (i+1)-th heap storage units, with i a positive integer. Sorting in other cases is similar and is not repeated here.
- ceil(log2 k) heap adjustment units are used to form a heap adjustment pipeline, where ceil denotes the round-up operation and k is the total number of ordered data items required, that is, the k of the aforementioned top-k sorting problem.
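- The pipeline depth follows directly from the depth of a binary heap holding k elements. A Python sketch of this count (illustrative):

```python
import math

def pipeline_depth(k):
    """Number of heap adjustment units for a top-k pipeline:
    ceil(log2 k), the depth of a binary heap holding k elements."""
    return math.ceil(math.log2(k))
```

For example, a top-1000 task needs ceil(log2 1000) = 10 adjustment units.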
- The original data d1 is first input to heap adjustment unit 1, which sorts d1 against the previously stored data of at least one of the first and second heap storage units and outputs data d1' to heap adjustment unit 2 according to the result; d1' may be the original data d1, or a piece of data from the second heap storage unit.
- d1' is then input to heap adjustment unit 2 as its original data; heap adjustment unit 2 sorts d1' against the data of at least one of the second and third heap storage units and, according to the result, outputs data d1'' to heap adjustment unit 3, and so on.
- When the heap is a min-heap and the heap is full, heap adjustment unit 1 first compares the original data d1 with the data of the two child nodes of the root node P11, and writes the smallest of them (assume the data of the left child P21) into the heap storage unit position corresponding to the root node. Then d1 is used as the original data of heap adjustment unit 2, which compares d1 with the data of the two child nodes of node P21 and writes the smallest (assume the data of the left child P31) into the position corresponding to node P21, and so on.
- In some cases, heap adjustment unit 1 first compares the original data d1 with the data of the two child nodes of the root node P11. If d1 is smaller than both child nodes, d1 can be further compared with the data of the root node P11: if d1 is less than or equal to the root's data, d1 is directly discarded; if d1 is greater than the root's data, d1 is stored in the first heap storage unit and the subsequent heap adjustment units need not be started. In this case, heap adjustment unit 1 can read the data of the first heap storage unit.
- When the heap is a max-heap and the heap is full, heap adjustment unit 1 first compares the original data d1 with the data of the two child nodes of the root node P11, and writes the largest of them (assume the data of the left child P21) into the heap storage unit position corresponding to the root node. Then d1 is used as the original data of heap adjustment unit 2, which compares d1 with the data of the two child nodes of node P21 and writes the largest (assume the data of the left child P31) into the position corresponding to node P21, and so on.
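- The replace-root-and-sift-down flow that the chained adjustment units implement for a full min-heap can be modelled in software. The following Python sketch is a behavioural model over a flat list, not the multi-unit hardware: each loop iteration corresponds to the work one adjustment unit performs on its pair of layers, keeping the k largest values seen so far.

```python
import heapq

def topk_insert(heap, k, x):
    """Maintain the k largest values seen so far in a min-heap `heap`
    (a plain 0-based list).  If the heap is full and x <= root, x is
    discarded (pre-screening); otherwise x replaces the root and is
    sifted down layer by layer."""
    if len(heap) < k:                  # heap not yet full: just insert
        heapq.heappush(heap, x)
        return
    if x <= heap[0]:                   # cannot belong to the top k
        return
    heap[0] = x                        # replace the root, then sift down
    i, n = 0, len(heap)
    while True:
        l, r = 2 * i + 1, 2 * i + 2
        smallest = i
        if l < n and heap[l] < heap[smallest]:
            smallest = l
        if r < n and heap[r] < heap[smallest]:
            smallest = r
        if smallest == i:
            break
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest
```

The max-heap variant for the k smallest values is symmetric (reverse every comparison).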
- The data of the two child nodes of a given node of the heap is stored at the same address of the same heap storage unit.
- For example, if the data bit length is n, the data of a node's left child can be stored in the low n bits of the corresponding storage address and the data of its right child in the high n bits.
- In this case, the bit width of the heap storage unit is twice the data bit length.
- As shown in FIG. 3B, the data of node P11 is stored in heap storage unit mem1; the data of P11's two child nodes is stored at one address of heap storage unit mem2; the data of the two child nodes of node P21 is stored at one address of heap storage unit mem3 (such as the first row of mem3); and the data of the two child nodes of node P22 is stored at another address of mem3 (such as the second row of mem3).
- In this way, the data of both children of a node can be read from one storage address of one storage unit in a single clock cycle, reducing the number of data reads and improving data processing efficiency.
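- The packing of both children into one word can be sketched as follows (Python, illustrative; n_bits stands for the data bit length n described above):

```python
def pack_children(left, right, n_bits):
    """Store both children of one node in a single 2*n_bits word:
    left child in the low n bits, right child in the high n bits,
    so one read returns both."""
    mask = (1 << n_bits) - 1
    return ((right & mask) << n_bits) | (left & mask)

def unpack_children(word, n_bits):
    """Recover (left, right) from a packed word."""
    mask = (1 << n_bits) - 1
    return word & mask, (word >> n_bits) & mask
```
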
- the device may further include: a pre-processing unit configured to perform pre-screening processing on the raw data obtained from the data storage device.
- the pre-screened data is input to the subsequent heap adjustment unit.
- The pre-screening process filters out, from the original data, the data that does not need to enter the heap. Pre-screening reduces the number of times data enters the heap and thus improves data processing efficiency; the larger the amount of input data, the greater the benefit, especially in the aforementioned top-k sorting scenarios.
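- For a full heap, the pre-screening rule reduces to one comparison per datum against the heap top, as described later in this disclosure. A Python sketch (illustrative; the function and parameter names are hypothetical):

```python
def prescreen(raw, heap_root, min_heap=True):
    """Filter out data that cannot change the result.  With a min-heap
    (selecting the k largest), any value <= the root can never enter
    the top k and is dropped before the adjustment pipeline; with a
    max-heap (selecting the k smallest), values >= the root are dropped."""
    if min_heap:
        return [x for x in raw if x > heap_root]
    return [x for x in raw if x < heap_root]
```
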
- the data storage device may be a memory located outside the device provided by the present disclosure, and the external memory is connected to the data processing device of the present disclosure.
- the present disclosure does not limit the type of external memory.
- It can be volatile memory, such as RAM (Random Access Memory), SDRAM (Synchronous Dynamic RAM), or DDR (Double Data Rate) SDRAM; or it can be non-volatile memory, such as hard drives, removable hard drives, disks, and so on.
- the pre-processing unit may perform a pre-screening process on the newly acquired original data when the data stored in the heap storage unit reaches a preset amount.
- the preprocessing unit may directly output the original data to the plurality of pile adjustment units.
- The preset amount may be equal to the total number of data items the heap storage units can hold; that is, the newly acquired original data is pre-screened only when the multiple heap storage units are full.
- Alternatively, the number of activated heap storage units may be determined according to the quantity of raw data, and the newly acquired raw data is pre-screened only when the activated heap storage units are full.
- When the amount of raw data is less than the total number of data items all heap storage units can hold, only some of the heap storage units are enabled, so that the total capacity of the enabled heap storage units equals the amount of original data.
- Otherwise, all heap storage units can be activated.
- The pre-processing unit may pre-screen the raw data by comparing the acquired raw data with the data of the heap's root node, so as to determine in advance whether the raw data needs to enter the heap.
- When the heap is a min-heap, the data of the root node is less than or equal to the data of any other node.
- Therefore, if a piece of original data is less than or equal to the root's data, it is also smaller than the data of every other node of the heap and need not be sorted by the heap adjustment units; only original data greater than the root's data needs to be sorted by the heap adjustment units. Accordingly, when the acquired original data is less than or equal to the data of the root node, it is determined that the original data does not need to enter the heap; otherwise, it is determined that it does.
- When the heap is a max-heap, if the acquired original data is greater than or equal to the data of the root node, it is determined that the original data does not need to enter the heap; otherwise, it is determined that it does.
- In the corresponding scenarios, using the min-heap can effectively improve data processing efficiency; likewise, using the max-heap can effectively improve data processing efficiency.
- There may be multiple pre-processing units, and the multiple pre-processing units may pre-screen the acquired raw data in parallel.
- The pre-processing unit may transmit the original data to the first cache unit or to the heap adjustment units.
- For example, the original data may first be transmitted to the first cache unit, and the original data in the first cache unit is then output in sequence to the plurality of heap adjustment units for sorting.
- Alternatively, the pre-processing unit may directly output the raw data that needs to enter the heap to the plurality of heap adjustment units for sorting.
- For the original data that does not need to enter the heap, the pre-processing unit may delete it.
- Alternatively, the pre-processing unit returns the raw data that does not need to enter the heap to the data storage device, and the heap adjustment units return the raw data squeezed out during sorting to the data storage device, thereby removing the heap storage units' limit on the amount of ordered output data and improving the versatility of the data processing device.
- By deleting data that does not need to enter the heap, storage space can be saved; by returning it to the data storage device, the original data can easily be reused in subsequent processing.
- the plurality of heap adjustment units may re-sort the data returned to the data storage device when the data in the plurality of heap storage units are all sorted.
- The amount of ordered data output by one sort is limited by the capacity of the heap; for example, due to the number of heap layers, the number of heap adjustment units, and the size of the heap storage units, a single sort may not output a sufficient amount of ordered data.
- The device provided by the embodiments of the present disclosure supports writing unselected raw data (such as raw data that has not entered the heap and raw data squeezed out after entering the heap) back to the data storage device during sorting, so that multiple rounds of sorting can be performed, improving the versatility of the data processing device.
- For example, a first round of sorting may be performed on the data that has entered the heap, and after the first round, the next round is performed on the data left unselected by the first round. In the second round, the same processing as in the first round can be performed, including pre-screening again. In this way, multiple rounds of sorting are carried out until a stopping condition is met.
- the stopping condition may be that all the original data to be sorted are sorted.
- the stop condition may also be that the number of sorted data reaches the required number.
- In this way, a data processing device with a smaller heap capacity can be used to sort a larger amount of original data, which avoids sorting failures caused by insufficient heap capacity and broadens the device's range of application.
- the sorting process in the second round and after the second round is the same as the sorting process in the first round, and will not be repeated here.
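- The multi-round flow described above can be modelled in software. The following Python sketch is a behavioural approximation that uses a library selection routine in place of the hardware heap: each round outputs at most heap_capacity ordered values and writes the unselected data back for the next round, until k values have been produced.

```python
import heapq

def multi_round_topk(data, heap_capacity, k):
    """Produce the k largest values (descending) from `data` using a
    heap that holds at most `heap_capacity` values per round.
    Unselected data is 'written back' and re-sorted next round."""
    remaining = list(data)
    result = []
    while remaining and len(result) < k:
        round_size = min(heap_capacity, len(remaining))
        selected = heapq.nlargest(round_size, remaining)
        result.extend(selected)
        sel = list(selected)
        leftover = []
        for x in remaining:        # write unselected data back
            if x in sel:
                sel.remove(x)
            else:
                leftover.append(x)
        remaining = leftover
    return result[:k]
```

Each successive round selects strictly smaller values, so concatenating the rounds yields a globally ordered prefix.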
- When the capacity of the data storage device is limited, the original data can also be written into the data storage device in batches, with each batch pre-screened and sorted. This makes it possible to sort a larger amount of data through a data storage device with a smaller capacity and to avoid sorting failures caused by insufficient capacity of the data storage device.
- In some embodiments, the data processing device further includes a second cache unit configured to cache the original data obtained from the data storage device and to send the cached original data to the plurality of heap adjustment units; the plurality of heap adjustment units sort the original data obtained from the second cache unit together with the data in the plurality of heap storage units.
- the second cache unit may obtain one or more original data from the data storage device each time, and cache the obtained original data.
- the first caching unit may obtain one or more original data from the preprocessing unit each time, and cache the obtained original data.
- Both the first cache unit and the second cache unit may be FIFO (First In First Out) cache units.
- FIG. 4 is a schematic diagram of a data processing device according to other embodiments of the present disclosure.
- the data processing device includes n+1 heap storage units 201, n heap adjustment units 202, 1 first cache unit 203, and 4 preprocessing units 204.
- each heap storage unit is used to store data of a layer node of the heap
- the heap adjustment unit i is used to access the i-th heap storage unit and the (i+1)th heap storage unit.
- Under a top-k data sorting task, the data path is as follows.
- The original data is input in parallel (assuming a parallelism of 4).
- For example, while the heap is not yet full, the original data enters the first cache unit 203 directly.
- When the heap is full, each pre-processing unit compares the input raw data with the data at the top of the current heap (that is, the root node). When the heap is a min-heap, data greater than the heap top is output to the first cache unit 203, and data less than or equal to the heap top is written back to an external data storage device (not shown in the figure) through the first output terminal, for multi-round sorting.
- Heap adjustment unit 1 fetches data from the first cache unit 203. Multiple heap adjustment units perform heap adjustment in parallel to keep the data in the heap a min-heap. Data squeezed out of the heap can be written back to the data storage device through the second output terminal for multi-round sorting. The above process repeats until all the raw data has entered the heap.
- the commands executed by the device in this example are as follows.
- each heap storage unit may include a flag bit, which is used to indicate whether the data at a corresponding position in the heap storage unit is valid.
- the heap storage unit mem1 includes the flag bit of the data of node P11, shown as the black square of flg1 in the figure; the heap storage unit mem2 includes the flag bits of the data of node P21 and node P22, where in flg2 the flag bit of P21 is shown as a black square and the flag bit of P22 as a gray square, and so on.
- a heap storage unit that stores N data items can include N flag bits.
- data in the heap storage unit being valid means that the data participates in sorting; data being invalid means that the data does not participate in sorting.
- when the data is valid, the flag bit takes a first value; when the data is invalid, the flag bit takes a second value. For example, the first value may be "1" and the second value may be "0".
- a common heap sorting method initializes the data in each heap storage unit, and as the depth of the heap increases, the initialization time also increases.
- the embodiments of the present disclosure use flag bits: before writing data to the heap storage units, only each flag bit is initialized, so the data itself does not need to be initialized. Since the bit length of a flag bit is smaller than the bit length of the original data (for example, a flag bit can be 1 bit), in some cases the flag bits of all heap storage units can be initialized in a single clock cycle.
- the time for initializing the flag bits is less than the time for initializing the data in the heap storage units, thereby improving the efficiency of data processing.
- when valid data is written, its flag bit can be updated, that is, set from invalid to valid, so that whether the data in the heap storage unit is valid can be determined from the flag bit of the data.
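The flag-bit scheme can be pictured with a small software model: each storage unit keeps its valid bits packed into one word, so initialization clears a single word instead of every data entry, and new data lands at the leftmost invalid position. The class and method names below are illustrative, not taken from the patent.

```python
class HeapStorageUnit:
    """Stores up to n data words plus one valid flag bit per position."""
    def __init__(self, n):
        self.data = [None] * n   # data words are NOT initialized
        self.flags = 0           # n flag bits packed into one word

    def init_flags(self):
        # one write clears all flag bits (one clock cycle in hardware)
        self.flags = 0

    def write(self, pos, value):
        self.data[pos] = value
        self.flags |= 1 << pos   # mark the position as valid

    def is_valid(self, pos):
        return bool(self.flags >> pos & 1)

    def first_invalid(self):
        # leftmost invalid position, scanned left to right
        for pos in range(len(self.data)):
            if not self.is_valid(pos):
                return pos
        return None

mem2 = HeapStorageUnit(2)        # layer 2: nodes P21, P22
mem2.init_flags()
mem2.write(0, 10)                # P21 becomes valid
assert mem2.is_valid(0) and not mem2.is_valid(1)
assert mem2.first_invalid() == 1 # new data goes to P22 next
```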
- each of the plurality of heap adjustment units is further used to: when the flag bits of the first heap storage unit indicate that the data at the corresponding positions are all valid, sort the original data input to the heap adjustment unit together with the valid data; and when a flag bit of the first heap storage unit indicates that the data at the corresponding position includes any invalid data, write the original data input to the heap adjustment unit to the position corresponding to the invalid data.
- the first heap storage unit is, of the at least two heap storage units accessed by the heap adjustment unit, the one closer to the root node.
- the original data input to the heap adjustment unit is written, in left-to-right order, to the leftmost position corresponding to invalid data.
- the heap-building process of the embodiments of the present disclosure is similar to the heap-entry process: one piece of data is input to the multiple heap adjustment units at a time, and the multiple heap adjustment units sort the input data against the data stored in the heap storage units.
- each of the plurality of heap adjustment units can access at least two heap storage units, and the designated data is input so that the data stored in the at least two heap storage units is sorted out of the heap.
- the heap-out process is similar to the heap-entry process and is also executed in parallel.
- one designated datum can be input to the plurality of heap adjustment units each time.
- the value of the designated data can be greater than every datum stored in the multiple heap storage units;
- for example, the designated data may be data with a value of +∞.
- the so-called +∞ data can be the maximum value in the data format of the original data. For example, for a 16-bit floating-point number, 0x7c00 represents +∞.
- alternatively, the value of the designated data may be smaller than every datum stored in the multiple heap storage units; for example, the designated data may be data with a value of -∞.
- the so-called -∞ data can be the minimum value in the data format of the original data. For example, for a 16-bit floating-point number, 0xfc00 represents -∞.
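The sentinel encodings and the heap-out mechanism can be checked with a short sketch (Python's struct format 'e' decodes IEEE 754 half-precision values): 0x7c00 and 0xfc00 decode to +∞ and -∞, and pushing +∞ into a min-heap squeezes the current root out, so repeated pushes emit the stored data in ascending order.

```python
import heapq
import math
import struct

# 16-bit float encodings of the sentinel values used for heap extraction
POS_INF = bytes([0x00, 0x7C])   # 0x7c00 in little-endian byte order
NEG_INF = bytes([0x00, 0xFC])   # 0xfc00 in little-endian byte order
assert struct.unpack('<e', POS_INF)[0] == math.inf
assert struct.unpack('<e', NEG_INF)[0] == -math.inf

# heap-out: each +inf pushed into a min-heap sinks to a leaf and
# squeezes the current root (the minimum) out of the heap
heap = [3, 8, 5]
heapq.heapify(heap)
out = [heapq.heappushpop(heap, math.inf) for _ in range(3)]
assert out == [3, 5, 8]   # stored data emitted in ascending order
```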
- the above-mentioned initialization, heap-in, and heap-out processes can be controlled by different instructions respectively.
- if the entire sorting process were completed by a single instruction, the parameters would be fixed and the data processing device would lack versatility.
- instead, a sort is divided into three processes, initialization, heap-in, and heap-out, corresponding to three kinds of instructions.
- there can be multiple heap-in instructions in a single sort (the original data can be input multiple times). This removes the limit that the data storage device places on the quantity of original data, and enables the heap adjustment units and the preprocessing units to run in parallel, making the device more flexible to use.
- the instructions for the initialization, heap-in, and heap-out processes can be sent by an upper-level controller to the heap control unit in the data processing device, and are executed under the control of the heap control unit.
- the device further includes a heap control unit configured to perform at least any one of the following operations: upon receiving an initialization instruction, controlling the multiple heap storage units to initialize in the same clock cycle; upon receiving a heap-in instruction, reading original data from the data storage device and transmitting the read original data to the multiple heap adjustment units, so that the multiple heap adjustment units sort the original data against the data in the multiple heap storage units; and upon receiving a heap-out instruction, controlling the multiple heap adjustment units to output the data in the multiple heap storage units from the heap top in a specific order.
- the heap control unit may send an initialization signal to the heap storage unit to initialize each flag bit in the heap storage unit.
- when receiving the heap-in instruction, the heap control unit can read the original data from the data storage device and output it to the preprocessing unit, and the preprocessing unit determines whether the original data needs pre-screening processing. If so, the preprocessing unit directly deletes the original data that does not need to enter the heap, or returns it to the data storage device, and outputs the data that needs to enter the heap to the first cache unit; if pre-screening processing is not required, the original data is output directly to the first cache unit.
- the heap adjustment units receive the original data from the first cache unit and adjust the data in the heap storage units level by level according to the size of the original data, until all the original data that needs to be sorted has been processed.
- when the heap control unit receives the heap-out instruction, it outputs the designated data to the heap adjustment units; the heap adjustment units receive the designated data and adjust the data in the heap storage units level by level. Each time a designated datum enters the heap, one datum in the heap storage units (the data of the root node of the heap) is squeezed out of the heap, and the heap control unit sequentially outputs the squeezed-out data to the data output port of the data processing device.
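The split into initialization, heap-in, and heap-out instructions can be modeled with a hypothetical controller class; the method names and interface below are illustrative, and the heap is a plain software min-heap rather than the parallel hardware structure.

```python
import heapq
import math

class HeapControlUnit:
    """Illustrative model of the init / heap-in / heap-out instruction split."""
    def __init__(self, k):
        self.k = k
        self.heap = []

    def init(self):
        self.heap.clear()            # in hardware: clear flag bits only

    def heap_in(self, batch):
        # multiple heap-in instructions may arrive within one sort
        evicted = []
        for x in batch:
            if len(self.heap) < self.k:
                heapq.heappush(self.heap, x)
            elif x > self.heap[0]:
                evicted.append(heapq.heappushpop(self.heap, x))
            else:
                evicted.append(x)
        return evicted               # written back for later rounds

    def heap_out(self):
        # push +inf sentinels; each one squeezes the root out of the heap
        return [heapq.heappushpop(self.heap, math.inf)
                for _ in range(self.k)]

ctl = HeapControlUnit(k=3)
ctl.init()
ctl.heap_in([8, 70, 12])
ctl.heap_in([75, 3, 80])             # second heap-in instruction
assert ctl.heap_out() == [70, 75, 80]
```

Because heap-in is its own instruction, a sort is not limited to a single batch of input data, which mirrors the flexibility argument made above.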
- FIGS. 5A to 5F are schematic diagrams of node data changes during the sorting process of the embodiments of the present disclosure.
- This embodiment takes the smallest heap as an example for description.
- the sorting process of the largest heap is similar to that of the smallest heap, and will not be repeated here.
- the depth of the heap is 6, that is, the heap includes 6 layers of nodes, the data of each node in each layer of nodes is stored in an independent heap storage unit, and the data of each child node of the same node is stored in the same heap storage unit.
- the heap storage unit corresponding to the i-th layer node is heap storage unit i
- the heap adjustment unit that accesses heap storage unit i and heap storage unit i+1 is heap adjustment unit i
- each node of the i-th layer is denoted as Pij, 1 ≤ j ≤ 2^(i-1), where i is a positive integer.
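The layer-per-unit layout can be sketched as an address map, assuming (as described above) that layer i holds 2^(i-1) nodes and that both children of a node share one address in the next storage unit; the helper names are illustrative.

```python
def make_heap_storage(depth):
    """Heap storage unit i (1-based) holds the 2**(i-1) nodes of layer i."""
    return [[None] * (2 ** (i - 1)) for i in range(1, depth + 1)]

def children_address(j):
    """Children of node P_ij, namely P_(i+1),(2j-1) and P_(i+1),(2j),
    share one address in heap storage unit i+1 (0-based address j-1)."""
    return j - 1

mems = make_heap_storage(6)
assert [len(m) for m in mems] == [1, 2, 4, 8, 16, 32]
# children of P22 (j=2) sit at address 1 of unit 3, i.e. nodes P33 and P34
assert children_address(2) == 1
```

Sharing one address per sibling pair is what lets a heap adjustment unit fetch both children of a node in a single read of the next storage unit.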
- the heap at the initial time t0 is as shown in Figure 5A.
- at time t1, the original data "70" enters the heap, and the data "8" of node P11 is squeezed out of heap storage unit 1.
- the heap adjustment unit 1 reads the data of node P21 and node P22 from heap storage unit 2.
- the heap adjustment unit 1 writes the data of node P21 into heap storage unit 1 at the position corresponding to node P11, and outputs the original data "70" to heap adjustment unit 2, as shown in FIG. 5B.
- the heap adjustment unit 2 reads the data of node P31 and node P32 from heap storage unit 3, writes the data of node P31 into heap storage unit 2 at the position corresponding to node P21, and outputs the original data "70" to heap adjustment unit 3, as shown in FIG. 5C.
- the heap adjustment unit 3 reads the data of node P41 and node P42 from heap storage unit 4; at the same time (t2), the original data "75" enters the heap, and the data "12" of node P11 is squeezed out of heap storage unit 1.
- the heap adjustment unit 1 reads the data of node P21 and node P22 from heap storage unit 2, as shown in FIG. 5D.
- the heap adjustment unit 3 writes the data of node P41 into heap storage unit 3 at the position corresponding to node P31, and outputs the original data "70" to heap adjustment unit 4; the heap adjustment unit 4 reads the data of node P51 and node P52 from heap storage unit 5.
- the heap adjustment unit 1 writes the data of node P22 into heap storage unit 1 at the position corresponding to node P11, and outputs the original data "75" to heap adjustment unit 2.
- the heap adjustment unit 2 reads the data of node P33 and node P34 from heap storage unit 3.
- the heap adjustment unit 4 writes the data of node P51 into heap storage unit 4 at the position corresponding to node P41, and outputs the original data "70" to heap adjustment unit 5, as shown in FIG. 5E.
- the heap adjustment unit 5 reads the data of the node P61 and the data of the node P62 from the heap storage unit 6.
- the heap adjustment unit 2 writes the data of node P34 into heap storage unit 2 at the position corresponding to node P22, and outputs the original data "75" to heap adjustment unit 3; the heap adjustment unit 3 reads the data of node P47 and node P48 from heap storage unit 4; at the same time (t3), the original data "80" enters the heap, as shown in FIG. 5F.
- the starting moments of t1 and t2 are separated by at least two cycles, and the starting moments of t2 and t3 are separated by at least two cycles.
- the parallel heap sorting method of the embodiments of the present disclosure can shorten the sorting time to about 1/3 of the original. The greater the depth of the heap, the more heap adjustment units work at the same time, that is, the higher the degree of parallelism and the more the sorting time can be shortened.
- Figure 6 is a schematic diagram of the data flow when the depth of the heap is 8.
- d1, d2, etc. represent input raw data
- t1, t2, etc. represent time
- adj1, adj2, etc. represent stack adjustment units.
- during heap sorting, the embodiments of the present disclosure merge the two processes of heap building and heap adjustment into a unified top-down heap adjustment process. The data of two adjacent layers of the heap is adjusted by one heap adjustment unit, the multiple heap adjustment units form an array, and the input data stream passes through each heap adjustment unit in turn. At different times, multiple heap adjustment units can execute in parallel, and from t6 onward the maximum degree of parallelism, namely 4, is reached.
- the heap adjustment unit 1, the heap adjustment unit 3, the heap adjustment unit 5, and the heap adjustment unit 7 all work at the same time.
- since a lower-level heap adjustment unit may modify data in a heap storage unit that the upper-level heap adjustment unit still needs, in order to avoid data read/write conflicts, two adjacent original data enter the heap at an interval of one level: when the m-th original datum (or the data it displaced from a heap storage unit) is being sorted by adj3, the (m+1)-th original datum can be sorted by adj1.
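The one-level spacing rule can be checked with a small timing model: datum m starts at adj1 two cycles after datum m-1 and moves down one adjustment unit per cycle, so the busy units at any moment are every other one, and the peak parallelism is half the heap depth (an illustrative model, not the hardware timing).

```python
def schedule(depth, num_data):
    """Which adjustment units are busy at each time step, with
    adjacent data spaced two cycles apart to avoid read/write conflicts."""
    busy = {}
    for m in range(num_data):
        start = 2 * m                      # datum m enters two cycles after m-1
        for level in range(1, depth + 1):  # one level (adj unit) per cycle
            busy.setdefault(start + level - 1, set()).add(level)
    return busy

busy = schedule(depth=8, num_data=10)
# when datum m is at adj3, datum m+1 is at adj1: units 1,3,5,7 run together
assert busy[6] == {1, 3, 5, 7}
assert max(len(u) for u in busy.values()) == 4   # peak parallelism = depth/2
```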
- Each unit in the data processing device of the embodiments of the present disclosure may be implemented based on an FPGA (Field Programmable Gate Array), a PLD (Programmable Logic Device), an ASIC (Application Specific Integrated Circuit), a controller, a microcontroller, a microprocessor, or other electronic components.
- the data processing device realizes parallel heap sorting and improves data processing efficiency.
- there is no need to initialize the data in the heap storage unit and only the flag bit needs to be initialized, which improves the initialization efficiency.
- pre-screening processing can be performed, which reduces the number of times that raw data enters the heap, and further improves the efficiency of data processing.
- multiple rounds of sorting can be performed: the original data in the data storage device can be sorted multiple times, and the original data can also be written into the data storage device in batches, with the data of the same batch then sorted against the data in the heap storage units.
- the sorting process is not limited by the size of the heap storage unit and the data storage device, and it has strong versatility.
- an embodiment of the present disclosure also provides an integrated circuit, which includes the data processing device described in any of the embodiments.
- the integrated circuit further includes: a controller configured to send at least any one of the following instructions to the data processing device: an initialization instruction, used to instruct the plurality of heap storage units to initialize; a heap-in instruction, used to instruct the plurality of heap adjustment units to obtain original data and to sort the original data against the data stored in the plurality of heap storage units; and a heap-out instruction, used to instruct the plurality of heap adjustment units to output the data stored in the plurality of heap storage units in a specific order.
- the initialization instruction, the heap-in instruction, and the heap-out instruction may be different instructions.
- a sort is divided into three processes, initialization, heap-in, and heap-out, corresponding to the three kinds of instructions.
- there can be multiple heap-in instructions in a single sort (the original data can be input multiple times). This removes the limit that the data storage device places on the quantity of original data, and enables the heap adjustment units and the preprocessing units to run in parallel, making the device more flexible to use.
- the instructions for initialization, heap-in, and heap-out can be sent by the controller of the integrated circuit to the heap control unit in the data processing device, and are executed under the control of the heap control unit.
- an embodiment of the present disclosure also provides an AI (Artificial Intelligence) accelerator, and the AI accelerator includes the integrated circuit described in any of the embodiments.
- the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and internal logic.
Claims (21)
- 1. An apparatus for data processing, characterized in that the apparatus comprises: a plurality of heap storage units, each heap storage unit being used to store data of a group of nodes of a heap, the group of nodes including at least some of the nodes in a same layer of the heap; and a plurality of heap adjustment units, each heap adjustment unit being used to access at least two heap storage units, so as to sort input original data against the data stored in the at least two heap storage units.
- 2. The apparatus according to claim 1, characterized in that the at least two heap storage units accessed by each heap adjustment unit are used to store data of adjacent layers of nodes of the heap; and/or each of the plurality of heap adjustment units is used to obtain the input original data and to sort the obtained original data against the data in at least one of the at least two heap storage units it accesses.
- 3. The apparatus according to claim 1 or 2, characterized in that the two heap storage units respectively accessed by two adjacent heap adjustment units of the plurality of heap adjustment units include one common heap storage unit; and/or at least two of the plurality of heap adjustment units sort in parallel, the heap storage units accessed by the at least two heap adjustment units being different from one another.
- 4. The apparatus according to any one of claims 1 to 3, characterized in that the heap-entry timings of two adjacent data are separated by two processing cycles of the heap storage units.
- 5. The apparatus according to any one of claims 1 to 4, characterized in that the data of the child nodes of a same node of the heap are stored at the same address of the same heap storage unit.
- 6. The apparatus according to any one of claims 1 to 5, characterized in that the apparatus further comprises: a preprocessing unit for performing pre-screening processing on original data obtained from a data storage device, the pre-screened data being input to the plurality of heap adjustment units.
- 7. The apparatus according to claim 6, characterized in that the preprocessing unit is used to perform the pre-screening processing on newly obtained original data when the data stored in the heap storage units reaches a preset quantity.
- 8. The apparatus according to claim 6 or 7, characterized in that the preprocessing unit is used to perform the pre-screening processing on the original data by comparing the original data with the data of the root node of the heap, so as to determine in advance whether the original data needs to enter the heap.
- 9. The apparatus according to any one of claims 6 to 8, characterized in that there are a plurality of preprocessing units, and the plurality of preprocessing units are used to perform the pre-screening processing on the obtained original data in parallel.
- 10. The apparatus according to any one of claims 6 to 9, characterized in that the preprocessing unit is used to: transmit the original data to a cache unit or to the plurality of heap adjustment units when it is determined that the original data needs to enter the heap; and delete the original data or return it to the data storage device when it is determined that the original data does not need to enter the heap.
- 11. The apparatus according to claim 10, characterized in that the plurality of heap adjustment units are further used to: return original data squeezed out during sorting to the data storage device; and sort the original data returned to the data storage device again after the data in the plurality of heap storage units has all been sorted.
- 12. The apparatus according to any one of claims 6 to 11, characterized in that the apparatus further comprises: a first cache unit for caching the pre-screened original data obtained from the preprocessing unit; the plurality of heap adjustment units being used to sort the original data obtained from the first cache unit against the data in the plurality of heap storage units.
- 13. The apparatus according to any one of claims 1 to 5, characterized in that the apparatus further comprises: a second cache unit for caching original data obtained from a data storage device; the plurality of heap adjustment units being used to sort the original data obtained from the second cache unit against the data in the plurality of heap storage units.
- 14. The apparatus according to any one of claims 1 to 13, characterized in that each heap storage unit includes flag bits, wherein a flag bit is used to indicate whether the data at a corresponding position in the heap storage unit is valid.
- 15. The apparatus according to claim 14, characterized in that the heap storage unit is further used to: initialize each flag bit in the heap storage unit; and/or update a flag bit when it is determined that valid data has been written to the position corresponding to that flag bit.
- 16. The apparatus according to claim 14 or 15, characterized in that each of the plurality of heap adjustment units is further used to: when the flag bits of a first heap storage unit accessed by the heap adjustment unit indicate that the data at the corresponding positions are all valid, sort the original data input to the heap adjustment unit together with the valid data, wherein the first heap storage unit is, of the at least two heap storage units accessed by the heap adjustment unit, the one closer to the root node; and when a flag bit of the first heap storage unit indicates that the data at the corresponding position includes any invalid data, write the original data input to the heap adjustment unit to the position corresponding to the invalid data.
- 17. The apparatus according to any one of claims 1 to 16, characterized in that each of the plurality of heap adjustment units is used to: read the data stored in at least one of the at least two heap storage units; sort the original data input to the heap adjustment unit against the read data; and, according to the sorting requirement, write the larger or smaller data in the sorting result to another of the at least two heap storage units, wherein the other heap storage unit and the at least one heap storage unit are not the same heap storage unit.
- 18. The apparatus according to any one of claims 1 to 17, characterized in that the apparatus further comprises: a heap control unit for performing at least any one of the following operations: upon receiving an initialization instruction, controlling the plurality of heap storage units to initialize within a same clock cycle; upon receiving a heap-in instruction, reading original data from a data storage device and transmitting the read original data to the plurality of heap adjustment units, so that the plurality of heap adjustment units sort the original data against the data in the plurality of heap storage units; and upon receiving a heap-out instruction, controlling the plurality of heap adjustment units to output the data in the plurality of heap storage units from the heap top in a specific order.
- 19. An integrated circuit, characterized in that the integrated circuit comprises the data processing apparatus according to any one of claims 1 to 18.
- 20. The integrated circuit according to claim 19, characterized in that the integrated circuit further comprises a controller for sending at least any one of the following instructions to the data processing apparatus: an initialization instruction for instructing the plurality of heap storage units to initialize; a heap-in instruction for instructing the plurality of heap adjustment units to obtain original data and to sort the original data against the data stored in the plurality of heap storage units; and a heap-out instruction for instructing the plurality of heap adjustment units to output the data stored in the plurality of heap storage units in a specific order.
- 21. An artificial intelligence (AI) accelerator, characterized in that the AI accelerator comprises the integrated circuit according to claim 19 or 20.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021557465A JP2022531075A (ja) | 2020-03-31 | 2020-12-16 | データ処理 |
KR1020217031349A KR20210129715A (ko) | 2020-03-31 | 2020-12-16 | 데이터 처리 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010244150.2 | 2020-03-31 | ||
CN202010244150.2A CN113467702A (zh) | 2020-03-31 | 2020-03-31 | 数据处理装置、集成电路和ai加速器 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021196745A1 true WO2021196745A1 (zh) | 2021-10-07 |
Family
ID=77865417
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/136960 WO2021196745A1 (zh) | 2020-03-31 | 2020-12-16 | 数据处理装置、集成电路和ai加速器 |
Country Status (5)
Country | Link |
---|---|
JP (1) | JP2022531075A (zh) |
KR (1) | KR20210129715A (zh) |
CN (1) | CN113467702A (zh) |
TW (1) | TWI773051B (zh) |
WO (1) | WO2021196745A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115932532B (zh) * | 2023-03-09 | 2023-07-25 | 长鑫存储技术有限公司 | 故障存储单元的物理地址的存储方法、装置、设备及介质 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060004897A1 (en) * | 2000-11-28 | 2006-01-05 | Paul Nadj | Data structure and method for sorting using heap-supernodes |
US20060095444A1 (en) * | 2000-11-28 | 2006-05-04 | Paul Nadj | Data structure and method for pipeline heap-sorting |
US20140181126A1 (en) * | 2001-08-16 | 2014-06-26 | Altera Corporation | System and Method for Scheduling and Arbitrating Events in Computing and Networking |
CN107402741A (zh) * | 2017-08-04 | 2017-11-28 | 电子科技大学 | 一种适宜于fpga实现的排序方法 |
CN108319454A (zh) * | 2018-03-27 | 2018-07-24 | 武汉中元华电电力设备有限公司 | 一种基于硬件fpga快速实现最优二叉树的方法 |
CN109375989A (zh) * | 2018-09-10 | 2019-02-22 | 中山大学 | 一种并行后缀排序方法及系统 |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS6116327A (ja) * | 1984-07-03 | 1986-01-24 | Agency Of Ind Science & Technol | デ−タ処理装置 |
JPS6154536A (ja) * | 1984-08-24 | 1986-03-18 | Hitachi Ltd | デ−タ整順化回路 |
JPS61150055A (ja) * | 1984-12-25 | 1986-07-08 | Panafacom Ltd | Dmaデ−タ転送方式 |
US10268410B2 (en) * | 2014-10-20 | 2019-04-23 | Netapp, Inc. | Efficient modification of storage system metadata |
US10761979B2 (en) * | 2016-07-01 | 2020-09-01 | Intel Corporation | Bit check processors, methods, systems, and instructions to check a bit with an indicated check bit value |
CN110825440B (zh) * | 2018-08-10 | 2023-04-14 | 昆仑芯(北京)科技有限公司 | 指令执行方法和装置 |
-
2020
- 2020-03-31 CN CN202010244150.2A patent/CN113467702A/zh active Pending
- 2020-12-16 WO PCT/CN2020/136960 patent/WO2021196745A1/zh active Application Filing
- 2020-12-16 JP JP2021557465A patent/JP2022531075A/ja active Pending
- 2020-12-16 KR KR1020217031349A patent/KR20210129715A/ko not_active Application Discontinuation
- 2020-12-24 TW TW109146076A patent/TWI773051B/zh active
Also Published As
Publication number | Publication date |
---|---|
CN113467702A (zh) | 2021-10-01 |
TWI773051B (zh) | 2022-08-01 |
TW202138994A (zh) | 2021-10-16 |
KR20210129715A (ko) | 2021-10-28 |
JP2022531075A (ja) | 2022-07-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9547444B1 (en) | Selectively scheduling memory accesses in parallel based on access speeds of memory | |
KR101431205B1 (ko) | 캐시 메모리 장치 및 캐시 메모리 장치의 데이터 처리 방법 | |
CN101038531A (zh) | 用于嵌入式系统中部件的共用接口 | |
JP6935356B2 (ja) | 半導体装置、情報処理システム、および情報処理方法 | |
US20220197530A1 (en) | Memory system and operating method thereof | |
WO2020248982A1 (zh) | 一种区块链中交易处理的方法及装置 | |
JP2021072107A5 (zh) | ||
WO2021196745A1 (zh) | 数据处理装置、集成电路和ai加速器 | |
WO2024046230A1 (zh) | 存储器训练方法及系统 | |
US20190370097A1 (en) | Grouping requests to reduce inter-process communication in memory systems | |
WO2021115002A1 (zh) | 一种区块链交易记录的处理方法及装置 | |
US10489702B2 (en) | Hybrid compression scheme for efficient storage of synaptic weights in hardware neuromorphic cores | |
US11996860B2 (en) | Scaled bit flip thresholds across columns for irregular low density parity check decoding | |
TWI707362B (zh) | 資料寫入方法和儲存控制器 | |
KR20210108487A (ko) | 저장 디바이스 동작 오케스트레이션 | |
TWI843934B (zh) | 用於處理無結構源資料的方法及系統 | |
US11442643B2 (en) | System and method for efficiently converting low-locality data into high-locality data | |
CN113467704B (zh) | 通过智能阈值检测的命令优化 | |
US11734205B2 (en) | Parallel iterator for machine learning frameworks | |
US20240211135A1 (en) | Storage system, storage device and operating method thereof | |
CN113672530B (zh) | 一种服务器及其排序设备 | |
US20240086106A1 (en) | Accelerator Queue in Data Storage Device | |
WO2021223098A1 (en) | Hierarchical methods and systems for storing data | |
US20220107754A1 (en) | Apparatus and method for data packing and ordering | |
TWI721660B (zh) | 控制資料讀寫裝置與方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
ENP | Entry into the national phase |
Ref document number: 2021557465 Country of ref document: JP Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 20217031349 Country of ref document: KR Kind code of ref document: A |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20929152 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20929152 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 05.04.2023) |
|
Ref document number: 20929152 Country of ref document: EP Kind code of ref document: A1 |