WO2017086987A1 - In-memory data shuffle - Google Patents


Info

Publication number
WO2017086987A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
shuffle
data
index
buckets
Application number
PCT/US2015/061843
Other languages
English (en)
Inventor
Jun Li
Haris Volos
Original Assignee
Hewlett Packard Enterprise Development Lp
Application filed by Hewlett Packard Enterprise Development Lp filed Critical Hewlett Packard Enterprise Development Lp
Priority to PCT/US2015/061843 priority Critical patent/WO2017086987A1/fr
Publication of WO2017086987A1 publication Critical patent/WO2017086987A1/fr

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 — Details of database functions independent of the retrieved data types
    • G06F 16/903 — Querying
    • G06F 16/90335 — Query processing

Definitions

  • Data shuffling involves processing data based on key/value pairs.
  • the shuffle data is sorted by a source (e.g., a mapper or data producer) and merged at a destination (e.g., a reducer or data consumer). Accordingly, the shuffle data, including a plurality of keys and corresponding values, may be processed to provide data that is organized based on the keys.
  • FIG. 1 illustrates an example in-memory data shuffle system, including in-memory shuffle engines that may be implemented in accordance with an aspect of this disclosure.
  • FIG. 2 is a block diagram of an example in-memory shuffle engine that may be implemented by the in-memory data shuffle system of FIG. 1.
  • FIG. 3 illustrates an example representation of shuffle data being sorted/merged within the in-memory data shuffle system of FIG. 1 utilizing in-memory mappers and in-memory reducers of in-memory shuffle engines that may be implemented by the in-memory shuffle engine of FIG. 2.
  • FIG. 4 is a flowchart representative of example machine readable instructions that may be executed to implement the in-memory shuffle engine of FIG. 2.
  • FIG. 5 is a flowchart representative of example machine readable instructions that may be executed to implement the in-memory data shuffle system of FIG. 1.
  • FIG. 6 is a block diagram of an example processor platform capable of executing the instructions of FIGS. 4 and/or 5 to implement the in-memory data shuffle system of FIG. 1.
  • Examples disclosed herein involve an in-memory data shuffle system to shuffle data within a shared memory fabric, such as a configuration of dynamic random access memory (DRAM) and/or a non-volatile memory (NVM), of a multi-compute node system.
  • in-memory mappers of in-memory shuffle engines partition and index shuffle data in a shared memory region while in-memory reducers of the in-memory shuffle engines retrieve and reduce the shuffle data using the index buckets that map locations of the buckets in the shared memory.
  • a plurality of mappers may process data based on key/value pairs (e.g., organize based on keys) and reducers may merge the processed data (retrieve all data having a certain key). Accordingly, when the reducers are to merge the shuffle data, the reducers may have to contact all mappers to retrieve data having a certain key, which may introduce significant delay and consume processing resources. This may be compounded in systems utilizing disk storage, solid-state drive (SSD) storage, etc. to shuffle the data, as communication protocols (e.g., Transmission Control Protocol/Internet Protocol) in such systems introduce a lot of overhead, including overhead from an operating system of the systems.
  • in-memory shuffle engines utilize shared memory to shuffle data between nodes (e.g., in-memory shuffle engines) of the multi-compute node system via direct memory accesses.
  • shuffle data may be shuffled from sources (e.g., mappers) to destinations (e.g., reducers) using relatively low-latency, high bandwidth communication via the direct memory access to increase speed and efficiency of shuffling data in the example in-memory shuffle systems.
  • Direct memory access between in-memory mappers and in-memory reducers is possible as the in-memory mappers and the in-memory reducers are located within a same load/store domain (e.g., a collection of processors or computing devices that can access a same collection of fabric attached memory via load/store processor operations).
  • in-memory shuffle engines use a mapper to sort shuffle data into buckets of a shared memory and an index to map the sorted shuffle data to locations within the shared memory.
  • a bucket or data bucket is any block, portion, or memory space of a memory that stores data.
  • the in-memory shuffle engines further use reducers to retrieve bucket location information from index buckets in the shared memory region and to process the mapped shuffle data (e.g., via merge engines, pass-through, a hash-map, etc.). Examples herein may provide the processed shuffle data back to a processing pipeline (e.g., for a subsequent stage).
  • FIG. 1 is a schematic diagram of an example in-memory data shuffle system 100 including n in-memory shuffle engines 110(a)-110(n) (collectively referred to herein as the in-memory shuffle engines 110).
  • the in-memory data shuffle system 100 of FIG. 1 includes an in-memory shuffle manager 120, the n in-memory shuffle engines 110, and a shared memory system 130.
  • the in-memory shuffle engines 110 may shuffle data within a shared memory region of the shared memory system 130 that is accessible by any or all of the in-memory shuffle engines 110.
  • the example shared memory system 130 may include a memory manager and memory or storage.
  • the example shared memory system 130 may include a shared non-volatile memory (NVM), a shared non-volatile storage, a shared dynamic random access memory (DRAM), etc.
  • the in-memory shuffle manager 120 (e.g., a shuffle scheduler) manages the in-memory shuffle engines 110 of the in-memory data shuffle system 100 of FIG. 1.
  • the in-memory shuffle manager 120 may include an interface (e.g., a remote procedure call interface) to receive data and/or instructions to process data.
  • the example in-memory shuffle manager 120 may maintain communication between the in-memory shuffle engines 110.
  • Such communication may include facilitating exchanges of index buckets indicating locations of buckets including mapped shuffle data stored in a shared memory region of the shared memory system 130, as further described below.
  • the example in-memory shuffle manager 120 provides global pointers to index buckets in a shared NVM (or shared DRAM) of the shared memory system 130. Accordingly, the in-memory shuffle manager 120 may serve as a master node of the in-memory shuffle engines 110 to facilitate shuffling of shuffle data between in-memory mappers of the in-memory shuffle engines and in-memory reducers of the in-memory shuffle engines.
  • the in-memory shuffle engines 110 of FIG. 1 process shuffle data (e.g., key/value pairs) in a shared memory region of the shared memory system 130 in accordance with the teachings of this disclosure.
  • the in-memory shuffle engines 110 may be implemented by virtual machine(s) (which may be implemented by hardware or by hardware and executable instructions).
  • the in-memory shuffle engines 110 shuffle data by redistributing the data from a source stage (e.g., a map stage) to a destination stage (e.g., a reduce stage).
  • the in-memory shuffle engines 110 may support a variety of shuffle operations, such as groupBy (to group together values sharing a same key or key family), reduceBy (to apply a reduce function on grouped values sharing a same key or key family), partitionBy (to move keys into different partitions), and sortBy (to sort keys with global ordering), etc.
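The shuffle operations named above can be illustrated with a minimal sketch over an in-memory list of key/value pairs. This is not the patented implementation; the function names and the use of Python's built-in `hash` as the partitioning hash are assumptions made for illustration only.

```python
import functools
from collections import defaultdict

def group_by(pairs):
    """groupBy: group together values sharing a same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reduce_by(pairs, fn):
    """reduceBy: apply a reduce function on grouped values sharing a same key."""
    return {k: functools.reduce(fn, vs) for k, vs in group_by(pairs).items()}

def partition_by(pairs, num_partitions):
    """partitionBy: move keys into different partitions (here by hash)."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        parts[hash(key) % num_partitions].append((key, value))
    return parts

def sort_by(pairs):
    """sortBy: sort key/value pairs with global key ordering."""
    return sorted(pairs, key=lambda kv: kv[0])
```

For example, `reduce_by([("a", 1), ("a", 3), ("b", 2)], lambda x, y: x + y)` first groups values by key and then folds each group, yielding one value per key.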
  • the example in-memory shuffle engines 110 may partition the shuffle data and/or sort the shuffle data and store the sorted shuffle data into buckets of a shared memory region (e.g., of an NVM or DRAM) based on key/value pairs within the shuffle data.
  • An example implementation of an in-memory shuffle engine that may be used to implement the in-memory shuffle engines 110 of FIG. 1 is disclosed below in connection with FIG. 2.
  • Example buckets of the shared memory region of a shared memory of the shared memory system 130 may be allocated via a memory manager of the shared memory system 130 (e.g., in response to requests from the in-memory shuffle engines 110).
  • the memory manager of the shared memory system 130 may be implemented by a memory broker (MB) of the shared memory system 130.
  • the example memory manager may include an allocation interface (e.g., a malloc/free interface) that is called by the in-memory shuffle engines 110 to allocate shared memory to the in-memory shuffle engines 110.
  • the example memory manager may implement any suitable allocation schemes, offsets, address translations (e.g., virtual to physical, etc.) to enable the in-memory shuffle engines 110 to access the shared memory region of the shared memory system 130.
  • the memory manager of the shared memory system 130 may partition a shared memory region into fixed sized zones.
  • a zone may be acquired/allocated (e.g., via the in-memory shuffle manager 120).
  • the zones may be allocated to support locality (e.g., via a libnuma operation).
  • the zones may be allocated such that their distance from the location of a corresponding in-memory shuffle engine 110 using the zone is relatively small (e.g., minimized, lowered, etc.).
  • blocks of the zones may be cleared (e.g., via a single bulk free operation) to enable use of the block for future/subsequent processes/shuffling.
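The zone scheme described above (fixed-size zones carved out of a shared region, blocks handed out within a zone, and the whole zone reclaimed with a single bulk free) can be sketched as follows. The class and its interface are hypothetical illustrations; a real memory broker would hand out fabric-attached memory, not integer offsets.

```python
class ZoneAllocator:
    """Sketch of a shared region partitioned into fixed-size zones.

    A zone is acquired whole, blocks are bump-allocated inside it, and the
    entire zone is released with one bulk free, reclaiming all its blocks.
    """

    def __init__(self, region_size, zone_size):
        assert region_size % zone_size == 0
        self.zone_size = zone_size
        self.free_zones = list(range(0, region_size, zone_size))
        self.cursor = {}  # zone base offset -> next free offset in that zone

    def acquire_zone(self):
        base = self.free_zones.pop()
        self.cursor[base] = base
        return base

    def alloc_block(self, zone_base, size):
        offset = self.cursor[zone_base]
        if offset + size > zone_base + self.zone_size:
            raise MemoryError("zone exhausted")
        self.cursor[zone_base] = offset + size
        return offset

    def bulk_free(self, zone_base):
        # single bulk free: every block in the zone is reclaimed at once
        del self.cursor[zone_base]
        self.free_zones.append(zone_base)
```

Bump allocation inside a zone keeps per-block bookkeeping trivial, which is what makes the single bulk free possible.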
  • FIG. 2 is a block diagram of an example in-memory shuffle engine 200, which may be used to implement the in-memory shuffle engines 110 of FIG. 1.
  • the example in-memory shuffle engine 200 includes an in-memory mapper 210 with an indexer 212, a partitioner 214, and a map operator 216, and an in-memory reducer 220 with an index retriever 222 and a reduce operator 224.
  • in-memory mappers, similar to the in-memory mapper 210, of the in-memory shuffle engines 110 of FIG. 1 may partition and/or sort shuffle data from a processing pipeline, and in-memory reducers, similar to the in-memory reducer 220, may sort-based merge, pass through, or hash-map based merge shuffle data from in-memory mappers 210 of the in-memory shuffle engines 110 of FIG. 1 in accordance with the teachings of this disclosure.
  • the following refers to a single example of the in-memory shuffle engine of FIG. 2, though all or some of the in-memory shuffle engines 110 of FIG. 1 may operate in the same or similar manner as the in-memory shuffle engine 200 of FIG. 2.
  • the example in-memory mapper 210 of FIG. 2 includes an indexer 212, a partitioner 214, and a map operator 216.
  • the in-memory mapper 210 may also include a buffer (not shown) to receive shuffle data (e.g., key/value pairs).
  • the example buffer may be implemented by a dynamic random access memory (DRAM) within or in communication with the in-memory data shuffle system 100 of FIG. 1.
  • the in-memory mapper 210 may utilize separate data structures for certain types of keys (e.g., integers, longs, floats, strings, etc.) of the shuffle data to enable suitable sorting of the different types.
  • the in-memory mapper 210 receives shuffle data (e.g., via a buffer of the in-memory mapper 210) from a processing pipeline (or a stage of a processing pipeline) of the in-memory data shuffle system 100 of FIG. 1.
  • the partitioner 214 encodes a partitioning strategy on received key/value pairs, based on the value of each key, to determine the data buckets of the shared memory to which the key/value pairs are to be assigned.
  • each of the in-memory mappers 210 may map data to a plurality of data buckets in a same or similar manner as one another, based on a partitioning strategy implemented by the partitioner 214, such that similar data buckets (e.g., data buckets having a same name or identifier) receive similar key/value pairs based on the keys.
  • the in-memory mappers 210 may maintain an index bucket that maps the locations of the buckets in the shared memory regions of the in-memory mappers 210.
  • one partition strategy is hash partitioning, which is implemented as follows:
  • p = hash(key) mod N (1)
  • where p is a partition number (e.g., an identifier of the bucket), hash() is a hash function that converts the key value into an integer, and N is a number of the in-memory reducers 220 for the shuffle.
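Equation (1) can be sketched directly. Python's built-in `hash` stands in for the hash function, and the modulo produces the remainder that the FIG. 3 example later uses to label buckets 0, 1, and 2; the function names are illustrative, not from the patent.

```python
def partition_of(key, num_reducers):
    """Partition number p = hash(key) mod N (Equation 1)."""
    return hash(key) % num_reducers

def fill_buckets(pairs, num_reducers):
    """Mapper-side sketch: place each key/value pair into its data bucket."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition_of(key, num_reducers)].append((key, value))
    return buckets
```

Because every mapper applies the same strategy, bucket p on every mapper holds exactly the keys destined for reducer p.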
  • the map operator 216 of the in-memory mapper 210 sorts the received shuffle data and stores the shuffle data in the data buckets, for example, in order of the keys of the key/value pairs in the buckets. In some examples (for non-ordered operations such as groupBy, reduceBy, and partitionBy, etc.), the map operator 216 may pass the partitioned key/value pairs through, such that the in-memory reducers may have access to the buckets without ordering of the key/value pairs within the buckets.
  • the indexer 212 may provide map status information of the in-memory mapper 210 corresponding to the mapped shuffle data to the in-memory shuffle manager 120 of FIG. 1.
  • the indexer 212 may provide a map identifier and a global pointer to the corresponding index bucket of the in-memory mapper 210 in which the partitioner 214 filled the received key/value pairs.
  • the in-memory shuffle manager 120 may then gather the map statuses of all in-memory mappers 210 of the in-memory data shuffle system 100, and pass the map status information to the in-memory reducers 220 of the in-memory data shuffle system 100.
  • the in-memory shuffle manager 120 may provide the map status information via a global pointer to shuffle data of each in-memory mapper 210.
  • the global pointer may include an offset to the bucket index and a region identifier of the shared memory region of the in-memory mapper 210. Accordingly, the in-memory shuffle manager 120 may facilitate or handle communication (e.g., on a dedicated channel) of the index information and/or provide such index information to in-memory reducers of the in-memory shuffle engines 110 of FIG. 1 (e.g., via a global pointer).
  • the indexer 212 provides a map identifier and a global pointer to the corresponding index bucket of the in-memory mapper 210.
  • Each bucket of the in-memory mapper 210 may be represented as a <start offset, size> entry in the index bucket. In examples herein, when the size of the bucket is 0, the data bucket may be considered empty.
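The <start offset, size> representation and the size-0 empty-bucket convention can be sketched as below. The entry type and helper names are hypothetical, chosen only to mirror the description above.

```python
from typing import List, NamedTuple

class IndexEntry(NamedTuple):
    """One <start offset, size> entry in a mapper's index bucket."""
    start_offset: int
    size: int

def is_empty(entry: IndexEntry) -> bool:
    # a size of 0 marks the corresponding data bucket as empty
    return entry.size == 0

def non_empty_buckets(index_bucket: List[IndexEntry]):
    """Yield (partition, entry) only for buckets a reducer should read."""
    for partition, entry in enumerate(index_bucket):
        if not is_empty(entry):
            yield partition, entry
```

Encoding emptiness in the size field lets a reducer skip fetches entirely, rather than probing the shared memory region for data that was never written.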
  • the example in-memory reducer 220 in the example of FIG. 2 includes an index retriever 222 and a reduce operator 224.
  • the example in-memory reducer 220 may also include a buffer to receive shuffle data (e.g., sorted key/value pairs) for processing.
  • the example buffer of the in-memory reducer 220 may be implemented by a DRAM (e.g., the same DRAM or a different DRAM that implements the buffer of the in-memory mapper 210) of the in-memory data shuffle system 100, though the buffer of the in-memory reducer 220 may be separate from a buffer of the in-memory mapper 210.
  • the in-memory reducer 220 receives/retrieves shuffle data (e.g., via a buffer of the in-memory reducer 220) that has been sorted by in-memory mappers of the in-memory shuffle engines 110 of FIG. 1 (which may include shuffle data sorted by the in-memory mapper 210 of FIG. 2).
  • the in-memory reducers 220 of the in-memory shuffle engines 110 merge data based on keys of the shuffle data (e.g., via a priority queue).
  • the reduce operator 224 (e.g., a sort-based merge engine such as a priority queue, a hash-map based merge engine, or a direct pass-through engine) may merge or pass mapped shuffle data from a particular bucket.
  • the index retriever 222 of FIG. 2 may retrieve/receive index information (e.g., global pointers to the index buckets of the in-memory mappers 210 of the in-memory shuffle engines 110 of FIG. 1) from the in-memory shuffle manager 120 of FIG. 1 that includes location information of buckets.
  • the in-memory reducer 220 may retrieve shuffle data based on the index information from the in-memory shuffle manager 120. Using the index information, the in-memory reducer 220 retrieves data only from non-empty buckets of the shared memory region (e.g., skipping the buckets that did not receive sorted data from the in-memory mapper 210).
  • the in-memory reducer 220 may not necessarily retrieve data from each in-memory mapper of the in-memory data shuffle system 100 of FIG. 1 nor spend time attempting to fetch data from empty buckets.
  • the processed keys of the shuffle data from the in-memory reducer 220 and corresponding values may then be provided to the in-memory shuffle engine 110 of FIG. 1 as output to go back to a processing pipeline (e.g., for a subsequent stage of the processing pipeline).
  • the in-memory reducer 220 may directly access the buckets via a direct memory access (e.g., to merge keys, merge values of keys, etc.).
  • the in-memory shuffle engine 200 may utilize a zero-copy technique to efficiently process shuffle data in accordance with the teachings of this disclosure.
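The zero-copy idea can be illustrated with Python's `memoryview`, which is only an analogy for the direct load/store access described above: the reader sees the bytes a writer placed in a shared buffer without an intermediate copy being made.

```python
# A shared buffer stands in for the shared memory region; the offsets play
# the role of the <start offset, size> entries in an index bucket.
shared_region = bytearray(1024)
shared_region[100:104] = b"key1"      # a "mapper" wrote a bucket at offset 100

view = memoryview(shared_region)      # creating a view copies nothing
bucket = view[100:104]                # slicing a memoryview is also zero-copy
assert bucket.obj is shared_region    # still backed by the same buffer
assert bytes(bucket) == b"key1"       # the "reducer" reads the mapped data
```

The point of the analogy: the reducer's "retrieval" is a direct read at an indexed offset, not a copy of the mapper's output into reducer-owned storage.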
  • a plurality of in-memory shuffle engines 200 may communicatively work together (e.g., via the in-memory shuffle manager 120) to process shuffle data using shared memory of the in-memory data shuffle system 100.
  • While an example manner of implementing the in-memory shuffle engines 110 of FIG. 1 is illustrated by the example in-memory shuffle engine 200 of FIG. 2, at least one of the elements, processes, and/or devices illustrated in FIG. 2 may be combined, divided, re-arranged, omitted, eliminated, and/or implemented in any other way. Further, the in-memory mapper 210, including the indexer 212, the partitioner 214, and the map operator 216, or the in-memory reducer 220, including the index retriever 222 and the reduce operator 224, and/or, more generally, the example in-memory shuffle engine 200 of FIG. 2 may be implemented by hardware and/or executable instructions.
  • any of the in-memory mapper 210, including the indexer 212, the partitioner 214, and the map operator 216, or the in-memory reducer 220, including the index retriever 222 and the reduce operator 224, and/or, more generally, the example in-memory shuffle engine 200 could be implemented by at least one of an analog or digital circuit, a logic circuit, a programmable processor, an application specific integrated circuit (ASIC), a programmable logic device (PLD), and/or a field programmable logic device (FPLD).
  • At least one of the in-memory mapper 210, the indexer 212, the partitioner 214, the map operator 216, the in-memory reducer 220, the index retriever 222, or the reduce operator 224 is/are hereby expressly defined to include a tangible machine readable storage device or storage disk such as a memory, a digital versatile disk (DVD), a compact disk (CD), a Blu-ray disk, etc. storing the executable instructions.
  • the example in-memory shuffle engine 200 of FIG. 2 may include at least one element, process, and/or device in addition to, or instead of, those illustrated in FIG. 2, and/or may include more than one of any or all of the illustrated elements, processes and devices.
  • FIG. 3 illustrates an example representation of shuffle data being processed within the in-memory data shuffle system 100 of FIG. 1.
  • a plurality of in-memory mappers 310a, 310b, 310c, 310d (which may be collectively referred to as in-memory mappers 310) and in-memory reducers 320a, 320b, 320c (which may be collectively referred to as in-memory reducers 320) process shuffle data.
  • the example in-memory mappers 310 and in-memory reducers 320 of FIG. 3 may be implemented by the in-memory mapper 210 and in-memory reducer 220 of FIG. 2, respectively.
  • the in-memory mappers 310 and in-memory reducers 320 of FIG. 3 may be managed and/or controlled via an in-memory shuffle manager (such as the in-memory shuffle manager 120 of FIG. 1). In examples herein, any one of the in-memory mappers 310 may be included within a same in-memory shuffle engine (e.g., the in-memory shuffle engine 200 of FIG. 2) as any one of the in-memory reducers 320.
  • the in-memory mapper 310a may be included within a same in-memory shuffle engine 110 as the in-memory reducer 320a, or the in-memory mapper 310a may be included within a same in-memory shuffle engine 110 as the in-memory reducer 320b, etc.
  • the in-memory mappers 310 and in-memory reducers 320 shuffle data via a shared memory region 330.
  • the example shared memory region 330 may be implemented by a non-volatile memory fabric (e.g., a memristor array, a phase-change memory, etc.) and/or a dynamic random access memory.
  • shuffle data is represented in data buckets with the bucket identifiers being the numerals 0, 1, and 2.
  • the 0, 1, and 2 of the shuffle data may represent data buckets including keys resulting from a partitioning strategy (e.g., such as the strategy of Equation (1) above) of key/value pairs in the shuffle data.
  • the example shuffle data may be received by the in-memory mappers 310 in a random order (e.g., as received in a processing pipeline) via a buffer of the in-memory mappers 310. It is noted that the shuffle data received by the in-memory mappers 310 may also include additional or alternative data than the data represented by the data buckets 0, 1, and 2 in FIG. 3.
  • the example in-memory mappers 310 may sort the shuffle data and store the sorted shuffle data into buckets of the shared memory region 330 and provide index information corresponding to addresses/locations of the buckets within the shared memory region 330 to a master node or shuffling manager in communication with the in-memory mappers 310 (e.g., the in-memory shuffle manager 120 of FIG. 1).
  • key/value pairs may be sorted and stored into buckets of the shared memory region 330 based on the key resulting from a partitioning strategy. For example, using the above strategy of Equation 1, a first bucket includes keys having a remainder 0, a second bucket includes keys having a remainder 1, and a third bucket includes keys having a remainder 2.
  • the example shuffled data is received/retrieved by the in-memory reducers 320 and merged as shown (all data buckets labelled with the identifier of 0 merged by the in-memory reducer 320a, all data buckets labelled with the identifier of 1 merged by the in-memory reducer 320b, and all data buckets labelled with the identifier of 2 merged by the in-memory reducer 320c).
  • the in-memory reducers 320 may receive/retrieve index information (e.g., a global pointer to an index bucket of the in-memory mappers 310) corresponding to the bucket locations (e.g., a beginning address of a bucket in the shared memory region 330) and merge corresponding values from buffers of the in-memory reducers 320.
  • the in-memory reducers 320 may refer to an index (e.g., index buckets of the in-memory mappers 310 in the shared memory region 330) to determine start locations (e.g., a beginning address) or start offsets of the data buckets 0, 1, 2.
  • the in-memory reducers 320 may then retrieve the shuffle data from the data buckets for processing. For example, reduce operators of the in-memory reducers 320 may load the shuffle data into a merge engine (e.g., a priority queue), may perform a hash-map based merge of the shuffle data, or may pass the shuffle data through to the processing pipeline. The reduce operator pulls the values corresponding to the keys of the shuffle data from a buffer of the in-memory reducers 320. The processing pipeline may then pull the processed shuffle data for the next stage.
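The sort-based merge can be sketched with Python's `heapq.merge`, which maintains a priority queue over the heads of the already-sorted inputs. The bucket contents below are invented for illustration; they are not data from the patent.

```python
import heapq

# Each mapper left its bucket for this reducer already sorted by key.
bucket_from_mapper_a = [(1, "x"), (4, "y")]
bucket_from_mapper_b = [(2, "p"), (4, "q")]
bucket_from_mapper_c = [(3, "m")]

# heapq.merge pops the smallest head key at each step, producing a single
# globally key-ordered stream without re-sorting the inputs.
merged = list(heapq.merge(bucket_from_mapper_a,
                          bucket_from_mapper_b,
                          bucket_from_mapper_c,
                          key=lambda kv: kv[0]))
```

Because each input is already sorted, the merge costs O(total pairs × log(number of buckets)) rather than a full re-sort.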
  • the in-memory mappers 310 and in-memory reducers 320 work together to shuffle data using the shared memory region 330. Accordingly, the in-memory reducers 320 may directly access data sorted by the in-memory mappers 310 within the shared memory region 330 without copying data, thus reducing processing overhead/resources.
  • A flowchart representative of example machine readable instructions for implementing the in-memory shuffle engine 200 of FIG. 2 is shown in FIG. 4.
  • the machine readable instructions comprise a program/process for execution by a processor such as the processor 612 shown in the example processor platform 600 discussed below in connection with FIG. 6.
  • the program/process may be embodied in executable instructions (e.g., software) stored on a tangible machine readable storage medium such as a CD-ROM, a floppy disk, a hard drive, a digital versatile disk (DVD), a Blu-ray disk, or a memory associated with the processor 612, but the entire program/process and/or parts thereof could alternatively be executed by a device other than the processor 612 and/or embodied in firmware or dedicated hardware.
  • the process 400 of FIG. 4 begins with an initiation of the in-memory shuffle engine 200 or a plurality of in-memory shuffle engines 200 (e.g., upon startup, upon instructions from a user, upon startup of a system implementing the in-memory shuffle engine 200, etc.).
  • the example process 400 of FIG. 4 may be executed to implement a single in-memory shuffle engine 200 or a plurality of in-memory shuffle engines 200. For example, some blocks may be executed to implement a first shuffle engine and some blocks may be executed to implement a second shuffle engine in accordance with the examples herein.
  • the example process 400 may be iteratively executed by a single or multiple in-memory shuffle engines to process shuffle data until all of the shuffle data in a set of shuffle data is processed. In some examples, if shuffle data is missing from data buckets identified by the in-memory shuffle manager 120 via the index buckets, the in-memory shuffle engines 110 may notify the in-memory shuffle manager 120, which may instruct in-memory mappers 210 associated with the missing shuffle data to re-execute mapping of the shuffle data.
  • the partitioner 214 of an in-memory mapper 210 partitions shuffle data into buckets of a shared memory.
  • the in-memory mapper 210 may distribute the shuffle data based on keys of key/value pairs, such that the key/value pairs are allocated to the buckets so that each bucket comprises key/value pairs having a same partition number (e.g., a same remainder from Equation 1 above) from the shuffle data.
  • the indexer 212 of the in-memory mapper 210 indexes location information of the sorted shuffle data in an index.
  • the example indexer 212 may index beginning addresses of buckets of the in-memory mapper 210 in the shared memory- region.
  • the index retriever 222 of an in-memory reducer 220 of an in-memory shuffle engine 200 determines a location in the shared memory region of a portion of the shuffle data based on the location information in the index.
  • the portion of the shuffle data may refer to shuffle data having a certain key of key/value pairs. Accordingly, the index retriever 222 may identify the keys in the index which maps the keys to corresponding locations of buckets in the shared memory region.
  • the in-memory reducer 220 may access the portion of the shuffle data from the determined location of the shared memory. For example, the in-memory reducer 220 may access the shuffle data by first retrieving the keys from the buckets to the reduce operator and then loading corresponding values of the keys from the bucket into a buffer of the in-memory reducer 220. Accordingly, after block 440, the shuffle data from the reduce operator 224 and the buffer of the in-memory reducer may be output to the processing pipeline of the next stage. After block 440, the example process 400 ends.
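The blocks of process 400 (partition, index, locate via the index, access directly) can be strung together in a toy end-to-end sketch. Here the "shared memory region" is a plain dict keyed by (mapper, partition) and the index maps each partition to the locations holding its data; all names and the use of Python's `hash` are hypothetical stand-ins.

```python
from collections import defaultdict

def map_side(shuffle_data, shared, index, mapper_id, num_reducers):
    """Partition pairs into buckets (block 410), then index locations (block 420)."""
    for key, value in shuffle_data:
        p = hash(key) % num_reducers
        shared.setdefault((mapper_id, p), []).append((key, value))
    for p in range(num_reducers):
        if (mapper_id, p) in shared:
            index[p].append((mapper_id, p))

def reduce_side(shared, index, partition):
    """Look up this partition's locations (block 430), then read them (block 440)."""
    merged = []
    for location in index[partition]:
        merged.extend(shared[location])
    return sorted(merged)

shared, index = {}, defaultdict(list)
map_side([(0, "a"), (1, "b"), (2, "c")], shared, index, mapper_id=0, num_reducers=2)
map_side([(3, "d"), (4, "e")], shared, index, mapper_id=1, num_reducers=2)
```

With two reducers, `reduce_side(shared, index, 0)` collects every even key from both mappers' buckets without either mapper being contacted again, mirroring the index-driven direct access described above.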
  • A flowchart representative of example machine readable instructions for implementing the in-memory data shuffle system 100 of FIG. 1 is shown in FIG. 5.
  • the machine readable instructions comprise a program/process for execution by a processor such as the processor 612 shown in the example processor platform 600 discussed below in connection with FIG. 6.
  • the program/process may be embodied in executable instructions (e.g., software) stored on a tangible machine readable storage medium such as a CD-ROM, a floppy disk, a hard drive, a digital versatile disk (DVD), a Blu-ray disk, or a memory associated with the processor 612, but the entire program/process and/or parts thereof could alternatively be executed by a device other than the processor 612 and/or embodied in firmware or dedicated hardware.
  • the process 500 of FIG. 5 begins with an initiation of the in- memory data shuffle system 100 (e.g., upon startup, upon instructions from a user, upon startup of a device implementing the in-memory data shuffle system 100 (e.g., a server, a computer, etc.), etc.).
  • the shared memory system 130 (e.g., via a memory broker or memory manager) allocates a shared region of a non-volatile memory for data shuffling, which is accessible to the plurality of in-memory shuffle engines 110.
  • the in-memory shuffle manager 120 provides shuffle data to the plurality of shuffle engines 110.
  • the shuffle engines are to process the shuffle data using a shared memory region of a shared memory (e.g., a DRAM, an NVM, etc.) and an index.
  • the shuffle engines are to sort shuffle data and store the sorted shuffle data into the shared memory region of the shared memory system 130, and index location information in an index (e.g., an index bucket of the shared memory system 130), and in-memory reducers (e.g., similar to the in-memory reducer 220 of FIG. 2) retrieve and merge the sorted data using the index.
  • the processed shuffle data, based on key/value pairs of the shuffle data, is output to the processing pipeline in the next stage.
  • the example processed shuffle data from the in-memory shuffle manager 120 may include shuffle data from the plurality of in-memory shuffle engines.
  • the example processes of FIGS. 4 and 5 may be implemented using coded instructions (e.g., computer and/or machine readable instructions) stored on a tangible machine readable storage medium such as a hard disk drive, a flash memory, a read-only memory (ROM), a compact disk (CD), a digital versatile disk (DVD), a cache, a random-access memory (RAM) and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, for brief instances, for temporarily buffering, and/or for caching of the information).
  • the term "tangible machine readable storage medium" is expressly defined to include any type of machine readable storage device and/or storage disk and to exclude propagating signals and to exclude transmission media.
  • the terms "tangible computer readable storage medium" and "tangible machine readable storage medium" are used interchangeably; additionally or alternatively, the example processes of FIGS. 4 and 5 may be implemented using coded instructions stored on a non-transitory computer and/or machine readable medium such as a hard disk drive, a flash memory, a read-only memory, a compact disk, a digital versatile disk, a cache, a random-access memory and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, for brief instances, for temporarily buffering, and/or for caching of the information).
  • FIG. 6 is a block diagram of an example processor platform 600 capable of executing the instructions of FIGS. 4 and/or 5 to implement the shuffle engine 200 of FIG. 2 and/or the in-memory data shuffle system 100 of FIG. 1 .
  • the example processor platform 600 may be any type of apparatus or may be included in any type of apparatus, such as a server, a personal computer, or any other type of computing device.
  • the processor platform 600 of the illustrated example of FIG. 6 includes a processor 612.
  • the processor 612 of the illustrated example is hardware.
  • the processor 612 can be implemented by at least one integrated circuit, logic circuit, microprocessor or controller from any desired family or manufacturer.
  • the processor 612 of the illustrated example includes a local memory 613 (e.g., a cache).
  • the processor 612 of the illustrated example is in communication with a volatile memory 614 and a non-volatile memory 616 via a bus 618.
  • the volatile memory 614 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), and/or any other type of random access memory device.
  • the non-volatile memory 616 may be implemented by flash memory, a memristor memory fabric, and/or any other desired type of fast non-volatile memory device. Access to the main memory 614, 616 may be controlled by a memory controller.
  • the processor platform 600 of the illustrated example also includes an interface circuit 620.
  • the interface circuit 620 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), and/or a peripheral component interconnect (PCI) express interface.
  • At least one input device 622 is connected to the interface circuit 620.
  • the input device(s) 622 permit(s) a user to enter data and commands into the processor 612.
  • the input device(s) may be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, and/or a voice recognition system.
  • At least one output device 624 is also connected to the interface circuit 620 of the illustrated example.
  • the output device(s) 624 may be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display, a cathode ray tube (CRT) display, a touchscreen, a tactile output device, a printer and/or speakers).
  • the interface circuit 620 of the illustrated example thus may include a graphics driver card, a graphics driver chip or a graphics driver processor.
  • the interface circuit 620 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem and/or network interface card to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 626 (e.g., an Ethernet connection, a digital subscriber line (DSL), a telephone line, coaxial cable, a cellular telephone system, etc.).
  • the processor platform 600 of the illustrated example also includes at least one mass storage device 628 for storing executable instructions and/or data.
  • examples of the mass storage device(s) 628 include floppy disk drives, hard drive disks, compact disk drives, Blu-ray disk drives, RAID systems, and digital versatile disk (DVD) drives.
  • the coded instructions 632 of FIGS. 4 and/or 5 may be stored in the mass storage device 628, in the local memory 613 in the volatile memory 614, in the non-volatile memory 616, and/or on a removable tangible machine readable storage medium such as a CD or DVD.
  • in-memory mappers of shuffle engines and in-memory reducers of the shuffle engines have access to a same shared memory location and can therefore directly access the same data from the shared memory fabric rather than copying from separate storage locations, retrieving shuffle data via TCP/IP, etc.
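As a rough illustration of the mapper side of the flow described in the bullets above, the sketch below partitions key/value pairs into per-reducer buckets, sorts each bucket at the source, and records bucket locations in an index. A plain dict stands in for the patent's shared memory region; all names (`SHARED_REGION`, `map_side_shuffle`, etc.) are hypothetical assumptions, not identifiers from the patent.

```python
from collections import defaultdict

# Illustrative stand-ins for the shared memory region and its index buckets.
SHARED_REGION = {}
SHARED_INDEX = defaultdict(list)  # reducer_id -> list of bucket locations

def partition(key, n_reducers):
    """Hash-partition a key to its destination reducer."""
    return hash(key) % n_reducers

def map_side_shuffle(mapper_id, records, n_reducers):
    """Sort key/value pairs into per-reducer buckets of the shared region
    and publish each bucket's location information in the index."""
    buckets = defaultdict(list)
    for key, value in records:
        buckets[partition(key, n_reducers)].append((key, value))
    for reducer_id, pairs in buckets.items():
        pairs.sort()                     # sorted at the source (mapper)
        loc = (mapper_id, reducer_id)    # stand-in for a shared-memory address
        SHARED_REGION[loc] = pairs
        SHARED_INDEX[reducer_id].append(loc)
```

Because mappers and reducers share `SHARED_REGION`, a reducer can read each bucket directly from the location recorded in `SHARED_INDEX` rather than copying data or fetching it over TCP/IP, which mirrors the shared-memory-fabric point made above.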

Abstract

Examples disclosed herein include in-memory data shuffling. In some examples, in-memory shuffle engines use a mapper to sort shuffle data into buckets of a shared memory and an index to map the sorted shuffle data to locations in the shared memory. The in-memory shuffle engines further use reducers to retrieve location information from the index and to merge the sorted shuffle data using merge engines. Examples may provide processed shuffle data that is sorted/merged based on key/value pairs of the shuffle data.
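The reducer side summarized in the abstract — retrieving location information from the index, then merging the already-sorted runs — might be sketched as below. Here `heapq.merge` stands in for the merge engines, and the function name and parameters are illustrative assumptions, not the patent's interfaces.

```python
import heapq

def reduce_side_merge(reducer_id, index, shared_region):
    """Look up this reducer's bucket locations in the index, merge the
    pre-sorted runs in a single pass, and group values by key."""
    runs = [shared_region[loc] for loc in index.get(reducer_id, [])]
    merged = {}
    for key, value in heapq.merge(*runs):  # k-way merge of sorted runs
        merged.setdefault(key, []).append(value)
    return merged
```

A k-way heap merge is a natural fit here because each source already sorted its bucket, so the reducer never needs to re-sort, only to interleave the runs.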
PCT/US2015/061843 2015-11-20 2015-11-20 In-memory data shuffling WO2017086987A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2015/061843 WO2017086987A1 (fr) 2015-11-20 2015-11-20 In-memory data shuffling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2015/061843 WO2017086987A1 (fr) 2015-11-20 2015-11-20 In-memory data shuffling

Publications (1)

Publication Number Publication Date
WO2017086987A1 true WO2017086987A1 (fr) 2017-05-26

Family

ID=58717604

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/061843 WO2017086987A1 (fr) 2015-11-20 2015-11-20 In-memory data shuffling

Country Status (1)

Country Link
WO (1) WO2017086987A1 (fr)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120323920A1 (en) * 2011-01-12 2012-12-20 International Business Machines Corporation Creating a semantically aggregated index in an indexer-agnostic index building system
US20130006612A1 (en) * 2011-06-30 2013-01-03 Google Inc. Training acoustic models
US20140059552A1 (en) * 2012-08-24 2014-02-27 International Business Machines Corporation Transparent efficiency for in-memory execution of map reduce job sequences
US20140365463A1 (en) * 2013-06-05 2014-12-11 Digitalglobe, Inc. Modular image mining and search
US20150150018A1 (en) * 2013-11-26 2015-05-28 International Business Machines Corporation Optimization of map-reduce shuffle performance through shuffler i/o pipeline actions and planning


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220069A (zh) * 2017-07-03 2017-09-29 Institute of Computing Technology, Chinese Academy of Sciences A shuffle method for non-volatile memory
US20190197138A1 (en) * 2017-12-21 2019-06-27 International Business Machines Corporation Data shuffling with hierarchical tuple spaces
US10891274B2 (en) * 2017-12-21 2021-01-12 International Business Machines Corporation Data shuffling with hierarchical tuple spaces
US10956125B2 (en) 2017-12-21 2021-03-23 International Business Machines Corporation Data shuffling with hierarchical tuple spaces
CN110046638A (zh) * 2018-12-29 2019-07-23 Alibaba Group Holding Ltd. Method, apparatus, and device for fusing data across multiple platforms
CN112785485A (zh) * 2019-11-04 2021-05-11 Nvidia Corporation Techniques for an efficient fabric attached memory
CN112785485B (zh) * 2019-11-04 2023-11-07 Nvidia Corporation Techniques for an efficient fabric attached memory
US11822491B2 (en) 2019-11-04 2023-11-21 Nvidia Corporation Techniques for an efficient fabric attached memory

Similar Documents

Publication Publication Date Title
US10394847B2 (en) Processing data in a distributed database across a plurality of clusters
US11354230B2 (en) Allocation of distributed data structures
JP6356675B2 (ja) Aggregation/grouping operation: hardware implementation of the hash table method
WO2017086987A1 (fr) In-memory data shuffling
US8850158B2 (en) Apparatus for processing remote page fault and method thereof
JP2017182803A (ja) Memory deduplication method and deduplication DRAM memory module
KR102440128B1 (ko) Memory management apparatus, system, and method for a unified object interface
CN109753231 (zh) Key-value storage device and method of operating the same
US11288287B2 (en) Methods and apparatus to partition a database
WO2015142341A1 (fr) Dynamic memory extension by data compression
US10049035B1 (en) Stream memory management unit (SMMU)
WO2015176689A1 (fr) Data processing method and device
JP2011039800A (ja) Database management method and system, and processing program therefor
TWI710899B (zh) Computing system and operating method thereof
US10062137B2 (en) Communication between integrated graphics processing units
JP6974510B2 (ja) Method, apparatus, device, and medium for processing data
US11061676B2 (en) Scatter gather using key-value store
US8935508B1 (en) Implementing pseudo content access memory
US11347551B2 (en) Methods, systems, articles of manufacture and apparatus to manage memory allocation
US11080299B2 (en) Methods and apparatus to partition a database
US10942864B2 (en) Shared memory for distributed data
US20180329756A1 (en) Distributed processing system, distributed processing method, and storage medium
WO2021249030A1 (fr) Random number sequence generation method and random number engine
US11003578B2 (en) Method and system for parallel mark processing
US8959278B2 (en) System and method for scalable movement and replication of data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15908974

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15908974

Country of ref document: EP

Kind code of ref document: A1