WO2011070910A1 - Data arrangement/calculation system, data arrangement/calculation method, master device, and data arrangement method - Google Patents

Data arrangement/calculation system, data arrangement/calculation method, master device, and data arrangement method

Info

Publication number
WO2011070910A1
WO2011070910A1 PCT/JP2010/070854 JP2010070854W WO2011070910A1 WO 2011070910 A1 WO2011070910 A1 WO 2011070910A1 JP 2010070854 W JP2010070854 W JP 2010070854W WO 2011070910 A1 WO2011070910 A1 WO 2011070910A1
Authority
WO
WIPO (PCT)
Prior art keywords
block
data
replica
slave device
owner
Prior art date
Application number
PCT/JP2010/070854
Other languages
English (en)
Japanese (ja)
Inventor
祥治 西村
Original Assignee
日本電気株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電気株式会社 filed Critical 日本電気株式会社
Priority to JP2011545163A priority Critical patent/JP5799812B2/ja
Priority to US13/514,229 priority patent/US8898677B2/en
Publication of WO2011070910A1 publication Critical patent/WO2011070910A1/fr

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471: Distributed queries
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data

Definitions

  • The present invention relates to a data arrangement / calculation system, and more particularly to a data arrangement / calculation system for executing a sliding window calculation.
  • Non-Patent Document 1 describes Bigtable, a system that manages large-scale data.
  • Bigtable divides a table into tablets and distributes the tablets to a plurality of servers for management. Its characteristic is that the table can grow as large as needed while the load is distributed across a large number of servers.
  • A sliding window calculation is a calculation method in which a series of data arranged in a certain order, such as time-series data, is divided into specific sections that are calculated in sequence. For example, a sliding window calculation is used to compute a moving average of stock prices, or to determine from a user's time-series position information the places where the user stayed for a certain period and the times of those stays.
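As a minimal sketch of the moving-average case mentioned above, a window of fixed width can slide over an ordered series one element at a time. The function name `moving_average` and the sample prices are illustrative assumptions, not taken from the source.

```python
def moving_average(values, width):
    """Return the mean of every run of `width` consecutive values."""
    if width <= 0:
        raise ValueError("width must be positive")
    return [
        sum(values[i:i + width]) / width
        for i in range(len(values) - width + 1)
    ]


# Made-up closing prices, ordered by time.
prices = [10.0, 12.0, 11.0, 13.0, 15.0]
averages = moving_average(prices, 3)
```

Each output element depends only on the records inside its window, which is why the position of block boundaries matters once the series is split across machines.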
  • Non-Patent Document 1 describes data processing of a calculation model called MapReduce.
  • MapReduce is divided into Map processing and Reduce processing.
  • The Map process is a filter calculation process executed by a server that holds the data.
  • The Reduce process collects the data set associated with a certain key as a result of the Map process and performs a reduction operation on it.
  • To perform a sliding window calculation with MapReduce, the Map process associates each record with the window to which it belongs, using the window as the key.
  • The Reduce process then performs the sliding window calculation on the records associated with the key of each window.
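The Map and Reduce roles described above can be sketched in a single process. This is only an illustration under stated assumptions: windows are taken to be non-overlapping intervals of `window_seconds`, records are `(timestamp, value)` pairs, and the names `map_phase` and `reduce_phase` are hypothetical rather than part of any MapReduce API.

```python
from collections import defaultdict


def map_phase(records, window_seconds):
    """Map: tag each (timestamp, value) record with the window it belongs to."""
    for timestamp, value in records:
        yield timestamp // window_seconds, value


def reduce_phase(tagged):
    """Reduce: gather the values of each window key and average them."""
    groups = defaultdict(list)
    for window_key, value in tagged:
        groups[window_key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in sorted(groups.items())}


# Made-up records: (seconds since start, measured value).
records = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0)]
result = reduce_phase(map_phase(records, 60))
```

In a real cluster the grouping step is where records move between servers, which is the communication cost the following paragraphs discuss.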
  • Shadow is known in the field of high-performance computing (see, for example, Patent Document 1).
  • In HPF (High Performance Fortran), a SHADOW directive is provided. When data is distributed to a plurality of computers, the data is divided so that areas called shadow areas overlap, as shown in FIG.
  • When the program developer explicitly specifies a shadow area with the SHADOW directive, the compiler can duplicate and distribute only that portion.
  • MapReduce has a problem when performing the sliding window calculation. Since the Map process is performed by the server that holds the data, it incurs no communication cost. The Reduce process, on the other hand, must collect the data set associated with each key, so its communication cost increases. For this reason, MapReduce can execute the Reduce process on the local data set after the Map process. However, if a window crosses a data division boundary, the data that should fall in the same window is spread over a plurality of servers, so the local Reduce process cannot be performed after the Map process. When a window crossing a data division boundary contains records, MapReduce must therefore collect those records on the same server before calculating, which lowers the efficiency of the sliding window calculation.
  • The optimization method using the shadow area of Patent Document 1 is a very effective optimization technique, but the programmer needs to know in advance the width required for the shadow area.
  • With this method, when a programmer wants to try various window widths by trial and error, a shadow area must be set and the data redistributed every time a calculation is performed.
  • In addition, this method cannot perform calculations with multiple window widths in parallel on the same data set, because one process occupies the data set and the computer system.
  • A system that implements this method is configured to calculate the part needed for each calculation and to replicate only that area at the start of the calculation.
  • This method also has the problem that sliding window calculations with various window widths cannot be executed simultaneously on the same data set.
  • An object of the present invention is to provide a data arrangement / calculation system that can efficiently perform a sliding window calculation on distributed data even when a window crosses a data division boundary.
  • A data arrangement / calculation system of the present invention includes a master device that holds data having a plurality of records arranged in order according to a predetermined key and a job for executing a sliding window calculation for each predetermined window width, and a plurality of slave devices connected to the master device.
  • The master device includes a data arrangement unit that divides the data and arranges it in each of the plurality of slave devices, and a job assignment unit that assigns the job to each of the plurality of slave devices.
  • The data arrangement unit includes a data division unit that divides the data to generate a plurality of blocks and replicas of the plurality of blocks, and an arrangement unit that arranges a first block of the plurality of blocks as an Owner in a first slave device of the plurality of slave devices and further arranges, in the first slave device as a Replica, a replica of a second block that is next in order after the first block according to the predetermined key order.
  • The first slave device includes a data holding unit that stores the first block and the replica of the second block, and a job execution unit that receives the job and executes the sliding window calculation for each predetermined window width on the first block. When the predetermined window width spans the first block and the replica of the second block, the job execution unit executes the sliding window calculation using the first block and the replica of the second block.
  • The data arrangement / calculation method of the present invention includes a step in which a master device divides data having a plurality of records arranged in order according to a predetermined key, a step of arranging the divided data in each of a plurality of slave devices connected to the master device, a step of assigning a job for executing a sliding window calculation for each predetermined window width to each of the plurality of slave devices, and a step in which each of the plurality of slave devices executes the job.
  • The dividing step includes a step of dividing the data to generate a plurality of blocks and replicas of the plurality of blocks.
  • The arranging step includes a step of arranging a first block of the plurality of blocks as an Owner in a first slave device of the plurality of slave devices, and a step of arranging a replica of a second block, which is next in order after the first block according to the predetermined key order, as a Replica in the first slave device.
  • The executing step includes a step in which the first slave device receives the job and executes the sliding window calculation for each predetermined window width on the first block.
  • The step executed by the first slave device includes a step of executing the sliding window calculation using the first block and the replica of the second block when the predetermined window width spans the first block and the replica of the second block.
  • The master device of the present invention includes a data arrangement unit that divides data having a plurality of records arranged in order according to a predetermined key and arranges the data in each of a plurality of connected slave devices, and a job assigning unit that assigns a job for executing a sliding window calculation for each predetermined window width to each of the plurality of slave devices.
  • The data arrangement unit includes a data division unit that divides the data to generate a plurality of blocks and replicas of the plurality of blocks, and an arrangement unit that arranges a first block of the plurality of blocks as an Owner in a first slave device of the plurality of slave devices and further arranges, in the first slave device as a Replica, a replica of a second block that is next in order after the first block according to the predetermined key order.
  • The data arrangement method of the present invention includes a step of dividing data having a plurality of records arranged in order according to a predetermined key, a step of arranging the divided data in each of a plurality of connected slave devices, and a step of assigning a job for executing a sliding window calculation for each predetermined window width to each of the plurality of slave devices.
  • The dividing step includes generating a plurality of blocks and replicas of the plurality of blocks.
  • The arranging step includes a step of arranging a first block of the plurality of blocks as an Owner in a first slave device of the plurality of slave devices, and a step of arranging a replica of a second block, which is next in order after the first block according to the predetermined key order, as a Replica in the first slave device.
  • The recording medium of the present invention records a computer-readable program for realizing a data arrangement method.
  • This data arrangement method includes a step of dividing data having a plurality of records arranged in order according to a predetermined key, a step of arranging the divided data in each of a plurality of connected slave devices, and a step of assigning a job for executing a sliding window calculation for each predetermined window width to each of the plurality of slave devices.
  • The dividing step includes generating a plurality of blocks and replicas of the plurality of blocks.
  • The arranging step includes a step of arranging a first block of the plurality of blocks as an Owner in a first slave device of the plurality of slave devices, and a step of arranging a replica of a second block, which is next in order after the first block according to the predetermined key order, as a Replica in the first slave device.
  • The data arrangement / calculation system of the present invention can efficiently perform a sliding window calculation on distributed data even when a window crosses a data division boundary.
  • FIG. 1 is a block diagram showing a configuration example of a data arrangement / calculation system 10 according to the first exemplary embodiment of the present invention.
  • FIG. 2 is a block diagram illustrating a hardware configuration example of the master device 100 and the slave device 200 in the embodiment of the data arrangement / calculation system 10.
  • FIG. 3 is a flowchart showing the processing operation of the data arrangement / calculation system 10 according to the first exemplary embodiment of the present invention.
  • FIG. 4 shows data in which a user ID and a time are arranged in one set in order to perform a sliding window calculation for each user and along a time series.
  • FIG. 5 is an example of arrangement information indicating how each block is distributed to each slave.
  • FIG. 6 is an example of a job.
  • FIG. 7 is a block diagram showing a configuration example of the data arrangement / calculation system 20 according to the second exemplary embodiment of the present invention.
  • FIG. 8 is a flowchart showing a processing operation of the data arrangement / calculation system 20 according to the second exemplary embodiment of the present invention.
  • FIG. 9 is a flowchart showing the processing operation of the data arrangement / calculation system 20 according to the second embodiment of the present invention.
  • FIG. 1 is a block diagram showing a configuration of a data arrangement / calculation system 10 according to the first exemplary embodiment of the present invention.
  • The data arrangement / calculation system 10 includes a master device 100 and a plurality of slave devices 200.
  • The master device 100 holds data having a plurality of records arranged in order according to a predetermined key, and a job for executing a sliding window calculation for each predetermined window width.
  • Each of the plurality of slave devices 200 is connected to the master device 100.
  • The master device 100 includes a data arrangement unit 110, an Owner / Replica management unit 120, and a job allocation unit 130.
  • The data arrangement unit 110 divides data having a plurality of records arranged in order according to a predetermined key, and arranges the data in each of the plurality of slave devices 200.
  • The data arrangement unit 110 includes a data dividing unit 111 and an arrangement unit 112.
  • The data dividing unit 111 receives data having a plurality of records from an external device (not shown).
  • The data dividing unit 111 sorts the plurality of records included in the data, according to a predetermined key of each record, into the order in which each of the plurality of slave devices 200 is to perform the sliding window calculation. For example, when each slave device 200 is to perform a sliding window calculation on the records in time series, the data dividing unit 111 arranges the records in the order of the keys indicating time.
  • The data dividing unit 111 divides the data to generate a plurality of blocks while preserving the order of the predetermined keys of the records.
  • The data dividing unit 111 may divide the data by a preset size or by a preset number of records. Thereafter, the data dividing unit 111 generates replicas of the divided blocks.
  • The data dividing unit 111 provides the divided blocks and their replicas to the arrangement unit 112.
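The division performed by the data dividing unit 111 might look like the following sketch, which covers only the "preset number of records" option. The function name `split_into_blocks`, the `(key, payload)` record layout, and the sample data are assumptions made for illustration.

```python
def split_into_blocks(records, records_per_block):
    """Sort records by key, then cut them into consecutive blocks."""
    if records_per_block <= 0:
        raise ValueError("records_per_block must be positive")
    ordered = sorted(records, key=lambda record: record[0])
    return [
        ordered[i:i + records_per_block]
        for i in range(0, len(ordered), records_per_block)
    ]


# Made-up records: (time key, payload), deliberately out of order.
records = [(30, "c"), (10, "a"), (50, "e"), (20, "b"), (40, "d")]
blocks = split_into_blocks(records, 2)

# One replica per block, to be placed separately from the original.
replicas = [list(block) for block in blocks]
```

Because the records are sorted before being cut, every block covers a contiguous key range, which is what lets a neighbouring block serve as the continuation of a window.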
  • The arrangement unit 112 receives the divided blocks and their replicas.
  • The arrangement unit 112 arranges each of the blocks in one of the slave devices 200 as Owner. At this time, it is preferable that the arrangement unit 112 arranges the blocks as Owners evenly among all the slave devices 200.
  • A block arranged in a slave device 200 as Owner is referred to as an “Owner block”. For example, following the predetermined key order, the arrangement unit 112 arranges the first block as Owner in the slave device 200a among the plurality of slave devices 200.
  • The arrangement unit 112 also arranges, in each slave device 200 as a Replica, a replica of the block that follows that device's Owner block in the predetermined key order.
  • A block arranged as a Replica in a slave device 200 is referred to as a “Replica block”.
  • For example, the arrangement unit 112 arranges in the slave device 200a, as a Replica, a replica of the second block, which follows the first block in the predetermined key order.
  • The arrangement unit 112 generates arrangement information that associates each block with the slave device 200 holding it as Owner and associates the replica of each block with the slave device 200 holding it as Replica. For example, the arrangement unit 112 generates arrangement information that associates the first block and the replica of the second block with the slave device 200a. The arrangement unit 112 provides the generated arrangement information to the Owner / Replica management unit 120.
  • The Owner / Replica management unit 120 stores the arrangement information received from the arrangement unit 112.
  • The job allocation unit 130 receives a job for executing a sliding window calculation for each predetermined window width from an external device (not shown), and assigns the received job to each of the plurality of slave devices 200. Specifically, when the job allocation unit 130 receives a job, it refers to the arrangement information of the Owner / Replica management unit 120 and recognizes the predetermined key type that determines the order of the records included in each Owner block. The job allocation unit 130 then extracts, from each record included in the Owner block, the key corresponding to that key type.
  • The job allocation unit 130 assigns the extracted keys and the job to the slave device 200 in which the Owner block is arranged.
  • For example, the job allocation unit 130 refers to the arrangement information and recognizes time as the predetermined key type that defines the order of the records included in the first block, which is an Owner block.
  • The job allocation unit 130 extracts a plurality of time keys from the first block.
  • The job allocation unit 130 assigns the job and the extracted time keys to the slave device 200a in which the first block is arranged as Owner. A specific example of job assignment will be described later.
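The Owner and Replica placement rule described above can be sketched as follows. The function name `place_blocks`, the dictionary layout, and the use of round-robin Owner assignment are assumptions for illustration; the source only requires that the replica of each block be held by the slave that owns the preceding block.

```python
def place_blocks(num_blocks, slaves):
    """Return {block_id: {"owner": slave, "replica": slave or None}}."""
    placement = {
        block_id: {"owner": slaves[block_id % len(slaves)], "replica": None}
        for block_id in range(num_blocks)
    }
    for block_id in range(1, num_blocks):
        # The replica of block k is stored where block k-1 is the Owner,
        # so that slave holds two consecutive key ranges locally.
        placement[block_id]["replica"] = placement[block_id - 1]["owner"]
    return placement


placement = place_blocks(4, ["slave-a", "slave-b"])
```

Under this rule the first block has no Replica holder, because no block precedes it in key order.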
  • Each slave device 200 includes a data holding unit 210 and a job execution unit 220.
  • The data holding unit 210 receives the Owner block and the Replica block from the data arrangement unit 110 and stores them.
  • For example, the data holding unit 210 of the slave device 200a stores the first block as Owner and stores the replica of the second block as Replica.
  • The job execution unit 220 receives, from the job allocation unit 130, a job and a plurality of keys that determine the order of the records included in the Owner block. The job execution unit 220 then executes the sliding window calculation for each predetermined window width set in the job on the Owner block stored in the data holding unit 210, using each of the received keys as a start key of the window width. When the predetermined window width of the job spans the Owner block and the Replica block, the job execution unit 220 executes the sliding window calculation using not only the Owner block but also the Replica block.
  • For example, the job execution unit 220 of the slave device 200a receives a job and the extracted time keys from the job allocation unit 130.
  • The job execution unit 220 executes the sliding window calculation on the first block using the time keys as window start keys.
  • When a window spans both blocks, the job execution unit 220 of the slave device 200a executes the sliding window calculation using the first block and the replica of the second block.
  • In this way, the Owner block and the Replica block are both arranged in each slave device 200. Therefore, even when the window width extends beyond the Owner block, the sliding window calculation can be executed efficiently using the Replica block.
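A slave-side job of this kind might be sketched as below. The function name `run_windows`, the `(key, value)` record layout, and the choice to measure the window width in records are illustrative assumptions, not details from the source.

```python
def run_windows(owner, replica, start_keys, width, aggregate):
    """Compute one window per start key, reading into the replica if needed."""
    local = owner + replica  # both blocks are stored locally, in key order
    results = {}
    for index, (key, _value) in enumerate(local):
        if key not in start_keys:
            continue
        window = [value for _key, value in local[index:index + width]]
        if len(window) == width:  # skip windows that run past local data
            results[key] = aggregate(window)
    return results


def mean(values):
    return sum(values) / len(values)


# Made-up blocks: keys 0..2 are owned, keys 3..4 are held as a replica.
owner = [(0, 1.0), (1, 2.0), (2, 3.0)]
replica = [(3, 4.0), (4, 5.0)]
results = run_windows(owner, replica, {0, 1, 2}, 3, mean)
```

The windows starting at keys 1 and 2 extend past the Owner block, yet they are computed without contacting another machine because the continuation is already stored as a Replica.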
  • FIG. 2 is a block diagram illustrating a hardware configuration example of the master device 100 and the slave device 200 in the embodiment of the data arrangement / calculation system 10.
  • The master device 100 and the slave device 200 of the present invention are each configured as a computer system that includes a CPU (Central Processing Unit) 1, a storage device 2, an input device 3, an output device 4, and a bus 5 that connects these devices.
  • The CPU 1 performs arithmetic processing and control processing related to the data arrangement / calculation system 10 of the present invention stored in the storage device 2.
  • The storage device 2 is a device that records information, such as a hard disk or a memory.
  • The storage device 2 stores a program read from a computer-readable storage medium such as a CD-ROM or DVD, a program downloaded via a network (not shown), signals or programs input from the input device 3, and processing results of the CPU 1.
  • The input device 3 is a device that allows a user to input commands and signals, such as a mouse, a keyboard, or a microphone.
  • The output device 4 is a device that presents output results to the user, such as a display or a speaker.
  • The present invention is not limited to the hardware configuration example shown above. Each part can be implemented by hardware, by software, or by a combination of both.
  • FIG. 3 is a flowchart showing the processing operation of the data arrangement / calculation system 10 according to the first embodiment of the present invention. With reference to FIG. 3, the processing operation of the data arrangement / calculation system 10 according to the first exemplary embodiment of the present invention will be described.
  • Step A01: The data dividing unit 111 receives data having a plurality of records from an external device (not shown).
  • The data dividing unit 111 sorts the plurality of records included in the data, according to a predetermined key of each record, into the order in which each of the plurality of slave devices 200 is to perform the sliding window calculation.
  • The data dividing unit 111 divides the data to generate a plurality of blocks while preserving the order of the predetermined keys of the records. Thereafter, the data dividing unit 111 generates replicas of the divided blocks.
  • The data dividing unit 111 provides the divided blocks and their replicas to the arrangement unit 112.
  • Step A02: The arrangement unit 112 arranges each of the blocks in one of the slave devices 200 as Owner. That is, the arrangement unit 112 determines the Owner of each block. At this time, it is preferable that the arrangement unit 112 arranges the blocks as Owners evenly among all the slave devices 200.
  • Step A03: The arrangement unit 112 determines the location of each Replica block based on the location of the Owner block, and generates arrangement information in which the Owner block, the Replica block, and the slave device 200 are associated with each other. Specifically, the arrangement unit 112 arranges, in each slave device 200 as a Replica, a replica of the block that follows that device's Owner block in the predetermined key order. The arrangement unit 112 then generates arrangement information that associates each block with the slave device 200 holding it as Owner and associates the replica of each block with the slave device 200 holding it as Replica. The arrangement unit 112 provides the generated arrangement information to the Owner / Replica management unit 120, which stores it.
  • Step A04: The data holding unit 210 of each slave device 200 receives an Owner block and a Replica block from the data arrangement unit 110 of the master device 100 and stores them.
  • Step A05: The job allocation unit 130 receives a job for executing a sliding window calculation for each predetermined window width from an external device (not shown), and assigns the received job to each of the plurality of slave devices 200. Specifically, when the job allocation unit 130 receives a job, it refers to the arrangement information of the Owner / Replica management unit 120 and recognizes the predetermined key type that determines the order of the records included in each Owner block. The job allocation unit 130 then extracts, from each record included in the Owner block, the key corresponding to that key type, and assigns the extracted keys and the job to the slave device 200 in which the Owner block is arranged.
  • Step A06: The job execution unit 220 executes the assigned job. Specifically, the job execution unit 220 receives, from the job allocation unit 130, the job and the plurality of keys that determine the order of the Owner block. The job execution unit 220 then executes the sliding window calculation for each predetermined window width set in the job on the Owner block stored in the data holding unit 210, using each of the received keys as a start key of the window width. When the predetermined window width of the job spans the Owner block and the Replica block, the job execution unit 220 executes the sliding window calculation using not only the Owner block but also the Replica block.
  • As described above, in addition to the Owner block, the master device 100 can arrange in each slave device 200, as a Replica block, a replica of the block that follows the Owner block in order. The master device 100 can then assign a job for the Owner block to each slave device 200. As a result, when calculating sliding windows based on the job, each slave device 200 can complete the calculation with local access only, even when a window crosses a boundary between blocks.
  • Furthermore, because each slave device 200 stores an Owner block and a Replica block, there is no need to designate a shadow area each time a sliding window is calculated, and the efficiency of the sliding window calculation can be increased.
  • The data arrangement / calculation system of the present invention can be divided into a master server (master device 100) that arranges data and divides and assigns jobs, and a plurality of slave servers (slave devices 200) that store data and execute jobs.
  • The master server divides the data into blocks and distributes the blocks to the slave servers.
  • The division may be performed by a preset size or by a preset number of records.
  • The master server sorts the records in the order in which the slave servers calculate the sliding window, and divides the records into blocks. For example, if the master server wants each slave server to perform a sliding window calculation in time series, the records are stored in time series.
  • FIG. 4 shows data in which a user ID and a time are arranged in one set in order to perform a sliding window calculation for each user and along a time series.
  • The master server places each block on one of the slave servers as Owner.
  • Methods of distributed arrangement include, for example, assigning blocks to the slave servers in round-robin order, assigning blocks to the slave servers at random, and, in view of the Replica assignment described later, assigning adjacent blocks to slave servers in different racks in order to reduce damage in the event of a failure. If Owners are distributed evenly across all the slave servers, the load is distributed when jobs are executed later.
  • When the master server assigns a block as a Replica, it assigns the block to the slave server that holds the immediately preceding block as Owner.
  • For example, suppose that block 1 is arranged in slave X as Owner.
  • Block 2 is the block immediately after block 1.
  • That is, block 1 is the block immediately before block 2. Therefore, the replica of block 2 is arranged in slave X as a Replica.
  • FIG. 5 is an example of arrangement information indicating how each block is distributed to each slave server.
  • The arrangement information has the items block ID, start key, end key, Owner, and Replica.
  • The block ID uniquely identifies each block.
  • The start key and end key are the keys of the first and last records held in each block.
  • Owner is the ID of the slave server that holds the block as Owner.
  • Replica is the ID of the slave server that holds the block as a Replica.
  • The arrangement information may be recorded in other formats as long as it can express which range of records each block holds and which slave servers hold it as Owner or Replica.
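One possible in-memory form of the arrangement information is sketched below. The five fields follow the items listed above, but the class name `PlacementEntry`, the helper `find_owner`, and the concrete values (in particular the end key of block 2) are hypothetical and are not the contents of FIG. 5.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlacementEntry:
    block_id: int
    start_key: str           # key of the first record in the block
    end_key: str             # key of the last record in the block
    owner: str               # ID of the slave server holding it as Owner
    replica: Optional[str]   # ID of the slave server holding it as Replica


def find_owner(entries, key):
    """Return the Owner of the block whose key range contains `key`."""
    for entry in entries:
        if entry.start_key <= key <= entry.end_key:
            return entry.owner
    return None


# Illustrative entries only.
entries = [
    PlacementEntry(1, "Usr1-00:00:00", "Usr1-00:01:30", "1", None),
    PlacementEntry(2, "Usr1-00:01:40", "Usr1-00:03:10", "2", "1"),
]
owner_of_key = find_owner(entries, "Usr1-00:01:50")
```

Plain string comparison is enough here only because the assumed keys are zero-padded, so lexicographic order matches time order.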
  • Each block may be allocated to a plurality of slave servers as an Owner block and a Replica block. By assigning a plurality, it is possible to reduce the cost of data relocation and the recovery time when a failure occurs.
  • FIG. 6 is an example of a job.
  • The following describes, as an example, a job that performs a sliding window calculation taking the average of the values X and Y every minute over the data shown in FIG. 4, which has a record representing the position (X, Y) of each user every 10 seconds.
  • The “scan_by_window” function on the first line of FIG. 6 extracts windows by matching patterns against the keys that are the start point and end point of each window.
  • In this example, the range that starts at a key ending with “:00” and ends at the next key ending with “:50” is extracted as one window.
  • The master server refers to the arrangement information in FIG. 5 and passes, to each slave server that holds a block as Owner, the job to be executed and the range (a plurality of keys) to be read as start keys.
  • For example, the slave server “1” holds as Owner a block whose records range from “Usr1-00:00:00” to “Usr1-00:01:30”. The master server therefore assigns to the slave server “1” the keys in the range from “Usr1-00:00:00” to “Usr1-00:01:30” as the range to be read as start keys. That is, it assigns a job to be executed over this key range.
  • The slave server “1” searches from the top of the block it holds as Owner for a record whose key ends with “:00” as the window start key, and the record with “Usr1-00:00:00” matches. Next, the slave server “1” searches for a key ending with “:50” as the end key of the window and finds that “Usr1-00:00:50” matches. The slave server “1” therefore calculates the average position using the records in the range from “Usr1-00:00:00” to “Usr1-00:00:50” as one window, and outputs the result.
  • The slave server “1” then searches for the start key and end key of the next window and executes the calculation for the window ranging from “Usr1-00:01:00” to “Usr1-00:01:50”. The slave server “1” does not hold the records with keys “Usr1-00:01:40” and “Usr1-00:01:50” as Owner, but it holds them locally as Replica, so the sliding window calculation can be completed with local access only. The slave server “1” then searches for the start key of the next window, but none is found within the given range (from “Usr1-00:00:00” to “Usr1-00:01:30”), so it ends the job. In this way, each slave server is assigned a job for the range from the start key to the end key of the block it holds as Owner.
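The walk-through above can be reproduced with a small sketch. The name `scan_by_window` comes from FIG. 6, but its signature here, the helpers `make_key` and `mean_position`, the key format, and the position values are all assumptions made for illustration; the actual job of FIG. 6 is not reproduced.

```python
def make_key(user, seconds):
    """Build a key such as 'Usr1-00:01:30' (format assumed from the example)."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{user}-{hours:02d}:{minutes:02d}:{secs:02d}"


def scan_by_window(owner, replica, start_suffix, end_suffix):
    """Yield (start_key, positions) for each window that starts in `owner`."""
    local = owner + replica  # both blocks are stored locally, in key order
    for index in range(len(owner)):  # start keys come from the Owner range only
        start_key = local[index][0]
        if not start_key.endswith(start_suffix):
            continue
        window = []
        for key, position in local[index:]:
            window.append(position)
            if key.endswith(end_suffix):
                yield start_key, window
                break


def mean_position(positions):
    xs, ys = zip(*positions)
    return sum(xs) / len(xs), sum(ys) / len(ys)


# One record every 10 seconds; position (t, 2t) is made-up sample data.
owner = [(make_key("Usr1", t), (t, 2 * t)) for t in range(0, 100, 10)]
replica = [(make_key("Usr1", t), (t, 2 * t)) for t in range(100, 200, 10)]

results = [
    (start, mean_position(window))
    for start, window in scan_by_window(owner, replica, ":00", ":50")
]
```

The second window starts at “Usr1-00:01:00” inside the Owner block and ends at “Usr1-00:01:50” inside the Replica block, so it is still computed from local data.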
  • FIG. 7 is a block diagram showing the configuration of the data arrangement / calculation system 20 according to the second embodiment of the present invention.
  • the data arrangement / calculation system 20 according to the second embodiment of the present invention is obtained by adding a function capable of inserting and deleting data to the data distributed and arranged in the first embodiment. Note that in the second embodiment of the present invention, the same components as those in the first embodiment are denoted by the same reference numerals, and description thereof is omitted.
  • the data arrangement / calculation system 20 includes a master device 300 and a plurality of slave devices 200.
  • The master device 300 holds data having a plurality of records arranged in order of a predetermined key, and a job for executing a sliding window calculation for each predetermined window width.
  • Each of the plurality of slave devices 200 is connected to the master device 300.
  • the master device 300 will be described.
  • The master device 300 includes a data arrangement unit 110, an Owner / Replica management unit 120, a job allocation unit 130, and a data rearrangement unit 140.
  • the data rearrangement unit 140 operates after data is distributed and arranged in each slave device 200 by the same operation as in the first embodiment.
  • the data rearrangement unit 140 updates each of the plurality of slave devices 200 when receiving data including a new record from an external device (not shown).
  • Likewise, when receiving a request to delete a record, the data rearrangement unit 140 updates each of the plurality of slave devices 200 based on the deletion request.
  • the data rearrangement unit 140 includes a data insertion unit 141, a data deletion unit 142, a determination unit 143, and a rearrangement unit 144.
  • When the data insertion unit 141 receives data including a new record, it refers to the arrangement information of the Owner / Replica management unit 120. The data insertion unit 141 inserts the new record into the corresponding block based on the key, of the same type as the predetermined key described above, that is included in the new record. A block into which a new record has been inserted is referred to as an “insertion block”. For example, the data insertion unit 141 refers to the arrangement information and inserts the new record into the second block based on that key, thereby forming an insertion block.
  • When the data deletion unit 142 receives a request to delete a record included in a block, it refers to the arrangement information of the Owner / Replica management unit 120 and deletes the record from the block. A block from which a record has been deleted is referred to as a “deleted block”. For example, assume that the record to be deleted is included in the second block. The data deletion unit 142 identifies the second block based on the predetermined key included in the record to be deleted, deletes the record from the second block, and treats the result as the deleted block.
  • the determination unit 143 determines whether or not the size of the insertion block is within a threshold (whether it is larger than a certain size). Further, the determination unit 143 determines whether or not the size of the deleted block is within a threshold (whether it is smaller than a certain size).
  • The rearrangement unit 144 will be described. First, the case where the data insertion unit 141 receives data including a new record will be described.
  • When the size of the insertion block is larger than the threshold, the rearrangement unit 144 divides the insertion block in half and generates a block F and a block R. The front half of the insertion block is the block F, and the rear half is the block R.
  • The rearrangement unit 144 arranges the block F as Owner, and a copy of the block R as Replica, in a new one of the slave devices 200.
  • The rearrangement unit 144 rearranges a copy of the block F as Replica in the slave device 200 that stores, as Replica, the copy of the block into which the new record is inserted. Further, the rearrangement unit 144 rearranges the block R as Owner in the slave device 200 that stores, as Owner, the block into which the new record is inserted. The rearrangement unit 144 provides the arrangement information after the rearrangement to the Owner / Replica management unit 120 to update the arrangement information.
  • For example, assume that the first block is arranged as Owner and a duplicate of the second block is arranged as Replica in the slave device 200a, and that the second block is arranged as Owner and a duplicate of the third block is arranged as Replica in the slave device 200b (not shown).
  • the rearrangement unit 144 divides the insertion block into a block F and a block R.
  • the rearrangement unit 144 arranges a block F as an Owner and a copy of the block R as a Replica in a new slave device 200c (not shown).
  • the rearrangement unit 144 rearranges the copy of the block F as a replica in the slave device 200a. Furthermore, the rearrangement unit 144 rearranges the block R as an Owner in the slave device 200b. When rearranging, the rearrangement unit 144 causes the slave device 200a to delete the duplicate of the second block arranged as Replica, and causes the slave device 200b to delete the second block arranged as Owner.
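  • The placement change in this example can be sketched as a small function over a placement table. This is an illustrative sketch only: the function name `split_placement`, the dictionary layout, and the block labels “B1”, “B2”, “B3”, “B2F”, and “B2R” are assumptions introduced for the example.

```python
def split_placement(placement, block, new_slave):
    """Return the placement after `block` has been split into a front and a rear half.

    placement: dict mapping a slave id to {"owner": block, "replica": block}.
    The old Replica holder receives a copy of the front half, the old Owner
    holder receives the rear half, and the new slave owns the front half and
    keeps a copy of the rear half as Replica.
    """
    front, rear = f"{block}F", f"{block}R"
    updated = {slave: dict(roles) for slave, roles in placement.items()}
    for roles in updated.values():
        if roles["replica"] == block:
            roles["replica"] = front  # e.g. slave 200a: copy of block F
        if roles["owner"] == block:
            roles["owner"] = rear     # e.g. slave 200b: block R
    updated[new_slave] = {"owner": front, "replica": rear}
    return updated


before = {
    "200a": {"owner": "B1", "replica": "B2"},
    "200b": {"owner": "B2", "replica": "B3"},
}
after = split_placement(before, "B2", "200c")
for slave in sorted(after):
    print(slave, after[slave])
```

  • After the split, every slave device still holds as Replica the block that follows its Owner block in key order: B1 is followed by B2F, B2F by B2R, and B2R by B3.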
  • When the size of the insertion block is within the threshold, the rearrangement unit 144 inserts the new record into the block targeted by the insertion and into the copy of that block. For example, the rearrangement unit 144 inserts the new record into the duplicate of the second block arranged as Replica in the slave device 200a. In addition, the rearrangement unit 144 inserts the new record into the second block arranged as Owner in the slave device 200b.
  • Next, the case where the data deletion unit 142 receives a deletion request will be described. When the size of the deleted block is within the threshold, the rearrangement unit 144 generates an integrated block by integrating the deleted block with the block that follows it in the predetermined key order. The rearrangement unit 144 rearranges a copy of the integrated block as Replica in the slave device 200 that stores, as Replica, the copy of the block from which the record is deleted. Further, the rearrangement unit 144 rearranges the integrated block as Owner in the slave device 200 that stores, as Owner, the block following the deleted block.
  • The rearrangement unit 144 releases the slave device 200 that stores, as Owner, the block from which the record is deleted (the rearrangement unit 144 deletes the Owner block and the Replica block allocated to that slave device 200).
  • the rearrangement unit 144 provides the rearrangement placement information to the Owner / Replica management unit 120 to update the placement information.
  • For example, assume that the first block is arranged as Owner and a duplicate of the second block is arranged as Replica in the slave device 200a, that the second block is arranged as Owner in the slave device 200b (not shown), and that the third block is arranged as Owner in the slave device 200d (not shown).
  • the rearrangement unit 144 integrates the deleted block and the third block to generate an integrated block.
  • the rearrangement unit 144 rearranges the replica of the integrated block as a replica in the slave device 200a.
  • the rearrangement unit 144 rearranges the integrated block as an Owner in the slave device 200d.
  • The rearrangement unit 144 releases the slave device 200b storing the second block as Owner (the rearrangement unit 144 deletes the Owner block and the Replica block allocated to the slave device 200b).
  • the rearrangement unit 144 causes the slave device 200a to delete the duplicate of the second block arranged as Replica, and causes the slave device 200d to delete the third block arranged as Owner.
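  • The placement change in this example can be sketched in the same style. Again this is only an illustration: the function name `merge_placement`, the dictionary layout, the block labels, and the Replica entries shown for the slave devices 200b and 200d are assumptions introduced for the example.

```python
def merge_placement(placement, block, next_block):
    """Return the placement after `block` has been merged into `next_block`.

    placement: dict mapping a slave id to {"owner": block, "replica": block}.
    The slave that owned `block` is released, the old Replica holder receives
    a copy of the integrated block, and the Owner of `next_block` now owns the
    integrated block.
    """
    merged = f"{block}+{next_block}"
    updated = {}
    for slave, roles in placement.items():
        if roles["owner"] == block:
            continue  # this slave is released; its Owner and Replica blocks are dropped
        roles = dict(roles)
        if roles["replica"] == block:
            roles["replica"] = merged  # e.g. slave 200a
        if roles["owner"] == next_block:
            roles["owner"] = merged    # e.g. slave 200d
        updated[slave] = roles
    return updated


before = {
    "200a": {"owner": "B1", "replica": "B2"},
    "200b": {"owner": "B2", "replica": "B3"},
    "200d": {"owner": "B3", "replica": "B4"},
}
after = merge_placement(before, "B2", "B3")
for slave in sorted(after):
    print(slave, after[slave])
```

  • The slave device 200b no longer appears in the result, which corresponds to releasing it.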
  • When the size of the deleted block is larger than the threshold, the rearrangement unit 144 deletes the target record from the block targeted by the deletion and from the copy of that block. For example, the rearrangement unit 144 deletes the target record from the duplicate of the second block arranged as Replica in the slave device 200a. In addition, the rearrangement unit 144 deletes the target record from the second block arranged as Owner in the slave device 200b.
  • FIG. 8 is a flowchart showing the processing operation of the data arrangement / calculation system 20 according to the second embodiment of the present invention.
  • the data insertion processing of the data arrangement / calculation system 20 will be described.
  • the data arrangement / calculation system 20 has already distributed and arranged the data in each slave device 200 by the same processing as in the first embodiment.
  • Step B01 When the data insertion unit 141 receives data including a new record, the data insertion unit 141 refers to the arrangement information of the Owner / Replica management unit 120. The data insertion unit 141 inserts the new record into the corresponding block based on the same type of key as the above-described predetermined key included in the new record, thereby forming an insertion block.
  • Step B02 The determination unit 143 determines whether or not the size of the insertion block is within a threshold value.
  • Step B03 In step B02, when the size of the insertion block is larger than the threshold value (YES), the rearrangement unit 144 generates the block F and the block R by dividing the insertion block into half size.
  • the front half of the inserted block is the block F and the rear half is the block R.
  • Step B04 The rearrangement unit 144 arranges the first half block F as an Owner and a duplicate of the second half block R as a replica in the new slave device 200.
  • Step B05 The rearrangement unit 144 rearranges the Replica block. That is, the rearrangement unit 144 rearranges a copy of the block F as Replica in the slave device 200 that stores, as Replica, the copy of the block into which the new record is inserted. When rearranging, the rearrangement unit 144 causes that slave device 200 to delete the copy of the block previously arranged as Replica.
  • Step B06 The rearrangement unit 144 rearranges the Owner block. That is, the rearrangement unit 144 rearranges the block R as the Owner in the slave device 200 that stores the block into which the new record is inserted as the Owner. Note that the rearrangement unit 144 causes the slave device 200 to delete a block arranged as Owner when rearranging.
  • Step B07 The rearrangement unit 144 provides the rearrangement placement information to the Owner / Replica management unit 120 to update the placement information.
  • Step B08 On the other hand, in step B02, when the size of the insertion block is within the threshold (NO), the rearrangement unit 144 inserts a new record into the block to be inserted and a copy of the block.
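  • The block-level part of steps B01 to B08 can be sketched as follows. This is a minimal sketch under stated assumptions: blocks are non-empty Python lists of (key, value) records held in one process, block size is measured as a record count, and the names `find_block`, `insert_record`, and `max_size` are hypothetical. The placement of blocks on slave devices (steps B04 to B07) is not modelled here.

```python
from bisect import bisect_right, insort


def find_block(blocks, key):
    """Index of the block whose key range should contain `key` (blocks are non-empty)."""
    starts = [block[0][0] for block in blocks]
    return max(bisect_right(starts, key) - 1, 0)


def insert_record(blocks, record, max_size):
    """Insert `record` and split the insertion block in half if it grows too large.

    blocks: list of blocks in key order, each a sorted list of (key, value).
    Returns a new list of blocks; the input list is left unchanged.
    """
    i = find_block(blocks, record[0])  # B01: locate the block by key
    block = list(blocks[i])
    insort(block, record)              # B01: form the insertion block
    if len(block) <= max_size:         # B02: within the threshold -> B08
        pieces = [block]
    else:                              # B03: split into block F and block R
        half = len(block) // 2
        pieces = [block[:half], block[half:]]
    return blocks[:i] + pieces + blocks[i + 1:]


blocks = [[(1, "a"), (3, "b")], [(5, "c"), (7, "d")]]
print(insert_record(blocks, (6, "x"), max_size=3))  # fits, no split
print(insert_record(blocks, (4, "y"), max_size=2))  # first block is split
```

  • Splitting only the block that receives the record keeps every other block, and therefore every other slave device's Owner data, untouched.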
  • FIG. 9 is a flowchart showing the processing operation of the data arrangement / calculation system 20 according to the second embodiment of the present invention.
  • the data deletion process of the data arrangement / calculation system 20 will be described. Also here, it is assumed that the data arrangement / calculation system 20 has already distributed and arranged the data in each slave device 200 by the same processing as in the first embodiment.
  • Step C01 When receiving a request to delete a record included in the block, the data deletion unit 142 refers to the arrangement information of the Owner / Replica management unit 120, deletes the record from the block, and sets it as a deletion block.
  • Step C02 The determination unit 143 determines whether or not the size of the deleted block is within a threshold value.
  • Step C03 In step C02, when the size of the deleted block is within the threshold value (YES), the rearrangement unit 144 generates an integrated block by integrating the deleted block with the block that follows it in the predetermined key order.
  • Step C04 The rearrangement unit 144 rearranges the Replica block. That is, the rearrangement unit 144 rearranges a copy of the integrated block as Replica in the slave device 200 that stores, as Replica, the copy of the block from which the record is deleted. When rearranging, the rearrangement unit 144 causes that slave device 200 to delete the copy of the block previously arranged as Replica.
  • Step C05 The rearrangement unit 144 rearranges the Owner block. That is, the rearrangement unit 144 rearranges the integrated block as the Owner in the slave device 200 that stores the block in the next order of the deleted block as the Owner. Note that the rearrangement unit 144 causes the slave device 200 to delete a block arranged as Owner when rearranging.
  • Step C06 The rearrangement unit 144 releases the slave device 200 that stores the block from which the record is deleted as Owner.
  • Step C07 The rearrangement unit 144 provides the rearrangement placement information to the Owner / Replica management unit 120 to update the placement information.
  • Step C08 On the other hand, in step C02, when the size of the deleted block is larger than the threshold (NO), the rearrangement unit 144 deletes the record from the block to be deleted and the copy of the block.
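  • The block-level part of steps C01 to C08 can be sketched in the same way. This is a minimal sketch under stated assumptions: blocks are non-empty lists of (key, value) records, block size is a record count, and the names `find_block`, `delete_record`, and `min_size` are hypothetical. The text does not say what happens when the deleted block is the last block, so leaving it unmerged is an assumption of this sketch, and the release of a slave device (step C06) is not modelled.

```python
from bisect import bisect_right


def find_block(blocks, key):
    """Index of the block whose key range should contain `key` (blocks are non-empty)."""
    starts = [block[0][0] for block in blocks]
    return max(bisect_right(starts, key) - 1, 0)


def delete_record(blocks, key, min_size):
    """Delete the record with `key`; merge the block with its successor if it shrinks.

    blocks: list of blocks in key order, each a sorted list of (key, value).
    Returns a new list of blocks; the input list is left unchanged.
    """
    i = find_block(blocks, key)                       # C01: locate the block by key
    block = [rec for rec in blocks[i] if rec[0] != key]  # C01: form the deleted block
    is_last = i + 1 == len(blocks)
    if len(block) > min_size or is_last:              # C02: larger than threshold -> C08
        return blocks[:i] + [block] + blocks[i + 1:]
    merged = block + blocks[i + 1]                    # C03: integrate with the next block
    return blocks[:i] + [merged] + blocks[i + 2:]


blocks = [[(1, "a"), (2, "b")], [(3, "c"), (4, "d")], [(5, "e"), (6, "f"), (7, "g")]]
print(delete_record(blocks, 6, min_size=1))  # block stays large enough, no merge
print(delete_record(blocks, 3, min_size=1))  # second block merges into the third
```

  • Merging forward, into the block that follows in key order, mirrors the text, where the integrated block is placed on the slave device that owned the following block.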
  • As described above, the data arrangement / calculation system 20 according to the second embodiment of the present invention can efficiently execute the sliding window calculation using the Replica block even when the window extends beyond the Owner block. Moreover, even when data is inserted or deleted after the data has been arranged in each slave device 200, the data arrangement / calculation system 20 according to the second embodiment of the present invention can rearrange the Owner block and the Replica block in each slave device 200 so that they reflect the insertion or deletion.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a data arrangement/calculation system provided with a master device and a plurality of slave devices, the master device being provided with a data arrangement unit and a job allocation unit. The data arrangement unit includes a data partitioning unit that partitions the data and generates a plurality of blocks and copies of the plurality of blocks, and an arrangement unit that arranges a first block of the plurality of blocks as Owner on a first slave device of the plurality of slave devices and further arranges a copy of a second block, which follows the first block in a predetermined key order, as Replica on the first slave device. The first slave device is provided with a data holding unit that stores the first block and the copy of the second block, and a job execution unit that executes a sliding window calculation using the first block and the copy of the second block if a predetermined window width spans the first block and the copy of the second block.
PCT/JP2010/070854 2009-12-07 2010-11-24 Data arrangement/calculation system, data arrangement/calculation method, master device, and data arrangement method WO2011070910A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2011545163A JP5799812B2 (ja) 2009-12-07 2010-11-24 Data arrangement/calculation system, data arrangement/calculation method, master device, and data arrangement method
US13/514,229 US8898677B2 (en) 2009-12-07 2010-11-24 Data arrangement calculating system, data arrangement calculating method, master unit and data arranging method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2009-277521 2009-12-07
JP2009277521 2009-12-07

Publications (1)

Publication Number Publication Date
WO2011070910A1 true WO2011070910A1 (fr) 2011-06-16

Family

ID=44145457

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2010/070854 WO2011070910A1 (fr) 2009-12-07 2010-11-24 Data arrangement/calculation system, data arrangement/calculation method, master device, and data arrangement method

Country Status (3)

Country Link
US (1) US8898677B2 (fr)
JP (1) JP5799812B2 (fr)
WO (1) WO2011070910A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014020735A1 (fr) * 2012-08-02 2014-02-06 富士通株式会社 Data processing method, information processing device, and program
WO2014068980A1 (fr) * 2012-11-01 2014-05-08 日本電気株式会社 Distributed data processing system and distributed data processing method
JP2017021494A (ja) * 2015-07-08 2017-01-26 日本電信電話株式会社 Load balancing program and server

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9323892B1 (en) * 2009-07-01 2016-04-26 Vigilytics LLC Using de-identified healthcare data to evaluate post-healthcare facility encounter treatment outcomes
US9118641B1 (en) 2009-07-01 2015-08-25 Vigilytics LLC De-identifying medical history information for medical underwriting
US9843470B1 (en) * 2013-09-27 2017-12-12 Amazon Technologies, Inc. Portable data center
US10965525B1 (en) 2016-06-29 2021-03-30 Amazon Technologies, Inc. Portable data center for data transfer
US9795062B1 (en) 2016-06-29 2017-10-17 Amazon Technologies, Inc. Portable data center for data transfer
US10398061B1 (en) 2016-06-29 2019-08-27 Amazon Technologies, Inc. Portable data center for data transfer
US10592280B2 (en) 2016-11-23 2020-03-17 Amazon Technologies, Inc. Resource allocation and scheduling for batch jobs

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
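JP2002358293A (ja) * 2001-05-31 2002-12-13 Nec Corp Runtime load balancing system, runtime load balancing method, and program
JP2003248667A (ja) * 2002-02-26 2003-09-05 Fujitsu Ltd Method for determining a region division pattern
JP2006236123A (ja) * 2005-02-25 2006-09-07 Fujitsu Ltd Job distribution program, job distribution method, and job distribution device
JP2006252394A (ja) * 2005-03-14 2006-09-21 Sony Corp Information processing system, information processing device and method, and program
JP2007244887A (ja) * 2001-12-03 2007-09-27 Ziosoft Inc Volume rendering processing method, volume rendering processing system, computer, and program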

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
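JPH11120151A (ja) 1997-10-15 1999-04-30 Hitachi Ltd Program parallelization method that suppresses unnecessary communication
JP2001101149A (ja) * 1999-09-30 2001-04-13 Nec Corp Distributed parallel data processing device, recording medium storing a distributed parallel data processing program, and distributed parallel data processing system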

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014020735A1 (fr) * 2012-08-02 2014-02-06 富士通株式会社 Data processing method, information processing device, and program
WO2014068980A1 (fr) * 2012-11-01 2014-05-08 日本電気株式会社 Distributed data processing system and distributed data processing method
CN104769551A (zh) * 2012-11-01 2015-07-08 日本电气株式会社 Distributed data processing system and distributed data processing method
JPWO2014068980A1 (ja) * 2012-11-01 2016-09-08 日本電気株式会社 Distributed data processing system and distributed data processing method
CN104769551B (zh) * 2012-11-01 2018-07-03 日本电气株式会社 Distributed data processing system and distributed data processing method
US10296493B2 (en) 2012-11-01 2019-05-21 Nec Corporation Distributed data processing system and distributed data processing method
JP2017021494A (ja) * 2015-07-08 2017-01-26 日本電信電話株式会社 Load balancing program and server

Also Published As

Publication number Publication date
US8898677B2 (en) 2014-11-25
JP5799812B2 (ja) 2015-10-28
JPWO2011070910A1 (ja) 2013-04-22
US20120246661A1 (en) 2012-09-27

Similar Documents

Publication Publication Date Title
JP5799812B2 (ja) Data arrangement/calculation system, data arrangement/calculation method, master device, and data arrangement method
US11082206B2 (en) Layout-independent cryptographic stamp of a distributed dataset
US9292217B2 (en) Logical volume space sharing
US8799601B1 (en) Techniques for managing deduplication based on recently written extents
US11442961B2 (en) Active transaction list synchronization method and apparatus
JP6542909B2 (ja) File operation method and device
US8086810B2 (en) Rapid defragmentation of storage volumes
US20140297592A1 (en) Computer-readable medium storing program and version control method
US20130097117A1 (en) Application of a differential dataset to a data store using sequential change sets
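JP6479186B2 (ja) Computer system and database management method
CN102110121A (zh) Data processing method and system thereof
JP2008217209A (ja) Differential snapshot management method, computer system, and NAS computer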
US9854037B2 (en) Identifying workload and sizing of buffers for the purpose of volume replication
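JP2015510174A (ja) Location-independent files
CN111459884B (zh) Data processing method and apparatus, computer device, and storage medium
CN111522502B (zh) Data deduplication method and apparatus, electronic device, and computer-readable storage medium
EP3707614B1 (fr) Redistributing table data in a database cluster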
US10509767B2 (en) Systems and methods for managing snapshots of a file system volume
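JP2011191835A (ja) Computer system and method for executing an application program
KR20110046118A (ko) Adaptive logging apparatus and method
JP6272556B2 (ja) Shared resource updating device and shared resource updating method
JP5494817B2 (ja) Storage system, data management device, method, and program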
US10997126B1 (en) Methods and apparatus for reorganizing dynamically loadable namespaces (DLNs)
TWI475419B (zh) Method and system for accessing files on a storage system
JP6202026B2 (ja) Data management device, data management method, and data management program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10835832

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2011545163

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 13514229

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10835832

Country of ref document: EP

Kind code of ref document: A1