WO2013179451A1 - Parallel data processing system, computer, and parallel data processing method - Google Patents
- Publication number
- WO2013179451A1 PCT/JP2012/064149 JP2012064149W
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- record
- data set
- thread
- input data
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/466—Transaction processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24532—Query optimisation of parallel queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
Definitions
- the present invention relates to parallel data processing technology.
- the entire data set stored in storage or the like (for example, a file stored in a file system) is basically read and processed.
- an application job
- the entire data set needs to be read, so data processing is not necessarily efficient, and the time required for data processing may increase.
- the time required to read the entire data set becomes longer, and thus the time required for data processing may become longer.
- an object of the present invention is to shorten the time for data processing.
- a parallel data processing system, included in one computing node of a computer system in which a plurality of computing nodes execute data processing in parallel, includes a parallel data processing execution unit that reads data from a data group, which includes a first data group containing a plurality of first data and a second data group containing a plurality of second data, and executes processing on the data.
- the parallel data processing system may be, for example, a system module in the first and second embodiments described later.
- the first data in the first data group may correspond to the second data in the second data group.
- the first data group may be an index of the second data group.
- the first data may include an index key value of the second data and a reference to one or more second data corresponding to the index key value.
- the parallel data processing execution unit performs the following:
  (A) reads the first data from the first data group and acquires the first value from the first data based on the first format information acquired from the application;
  (B) generates one or more threads for reading each of the one or more second data corresponding to the first value from the second data group, based on the first reference information acquired from the application;
  (C) performs (A) and (B) on one or more first data in the first data group; and
  (D) executes a plurality of the threads in parallel.
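Steps (A) to (D) can be pictured with a small sketch. This is a minimal illustration, not the patented implementation: the in-memory lists and the names `parse_first_data`, `read_second_data`, and `run` are hypothetical, the format and reference information are reduced to tuple unpacking and list offsets, and a thread pool stands in for explicit per-reference thread generation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for the two data groups.
# Each first data item: (index key value, references into the second data group).
FIRST_DATA_GROUP = [
    ("apple", [0, 2]),
    ("pear", [1]),
]
SECOND_DATA_GROUP = ["apple pie", "pear tart", "apple juice"]


def parse_first_data(first_data):
    """(A) Split one first data item into its key value and its references.

    Stands in for applying the format information supplied by the application.
    """
    key_value, references = first_data
    return key_value, references


def read_second_data(reference):
    """Read one second data item; the reference is assumed to be a list offset."""
    return SECOND_DATA_GROUP[reference]


def run(first_data_group):
    results = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        pending = []
        # (C) repeat (A) and (B) for every first data item
        for first_data in first_data_group:
            key_value, references = parse_first_data(first_data)  # (A)
            for reference in references:
                # (B) one unit of work per referenced second data item
                pending.append((key_value, pool.submit(read_second_data, reference)))
        # (D) the submitted reads run in parallel on the pool's worker threads
        for key_value, future in pending:
            results.append((key_value, future.result()))
    return results
```

Because the reads are submitted before any result is awaited, the order in which second data items are actually fetched is not tied to the order in which the first data items were scanned.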
- the parallel data processing system may further include a reception unit that receives processing instructions from the application.
- an instruction from the application generally defines a procedure. Because the parallel data processing execution unit receives the instruction from the application and executes (A) to (D), the parallel data processing system can perform out-of-order processing that does not depend on that procedure, even when the instruction defines one.
- the calculation node can read data related to data processing in parallel. Thereby, it is expected that the throughput of data reading is improved, and therefore, the time for data processing is shortened.
- FIG. 1A illustrates a job execution model according to the first embodiment.
- FIG. 1B illustrates a job execution model according to the second embodiment.
- FIG. 2A illustrates a configuration of a calculation node according to the first embodiment.
- FIG. 2B illustrates a configuration of a calculation node according to the second embodiment.
- FIG. 3A illustrates a first configuration example of the computer system according to the embodiment.
- FIG. 3B illustrates a second configuration example of the computer system according to the embodiment.
- FIG. 4A shows a flow of map task execution processing according to the first embodiment.
- FIG. 4B shows a flow of task execution processing according to the second embodiment.
- FIG. 5A is a diagram for explaining the input data and the process for the input data according to the first embodiment.
- FIG. 5B shows an example of a format and reference according to the first embodiment.
- FIG. 5C is an example of a schematic diagram illustrating thread generation and thread execution.
- FIG. 5D is a first diagram for explaining the input data and the process for the input data according to the second embodiment.
- FIG. 5E is a second diagram for explaining the input data and the process for the input data according to the second embodiment.
- FIG. 5F shows an example of a format according to the second embodiment.
- FIG. 5G illustrates an example of a reference method according to the second embodiment.
- FIG. 5H illustrates an example of a catalog according to the second embodiment.
- FIG. 6A is an example of a schematic diagram for explaining a session during data acquisition.
- FIG. 6B is an example of a schematic diagram for explaining a blocked session at the time of data acquisition according to the modification.
- FIG. 7A shows a flow of a record acquisition process in a calculation node that acquires a record according to the first embodiment.
- FIG. 7B shows a flow of record acquisition processing in a calculation node that manages records according to the first embodiment.
- FIG. 7C illustrates the configuration of the data read request management table and the remote record acquisition request management table according to the first embodiment.
- FIG. 7D illustrates a flow of record acquisition processing in a calculation node that acquires records according to the modification of the first embodiment.
- FIG. 7E shows the flow of a record acquisition process in a calculation node that manages records according to a modification of the first embodiment.
- FIG. 7F illustrates a configuration of a blocked remote record acquisition request management table according to a modification of the first embodiment.
- FIG. 8A shows a configuration of a node-level resource constraint management table according to the first embodiment.
- FIG. 8B illustrates a configuration of a job level resource constraint management table according to the first embodiment.
- FIG. 8C illustrates a configuration of a process level resource constraint management table in the overall supervisor process according to the first embodiment.
- FIG. 8D illustrates a configuration of a process level resource constraint management table in the supervisor process of each node according to the first embodiment.
- FIG. 8E illustrates a configuration of a task level resource constraint management table in the overall supervisor process according to the first embodiment.
- FIG. 8F illustrates a configuration of a task level resource constraint management table in the supervisor process of each node according to the first embodiment.
- FIG. 8G shows a flow of the overall resource constraint management process according to the first embodiment.
- FIG. 8H illustrates a flow of resource constraint management processing in each node according to the first embodiment.
- FIG. 9A illustrates a first example of tasks according to the first embodiment.
- FIG. 9B illustrates an example of thread generation in the task illustrated in the first example according to the first embodiment.
- FIG. 9C illustrates an example of thread generation in the task illustrated in the first example according to the first embodiment.
- FIG. 9D illustrates a second example of the task according to the first embodiment.
- FIG. 9E illustrates an example of thread generation in the task illustrated in the second example according to the first embodiment.
- FIG. 1A shows a job execution model according to the first embodiment.
- one solid circle represents one process (for example, a map process or a reduce process).
- One dashed circle in the process indicates one task.
- One rounded square in the task indicates one thread.
- a job is executed by a plurality of computation nodes connected via a network.
- a supervisor process that supervises the whole set of computing nodes (hereinafter, the overall supervisor process) distributes the application code to all computing nodes that participate in job execution, and allocates processes such as the map process and the reduce process to the supervisor process of each computing node.
- the supervisor process of each computing node generates a process based on an instruction from the overall supervisor process. The generated process then generates a task based on an instruction from the overall supervisor process.
- Each computing node executes the job by running the processes and tasks generated in this way, which execute the map operation included in the application.
- the overall supervisor process may be any one of the supervisor processes in the computing nodes (that is, one supervisor process may also serve as the overall supervisor process), or it may be a dedicated process prepared separately from the supervisor processes.
- the process allocation is not limited to the above; the overall supervisor process may instruct the supervisor process of each computing node to perform the allocation.
- the process and task generation is likewise not limited to the above; the overall supervisor process may perform the generation directly.
- under this job execution model, the parallel data processing system executes the map process, which takes as input the input data set # 1 (that is, the first data set) and the input data set # 2 (that is, the second data set) stored in the storage, executes the map operation, and writes the result records to an intermediate data set stored in the storage. It then executes the reduce process, which takes the result records written to the intermediate data set as input, executes the reduce operation, and writes the result records to the output data set # 1.
- Each of the input data set, the intermediate data set, and the output data set is a collection of a plurality of records, and may be structured by some data structure or may not be structured.
- the input data set # 2 may be a set of one or more files.
- the input data set # 1 only needs to correspond to the record of the input data set # 2.
- the input data set # 1 may be an index of the input data set # 2.
- Each record of the input data set # 1 includes a predetermined index key value of the record of the input data set # 2 and a reference indicating one or more records of the input data set # 2 corresponding to the index key value.
- the reference may include a storage position that identifies the record of the input data set # 2 on the storage device, or a unique index key that identifies the record within the data structure provided for storing the input data set # 2.
- the input data set # 1 may have records corresponding to all the records of the input data set # 2, or only records corresponding to some of the records of the input data set # 2.
- the input data set # 1 may not be an index of the input data set # 2.
- for example, the input data set # 1 and the input data set # 2 may be joinable: the input data set # 1 is simply a collection of records, while the input data set # 2 is a collection of records structured by a data structure on a certain index key. In that case, a record of the input data set # 2 whose index key value corresponds to the value of a certain item in a record of the input data set # 1 may be read.
- the input data set # 1 and the input data set # 2 may belong to the same data structure.
- for example, the input data set # 1 may be the set of internal nodes that constitute a B-tree, and the input data set # 2 may be the set of leaf nodes that constitute the same B-tree.
- alternatively, the input data set # 1 may be the set of nodes at a certain level of a B-tree, and the input data set # 2 may be the set of nodes at the next level (referenced from the input data set # 1) of the same B-tree.
- there may be three or more input data sets. For example, there may be another input data set corresponding to the records of the input data set # 2.
- the program code may be instructions executable by a processor, generated by compilation or the like; instructions that an execution processing system or the like can convert into processor-executable instructions; or declarations from which an execution processing system or the like can generate processor-executable instructions. It may also be a combination of these, and may further include other information.
- the instruction and the declaration may be a byte string that can be interpreted by a processor, a compiler, an execution processing system, or the like, or may be described in a source code or the like.
- the reduce operation is a program code included in the application stored in the calculation node.
- the reduce operation is program code that defines the processing applied to records in the intermediate data set. For example, it generates a result record by aggregating, per key, the result records (key / value pairs) generated by the map operation.
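The per-key aggregation described for the reduce operation can be sketched as follows. This is a hedged illustration only: `reduce_by_key` and its `combine` parameter are hypothetical names, and the intermediate data set is modeled as a plain list of key / value tuples.

```python
from collections import defaultdict


def reduce_by_key(intermediate_records, combine):
    """Group key/value result records by key and aggregate each group.

    `combine` is an assumed callable that turns the list of values for one
    key into a single result value (for example, `sum`).
    """
    grouped = defaultdict(list)
    for key, value in intermediate_records:
        grouped[key].append(value)
    # One result record per key.
    return {key: combine(values) for key, values in grouped.items()}
```

For instance, passing `sum` as `combine` yields one total per key.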
- the input data set # 1 may be an index of the input data set # 2.
- each record of the input data set # 1 may include a predetermined index key value of a record of the input data set # 2 and a reference indicating one or more records of the input data set # 2 corresponding to that index key value.
- the input data set # 3 may be an index of the input data set # 4.
- Each record of the input data set # 3 includes a predetermined index key value of the record of the input data set # 4 and a reference indicating one or more records of the input data set # 4 corresponding to the index key value.
- the predetermined item of the record of the input data set # 2 may include a reference indicating one or more records of the input data set # 3.
- the processor obtains a reference to the input data set # 3 from the record that satisfies the requirement.
- the processor generates a thread that performs processing by referring to the input data set # 3 using the reference. If there are multiple references, the processor creates multiple threads. These threads are executed in parallel by the processor.
- the processor generates a record to be input to the stage calculation # 1 based on the build method # 1 with respect to the record that satisfies the requirements, and executes the stage calculation # 1 using the record as an input. Further, the division operation # 1 is performed on the execution result record of one or more stage operations # 1, thereby determining the subsequent stage # 2 process to which the execution result record is to be sent.
- the processor outputs so that the determined subsequent stage process can receive the execution result record. Specifically, the execution result is stored in an intermediate data set or sent to a subsequent stage process on the network.
- the processor of the computing node executes a task in the stage # 2 process, acquires in that task the records passed from the stage # 1 process, and executes the stage calculation # 2. By then performing the division operation # 2, it determines the subsequent stage # 3 process to which the execution result records of the stage calculation # 2 are to be sent.
- the processor outputs so that the process of the determined subsequent stage can receive the execution result record of the stage operation # 2.
- the execution result is stored in an intermediate data set or sent to a subsequent stage process on the network.
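The role of a division operation, choosing which subsequent-stage process receives each execution result record, can be sketched as below. The text does not say how the division operation chooses; hash partitioning on the record key is an assumption made here for illustration, and `division_operation` and `route` are hypothetical names.

```python
import zlib


def division_operation(record, num_next_stage_processes):
    """Pick the subsequent-stage process for one result record.

    Assumption: records are (key, value) pairs and are routed by a
    deterministic hash of the key.
    """
    key, _value = record
    return zlib.crc32(str(key).encode("utf-8")) % num_next_stage_processes


def route(records, num_next_stage_processes):
    """Place each record in the outbox of the process chosen for it."""
    outboxes = [[] for _ in range(num_next_stage_processes)]
    for record in records:
        target = division_operation(record, num_next_stage_processes)
        outboxes[target].append(record)
    return outboxes
```

With a deterministic hash, all records sharing a key reach the same subsequent-stage process, which is what a later per-key aggregation would need.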
- records from a plurality of different data sets may be input in parallel or sequentially.
- the first data set and the second data set have a relationship that allows respective records to be combined. That is, the first data set may not be structured, and the second data set may be structured. Since the second data set is structured, the second record corresponding to the record of the first data set associated with the second data set can be selectively extracted.
- the second embodiment is an extension of the first embodiment, and the part described with respect to the first embodiment is applied to the second embodiment without being described again.
- the memory 105 includes an application program (hereinafter referred to as an application) 110, a system module 120, a process manager 131, a task manager 132, a thread manager 133, a data reader / writer 140, a network manager 150, a storage manager 160, and an OS (Operating System) 170.
- the system module 120, the process manager 131, the task manager 132, the thread manager 133, the data reader / writer 140, the network manager 150, and the storage manager 160 (hereinafter, these program modules are collectively referred to as the module group) may be library modules that are statically or dynamically linked with the application 110 and executed.
- in this case, an instruction from the application 110 or a mutual instruction between program modules in the module group is issued through a call interface exposed by the module group.
- the module group may instead be a program that operates separately from the application 110.
- in this case, an instruction from the application 110 is issued by means such as inter-process communication or shared memory.
- the application 110 is a program that defines a job that reads an input data set stored in the storage 104, executes predetermined processing, and writes an output data set, and the calculation node executes the job by executing the application 110.
- the application 110 includes, as the information that defines the job (hereinafter referred to as job information), the map operation 110a, the reduce operation 110b, the division operation 110c, the format 110e (format # 1), the format 110f (format # 2), and the condition 110g.
- the data reader / writer 140 reads / writes data from / to the storage based on instructions from the system module 120.
- the data reader / writer 140 may be, for example, a file system. When a specified data read / write requires access to the storage 104 of its own computing node 100, the data reader / writer 140 has the storage manager 160 execute the read / write on that storage 104. When it requires access to the storage 104 of another computing node 100 connected via the network 200, the data reader / writer 140 has the network manager 150 execute the read / write on the storage 104 of that other computing node 100. In either case, the data reader / writer 140 may temporarily cache the data to be read / written using the memory resource of the memory 105.
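The local / remote dispatch and the optional caching described for the data reader / writer can be sketched as follows. This is a minimal sketch under stated assumptions: the class and method names, the node identifiers, and the dictionary-backed storage are all hypothetical, and only reads are modeled.

```python
class StorageManager:
    """Hypothetical stand-in for the storage manager (local storage access)."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.reads = 0

    def read(self, location):
        self.reads += 1
        return self.blocks[location]


class NetworkManager:
    """Hypothetical stand-in for the network manager (remote storage access)."""

    def __init__(self, remote_blocks):
        self.remote_blocks = remote_blocks  # {node_id: {location: data}}
        self.reads = 0

    def read(self, node_id, location):
        self.reads += 1
        return self.remote_blocks[node_id][location]


class DataReaderWriter:
    """Routes a read to local storage or to another node, with a simple cache."""

    def __init__(self, own_node_id, storage_manager, network_manager):
        self.own_node_id = own_node_id
        self.storage_manager = storage_manager
        self.network_manager = network_manager
        self._cache = {}  # assumed unbounded; a real cache would evict entries

    def read(self, node_id, location):
        cache_key = (node_id, location)
        if cache_key in self._cache:
            return self._cache[cache_key]
        if node_id == self.own_node_id:
            data = self.storage_manager.read(location)  # own node's storage
        else:
            data = self.network_manager.read(node_id, location)  # other node
        self._cache[cache_key] = data
        return data
```

A repeated read of the same location is then served from memory without touching either manager.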
- the network manager 150 controls data communication with devices (for example, other computing nodes 100) connected via the network.
- the storage manager 160 controls input / output with the storage 104 of its own computing node 100.
- the OS 170 manages devices such as the NIC 102, the HBA 103, and the storage 104, and also manages the entire computing node 100.
- the computing node 100 may include a plurality of at least one of the CPU 101, the NIC 102, and the HBA 103 from the viewpoint of performance and availability.
- the computing node 100 may include an input device (for example, a keyboard and a pointing device) (not shown) and a display device (for example, a liquid crystal display) (not shown).
- the input device and the display device may be integrated.
- the system module 120 includes a generalized stage function 124 and a supervisor function 122.
- the generalized stage function 124 is program code (function) executed in each process shown in FIG. 1B.
- the storage 400 includes one or more nonvolatile storage media.
- the nonvolatile storage medium is, for example, a magnetic disk or a flash memory.
- the storage 104 may include a plurality of nonvolatile storage media, and may further include a RAID (Redundant Array of Independent Disks) controller that configures a storage space from the nonvolatile storage media. All or part of the storage resources of the storage 400 may be handled in the same manner as the storage 104 included in the computing node 100.
- the processor 101 of the computing node 100 executes the processing of steps S10 to S15 by executing one thread SL1 for reading the record of the input data set # 1 and executing the processing. This process is realized by the processor 101 executing mainly the map function 121.
- the processor 101 acquires one record from the input data set # 1.
- the input data set # 1 may be stored in the storage 104 of its own calculation node 100, or may be stored in the storage 104 of another calculation node 100.
- the record acquisition process for acquiring records from the input data set # 1 will be described later.
- the processor 101 interprets the contents of each item included in the acquired record based on the format 110e (format # 1), and determines whether the record is necessary by applying to the acquired record the condition for records of the input data set # 1 included in the condition 110g. If the record is necessary, the process proceeds to S12; if not, the process proceeds to S15 (this branch is not shown). Only a part of the condition 110g may be applied. If the condition 110g is not defined, the process may proceed directly to S12.
- the processor 101 generates a thread SL2 for acquiring a record from the input data set # 2 and performing processing for one reference to the input data set # 2 of the acquired record.
- the processor 101 determines whether or not there is a reference to the unprocessed input data set # 2 in the acquired record. If the result of this determination is affirmative, S13 is performed, while if the result of this determination is negative, S15 is performed. Thus, if there are a plurality of references to the input data set # 2 in the acquired record, the thread SL2 corresponding to the number of references is generated in S13. Note that when the resources necessary for generating the thread are insufficient, the generation of the thread SL2 may be temporarily suspended. At this time, one thread SL2 may be generated for each reference, or one thread SL2 may be generated for a plurality of (for example, a predetermined number) references.
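The choice between one thread per reference and one thread per group of references can be sketched as follows. This is an illustrative sketch only: `spawn_reader_threads`, `read_records`, and `refs_per_thread` are hypothetical names, and the suspension of thread generation under resource shortage is not modeled.

```python
import threading


def spawn_reader_threads(references, read_records, refs_per_thread=1):
    """Start one reader thread per batch of references.

    refs_per_thread=1 gives one thread per reference; a larger value gives
    one thread per predetermined number of references.
    `read_records` is an assumed callable that receives a list of references.
    """
    threads = []
    for start in range(0, len(references), refs_per_thread):
        batch = references[start:start + refs_per_thread]
        thread = threading.Thread(target=read_records, args=(batch,))
        thread.start()
        threads.append(thread)
    return threads
```

Larger batches trade read parallelism for fewer threads, which may matter when the number of threads is limited by the node's resources.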
- the thread SL2 generated by the thread SL1 in S13 is executed by the CPU 101.
- the processor 101 executes the processing of step S16 to step S19 by executing the thread SL2.
- the processor 101 can execute a plurality of threads (thread SL1, thread SL2, etc.) in parallel.
- when the computing node 100 includes a plurality of processors 101, a thread SL2 generated by one processor 101 may be executed by another processor 101. Note that the number of threads that can be executed in parallel is limited by the resources of the computing node 100 and the like.
- the processor 101 acquires one record from the input data set # 2 using the reference method 110h and the reference acquired by the thread SL1.
- the processor 101 interprets the contents of each item included in the acquired record based on the format 110f (format # 2), and determines whether the record is necessary by applying to the acquired record the condition for records of the input data set # 2 included in the condition 110g. If the record is necessary, the process proceeds to S18; if not, the process proceeds to S19 (this branch is not shown). In S17, only a part of the condition 110g may be applied. If the condition 110g is not defined, the process may proceed directly to S18.
- the processor 101 stores the acquired record in the calculation queue 180 of the main memory 105.
- the processor 101 determines whether there is another record in the range indicated by the reference of the input data set # 2. If the result of this determination is affirmative, S16 is performed, while if the result of this determination is negative, this process is terminated and the thread SL2 that performed this process is terminated.
- the processor 101 obtains one record from the calculation queue 180, applies the map operation 110a to the record to execute a predetermined process on it, and outputs the execution result.
- the processor 101 may execute S20 in a thread different from SL2. There may be one or more threads that execute S20.
- the map operation 110a may be applied by acquiring a plurality of records at once from the calculation queue 180. Instead of executing S18 in the thread SL2, the processor 101 may execute S20 after executing S17 in the thread SL2 and apply the map operation to the record resulting from S17.
- the execution result may be stored in the main memory 105, or the execution result may be passed to a reduce process that executes subsequent processing.
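The flow from S10 to S20 can be sketched end to end as follows. This is a simplified sketch, not the described implementation: the data sets are hypothetical in-memory lists, each reference is assumed to be a list offset that designates exactly one record (so the S19 loop is trivial), and the map operation is applied in the calling thread after all reading has finished.

```python
import queue
import threading

# Hypothetical stand-ins for the input data set # 1 (index) and # 2 (records).
INPUT_DATA_SET_1 = [
    {"key": "a", "refs": [0, 1]},
    {"key": "b", "refs": [2]},
    {"key": "c", "refs": [3]},
]
INPUT_DATA_SET_2 = [
    {"key": "a", "value": 1},
    {"key": "a", "value": 2},
    {"key": "b", "value": 3},
    {"key": "c", "value": -1},
]


def thread_sl2(reference, condition_2, calculation_queue):
    record = INPUT_DATA_SET_2[reference]   # S16: acquire a record via the reference
    if condition_2(record):                # S17: keep only necessary records
        calculation_queue.put(record)      # S18: store in the calculation queue


def thread_sl1(condition_1, condition_2, calculation_queue):
    sl2_threads = []
    for record in INPUT_DATA_SET_1:        # S10 / S15: iterate over data set # 1
        if not condition_1(record):        # S11: skip unnecessary records
            continue
        for reference in record["refs"]:   # S13 / S14: one SL2 per reference
            t = threading.Thread(
                target=thread_sl2,
                args=(reference, condition_2, calculation_queue),
            )
            t.start()
            sl2_threads.append(t)
    for t in sl2_threads:
        t.join()


def run_map_task(map_operation, condition_1, condition_2):
    calculation_queue = queue.Queue()
    sl1 = threading.Thread(
        target=thread_sl1, args=(condition_1, condition_2, calculation_queue)
    )
    sl1.start()
    sl1.join()
    results = []
    while not calculation_queue.empty():   # S20: apply the map operation
        results.append(map_operation(calculation_queue.get()))
    return results
```

Because the SL2 threads run concurrently, the order of records in the calculation queue, and therefore of the results, is not fixed.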
- a thread SL1 that reads records from the input data set # 1 and performs processing thereof and a thread SL2 that reads records from the input data set # 2 and performs processing thereof are provided separately.
- the processing corresponding to S16 to S19 or S20 may be performed as it is without generating the thread SL2.
- a predetermined number of threads SL1 may be generated, records may be read from the input data set # 1 by the threads SL1, and the processing may be performed.
- a thread SL1 may be newly generated, and processing corresponding to S10 to S15 may be performed.
- FIG. 4B shows a flow of task execution processing according to the second embodiment.
- This task execution process indicates a process executed in a task executed in the stage process in the job execution model shown in FIG. 1B.
- the processor 101 of the computation node 100 executes the processing of steps S21 to S26 by executing one thread SLd 1 for reading the record of the input data set #d 1 and executing the processing.
- the input data set #d 1 is the input data set # 1
- the input data set #d 2 is the input data set # 2 .
- the input data set #d 1 is the input data set # 5
- the input data set #d 2 is an input data set # 6.
- the processor 101 acquires one record from the input data set #d 1.
- the input data set #d 1 may be stored in the storage 104 of its own calculation node 100 or may be stored in the storage 104 of another calculation node 100. Note that the record acquisition process of acquiring records from the input data set #d 1 will be described later.
- the processor 101 interprets the contents of each item included in the acquired record based on the format 110m corresponding to the records of the input data set #d 1, and determines whether the record is necessary by applying to the acquired record the condition for records of the input data set #d 1 included in the condition 110n. If the record is necessary, the process proceeds to S23; if not, the process proceeds to S26 (this branch is not shown). In S22, only a part of the condition 110n may be applied. If the condition 110n is not defined, the process may proceed directly to S23.
- the processor 101 determines whether the acquired record contains a reference to the input data set #d 2. If the result of this determination is affirmative, S24 is performed, while if the result of this determination is negative, S26 is performed.
- the processor 101 generates, for one reference to the input data set #d 2 in the acquired record, a thread SLd 2 that acquires records from the input data set #d 2 and processes them.
- the processor 101 determines whether the acquired record contains further references to the input data set #d 2. If the result of this determination is affirmative, S24 is performed, while if the result of this determination is negative, S26 is performed. Thus, if there are a plurality of references to the input data set #d 2 in the acquired record, threads SLd 2 corresponding to the number of references are generated in S24. Note that when the resources necessary for generating a thread are insufficient, the generation of the thread SLd 2 may be temporarily suspended. One thread SLd 2 may be generated for each reference, or one thread SLd 2 may be generated for a plurality of (for example, a predetermined number of) references.
- the processor 101 further determines whether there is another record in the input data set #d 1. If the result of this determination is affirmative, S21 is performed, while if the result of this determination is negative, this process is terminated and the thread SLd 1 that executed this process is terminated.
- the thread SLd k generated from the thread SLd k-1 in S24 and S31 to be described later is executed by the processor 101.
- k represents a natural number of 2 or more.
- the processor 101 executes the processing of steps S27 to S35 by executing the thread SLd k .
- the processor 101 can execute a plurality of threads (thread SLd 1 , thread SLd k, etc.) in parallel.
- the computing node 100 includes a plurality of processors 101, and a thread generated by one processor 101 may be executed by another processor 101. The number of threads that can be executed in parallel is limited by the resources of the computing node 100 and the like.
- The processor 101 acquires a record from the input data set #d k, based on the reference contained in the record of the input data set #d k-1 and on the reference method 110o for referring to the input data set #d k by using that reference.
- The processor 101 interprets the content of each item included in the acquired record based on the format 110f (format #d k), and determines whether the record is necessary by applying, to the acquired record, the condition for records of the input data set #d k included in the condition 110n. If the record is necessary, the process proceeds to S29; if not, the process proceeds to S35 (this transition is not shown). In S28, only a part of the condition 110n may be applied. If the condition 110n is not defined, the process may proceed directly to S29.
- The processor 101 determines whether access to the input data set #d k + 1 is further required. If the result of this determination is affirmative, S30 is performed; if it is negative, the process proceeds to S33. For example, in the stage # 1 process of FIG. 1B, a record is acquired from the input record # 1, a record is acquired from the input record # 2 according to the reference in that record, a record is acquired from the input record # 3 according to the reference in that record, and a record further needs to be acquired from the input record # 4 according to the reference in that record.
- the processor 101 determines whether or not the acquired record has a reference to the input data set #d k + 1 . If the result of this determination is affirmative, S31 is performed, while if the result of this determination is negative, S35 is performed.
- the processor 101 generates a thread SLd k + 1 for performing processing by acquiring a record from the input data set #d k + 1 for one reference to the input data set #d k + 1 of the acquired record.
- The processor 101 determines whether or not the acquired record further includes a reference to the input data set #d k + 1 . If the result of this determination is affirmative, S31 is performed; if it is negative, S35 is performed. Thereby, if the acquired record contains a plurality of references to the input data set #d k + 1, threads SLd k + 1 corresponding to the number of references are generated in S31. Note that when the resources necessary for generating a thread are insufficient, the generation of the thread SLd k + 1 may be temporarily suspended. One thread SLd k + 1 may be generated for each single reference, or one thread SLd k + 1 may be generated for each group of plural references (e.g., a predetermined number).
- The processor 101 builds (generates) a record in a predetermined format based on the acquired record and the build method 110p.
- The processor 101 executes predetermined processing by applying the stage operation 110j to the built record, and outputs the execution result.
- Instead of executing S34 after executing S33 in the thread SLd k , the processor 101 may temporarily store the built record in a calculation queue in the main memory 105, then acquire one record from the calculation queue and execute S34, that is, apply the stage operation 110j to the record to execute the predetermined processing and output the execution result.
- The processor 101 may execute S34 in a thread different from SLd k . There may be one or more threads that execute S34.
- The stage operation 110j may be applied by acquiring a plurality of records at once from the calculation queue.
- CPU 101 further determines whether there is another record in the range of the input data set #d k indicated by the reference. If the result of this determination is affirmative, S27 is performed; if it is negative, the process ends and the thread SLd k executing this process is terminated.
- the thread SLd k-1 may perform the processing corresponding to S27 to S35 without generating the thread SLd k .
- a predetermined number of threads SLd 1 may be generated, and records may be read from the input data set # 1 by the threads SLd 1 and processed.
- A thread SLd 1 or SLd k may be newly generated to perform the processing corresponding to S21 to S26 or the processing corresponding to S27 to S35.
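- As a non-limiting illustration of the variation in which built records pass through a calculation queue before the stage operation is applied in a separate thread, a minimal Python sketch is given below. The queue object, the sentinel `_DONE`, and the upper-casing `stage_operation` are hypothetical stand-ins chosen only for this sketch.

```python
import queue
import threading

calc_queue = queue.Queue()      # stands in for the calculation queue in main memory
outputs = []
_DONE = object()                # sentinel that tells the consumer to stop

def stage_operation(record):
    # Hypothetical stage operation: upper-case the built record.
    return record.upper()

def consumer():
    # A thread other than SLd k that executes the step corresponding to S34.
    while True:
        item = calc_queue.get()
        if item is _DONE:
            break
        outputs.append(stage_operation(item))

def producer(built_records):
    # A thread corresponding to SLd k: store built records instead of running S34.
    for rec in built_records:
        calc_queue.put(rec)

worker = threading.Thread(target=consumer)
worker.start()
producers = [threading.Thread(target=producer, args=(chunk,))
             for chunk in (["a", "b"], ["c"])]
for p in producers:
    p.start()
for p in producers:
    p.join()
calc_queue.put(_DONE)           # all producers finished, so let the consumer exit
worker.join()
print(sorted(outputs))          # ['A', 'B', 'C']
```

Decoupling record acquisition from the stage operation in this way lets the number of threads applying the stage operation be chosen independently of the number of threads that acquire records.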
- FIG. 5A is a diagram for explaining the input data and the process for the input data according to the first embodiment.
- Input data set # 2 stores one or more records including “Date & time”, “User”, “Product”, and “Comment”.
- In the input data set # 1, one or more records having items of “Product” and “Reference” are collected and managed for each year and month. That is, the input data set # 1 is divided by year and month.
- the supervisor process performs parallel data processing by assigning a map task or the like for each of the collected parts (divided parts). At this time, one map task may be responsible for a plurality of divided portions.
- “Product” stores a value of a key item (“Product”) used to search for a record in the input data set # 2.
- “Reference” stores a reference indicating the physical storage location of the record in the input data set # 2 (reference destination record) that stores the same value as the value of “Product” in the record.
- When there are a plurality of reference destination records, “Reference” stores references to the plurality of reference destination records.
- The input data set # 2 may have a structure (for example, a B-tree) in which a record can be searched for with a certain key, and “Reference” may store the value of that key as the reference.
- a plurality of records may correspond to a certain reference.
- the record format of input data set # 2 is described in format # 2. Further, a method of referring to the record of the input data set # 2 using “Reference” of the input data set # 1 is described in the reference method # 1.
- The processor 101 reads a record of the input data set # 1 by executing the map function 121, and grasps “Product” and “Reference” in the record according to the format # 1. Then, the map function 121 determines whether or not a predetermined condition is met based on the value of “Product”, and identifies the “Reference” of a record that satisfies the condition. Then, the processor 101 executes the map function 121 to acquire a record of the input data set # 2 based on the reference method # 1 and the reference.
- FIG. 5B is a diagram for explaining the format and reference according to the first embodiment.
- Format # 1 is information relating to the record format of the input data set # 1, and in this embodiment, a procedure for interpreting the record of the input data set # 1 is described.
- Format # 1 describes interpreting an input record (that is, a record of the input data set # 1) in binary format, interpreting the columns of the input record as Text type, Long type, Int type, and Int type, respectively, and using the first (0) column as a search key.
- the column type is indicated using a type declaration in the Java (registered trademark) language, but the present invention is not limited to this.
- Format # 2 is information relating to the record format of the input data set # 2, and in this embodiment, a procedure for interpreting the record of the input data set # 2 is described.
- Format # 2 describes that an input record (that is, a record of input data set # 2) is interpreted in a text format (character string format).
- Format # 2 further describes that the delimiter between the columns in the record is a comma, that the first (0) column is of DateTime type and named “Date & Time”, that the second (1) column is of Text type and named “User”, that the third (2) column is of Text type and named “Product”, that the fourth (3) column is of Text type and named “Comment”, and that the input columns are interpreted based on these.
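- As a non-limiting illustration, interpreting a text record according to a description such as format # 2 can be sketched in Python as follows. The list `FORMAT_2`, the function `parse_record`, the timestamp layout `%Y-%m-%d %H:%M`, and the choice to keep any further commas inside the last column are assumptions made only for this sketch; the embodiment expresses column types with Java type declarations instead.

```python
from datetime import datetime

# Column names and types taken from the description of format # 2.
FORMAT_2 = [
    ("Date & Time", "datetime"),
    ("User", "text"),
    ("Product", "text"),
    ("Comment", "text"),
]

def parse_record(line, fmt=FORMAT_2, delimiter=","):
    """Split one text record into named, typed columns."""
    # Split at most len(fmt) - 1 times so the last column keeps any extra commas.
    values = line.rstrip("\n").split(delimiter, len(fmt) - 1)
    record = {}
    for (name, kind), raw in zip(fmt, values):
        if kind == "datetime":
            record[name] = datetime.strptime(raw, "%Y-%m-%d %H:%M")  # assumed layout
        else:
            record[name] = raw
    return record

rec = parse_record("2012-02-07 10:15,alice,AX Skirt,Nice, fits well\n")
print(rec["Product"], "|", rec["Comment"])  # AX Skirt | Nice, fits well
```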
- The reference method # 1 is a procedure for referring to a record of the input data set # 2 by using “Reference” of the input data set # 1. It describes that a record is obtained by a physical reference that uses the second column of the input record (record corresponding to the format # 1) as an offset, the third column as a length, and the fourth column as a node ID.
- the physical reference means that the specified offset (address) of the storage managed by the specified node ID is a starting point and a byte string for the specified length is used as a reference destination record.
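- As a non-limiting illustration, a physical reference that uses an offset, a length, and a node ID can be sketched in Python as follows. The binary layout (a 2-byte length-prefixed UTF-8 key followed by a big-endian 8-byte offset, 4-byte length, and 4-byte node ID), the function names, and the in-memory `storages` dictionary standing in for the storage of each node are assumptions made only for this sketch.

```python
import struct

def encode_index_record(product, offset, length, node_id):
    """Pack (Text, Long, Int, Int) into bytes using an assumed layout."""
    key = product.encode("utf-8")
    return struct.pack(">H", len(key)) + key + struct.pack(">qii", offset, length, node_id)

def decode_index_record(buf):
    """Interpret the four columns of a binary index record."""
    (key_len,) = struct.unpack_from(">H", buf, 0)
    product = buf[2:2 + key_len].decode("utf-8")
    offset, length, node_id = struct.unpack_from(">qii", buf, 2 + key_len)
    return product, offset, length, node_id

def physical_reference(storages, offset, length, node_id):
    """Return `length` bytes starting at `offset` in the storage of `node_id`."""
    return storages[node_id][offset:offset + length]

first = b"2012-Feb-07,alice,AX Skirt,Nice\n"
second = b"2012-Feb-08,bob,AX Skirt,OK\n"
storages = {1: first + second}          # hypothetical storage managed by node 1

entry = encode_index_record("AX Skirt", len(first), len(second), 1)
product, offset, length, node_id = decode_index_record(entry)
record = physical_reference(storages, offset, length, node_id)
print(record)  # b'2012-Feb-08,bob,AX Skirt,OK\n'
```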
- FIG. 5C is an example of a schematic diagram illustrating thread generation and thread execution.
- The upper side of FIG. 5C shows an example in which a process is executed by a single thread, and the lower side of FIG. 5C shows an example in which threads are dynamically generated according to the first embodiment and a plurality of threads are executed in parallel.
- Note that the notation in FIG. 5C follows the following rules.
- the horizontal axis represents time.
- A horizontally long rounded rectangle in the figure represents a series of processing by one thread.
- the left end of the rounded rectangle represents the time when processing by the thread is started, and the right end of the rounded rectangle represents the time when processing by the thread is ended.
- the value inside the rounded rectangle represents information (for example, the value of the first column of the record) indicating the record that is read along with the processing corresponding to the thread.
- FIG. 5C shows an example in which each record of the input data set # 1 corresponding to February 2012 (2012-Feb) shown in FIG. 5A is acquired and the process is executed.
- When the process is executed by a single thread, the processor reads the second record from the top of the input data set # 1 corresponding to February 2012 (2012-Feb) (the value of “Product” is “AX Skirt”), and refers to the 7th record (“2012-Feb-07...”) of the input data set # 2 based on the “Reference” value of this record.
- the map function 121 first corresponds to February 2012 (2012-Feb) of the input data set # 1 by the thread 5a.
- The map function 121 reads, by the thread 5a, the third record from the top of the input data set # 1 corresponding to 2012-Feb (the value of “Product” is “AX Skirt”), and, based on the four values of “Reference” of this record, sequentially generates and executes a thread 5c for referring to the eighth record (“2012-Feb-08...”) of the input data set # 2, a thread 5d for referring to the tenth record (“2012-Feb-08...”) of the input data set # 2, a thread 5e for referring to the eleventh record (“2012-Feb-08...”) of the input data set # 2, and a thread 5f for referring to the twelfth record (“2012-Feb-09...”) of the input data set # 2.
- The map function 121 reads, by the thread 5a, the fourth record from the top of the input data set # 1 corresponding to 2012-Feb (the value of “Product” is “BC Bike”), and, based on the two values of “Reference” of this record, generates and executes a thread 5g for referring to the sixth record of the input data set # 2 and a thread 5h for referring to the ninth record of the input data set # 2.
- The map function 121 reads, by the thread 5a, the fifth record from the top of the input data set # 1 corresponding to 2012-Feb (the value of “Product” is “BD Flower”), and, based on the one value of “Reference” of this record, generates and executes a thread 5i for referring to the fifth record (“2012-Feb-03...”) of the input data set # 2.
- FIG. 5D is a first diagram for explaining the input data and the process for the input data according to the second embodiment.
- FIG. 5E is a second diagram for explaining the input data and the process for the input data according to the second embodiment.
- the record of the input data set # 3 is further referred to by using the value of the record of the input data set # 2. Therefore, as shown in FIG. 5D, the application 110 further includes a reference method # 2 for referring to the input data set # 3 based on the record value (“User” value) of the input data set # 2.
- The value of “User” is not a reference indicating the physical position of the record of the input data set # 3 corresponding to the value, but a reference indicating a logical position (that is, the record can be searched for by the value).
- FIG. 5E shows an input data set # 3 and an input data set # 4.
- the input data set # 4 stores one or more records including “User”, “Gender”, “Zip”, and “Address”.
- In the input data set # 3, one or more records having items of “User” and “Reference” are collected and managed for each predetermined range.
- “User” stores the value of the item that serves as a key used for searching for a record in the input data set # 4.
- “Reference” stores a reference indicating the physical storage position of the record in the input data set # 4 (reference destination record) that stores the same value as the value of “User” in the record.
- When there are a plurality of reference destination records, references to the plurality of reference destination records are stored.
- the input record # 3 may have a structure (for example, a B-tree) that allows a record to be searched with a certain key, and may be a reference for storing the value of the key in “User”.
- a plurality of records may correspond to a certain reference.
- the record format of input data set # 3 is described in format # 3.
- The record format of the input data set # 4 is described in format # 4. Further, a method of referring to a record of the input data set # 4 by using “Reference” of the input data set # 3 is described in the reference method # 3. In addition, a procedure for generating the record to be output based on the acquired records is described in the build method.
- The processor 101 reads a record of the input data set # 1 by executing the generalized stage function 124, and grasps “Product” and “Reference” in the record according to the format # 1. Then, the generalized stage function 124 determines whether or not a predetermined condition is met based on the value of “Product”, and specifies the “Reference” of a record that satisfies the condition. Then, the generalized stage function 124 acquires a record of the input data set # 2 based on the reference method # 1 and “Reference”.
- the processor 101 executes the generalized stage function 124 to grasp “User”, “Product”, and “Comment” of the record of the read input data set # 2 according to the format # 2. Then, the generalization stage function 124 acquires a record of the input data set # 3 based on the value of “User” and the reference method # 2.
- The processor 101 executes the generalized stage function 124 to grasp “User”, “Gender”, “Zip”, and “Address” of the read record of the input data set # 4 according to the format # 4. Then, the processor 101 executes the generalized stage function 124 to build and output a record including “User”, “Product”, “Comment”, “Gender”, and “Zip” based on the build method.
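- As a non-limiting illustration, a build method that combines the record acquired from the input data set # 2 with the record acquired from the input data set # 4 can be sketched in Python as follows. The function `build_record`, the dictionary representation of records, and the sample values are assumptions made only for this sketch.

```python
def build_record(comment_record, user_record):
    """Combine a record of data set #2 with the matching record of data set #4."""
    return {
        "User": comment_record["User"],
        "Product": comment_record["Product"],
        "Comment": comment_record["Comment"],
        "Gender": user_record["Gender"],
        "Zip": user_record["Zip"],
    }

# Hypothetical sample records interpreted according to format # 2 and format # 4.
comment_record = {"Date & Time": "2012-Feb-07", "User": "alice",
                  "Product": "AX Skirt", "Comment": "Nice"}
user_record = {"User": "alice", "Gender": "F", "Zip": "100-0001", "Address": "Tokyo"}

built = build_record(comment_record, user_record)
print(built)
```

Only the five items named for the output record are copied; “Date & Time” and “Address” are dropped in this sketch because they are not listed among the items of the built record.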
- FIG. 5F is a diagram for explaining a format according to the second embodiment.
- Format # 3 is information related to the record format of the input data set # 3. In this embodiment, a procedure for interpreting a record of the input data set # 3 is described. Format # 3 describes interpreting an input record (that is, a record of the input data set # 3) in binary format, interpreting the columns of the input record as Text type, Long type, Int type, and Int type, respectively, and using the first (0) column as a search key.
- Format # 4 is information related to the record format of the input data set # 4. In this embodiment, a procedure for interpreting a record of the input data set # 4 is described. Format # 4 describes that an input record (that is, a record of the input data set # 4) is interpreted in a text format (character string format). It further describes that the delimiter between the columns in the record is a comma, that the first (0) column is of Text type and named “User”, that the second (1) column is of Text type and named “Gender”, that the third (2) column is of Text type and named “Zip”, that the fourth (3) column is of Text type and named “Address”, and that the input columns are interpreted based on these.
- FIG. 5G is a diagram for explaining the reference method according to the second embodiment.
- The reference method # 2 is a procedure for referring to a record of the input data set # 3 by using the value of a record of the input data set # 2. It describes that the second column of the input record (record corresponding to the format # 2) is used as a reference key and that a record is obtained by a logical reference.
- the logical reference means that the reference destination data set is searched by the designated key value to identify the reference destination record.
- The reference method # 3 is a procedure for referring to a record of the input data set # 4 by using “Reference” of the input data set # 3. It describes that a record is obtained by a physical reference that uses the second column of the input record (record corresponding to the format # 3) as an offset, the third column as a length, and the fourth column as a node ID.
- the physical reference means that the specified offset (address) of the storage managed by the specified node ID is a starting point and a byte string for the specified length is used as a reference destination record.
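- As a non-limiting illustration of a logical reference, in which the reference destination data set is searched by a key value, a minimal Python sketch is given below. A sorted list searched with `bisect` is used here merely as a simple stand-in for a keyed structure such as a B-tree; the class `KeyedDataSet` and the sample records are hypothetical.

```python
import bisect

class KeyedDataSet:
    """A data set searchable by key; a sorted list stands in for a B-tree."""

    def __init__(self, records, key):
        self._records = sorted(records, key=key)        # stable sort by key
        self._keys = [key(r) for r in self._records]

    def lookup(self, value):
        """Return every record whose key equals `value` (possibly none)."""
        lo = bisect.bisect_left(self._keys, value)
        hi = bisect.bisect_right(self._keys, value)
        return self._records[lo:hi]

users = KeyedDataSet(
    [("bob", "M"), ("alice", "F"), ("bob", "X")],
    key=lambda r: r[0],
)
print(users.lookup("bob"))    # [('bob', 'M'), ('bob', 'X')]
print(users.lookup("carol"))  # []
```

As the second lookup shows, a logical reference may identify zero, one, or several reference destination records, whereas a physical reference always designates one byte range.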
- FIG. 5H shows an example of a catalog according to the second embodiment.
- The format and the reference method are described in, for example, program code. Therefore, when the user prepares the format and the reference method, the user needs to be able to create program code. However, not all users can create program code. Therefore, in this embodiment, the user can specify a part of the job information, such as a format and a reference method, by describing in the application a catalog that is easier to write than program code, and the parallel data processing system executes the data processing job based on the catalog. At this time, the parallel data processing system may execute the job after converting the catalog into a format or a reference method, or may execute the job directly using the catalog.
- the catalog shown in FIG. 5H is described in XML (Extensible Markup Language), for example, and includes description sections 50a to 50d related to four data structures.
- the data set of “user_comment” corresponding to the input data set # 2 has a format in which the text is divided into columns, the first (0) column is interpreted in the DateTime type, and this is used as a split key. It is described that the delimiter between columns is a comma.
- It is described that the data set of “user_comment.product.index” corresponding to the input data set # 1 has a format as a local secondary index corresponding to “user_comment”, and that the third (2) column of “user_comment”, with a comma as the column delimiter, is interpreted as a Text type and used as the index key.
- The description parts 50c and 50d have descriptions similar to those above. Note that it is not necessary to explicitly specify all necessary job information in the catalog. For information that is not explicitly specified, the system module or the like may perform the parallel data processing in accordance with predetermined rules.
- FIG. 6A is an example of a schematic diagram for explaining record acquisition according to the first embodiment.
- FIGS. 6A to 9C are described for the first embodiment, but they are not limited to the first embodiment and also apply to the second embodiment.
- When the processor 101 of the computation node 100 executes a thread and acquires a record using the thread, if the record is stored in the storage 104 of its own computation node 100, the record is acquired from that storage 104.
- If the record is stored in the storage 104 of another calculation node 100, a record acquisition request for acquiring the record is transmitted from the calculation node 100 to the other calculation node 100, for example, via the local area network 200.
- the calculation node 100 acquires a record by receiving the record acquired from the storage 104 by the other calculation node 100 in accordance with the record acquisition request. At this time, a session is established between the computation node 100 and another computation node 100.
- a session is created for each record acquisition request generated for each thread.
- In this case, as the number of threads increases, the number of established sessions increases, and the processing related to session management and control increases, resulting in a decrease in efficiency.
- a session may be created for each block in which a plurality of record acquisition requests are collected.
- FIG. 6B is an example of a schematic diagram for explaining blocking at the time of data acquisition according to the second embodiment.
- The processor 101 of the calculation node 100 collects a plurality of record acquisition requests generated by a plurality of threads into one block (blocked record acquisition request), and creates a session between the calculation node 100 and another calculation node 100 in units of the block. Thereby, the number of sessions established between the computation nodes 100 can be reduced, and a reduction in processing efficiency can be prevented.
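- As a non-limiting illustration, collecting record acquisition requests from several threads into one block per destination node can be sketched in Python as follows. The tuple layout `(thread_id, node_id, record_ref)`, the function `block_requests`, and the node names are assumptions made only for this sketch.

```python
from collections import defaultdict

def block_requests(requests):
    """Group (thread_id, node_id, record_ref) requests into one block per node."""
    blocks = defaultdict(list)
    for thread_id, node_id, record_ref in requests:
        blocks[node_id].append((thread_id, record_ref))
    return dict(blocks)

# Three requests issued by three threads, two of them for the same remote node.
requests = [(1, "nodeB", 100), (2, "nodeC", 40), (3, "nodeB", 260)]
blocks = block_requests(requests)

print(len(requests), "requests ->", len(blocks), "blocks")  # 3 requests -> 2 blocks
print(blocks["nodeB"])                                      # [(1, 100), (3, 260)]
```

If one session is created per block rather than per request, the three requests in this example need two sessions instead of three; the saving grows with the number of threads that address the same node.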
- FIG. 7A shows a flow of record acquisition processing in a calculation node that acquires records according to the first embodiment.
- This record acquisition processing corresponds to the processing of S10 and S16 in FIG. 4A and S21 and S27 in FIG. 4B.
- the data read / write device 140 causes the storage manager 160 to issue a data read request for reading data necessary for acquiring a record to the storage 104 via the OS 170 and the HBA 103.
- The storage manager 160 stores data read request information in the data read request management table 700 (see FIG. 7C), and adds a data read request to the request queue 740 in the main memory 105 of the computing node 100. The OS 170 acquires the data read request from the request queue 740 and issues it to the storage 104 via the HBA 103.
- the data reader / writer 140 identifies a thread that uses the received record based on the data read request management table 700, restarts the thread, and ends the record acquisition process.
- FIG. 7B shows the flow of a record acquisition process in a calculation node that manages records according to the first embodiment.
- the NIC 102 acquires the record acquisition request message, and the OS 170 stores it in the reception queue 760 of the main memory 105.
- the data reader / writer 140 includes the received record in the acquisition completion message, and the network manager 150 transmits the acquisition completion message to the calculation node 100 that is the transmission source of the record acquisition request message, and ends the processing. Specifically, the network manager 150 adds an acquisition completion message to the transmission queue 760 in the main memory 105 of the computation node 100.
- the process illustrated in FIG. 7A may be executed in threads (SL1, SL2, SLd 1 , SLd 2 , SLd k , SLd k + 1, etc.) that perform the processes of S10 and S16 in FIG. 4A and S21 and S27 in FIG. 4B.
- the CPU 101 may execute the procedure corresponding to S41 to S45 in the same thread SL1 by executing the data acquisition process of S10 in the thread SL1.
- The process illustrated in FIG. 7A may be executed in a thread different from the threads (SL1, SL2, SLd 1 , SLd 2 , SLd k , SLd k + 1, etc.) that perform the processes of S10 and S16 in FIG. 4A and S21 and S27 in FIG. 4B.
- the CPU 101 may execute the procedure corresponding to S41 to S45 in another thread by executing the data acquisition process of S10 in the thread SL1.
- separate threads may be provided for the procedures corresponding to S41 to S45, the procedures corresponding to S46 to S47, and the procedures corresponding to S48 to S49, or the same thread may be used.
- a plurality of threads may be provided for each procedure.
- another thread may be provided to drive a procedure corresponding to S46 to S47 and a procedure corresponding to S48 to S49.
- This driving may be performed by means such as an interrupt or a signal generated when the HBA 103 and the NIC 102 add entries to the completion queue 750 and the reception queue 730.
- FIG. 7D shows the flow of a record acquisition process in a calculation node that acquires a record according to a modification of the first embodiment.
- the network manager 150 transmits a blocked remote record acquisition request message via the OS 170 and the NIC 102, and ends the process. Specifically, the network manager 150 stores the blocked remote record acquisition request message in the transmission queue 840 of the main memory 105, and the NIC 102 sends the blocked remote record acquisition request message from the transmission queue 840 to the destination calculation node 100. Send. Thus, since a plurality of record acquisition requests are made into one blocked remote record acquisition request message, the number of sessions established during communication can be reduced.
- FIG. 7E shows the flow of record acquisition processing in a calculation node that manages records according to a modification of the first embodiment.
- Based on the plurality of record acquisition request messages, the data reader / writer 140 causes the storage manager 160 to issue, to the storage 104 via the HBA 103, data read requests for reading the data necessary for acquiring the plurality of records, and the process ends.
- the storage manager 160 adds a plurality of data read requests to the request queue 880 in the main memory 105 of the computing node 100.
- The HBA 103 acquires a data read request from the request queue 880 and issues it to the storage 104.
- The storage 104 receives the data read request issued from the request queue 880, reads the record corresponding to the data read request, and sends it to the HBA 103, and the HBA 103 adds the read data to the completion queue 890 of the main memory 105.
- the storage manager 160 acquires a plurality of records corresponding to the data read request from the completion queue 890, extracts the records from the data, and passes them to the system module 120.
- The data reader / writer 140 includes the received plurality of records in a blocked acquisition completion message, the network manager 150 transmits the blocked acquisition completion message to the computing node 100 that is the transmission source of the blocked remote record acquisition request message, and the process ends. Specifically, the network manager 150 adds the blocked acquisition completion message to the transmission queue 870 in the main memory 105 of the computing node 100. The blocked acquisition completion message stored in the transmission queue 870 is transmitted by the NIC 102 to the calculation node 100 that is the transmission source of the blocked remote record acquisition request message. This blocked acquisition completion message is stored in the reception queue 850 by the NIC 102 in the calculation node 100 of the transmission source.
- the process illustrated in FIG. 7D may be executed in threads (SL1, SL2, SLd 1 , SLd 2 , SLd k , SLd k + 1, etc.) that perform the processes of S10 and S16 in FIG. 4A and S21 and S27 in FIG. 4B.
- the CPU 101 may execute the procedure corresponding to S61 to S65 in the same thread SL1 by executing the data acquisition process of S10 in the thread SL1.
- The process illustrated in FIG. 7D may be executed in a thread different from the threads (SL1, SL2, SLd 1 , SLd 2 , SLd k , SLd k + 1, etc.) that perform the processes of S10 and S16 in FIG. 4A and S21 and S27 in FIG. 4B.
- the CPU 101 may execute the procedure corresponding to S61 to S65 in another thread by executing the data acquisition process of S10 in the thread SL1.
- separate threads may be provided for the procedures corresponding to S61 to S65, the procedures corresponding to S66 to S67, and the procedures corresponding to S68 to S69, or the same thread may be used.
- a plurality of threads may be provided for each procedure.
- another thread may be provided to drive the procedure corresponding to S66 to S67 and the procedure corresponding to S68 to S69.
- This driving may be performed by means such as an interrupt or a signal generated when the HBA 103 and the NIC 102 add entries to the completion queue 750 and the reception queue 730.
- the processing illustrated in FIG. 7E may be performed in a separate thread, or may be performed in the same thread, separately from the procedure corresponding to S70 to S71 and the procedure corresponding to S72 to S73.
- a plurality of threads may be provided for each procedure. For example, a plurality of threads may be generated according to the blocked record acquisition request message acquired in S70, and the data read request may be executed by executing S71 in each of the threads. At this time, the same number of threads as the number of record acquisition requests included in the message may be generated, or a predetermined number of threads may be generated.
- another thread may be provided. Such driving may be performed by means such as an interrupt or a signal generated by the addition of the HBA 103 to the completion queue 790.
- FIG. 7F shows the configuration of a blocked remote record acquisition request management table according to a modification.
- the blocked remote record acquisition request management table 830 includes a request issue time 832, the number of requests 833, and one or more record acquisition requests 834 as information for each blocked remote record acquisition request.
- the record acquisition request 834 includes a thread ID 831, a calculation node ID 835, a record reference 836, a buffer address 837, and a completion flag 838.
- Request issue time 832 indicates the time at which the blocked remote record acquisition request was issued.
- The number of requests 833 indicates the number of record acquisition requests included in the blocked remote record acquisition request.
- Record acquisition request 834 indicates the contents of the record acquisition request.
- the thread ID 831 indicates an ID for identifying the thread that issued the record acquisition request.
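- As a non-limiting illustration, one entry of the blocked remote record acquisition request management table can be modeled in Python as follows. The class names, field types, and the helper `complete` are assumptions made only for this sketch; only the field roles (request issue time, number of requests, and per-request thread ID, calculation node ID, record reference, buffer address, and completion flag) come from the description.

```python
import time
from dataclasses import dataclass, field

@dataclass
class RecordAcquisitionRequest:
    thread_id: int            # thread that issued the request (831)
    node_id: str              # calculation node that holds the record (835)
    record_ref: tuple         # reference to the record (836)
    buffer_address: int       # where the acquired record is to be placed (837)
    completed: bool = False   # completion flag (838)

@dataclass
class BlockedRequest:
    requests: list                                          # record acquisition requests (834)
    issue_time: float = field(default_factory=time.time)    # request issue time (832)

    @property
    def request_count(self):
        """Number of requests contained in this block (833)."""
        return len(self.requests)

    def complete(self, thread_id):
        """Mark the request of `thread_id` done; return True once all are done."""
        for r in self.requests:
            if r.thread_id == thread_id:
                r.completed = True
        return all(r.completed for r in self.requests)

block = BlockedRequest([
    RecordAcquisitionRequest(1, "nodeB", (100, 32), 0x1000),
    RecordAcquisitionRequest(3, "nodeB", (260, 28), 0x2000),
])
print(block.request_count)   # 2
print(block.complete(1))     # False
print(block.complete(3))     # True
```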
- information on how records are arranged in the data set is indispensable.
- For example, when a text file is used as a data set and each line is handled as a record, one record and the next are separated by a line feed code.
- In this case, the information that records are delimited by a line feed code is indispensable.
- When a data set has a structure such as a B-tree, some records may be packed in an appropriate data structure in a page, which is the unit of access on the storage.
- In this case, information on the structure (page length, page header / footer structure, record header / footer structure in a page, etc.) is needed.
- The system module or the like may cache a part of the data in the data set in the main memory to reduce access to the storage. For example, when acquiring records by scanning a text file, instead of accessing the storage for each record, the data may be read from the text file in units of, for example, 1 megabyte at a time and stored in the main memory, and records may then be taken out of the data held in the main memory.
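- As a non-limiting illustration, reading a text data set in large units and taking line-feed-delimited records out of the buffered data can be sketched in Python as follows. The function `iter_records` and the in-memory `io.BytesIO` stream standing in for the storage are assumptions made only for this sketch; a small `chunk_size` is used in the example so that a record spanning two reads can be seen.

```python
import io

def iter_records(stream, chunk_size=1024 * 1024):
    """Yield line-feed-delimited records, reading the stream in large chunks."""
    pending = b""                              # tail of a record cut by a chunk boundary
    while True:
        chunk = stream.read(chunk_size)        # one storage access per chunk
        if not chunk:
            break
        pending += chunk
        *complete, pending = pending.split(b"\n")
        for line in complete:                  # records fully contained in memory
            yield line
    if pending:                                # last record without a trailing line feed
        yield pending

records = list(iter_records(io.BytesIO(b"r1\nr2\nr3"), chunk_size=4))
print(records)  # [b'r1', b'r2', b'r3']
```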
- The node level resource constraint management table 900 includes a computation node ID 901 and a resource constraint 902 as information for each computation node.
- The resource constraint 902 has a thread count 903 and a main memory allocation 904.
- The fields are as follows.
- The computation node ID 901 indicates the ID of the computation node.
- The resource constraint 902 indicates the resource constraint on the computation node corresponding to the computation node ID 901.
- The thread count 903 indicates the maximum number of threads that can be generated on the computation node corresponding to the computation node ID 901.
- The main memory allocation 904 indicates the maximum amount of main memory that can be allocated on the computation node corresponding to the computation node ID 901.
- The node level resource constraint management table 910 on the lower side of FIG. 8A is managed by the supervisor process in each computation node 100.
- The node level resource constraint management table 910 includes a computation node ID 911, a resource constraint 912, and a resource usage 913 as information on its own computation node.
- The resource constraint 912 has a thread count 914 and a main memory allocation 915.
- The resource usage 913 has a thread count 916 and a main memory allocation 917.
- The fields are as follows.
- The computation node ID 911 indicates the ID of its own computation node.
- The resource constraint 912 indicates the resource constraint on its own computation node.
- The resource usage 913 indicates the resources in use on its own computation node.
- The thread count 914 indicates the maximum number of threads that can be generated on its own computation node.
- The main memory allocation 915 indicates the maximum amount of main memory that can be allocated on its own computation node.
- The thread count 916 indicates the number of threads actually generated on its own computation node.
- The main memory allocation 917 indicates the amount of main memory actually allocated on its own computation node.
- FIG. 8B shows the configuration of the job level resource constraint management tables according to the first embodiment.
- The job level resource constraint management table 920 on the upper side of FIG. 8B is managed by the overall supervisor process.
- The job level resource constraint management table 920 includes a job ID 921, a computation node ID 922, and a resource constraint 923 as information for each job.
- The resource constraint 923 has a thread count 924 and a main memory allocation 925.
- The fields are as follows.
- The job ID 921 indicates the ID of a job.
- The computation node ID 922 indicates the ID of the computation node that executes the job with job ID 921.
- The resource constraint 923 indicates the resource constraint for the job with job ID 921 on the computation node with computation node ID 922.
- The thread count 924 indicates the maximum number of threads that can be generated for the job with job ID 921 on the computation node with computation node ID 922.
- The main memory allocation 925 indicates the maximum amount of main memory that can be allocated to the job with job ID 921 on the computation node with computation node ID 922.
- The job level resource constraint management table 930 on the lower side of FIG. 8B is managed by the supervisor process in each computation node 100.
- The job level resource constraint management table 930 includes a job ID 931, a computation node ID 932, a resource constraint 933, and a resource usage 934 as information for each job on its own computation node.
- The resource constraint 933 has a thread count 935 and a main memory allocation 936.
- The resource usage 934 has a thread count 937 and a main memory allocation 938.
- The fields are as follows.
- The job ID 931 indicates the ID of a job.
- The computation node ID 932 indicates the ID of its own computation node.
- The resource constraint 933 indicates the resource constraint for the job with job ID 931 on its own computation node 100.
- FIG. 8C shows the configuration of the process level resource constraint management table in the overall supervisor process according to the first embodiment.
- The process level resource constraint management table 940 is managed by the overall supervisor process.
- The process level resource constraint management table 940 includes a process ID 941, a job ID 942, a computation node ID 943, and a resource constraint 944 as information for each process.
- The resource constraint 944 has a thread count 945 and a main memory allocation 946.
- The fields are as follows.
- The process ID 941 indicates the ID of a process.
- The job ID 942 indicates the ID of a job.
- The computation node ID 943 indicates the ID of the computation node that executes the process with process ID 941 of the job with job ID 942.
- The resource constraint 944 indicates the resource constraint for the process with process ID 941 of the job with job ID 942 on the computation node with computation node ID 943.
- The thread count 945 indicates the maximum number of threads that can be generated for the process with process ID 941 of the job with job ID 942 on the computation node with computation node ID 943.
- The main memory allocation 946 indicates the maximum amount of main memory that can be allocated to the process with process ID 941 of the job with job ID 942 on the computation node with computation node ID 943.
- FIG. 8D shows the configuration of the process level resource constraint management table in the supervisor process of each node according to the first embodiment.
- The process level resource constraint management table 950 is managed by the supervisor process in each computation node 100.
- The process level resource constraint management table 950 includes a process ID 951, a job ID 952, a computation node ID 953, a resource constraint 954, and a resource usage 955 as information for each process of each job on its own computation node.
- The resource constraint 954 has a thread count 956 and a main memory allocation 957.
- The resource usage 955 has a thread count 958 and a main memory allocation 959.
- The fields are as follows.
- The process ID 951 indicates the ID of the process.
- The job ID 952 indicates the ID of the job.
- The computation node ID 953 indicates the ID of its own computation node 100.
- The resource constraint 954 indicates the resource constraint for the process with process ID 951 of the job with job ID 952 on its own computation node 100.
- The resource usage 955 indicates the resources in use for the process with process ID 951 of the job with job ID 952 on its own computation node 100.
- The thread count 956 indicates the maximum number of threads that can be generated for the process with process ID 951 of the job with job ID 952 on its own computation node 100.
- The main memory allocation 957 indicates the maximum amount of main memory that can be allocated to the process with process ID 951 of the job with job ID 952 on its own computation node 100.
- The thread count 958 indicates the number of threads actually generated for the process with process ID 951 of the job with job ID 952 on its own computation node 100.
- The main memory allocation 959 indicates the amount of main memory actually allocated to the process with process ID 951 of the job with job ID 952 on its own computation node 100.
- FIG. 8E shows the configuration of the task level resource constraint management table in the overall supervisor process according to the first embodiment.
- The task level resource constraint management table 960 is managed by the overall supervisor process.
- The task level resource constraint management table 960 includes a task ID 961, a process ID 962, a job ID 963, a computation node ID 964, and a resource constraint 965 as information for each task.
- The resource constraint 965 has a thread count 966 and a main memory allocation 967.
- The task ID 961 indicates the ID of a task.
- The process ID 962 indicates the ID of a process.
- The job ID 963 indicates the ID of a job.
- The computation node ID 964 indicates the ID of the computation node that executes the task with task ID 961 of the process with process ID 962 of the job with job ID 963.
- The resource constraint 965 indicates the resource constraint for the task with task ID 961 of the process with process ID 962 of the job with job ID 963 on the computation node with computation node ID 964.
- The thread count 966 indicates the maximum number of threads that can be generated for the task with task ID 961 of the process with process ID 962 of the job with job ID 963 on the computation node with computation node ID 964.
- The main memory allocation 967 indicates the maximum amount of main memory that can be allocated to the task with task ID 961 of the process with process ID 962 of the job with job ID 963 on the computation node with computation node ID 964.
- FIG. 8F illustrates the configuration of the task level resource constraint management table in the supervisor process of each node according to the first embodiment.
- The task level resource constraint management table 970 is managed by the supervisor process in each computation node 100.
- The task level resource constraint management table 970 includes a task ID 971, a process ID 972, a job ID 973, a computation node ID 974, a resource constraint 975, and a resource usage 976 as information for each task of each process of each job on its own computation node.
- The resource constraint 975 has a thread count 977 and a main memory allocation 978.
- The resource usage 976 has a thread count 979 and a main memory allocation 980.
- The task ID 971 indicates the ID of the task.
- The process ID 972 indicates the ID of the process.
- The job ID 973 indicates the ID of the job.
- The computation node ID 974 indicates the ID of its own computation node 100.
- The resource constraint 975 indicates the resource constraint for the task with task ID 971 of the process with process ID 972 of the job with job ID 973 on its own computation node 100.
- The resource usage 976 indicates the resources in use for the task with task ID 971 of the process with process ID 972 of the job with job ID 973 on its own computation node 100.
- The thread count 977 indicates the maximum number of threads that can be generated for the task with task ID 971 of the process with process ID 972 of the job with job ID 973 on its own computation node 100.
- The main memory allocation 978 indicates the maximum amount of main memory that can be allocated to the task with task ID 971 of the process with process ID 972 of the job with job ID 973 on its own computation node 100.
- The thread count 979 indicates the number of threads actually generated for the task with task ID 971 of the process with process ID 972 of the job with job ID 973 on its own computation node 100.
- The main memory allocation 980 indicates the amount of main memory actually allocated to the task with task ID 971 of the process with process ID 972 of the job with job ID 973 on its own computation node 100.
- FIG. 8G shows the flow of the integrated resource constraint management process according to the first embodiment.
- When the overall supervisor process, which runs on one computation node 100 and supervises the supervisor process of each computation node 100, assigns a new task, it calculates the resource constraint for the new task.
- Here, a resource constraint specified by the user may be used, or the resource constraint may be calculated (for example, by proportional distribution) based on a policy specified by the user.
- The overall supervisor process adds a record for the new task to the task level resource constraint management table 960 shown in FIG. 8E and stores the calculated resource constraint in the resource constraint field of that record.
- The overall supervisor process selects a computation node 100 that can execute the task within the resource constraint, and transmits the resource constraint relevant to that computation node to the computation node 100.
- The overall supervisor process assigns the task to the selected computation node 100 and ends the process.
- The supervisor process of the computation node 100 to which the task is assigned receives the resource constraint transmitted in step S82 and registers it in the task level resource constraint management table 970 shown in FIG. 8F.
- The system module 120 of the computation node 100 to which the task is assigned executes the assigned task.
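Two steps of the flow above lend themselves to a small example: calculating a constraint by proportional distribution, and selecting a computation node that can run the task within its constraint. The sketch below is one possible reading, not the method of the patent: the largest-remainder rounding rule, the first-fit node choice, and the names `distribute_proportionally` and `select_node` are all assumptions.

```python
from typing import Dict, Optional


def distribute_proportionally(total: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Split `total` whole units among tasks in proportion to their weights."""
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    # Round every share down first.
    shares = {task: total * w // weight_sum for task, w in weights.items()}
    leftover = total - sum(shares.values())
    # Units lost to rounding down go to the largest fractional remainders first.
    by_remainder = sorted(
        weights, key=lambda t: (total * weights[t]) % weight_sum, reverse=True
    )
    for task in by_remainder[:leftover]:
        shares[task] += 1
    return shares


def select_node(free_threads: Dict[str, int], needed: int) -> Optional[str]:
    """Return the first node whose free thread budget covers `needed`, else None."""
    for node, free in free_threads.items():
        if free >= needed:
            return node
    return None


# Eight threads on a node split 5:3 between two tasks.
thread_shares = distribute_proportionally(8, {"task_a": 5, "task_b": 3})
```

The same helper could be applied to the main memory allocation; only the thread count is shown to keep the sketch short.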
- FIG. 8H illustrates the flow of the resource constraint management process in each computation node according to the first embodiment.
- The resource constraint management process is performed, for example, when the system module 120 executes a task.
- The process at the upper left of FIG. 8H (steps S90 to S94) is executed when a task attempts to generate a thread. It is executed, for example, in S13 of FIG. 4A and in S24 and S31 of FIG. 4B.
- The system module 120 determines whether there are sufficient resources to generate a thread.
- Specifically, the task level resource constraint management table 970, the process level resource constraint management table 950, the job level resource constraint management table 930, and the node level resource constraint management table 910 are each referred to, and it is determined whether the resources still available within the resource constraint at every level are enough to generate a thread.
- If resources are sufficient, the system module 120 causes the thread manager 133 to generate a thread, allocates resources to the thread, and reflects the allocation in the resource usage of each resource constraint management table (910, 930, 950, and 970).
- The system module 120 then starts executing the thread and ends the resource constraint management process.
- If resources are insufficient, the system module 120 saves the thread generation information needed to generate the thread in the thread generation suspension management table 990.
- The system module 120 then puts its own thread into a suspended state, in which thread generation is on hold, and ends the process.
- The thread generation suspension management table 990 includes a task ID 991, a parent thread ID 992, a child thread ID 993, a time 994, and thread generation information 995 as information on each thread whose generation is suspended.
- The task ID 991 indicates the ID of the task.
- The parent thread ID 992 indicates the ID of the parent thread, that is, the thread that generates another thread.
- The child thread ID 993 indicates the ID of the child thread, that is, the thread generated from the parent thread.
- The time 994 indicates the time at which generation of the thread was suspended.
- The thread generation information 995 is the information needed to generate the thread (for example, information including a reference to the record that the child thread will refer to).
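The check made when a task tries to generate a thread, against the node, job, process, and task level tables together, can be sketched with plain bookkeeping objects. This is a sketch under assumptions: `ResourceRow`, `SuspendedGeneration`, `TaskResources`, and `try_generate` are hypothetical names, memory is counted in arbitrary units, and no real thread is created here.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ResourceRow:
    """One row of a resource constraint management table (constraint vs. usage)."""
    max_threads: int
    max_memory: int
    used_threads: int = 0
    used_memory: int = 0

    def has_room(self, memory: int) -> bool:
        return (self.used_threads + 1 <= self.max_threads
                and self.used_memory + memory <= self.max_memory)


@dataclass
class SuspendedGeneration:
    """Entry modeled on the thread generation suspension management table 990."""
    task_id: str
    parent_thread_id: str
    child_thread_id: str
    time: float
    generation_info: Any


@dataclass
class TaskResources:
    # Rows that apply to one task, keyed by level: node, job, process, task.
    levels: Dict[str, ResourceRow]
    suspended: List[SuspendedGeneration] = field(default_factory=list)

    def try_generate(self, task_id: str, parent_id: str, child_id: str,
                     memory: int, info: Any) -> bool:
        # Step S90: every level must still have room within its constraint.
        if all(row.has_room(memory) for row in self.levels.values()):
            # Reflect the allocation in the usage of every table.
            for row in self.levels.values():
                row.used_threads += 1
                row.used_memory += memory
            return True   # the caller would now start the child thread
        # Not enough resources: keep what is needed to generate the thread later.
        self.suspended.append(
            SuspendedGeneration(task_id, parent_id, child_id, time.time(), info))
        return False      # the caller would now suspend the parent thread
```

Because the check is an `all(...)` over the four levels, the tightest constraint decides: a task limited to two threads is held back even when the node as a whole has spare capacity.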
- The process at the upper right of FIG. 8H stops a thread when the processing executed by that thread is completed.
- The system module 120 stops the executing thread.
- The system module 120 releases the resources allocated to its own thread. That is, it subtracts the amount of resources allocated to its own thread from the resource amounts (thread count, main memory allocation) managed under resource usage in resource constraint management tables such as table 910. The released resources can then be allocated to other threads.
- The process at the lower right of FIG. 8H is executed when it is determined, for example based on the resource constraint management table 910, that there are sufficient resources to generate a thread.
- This procedure may be triggered, for example, by an interrupt or a signal when resources are released in S96.
- The system module 120 selects thread generation information managed in the thread generation suspension management table 990.
- The thread generation information selected may be, for example, the entry that has been suspended the longest.
- The system module 120 resumes the parent thread that is to generate a thread based on the selected thread generation information.
- The system module 120 executes child thread generation again based on the thread generation information. In this way, a thread whose generation was suspended can be generated once sufficient resources are available.
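The release-then-resume behaviour described above, with the oldest suspended generation retried first, can be sketched with a FIFO queue. This is a simplified single-level model, not the implementation in the patent: `ThreadBudget`, `request`, and `release` are hypothetical names, and a callback stands in for actually starting a child thread.

```python
from collections import deque
from typing import Callable, Deque, List


class ThreadBudget:
    """At most `limit` threads at once; excess generation requests wait in FIFO order."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.in_use = 0
        self.pending: Deque[Callable[[], None]] = deque()  # suspended generations

    def request(self, start_child: Callable[[], None]) -> bool:
        """Start the child now if a slot is free, otherwise hold the request."""
        if self.in_use < self.limit:
            self.in_use += 1
            start_child()
            return True
        self.pending.append(start_child)
        return False

    def release(self) -> None:
        """Called when a thread stops: free its slot, then retry the oldest request."""
        self.in_use -= 1
        if self.pending and self.in_use < self.limit:
            oldest = self.pending.popleft()
            self.in_use += 1
            oldest()


started: List[str] = []
budget = ThreadBudget(limit=2)
for name in ["b", "c", "d", "e"]:
    budget.request(lambda n=name: started.append(n))
# Only "b" and "c" have started; "d" and "e" wait until release() is called.
```

Choosing the oldest entry keeps generation order stable, which matches the example given in the text; other policies (for instance, priority by task) would only change which entry `release` picks.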
- FIG. 9A shows a first example of tasks according to the first embodiment.
- FIG. 9A shows map task #1111 of map process #111 of map reduce job #1 (corresponding to the job execution plan of FIG. 1A).
- Record #1001 of input data set #1 includes references to the 10 records #2001 to #2010 of input data set #2.
- Map task #1111 acquires record #1001 of input data set #1 and then acquires the corresponding records of input data set #2.
- When map task #1111 is executed, record #1001 of input data set #1 is referred to and, if the record satisfies a predetermined condition, the references included in the record are used to acquire records #2001 to #2010 of input data set #2.
- FIG. 9B shows an example of thread generation in the task shown in the first example.
- FIG. 9B illustrates thread generation when task #1111 shown in FIG. 9A is executed without the resource constraint management process shown in FIG. 8H.
- The horizontal axis represents time.
- A horizontally long rounded rectangle in the figure represents a series of processing by one thread.
- The left end of the rounded rectangle represents the time at which processing by the thread starts, and the right end represents the time at which processing by the thread ends.
- The value inside a rounded rectangle represents information (for example, a record ID) indicating the record read in the course of the processing corresponding to the thread.
- The number of threads that can be executed simultaneously, for threads that acquire records, is assumed to be 8.
- Thread 10a further generates and executes thread 10i, which acquires and processes record #2008 of input data set #2, thread 10j, which acquires and processes record #2009 of input data set #2, and thread 10k, which acquires and processes record #2010 of input data set #2.
- As a result, main memory is allocated beyond the available amount and thrashing occurs, which lengthens the execution time of these threads.
- FIG. 9C illustrates an example of thread generation in the task shown in the first example according to the first embodiment.
- FIG. 9C shows thread generation when task #1111 shown in FIG. 9A is executed with the resource constraint management process shown in FIG. 8H.
- The notation in FIG. 9C follows the same rules as FIG. 9B.
- When thread 10a, which acquires and processes record #1001 of input data set #1, is executed in the computation node 100, thread 10a acquires record #1001 and, based on the references included in that record, generates and executes thread 10b, which acquires and processes record #2001 of input data set #2, and thread 10c, which acquires and processes record #2002 of input data set #2.
- Similarly, threads 10d, 10e, 10f, 10g, and 10h are generated and executed. At this point, 8 threads are running, which equals the number of threads that can be executed simultaneously.
- Because step S90 of the resource constraint management process then determines that resources are insufficient, thread 10a is put on hold without generating a new thread.
- When the execution of thread 10b is completed, one new thread can be executed. The execution of thread 10a is therefore resumed in step S98, and in step S99 thread 10i, which acquires and processes record #2008 of input data set #2, is generated.
- When the execution of thread 10c is finished, the execution of thread 10a is resumed, and thread 10j, which acquires and processes record #2009 of input data set #2, is generated and executed.
- When the execution of thread 10d is finished, the execution of thread 10a is resumed, and thread 10k, which acquires and processes record #2010 of input data set #2, is generated and executed.
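The timeline of FIG. 9C can be reproduced with a small discrete-event simulation. The sketch below is illustrative only: the function name `simulate`, the integer time units, and the uniform duration of 3 are assumptions, and the parent thread is modeled as holding one of the 8 slots for the whole run.

```python
import heapq
from typing import Dict, List, Tuple


def simulate(records: List[int], limit: int,
             duration: Dict[int, int]) -> List[Tuple[int, int, int]]:
    """Return (record, start, end) for each child thread under a thread limit.

    The parent thread holds one slot for the whole run, so at most
    `limit - 1` child threads run at the same time.
    """
    child_slots = limit - 1
    if child_slots < 1:
        raise ValueError("limit must leave at least one slot for a child thread")
    running: List[Tuple[int, int]] = []   # min-heap of (end_time, record)
    timeline: List[Tuple[int, int, int]] = []
    now = 0
    for record in records:
        if len(running) == child_slots:
            # The parent is on hold until the earliest child finishes.
            now, _ = heapq.heappop(running)
        end = now + duration[record]
        heapq.heappush(running, (end, record))
        timeline.append((record, now, end))
    return timeline


records = list(range(2001, 2011))
timeline = simulate(records, limit=8, duration={r: 3 for r in records})
```

With these inputs the first seven child threads start together, and the threads for records #2008 to #2010 start only as earlier children finish, which is the staggered pattern the figure describes.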
- FIG. 9D shows a second example of tasks according to the first embodiment.
- FIG. 9D shows map task #1111 of map process #111 of map reduce job #1 and map task #2111 of map process #211 of map reduce job #2. It is assumed that map task #1111 and map task #2111 are executed in parallel.
- Record #1001 of input data set #1 includes references to the 10 records #2001 to #2010 of input data set #2.
- Map task #1111 acquires record #1001 of input data set #1 and then acquires the records of input data set #2 corresponding to that record.
- When executing map task #1111, the system module 120 refers to record #1001 of input data set #1 and, if the record satisfies a predetermined condition, uses the references included in the record to acquire records #2001 to #2010 of input data set #2.
- Map task #2111 acquires record #5001 of input data set #5 and then acquires the records of input data set #6 corresponding to that record.
- When executing map task #2111, the system module 120 refers to record #5001 of input data set #5 and, if the record satisfies a predetermined condition, uses the references included in the record to acquire records #6001 to #6010 of input data set #6.
- FIG. 9E illustrates an example of thread generation in the tasks shown in the second example according to the first embodiment.
- FIG. 9E shows thread generation when task #1111 and task #2111 shown in FIG. 9D are executed in parallel with the resource constraint management process shown in FIG. 8H.
- It is assumed that the main memory area available to task #1111 is enough for five threads and that the main memory area available to task #2111 is enough for three threads. The notation in FIG. 9E follows the same rules as FIG. 9B.
- When thread 11a, which acquires and processes record #1001 of input data set #1, is executed in the computation node 100, thread 11a acquires record #1001 and, based on the references included in that record, generates and executes thread 11b, which acquires and processes record #2001 of input data set #2, and thread 11c, which acquires and processes record #2002 of input data set #2.
- Similarly, threads 11d and 11e are generated and executed. At this point, the main memory area for five threads available to task #1111 is fully in use.
- Because step S90 of the resource constraint management process then determines that resources are insufficient, thread 11a does not generate a new thread and puts itself on hold.
- When the execution of thread 11b is completed, one new thread can be executed. The execution of thread 11a is therefore resumed in step S98, and in step S99 thread 11f, which acquires and processes record #2005 of input data set #2, is generated.
- When the execution of thread 11c is finished, the execution of thread 11a is resumed, and thread 11g, which acquires and processes record #2006 of input data set #2, is generated and executed.
- When the execution of thread 11d is finished, the execution of thread 11a is resumed, and thread 11h, which acquires and processes record #2007 of input data set #2, is generated. When the execution of thread 11e is finished, the execution of thread 11a is resumed, and thread 11i, which acquires and processes record #2008 of input data set #2, is generated.
- When the execution of thread 11f is completed, the execution of thread 11a is resumed, and thread 11j, which acquires and processes record #2009 of input data set #2, is generated. When the execution of thread 11g is completed, the execution of thread 11a is resumed, and thread 11k, which acquires and processes record #2010 of input data set #2, is generated.
- In parallel, when thread 12a, which acquires and processes record #5001 of input data set #5, is executed, thread 12a acquires record #5001 and, based on the references included in that record, generates and executes thread 12b, which acquires and processes record #6001 of input data set #6, and thread 12c, which acquires and processes record #6002 of input data set #6. At this point, the main memory area for three threads available to task #2111 is fully in use.
- Because step S90 of the resource constraint management process then determines that resources are insufficient, thread 12a does not generate a new thread and puts itself on hold. When the execution of thread 12b ends, one new thread can be executed. The execution of thread 12a is therefore resumed in step S98, and in step S99 thread 12d, which acquires and processes record #6003 of input data set #6, is generated.
- In the same way, when the execution of thread 12c is finished, thread 12e, which acquires and processes record #6004 of input data set #6, is generated; when the execution of thread 12d is finished, thread 12f, which acquires and processes record #6005, is generated; and when the execution of thread 12e is finished, thread 12g, which acquires and processes record #6006, is generated.
- When the execution of thread 12f is finished, thread 12h, which acquires and processes record #6007 of input data set #6, is generated; when the execution of thread 12g is finished, thread 12i, which acquires and processes record #6008, is generated; when the execution of thread 12h is finished, thread 12j, which acquires and processes record #6009, is generated; and when the execution of thread 12i is finished, thread 12k, which acquires and processes record #6010, is generated. In this way, a plurality of tasks can be executed in parallel.
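The two-task case can also be tried with real threads. The sketch below is a rough model under assumptions: the per-task main memory budget is simplified to a count of thread slots (five and three, as in the example), a semaphore stands in for the suspend and resume steps, and the class name `Task` and its attributes are hypothetical.

```python
import threading
from typing import List


class Task:
    """Parent thread that spawns one child thread per referenced record."""

    def __init__(self, name: str, records: List[int], thread_slots: int) -> None:
        self.name = name
        self.records = records
        # The parent thread occupies one slot; the rest are for child threads.
        self.slots = threading.Semaphore(thread_slots - 1)
        self.lock = threading.Lock()
        self.active = 0          # child threads currently holding a slot
        self.peak = 0            # highest value `active` ever reached
        self.processed: List[int] = []

    def _child(self, record: int) -> None:
        try:
            with self.lock:
                self.processed.append(record)   # stands in for "acquire and process"
        finally:
            with self.lock:
                self.active -= 1
            self.slots.release()                # lets a waiting parent resume

    def run(self) -> None:
        children = []
        for record in self.records:
            self.slots.acquire()                # parent blocks here when no slot is free
            with self.lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            child = threading.Thread(target=self._child, args=(record,))
            children.append(child)
            child.start()
        for child in children:
            child.join()


tasks = [
    Task("task #1111", list(range(2001, 2011)), thread_slots=5),
    Task("task #2111", list(range(6001, 6011)), thread_slots=3),
]
parents = [threading.Thread(target=task.run) for task in tasks]
for parent in parents:
    parent.start()
for parent in parents:
    parent.join()
```

Each task has its own semaphore, so a task that has used up its budget waits without taking capacity from the other, which is the isolation the example is meant to show. The exact interleaving varies from run to run; only the per-task upper bound on concurrent child threads is guaranteed.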
- The threads that are dynamically generated when executing parallel data processing may be processes, kernel-level threads (threads managed by the kernel, such as native POSIX threads or lightweight processes), user-level threads (threads managed by user programs and libraries, such as fibers), a managed collection of fixed procedures (for example, function pointers managed by an appropriate structure), or a combination of these.
- The unit of data handled by parallel data processing is a record, but a record may be arbitrary data. For example, it may be a collection of a fixed number of columns, a collection of a variable number of columns, plain text, a byte string, multimedia content such as an image or sound, or a collection of these.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
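(A) Read first data from a first data group and, based on first format information acquired from an application, acquire a first value from the first data;
(B) based on first reference information acquired from the application, generate one or more threads, each for reading from a second data group one of the one or more second data corresponding to the first value;
(C) execute (A) to (B) for one or more first data of the first data group; and
(D) execute a plurality of the threads in parallel.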
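Steps (A) to (D) can be illustrated with a short sketch. It rests on several assumptions that are not in the source: the first format information and the first reference information are modeled as two callables supplied by the application, the second data group is an in-memory dictionary, and no limit is placed on the number of threads. All names, including `run_parallel_lookup`, are hypothetical.

```python
import threading
from typing import Callable, Dict, List


def run_parallel_lookup(
    first_data_set: List[str],
    extract_first_value: Callable[[str], str],     # stands in for the first format information
    references_for: Callable[[str], List[str]],    # stands in for the first reference information
    second_data_set: Dict[str, str],
) -> Dict[str, str]:
    results: Dict[str, str] = {}
    lock = threading.Lock()
    threads: List[threading.Thread] = []

    def read_second(key: str) -> None:
        value = second_data_set[key]               # read one second datum
        with lock:
            results[key] = value

    for first in first_data_set:                   # (C) repeat (A) and (B) per first datum
        first_value = extract_first_value(first)   # (A) get the first value
        for key in references_for(first_value):    # (B) one thread per second datum
            thread = threading.Thread(target=read_second, args=(key,))
            threads.append(thread)
            thread.start()                         # (D) the threads run concurrently
    for thread in threads:
        thread.join()
    return results


# Hypothetical record layout: "<record id>,<reference>;<reference>..."
first = ["1001,2001;2002", "1002,2003"]
second = {"2001": "x", "2002": "y", "2003": "z"}
found = run_parallel_lookup(
    first,
    extract_first_value=lambda line: line.split(",")[1],
    references_for=lambda value: value.split(";"),
    second_data_set=second,
)
```

Passing the format and reference logic in from the caller mirrors the idea that the application, not the processing system, knows how to interpret a record and how to locate the data it refers to.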
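(*) The horizontal axis represents time.
(*) A horizontally long rounded rectangle in the figure represents a series of processing by one thread. The left end of the rounded rectangle represents the time at which processing by the thread starts, and the right end represents the time at which processing by the thread ends.
(*) The value inside a rounded rectangle represents information indicating the record read in the course of the processing corresponding to the thread (for example, the value of the first column of the record).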
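(*) The thread ID 701 indicates an ID identifying the thread that caused the data read request to be issued.
(*) The request issue time 702 indicates the time at which the data read request was issued.
(*) The data read request 703 indicates the contents of the data read request.
(*) The device ID 704 indicates the device ID of the storage 104 to which the data read request is sent.
(*) The offset address 705 indicates the address (offset address) in the storage 104 at which the record targeted by the data read request is stored.
(*) The read length 706 indicates the data length (byte length) of the record to be read.
(*) The buffer address 707 indicates the address of the area (buffer) in the main memory 105 that stores the record targeted by the data read request.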
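(*) The thread ID 711 indicates an ID identifying the thread that caused the record acquisition request to be issued.
(*) The request issue time 712 indicates the time at which the record acquisition request was issued.
(*) The record acquisition request 713 indicates the contents of the record acquisition request.
(*) The computation node ID 714 indicates the ID of the computation node to which the record acquisition request is sent.
(*) The record reference 715 indicates reference information for the record targeted by the record acquisition request.
(*) The buffer address 716 indicates the address of the area (buffer) in the main memory 105 that stores the record targeted by the data read request.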
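(*) The request issue time 832 indicates the time at which the blocked remote record acquisition request was issued.
(*) The number of requests 833 indicates the number of record acquisition requests included in the blocked remote record acquisition request.
(*) The record acquisition request 834 indicates the contents of a record acquisition request.
(*) The thread ID 831 indicates an ID identifying the thread that issued the record acquisition request.
(*) The computation node ID 835 indicates the ID of the computation node to which the record acquisition request is sent.
(*) The record reference 836 indicates reference information for the record targeted by the record acquisition request.
(*) The buffer address 837 indicates the address of the area (buffer) in the main memory 105 that stores the record targeted by the record acquisition request.
(*) The completion flag 838 is a flag indicating whether the record corresponding to the record acquisition request has been acquired.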
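(*) The computation node ID 901 indicates the ID of the computation node.
(*) The resource constraint 902 indicates the resource constraint on the computation node corresponding to the computation node ID 901.
(*) The thread count 903 indicates the maximum number of threads that can be generated on the computation node corresponding to the computation node ID 901.
(*) The main memory allocation 904 indicates the maximum amount of main memory that can be allocated on the computation node corresponding to the computation node ID 901.
(*) The computation node ID 911 indicates the ID of its own computation node.
(*) The resource constraint 912 indicates the resource constraint on its own computation node.
(*) The resource usage 913 indicates the resources in use on its own computation node.
(*) The thread count 914 indicates the maximum number of threads that can be generated on its own computation node.
(*) The main memory allocation 915 indicates the maximum amount of main memory that can be allocated on its own computation node.
(*) The thread count 916 indicates the number of threads actually generated on its own computation node.
(*) The main memory allocation 917 indicates the amount of main memory actually allocated on its own computation node.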
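(*) The job ID 921 indicates the ID of a job.
(*) The computation node ID 922 indicates the ID of the computation node that executes the job with job ID 921.
(*) The resource constraint 923 indicates the resource constraint for the job with job ID 921 on the computation node with computation node ID 922.
(*) The thread count 924 indicates the maximum number of threads that can be generated for the job with job ID 921 on the computation node with computation node ID 922.
(*) The main memory allocation 925 indicates the maximum amount of main memory that can be allocated to the job with job ID 921 on the computation node with computation node ID 922.
(*) The job ID 931 indicates the ID of a job.
(*) The computation node ID 932 indicates the ID of its own computation node.
(*) The resource constraint 933 indicates the resource constraint for the job with job ID 931 on its own computation node 100.
(*) The resource usage 934 indicates the resources in use for the job with job ID 931 on its own computation node 100.
(*) The thread count 935 indicates the maximum number of threads that can be generated for the job with job ID 931 on its own computation node 100.
(*) The main memory allocation 936 indicates the maximum amount of main memory that can be allocated to the job with job ID 931 on its own computation node 100.
(*) The thread count 937 indicates the number of threads actually generated for the job with job ID 931 on its own computation node 100.
(*) The main memory allocation 938 indicates the amount of main memory actually allocated to the job with job ID 931 on its own computation node 100.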
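(*) The process ID 941 indicates the ID of a process.
(*) The job ID 942 indicates the ID of a job.
(*) The computation node ID 943 indicates the ID of the computation node that executes the process with process ID 941 of the job with job ID 942.
(*) The resource constraint 944 indicates the resource constraint for the process with process ID 941 of the job with job ID 942 on the computation node with computation node ID 943.
(*) The thread count 945 indicates the maximum number of threads that can be generated for the process with process ID 941 of the job with job ID 942 on the computation node with computation node ID 943.
(*) The main memory allocation 946 indicates the maximum amount of main memory that can be allocated to the process with process ID 941 of the job with job ID 942 on the computation node with computation node ID 943.
(*) The process ID 951 indicates the ID of the process.
(*) The job ID 952 indicates the ID of the job.
(*) The computation node ID 953 indicates the ID of its own computation node 100.
(*) The resource constraint 954 indicates the resource constraint for the process with process ID 951 of the job with job ID 952 on its own computation node 100.
(*) The resource usage 955 indicates the resources in use for the process with process ID 951 of the job with job ID 952 on its own computation node 100.
(*) The thread count 956 indicates the maximum number of threads that can be generated for the process with process ID 951 of the job with job ID 952 on its own computation node 100.
(*) The main memory allocation 957 indicates the maximum amount of main memory that can be allocated to the process with process ID 951 of the job with job ID 952 on its own computation node 100.
(*) The thread count 958 indicates the number of threads actually generated for the process with process ID 951 of the job with job ID 952 on its own computation node 100.
(*) The main memory allocation 959 indicates the amount of main memory actually allocated to the process with process ID 951 of the job with job ID 952 on its own computation node 100.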
(*)タスクID961は、タスクのID(プロセスID)を示す。
(*)プロセスID962は、プロセスのID(プロセスID)を示す。
(*)ジョブID963は、ジョブIDを示す。
(*)計算ノードID964は、ジョブID963のジョブのプロセスID962のプロセスのタスクID961のタスクを実行する計算ノードのID(計算ノードID)を示す。
(*)資源制約965は、計算ノードID964の計算ノードにおけるジョブID963のジョブのプロセスID962のプロセスのタスクID961のタスクに対する資源の制約を示す。
(*)スレッド数966は、計算ノードID964の計算ノードにおけるジョブID963のジョブのプロセスID962のプロセスのタスクID961のタスクに対する生成可能な最大のスレッドの数を示す。
(*)主記憶割当て967は、計算ノードID964の計算ノードにおけるジョブID963のジョブのプロセスID962のプロセスのタスクID961のタスクに対する割当て可能な主記憶の最大の記憶量を示す。
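(*) Task ID 971 indicates the ID of a task.
(*) Process ID 972 indicates the ID of a process.
(*) Job ID 973 indicates the ID of a job.
(*) Compute node ID 974 indicates the ID of the local compute node 100 (compute node ID).
(*) Resource constraint 975 indicates the resource constraint on the task with task ID 971 of the process with process ID 972 of the job with job ID 973 at the local compute node 100.
(*) Resource usage 976 indicates the resources in use for the task with task ID 971 of the process with process ID 972 of the job with job ID 973 at the local compute node 100.
(*) Thread count 977 indicates the maximum number of threads that can be created for the task with task ID 971 of the process with process ID 972 of the job with job ID 973 at the local compute node 100.
(*) Main memory allocation 978 indicates the maximum amount of main memory that can be allocated to the task with task ID 971 of the process with process ID 972 of the job with job ID 973 at the local compute node 100.
(*) Thread count 979 indicates the number of threads actually created for the task with task ID 971 of the process with process ID 972 of the job with job ID 973 at the local compute node 100.
(*) Main memory allocation 980 indicates the amount of main memory actually allocated to the task with task ID 971 of the process with process ID 972 of the job with job ID 973 at the local compute node 100.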
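(*) Task ID 991 indicates the ID of a task.
(*) Parent thread ID 992 indicates the ID of the thread that creates another thread (the parent thread).
(*) Child thread ID 993 indicates the ID of the thread created by that thread (the child thread).
(*) Time 994 indicates the time at which creation of the thread was held.
(*) Thread creation information 995 is the information needed to create the thread (for example, information including a reference to the record that the child thread will read).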
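(*) The horizontal axis represents time.
(*) Each horizontally elongated rounded rectangle in the figure represents a series of processing performed by one thread. The left edge of the rounded rectangle represents the time at which the thread starts processing, and the right edge represents the time at which the thread finishes processing.
(*) The value inside a rounded rectangle is information (for example, a record ID) identifying the record that is read during the processing of that thread.
(*) The number of record-fetching threads that can run concurrently is assumed to be 8.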
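As an illustration of how the constraint and usage fields above could drive the decision to hold thread creation, the following Python sketch keeps a per-task limit together with a queue of held requests. The names `TaskResources`, `PendingThread`, and `ThreadGate` and their methods are hypothetical and do not appear in the source. Only the thread-count limit is modeled here; the main memory limit would presumably be checked in the same way. The limit of 8 echoes the concurrency assumed for the figure.

```python
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class TaskResources:
    """Limit and current usage for one task on the local node (hypothetical layout)."""
    max_threads: int        # corresponds to the limit in thread count 977
    used_threads: int = 0   # corresponds to the actual count in thread count 979

    def can_spawn(self) -> bool:
        return self.used_threads + 1 <= self.max_threads


@dataclass
class PendingThread:
    """One held creation request, mirroring fields 991 to 995."""
    task_id: str
    parent_thread_id: int
    child_thread_id: int
    held_at: float          # time 994
    creation_info: dict     # thread creation information 995


class ThreadGate:
    """Decides whether a child thread may start now or must be held."""

    def __init__(self, resources: TaskResources):
        self.resources = resources
        self.pending = deque()

    def request(self, task_id, parent_id, child_id, info) -> bool:
        """Return True if the child may start now, False if creation was held."""
        if self.resources.can_spawn():
            self.resources.used_threads += 1
            return True
        self.pending.append(
            PendingThread(task_id, parent_id, child_id, time.time(), info))
        return False

    def release(self):
        """Call when a thread ends; returns the oldest held request, if any."""
        self.resources.used_threads -= 1
        if self.pending and self.resources.can_spawn():
            self.resources.used_threads += 1
            return self.pending.popleft()
        return None


gate = ThreadGate(TaskResources(max_threads=8))
started = [gate.request("task-1", 0, i, {"record_ref": i}) for i in range(10)]
resumed = gate.release()  # one thread finishes, so the oldest held request resumes
```

With a limit of 8, the first eight requests start immediately, the remaining two are held, and each call to `release` lets the oldest held request proceed in arrival order.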
Claims (17)
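- [Claim 1] A parallel data processing system provided in one computer of a computer system in which a plurality of computers execute data processing in parallel, the parallel data processing system comprising:
a parallel data processing execution unit that reads data from a data group including a first data group containing a plurality of first data and a second data group containing a plurality of second data, and executes processing,
wherein the parallel data processing execution unit:
(A) reads the first data from the first data group and, based on first format information acquired from an application, acquires a first value from the first data;
(B) based on first reference information acquired from the application, creates one or more threads for reading, from the second data group, each of one or more of the second data corresponding to the first value;
(C) executes (A) and (B) for one or more first data of the first data group; and
(D) executes the plurality of threads in parallel.
- [Claim 2] The parallel data processing system according to claim 1, wherein the parallel data processing execution unit:
(E) reads the second data by executing a thread in (D) and, based on second format information acquired from the application, acquires a second value from the second data.
- [Claim 3] The parallel data processing system according to claim 2, wherein the parallel data processing execution unit evaluates, using the second value, a second condition acquired from the application.
- [Claim 4] The parallel data processing system according to claim 2, wherein the parallel data processing execution unit generates output data from the second values acquired from one or more of the second data.
- [Claim 5] The parallel data processing system according to claim 1, wherein the parallel data processing execution unit evaluates, using the first value, a first condition acquired from the application, and executes (B) when the first condition is satisfied.
- [Claim 6] The parallel data processing system according to claim 1, wherein the first reference information includes information identifying the physical location at which the second data is stored in the second data group.
- [Claim 7] The parallel data processing system according to claim 1, wherein the first reference information includes information for searching for the second data in the second data group.
- [Claim 8] The parallel data processing system according to claim 1, wherein at least some of the second data of the second data group is stored in a storage device of another computer connected via a network, and
the parallel data processing execution unit, when executing the thread to acquire the second data from the other computer connected via the network, sends an acquisition request to the other computer and acquires the second data from the storage device.
- [Claim 9] The parallel data processing system according to claim 8, wherein the parallel data processing execution unit acquires a plurality of the second data by sending, to the other computer, a blocked acquisition request that combines into one request a plurality of acquisition requests that are generated by the execution of a plurality of threads and directed to the same computer.
- [Claim 10] The parallel data processing system according to claim 1, wherein the first format information is program code, and
the parallel data processing execution unit:
receives from a user catalog information that is described in a predetermined markup language and is required for creating the first format information; and
creates the first format information based on the catalog information.
- [Claim 11] The parallel data processing system according to claim 1, wherein the parallel data processing execution unit holds the creation of a new thread when it determines, based on resource constraint information on thread creation in the computer having the parallel data processing execution unit, that creating the new thread would cause the amount of resources used for executing threads on the compute node on which the unit runs to exceed the constraint.
- [Claim 12] The parallel data processing system according to claim 1, wherein the parallel data processing execution unit holds the creation of a new thread when, based on resource constraint information on thread creation in a process responsible for some of the stages of the parallel data processing, creating the new thread would cause the amount of resources used for executing threads in the process to exceed the constraint.
- [Claim 13] The parallel data processing system according to claim 1, further comprising a reception unit that receives a processing instruction from an application,
wherein the instruction from the application defines a procedure, and
the parallel data processing execution unit, on receiving the instruction, executes (A) through (D) and thereby executes out-of-order processing that does not depend on the procedure even though the instruction defines the procedure.
- [Claim 14] A computer in a computer system in which a plurality of computers execute data processing in parallel, the computer comprising:
a communication interface device for communicating with another computer in the computer system; and
a control device that is connected to the communication interface device and that reads data from a data group including a first data group containing a plurality of first data and a second data group containing a plurality of second data, and executes processing,
wherein the control device:
(A) reads the first data from the first data group and, based on first format information acquired from an application, acquires a first value from the first data;
(B) based on first reference information acquired from the application, creates one or more threads for reading, from the second data group, each of one or more of the second data corresponding to the first value;
(C) executes (A) and (B) for one or more first data of the first data group; and
(D) executes the plurality of threads in parallel.
- [Claim 15] A parallel data processing method in a computer system in which a plurality of computers execute data processing in parallel, the method comprising:
(A) reading first data from a first data group of a data group including the first data group, which contains a plurality of the first data, and a second data group, which contains a plurality of second data, and, based on first format information acquired from an application, acquiring a first value from the first data;
(B) based on first reference information acquired from the application, creating one or more threads for reading, from the second data group, each of one or more of the second data corresponding to the first value;
(C) executing (A) and (B) for one or more first data of the first data group; and
(D) executing the plurality of threads in parallel.
- [Claim 16] A computer system comprising a plurality of computers,
wherein each computer has a parallel data processing system that reads data from a data group including a first data group containing a plurality of first data and a second data group containing a plurality of second data, and executes processing, and
the parallel data processing system of each computer:
(A) reads the first data from the first data group and, based on first format information acquired from an application, acquires a first value from the first data;
(B) based on first reference information acquired from the application, creates one or more threads for reading, from the second data group, each of one or more of the second data corresponding to the first value;
(C) executes (A) and (B) for one or more first data of the first data group; and
(D) executes the plurality of threads in parallel.
- [Claim 17] A computer program executed by a computer in a computer system in which a plurality of computers execute data processing in parallel, the computer program causing the computer to:
(A) read first data from a first data group of a data group including the first data group, which contains a plurality of the first data, and a second data group, which contains a plurality of second data, and, based on first format information acquired from an application, acquire a first value from the first data;
(B) based on first reference information acquired from the application, create one or more threads for reading, from the second data group, each of one or more of the second data corresponding to the first value;
(C) execute (A) and (B) for one or more first data of the first data group; and
(D) execute the plurality of threads in parallel.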
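Steps (A) through (E) of the claims can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the in-memory lists stand in for the first and second data groups, and `first_format`, `first_reference`, and `second_format` are hypothetical stand-ins for the format and reference information that the claims say is acquired from the application. It is not the implementation described in the patent.

```python
import threading

# Hypothetical stand-ins for the two data groups.
FIRST_DATA_GROUP = ["apple,2", "pear,1", "apple,3"]   # first data: "key,quantity"
SECOND_DATA_GROUP = {                                  # second data, grouped by key
    "apple": ["apple;red", "apple;green"],
    "pear": ["pear;yellow"],
}


def first_format(record):
    """First format information: how to take the first value out of a first datum."""
    return record.split(",")[0]


def first_reference(value):
    """First reference information: which second data correspond to the first value."""
    return [(value, i) for i in range(len(SECOND_DATA_GROUP.get(value, [])))]


def second_format(record):
    """Second format information: how to take the second value out of a second datum."""
    return record.split(";")[1]


def run(first_group):
    results = []
    lock = threading.Lock()
    threads = []

    def read_second(ref):
        key, index = ref
        record = SECOND_DATA_GROUP[key][index]   # (E) read the second datum
        value = second_format(record)            # (E) acquire the second value
        with lock:
            results.append((key, value))

    for record in first_group:                   # (C) repeat for each first datum
        value = first_format(record)             # (A) acquire the first value
        for ref in first_reference(value):       # (B) one thread per second datum
            t = threading.Thread(target=read_second, args=(ref,))
            threads.append(t)
            t.start()                            # (D) threads run concurrently
    for t in threads:
        t.join()
    return sorted(results)                       # completion order is not fixed


output = run(FIRST_DATA_GROUP)
```

Because the threads finish in no fixed order, the sketch sorts the collected values before returning them. A real system would also bound the number of live threads, as the resource-constraint claims describe.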
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/404,550 US9841989B2 (en) | 2012-05-31 | 2012-05-31 | Parallel data processing system, computer, and parallel data processing method |
EP12878202.6A EP2857975A4 (en) | 2012-05-31 | 2012-05-31 | PARALLEL DATA PROCESSING SYSTEM, COMPUTER, AND METHOD FOR PARALLEL DATA PROCESSING |
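JP2014518181A JP5881025B2 (ja) | 2012-05-31 | 2012-05-31 | Parallel data processing system, computer, and parallel data processing method |
PCT/JP2012/064149 WO2013179451A1 (ja) | 2012-05-31 | 2012-05-31 | Parallel data processing system, computer, and parallel data processing method |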
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2012/064149 WO2013179451A1 (ja) | 2012-05-31 | 2012-05-31 | Parallel data processing system, computer, and parallel data processing method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013179451A1 true WO2013179451A1 (ja) | 2013-12-05 |
Family
ID=49672705
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2012/064149 WO2013179451A1 (ja) | Parallel data processing system, computer, and parallel data processing method | 2012-05-31 | 2012-05-31 |
Country Status (4)
Country | Link |
---|---|
US (1) | US9841989B2 (ja) |
EP (1) | EP2857975A4 (ja) |
JP (1) | JP5881025B2 (ja) |
WO (1) | WO2013179451A1 (ja) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015157338A1 (en) * | 2014-04-08 | 2015-10-15 | RedPoint Global Inc. | Data transformation system and method |
CN111651489A (zh) * | 2020-06-05 | 2020-09-11 | 厦门理工学院 | 一种大数据处理服务器系统 |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9501483B2 (en) | 2012-09-18 | 2016-11-22 | Mapr Technologies, Inc. | Table format for map reduce system |
JP5939123B2 (ja) * | 2012-10-09 | 2016-06-22 | 富士通株式会社 | 実行制御プログラム、実行制御方法および情報処理装置 |
US10114674B2 (en) * | 2014-03-06 | 2018-10-30 | International Business Machines Corporation | Sorting database collections for parallel processing |
CN107329846B (zh) * | 2017-07-11 | 2020-06-12 | 深圳市信义科技有限公司 | 基于大数据技术的大指数据比对方法 |
US10489348B2 (en) | 2017-07-17 | 2019-11-26 | Alteryx, Inc. | Performing hash joins using parallel processing |
US10552452B2 (en) * | 2017-10-16 | 2020-02-04 | Alteryx, Inc. | Asynchronously processing sequential data blocks |
US10558364B2 (en) | 2017-10-16 | 2020-02-11 | Alteryx, Inc. | Memory allocation in a data analytics system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010092222A (ja) * | 2008-10-07 | 2010-04-22 | Internatl Business Mach Corp <Ibm> | 更新頻度に基づくキャッシュ機構 |
US7756919B1 (en) | 2004-06-18 | 2010-07-13 | Google Inc. | Large-scale data processing in a distributed and parallel processing enviornment |
JP2011170774A (ja) * | 2010-02-22 | 2011-09-01 | Nippon Telegr & Teleph Corp <Ntt> | 決定木生成装置、決定木生成方法、及びプログラム |
WO2012049794A1 (ja) * | 2010-10-14 | 2012-04-19 | 日本電気株式会社 | 分散処理装置及び分散処理システム |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070198979A1 (en) * | 2006-02-22 | 2007-08-23 | David Dice | Methods and apparatus to implement parallel transactions |
US9170848B1 (en) * | 2010-07-27 | 2015-10-27 | Google Inc. | Parallel processing of data |
US8782053B2 (en) * | 2011-03-06 | 2014-07-15 | Happy Cloud Inc. | Data streaming for interactive decision-oriented software applications |
US9720708B2 (en) * | 2011-08-19 | 2017-08-01 | Advanced Micro Devices, Inc. | Data layout transformation for workload distribution |
-
2012
- 2012-05-31 WO PCT/JP2012/064149 patent/WO2013179451A1/ja active Application Filing
- 2012-05-31 JP JP2014518181A patent/JP5881025B2/ja active Active
- 2012-05-31 EP EP12878202.6A patent/EP2857975A4/en not_active Ceased
- 2012-05-31 US US14/404,550 patent/US9841989B2/en active Active
Non-Patent Citations (5)
Title |
---|
JEFFREY DEAN; SANJAY GHEMAWAT: "MapReduce: Simplified Data Processing on Large Clusters", PROCEEDINGS OF OSDI 2004, 2004, pages 137 - 150 |
MICHAEL ISARD; MIHAI BUDIU; YUAN YU; ANDREW BIRRELL; DENNIS FETTERLY: "Dryad: distributed data-parallel programs from sequential building blocks", PROCEEDINGS OF EUROSYS, 2007, pages 59 - 72 |
See also references of EP2857975A4 |
TAKANORI UEDA ET AL.: "QueueLinker: A Distributed Framework for Pipelined Applications", DAI 2 KAI FORUM ON DATA ENGINEERING AND INFORMATION MANAGEMENT -DEIM 2010- RONBUNSHU, 25 May 2010 (2010-05-25), XP055179049, Retrieved from the Internet <URL:http://db-event.jpn.org/deim2010/proceedings/files/E1-3.pdf> [retrieved on 20120802] * |
VINAYAK R. BORKAR; MICHAEL J. CAREY; RAMAN GROVER; NICOLA ONOSE; RARES VERNICA: "Hyracks: A flexible and extensible foundation for data-intensive computing", PROCEEDINGS OF ICDE 2011, 2011, pages 1151 - 1162, XP031868515, DOI: doi:10.1109/ICDE.2011.5767921 |
Also Published As
Publication number | Publication date |
---|---|
EP2857975A4 (en) | 2016-03-02 |
US9841989B2 (en) | 2017-12-12 |
JPWO2013179451A1 (ja) | 2016-01-14 |
EP2857975A1 (en) | 2015-04-08 |
JP5881025B2 (ja) | 2016-03-09 |
US20150113535A1 (en) | 2015-04-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 12878202; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2014518181; Country of ref document: JP; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 14404550; Country of ref document: US |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWE | Wipo information: entry into national phase | Ref document number: 2012878202; Country of ref document: EP |