Disclosure of Invention
In view of the foregoing, it is desirable to provide a big data incremental iteration method, apparatus, computer device, and storage medium that can improve big data processing efficiency.
A big data incremental iteration method, comprising:
receiving a directed acyclic graph task executed by a graphics processor, acquiring a data set corresponding to the directed acyclic graph task, and storing the data set in a cache in a memory of the graphics processor;
responding to the directed acyclic graph task, performing iterative computation on the data set to obtain an iteratively computed data set, and updating the data set stored in the cache by using the iteratively computed data set;
and when the data set undergoes an incremental change, performing incremental iterative computation based on the iteratively computed data set stored in the cache to obtain an incrementally iterated data set, and updating the data set in the cache by using the incrementally iterated data set.
In one embodiment, the cache is a shared memory, and the method further includes:
and when it is detected that the storage capacity occupied by the data set in the shared memory is greater than a preset threshold, migrating the data set from the shared memory to a global memory through a block-based sliding window mechanism.
In one embodiment, the method further comprises:
and when it is detected that the incremental iterative computation corresponding to the directed acyclic graph task has stopped, migrating the data set from the global memory to a memory of a central processing unit.
In one embodiment, the acquiring a data set corresponding to the directed acyclic graph task and storing the data set in a cache in a memory of a graphics processor includes:
acquiring a data set in an RDD format corresponding to the directed acyclic graph task;
and performing data format conversion on the data set in the RDD format to obtain a data set in a G-RDD format, and storing the data set in the G-RDD format in a cache in the memory of the graphics processor.
In one embodiment, the performing data format conversion on the data set in the RDD format to obtain a data set in a G-RDD format, and storing the data set in the G-RDD format in a cache in the memory of the graphics processor includes:
storing the data set in the RDD format in a data buffer, calling the data set in the RDD format in the data buffer for data format conversion to obtain a data set in the G-RDD format, and storing the data set in the G-RDD format in a cache in the memory of the graphics processor.
In one embodiment, the acquiring a data set corresponding to the directed acyclic graph task and storing the data set in a cache in a memory of a graphics processor includes:
reading a data set corresponding to the directed acyclic graph task from a distributed file system, and storing the read data set in a memory of a graphics processor;
and acquiring the data set corresponding to the directed acyclic graph task from the memory of the graphics processor, and storing the data set in a cache in the memory of the graphics processor.
A big data incremental iteration apparatus, the apparatus comprising:
the task receiving module is used for receiving a directed acyclic graph task executed by a graphics processor, acquiring a data set corresponding to the directed acyclic graph task, and storing the data set in a cache in a memory of the graphics processor;
the data updating module is used for responding to the directed acyclic graph task, performing iterative computation on the data set to obtain an iteratively computed data set, and updating the data set stored in the cache by using the iteratively computed data set;
and the incremental iteration module is used for, when the data set undergoes an incremental change, performing incremental iterative computation based on the iteratively computed data set stored in the cache to obtain an incrementally iterated data set, and updating the data set in the cache by using the incrementally iterated data set.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
when a directed acyclic graph task executed by a graphics processor is received, acquiring a data set corresponding to the directed acyclic graph task, and storing the data set in a cache in a memory of the graphics processor;
responding to the directed acyclic graph task, performing iterative computation on the data set to obtain an iteratively computed data set, and updating the data set stored in the cache by using the iteratively computed data set;
and when the data set undergoes an incremental change, performing incremental iterative computation based on the iteratively computed data set stored in the cache to obtain an incrementally iterated data set, and updating the data set in the cache by using the incrementally iterated data set.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
receiving a directed acyclic graph task executed by a graphics processor, acquiring a data set corresponding to the directed acyclic graph task, and storing the data set in a cache in a memory of the graphics processor;
responding to the directed acyclic graph task, performing iterative computation on the data set to obtain an iteratively computed data set, and updating the data set stored in the cache by using the iteratively computed data set;
and when the data set undergoes an incremental change, performing incremental iterative computation based on the iteratively computed data set stored in the cache to obtain an incrementally iterated data set, and updating the data set in the cache by using the incrementally iterated data set.
According to the above big data incremental iteration method, apparatus, computer device, and storage medium, when the directed acyclic graph task is a directed acyclic graph task executed by the graphics processor, the data set corresponding to the directed acyclic graph task is acquired and stored in the cache in the memory of the graphics processor, so that the cache resources in the graphics processor are fully utilized; by caching the data in the data set and performing intensive iterative computation and incremental iterative computation, the low-bandwidth input/output latency is hidden, repeated computation is effectively reduced, the total computation time is shortened, and big data processing efficiency is improved.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The method provided by the present application can be applied to the application environment shown in FIG. 1. The server cluster is a GSpark computing framework server cluster, which comprises a GSpark computing framework model and a distributed file system. FIG. 1a is a schematic diagram of the GSpark computing framework model: the model includes a Master node and a plurality of Worker nodes, a single Worker node includes an Executor and a GPU Manager, and the GSpark computing framework model is an extended Spark incremental iterative computing framework model that integrates a GPU. A Job is submitted to the Master through a Driver program; the Master causes a Worker to start the Driver (on the Master) and initializes cluster parameters, and SparkContext parameters are used to construct cluster resources, including the number of compute cores, the memory size, and the initialization parameter configuration of the CPU and the GPU. FIG. 1b is a data flow diagram of the GSpark computing framework server cluster. The Driver in the GSpark computing framework model divides the Job into DAG (Directed Acyclic Graph) tasks. The task scheduler schedules the DAG tasks based on Spark's native data locality and distributes them to the Executor on each Worker node for execution; the DAG tasks are divided into DAG tasks executed by the CPU and DAG tasks executed by the GPU. When a DAG task is a DAG task executed by the GPU, the Executor on the Worker node reads the data set corresponding to the DAG task from the HDFS (Hadoop Distributed File System) and stores the data set in the GPU memory. The GPU Manager acquires the data set corresponding to the DAG task from the GPU memory and performs data format conversion to obtain a data set in the G-RDD format. The GPU Manager then stores the format-converted data set in a cache for iterative computation. In response to the DAG task, iterative computation is performed on the data set to obtain an iterative computation result, and the data set stored in the cache is updated according to the iterative computation result; when the data set undergoes an incremental change, incremental iterative computation is performed based on the iterative computation result stored in the cache to obtain an incrementally iterated data set, and the corresponding data set in the cache is updated with the incrementally iterated data set.
In one embodiment, as shown in FIG. 2, a big data incremental iteration method is provided. The method is described by taking its application to the Worker node in FIG. 1 as an example, and includes the following steps:
Step 202: receiving a directed acyclic graph task executed by the graphics processor, acquiring a data set corresponding to the directed acyclic graph task, and storing the data set in a cache in a memory of the graphics processor.
The client submits a job to the server cluster, that is, submits the Job to the Master node. RDD (Resilient Distributed Dataset) is the data abstraction of Spark, and it resides in memory: for example, a file that is read is an RDD, a computation over that file is an RDD, and a result set is also an RDD. The dependencies between different fragments and data can be regarded as dependencies between RDDs, and computation is based on RDDs. The Driver in the GSpark computing framework model divides the Job into multiple directed acyclic graphs (DAGs) according to the dependencies between RDDs. There is a series of dependencies between RDDs, and the dependencies are classified into narrow dependencies and wide dependencies. The DAG is submitted to the DAG scheduler, and the DAG scheduler divides the DAG into a plurality of mutually dependent Stages according to the wide dependencies between RDDs. A Stage is a group of parallel tasks; each Stage comprises one or more Tasks, which are submitted to the task scheduler for execution in the form of a TaskSet, and the specific Tasks are then distributed to the thread pool of the Executor on a Worker node for processing.
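As a concrete illustration (plain Spark RDD API rather than the GSpark extension itself; the HDFS paths are assumed placeholders), the following minimal sketch shows a narrow dependency (map) staying within one Stage while a wide dependency (reduceByKey) forces a shuffle and therefore a Stage boundary:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object StageBoundarySketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("stage-sketch"))
    val lines  = sc.textFile("hdfs://namenode:8020/input/data.txt") // assumed path
    val pairs  = lines.map(line => (line.split(",")(0), 1)) // narrow dependency: same Stage
    val counts = pairs.reduceByKey(_ + _)                   // wide dependency: Stage boundary (shuffle)
    counts.saveAsTextFile("hdfs://namenode:8020/output")    // Action: triggers DAG scheduling
    sc.stop()
  }
}
```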
In the Driver, the RDDs corresponding to the job are first divided into Stages by the DAGScheduler, and the underlying scheduler, the TaskScheduler, then interacts with the Executor. The Driver and the Executors on the Worker nodes run the job in their respective thread pools. The Driver can obtain the specific running resources of the Executors at runtime, so the Driver communicates with the Executors and transmits the divided Tasks to them over the network, where a Task refers to business logic code.
The Executor receives the directed acyclic graph task and deserializes it to obtain the input and output of the data. Across the data slices of the cluster, the business logic is the same but the data differs, and the thread pool of the Executor is responsible for executing it. That is, the task scheduler sends the directed acyclic graph task to the Executor; after the Executor deserializes the data, the input and output of the data, i.e., the business logic of the task, are obtained, and the Executor runs the business logic code.
Step 204: in response to the directed acyclic graph task, performing iterative computation on the data set to obtain an iteratively computed data set, and updating the data set stored in the cache with the iteratively computed data set.
Iteration means repeatedly solving over the same data set to optimize some parameter, executing the same computation process until a convergence state is reached. Each execution of the computation process is called one iteration, and the result of each iteration is used as the initial value of the next iteration. Iterative computation thus approaches the result step by step by repeatedly performing the same set of operations; an iterative data set is a collection of data and the relationships between the data. Iterative computation is performed on the data set to obtain the iteratively computed data set, the data set stored in the cache is updated with the iteratively computed data set, and the next round of iterative computation continues on the updated data set.
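A minimal sketch of this cache-update pattern, assuming a generic fixed-point iteration over a cached RDD; `step` and `converged` are hypothetical placeholders for the job's actual business logic:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

def iterate[T](initial: RDD[T],
               step: RDD[T] => RDD[T],                     // one round of the computation
               converged: (RDD[T], RDD[T]) => Boolean): RDD[T] = {
  var current = initial.persist(StorageLevel.MEMORY_ONLY)  // keep the working set cached
  var done = false
  while (!done) {
    val next = step(current).persist(StorageLevel.MEMORY_ONLY)
    next.count()                    // Action: materialize this round's result
    done = converged(current, next) // each result seeds the next iteration
    current.unpersist()             // replace the cached data set with the new one
    current = next
  }
  current
}
```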
Step 206: when the data set undergoes an incremental change, performing incremental iterative computation based on the iteratively computed data set stored in the cache to obtain an incrementally iterated data set, and updating the data set in the cache with the incrementally iterated data set.
After iterative computation based on the iterative data is completed, new iterative data generated by business growth is the incremental data, and incremental iteration is an iterative method for obtaining a new iteration result from the incremental data and the original iteration result.
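A minimal sketch of this idea, under the assumption that the new result can be obtained by merging the incremental data with the cached prior result instead of recomputing from scratch; `mergeStep` and the keyed layout are hypothetical illustrations, not the application's actual interface:

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

def incrementalIterate[K: ClassTag, V: ClassTag](
    cachedResult: RDD[(K, V)],   // iteratively computed data set in the cache
    delta: RDD[(K, V)],          // incremental change to the data set
    mergeStep: (Option[V], V) => V): RDD[(K, V)] = {
  val updated = cachedResult
    .fullOuterJoin(delta)
    .mapValues {
      case (old, Some(d))    => mergeStep(old, d) // recompute only where increments arrived
      case (Some(old), None) => old               // reuse the cached result unchanged
      case (None, None)      => sys.error("unreachable: key came from neither side")
    }
  updated.persist(StorageLevel.MEMORY_ONLY)       // becomes the new cached data set
}
```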
According to the above big data incremental iteration method, when the directed acyclic graph task is a directed acyclic graph task executed by the graphics processor, the data set corresponding to the directed acyclic graph task is acquired and stored in the cache in the memory of the graphics processor, so that the cache resources in the graphics processor are fully utilized; by caching the data in the data set and performing intensive iterative computation and incremental iterative computation, the low-bandwidth input/output latency is hidden and repeated computation is effectively reduced, thereby shortening the total computation time and improving big data processing efficiency.
In one embodiment, the cache is a shared memory, and the big data incremental iteration method further includes: when it is detected that the storage capacity occupied by the data set in the shared memory is greater than a preset threshold, migrating the data set from the shared memory to the global memory through a block-based sliding window mechanism. A shared memory is a user-controllable first-level cache in a graphics processing unit (GPU); physically, each SM (Streaming Multiprocessor) contains a low-latency memory pool shared by all threads in the currently executing block. The global memory (unified memory) refers to the entire physical memory accessible to all processors. When it is detected that the storage capacity occupied by the data set in the shared memory is greater than the preset threshold, as shown in FIG. 3, the data set in the shared memory is migrated to the global memory through the block-based sliding window mechanism: the read window reads a variable number of data blocks (chunks) each time, and the communication bandwidth is fully used through the coalesced-access mechanism of the global memory. This resolves the limitation on the GPU memory size and spares the programmer from manual memory management when the underlying memory is insufficient, thereby ensuring fast data access during computation and preventing memory overflow, which effectively reduces task execution time and improves the overall throughput of the system. Further, when it is detected that the incremental iterative computation corresponding to the directed acyclic graph task has stopped, the data set in the global memory is migrated to the memory of the central processing unit, so as to further manage the memory and ensure fast data access during computation.
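A minimal driver-side sketch of the threshold-triggered migration, assuming a hypothetical native binding (`GpuMemory`) behind which the actual shared-to-global copies would run; the threshold value and all method names here are illustrative assumptions, not the application's actual interface:

```scala
// Hypothetical native binding; a real system would back these with JNI/CUDA calls.
trait GpuMemory {
  def nextWindowSize(offset: Long): Long                // chunk span chosen by the sliding window
  def copySharedToGlobal(offset: Long, len: Long): Unit // coalesced global-memory writes
}

object GpuCacheManager {
  val SharedMemThresholdBytes: Long = 48L * 1024 // assumed per-SM shared-memory budget

  def maybeMigrate(dataSetBytes: Long, gpu: GpuMemory): Unit =
    if (dataSetBytes > SharedMemThresholdBytes) {
      var offset = 0L
      while (offset < dataSetBytes) {          // block-based sliding window over the data set
        val chunk = gpu.nextWindowSize(offset) // a variable number of chunks per step
        gpu.copySharedToGlobal(offset, chunk)
        offset += chunk
      }
    }
}
```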
In one embodiment, as shown in FIG. 4, acquiring the data set corresponding to the directed acyclic graph task and storing the data set in a cache in the memory of the graphics processor includes: step 402, acquiring a data set in the RDD format corresponding to the directed acyclic graph task; and step 404, performing data format conversion on the data set in the RDD format to obtain a data set in the G-RDD format, and storing the data set in the G-RDD format in a cache in the memory of the graphics processor. An RDD (Resilient Distributed Dataset) is a special collection that supports multiple data sources, has a fault-tolerance mechanism, can be cached, and supports parallel operations; one RDD represents the data set in one partition. RDDs have two kinds of operators: Transformation and Action. A Transformation is lazily evaluated: when one RDD is transformed into another RDD, the computation is not performed immediately, and only the logical operation on the data set is recorded. An Action triggers the job to run and actually triggers the computation of the Transformation operators. The GPU Manager converts the RDD data into G-RDD, a data type that the GPU can process.
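A brief illustration of the Transformation/Action distinction in the standard Spark API (the data here is an arbitrary example):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LazyEvalSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("lazy-sketch"))
    val rdd1 = sc.parallelize(1 to 1000000)
    val rdd2 = rdd1.map(_ * 2)         // Transformation: only the lineage is recorded
    val rdd3 = rdd2.filter(_ % 3 == 0) // Transformation: still no computation
    val n = rdd3.count()               // Action: now the whole chain actually executes
    println(s"count = $n")
    sc.stop()
  }
}
```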
In one embodiment, performing data format conversion on the data set in the RDD format to obtain a data set in the G-RDD format, and storing the data set in the G-RDD format in a cache in the memory of the graphics processor includes: storing the data set in the RDD format in a data buffer, calling the data set in the RDD format in the data buffer for data format conversion to obtain a data set in the G-RDD format, and storing the data set in the G-RDD format in a cache in the memory of the graphics processor. Since the Java virtual machine (JVM) heap memory cannot communicate directly with the GPU, a data buffer is added to the overall task execution to serve as a bridge between the JVM and the GPU; the buffer resides in off-heap memory and provides data buffering and data format conversion functions, as shown in FIG. 5.
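A minimal sketch of such an off-heap bridge using the standard `java.nio` direct-buffer API; the G-RDD layout shown (one contiguous block of doubles per partition) is an assumed illustration, not the application's actual format:

```scala
import java.nio.{ByteBuffer, ByteOrder}
import org.apache.spark.rdd.RDD

object OffHeapBridgeSketch {
  // Pack one partition into a direct (off-heap) buffer: contiguous, fixed-width,
  // and addressable by native code, so it can be handed to the GPU in one transfer.
  def toDeviceBuffer(values: Array[Double]): ByteBuffer = {
    val buf = ByteBuffer.allocateDirect(values.length * 8).order(ByteOrder.nativeOrder())
    values.foreach(buf.putDouble)
    buf.flip()
    buf
  }

  // Conversion runs per partition on the executors; the buffers stay worker-local
  // and would be handed to the GPU Manager for caching (a hypothetical step).
  def convert(rdd: RDD[Double]): Unit =
    rdd.foreachPartition { it =>
      val deviceBlock = toDeviceBuffer(it.toArray)
      // ... pass `deviceBlock` to the native GPU side here
    }
}
```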
In one embodiment, acquiring the data set corresponding to the directed acyclic graph task and storing the data set in a cache in the memory of the graphics processor includes: reading the data set corresponding to the directed acyclic graph task from the distributed file system and storing the read data set in the memory of the graphics processor; and acquiring the data set corresponding to the directed acyclic graph task from the memory of the graphics processor and storing the data set in a cache in the memory of the graphics processor. When a DAG task is a DAG task executed by the GPU, the Executor on the Worker node reads the data set corresponding to the DAG task from the HDFS (Hadoop Distributed File System) and stores it in the GPU memory. The GPU Manager then acquires the data set corresponding to the DAG task from the GPU memory and stores it in the cache for iterative computation.
In one embodiment, the big data incremental iteration method is based on a GSpark computing framework model that fuses the GPU with Spark. By exploiting the similarity between the high concurrency of the GPU and Spark's distributed parallel computing, the fusion of the two can fully utilize the intensive general-purpose computing power of the GPU, so that the speed-up ratio of the overall iterative computation is significantly improved. Secondly, during incremental iteration in the GPU, the cache resources in the GPU are fully utilized, and the low-bandwidth input/output latency is hidden by caching the computed data and performing intensive iterative computation. Finally, memory is swapped through the block-based sliding window mechanism, which resolves the limitation on the GPU memory size and spares programmers from manual memory management when the underlying memory is insufficient, thereby ensuring fast data access during computation and preventing memory overflow. The parallel programming framework of the GPU is a mainstream solution for co-processing computation, and the high concurrency and high throughput of the GPU have a significant effect on improving computational efficiency. By extending the current mainstream distributed computing framework, the auxiliary effect of the GPU is fully exploited to improve efficiency. Incremental iteration is a fast iterative algorithm that uses sparse computational dependencies in the data to selectively recompute parts of the model at each step, rather than computing a completely new version.
A physical environment for task execution is constructed: the GSpark computing framework model and the distributed file system are built on the server cluster, and the GSpark computing framework server cluster is debugged and started. After startup, the client submits a job to the Master node, the Master causes a Worker to start the Driver, cluster parameters are initialized, and SparkContext parameters are used to construct cluster resources, including the number of compute cores, the memory size, and the initialization parameter configuration of the CPU and the GPU. The computation data is uploaded to the HDFS distributed file system, and the job is submitted to the Master node of GSpark through the client. The Driver divides the job into DAG tasks according to wide dependencies and submits the tasks to the task scheduler for scheduled execution; the tasks are divided into TaskSetScheduler_CPU tasks executed by the CPU and TaskSetScheduler_GPU tasks executed by the GPU. When a task is a TaskSetScheduler_GPU task, the GPU Manager of each compute node manages and allocates the GPU computing resources of that node and places the task and the computation data in the GPU memory. The GPU Manager converts the data into the G-RDD data type that the GPU can process, and caches the data first placed in the GPU memory (including an Input Cache and a Result Cache) so that it can be reused in later iterations, thereby reducing the communication overhead between the GPU and the CPU. The number of iterations is determined according to the precision requirement, the data results are continuously updated into the cache during iteration, and when incremental data arrives, the iteration directly uses the previous round's computation results in the cache without recomputing the last task. During iteration, the characteristics of the GPU are fully exploited: iterative computation is accelerated through the shared memory, and when the data cached in the shared memory exceeds the actual memory limit of the GPU, or a read of stored data misses, the shared-memory data is swapped to the global memory using the block-based variable sliding window mechanism. The read window reads a variable number of data chunks each time, and the communication bandwidth is fully used through the coalesced-access mechanism of the global memory. Since the JVM heap memory of the Java virtual machine cannot communicate directly with the GPU, a data buffer is added to the overall task execution to serve as a bridge between the JVM and the GPU; the buffer resides in off-heap memory and provides data buffering and data format conversion. After the iterative computation is completed, the data in the GPU global memory is migrated to the CPU memory.
It should be understood that although the steps in the flowcharts of FIG. 2 and FIG. 4 are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that sequence. Unless explicitly stated otherwise herein, the execution of these steps is not strictly limited in order, and they may be performed in other orders. Moreover, at least some of the steps in FIG. 2 and FIG. 4 may include multiple sub-steps or multiple stages, which are not necessarily performed at the same moment but may be performed at different moments, and whose execution order is not necessarily sequential; they may be performed in turns or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, a big data incremental iteration apparatus is provided. As shown in FIG. 6, the big data incremental iteration apparatus includes a task receiving module 602, a data updating module 604, and an incremental iteration module 606. The task receiving module is used for receiving a directed acyclic graph task executed by the graphics processor, acquiring a data set corresponding to the directed acyclic graph task, and storing the data set in a cache in the memory of the graphics processor. The data updating module is used for, in response to the directed acyclic graph task, performing iterative computation on the data set to obtain the iteratively computed data set and updating the data set stored in the cache with the iteratively computed data set. The incremental iteration module is used for, when the data set undergoes an incremental change, performing incremental iterative computation based on the iteratively computed data set stored in the cache to obtain an incrementally iterated data set and updating the data set in the cache with the incrementally iterated data set.
In one embodiment, the cache is a shared memory, and the big data incremental iteration apparatus further includes: a data migration module used for migrating the data set from the shared memory to the global memory through a block-based sliding window mechanism when it is detected that the storage capacity occupied by the data set in the shared memory is greater than a preset threshold.
In one embodiment, the data migration module is further configured to migrate the data set from the global memory to the memory of the central processing unit when it is detected that the incremental iterative computation corresponding to the directed acyclic graph task has stopped.
In one embodiment, the task receiving module is further configured to acquire a data set in the RDD format corresponding to the directed acyclic graph task; and perform data format conversion on the data set in the RDD format to obtain a data set in the G-RDD format, and store the data set in the G-RDD format in a cache in the memory of the graphics processor.
In one embodiment, the task receiving module is further configured to store the data set in the RDD format in a data buffer, call the data set in the RDD format in the data buffer to perform data format conversion, obtain a data set in the G-RDD format, and store the data set in the G-RDD format in a cache in the memory of the graphics processor.
In one embodiment, the task receiving module is further configured to read a data set corresponding to the directed acyclic graph task from the distributed file system and store the read data set in the memory of the graphics processor; and acquire the data set corresponding to the directed acyclic graph task from the memory of the graphics processor and store the data set in a cache in the memory of the graphics processor.
For the specific limitations of the big data incremental iteration apparatus, reference may be made to the limitations of the big data incremental iteration method above, which are not repeated here. Each module in the big data incremental iteration apparatus may be implemented wholly or partially by software, by hardware, or by a combination thereof. The modules may be embedded in hardware form in, or independent of, a processor in the computer device, or stored in software form in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and whose internal structure may be as shown in FIG. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements the big data incremental iteration method.
Those skilled in the art will appreciate that the architecture shown in FIG. 7 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory storing a computer program and a processor that implements the steps of the big data incremental iteration method in any of the above embodiments when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the big data incremental iteration method in any of the above embodiments.
Those skilled in the art will understand that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing related hardware; the computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, the combination should be considered to be within the scope of this specification.
The above embodiments merely express several implementations of the present application, and their descriptions are specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that, for a person of ordinary skill in the art, several variations and improvements can be made without departing from the concept of the present application, and all of these fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.