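WO2017020637A1 - Task allocation method and task allocation apparatus for distributed data calculation - Google Patents
Task allocation method and task allocation apparatus for distributed data calculation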
- Publication number
- WO2017020637A1 (application PCT/CN2016/083279)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- task
- calculation
- partition
- distributed
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5066—Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5016—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5077—Logical partitioning of resources; Management or configuration of virtualized resources
Definitions
- the present application relates to the field of video surveillance technologies, and in particular, to a task allocation method and a task assignment apparatus for distributed data calculation.
- the purpose of the application is to provide a task allocation method and task allocation device for distributed data computing, which uses data storage information in a distributed database as a parameter of a computing task, and then allocates the computing task to a storage node corresponding to the data storage information.
- the storage node calculates the data pointed to in the calculation task, and only needs to call the local memory data in the calculation process, which reduces the IO redundancy and time consuming caused by multiple data forwarding.
- a task allocation method using distributed data computing includes:
- Another aspect of the invention relates to a task distribution device for distributed data computing, comprising:
- a target data confirming unit configured to receive a storage parameter of the target data calculated in the distributed data
- a target data mapping unit configured to map the data piece of the target data to the elastic distributed data set according to the storage parameter, where each data piece corresponds to one partition of the elastic distributed data set;
- a calculation task allocation unit for assigning a partition to a storage node to generate a calculation task for calculation.
- the present application provides an electronic device, comprising: a housing, a processor, a memory, a circuit board, and a power supply circuit, wherein the circuit board is disposed inside a space enclosed by the housing
- the processor and the memory are disposed on the circuit board;
- the power supply circuit is configured to supply power to respective circuits or devices of the electronic device;
- the memory is configured to store executable program code;
- the processor runs the program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform the task allocation method for distributed data calculation.
- the application also provides an application program that performs the task allocation method for distributed data calculation at runtime.
- the application also provides a storage medium for storing an application for performing a task assignment method of the distributed data calculation.
- the beneficial effects of the present application are as follows: the data storage information in the distributed database is used as a parameter of the computing task, and the computing task is then allocated to the storage node corresponding to the data storage information. The storage node calculates the data pointed to in the computing task, and only local memory data is called during the calculation, which reduces the IO redundancy and time consumption caused by forwarding data multiple times.
- FIG. 1 is a flowchart of a method for a first embodiment of a task allocation method for distributed data calculation according to an embodiment of the present application
- FIG. 2 is a flowchart of a method for a second embodiment of a task allocation method for distributed data calculation according to an embodiment of the present application
- FIG. 3 is a schematic structural diagram of data in a second embodiment of a task allocation method for distributed data calculation according to an embodiment of the present application
- FIG. 4 is a schematic diagram of a computing task in a second embodiment of a task allocation method for distributed data calculation provided in a specific implementation manner of the present application;
- FIG. 5 is a structural block diagram of a first embodiment of a task allocation apparatus for distributed data calculation according to an embodiment of the present application.
- FIG. 6 is a structural block diagram of a second embodiment of a task allocation apparatus for distributed data calculation according to an embodiment of the present application.
- FIG. 1 is a flowchart of a method for a first embodiment of a task allocation method for distributed data calculation according to an embodiment of the present application.
- the task allocation method in this embodiment is mainly used for parallel computing of a large amount of data in a distributed database, thereby improving computational efficiency.
- the task assignment method includes:
- Step S101 Receive storage parameters of the target data calculated in the distributed data.
- the basic idea of a distributed database is to distribute the data in the original centralized database to multiple data storage nodes connected through the network to obtain larger storage capacity and higher concurrent access.
- Distributed database systems usually use smaller computer systems. Each computer can be placed in a separate location, may hold a full or partial copy of the DBMS (Database Management System), and has its own local database. Many computers located in different places are connected to each other through the network to form a complete, global database that is logically centralized and physically distributed.
- the target data calculated in the distributed data is only one or more data tables of one storage node, specifically to a piece of data in the data table.
- the node that performs task assignment only needs to read the start and end positions of the data to be processed in the data table, and does not need to gather all the data onto itself. Gathering the data itself onto one node may require several terabytes of data transmission, whereas gathering only the storage parameters of the target data may require no more than 5M of data transmission, so the bulk data transfer involved in centralizing the data is avoided.
- Step S102 Map the data piece of the target data to the elastic distributed data set according to the storage parameter, and each data piece corresponds to one partition of the elastic distributed data set.
- the data that needs to be processed is generally a run of consecutive records in the data table, and these consecutive records are spread across different storage nodes.
- in a distributed database, the data in each data piece is located on the same storage node, which makes processing convenient and avoids data transmission.
- in this scheme, the data piece is therefore the basic unit of data processing.
- Step S103 Assign the partition to the storage node to generate a calculation task for calculation.
- when a calculation task is sent to a storage node, the data to be processed is not randomly distributed to storage nodes as in the prior art. Instead, the task is sent, according to the storage information, to the storage node that holds the data piece, and what is sent is not the bulk data itself but the storage parameters of the data.
- after receiving the calculation task, each storage node reads the data according to the table name of the target data table where the data piece is located and the start and end positions of the data piece, and performs the calculation task in the specified calculation manner. Throughout the calculation, all data is in effect read locally, which reduces data IO redundancy and the time it would otherwise consume.
- the storage node calculates the data pointed to in the computing task, and only local memory data needs to be called during the calculation, which reduces the IO redundancy and time consumption caused by forwarding data multiple times.
- FIG. 2 is a flowchart of a method for a second embodiment of a task allocation method for distributed data calculation according to an embodiment of the present invention. As shown in the figure, the method includes:
- Step S201 Receive storage parameters of the target data calculated in the distributed data.
- the database of distributed data is HBase.
- HBase is a distributed, column-oriented open source database. HBase is different from a general relational database. It is a database suitable for unstructured data storage. Another difference is HBase's column-based rather than row-based model.
- the HBase-based solution in this embodiment is equivalent to a custom elastic data set: the elastic data set is divided according to the data partitioning rule of HBase and the target data range input by the user, the data pieces of the HBase data table are mapped to the partitions of the elastic data set, and a processing node is specified for the data of each partition.
- when a parallel computing framework (for example, Spark) is used for distributed calculation on the data in an HBase data table, the data processed by the tasks in each Spark working node is the HBase data in the memory of that node, which ultimately achieves distributed in-memory parallel computing over HBase data.
- Step S202 Determine, according to the storage parameter, whether all of the data in the data piece belongs to the target data.
- the data table is split into multiple pieces of data, and the corresponding data in each piece of data is stored in a storage node.
- in HBase, when the data table grows as the number of records increases, it is gradually split into multiple regions. A region is represented by [startkey, endkey), where startkey and endkey represent the start position and the end position of the region, respectively.
- different regions are assigned by the Master to the corresponding RegionServer for management, and the storage information is equivalent to the information of the RegionServer.
- the target data is associated with at least two pieces of data. If all the target data are in the same storage node, the computing task can be directly sent to the storage node without parallel computing.
- because the data in a single data piece is not necessarily all target data to be calculated, the data in the data piece needs to be calibrated in actual calculation, and only the data to be processed is mapped into the elastic distributed data set. One partition of the elastic distributed data set corresponds to one data piece, and all data in the elastic distributed data set is target data that needs to be processed.
- Step S203 If all of the data in the data piece belongs to the target data, the data piece is mapped to a partition of the elastic distributed data set.
- Step S204 If the data in the data piece does not all belong to the target data, the part of the data piece belonging to the target data is mapped to one partition of the elastic distributed data set.
- because the data piece itself already records the information about the storage node where it resides, the partition also carries the relevant information of the storage node when the data piece is mapped to it.
- Step S205 Assign the partition to the storage node where the data piece corresponding to the partition is located.
- each partition is mapped from a data piece and carries the storage information corresponding to that data piece, so it can be assigned directly to the corresponding storage node according to the storage information.
- Step S206 Calling a conversion operator, and generating, at the storage node, a calculation task according to the data of the partition.
- the calculation task data sheet of the data piece can be obtained according to the information about the storage node of the data piece and the target data information recorded in the data piece.
- Step S207 Calling an action operator to calculate the calculation task.
- the calculation task has already been generated on each storage node; each storage node's calculation task uses the calculation task data sheet to call the data on that node related to the task and performs the calculation on it.
- Step S208 Receive a processing result of the calculation task returned by each storage node.
- the processing results of the computing tasks of each storage node need to be collected, and each storage node itself may also cache the processing results for iterative use.
- before the computing task is assigned, the table name of the target data table, the starting position of the target data in the target data table, and the ending position of the target data in the target data table are obtained. The structure of the target data is shown in FIG. 3, where TableDes represents the table name of the target data table, Lx represents the starting position of the target data, and Ly represents the ending position of the target data.
- the invalid data in the data slice is removed to obtain more precise partitions P1, P2, P3, ..., Pi.
- the start and end of the partition also serves as a parameter to create a partition of the elastic distributed data set.
- the relationship between the data slice and the partition is shown in Figure 3.
- the data in the L1-Lx and Ly-Ln intervals shown in Fig. 3 is invalid data.
- the HBase Region data is mapped to the partition of the elastic distributed data set, and each related Region generates a partition, and a corresponding computing task will be generated.
- the node Ni where Regioni resides is obtained from the Regioni information; Regioni corresponds to partition Pi of the elastic data set, and Ni is designated as the preferred processing node for Pi.
- the conversion operator of the storage node is called to generate the calculation tasks Task1, Task2, Task3...Taski of all the partitions P1, P2, P3...Pi in the elastic distributed data set.
- the Task is generated according to the partition and is in the same storage node as the corresponding partition data. Therefore, the storage node that processes the Regioni data is the storage node where the Regioni is located.
- the correspondence between HBase data piece Regioni, elastic data set partition Pi (data slice Pi), storage node Ni, and job Taski is shown in Fig. 4.
- on storage node Ni, Pi reads the Regioni data held in the memory of that node, Taski processes the data of partition Pi, and node Ni executes the job Taski; the result Ri is returned, and intermediate results can be cached for iterative use.
- the action operator is called, and the jobs Task1, Task2, Task3, ..., Taski are executed to carry out the different services. The result data of all jobs is then aggregated through the elastic distributed data set to complete the calculation task.
- the storage node calculates the data pointed to in the computing task, and only local memory data needs to be called during the calculation, which reduces the IO redundancy and time consumption caused by forwarding data multiple times.
- the following is an embodiment of a task distribution device for distributed data calculation provided in a specific embodiment of the present application.
- the embodiment of the task assignment device is implemented based on the embodiment of the task assignment method described above; for anything not fully described in the device embodiment, please refer to the above embodiment of the task assignment method.
- FIG. 5 is a structural block diagram of a first embodiment of a distributed data computing task distribution apparatus according to an embodiment of the present application.
- the task distribution apparatus includes:
- a target data confirming unit 310 configured to receive a storage parameter of the target data calculated in the distributed data
- the target data mapping unit 320 is configured to map the data piece of the target data to the elastic distributed data set according to the storage parameter, where each data piece corresponds to one partition of the elastic distributed data set respectively;
- the calculation task assignment unit 330 is configured to assign a partition to the storage node to generate a calculation task for calculation.
- any one of the storage nodes can distribute the computing tasks, and any other client with permission can also distribute computing tasks for selected data according to the user's needs. Because the client does not transmit or access the data itself, essentially any terminal device that can access the distributed database over the network can implement the solution, which allows wider use of the database.
- the above units work together: the data storage information in the distributed database is used as a parameter of the computing task, and the computing task is allocated to the storage node corresponding to the data storage information. The storage node calculates the data pointed to in the calculation task and only needs to call local memory data during the calculation, which reduces the IO redundancy and time consumption caused by forwarding data multiple times.
- FIG. 6 is a structural block diagram of a second embodiment of a task distribution apparatus for distributed data calculation according to an embodiment of the present application.
- the task distribution apparatus includes:
- a target data confirming unit 310 configured to receive a storage parameter of the target data calculated in the distributed data
- the target data mapping unit 320 is configured to map the data piece of the target data to the elastic distributed data set according to the storage parameter, where each data piece corresponds to one partition of the elastic distributed data set respectively;
- the calculation task assignment unit 330 is configured to assign a partition to the storage node to generate a calculation task for calculation.
- the target data mapping unit 320 includes:
- the data slice determining module 321 is configured to determine, according to the storage parameter, whether data in the data slice belongs to target data.
- the first mapping module 322 is configured to: if the data in the data piece belongs to the target data, map the data piece to a partition of the elastic distributed data set;
- the second mapping module 323 is configured to map the portion of the data slice that belongs to the target data to a partition of the elastic distributed data set if the data in the data slice does not all belong to the target data.
- the computing task allocation unit 330 includes:
- a partition specifying module 331, configured to allocate a partition to a storage node where the data piece corresponding to the partition is located;
- the calculation task generation module 332 is configured to invoke a conversion operator, and generate a calculation task according to the data of the partition at the storage node;
- the calculation task execution module 333 is configured to invoke an action operator to calculate the calculation task.
- the result receiving unit 340 is configured to receive a processing result of the computing task returned by each storage node.
- the database of the distributed data is HBase.
- the above units and modules work together: the data storage information in the distributed database is used as a parameter of the computing task, and the computing task is allocated to the storage node corresponding to the data storage information. The storage node calculates the data pointed to in the computing task and only needs to call local memory data during the calculation, which reduces the IO redundancy and time consumption caused by forwarding data multiple times.
- An embodiment of the present application provides an electronic device, including: a housing, a processor, a memory, a circuit board, and a power supply circuit, wherein the circuit board is disposed inside a space enclosed by the housing, and the processor and the memory are disposed on the circuit board; the power supply circuit is configured to supply power to each circuit or device of the electronic device; the memory is configured to store executable program code; and the processor reads the executable program code stored in the memory and runs the program corresponding to the executable program code in order to perform the following steps:
- the data storage information in the distributed database is used as a parameter of the computing task, and the computing task is then allocated to the storage node corresponding to the data storage information; the storage node calculates the data pointed to in the computing task.
- only local memory data is called during the calculation process, which reduces the IO redundancy and time consumption caused by forwarding data multiple times.
- the electronic device exists in a variety of forms including, but not limited to:
- Mobile communication devices These devices are characterized by mobile communication functions and are mainly aimed at providing voice and data communication.
- Such terminals include: smart phones (such as iPhone), multimedia phones, functional phones, and low-end phones.
- Ultra-mobile personal computer equipment This type of equipment belongs to the category of personal computers, has computing and processing functions, and generally has mobile Internet access.
- Such terminals include: PDAs, MIDs, and UMPC devices, such as the iPad.
- Portable entertainment devices: these devices can display and play multimedia content. Such devices include audio and video players (such as the iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
- the server consists of a processor, a hard disk, a memory, a system bus, etc.
- the server is similar in architecture to a general-purpose computer, but because it must provide highly reliable services, it has higher requirements in terms of processing power, stability, reliability, security, scalability, and manageability.
- An embodiment of the present application provides an application program for performing a task allocation method for distributed data calculation provided by an embodiment of the present application at runtime.
- the task allocation method for distributed data computing includes:
- the data piece of the target data is mapped to the elastic distributed data set according to the storage parameter.
- Each piece of data corresponds to a partition in the elastic distributed data set, respectively:
- the data piece is mapped to a partition of the elastic distributed data set
- the portion of the data slice belonging to the target data is mapped to a partition of the elastic distributed data set.
- the partition is assigned to the storage node to generate a calculation task for calculation, including:
- the action operator is called to calculate the calculation task.
- in the task allocation method for distributed data calculation performed by the application program at runtime, after the partition is assigned to the storage node to generate a computing task for calculation, the method further includes:
- the database of the distributed data is HBase.
- the application program uses the data storage information in the distributed database as a parameter of the computing task, and then allocates the computing task to the storage node corresponding to the data storage information; the storage node calculates the data pointed to by the computing task.
- only local memory data is called during the calculation process, which reduces the IO redundancy and time consumption caused by forwarding data multiple times.
- the embodiment of the present application provides a storage medium for storing an application program, which is used to perform a task allocation method for distributed data calculation provided by the embodiment of the present application.
- the task allocation method for distributed data computing includes:
- mapping the data piece of the target data to the elastic distributed data set according to the storage parameter, each data piece corresponding to a partition in the elastic distributed data set, comprises:
- the data piece is mapped to a partition of the elastic distributed data set
- the portion of the data slice belonging to the target data is mapped to a partition of the elastic distributed data set.
- the assigning the partition to the storage node to generate a calculation task for calculation includes:
- the action operator is called to calculate the calculation task.
- the method further includes:
- the database of the distributed data is HBase.
- the storage medium is used to store the application program, and the application program allocates the computing task to the storage node corresponding to the data storage information by using the data storage information in the distributed database as a parameter of the computing task.
- the storage node calculates the data pointed to by the computing task, and only needs to call the local memory data during the calculation process, which reduces the IO redundancy and time consuming caused by multiple data forwarding.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
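The present application discloses a task allocation method and a task allocation apparatus for distributed data calculation. The task allocation method includes: receiving a storage parameter of the target data to be calculated in the distributed data; mapping the data pieces of the target data to an elastic distributed data set according to the storage parameter, where each data piece corresponds to one partition of the elastic distributed data set; and assigning each partition to a storage node to generate a calculation task for calculation. Because the data storage information in the distributed database is used to allocate each calculation task to the storage node that holds the corresponding data, only local memory data needs to be called during the calculation, which reduces the IO redundancy and time consumption caused by forwarding data multiple times.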
Description
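This application claims priority to Chinese Patent Application No. 201510472782.3, filed with the Chinese Patent Office on August 5, 2015 and entitled "Task allocation method and task allocation apparatus for distributed data calculation", which is incorporated herein by reference in its entirety.
The present application relates to the field of video surveillance technologies, and in particular to a task allocation method and a task allocation apparatus for distributed data calculation.
At present, the elastic distributed data set in Spark is initialized in two main ways: taking data directly from a collection and storing it in an RDD (Resilient Distributed Datasets, the elastic distributed data set); or reading text files, sequence files, and the like from a local or distributed file system (HDFS, S3, etc.). For data in HBase, the usual approach is to pull the data through the HBase client, convert it, save it into an RDD, distribute it across multiple slices, and then perform distributed calculation with the RDD operators. The data therefore has to cross the network repeatedly, which causes IO redundancy and increases the time consumed.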
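Summary of the Invention
The purpose of the present application is to provide a task allocation method and a task allocation apparatus for distributed data calculation that use the data storage information in a distributed database as a parameter of the calculation task and then allocate the calculation task to the storage node corresponding to the data storage information. The storage node calculates the data pointed to in the calculation task, and only local memory data needs to be called during the calculation, which reduces the IO redundancy and time consumption caused by forwarding data multiple times.
To achieve this purpose, the following technical solutions are adopted:
In one aspect, a task allocation method for distributed data calculation is adopted, including:
receiving a storage parameter of the target data to be calculated in the distributed data;
mapping the data pieces of the target data to an elastic distributed data set according to the storage parameter, where each data piece corresponds to one partition of the elastic distributed data set;
assigning the partition to a storage node to generate a calculation task for calculation.
In another aspect, a task allocation apparatus for distributed data calculation is adopted, including:
a target data confirming unit, configured to receive a storage parameter of the target data to be calculated in the distributed data;
a target data mapping unit, configured to map the data pieces of the target data to an elastic distributed data set according to the storage parameter, where each data piece corresponds to one partition of the elastic distributed data set;
a calculation task allocation unit, configured to assign the partition to a storage node to generate a calculation task for calculation.
The present application provides an electronic device, including: a housing, a processor, a memory, a circuit board, and a power supply circuit, wherein the circuit board is disposed inside the space enclosed by the housing, and the processor and the memory are disposed on the circuit board; the power supply circuit is configured to supply power to each circuit or device of the electronic device; the memory is configured to store executable program code; and the processor runs the program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform the task allocation method for distributed data calculation.
The present application also provides an application program that performs the task allocation method for distributed data calculation at runtime.
The present application also provides a storage medium for storing an application program that performs the task allocation method for distributed data calculation.
The beneficial effects of the present application are as follows: the data storage information in the distributed database is used as a parameter of the calculation task, and the calculation task is then allocated to the storage node corresponding to the data storage information. The storage node calculates the data pointed to in the calculation task, and only local memory data needs to be called during the calculation, which reduces the IO redundancy and time consumption caused by forwarding data multiple times.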
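FIG. 1 is a method flowchart of a first embodiment of a task allocation method for distributed data calculation provided in a specific implementation of the present application;
FIG. 2 is a method flowchart of a second embodiment of a task allocation method for distributed data calculation provided in a specific implementation of the present application;
FIG. 3 is a schematic structural diagram of the data in the second embodiment of a task allocation method for distributed data calculation provided in a specific implementation of the present application;
FIG. 4 is a schematic diagram of the calculation tasks in the second embodiment of a task allocation method for distributed data calculation provided in a specific implementation of the present application;
FIG. 5 is a structural block diagram of a first embodiment of a task allocation apparatus for distributed data calculation provided in a specific implementation of the present application;
FIG. 6 is a structural block diagram of a second embodiment of a task allocation apparatus for distributed data calculation provided in a specific implementation of the present application.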
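To make the purpose, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to specific implementations and the accompanying drawings. It should be understood that these descriptions are only exemplary and are not intended to limit the scope of the application. In addition, descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concepts of the application.
Please refer to FIG. 1, which is a method flowchart of a first embodiment of a task allocation method for distributed data calculation provided in a specific implementation of the present application. The task allocation method in this embodiment is mainly used for parallel calculation over large amounts of data in a distributed database, improving calculation efficiency. As shown in the figure, the task allocation method includes:
Step S101: Receive a storage parameter of the target data to be calculated in the distributed data.
The basic idea of a distributed database is to spread the data of an originally centralized database across multiple data storage nodes connected through a network, in order to obtain larger storage capacity and a higher volume of concurrent access. Distributed database systems usually use smaller computer systems. Each computer can be placed in a separate location, may hold a full or partial copy of the DBMS (Database Management System), and has its own local database. Many computers located in different places are connected to each other through the network and together form a complete, global database that is logically centralized and physically distributed.
The target data to be calculated in the distributed data is only one or more data tables on a storage node, and more specifically a segment of data within a data table. During task allocation, the node that performs the allocation only needs to read the start and end positions of the data to be processed in the data table; it does not need to gather all of the data onto itself. Gathering the data itself onto one node may require several TB of data transmission, whereas gathering only the storage parameters of the target data may require no more than 5M of data transmission, so the bulk data transfer involved in centralizing the data is avoided.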
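As a rough illustration of why sending storage parameters is cheap, the sketch below models a storage parameter as a small record and measures its encoded size. The `StorageParams` class, its field names, and the sample table name and row keys are hypothetical and do not come from the application; they only stand in for "table name plus start and end positions".

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StorageParams:
    """Hypothetical descriptor: where the target data lives, not the data itself."""
    table_name: str  # name of the target data table
    start_key: str   # start position of the target data in the table
    end_key: str     # end position of the target data (treated as exclusive here)


def encoded_size(params: StorageParams) -> int:
    """Number of bytes needed to send the descriptor as UTF-8 JSON."""
    return len(json.dumps(asdict(params)).encode("utf-8"))


# Illustrative values only.
params = StorageParams("camera_events", "row-000100", "row-900000")
size = encoded_size(params)
```

The descriptor stays the same size no matter how many rows lie between the two keys, which is the property the paragraph above relies on.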
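Step S102: Map the data pieces of the target data to an elastic distributed data set according to the storage parameter, where each data piece corresponds to one partition of the elastic distributed data set.
In general, the data to be processed is a run of consecutive records in the data table, and this run of consecutive records is spread across different storage nodes. To make data processing convenient and avoid data transmission, the scheme relies on the fact that, in a distributed database, the data in each data piece is located on the same storage node. In this scheme, the data piece is therefore used as the basic unit of data for processing.
Step S103: Assign the partition to a storage node to generate a calculation task for calculation.
When a calculation task is sent to a storage node for processing, the data to be processed is not randomly dispatched to storage nodes as in the prior art. Instead, the task is sent, according to the storage information, to the storage node that holds the data piece, and what is sent is not the bulk data itself but the storage parameters of the data. After receiving the calculation task, each storage node reads the data according to the table name of the target data table where the data piece is located and the start and end positions of the data piece, and performs the calculation task in the specified calculation manner. Throughout the calculation, all data is in effect read locally, which reduces data IO redundancy and avoids the time it would otherwise consume.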
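The node-side behaviour described above can be sketched as follows: a task carries only a table name, a key range, and a calculation, and the node resolves the range against data it already holds. `Task`, `StorageNode`, and the in-memory dictionary layout are assumptions made for this illustration, not structures defined in the application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Task:
    """Hypothetical calculation task: storage parameters plus a calculation."""
    table_name: str
    start_key: str                       # inclusive
    end_key: str                         # exclusive
    compute: Callable[[List[int]], int]  # the calculation to apply to the rows


class StorageNode:
    """Toy storage node holding its tables in local memory."""

    def __init__(self, name: str, tables: Dict[str, Dict[str, int]]):
        self.name = name
        self.tables = tables  # table name -> {row key -> value}

    def run(self, task: Task) -> int:
        # Read only local rows whose key falls inside the task's range.
        table = self.tables[task.table_name]
        rows = [value for key, value in sorted(table.items())
                if task.start_key <= key < task.end_key]
        return task.compute(rows)


node = StorageNode("N1", {"t": {"a": 1, "b": 2, "c": 3, "d": 4}})
result = node.run(Task("t", "b", "d", sum))  # rows "b" and "c" are selected
```

No row data appears in the `Task` itself; only the node that owns the rows ever touches them.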
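In summary, the data storage information in the distributed database is used as a parameter of the calculation task, and the calculation task is then allocated to the storage node corresponding to the data storage information. The storage node calculates the data pointed to in the calculation task, and only local memory data needs to be called during the calculation, which reduces the IO redundancy and time consumption caused by forwarding data multiple times.
Please refer to FIG. 2, which is a method flowchart of a second embodiment of a task allocation method for distributed data calculation provided in a specific implementation of the present application. As shown in the figure, the method includes:
Step S201: Receive a storage parameter of the target data to be calculated in the distributed data.
The database of the distributed data is HBase.
HBase is a distributed, column-oriented open source database. Unlike a general relational database, it is suited to storing unstructured data; another difference is that HBase uses a column-based rather than a row-based model.
The HBase-based solution in this embodiment is equivalent to a custom elastic data set: the elastic data set is divided according to the data partitioning rule of HBase and the target data range input by the user, the data pieces of the HBase data table are mapped to the partitions of the elastic data set, and a processing node is specified for the data of each partition. As a result, when a parallel computing framework (for example, Spark) is used for distributed calculation on the data in an HBase data table, the data processed by the tasks in each Spark working node is the HBase data in the memory of that node, which ultimately achieves distributed in-memory parallel computing over HBase data.
Step S202: Determine, according to the storage parameter, whether all of the data in the data piece belongs to the target data.
In a distributed database, as the records in a data table keep increasing, the data table splits into multiple data pieces, and the data of each data piece is stored on one storage node. In HBase specifically, when a data table grows as the number of records increases, it is gradually split into multiple regions. A region is represented by [startkey, endkey), where startkey and endkey represent the start position and the end position of the region, respectively. Different regions are assigned by the Master to the corresponding RegionServer for management, and the storage information is equivalent to the information of the RegionServer.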
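To make the half-open [startkey, endkey) convention concrete, here is a minimal sketch that selects the regions intersecting a target range [Lx, Ly). The `Region` record and the sample keys and server names are hypothetical, and the sketch does not use the HBase client API; for simplicity it also assumes every region has non-empty boundary keys.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    """Hypothetical view of a region: a key range plus the server that manages it."""
    start_key: str  # inclusive
    end_key: str    # exclusive
    server: str     # RegionServer holding the region


def overlapping_regions(regions: List[Region], lx: str, ly: str) -> List[Region]:
    """Return the regions whose [start_key, end_key) intersects [lx, ly)."""
    return [r for r in regions if r.start_key < ly and lx < r.end_key]


regions = [
    Region("a", "g", "rs1"),
    Region("g", "n", "rs2"),
    Region("n", "t", "rs3"),
    Region("t", "z", "rs4"),
]
hits = overlapping_regions(regions, "e", "p")  # target range [e, p)
```

Because both ranges are half-open, two intervals intersect exactly when each one starts before the other ends, which is the single comparison used in `overlapping_regions`.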
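In this solution, the target data is associated with at least two data slices. If all the target data is located on the same storage node, the calculation task can simply be sent directly to that node, and no parallel calculation is needed.
Because the data in a single data slice is not necessarily all target data, the data in each slice has to be calibrated before the actual calculation so that only the data to be processed is mapped into the RDD. Each partition of the RDD corresponds to one data slice, and all data in the RDD is target data to be processed.
Step S203: if all the data in the data slice belongs to the target data, map the data slice to one partition of the RDD.
Step S204: if not all the data in the data slice belongs to the target data, map the part of the data slice that belongs to the target data to one partition of the RDD.
Because the data slice itself already records information about the storage node that holds it, the partition carries the same storage node information once the slice is mapped to it.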
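Steps S202 to S204 can be sketched as a single clipping function. This is a sketch under assumptions: the `Region` and `Partition` types and the sample keys are hypothetical, and both the slice and the target range are treated as half-open intervals.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    start: str  # inclusive start key of the data slice
    end: str    # exclusive end key of the data slice
    node: str   # storage node that holds the slice


class Partition(NamedTuple):
    start: str
    end: str
    node: str   # inherited from the slice, so the partition knows where its data lives


def to_partition(region: Region, target_start: str, target_end: str) -> Optional[Partition]:
    """Map a slice to a partition, keeping only the part inside the target range."""
    start = max(region.start, target_start)
    end = min(region.end, target_end)
    if start >= end:
        return None  # the slice holds no target data
    return Partition(start, end, region.node)


regions = [Region("a", "f", "N1"), Region("f", "m", "N2"),
           Region("m", "t", "N3"), Region("t", "z", "N4")]
mapped = [to_partition(r, "d", "p") for r in regions]
partitions = [p for p in mapped if p is not None]
```

A slice that lies entirely inside the target range passes through unchanged (the S203 case), while a slice that straddles a boundary is trimmed (the S204 case). In both cases the node name travels with the partition.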
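Step S205: assign each partition to the storage node where the data slice corresponding to that partition is located.
Each partition is mapped from a data slice and carries that slice's storage information, so it can be assigned directly to the corresponding storage node according to that information.
Step S206: invoke a transformation operator to generate a calculation task on the storage node according to the data of the partition.
From the storage node information recorded in the data slice itself and the target data information, the calculation task data sheet for that slice can be obtained.
Step S207: invoke an action operator to perform the calculation task.
The calculation tasks have already been generated on each storage node. Each storage node's task uses the calculation task data sheet to retrieve the relevant data on that node and performs the calculation on it.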
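In Spark, transformations such as `map` are lazy and only describe the work, while actions such as `collect` trigger execution. The pure-Python toy below imitates that split to show why S206 only generates tasks and S207 actually runs them. `LazyDataset` is a hypothetical class written for this illustration, not the Spark API.

```python
class LazyDataset:
    """Toy dataset: transformations are recorded, actions execute them (illustrative only)."""

    def __init__(self, partitions, ops=()):
        self.partitions = partitions
        self.ops = tuple(ops)

    def map(self, fn):
        """Transformation: return a new dataset with fn appended; nothing runs yet."""
        return LazyDataset(self.partitions, self.ops + (fn,))

    def collect(self):
        """Action: run every recorded function over every partition and gather the results."""
        out = []
        for part in self.partitions:
            for item in part:
                for fn in self.ops:
                    item = fn(item)
                out.append(item)
        return out


calls = []


def double(x):
    calls.append(x)  # record when the function really executes
    return 2 * x


ds = LazyDataset([[1, 2], [3]]).map(double)  # no call to double() has happened yet
calls_before_action = list(calls)
values = ds.collect()                        # the action triggers the calculation
```

Deferring execution until the action lets the framework decide where each partition's task runs before any data is touched.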
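Step S208: receive the processing results of the calculation tasks returned by each storage node.
The results produced by the storage nodes need to be collected; each storage node may also cache its results for use in iteration.
The task allocation process in HBase is further described below with reference to Fig. 3 and Fig. 4.
Before the calculation tasks are allocated, the name of the target data table and the start and end positions of the target data within that table are obtained. The structure of the target data is shown in Fig. 3, where TableDes denotes the name of the target data table, Lx the start position of the target data, and Ly its end position. The at least two data slices associated with the target data are then obtained, namely the slices Region1, Region2, Region3, …, Regioni in Fig. 3. The start and end positions of each slice Regioni are Lm and Ln respectively (m = 2i-1, n = 2i), that is, the start and end positions of that slice in the HBase database. The invalid data in the slices is then removed to obtain the more precise partitions P1, P2, P3, …, Pi, and the start and end positions of each partition are also used as parameters when creating the partitions of the RDD. The relationship between data slices and partitions is shown in Fig. 3; the data in the intervals L1 to Lx and Ly to Ln in Fig. 3 is the invalid data.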
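The trimming of invalid data described above can be written compactly. The formula below is one way to express it, assuming that both the regions and the target range are half-open intervals (the source states this only for HBase regions) and writing k for the number of relevant regions, a symbol introduced here for convenience:

```latex
P_i = \bigl[\,\max(L_{2i-1},\, L_x),\ \min(L_{2i},\, L_y)\,\bigr), \qquad i = 1, \dots, k
```

For the first region this gives $[L_x, L_2)$, for the last region $[L_{2k-1}, L_y)$, and every region in between is kept whole, which matches the invalid intervals L1 to Lx and Ly to Ln identified in Fig. 3.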
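The Region data of HBase is mapped to partitions of the RDD: each relevant Region yields one partition, and one calculation task is generated for it. The node Ni where Regioni resides is obtained from the information of Regioni; Regioni corresponds to partition Pi of the resilient dataset, and Ni is specified as the preferred processing node for Pi. The transformation operator of the storage node is invoked to generate the calculation tasks Task1, Task2, Task3, …, Taski for all partitions P1, P2, P3, …, Pi of the RDD. Each Task is generated from a partition and resides on the same storage node as that partition's data, which guarantees that the storage node processing the data of Regioni is the node where Regioni resides. The correspondence among the HBase data slice Regioni, the partition Pi of the resilient dataset (data slice Pi), the storage node Ni and the job Taski is shown in Fig. 4.
On storage node Ni, Pi reads the Regioni data from the node's own memory, Taski processes the data of partition Pi, and node Ni executes job Taski and returns the result Ri; intermediate results can also be cached for use in iteration.
In addition, iterating on cached intermediate results avoids recalculating from scratch and can effectively improve calculation efficiency.
The action operator is invoked to execute the jobs Task1, Task2, Task3, …, Taski and carry out the different operations. The result data of all jobs is aggregated through the RDD, completing the calculation task.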
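The per-node execution, result collection and caching described above can be illustrated with a small model. It is a sketch only: the `Node` class, the task identifiers and the sample rows are hypothetical, and a plain dictionary stands in for whatever cache the computing framework provides.

```python
class Node:
    """Toy storage node that caches task results for reuse in later iterations."""

    def __init__(self, name, local_rows):
        self.name = name
        self.local_rows = local_rows
        self.cache = {}
        self.computations = 0  # how many times a task was really computed

    def run(self, task_id, compute):
        """Return the cached result for task_id, computing it from local rows on first use."""
        if task_id not in self.cache:
            self.computations += 1
            self.cache[task_id] = compute(self.local_rows)
        return self.cache[task_id]


nodes = [Node("N1", [1, 2]), Node("N2", [3, 4]), Node("N3", [5])]

# First pass: every node computes its partial result R_i from local data only.
partials = [n.run(f"Task{i}", sum) for i, n in enumerate(nodes, start=1)]
total = sum(partials)  # the allocating side aggregates the returned results

# Second pass (an iteration): results come from the cache, nothing is recomputed.
again = [n.run(f"Task{i}", sum) for i, n in enumerate(nodes, start=1)]
```

The `computations` counter makes the benefit of caching visible: it stays at one per node even though each task is requested twice.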
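Overall, the larger the amount of data to be processed, the more pronounced the technical effect of this solution. Replacing the transmission of 1 MB of data with the transmission of its storage information yields only a modest reduction in redundant I/O and delay, whereas replacing the transmission of 1 GB or even 1 TB of data with the transmission of its storage information yields a very substantial one.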
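A back-of-the-envelope calculation makes the scaling concrete. The figures below are assumptions chosen purely for illustration and do not come from the source: a slice size of 1 GiB, 64 bytes of storage information per slice, and binary units for the data sizes.

```python
def saved_bytes(data_bytes, slice_bytes=2**30, descriptor_bytes=64):
    """Bytes not transferred when only per-slice storage information is sent.

    slice_bytes and descriptor_bytes are hypothetical values, not taken from the source.
    """
    slices = max(1, -(-data_bytes // slice_bytes))  # ceiling division, at least one slice
    return data_bytes - slices * descriptor_bytes


MiB, GiB, TiB = 2**20, 2**30, 2**40
savings = {label: saved_bytes(size)
           for label, size in (("1 MiB", MiB), ("1 GiB", GiB), ("1 TiB", TiB))}
```

Under these assumptions the storage information grows from 64 bytes to 64 KiB as the data grows from 1 MiB to 1 TiB, so almost the entire payload is saved at every scale and the absolute saving grows with the data volume.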
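In summary, the data storage information in the distributed database is used as a parameter of the calculation task, the calculation task is then allocated to the storage node corresponding to that storage information, and the storage node performs the calculation on the data the task points to. Only local in-memory data is accessed during the calculation, which reduces the redundant I/O and delay caused by forwarding data multiple times.
The following are embodiments of the task allocation apparatus for distributed data calculation provided in the detailed description of the present application. The apparatus embodiments are implemented on the basis of the method embodiments above; for anything not fully described in the apparatus embodiments, refer to the method embodiments above.
Refer to Fig. 5, which is a structural block diagram of the first embodiment of the task allocation apparatus for distributed data calculation provided in the detailed description of the present application. As shown in the figure, the task allocation apparatus includes:
a target data confirmation unit 310, configured to receive the storage parameters of the target data to be calculated in the distributed data;
a target data mapping unit 320, configured to map the data slices of the target data into an RDD according to the storage parameters, each data slice corresponding to one partition of the RDD;
a calculation task allocation unit 330, configured to assign each partition to a storage node and generate a calculation task to perform the calculation.
In this solution, any storage node can dispatch calculation tasks, and any other authorized client can select data and dispatch calculation tasks according to the user's needs. Because the client itself is not involved in transferring or accessing the data, essentially any terminal device that can reach the distributed database over the network can implement the solution, which allows wider use of the database.
In summary, through the cooperation of the units above, the data storage information in the distributed database is used as a parameter of the calculation task, the calculation task is then allocated to the storage node corresponding to that storage information, and the storage node performs the calculation on the data the task points to. Only local in-memory data is accessed during the calculation, which reduces the redundant I/O and delay caused by forwarding data multiple times.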
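Refer to Fig. 6, which is a structural block diagram of the second embodiment of the task allocation apparatus for distributed data calculation provided in the detailed description of the present application. As shown in the figure, the task allocation apparatus includes:
a target data confirmation unit 310, configured to receive the storage parameters of the target data to be calculated in the distributed data;
a target data mapping unit 320, configured to map the data slices of the target data into an RDD according to the storage parameters, each data slice corresponding to one partition of the RDD;
a calculation task allocation unit 330, configured to assign each partition to a storage node and generate a calculation task to perform the calculation.
The target data mapping unit 320 includes:
a data slice judgment module 321, configured to determine, according to the storage parameters, whether all the data in the data slice belongs to the target data;
a first mapping module 322, configured to map the data slice to one partition of the RDD if all the data in the data slice belongs to the target data;
a second mapping module 323, configured to map the part of the data slice that belongs to the target data to one partition of the RDD if not all the data in the data slice belongs to the target data.
The calculation task allocation unit 330 includes:
a partition assignment module 331, configured to assign each partition to the storage node where the data slice corresponding to that partition is located;
a calculation task generation module 332, configured to invoke a transformation operator to generate a calculation task on the storage node according to the data of the partition;
a calculation task execution module 333, configured to invoke an action operator to perform the calculation task.
The apparatus further includes:
a result receiving unit 340, configured to receive the processing results of the calculation tasks returned by each storage node.
The database of the distributed data is HBase.
In summary, through the cooperation of the units and modules above, the data storage information in the distributed database is used as a parameter of the calculation task, the calculation task is then allocated to the storage node corresponding to that storage information, and the storage node performs the calculation on the data the task points to. Only local in-memory data is accessed during the calculation, which reduces the redundant I/O and delay caused by forwarding data multiple times.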
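An embodiment of the present application provides an electronic device comprising a housing, a processor, a memory, a circuit board and a power supply circuit, wherein the circuit board is arranged inside the space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power supply circuit is configured to supply power to the circuits or components of the electronic device; the memory is configured to store executable program code; and the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform the following steps:
receiving the storage parameters of the target data to be calculated in the distributed data;
mapping the data slices of the target data into an RDD according to the storage parameters, each data slice corresponding to one partition of the RDD;
assigning each partition to a storage node and generating a calculation task to perform the calculation.
For the specific way the processor performs these steps, and for the further steps the processor performs by running the executable program code, refer to the description of the embodiments shown in Figs. 1 to 6 of the present application; they are not repeated here.
As can be seen from the above, in the embodiments of the present application the data storage information in the distributed database is used as a parameter of the calculation task, the calculation task is then allocated to the storage node corresponding to that storage information, and the storage node performs the calculation on the data the task points to. Only local in-memory data is accessed during the calculation, which reduces the redundant I/O and delay caused by forwarding data multiple times.
The electronic device exists in various forms, including but not limited to:
(1) Mobile communication devices: these devices are characterized by mobile communication capability and have voice and data communication as their main purpose. Such terminals include smartphones (for example, iPhone), multimedia phones, feature phones and low-end phones.
(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing capability, and generally also support mobile Internet access. Such terminals include PDA, MID and UMPC devices, for example iPad.
(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (for example, iPod), handheld game consoles, e-book readers, smart toys and portable in-vehicle navigation devices.
(4) Servers: devices that provide computing services. A server comprises a processor, hard disk, memory, system bus and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, the requirements on processing capability, stability, reliability, security, scalability and manageability are higher.
(5) Other electronic apparatuses with data interaction capability.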
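An embodiment of the present application provides an application program which, when run, performs the task allocation method for distributed data calculation provided by the embodiments of the present application. The task allocation method for distributed data calculation includes:
receiving the storage parameters of the target data to be calculated in the distributed data;
mapping the data slices of the target data into an RDD according to the storage parameters, each data slice corresponding to one partition of the RDD;
assigning each partition to a storage node and generating a calculation task to perform the calculation.
In one implementation of the present application, in the task allocation method performed when the application program runs, mapping the data slices of the target data into an RDD according to the storage parameters, each data slice corresponding to one partition of the RDD, includes:
determining, according to the storage parameters, whether all the data in the data slice belongs to the target data;
if all the data in the data slice belongs to the target data, mapping the data slice to one partition of the RDD;
if not all the data in the data slice belongs to the target data, mapping the part of the data slice that belongs to the target data to one partition of the RDD.
In one implementation of the present application, in the task allocation method performed when the application program runs, assigning each partition to a storage node and generating a calculation task to perform the calculation includes:
assigning each partition to the storage node where the data slice corresponding to that partition is located;
invoking a transformation operator to generate a calculation task on the storage node according to the data of the partition;
invoking an action operator to perform the calculation task.
In one implementation of the present application, in the task allocation method performed when the application program runs, after assigning each partition to a storage node and generating a calculation task to perform the calculation, the method further includes:
receiving the processing results of the calculation tasks returned by the storage nodes.
In one implementation of the present application, in the task allocation method performed when the application program runs, the database of the distributed data is HBase.
In the embodiments of the present application, the application program uses the data storage information in the distributed database as a parameter of the calculation task, then allocates the calculation task to the storage node corresponding to that storage information, and the storage node performs the calculation on the data the task points to. Only local in-memory data is accessed during the calculation, which reduces the redundant I/O and delay caused by forwarding data multiple times.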
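An embodiment of the present application provides a storage medium for storing an application program, the application program being used to perform the task allocation method for distributed data calculation provided by the embodiments of the present application. The task allocation method for distributed data calculation includes:
receiving the storage parameters of the target data to be calculated in the distributed data;
mapping the data slices of the target data into an RDD according to the storage parameters, each data slice corresponding to one partition of the RDD;
assigning each partition to a storage node and generating a calculation task to perform the calculation.
In one implementation of the present application, in the task allocation method performed by the application program stored on the storage medium, mapping the data slices of the target data into an RDD according to the storage parameters, each data slice corresponding to one partition of the RDD, includes:
determining, according to the storage parameters, whether all the data in the data slice belongs to the target data;
if all the data in the data slice belongs to the target data, mapping the data slice to one partition of the RDD;
if not all the data in the data slice belongs to the target data, mapping the part of the data slice that belongs to the target data to one partition of the RDD.
In one implementation of the present application, in the task allocation method performed by the application program stored on the storage medium, assigning each partition to a storage node and generating a calculation task to perform the calculation includes:
assigning each partition to the storage node where the data slice corresponding to that partition is located;
invoking a transformation operator to generate a calculation task on the storage node according to the data of the partition;
invoking an action operator to perform the calculation task.
In one implementation of the present application, in the task allocation method performed by the application program stored on the storage medium, after assigning each partition to a storage node and generating a calculation task to perform the calculation, the method further includes:
receiving the processing results of the calculation tasks returned by the storage nodes.
In one implementation of the present application, in the task allocation method performed by the application program stored on the storage medium, the database of the distributed data is HBase.
In the embodiments of the present application, the storage medium stores the application program described above. The application program uses the data storage information in the distributed database as a parameter of the calculation task, then allocates the calculation task to the storage node corresponding to that storage information, and the storage node performs the calculation on the data the task points to. Only local in-memory data is accessed during the calculation, which reduces the redundant I/O and delay caused by forwarding data multiple times.
It should be understood that the specific embodiments of the present application described above serve only to illustrate or explain the principles of the application and do not limit it. Any modification, equivalent replacement or improvement made without departing from the spirit and scope of the application shall therefore fall within its scope of protection. Furthermore, the appended claims are intended to cover all changes and modifications that fall within the scope and boundaries of the claims, or the equivalents of such scope and boundaries.
Although the embodiments of the present application have been described in detail, it should be understood that various changes, substitutions and alterations can be made to them without departing from the spirit and scope of the application.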
Claims (13)
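- A task allocation method for distributed data calculation, characterized by comprising: receiving storage parameters of target data to be calculated in distributed data; mapping data slices of the target data into a resilient distributed dataset according to the storage parameters, each data slice corresponding to one partition of the resilient distributed dataset; and assigning each partition to a storage node and generating a calculation task to perform the calculation.
- The task allocation method for distributed data calculation according to claim 1, characterized in that mapping the data slices of the target data into the resilient distributed dataset according to the storage parameters, each data slice corresponding to one partition of the resilient distributed dataset, comprises: determining, according to the storage parameters, whether all the data in the data slice belongs to the target data; if all the data in the data slice belongs to the target data, mapping the data slice to one partition of the resilient distributed dataset; and if not all the data in the data slice belongs to the target data, mapping the part of the data slice that belongs to the target data to one partition of the resilient distributed dataset.
- The task allocation method for distributed data calculation according to claim 1, characterized in that assigning each partition to a storage node and generating a calculation task to perform the calculation comprises: assigning each partition to the storage node where the data slice corresponding to that partition is located; invoking a transformation operator to generate a calculation task on the storage node according to the data of the partition; and invoking an action operator to perform the calculation task.
- The task allocation method for distributed data calculation according to claim 1, characterized in that, after assigning each partition to a storage node and generating a calculation task to perform the calculation, the method further comprises: receiving the processing results of the calculation tasks returned by the storage nodes.
- The task allocation method for distributed data calculation according to claim 1, characterized in that the database of the distributed data is HBase.
- A task allocation apparatus for distributed data calculation, characterized by comprising: a target data confirmation unit, configured to receive storage parameters of target data to be calculated in distributed data; a target data mapping unit, configured to map data slices of the target data into a resilient distributed dataset according to the storage parameters, each data slice corresponding to one partition of the resilient distributed dataset; and a calculation task allocation unit, configured to assign each partition to a storage node and generate a calculation task to perform the calculation.
- The task allocation apparatus for distributed data calculation according to claim 6, characterized in that the target data mapping unit comprises: a data slice judgment module, configured to determine, according to the storage parameters, whether all the data in the data slice belongs to the target data; a first mapping module, configured to map the data slice to one partition of the resilient distributed dataset if all the data in the data slice belongs to the target data; and a second mapping module, configured to map the part of the data slice that belongs to the target data to one partition of the resilient distributed dataset if not all the data in the data slice belongs to the target data.
- The task allocation apparatus for distributed data calculation according to claim 6, characterized in that the calculation task allocation unit comprises: a partition assignment module, configured to assign each partition to the storage node where the data slice corresponding to that partition is located; a calculation task generation module, configured to invoke a transformation operator to generate a calculation task on the storage node according to the data of the partition; and a calculation task execution module, configured to invoke an action operator to perform the calculation task.
- The task allocation apparatus for distributed data calculation according to claim 6, characterized by further comprising: a result receiving unit, configured to receive the processing results of the calculation tasks returned by each storage node.
- The task allocation apparatus for distributed data calculation according to claim 6, characterized in that the database of the distributed data is HBase.
- An electronic device, characterized in that the electronic device comprises a housing, a processor, a memory, a circuit board and a power supply circuit, wherein the circuit board is arranged inside the space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power supply circuit is configured to supply power to the circuits or components of the electronic device; the memory is configured to store executable program code; and the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform the task allocation method for distributed data calculation according to any one of claims 1 to 5.
- An application program, characterized in that the application program is used to perform, when run, the task allocation method for distributed data calculation according to any one of claims 1 to 5.
- A storage medium, characterized in that the storage medium is used to store an application program, the application program being used to perform the task allocation method for distributed data calculation according to any one of claims 1 to 5.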
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP16832125.5A EP3333718B1 (en) | 2015-08-05 | 2016-05-25 | Task allocation method and task allocation apparatus for distributed data calculation |
US15/749,999 US11182211B2 (en) | 2015-08-05 | 2016-05-25 | Task allocation method and task allocation apparatus for distributed data calculation |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510472782.3 | 2015-08-05 | ||
CN201510472782.3A CN106445676B (zh) | 2015-08-05 | 2015-08-05 | Task allocation method and task allocation apparatus for distributed data calculation |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017020637A1 true WO2017020637A1 (zh) | 2017-02-09 |
Family
ID=57942390
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2016/083279 WO2017020637A1 (zh) | 2015-08-05 | 2016-05-25 | Task allocation method and task allocation apparatus for distributed data calculation |
Country Status (4)
Country | Link |
---|---|
US (1) | US11182211B2 (zh) |
EP (1) | EP3333718B1 (zh) |
CN (1) | CN106445676B (zh) |
WO (1) | WO2017020637A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109636097A (zh) * | 2018-11-01 | 2019-04-16 | 中车工业研究院有限公司 | Method and apparatus for allocating product design tasks |
CN112084017A (zh) * | 2020-07-30 | 2020-12-15 | 北京聚云科技有限公司 | Memory management method and apparatus, electronic device, and storage medium |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
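CN105183904B (zh) * | 2015-09-30 | 2020-01-10 | 北京金山安全软件有限公司 | Information push method and apparatus, and electronic device |
CN107704320B (zh) * | 2017-05-12 | 2018-08-17 | 贵州白山云科技有限公司 | Task allocation method and system for a distributed system |
CN108932157B (zh) * | 2017-05-22 | 2021-04-30 | 北京京东尚科信息技术有限公司 | Method, system, electronic device and readable medium for distributed processing of tasks |
CN107256158B (zh) * | 2017-06-07 | 2021-06-18 | 广州供电局有限公司 | Method and system for detecting the load reduction amount of a power system |
TWI675335B (zh) * | 2017-06-09 | 2019-10-21 | 宏達國際電子股份有限公司 | Training task optimization system, training task optimization method, and non-transitory computer-readable medium thereof |
CN109428861A (zh) * | 2017-08-29 | 2019-03-05 | 阿里巴巴集团控股有限公司 | Network communication method and device |
CN107679701B (zh) * | 2017-09-08 | 2021-02-05 | 广州供电局有限公司 | Load reduction parallel calculation method and apparatus |
CN107888684A (zh) * | 2017-11-13 | 2018-04-06 | 小草数语(北京)科技有限公司 | Calculation task processing method, apparatus and controller for a distributed system |
CN110109892B (zh) * | 2018-01-25 | 2021-09-10 | 杭州海康威视数字技术股份有限公司 | Data migration method and apparatus, and electronic device |
CN111190949B (zh) * | 2018-11-15 | 2023-09-26 | 杭州海康威视数字技术股份有限公司 | Data storage and processing method, apparatus, device, and medium |
CN110795217B (zh) * | 2019-09-27 | 2022-07-15 | 广东浪潮大数据研究有限公司 | Task allocation method and system based on a resource management platform |
CN110855671B (zh) * | 2019-11-15 | 2022-02-08 | 三星电子(中国)研发中心 | Trusted computing method and system |
CN111090519B (zh) * | 2019-12-05 | 2024-04-09 | 东软集团股份有限公司 | Task execution method and apparatus, storage medium, and electronic device |
CN115551548A (zh) * | 2020-03-23 | 2022-12-30 | 赫德特生物公司 | Compositions and methods for RNA delivery |
CN113672356A (zh) * | 2020-05-13 | 2021-11-19 | 北京三快在线科技有限公司 | Computing resource scheduling method and apparatus, storage medium, and electronic device |
CN112487125B (zh) * | 2020-12-09 | 2022-08-16 | 武汉大学 | Distributed spatial object organization method for spatio-temporal big data computing |
CN112685177A (zh) * | 2020-12-25 | 2021-04-20 | 联想(北京)有限公司 | Task allocation method and apparatus for server nodes |
CN112685438B (zh) * | 2020-12-29 | 2023-03-24 | 杭州海康威视数字技术股份有限公司 | Data processing system, method, apparatus, and storage medium |
CN112965796B (zh) * | 2021-03-01 | 2024-04-09 | 亿企赢网络科技有限公司 | Task scheduling system, method, and apparatus |
CN113626207B (zh) * | 2021-10-12 | 2022-03-08 | 苍穹数码技术股份有限公司 | Map data processing method, apparatus, device, and storage medium |
CN114386384B (zh) * | 2021-12-06 | 2024-03-19 | 鹏城实验室 | Near-duplicate detection method, system, and terminal for large-scale long-text data |
CN114398105A (zh) * | 2022-01-20 | 2022-04-26 | 北京奥星贝斯科技有限公司 | Method and apparatus for a computing engine to load data from a distributed database |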
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
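CN104219279A (zh) * | 2013-06-04 | 2014-12-17 | 国际商业机器公司 | System and method for a modular architecture for extreme-scale distributed processing applications |
CN104360903A (zh) * | 2014-11-18 | 2015-02-18 | 北京美琦华悦通讯科技有限公司 | Method for decoupling task data in a Spark job scheduling system |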
US20150066646A1 (en) * | 2013-08-27 | 2015-03-05 | Yahoo! Inc. | Spark satellite clusters to hadoop data stores |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5813025A (en) * | 1994-08-10 | 1998-09-22 | Unisys Corporation | System and method for providing variable sector-format operation to a disk access system |
US8418181B1 (en) * | 2009-06-02 | 2013-04-09 | Amazon Technologies, Inc. | Managing program execution based on data storage location |
JP5488029B2 (ja) * | 2010-02-19 | 2014-05-14 | 富士通株式会社 | 分散処理システム、分散処理方法、及びプログラム |
US8346845B2 (en) * | 2010-04-14 | 2013-01-01 | International Business Machines Corporation | Distributed solutions for large-scale resource assignment tasks |
KR20120082218A (ko) * | 2011-01-13 | 2012-07-23 | (주)인디링스 | 파티션 정보를 기초로 호스트의 요청에 대한 처리 기법을 적응적으로 결정하는 스토리지 장치 및 상기 스토리지 장치의 동작 방법 |
US9588994B2 (en) * | 2012-03-02 | 2017-03-07 | International Business Machines Corporation | Transferring task execution in a distributed storage and task network |
CN103677752B (zh) * | 2012-09-19 | 2017-02-08 | 腾讯科技(深圳)有限公司 | 基于分布式数据的并发处理方法和系统 |
CN103019853A (zh) * | 2012-11-19 | 2013-04-03 | 北京亿赞普网络技术有限公司 | 一种作业任务的调度方法和装置 |
US9338234B2 (en) * | 2014-04-16 | 2016-05-10 | Microsoft Technology Licensing, Llc | Functional programming in distributed computing |
US9369782B2 (en) * | 2014-09-17 | 2016-06-14 | Neurio Technology Inc. | On-board feature extraction and selection from high frequency electricity consumption data |
- 2015-08-05 CN CN201510472782.3A patent/CN106445676B/zh active Active
- 2016-05-25 US US15/749,999 patent/US11182211B2/en active Active
- 2016-05-25 WO PCT/CN2016/083279 patent/WO2017020637A1/zh active Application Filing
- 2016-05-25 EP EP16832125.5A patent/EP3333718B1/en active Active
Non-Patent Citations (2)
Title |
---|
CHINA DOCTORAL DISSERTATIONS FULL-TEXT DATABASE, 15 April 2015 (2015-04-15), XP009508528, ISSN: 1674-022X * |
See also references of EP3333718A4 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
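CN109636097A (zh) * | 2018-11-01 | 2019-04-16 | 中车工业研究院有限公司 | Method and apparatus for allocating product design tasks |
CN109636097B (zh) * | 2018-11-01 | 2021-09-21 | 中车工业研究院有限公司 | Method and apparatus for allocating product design tasks |
CN112084017A (zh) * | 2020-07-30 | 2020-12-15 | 北京聚云科技有限公司 | Memory management method and apparatus, electronic device, and storage medium |
CN112084017B (zh) * | 2020-07-30 | 2024-04-19 | 北京聚云科技有限公司 | Memory management method and apparatus, electronic device, and storage medium |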
Also Published As
Publication number | Publication date |
---|---|
US11182211B2 (en) | 2021-11-23 |
CN106445676A (zh) | 2017-02-22 |
CN106445676B (zh) | 2019-10-22 |
US20180232257A1 (en) | 2018-08-16 |
EP3333718A4 (en) | 2019-03-27 |
EP3333718B1 (en) | 2020-06-24 |
EP3333718A1 (en) | 2018-06-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 16832125; Country of ref document: EP; Kind code of ref document: A1 |
 | WWE | Wipo information: entry into national phase | Ref document number: 15749999; Country of ref document: US |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | WWE | Wipo information: entry into national phase | Ref document number: 2016832125; Country of ref document: EP |