CN110046040B - Distributed task processing method and system and storage medium

Info

Publication number: CN110046040B
Application number: CN201910280104.5A
Authority: CN
Inventors: 桂绍武
Original assignee: Xiamen Wangsu Co Ltd
Current assignee: Xiamen Wangsu Co Ltd
Priority date: 2019-04-09
Filing date: 2019-04-09
Publication date: 2021-11-16
Anticipated expiration: 2039-04-09
Also published as: CN110046040A

Abstract

The embodiments of the invention relate to the technical field of cloud services and disclose a distributed task processing method, a distributed task processing system, and a storage medium. The method includes: the first node receives the data of a resource, obtains the data storage unit corresponding to the resource according to the information of the resource, and stores the data of the resource in the corresponding data storage unit; the second node, in response to a request from a third node, returns to the third node the number of third nodes participating in processing the data of the resources; and the third nodes determine their respective target data storage units according to the number of third nodes participating in processing the data of the resources and the number of data storage units in the first node, and acquire the data of the corresponding resources from the target data storage units for processing. The invention not only reduces the consumption caused by competition for processing tasks, but also ensures that all tasks are processed in time.

Description

Distributed task processing method and system and storage medium

Technical Field

The invention relates to the technical field of cloud services, in particular to a distributed task processing method and system and a storage medium.

Background

With the development of science and technology, cloud services are applied more and more widely. In a cloud service system, some nodes monitor data such as the utilization rate of resources of a cloud service virtual machine, for example its CPU (central processing unit) and memory, so that a user can decide whether capacity expansion is needed. Further processing of the data of the resources is also required, such as calculating the average CPU utilization, in order to better maintain the cloud service. In such processing, for example, one piece of data for a given resource is received every minute, and the newly received data must be processed within that unit of time. The data of the same resource are related while the data of different resources are not, so the data of the same resource must not be processed by several nodes at the same time, in order to avoid errors. As the resources of the cloud service system keep increasing, the amount of data to be processed keeps increasing as well, so the data of the resources is processed by a distributed system. A distributed system is easy to expand, and its processing capacity can be increased flexibly.

In an existing distributed system, data processing tasks for resources are allocated as follows. A first node places newly received data into a set (for example, a Redis data set). Each node that processes data (hereinafter referred to as a second node) traverses the set to obtain all the data to be processed and derives all the resources to be processed from the correspondence between data and resources. It sorts the resources, obtains and sorts all the second nodes, and then, according to its own position in the node order and the number of tasks it is to process, takes the corresponding number of resources from the sorted list as its tasks. The number of resources processed by each second node is, for example, the number of processes multiplied by the (constant) number of tasks per process. Because data is received and processed concurrently and dynamically, the second nodes do not obtain their tasks at the same moment, so the resource lists they obtain may be inconsistent. Each second node takes only the tasks that belong to it, so some tasks may fall between the slices and never be allocated; those tasks are not processed within the minute, and the data of the same resource may go unprocessed for a long time.
For example, the first node receives 50 new data items. Second node A learns that two second nodes (A and B) are currently processing the new data together and takes the first 25 items. By the time second node B obtains its tasks, the first node has received 10 more items, so second node B calculates that each node should process 30 items and takes the 31st to 60th items. The 26th to 30th items may therefore never be processed, and the service has to be restarted so that the time at which each second node obtains the task list and the order of the second nodes change and the unprocessed tasks are picked up. In addition, with this method the second nodes may compete during task allocation, which costs additional time, and a second node late in the order may receive no tasks at all, which reduces processing efficiency.
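The gap in the example above can be reproduced with a short sketch. The function name `slice_for_node` and the equal-split rule are assumptions made for illustration; the background describes the prior-art slicing only in general terms.

```python
def slice_for_node(task_count: int, node_count: int, node_rank: int) -> range:
    """Return the 1-based task positions one node takes under the prior-art scheme.

    Assumption: each node divides the list it currently sees by the node
    count and takes the slice that matches its own 0-based rank.
    """
    per_node = task_count // node_count
    start = node_rank * per_node + 1
    return range(start, start + per_node)


# Node A snapshots the list when it holds 50 items; node B snapshots it
# after 10 more items have arrived.
taken_by_a = slice_for_node(50, 2, 0)  # positions 1..25
taken_by_b = slice_for_node(60, 2, 1)  # positions 31..60

covered = set(taken_by_a) | set(taken_by_b)
missed = sorted(set(range(1, 61)) - covered)
print(missed)  # [26, 27, 28, 29, 30]
```

Because each node slices its own snapshot of a growing list, no node ever claims positions 26 to 30.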

Disclosure of Invention

The embodiments of the present invention provide a distributed task processing method, system and storage medium for solving one of the problems in the prior art, which not only can reduce the consumption caused by the competition for processing tasks, but also can ensure that all tasks are processed in time.

In order to solve the above technical problem, an embodiment of the present invention provides a distributed task processing method, which is applied to a distributed system, where the distributed system includes a first node, a third node, and a plurality of second nodes, and the second nodes are respectively connected to the first node and the third node in a communication manner; the method comprises the following steps:

the first node receives data of resources, obtains a data storage unit corresponding to the resources according to the information of the resources, and stores the data of the resources in the corresponding data storage unit; wherein the resource and the data storage unit are both multiple;

the third node responds to the request of the second node and returns the number of the second nodes participating in processing the data of the resource to the second node;

and the second nodes determine corresponding target data storage units according to the number of the second nodes participating in processing the data of the resources and the number of the data storage units in the first nodes, and acquire the data of the corresponding resources from the target data storage units for processing.

The embodiment of the present invention further provides a distributed task processing system, including: the system comprises a first node, a third node and a plurality of second nodes, wherein the second nodes are respectively in communication connection with the first node and the third node;

the first node is used for receiving data of resources, obtaining a data storage unit corresponding to the resources according to the information of the resources, and storing the data of the resources in the corresponding data storage unit; wherein the resource and the data storage unit are both multiple;

the third node is used for responding to the request of the second node and returning the number of the second nodes participating in processing the data of the resource to the second node;

the second nodes are used for determining corresponding target data storage units according to the number of the second nodes participating in processing the data of the resources and the number of the data storage units in the first nodes, and acquiring the data of the corresponding resources from the target data storage units for processing.

Embodiments of the present invention also provide a storage medium storing a computer-readable program for causing a computer to execute the distributed task processing method as described above.

Compared with the prior art, in the embodiments of the present invention the first node obtains the data storage unit corresponding to each resource according to the received information of the resource and stores the data of the resource in that data storage unit. Each second node obtains the number of data storage units from the first node and the number of second nodes participating in processing the data of the resources from the third node, determines its own target data storage units from these two numbers, and acquires the data of the corresponding resources from the target data storage units for processing. Because the data of different resources is stored in the correspondingly different data storage units, a second node does not need to traverse all the data to determine the correspondence between resources and data, which reduces the consumption caused by competition for processing tasks. And because each second node determines the target data storage units it is to process from the number of data storage units, all resources can be processed in time.

As an embodiment, the obtaining the data storage unit corresponding to the resource according to the information of the resource specifically includes: performing hash conversion on the information of the resource and then performing modulus extraction to obtain a data storage unit corresponding to the resource; wherein the modulus is the number of the data storage units.

As an embodiment, the determining, by the second node, respective corresponding target data storage units according to the number of the second nodes participating in processing the data of the resource and the number of data storage units in the first node specifically includes: and obtaining the corresponding target data storage units by evenly distributing all the data storage units to the second nodes participating in processing the data of the resources.

As an embodiment, obtaining the respective corresponding target data storage units by evenly distributing all the data storage units to the second nodes participating in processing the data of the resource specifically includes: grouping all the data storage units according to the number of the second nodes participating in processing the data of the resources; and selecting, from the data storage units in each group, a corresponding data storage unit as a target data storage unit of each second node according to the preset sequence of the second nodes participating in processing the data of the resources.

As an embodiment, acquiring data of a corresponding resource from the target data storage unit for processing specifically includes: calculating any one or combination of the following: average CPU utilization, average memory utilization, and URL request count.

As an embodiment, the method further includes: the second node deletes the processed data after acquiring the data of the corresponding resource from the target data storage unit and processing the data.

As an embodiment, the first node and the third node are the same server.

Drawings

FIG. 1 is a block diagram of a distributed task processing system according to the present invention;

FIG. 2 is a flowchart of a distributed task processing method according to a first embodiment of the present invention;

FIG. 3 is a flowchart of a distributed task processing method according to a second embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, embodiments of the present invention will be described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the various embodiments in order to provide a better understanding of the present invention. Nevertheless, the technical solution claimed in the present invention can be implemented without these technical details and with various changes and modifications based on the following embodiments.

A first embodiment of the present invention relates to a distributed task processing method applied to a distributed system as shown in FIG. 1. The system includes a first node 101, a third node 103 and a plurality of second nodes 102, wherein the second nodes 102 are each communicatively connected to the first node 101 and the third node 103. Each second node 102 periodically reports its own related information to the third node 103, for example, whether it is in an available state, so that the third node 103 can count the number of second nodes that can participate in processing the data of the resources. The arrangement is not limited thereto; for example, the first node 101 and the third node 103 may be the same server. In this embodiment, the first node 101 may also function as a second node 102. The distributed task processing method of this embodiment includes the following steps: the first node receives the data of a resource, obtains the data storage unit corresponding to the resource according to the information of the resource, and stores the data of the resource in the corresponding data storage unit, wherein there are a plurality of resources and a plurality of data storage units; the third node, in response to a request from a second node, returns to the second node the number of second nodes participating in processing the data of the resources; and the second nodes determine their respective target data storage units according to the number of second nodes participating in processing the data of the resources and the number of data storage units in the first node, and acquire the data of the corresponding resources from the target data storage units for processing.
Compared with the prior art, in this embodiment the first node obtains the data storage unit corresponding to each resource according to the received information of the resource and stores the data of the resource in that data storage unit. Each second node obtains the number of data storage units from the first node and the number of second nodes participating in processing the data of the resources from the third node, determines its own target data storage units from these two numbers, and acquires the data of the corresponding resources from the target data storage units for processing. Because the data of different resources is stored in the correspondingly different data storage units, a second node does not need to traverse all the data to determine the correspondence between resources and data, which reduces the consumption caused by competition for processing tasks. And because each second node determines the target data storage units it is to process from the number of data storage units, all resources can be processed in time.

The following describes implementation details of the distributed task processing method according to the present embodiment in detail, and the following is only provided for facilitating understanding of the implementation details and is not necessary for implementing the present embodiment.

Referring to FIG. 2, the distributed task processing method of the present embodiment includes steps 201 to 203.

Step 201: the first node receives the data of the resource, obtains a data storage unit corresponding to the resource according to the information of the resource, and stores the data of the resource in the corresponding data storage unit.

There are a plurality of resources and a plurality of data storage units. The resource is, for example, a CPU, a memory, or the like of a virtual machine in the cloud service system, and the data of the resource is, for example, the per-minute utilization of the CPU, the per-minute utilization of the memory, and the like. However, without being limited thereto, the resource may also be a URL, a page, etc., and the data of the resource may be a URL access rate, a page request count, etc. The present embodiment does not specifically limit the resources or their data. The data of the resources can be obtained by monitoring performed by some nodes in the cloud service system and reported to the first node. The distributed system in this embodiment may be part of a cloud service system.

In the present embodiment, the data storage unit is, for example, a data set of a database such as Redis; the present embodiment does not specifically limit the data storage unit. In this embodiment, a fixed number of data storage units may be created in the first node, for example 1024 data storage units numbered 1 to 1024.

In step 201, the data storage unit corresponding to a resource can be obtained by performing hash conversion on the information of the resource and then taking the result modulo the number of data storage units, so that each resource is mapped to a data storage unit according to its information.

The calculation of this mapping is illustrated below. The resource is, for example, a CPU of cloud server A, and the information of the resource may be a UUID, which may be represented by 32 hexadecimal digits, for example, 0986e0ac-09e8-4f2d-a3c2-66a2473ead40. Hash conversion of the UUID yields an integer represented by 128 binary bits (12366578741376259739645328858664875L), which is then taken modulo the total number of data storage units to obtain the number of the corresponding data storage unit. If the modulus is 1024, 12366578741376259739645328858664875L % 1024 is 320; since the numbering of the data storage units starts from 1, adding 1 to the modulo result gives 321 as the number of the data storage unit corresponding to the resource.
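A minimal sketch of this mapping follows. It assumes that the "hash conversion" can be modeled by reading the UUID as its 128-bit integer value, which yields the same remainder (320) and unit number (321) as the example; the function name `storage_unit_number` is hypothetical, and any stable hash function could take the place of the UUID-to-integer step.

```python
import uuid

UNIT_COUNT = 1024  # number of data storage units, as in the example above


def storage_unit_number(resource_uuid: str, unit_count: int = UNIT_COUNT) -> int:
    """Map a resource UUID to a 1-based data storage unit number."""
    # Assumption: the hash conversion is modeled as the UUID's 128-bit
    # integer value; the patent text does not name a specific hash.
    as_int = uuid.UUID(resource_uuid).int
    # The remainder is 0-based, the unit numbers start from 1.
    return as_int % unit_count + 1


print(storage_unit_number("0986e0ac-09e8-4f2d-a3c2-66a2473ead40"))  # 321
```

The mapping depends only on the resource identifier and the unit count, so every node that evaluates it arrives at the same unit without coordination.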

In step 201, as long as the modulus of the modulo operation does not change, the data storage unit corresponding to each resource does not change either, so there is a fixed correspondence between resources and data storage units. In some examples, as the resource scale increases, the number of data storage units may also be increased, for example by an integer multiple, so that the correspondence between a large share of the resources and their data storage units remains unchanged.
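The effect of enlarging the unit count by an integer multiple can be checked with a small sketch. It assumes a growth factor of two and uses consecutive integers as stand-ins for hashed resource identifiers; both are illustrative choices, not values taken from the embodiment.

```python
OLD_COUNT = 1024
NEW_COUNT = 2 * OLD_COUNT  # assumed growth factor of two


def unit_index(hash_value: int, unit_count: int) -> int:
    """Return the 0-based unit index (the unit number in the text minus 1)."""
    return hash_value % unit_count


hash_values = range(100_000)  # stand-in for hashed resource identifiers
kept = 0
for h in hash_values:
    old = unit_index(h, OLD_COUNT)
    new = unit_index(h, NEW_COUNT)
    # After doubling, an index either stays put or shifts by exactly OLD_COUNT.
    assert new in (old, old + OLD_COUNT)
    kept += new == old

kept_fraction = kept / len(hash_values)
print(f"{kept_fraction:.2%} of the indices are unchanged")
```

For evenly spread hash values roughly half of the indices stay the same when the count is doubled, and every other resource moves to one predictable unit, so any data migration can be planned per unit.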

In step 201, the data of the resource is stored in the corresponding data storage unit, specifically, the data of the resource whose UUID is 0986e0ac-09e8-4f2d-a3c2-66a2473ead40 may be stored in the data storage unit with the number of 321.

Step 202: the third node returns to the second node the number of second nodes participating in processing the data of the resource in response to the request of the second node.

The third node can obtain the number of the second nodes which can be used for processing the data of the resource according to the information reported by each second node at regular time. Upon receiving the request of the second node, the number of second nodes participating in processing the data of the resource may be returned thereto.
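The bookkeeping on the third node could look like the following sketch. The class name `ThirdNode`, the method names, and the report timeout are all assumptions for illustration; the embodiment only states that second nodes report periodically and that the third node returns a count on request.

```python
import time
from typing import Dict, List, Optional


class ThirdNode:
    """Hypothetical third node: records reports from second nodes and counts the usable ones."""

    def __init__(self, report_timeout: float = 180.0) -> None:
        # Assumption: a second node that has not reported within this many
        # seconds is no longer counted.
        self.report_timeout = report_timeout
        self._last_report: Dict[str, float] = {}

    def report(self, node_id: str, available: bool, now: Optional[float] = None) -> None:
        """Handle one periodic report from a second node."""
        now = time.time() if now is None else now
        if available:
            self._last_report[node_id] = now
        else:
            self._last_report.pop(node_id, None)

    def participating_nodes(self, now: Optional[float] = None) -> List[str]:
        """Return the sorted IDs of second nodes that reported recently."""
        now = time.time() if now is None else now
        return sorted(
            node_id
            for node_id, seen in self._last_report.items()
            if now - seen <= self.report_timeout
        )

    def participating_count(self, now: Optional[float] = None) -> int:
        """Answer a second node's request for the number of participants."""
        return len(self.participating_nodes(now))
```

Returning the nodes in sorted order is a design choice of this sketch: it gives every second node the same view of the node order without further communication.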

Step 203: and the second nodes determine the corresponding target data storage units according to the number of the second nodes participating in processing the data of the resources and the number of the data storage units in the first nodes, and acquire the data of the corresponding resources from the target data storage units for processing.

Specifically, each second node may evenly allocate the data storage units in the first node among the second nodes participating in processing the data of the resources to obtain its corresponding target data storage units. However, the present invention is not limited thereto, as long as each data storage unit in the first node is guaranteed to correspond to a second node. In practical applications, the second node may further delete the processed data after acquiring the data of the corresponding resources from the target data storage units and processing it, which saves storage space on the first node.

Processing the data of the corresponding resources obtained from the target data storage units may consist of calculating any one or a combination of the following: average CPU utilization, average memory utilization, and URL request count. The results allow the service provider to better maintain the service performance of the cloud service. In the present embodiment, neither the object of processing nor the processing result is specifically limited.
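As a rough illustration of the processing and the subsequent deletion, the sketch below averages CPU samples per resource for one data storage unit. A plain dictionary stands in for the storage on the first node, and the function name `process_unit` and the `(resource_id, cpu_percent)` record layout are assumptions, not part of the embodiment.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Sample = Tuple[str, float]  # (resource_id, cpu_percent), an assumed record layout


def process_unit(storage_units: Dict[int, List[Sample]], unit_number: int) -> Dict[str, float]:
    """Compute the average CPU utilization per resource in one unit, then delete its data."""
    samples = storage_units.get(unit_number, [])
    by_resource: Dict[str, List[float]] = defaultdict(list)
    for resource_id, cpu_percent in samples:
        by_resource[resource_id].append(cpu_percent)
    averages = {rid: mean(values) for rid, values in by_resource.items()}
    # Delete the processed data only after the averages have been computed.
    storage_units.pop(unit_number, None)
    return averages


units = {321: [("vm-a", 40.0), ("vm-a", 60.0), ("vm-b", 10.0)], 5: [("vm-c", 1.0)]}
print(process_unit(units, 321))  # {'vm-a': 50.0, 'vm-b': 10.0}
```

All samples of one resource live in a single unit, and each unit belongs to one second node, so the average for a resource is computed by exactly one node.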

Compared with the prior art, in the present embodiment a plurality of data storage units are created in advance, each resource is quickly mapped to its corresponding data storage unit by a modulo operation or the like according to the received information of the resource, and the data of the resource is stored in that data storage unit. Because the data of different resources is stored independently, the consumption incurred by the second nodes in competing to avoid simultaneous processing of the data of the same resource is greatly reduced. Moreover, because the data storage units in the first node are relatively fixed and the second nodes acquire the data of the resources to be processed in units of data storage units, data can be effectively prevented from being missed.

The second embodiment of the present invention further defines, on the basis of the first embodiment, a method for evenly distributing the data storage units to the second nodes, so that the data processing tasks are evenly distributed among the second nodes and processing efficiency is ensured.

Referring to FIG. 3, the distributed task processing method of the present embodiment includes steps 301 to 304.

Step 301 and step 302 correspond to step 201 and step 202 in the first embodiment, respectively, and are not described herein again.

Step 303: and obtaining the corresponding target data storage units by evenly distributing all the data storage units to the second nodes participating in processing the data of the resources.

The number of data storage units is the total number of data storage units created in advance in the first node and may be fixed, for example, 1024. The second nodes may be nodes configured in the distributed system to process the data of the resources. Their number may remain unchanged, or the second nodes may be scaled out horizontally according to the scale of the resources, that is, their number may be increased. The present embodiment does not specifically limit this.

Step 303, obtaining the respective target data storage units by evenly allocating all the data storage units to the second nodes participating in processing the data of the resources, may include: grouping all the data storage units according to the number of second nodes participating in processing the data of the resources, and, within each group, selecting a data storage unit for each second node according to a preset order of those second nodes as a target data storage unit of that second node. The second nodes participating in processing the data of the resources need to be sorted once they have been obtained. If the number of data storage units is an integer multiple of the number of second nodes, that is, there is no remainder, the quotient is the number of groups; if there is a remainder, the number of groups is the integer quotient plus 1. For example, if there are 100 second nodes, numbered 1 to 100, and 1024 data storage units, numbered 1 to 1024, the data storage units are divided by number into 11 groups. In each of the first 10 groups, one data storage unit is allocated to each second node in the order of the second nodes, and the 24 data storage units in the 11th group are allocated one each to the second nodes in order from front to back. However, the present invention is not limited thereto, as long as the data storage units can be distributed relatively evenly among the second nodes.
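The grouping described above can be sketched as follows, assuming 1-based unit numbers and 1-based node ranks as in the example. The function names `group_count` and `target_units` are hypothetical; the sketch relies on the observation that taking the k-th unit of every group for the k-th node is the same as stepping through the unit numbers with a stride equal to the node count.

```python
from typing import List


def group_count(unit_count: int, node_count: int) -> int:
    """Number of groups: the quotient, plus 1 if there is a remainder."""
    return (unit_count + node_count - 1) // node_count


def target_units(unit_count: int, node_count: int, node_rank: int) -> List[int]:
    """Return the unit numbers assigned to the second node with the given 1-based rank."""
    if not 1 <= node_rank <= node_count:
        raise ValueError("node_rank must be between 1 and node_count")
    # Group g (0-based) holds units g*node_count + 1 .. (g+1)*node_count;
    # the node with rank r takes the r-th unit of every group that has one.
    return list(range(node_rank, unit_count + 1, node_count))


print(group_count(1024, 100))             # 11
print(len(target_units(1024, 100, 24)))   # 11 (last unit is 1024)
print(len(target_units(1024, 100, 25)))   # 10
```

With 1024 units and 100 nodes, the nodes ranked 1 to 24 each receive 11 units and the remaining 76 nodes receive 10, so every unit is assigned exactly once and the load differs by at most one unit per node.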

Step 304: and acquiring the data of the corresponding resource from the target data storage unit for processing.

Compared with the prior art, in the present embodiment a plurality of data storage units are created in advance, each resource is quickly mapped to its corresponding data storage unit by a modulo operation or the like according to the received information of the resource, and the data of the resource is stored in that data storage unit. Because the data of different resources is stored independently, the consumption incurred by the second nodes in competing to avoid simultaneous processing of the data of the same resource is greatly reduced. Moreover, because the data storage units in the first node are relatively fixed and the second nodes acquire the data of the resources to be processed in units of data storage units, data can be effectively prevented from being missed. In addition, since the data storage units are evenly distributed among the second nodes, data can be processed more efficiently.

A third embodiment of the present invention relates to a distributed task processing system. With continued reference to FIG. 1, the system includes a first node 101, a third node 103 and a plurality of second nodes 102, wherein the second nodes 102 are each communicatively connected to the first node 101 and the third node 103. Each second node 102 periodically reports its own related information to the third node 103, for example, whether it is in an available state, so that the third node 103 can count the number of second nodes that can participate in processing the data of the resources. The arrangement is not limited thereto; for example, the first node 101 and the third node 103 may be the same server.

The first node 101 is configured to receive data of a resource, obtain a data storage unit corresponding to the resource according to the information of the resource, and store the data of the resource in the corresponding data storage unit, wherein there are a plurality of resources and a plurality of data storage units;

the third node 103 is configured to respond to the request of the second node, and return the number of the second nodes participating in processing the data of the resource to the second node;

the second nodes 102 are configured to determine respective corresponding target data storage units according to the number of the second nodes 102 participating in processing the data of the resource and the number of the data storage units in the first nodes 101, and acquire the data of the corresponding resource from the target data storage units for processing.

In an example, the first node 101 is specifically configured to perform hash conversion on the information of a resource and then take the result modulo the number of data storage units to obtain the data storage unit corresponding to the resource. The second node 102 is configured to obtain its corresponding target data storage units by evenly distributing all the data storage units to the second nodes participating in processing the data of the resources. Specifically, the second node 102 is configured to group all the data storage units according to the number of second nodes 102 participating in processing the data of the resources, and to select, from the data storage units in each group, a corresponding data storage unit as a target data storage unit of each second node 102 according to a preset order of the second nodes 102 participating in processing the data of the resources. The second node 102 is specifically configured to calculate any one or a combination of the following: average CPU utilization, average memory utilization, and URL request count. The second node 102 is further configured to delete the processed data after acquiring the data of the corresponding resources from the target data storage units and processing it.

It should be understood that this embodiment is a system embodiment corresponding to the second embodiment and can be implemented in cooperation with the second embodiment. The related technical details mentioned in the second embodiment remain valid in this embodiment and are not repeated here. Accordingly, the related technical details mentioned in this embodiment can also be applied to the second embodiment.

A fourth embodiment of the invention relates to a non-volatile storage medium for storing a computer-readable program for causing a computer to perform some or all of the above method embodiments.

That is, those skilled in the art can understand that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing the related hardware, where the program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods in the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.

It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.

Claims

1. A distributed task processing method is characterized in that the distributed task processing method is applied to a distributed system, the distributed system comprises a first node, a third node and a plurality of second nodes, and the second nodes are respectively in communication connection with the first node and the third node; the method comprises the following steps:

the first node creating a plurality of data storage units; receiving data of resources, obtaining a data storage unit corresponding to the resources according to the information of the resources, and storing the data of the resources in the corresponding data storage unit; wherein the resource and the data storage unit are both multiple;

2. The distributed task processing method according to claim 1, wherein the obtaining of the data storage unit corresponding to the resource according to the information of the resource specifically includes:

performing hash conversion on the information of the resource and then performing modulus extraction to obtain a data storage unit corresponding to the resource; wherein the modulus is the number of the data storage units.

3. The distributed task processing method according to claim 1, wherein the second node determines, according to the number of the second nodes participating in processing the data of the resource and the number of the data storage units in the first node, respective corresponding target data storage units, specifically comprising:

and obtaining the corresponding target data storage units by evenly distributing all the data storage units to the second nodes participating in processing the data of the resources.

4. The distributed task processing method according to claim 3, wherein obtaining respective corresponding target data storage units by evenly allocating all the data storage units to the second nodes participating in processing the data of the resource specifically includes:

grouping all the data storage units according to the number of the second nodes participating in processing the data of the resources;

and selecting a corresponding data storage unit from the data storage units in each group according to the preset sequence of the second nodes participating in processing the data of the resources as a target data storage unit of each second node.

5. The distributed task processing method according to claim 1, wherein acquiring data of a corresponding resource from the target data storage unit for processing specifically includes:

calculating any one or combination of the following: average CPU utilization, average memory utilization, and URL request count.

6. The distributed task processing method according to claim 1, further comprising: and the second node deletes the processed data after acquiring the data of the corresponding resource from the target data storage unit and processing the data.

7. The distributed task processing method according to claim 1, wherein the first node and the third node are the same server.

8. A distributed task processing system, comprising: the system comprises a first node, a third node and a plurality of second nodes, wherein the second nodes are respectively in communication connection with the first node and the third node;

the first node is used for creating a plurality of data storage units; receiving data of resources, obtaining a data storage unit corresponding to the resources according to the information of the resources, and storing the data of the resources in the corresponding data storage unit; wherein the resource and the data storage unit are both multiple;

9. The distributed task processing system according to claim 8, wherein the first node is specifically configured to perform hash conversion on information of the resource and then perform modulo conversion on the information to obtain a data storage unit corresponding to the resource; wherein the modulus is the number of the data storage units.

10. A storage medium characterized by storing a computer-readable program for causing a computer to execute the distributed task processing method according to any one of claims 1 to 6.