CN109309726A - Document generating method and system based on mass data - Google Patents
- Publication number
- CN109309726A (application number CN201811250926.0A)
- Authority
- CN
- China
- Prior art keywords
- node
- task
- data
- calculate node
- calculate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1004—Server selection for load balancing
- H04L67/1008—Server selection for load balancing based on parameters of servers, e.g. available memory or workload
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5066—Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
Abstract
The present invention provides a document generating method and system based on mass data. A client sends a first request message to a first management node, carrying the storage paths of N data blocks and the task type corresponding to each data block, the task types comprising CPU-intensive tasks and I/O-intensive tasks. The first management node obtains, for each compute node, its capability for processing each of the two task types, and distributes one subtask to each of N compute nodes; each compute node reads the data in its data block and processes it according to the block's task type. The client then generates the file corresponding to the data from the processing results of the N compute nodes. The file is thus generated from mass data in parallel by multiple compute nodes in a Spark cluster, and the management node in the Spark cluster distributes each data block, according to its task type, to a compute node that is strong at processing that task type, improving the speed of data processing while achieving load balancing.
Description
Technical field
The invention belongs to the field of computer technology, and more particularly relates to a document generating method and system based on mass data.
Background technique
With the rapid development of computer and Internet technology, network penetration and the number of Internet users rise year by year. The twin pressures of a constantly growing user base and rapidly increasing data processing volume bring new challenges for Internet applications.
For example, data exchange between fund systems is carried out substantially in the form of text files. As the number of users grows, the data files that must be generated each day can exceed 30 GB, and conventional methods need several hours to generate such a file, seriously affecting business efficiency. Moreover, as the data volume grows, the demands on system performance also become higher and higher. Faced with mass data, how to improve the speed of file generation is therefore a pressing challenge.
Summary of the invention
In view of this, embodiments of the present invention provide a document generating method and system based on mass data, to solve the prior-art problem of slow file generation from mass data.
A first aspect of the embodiments of the present invention provides a document generating method based on mass data. The method is applied to a Spark computing cluster, the Spark cluster comprising a first management node and multiple compute nodes, and comprises:
the client sends a first request message to the first management node, the first request message requesting that data to be processed be processed to generate a file, the data being composed of N data blocks; the first request message carries the storage path information of each of the N data blocks and the task type corresponding to each data block, the task types comprising CPU-intensive tasks and input/output (I/O)-intensive tasks, N being a positive integer greater than or equal to 2;
the first management node obtains, for each compute node in turn, its capability for processing CPU-intensive tasks and its capability for processing I/O-intensive tasks;
according to each compute node's capability for processing CPU-intensive tasks, its capability for processing I/O-intensive tasks, and the task types corresponding to the N data blocks, the first management node distributes one subtask to each of N compute nodes, each subtask being used to process one data block and carrying the path information of that data block, so that the compute node reads the data in the data block according to the path information in the subtask it receives and processes the data;
the client generates the file corresponding to the data from the processing results of the N compute nodes.
A second aspect of the embodiments of the present invention provides a computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the following steps:
the client sends a first request message to the first management node, the first request message requesting that data to be processed be processed to generate a file, the data being composed of N data blocks; the first request message carries the storage path information of each of the N data blocks and the task type corresponding to each data block, the task types comprising CPU-intensive tasks and I/O-intensive tasks, N being a positive integer greater than or equal to 2;
the first management node obtains, for each compute node in turn, its capability for processing CPU-intensive tasks and its capability for processing I/O-intensive tasks;
according to each compute node's capability for processing CPU-intensive tasks, its capability for processing I/O-intensive tasks, and the task types corresponding to the N data blocks, the first management node distributes one subtask to each of N compute nodes, each subtask being used to process one data block and carrying the path information of that data block, so that the compute node reads the data in the data block according to the path information in the subtask it receives and processes the data;
the client generates the file corresponding to the data from the processing results of the N compute nodes.
A third aspect of the embodiments of the present invention provides a file generating system based on mass data. The file generating system comprises a client and a Spark computing cluster, the Spark cluster comprising a first management node and multiple compute nodes, and the system is used to:
send, by the client, a first request message to the first management node, the first request message requesting that data to be processed be processed to generate a file, the data being composed of N data blocks; the first request message carries the storage path information of each of the N data blocks and the task type corresponding to each data block, the task types comprising CPU-intensive tasks and I/O-intensive tasks, N being a positive integer greater than or equal to 2;
obtain, by the first management node for each compute node in turn, its capability for processing CPU-intensive tasks and its capability for processing I/O-intensive tasks;
distribute, by the first management node according to each compute node's capability for processing CPU-intensive tasks, its capability for processing I/O-intensive tasks, and the task types corresponding to the N data blocks, one subtask to each of N compute nodes, each subtask being used to process one data block and carrying the path information of that data block, so that the compute node reads the data in the data block according to the path information in the subtask it receives and processes the data;
generate, by the client, the file corresponding to the data from the processing results of the N compute nodes.
The present invention provides a document generating method and system based on mass data: a file is generated from mass data in parallel by multiple compute nodes in a Spark cluster, and the management node in the Spark cluster distributes each data block, according to its task type, to a compute node that is strong at processing that task type, improving the speed of data processing while achieving load balancing.
Detailed description of the invention
It to describe the technical solutions in the embodiments of the present invention more clearly, below will be to embodiment or description of the prior art
Needed in attached drawing be briefly described, it should be apparent that, the accompanying drawings in the following description is only of the invention some
Embodiment for those of ordinary skill in the art without any creative labor, can also be according to these
Attached drawing obtains other attached drawings.
Fig. 1 is a kind of flow diagram of the document generating method based on mass data provided in an embodiment of the present invention;
Fig. 2 is the flow diagram of another document generating method based on mass data provided in an embodiment of the present invention;
Fig. 3 is the flow diagram of another document generating method based on mass data provided in an embodiment of the present invention;
Fig. 4 is the flow diagram of another document generating method based on mass data provided in an embodiment of the present invention;
Fig. 5 is a kind of architecture diagram of the filing system based on mass data provided in an embodiment of the present invention;
Fig. 6 is any terminal equipment in a kind of filing system based on mass data provided in an embodiment of the present invention
Schematic diagram.
Specific embodiment
In being described below, for illustration and not for limitation, the tool of such as particular system structure, technology etc is proposed
Body details, to understand thoroughly the embodiment of the present invention.However, it will be clear to one skilled in the art that there is no these specific
The present invention also may be implemented in the other embodiments of details.In other situations, it omits to well-known system, device, electricity
The detailed description of road and method, in case unnecessary details interferes description of the invention.
In order to illustrate technical solutions according to the invention, the following is a description of specific embodiments.
An embodiment of the present invention provides a document generating method based on mass data. With reference to Fig. 1, the method comprises:
S101: the client sends a first request message to the first management node, the first request message requesting that data to be processed be processed to generate a file, the data being composed of N data blocks; the first request message carries the storage path information of each of the N data blocks and the task type corresponding to each data block, the task types comprising CPU-intensive tasks and I/O-intensive tasks.
Here, N is a positive integer greater than or equal to 2.
In embodiments of the present invention, a CPU (Central Processing Unit)-intensive task is a task that requires a large amount of computation and logical judgment during execution; an I/O (input/output)-intensive task is a task whose completion time depends on the time of its I/O operations, placing higher demands on the device's I/O.
Before this step, the method further includes writing the mass data from which the file is to be generated into the distributed storage system HDFS, as follows:
S1011: the client sends a second request message to a second management node, requesting that the data to be processed be written; the second request message carries the size information of each of the N data blocks.
HDFS follows a master/slave model and includes one management node and multiple storage nodes; the management node is called the name node (NameNode) and the storage nodes are called data nodes (DataNode). In embodiments of the present invention, the management node of HDFS is referred to as the second management node.
For example, if the data to be processed is 30 GB, the second request message sent by the client to the second management node contains the size information of each of the N data blocks of which the 30 GB of data is composed.
S1012: for each data block, the second management node allocates one storage node according to the size of the data block, and sends a response message to the client; the response message carries the path information of the storage node corresponding to each data block.
The second management node allocates one storage node per data block according to the storage occupancy of all the storage nodes it manages. After completing the allocation, it sends the client a response message carrying the path information of the storage node corresponding to each data block. For example, suppose the data blocks are data blocks 1 to 3 and the second management node allocates data block 1 to storage node 1, data block 2 to storage node 2, and data block 3 to storage node 3. The response message then carries the correspondence between the path of data block 1 and storage node 1, between the path of data block 2 and storage node 2, and between the path of data block 3 and storage node 3.
S1013: according to the path information carried in the response message, the client stores the N data blocks into the storage nodes of HDFS.
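The allocation in S1012 depends on the current storage occupancy of the data nodes. A minimal sketch of one plausible policy follows; the patent only states that allocation follows storage occupancy, so the greedy best-fit rule, function name, and inputs here are illustrative assumptions:

```python
def assign_storage_nodes(block_sizes, free_space):
    """Hypothetical sketch of S1012: place each data block on the data node
    that currently has the most free space, then deduct the block's size.
    block_sizes: {block_id: size}; free_space: {node_id: free capacity}."""
    placement = {}
    free = dict(free_space)  # work on a copy so the input is not mutated
    for block_id, size in sorted(block_sizes.items()):
        node = max(free, key=free.get)  # node with the most free space
        placement[block_id] = node
        free[node] -= size
    return placement
```

The response message of S1012 would then be built from `placement`, mapping each block's path to its storage node.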
After the storage of the pending data is complete, the client sends the first request message to the first management node of the Spark cluster, requesting that the Spark cluster process the data to generate a file.
In embodiments of the present invention, the client divides the data to be processed into N data blocks according to the business flow and stores the N data blocks into N storage nodes of the HDFS system. The client determines the task type corresponding to each data block, either CPU-intensive or I/O-intensive, and carries the task type of each data block in the first request message it sends to the first management node of the Spark cluster.
S102: the first management node obtains, for each compute node in turn, its capability for processing CPU-intensive tasks and its capability for processing I/O-intensive tasks.
Embodiments of the present invention provide three methods for obtaining each compute node's capability for processing CPU-intensive tasks and its capability for processing I/O-intensive tasks.
With reference to Fig. 2, the first method comprises:
S1021: the first management node reads the running log of each compute node in turn.
The historical task processing information of each compute node is saved in its running log.
S1022: for any compute node, the first management node obtains from the node's running log the average time T1 in which the node processes a CPU-intensive task per unit of data, and the average time T2 in which it processes an I/O-intensive task per unit of data.
Optionally, the unit of data is 1 GB. For instance, from a compute node's running log, the node's historical average time T1 for processing a CPU-intensive task per unit of data and its average time T2 for processing an I/O-intensive task per unit of data are determined. The shorter the average processing time, the stronger the processing capability.
S1023: according to the node's T1 value and the average time T1' in which all compute nodes in the Spark cluster process a CPU-intensive task per unit of data, the first management node obtains the node's capability for processing CPU-intensive tasks.
After obtaining each compute node's average time for processing a CPU-intensive task per unit of data, the first management node computes the average time T1' over all compute nodes. If a compute node's historical average time for processing a CPU-intensive task per unit of data is T1, then in embodiments of the present invention the ratio of T1 to T1' represents that node's capability for processing CPU-intensive tasks.
S1024: according to the node's T2 value and the average time T2' in which all compute nodes in the Spark cluster process an I/O-intensive task per unit of data, the first management node obtains the node's capability for processing I/O-intensive tasks.
In embodiments of the present invention, if a compute node's historical average time for processing an I/O-intensive task per unit of data is T2 and the average over all compute nodes in the cluster is T2', then the ratio of T2 to T2' represents that node's capability for processing I/O-intensive tasks.
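The T1/T1' (and T2/T2') ratio of the first method can be sketched in a few lines. This is an illustrative reading of the text, not code from the patent; note that under this metric a ratio below 1.0 marks a node that is faster, and hence stronger, than the cluster average:

```python
def ability_ratio(node_avg_time, cluster_avg_times):
    """Ratio of one node's average per-unit-of-data processing time (T1 or T2)
    to the cluster-wide average (T1' or T2'). Below 1.0 = stronger than
    average; above 1.0 = weaker."""
    cluster_avg = sum(cluster_avg_times) / len(cluster_avg_times)
    return node_avg_time / cluster_avg
```

For example, a node averaging 2 s/GB in a cluster whose nodes average [2 s, 4 s] per GB gets the ratio 2/3, i.e. it is stronger than average.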
With reference to Fig. 3, the second method by which the first management node obtains each compute node's capability for processing CPU-intensive tasks and I/O-intensive tasks comprises:
S1025: when the Spark cluster starts, for any compute node, the first management node instructs the node to process a CPU-intensive task of a preset data size and an I/O-intensive task of a preset data size.
S1026: the first management node obtains the time T3 in which the node processes the CPU-intensive task of the preset data size and the time T4 in which it processes the I/O-intensive task of the preset data size.
S1027: according to the node's T3 value and the average time T3' in which all compute nodes in the Spark cluster process the CPU-intensive task of the preset data size, the first management node obtains the node's capability for processing CPU-intensive tasks.
S1028: according to the node's T4 value and the average time T4' in which all compute nodes in the Spark cluster process the I/O-intensive task of the preset data size, the first management node obtains the node's capability for processing I/O-intensive tasks.
The implementation is similar to that of the embodiment corresponding to Fig. 2. The difference is that the first management node does not obtain each node's historical capability for CPU-intensive or I/O-intensive tasks by querying its running log; instead, when the cluster starts, it instructs every compute node to run the same CPU-intensive task and the same I/O-intensive task, and records the time each node takes for each.
In embodiments of the present invention, if the time in which a compute node processes the CPU-intensive task of the preset data size specified by the first management node is T3 and the average over all compute nodes in the cluster is T3', then the ratio of T3 to T3' represents that node's capability for processing CPU-intensive tasks; likewise, if the node's time for the I/O-intensive task of the preset data size is T4 and the cluster-wide average is T4', then the ratio of T4 to T4' represents its capability for processing I/O-intensive tasks.
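The per-node timing of the fixed startup benchmark (T3 or T4 above) could be gathered as follows. The benchmark function is a stand-in of my own; the patent does not specify what the preset CPU-intensive task computes:

```python
import time

def run_benchmark(task_fn, payload):
    """Time one run of a fixed benchmark task; the elapsed time plays the
    role of T3 (CPU benchmark) or T4 (I/O benchmark). Shorter = stronger."""
    start = time.perf_counter()
    task_fn(payload)
    return time.perf_counter() - start

def cpu_benchmark(n):
    # Stand-in CPU-intensive workload: pure computation, no I/O.
    return sum(i * i for i in range(n))
```

Each node's T3 would then be fed into the same ratio-to-cluster-average comparison as in the first method.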
With reference to Fig. 4, the third method by which the first management node obtains each compute node's capability for processing CPU-intensive tasks and I/O-intensive tasks comprises:
S1029: the first management node obtains, for each compute node in turn, its CPU frequency, memory capacity, network bandwidth, and maximum disk read/write speed.
S10210: the first management node computes the averages, over all compute nodes in the Spark cluster, of CPU frequency, memory capacity, network bandwidth, and maximum disk read/write speed.
S10211: for any compute node, the first management node obtains the node's capability for processing CPU-intensive tasks from the node's CPU frequency and memory capacity together with the cluster-wide averages of CPU frequency and memory capacity.
S10212: the first management node obtains the node's capability for processing I/O-intensive tasks from the node's network bandwidth and maximum disk read/write speed together with the cluster-wide averages of network bandwidth and maximum disk read/write speed.
In embodiments of the present invention, for any node, the ratio of the node's CPU frequency to the average CPU frequency of all compute nodes in the cluster, plus the ratio of the node's memory capacity to the average memory capacity of all compute nodes, serves as the node's capability for processing CPU-intensive tasks; the ratio of the node's network bandwidth to the cluster-wide average network bandwidth, plus the ratio of the node's maximum disk read/write speed to the cluster-wide average maximum disk read/write speed, serves as the node's capability for processing I/O-intensive tasks.
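The third method's two scores reduce to sums of ratios against cluster averages; here, unlike the time-based ratios of the first two methods, a higher score means a stronger node. A direct transcription, with unit choices (GHz, GB) as illustrative assumptions:

```python
def cpu_ability(freq_ghz, mem_gb, avg_freq_ghz, avg_mem_gb):
    """S10211: node CPU frequency over the cluster average, plus node
    memory capacity over the cluster average. Higher = stronger."""
    return freq_ghz / avg_freq_ghz + mem_gb / avg_mem_gb

def io_ability(bandwidth, disk_speed, avg_bandwidth, avg_disk_speed):
    """S10212: network-bandwidth ratio plus maximum-disk-speed ratio."""
    return bandwidth / avg_bandwidth + disk_speed / avg_disk_speed
```

An exactly average node scores 2.0 on both metrics; a 3 GHz/16 GB node in a cluster averaging 2 GHz/8 GB scores 3.5 for CPU-intensive work.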
Further, in the embodiments shown in Figs. 2 to 4, the first management node also needs to take each compute node's operational stability into account. Specifically, the first management node reads the running log of each compute node in turn; for any compute node, it obtains from the running log the number of times the node has failed while running CPU-intensive tasks and the number of times it has failed while running I/O-intensive tasks. If the node's number of CPU-intensive task failures is higher than a preset count, the first management node forbids the node from running CPU-intensive tasks; if its number of I/O-intensive task failures is higher than the preset count, the first management node forbids it from running I/O-intensive tasks.
For any compute node, its historical failure count for CPU-intensive tasks represents the stability with which it runs CPU-intensive tasks, and its failure count for I/O-intensive tasks represents the stability with which it runs I/O-intensive tasks.
If a compute node's failure count for CPU-intensive tasks is higher than the preset count, its stability in running CPU-intensive tasks is poor; even if the methods described with reference to Figs. 2 to 4 show the node's capability for CPU-intensive tasks to be strong, the first management node will still exclude it when distributing CPU-intensive tasks.
Likewise, if a compute node's failure count for I/O-intensive tasks is higher than the preset count, its stability in running I/O-intensive tasks is poor; even if the methods described with reference to Figs. 2 to 4 show its capability for I/O-intensive tasks to be strong, the first management node will still exclude it when distributing I/O-intensive tasks.
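The stability rule is a simple threshold filter per task type: a node stays eligible for a type only while its failure count has not exceeded the preset count. A minimal sketch, with hypothetical names:

```python
def allowed_task_types(cpu_failures, io_failures, preset_count):
    """A node may run a task type only if its historical failure count for
    that type is not higher than the preset threshold (the patent's rule:
    forbid when the count exceeds the preset count)."""
    allowed = set()
    if cpu_failures <= preset_count:
        allowed.add("cpu")
    if io_failures <= preset_count:
        allowed.add("io")
    return allowed
```

So a node with 5 CPU failures and 1 I/O failure against a threshold of 3 remains a candidate only for I/O-intensive subtasks, however strong its measured CPU capability.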
S103, first management node handle processing capacity and the place of CPU intensive type task according to each calculate node
Task type corresponding to the processing capacity and N number of data block of I/O intensive task is managed, is distinguished to N number of calculate node
A subtask is distributed, for handling a data block, each subtask carries a data block for each subtask
Routing information.
Specifically, client is sent in the first request message of the first management node, N number of data block is also carried
Encoded information, the encoded information of N number of data block is for indicating that N number of data block is successive during file generated
Sequentially.First management node generates N number of subtask according to first request message, and according to institute in the first request message
The encoded information for stating N number of data block is ranked up N number of subtask.The first management node real-time reception is had time
The heartbeat message that not busy calculate node is sent.First management node is according to the ranking results of N number of subtask, successively by institute
It states N number of subtask and distributes to N number of calculate node in described the available free calculate node, wherein any subtask is directed to, right
When the subtask is allocated, according to the task type of the subtask, which is distributed in current idle node
To the strongest calculate node of task type processing capacity.
Massive information to be processed is such as divided into 48 data blocks according to operation flow, according to for generating the suitable of file
Sequence, the number of data block are respectively data block 1 to data block 48, and the first management node is according to the number of each data block, by this
File generated task corresponding to first request message is divided into 48 subtasks, and each subtask is used for the number to a data block
According to being handled, then this 48 subtasks are discharged into task queue by the first management node, the row of subtask corresponding to data block 1
In the queue first, be successively subtask corresponding to subtask ... data block 48 corresponding to data block 2 later.
In embodiments of the present invention, for purposes of illustration only, the subtask for the data for being used to handle data block 1 is known as task1,
The subtask for being used to handle the data of data block 2 is known as task2 ... and will be used to handle the subtask of the data of data block 48
Referred to as task48.
In the spark cluster, when a calculate node becomes idle, it sends a heartbeat message to the first management node, and the first management node determines from the heartbeat message that the calculate node is idle. The first management node allocates task1 through task48 in the queue in order. It first allocates task1: according to the task type of task1, it selects, among all currently idle nodes, the calculate node with the strongest processing capacity for that task type, and distributes task1 to that calculate node. It then allocates task2 through task48 in turn according to the same rule.
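The selection rule above can be sketched as follows. The capacity scores and the helper `assign_task` are illustrative assumptions, not values from the patent; higher scores mean stronger processing capacity.

```python
def assign_task(task_type, idle_nodes, capacity):
    """Pick, among the currently idle nodes, the node with the strongest
    processing capacity for the subtask's type."""
    return max(idle_nodes, key=lambda node: capacity[node][task_type])

# Hypothetical capacity scores per node and task type.
capacity = {
    "node1": {"cpu": 1.4, "io": 0.8},
    "node2": {"cpu": 0.9, "io": 1.6},
}
# A CPU-intensive subtask goes to node1, an I/O-intensive one to node2.
```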
S104, the calculate node reads the data in the data block according to the path information of the data block carried in the received subtask, and processes the data.
Optionally, to reduce the memory usage of a calculate node, for a given data block the calculate node divides the data block into n batches according to the order of the data in the block and processes the batches in turn, where n is a positive integer greater than or equal to 2. Each time the calculate node finishes processing a batch, it generates a subfile whose filename contains the number information of the data block and the batch number of that batch. The calculate node writes the subfile to a preset storage node in HDFS.
For example, task1 is distributed to calculate node 1. Calculate node 1 reads data block 1 according to the storage path carried in task1 and processes the data in data block 1 in batches. If data block 1 contains 1,000,000 records, the calculate node divides task1 into 10 batches, each batch generating one subfile; the subfiles may be named task1-1, task1-2, through task1-10.
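The batching and subfile-naming scheme can be illustrated with a short sketch. The helper names are assumptions for illustration; only the naming pattern `task<block>-<batch>` comes from the example above.

```python
def split_into_batches(records, n):
    """Divide a data block's records into n batches, preserving order."""
    size = -(-len(records) // n)  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]

def subfile_names(task_name, n):
    """Subfile names carry the block number and the batch number,
    e.g. task1-1 for batch 1 of data block 1."""
    return [f"{task_name}-{b}" for b in range(1, n + 1)]

# 1,000,000 records in 10 batches of 100,000 each.
batches = split_into_batches(list(range(1_000_000)), 10)
names = subfile_names("task1", 10)
```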
S105, the client generates the file corresponding to the data according to the data processing results of the N calculate nodes.
After all calculate nodes have completed their corresponding subtasks, the storage node merges all subfiles in order according to their filenames, generates the file corresponding to the data, and the file is downloaded to local storage space.
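The merge order implied by the filenames can be sketched as below. This assumes the `task<block>-<batch>` naming pattern from the earlier example; a plain lexicographic sort would put `task10-1` before `task2-1`, so the block and batch numbers are parsed numerically.

```python
import re

def merge_order(filenames):
    """Sort subfiles by (block number, batch number) parsed from names
    of the form 'task<block>-<batch>'; merging in this order
    reproduces the file-generation order of the data blocks."""
    def key(name):
        block, batch = re.match(r"task(\d+)-(\d+)$", name).groups()
        return int(block), int(batch)
    return sorted(filenames, key=key)
```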
The present invention provides a document generating method based on mass data. A file is generated in parallel by multiple calculate nodes in a spark cluster: the management node in the spark cluster distributes each data block, according to the task type corresponding to the data block of the mass data, to a calculate node that is strong at processing that type of task. This improves the speed of data processing while achieving load balancing.
With reference to Fig. 5, an embodiment of the present invention also provides a file generating system based on mass data. The system includes a client 51 and a computing engine spark cluster; the spark cluster includes a first management node 52 and multiple calculate nodes 53. The system is configured so that:
The client 51 sends a first request message to the first management node 52. The first request message requests that data to be processed be processed to generate a file. The data consists of N data blocks, and the first request message carries the storage path information of each of the N data blocks and the task type corresponding to each data block; the task types include central processing unit (CPU) intensive tasks and input/output (I/O) intensive tasks, and N is a positive integer greater than or equal to 2;
The first management node 52 obtains, for each calculate node 53 in turn, the node's processing capacity for CPU-intensive tasks and its processing capacity for I/O-intensive tasks;
According to each calculate node 53's processing capacity for CPU-intensive tasks and for I/O-intensive tasks, and the task types corresponding to the N data blocks, the first management node 52 distributes one subtask to each of N calculate nodes 53. Each subtask is used to process one data block and carries the path information of that data block, so that the calculate node 53 reads the data in the data block according to the path information in the received subtask and processes the data;
The client 51 generates the file corresponding to the data according to the data processing results of the N calculate nodes 53.
Further, the system also includes a distributed file system HDFS, which includes a second management node 54 and multiple storage nodes 55. Before the client 51 sends the first request message to the first management node 52, the method further includes:
The client 51 sends a second request message to the second management node 54 to request that the data to be processed be written; the second request message carries the size information of each of the N data blocks;
For any data block, the second management node 54 allocates a storage node 55 according to the size of the data block, and sends a response message to the client 51; the response message carries the path information of the storage node 55 corresponding to each data block;
The client 51 stores the N data blocks into the storage nodes 55 of HDFS according to the path information carried in the response message.
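The second management node's size-based allocation can be sketched as a simple greedy placement. The patent does not specify the allocation policy; the "most free space first" rule and all names below are illustrative assumptions.

```python
def place_blocks(block_sizes, node_free_space):
    """Assign each data block to a storage node by size: greedily pick
    the node with the most remaining free space that can hold the
    block. The returned mapping plays the role of the path information
    carried in the response message."""
    placement = {}
    free = dict(node_free_space)
    for block, size in block_sizes.items():
        node = max(free, key=free.get)
        if free[node] < size:
            raise RuntimeError(f"no storage node can hold {block}")
        placement[block] = node
        free[node] -= size
    return placement
```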
Further, the first management node 52 obtaining, for each calculate node 53 in turn, its processing capacity for CPU-intensive tasks and for I/O-intensive tasks includes:
The first management node 52 reads the running log of each calculate node 53 in turn;
For any calculate node 53, the first management node 52 obtains, from the running log of the calculate node 53, the average time T1 taken by the node to process a CPU-intensive task of unit data quantity and the average time T2 taken to process an I/O-intensive task of unit data quantity;
According to the T1 value corresponding to the calculate node 53 and the average time T1' taken by all calculate nodes 53 in the spark cluster to process a CPU-intensive task of unit data quantity, the first management node 52 obtains the calculate node 53's processing capacity for CPU-intensive tasks;
According to the T2 value corresponding to the calculate node 53 and the average time T2' taken by all calculate nodes 53 in the spark cluster to process an I/O-intensive task of unit data quantity, the first management node 52 obtains the calculate node 53's processing capacity for I/O-intensive tasks.
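The patent does not give the exact formula relating T1 and T1'. One natural reading, shown here purely as a sketch, scores a node by the cluster-average time divided by the node's own time, so a faster-than-average node scores above 1.0.

```python
def capacity_from_logs(node_avg_time, cluster_avg_time):
    """Assumed scoring: capacity = T' / T, where T is the node's
    average per-unit time from its running log and T' is the
    cluster-wide average. Smaller T (faster node) => higher score."""
    return cluster_avg_time / node_avg_time

# Node takes T1 = 8 ms per unit of data; cluster average T1' = 10 ms.
score = capacity_from_logs(8.0, 10.0)  # stronger than average
```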
Further, the first management node 52 obtaining, for each calculate node 53 in turn, its processing capacity for CPU-intensive tasks and for I/O-intensive tasks includes:
When the spark cluster starts, for any calculate node 53, the first management node 52 instructs the calculate node 53 to process a CPU-intensive task of a preset data amount and an I/O-intensive task of the preset data amount, respectively;
The first management node 52 obtains the time T3 taken by the calculate node 53 to process the CPU-intensive task of the preset data amount and the time T4 taken to process the I/O-intensive task of the preset data amount;
According to the T3 value corresponding to the calculate node 53 and the average time T3' taken by all calculate nodes 53 in the spark cluster to process the CPU-intensive task of the preset data amount, the first management node 52 obtains the calculate node 53's processing capacity for CPU-intensive tasks;
According to the T4 value corresponding to the calculate node 53 and the average time T4' taken by all calculate nodes 53 in the spark cluster to process the I/O-intensive task of the preset data amount, the first management node 52 obtains the calculate node 53's processing capacity for I/O-intensive tasks.
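The startup benchmark can be sketched end-to-end: measure T3 (or T4) per node, derive the cluster average T3' (or T4'), and score each node against it. The scoring rule is an assumption, as above.

```python
def benchmark_scores(times):
    """times maps node -> measured time for the preset workload
    (T3 for the CPU-intensive benchmark, T4 for the I/O one).
    Each node is scored against the cluster average."""
    avg = sum(times.values()) / len(times)
    return {node: avg / t for node, t in times.items()}

# Hypothetical startup measurements in seconds.
scores = benchmark_scores({"node1": 5.0, "node2": 10.0, "node3": 15.0})
```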
Further, the first management node 52 obtaining, for each calculate node 53 in turn, its processing capacity for CPU-intensive tasks and for I/O-intensive tasks includes:
The first management node 52 obtains, for each calculate node 53 in turn, the node's CPU frequency, memory capacity, network bandwidth, and maximum disk read/write speed;
The first management node 52 calculates the average CPU frequency, average memory capacity, average network bandwidth, and average maximum disk read/write speed of all calculate nodes 53 in the spark cluster;
For any calculate node 53, the first management node 52 obtains the calculate node 53's processing capacity for CPU-intensive tasks according to the node's CPU frequency and memory capacity and the average CPU frequency and average memory capacity of all calculate nodes 53 in the spark cluster;
The first management node 52 obtains the calculate node 53's processing capacity for I/O-intensive tasks according to the node's network bandwidth and maximum disk read/write speed and the average network bandwidth and average maximum disk read/write speed of all calculate nodes 53 in the spark cluster.
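The patent leaves open how the hardware metrics are combined. One plausible scheme, shown only as a sketch, normalizes each metric by its cluster average and combines the pair with equal weights; the weights and function name are assumptions.

```python
def hw_capacity(node, cluster_avg, weights=(0.5, 0.5)):
    """CPU capacity from (cpu_freq, memory); I/O capacity from
    (bandwidth, disk_speed). Each metric is divided by the cluster
    average, then the two are combined with the given weights
    (equal weighting is an assumption, not from the patent)."""
    w1, w2 = weights
    cpu = (w1 * node["cpu_freq"] / cluster_avg["cpu_freq"]
           + w2 * node["memory"] / cluster_avg["memory"])
    io = (w1 * node["bandwidth"] / cluster_avg["bandwidth"]
          + w2 * node["disk_speed"] / cluster_avg["disk_speed"])
    return {"cpu": cpu, "io": io}
```

A node whose metrics all equal the cluster averages scores exactly 1.0 on both axes, which makes the scores comparable across nodes.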
Further, the system is also configured so that:
The first management node 52 reads the running log of each calculate node 53 in turn;
For any calculate node 53, the first management node 52 obtains, from the running log of the calculate node 53, the number of times the node has failed when running CPU-intensive tasks and the number of times it has failed when running I/O-intensive tasks;
If the number of times the calculate node 53 has failed when running CPU-intensive tasks exceeds a preset number, the first management node 52 forbids the calculate node 53 from running CPU-intensive tasks;
If the number of times the calculate node 53 has failed when running I/O-intensive tasks exceeds a preset number, the first management node 52 forbids the calculate node 53 from running I/O-intensive tasks.
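The failure-count check can be sketched as a small filter over the task types a node may still run; the helper name and data shapes are illustrative.

```python
def allowed_task_types(failures, preset_times):
    """failures maps task type -> failure count observed in the
    running log; a task type is forbidden for this node once its
    failure count exceeds the preset number."""
    return {t for t, n in failures.items() if n <= preset_times}

# This node failed 7 CPU-intensive runs but only 1 I/O-intensive run.
ok = allowed_task_types({"cpu": 7, "io": 1}, preset_times=5)
# cpu is forbidden (7 > 5); io is still allowed.
```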
Further, the first request message also includes encoding information of the N data blocks, the encoding information indicating the order of the N data blocks during file generation. In this case, the first management node 52 distributing one subtask to each of N calculate nodes 53, according to each calculate node 53's processing capacity for CPU-intensive tasks and for I/O-intensive tasks and the task types corresponding to the N data blocks, includes:
The first management node 52 generates N subtasks according to the first request message, and sorts the N subtasks according to the encoding information of the N data blocks in the first request message;
The first management node 52 receives, in real time, the heartbeat messages sent by idle calculate nodes 53;
According to the sorting results of the N subtasks, the first management node 52 distributes the N subtasks in turn to N calculate nodes 53 among the idle calculate nodes 53. For any subtask, when the subtask is allocated, it is distributed, according to its task type, to the calculate node 53 among the currently idle nodes that has the strongest processing capacity for that task type.
Further, the calculate node 53 reading the data in the data block according to the path information of the data block in the received subtask and processing the data includes:
The calculate node 53 divides the data block into n batches according to the order of the data in the data block and processes the batches in turn, where n is a positive integer greater than or equal to 2;
Each time the calculate node 53 finishes processing a batch, it generates a subfile whose filename contains the number information of the data block and the batch number of that batch;
The calculate node 53 writes the subfile to a preset storage node in HDFS;
The client 51 generating the file corresponding to the data according to the data processing results of the N calculate nodes 53 includes:
The storage node merges all subfiles in order according to their filenames, generates the file corresponding to the data, and the file is downloaded to local storage space.
The present invention provides a file generating system based on mass data. A file is generated in parallel by multiple calculate nodes in a spark cluster: the management node in the spark cluster distributes each data block, according to the task type corresponding to the data block of the mass data, to a calculate node that is strong at processing that type of task. This improves the speed of data processing while achieving load balancing.
Fig. 6 is a schematic diagram of any terminal device in the file generating system based on mass data provided by an embodiment of the present invention. As shown in Fig. 6, the terminal device 6 of this embodiment includes: a processor 60, a memory 61, and a computer program 62 that is stored in the memory 61 and can run on the processor 60, such as a file generating program based on mass data. When executing the computer program 62, the processor 60 implements the steps in the embodiments of the file generating method based on mass data described above, such as steps 101 to 105 shown in Fig. 1.
Illustratively, the computer program 62 may be divided into one or more modules/units, which are stored in the memory 61 and executed by the processor 60 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, the instruction segments being used to describe the execution process of the computer program 62 in the terminal device 6.
The terminal device 6 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, the processor 60 and the memory 61. Those skilled in the art will understand that Fig. 6 is merely an example of the terminal device 6 and does not constitute a limitation on the terminal device 6, which may include more or fewer components than illustrated, combine certain components, or include different components; for example, the terminal device may also include input/output devices, network access devices, buses, and the like.
The processor 60 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 61 may be an internal storage unit of the terminal device 6, such as a hard disk or memory of the terminal device 6. The memory 61 may also be an external storage device of the terminal device 6, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal device 6. Further, the memory 61 may include both an internal storage unit of the terminal device 6 and an external storage device. The memory 61 is used to store the computer program and the other programs and data required by the terminal device. The memory 61 may also be used to temporarily store data that has been output or is about to be output.
An embodiment of the present invention also provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the steps of the file generating method based on mass data described in any of the above embodiments are implemented.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The embodiments described above are merely illustrative of the technical solution of the present invention and are not limiting. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions documented in the foregoing embodiments may still be modified, or some of the technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.
Claims (10)
1. A document generating method based on mass data, characterized in that the method is applied to a computing engine spark cluster, the spark cluster including a first management node and multiple calculate nodes, and the method comprises:
a client sending a first request message to the first management node, the first request message requesting that data to be processed be processed to generate a file, the data consisting of N data blocks, the first request message carrying storage path information of each of the N data blocks and a task type corresponding to each data block, the task types including central processing unit (CPU) intensive tasks and input/output (I/O) intensive tasks, and N being a positive integer greater than or equal to 2;
the first management node obtaining, for each calculate node in turn, the calculate node's processing capacity for CPU-intensive tasks and its processing capacity for I/O-intensive tasks;
the first management node, according to each calculate node's processing capacity for CPU-intensive tasks and for I/O-intensive tasks and the task types corresponding to the N data blocks, distributing one subtask to each of N calculate nodes, each subtask being used to process one data block and carrying the path information of one data block, so that the calculate node reads the data in the data block according to the path information of the data block in the received subtask and processes the data;
the client generating the file corresponding to the data according to the data processing results of the N calculate nodes.
2. The document generating method according to claim 1, characterized in that the method is also applied to a distributed file system HDFS, the HDFS including a second management node and multiple storage nodes, and before the client sends the first request message to the first management node, the method further comprises:
the client sending a second request message to the second management node to request that the data to be processed be written, the second request message carrying size information of each of the N data blocks;
the second management node, for any data block, allocating a storage node according to the size of the data block, and sending a response message to the client, the response message carrying path information of the storage node corresponding to each data block;
the client storing the N data blocks into the storage nodes of HDFS according to the path information carried in the response message.
3. The document generating method according to claim 1, characterized in that the first management node obtaining, for each calculate node in turn, its processing capacity for CPU-intensive tasks and for I/O-intensive tasks comprises:
the first management node reading the running log of each calculate node in turn;
for any calculate node, the first management node obtaining, from the running log of the calculate node, the average time T1 taken by the calculate node to process a CPU-intensive task of unit data quantity and the average time T2 taken to process an I/O-intensive task of unit data quantity;
the first management node, according to the T1 value corresponding to the calculate node and the average time T1' taken by all calculate nodes in the spark cluster to process a CPU-intensive task of unit data quantity, obtaining the calculate node's processing capacity for CPU-intensive tasks;
the first management node, according to the T2 value corresponding to the calculate node and the average time T2' taken by all calculate nodes in the spark cluster to process an I/O-intensive task of unit data quantity, obtaining the calculate node's processing capacity for I/O-intensive tasks.
4. The document generating method according to claim 1, characterized in that the first management node obtaining, for each calculate node in turn, its processing capacity for CPU-intensive tasks and for I/O-intensive tasks comprises:
when the spark cluster starts, for any calculate node, the first management node instructing the calculate node to process a CPU-intensive task of a preset data amount and an I/O-intensive task of the preset data amount, respectively;
the first management node obtaining the time T3 taken by the calculate node to process the CPU-intensive task of the preset data amount and the time T4 taken to process the I/O-intensive task of the preset data amount;
the first management node, according to the T3 value corresponding to the calculate node and the average time T3' taken by all calculate nodes in the spark cluster to process the CPU-intensive task of the preset data amount, obtaining the calculate node's processing capacity for CPU-intensive tasks;
the first management node, according to the T4 value corresponding to the calculate node and the average time T4' taken by all calculate nodes in the spark cluster to process the I/O-intensive task of the preset data amount, obtaining the calculate node's processing capacity for I/O-intensive tasks.
5. The document generating method according to claim 1, characterized in that the first management node obtaining, for each calculate node in turn, its processing capacity for CPU-intensive tasks and for I/O-intensive tasks comprises:
the first management node obtaining, for each calculate node in turn, the node's CPU frequency, memory capacity, network bandwidth, and maximum disk read/write speed;
the first management node calculating the average CPU frequency, average memory capacity, average network bandwidth, and average maximum disk read/write speed of all calculate nodes in the spark cluster;
for any calculate node, the first management node obtaining the calculate node's processing capacity for CPU-intensive tasks according to the node's CPU frequency and memory capacity and the average CPU frequency and average memory capacity of all calculate nodes in the spark cluster;
the first management node obtaining the calculate node's processing capacity for I/O-intensive tasks according to the node's network bandwidth and maximum disk read/write speed and the average network bandwidth and average maximum disk read/write speed of all calculate nodes in the spark cluster.
6. The document generating method according to any one of claims 3-5, characterized in that the method further comprises:
the first management node reading the running log of each calculate node in turn;
for any calculate node, the first management node obtaining, from the running log of the calculate node, the number of times the node has failed when running CPU-intensive tasks and the number of times it has failed when running I/O-intensive tasks;
if the number of times the calculate node has failed when running CPU-intensive tasks exceeds a preset number, the first management node forbidding the calculate node from running CPU-intensive tasks;
if the number of times the calculate node has failed when running I/O-intensive tasks exceeds a preset number, the first management node forbidding the calculate node from running I/O-intensive tasks.
7. The document generating method according to claim 6, characterized in that the first request message further includes encoding information of the N data blocks, the encoding information indicating the order of the N data blocks during file generation, and the first management node, according to each calculate node's processing capacity for CPU-intensive tasks and for I/O-intensive tasks and the task types corresponding to the N data blocks, distributing one subtask to each of N calculate nodes comprises:
the first management node generating N subtasks according to the first request message, and sorting the N subtasks according to the encoding information of the N data blocks in the first request message;
the first management node receiving, in real time, heartbeat messages sent by idle calculate nodes;
the first management node, according to the sorting results of the N subtasks, distributing the N subtasks in turn to N calculate nodes among the idle calculate nodes, wherein, for any subtask, when the subtask is allocated, the subtask is distributed, according to its task type, to the calculate node among the currently idle nodes with the strongest processing capacity for that task type.
8. The document generating method according to claim 7, characterized in that the calculate node reading the data in the data block according to the path information of the data block in the received subtask and processing the data comprises:
the calculate node dividing the data block into n batches according to the order of the data in the data block and processing the batches in turn, n being a positive integer greater than or equal to 2;
the calculate node, each time it finishes processing a batch, generating a subfile, the filename of the subfile containing the number information of the data block and the batch number of that batch;
the calculate node writing the subfile to a preset storage node in HDFS;
and the client generating the file corresponding to the data according to the data processing results of the N calculate nodes comprises:
the storage node merging all subfiles in order according to their filenames, generating the file corresponding to the data, and the file being downloaded to local storage space.
9. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 8 are implemented.
10. A file generating system based on mass data, characterized in that the file generating system includes a client and a computing engine spark cluster, the spark cluster including a first management node and multiple calculate nodes, and the system is configured so that:
the client sends a first request message to the first management node, the first request message requesting that data to be processed be processed to generate a file, the data consisting of N data blocks, the first request message carrying storage path information of each of the N data blocks and a task type corresponding to each data block, the task types including central processing unit (CPU) intensive tasks and input/output (I/O) intensive tasks, and N being a positive integer greater than or equal to 2;
the first management node obtains, for each calculate node in turn, the calculate node's processing capacity for CPU-intensive tasks and its processing capacity for I/O-intensive tasks;
the first management node, according to each calculate node's processing capacity for CPU-intensive tasks and for I/O-intensive tasks and the task types corresponding to the N data blocks, distributes one subtask to each of N calculate nodes, each subtask being used to process one data block and carrying the path information of one data block, so that the calculate node reads the data in the data block according to the path information of the data block in the received subtask and processes the data;
the client generates the file corresponding to the data according to the data processing results of the N calculate nodes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811250926.0A CN109309726A (en) | 2018-10-25 | 2018-10-25 | Document generating method and system based on mass data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811250926.0A CN109309726A (en) | 2018-10-25 | 2018-10-25 | Document generating method and system based on mass data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109309726A true CN109309726A (en) | 2019-02-05 |
Family
ID=65221965
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811250926.0A Pending CN109309726A (en) | 2018-10-25 | 2018-10-25 | Document generating method and system based on mass data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109309726A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110187971A (en) * | 2019-05-30 | 2019-08-30 | 口碑(上海)信息技术有限公司 | Service request processing method and device |
CN110995725A (en) * | 2019-12-11 | 2020-04-10 | 北京明略软件系统有限公司 | Data processing method and device, electronic equipment and computer readable storage medium |
CN111580979A (en) * | 2020-05-14 | 2020-08-25 | 哈尔滨工业大学(深圳) | Data processing method, device and system based on atmospheric radiation transmission model |
CN112565321A (en) * | 2019-09-26 | 2021-03-26 | 杭州海康威视数字技术股份有限公司 | Data stream pushing method, device and system |
CN112579297A (en) * | 2020-12-25 | 2021-03-30 | 中国农业银行股份有限公司 | Data processing method and device |
CN112579351A (en) * | 2020-11-16 | 2021-03-30 | 麒麟软件有限公司 | Cloud hard disk backup system |
CN112817728A (en) * | 2021-02-20 | 2021-05-18 | 咪咕音乐有限公司 | Task scheduling method, network device and storage medium |
CN112988360A (en) * | 2021-05-10 | 2021-06-18 | 杭州绿城信息技术有限公司 | Task distribution system based on big data analysis |
WO2021218619A1 (en) * | 2020-04-30 | 2021-11-04 | 华为技术有限公司 | Task allocation method and apparatus, and task processing system |
CN113626207A (en) * | 2021-10-12 | 2021-11-09 | 苍穹数码技术股份有限公司 | Map data processing method, device, equipment and storage medium |
CN113709298A (en) * | 2020-05-20 | 2021-11-26 | 华为技术有限公司 | Multi-terminal task allocation method |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100125847A1 (en) * | 2008-11-17 | 2010-05-20 | Fujitsu Limited | Job managing device, job managing method and job managing program |
US20120110047A1 (en) * | 2010-11-15 | 2012-05-03 | International Business Machines Corporation | Reducing the Response Time of Flexible Highly Data Parallel Tasks |
CN103500123A (en) * | 2013-10-12 | 2014-01-08 | 浙江大学 | Parallel computation dispatch method in heterogeneous environment |
CN104598298A (en) * | 2015-02-04 | 2015-05-06 | 上海交通大学 | Virtual machine dispatching algorithm based on task load and current work property of virtual machine |
CN104657221A (en) * | 2015-03-12 | 2015-05-27 | 广东石油化工学院 | Multi-queue peak-alternation scheduling model and multi-queue peak-alteration scheduling method based on task classification in cloud computing |
CN107832153A (en) * | 2017-11-14 | 2018-03-23 | 北京科技大学 | A kind of Hadoop cluster resources self-adapting distribution method |
Application Events
- 2018-10-25: Application CN201811250926.0A filed (CN); status: Pending
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110187971B (en) * | 2019-05-30 | 2020-08-04 | 口碑(上海)信息技术有限公司 | Service request processing method and device |
CN110187971A (en) * | 2019-05-30 | 2019-08-30 | 口碑(上海)信息技术有限公司 | Service request processing method and device |
CN112565321A (en) * | 2019-09-26 | 2021-03-26 | 杭州海康威视数字技术股份有限公司 | Data stream pushing method, device and system |
CN110995725A (en) * | 2019-12-11 | 2020-04-10 | 北京明略软件系统有限公司 | Data processing method and device, electronic equipment and computer readable storage medium |
CN110995725B (en) * | 2019-12-11 | 2021-12-07 | 北京明略软件系统有限公司 | Data processing method and device, electronic equipment and computer readable storage medium |
WO2021218619A1 (en) * | 2020-04-30 | 2021-11-04 | 华为技术有限公司 | Task allocation method and apparatus, and task processing system |
CN111580979A (en) * | 2020-05-14 | 2020-08-25 | 哈尔滨工业大学(深圳) | Data processing method, device and system based on atmospheric radiation transmission model |
CN113709298A (en) * | 2020-05-20 | 2021-11-26 | 华为技术有限公司 | Multi-terminal task allocation method |
CN112579351A (en) * | 2020-11-16 | 2021-03-30 | 麒麟软件有限公司 | Cloud hard disk backup system |
CN112579297A (en) * | 2020-12-25 | 2021-03-30 | 中国农业银行股份有限公司 | Data processing method and device |
CN112817728A (en) * | 2021-02-20 | 2021-05-18 | 咪咕音乐有限公司 | Task scheduling method, network device and storage medium |
CN112988360A (en) * | 2021-05-10 | 2021-06-18 | 杭州绿城信息技术有限公司 | Task distribution system based on big data analysis |
CN113626207A (en) * | 2021-10-12 | 2021-11-09 | 苍穹数码技术股份有限公司 | Map data processing method, device, equipment and storage medium |
CN113626207B (en) * | 2021-10-12 | 2022-03-08 | 苍穹数码技术股份有限公司 | Map data processing method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109309726A (en) | Document generating method and system based on mass data | |
CN110520853B (en) | Queue management for direct memory access | |
Yan et al. | Blogel: A block-centric framework for distributed computation on real-world graphs | |
Abad et al. | Package-aware scheduling of faas functions | |
CN110262901B (en) | Data processing method and data processing system | |
CN109947565B (en) | Method and apparatus for distributing computing tasks | |
CN108924187B (en) | Task processing method and device based on machine learning and terminal equipment | |
CN109033001A (en) | Method and apparatus for distributing GPU | |
CN111949394A (en) | Method, system and storage medium for sharing computing power resource | |
CN110308984B (en) | Cross-cluster computing system for processing geographically distributed data | |
CN105471985A (en) | Load balance method, cloud platform computing method and cloud platform | |
CN109408229A (en) | A kind of dispatching method and device | |
CN103176849A (en) | Virtual machine clustering deployment method based on resource classification | |
CN113946431B (en) | Resource scheduling method, system, medium and computing device | |
Zegrari et al. | Resource allocation with efficient load balancing in cloud environment | |
CN109614227A (en) | Task resource concocting method, device, electronic equipment and computer-readable medium | |
Wang et al. | Phase-reconfigurable shuffle optimization for Hadoop MapReduce | |
Ke et al. | Aggregation on the fly: Reducing traffic for big data in the cloud | |
US8028291B2 (en) | Method and computer program product for job selection and resource allocation of a massively parallel processor | |
CN114896068A (en) | Resource allocation method, resource allocation device, electronic device, and storage medium | |
Al-kahtani et al. | An efficient distributed algorithm for big data processing | |
CN105426255A (en) | Network I/O (input/output) cost evaluation based ReduceTask data locality scheduling method for Hadoop big data platform | |
US8543722B2 (en) | Message passing with queues and channels | |
Nguyen et al. | Resource management for elastic publish subscribe systems: A performance modeling-based approach | |
CN109544347A (en) | Tail-difference balancing method, computer-readable storage medium and tail-difference balancing system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2019-02-05