CN100547973C

CN100547973C - A high-performance computing system based on a peer-to-peer network

Info

Publication number: CN100547973C
Application number: CNB2007100522694A
Authority: CN
Inventors: 金海; 廖小飞; 罗飞; 章勤; 张�浩
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2007-05-23
Filing date: 2007-05-23
Publication date: 2009-10-07
Anticipated expiration: 2027-05-23
Also published as: CN101072133A

Abstract

A high-performance computing system based on a peer-to-peer network comprises a monitor node, dispatch nodes, computing nodes, a data server and a client computer. The monitor node receives the application project description file submitted by the client computer, and it manages and monitors the state of the dispatch nodes and the completion of each task cluster. Each dispatch node assigns tasks to the computing nodes under it, monitors the state of those computing nodes and the completion of their tasks, and reports to the monitor node. Each computing node receives and computes the tasks assigned by its dispatch node, reports task completion, and exchanges data with the data server. The data server stores and backs up the data of application projects and handles the data requests of the client computer and the computing nodes. The client computer submits the initial application project, manages the start-up and running of the main task, and obtains the final result of the application. The system offers good generality, cross-platform operation, convenient programming, high efficiency and good scalability, and it overcomes the shortcomings of existing volunteer computing systems.

Description

A high-performance computing system based on a peer-to-peer network

Technical field

The invention belongs to the field of distributed high-performance computing technology. It uses peer-to-peer network technology to build a high-performance computing system that spans the Internet; specifically, it is a high-performance computing system based on a peer-to-peer network.

Background

Traditional high-performance computers are supercomputers built on SMP mainframes or cluster architectures. Their construction and maintenance costs are very high, so they cannot be widely deployed. They remain confined to large companies and to research institutions such as government agencies and universities. As networks keep growing, distributed computing technology is increasingly applied to high-performance computing. By applying the principles of distributed computing to the PCs on the Internet, one can build a high-performance computing system that outperforms a traditional supercomputer at a lower price.

High-performance computing platforms built from PCs on the Internet currently use mainly the volunteer computing model. Such a system contains two kinds of nodes: volunteer nodes and server nodes. Volunteer nodes execute subtasks and send their intermediate results to the server node. The server node is the central control site of the system and provides task division, task assignment and result collection. The system as a whole uses computing resources connected to the Internet (PCs, workstations, clusters, etc.) to execute computing tasks. When such a resource is idle, a background program on it actively connects to the server node and downloads data files and application subprograms for local execution. After finishing a subtask, the resource returns the result to the server, which processes the results further. The more volunteers there are, the greater the computing capacity of the system.

BOINC (Berkeley Open Infrastructure for Network Computing) [GRID'04, 2004], developed at Berkeley, is a typical volunteer computing platform and uses a client-server model. The server side consists of a scheduling server, a data server and a web interface for participants. The client side consists of the BOINC client software that participating users download and install. Applications are programmed against the Python scripts and C++ interfaces that BOINC provides, forming a BOINC project. The client downloads work units and related data of an application project from the server, uploads result files after local computation, and reports its current state. Every client communicates directly with the server, so BOINC is a typical centralized control structure and is prone to single points of failure and limited scalability. Its programming tools also do not distinguish interfaces according to the expected performance of tasks. This work is left to programmers, which makes programming harder.

Other volunteer computing platforms include SETI@Home, XtremWeb and P3. Although successful, these platforms have several shortcomings:

1) They provide no API, or only complex programming interfaces. The platform is opaque to application developers, which makes application development difficult.
2) Some platforms offer no general-purpose support and compute only a single application.
3) Under network congestion or large network delay, the server node easily becomes the bottleneck of the whole platform, leading to single points of failure and poor scalability.

Traditional volunteer computing systems therefore cannot fully exploit the computing potential of peer-to-peer high-performance computing systems.

Summary of the invention

The object of the present invention is to provide a high-performance computing system based on a peer-to-peer network. The system offers good generality, cross-platform operation, convenient programming, high efficiency and good scalability, and it overcomes the shortcomings of existing volunteer computing systems.

The peer-to-peer-network-based high-performance computing system provided by the invention comprises a monitor node, dispatch nodes, computing nodes, a data server and a client computer, wherein:

The monitor node receives the project description file submitted by the client computer. It redirects the tasks of the application, in the form of task clusters, to the dispatch nodes. While the application is being computed, the monitor node monitors the state of the dispatch nodes and the completion of the tasks in each task cluster;

Each dispatch node assigns the tasks of the task clusters it receives to the computing nodes under it. It monitors the state of every computing node in its management domain and the completion of the assigned tasks, and it reports task completion to the monitor node;

Each computing node is attached to a dispatch node, which monitors it and assigns tasks to it. When a computing node receives a computing task from its dispatch node, it obtains the code and parameter data of the task from the data server and then starts the task. During computation, the computing node periodically reports task progress to its dispatch node. When the task is finished, the computing node uploads the task's result to the data server. The computing node comprises a registration module, a client storage module, a client transport module and a task control module;

The client storage module provides working space for the computing node and supplies data to the client transport module. It manages the program, data and results of each task as files, and it creates a temporary working directory for each project;

The registration module manages the joining and initialization of the computing node and reports the node's liveness. When the computing node is to join the system, the registration module sends a join request to the monitor node. It then sends a join request to the dispatch node whose address the monitor node returns. After that dispatch node returns a join-success response, the registration module initializes the task control module. While the computing node runs, the registration module periodically reports its liveness to its dispatch node. If it cannot communicate with that dispatch node, it sends a new join message to the monitor node and joins a new dispatch node;

The task control module manages the reception and execution of tasks on the computing node and reports their running state and results. The steps are as follows:

- When the computing node is idle, the task control module periodically sends task request messages to the dispatch node. When the dispatch node has tasks waiting to run, it assigns one to the computing node.
- After receiving the task information from the dispatch node, the task control module instructs the client transport module to request the code and parameter data of the subtask from the data server.
- When the code and parameter data have been transferred, the task control module passes the task's initial data in the client storage module to the task and starts its computation.
- During computation, the task control module periodically reports the task's running state to the dispatch node.
- When the computation finishes, the task control module puts the result files into the client storage module, and the client transport module transfers them to the data server.
- The computing node then returns to the idle state, and the task control module reports the task's completion and its own idle state to the dispatch node;

The client transport module receives data transfer commands from the task control module. Following these commands, and with the help of the client storage module, it transfers data to and from the data server. When a transfer completes, the client transport module notifies the task control module of the completion event;

The data server receives the initial project data submitted by the client computer during the project submission stage. During the project computation stage, it handles data requests from computing nodes, sends them the code and parameter data of subtasks, and receives the subtask result data that computing nodes upload. After the subtasks have been computed, it handles the client computer's data requests and sends the subtask result data to the client computer;

After submitting the initial data of the application project to the data server, the client computer creates a project description file and submits it to the monitor node. When the subtasks of the application project have been computed, the client computer obtains the subtask results from the data server. It then notifies the main task to aggregate them, which yields the final result of the application.

The invention provides a solution for building a novel high-performance computing system, with the following advantages and uses:

(1) High scalability and fault tolerance, enabling high-performance computing. The system organizes computing resources into a semi-centralized network with a three-tier structure. Compared with centralized resource management, this architecture scales better and tolerates faults better. It can also raise resource utilization, increase the throughput of computing tasks and reduce the response time of computing applications, thereby delivering high-performance computing for applications.

(2) Reliable and efficient data transport. Each data request uses one message channel for control commands and one data channel for the data stream. Separating control-command transmission from data-stream transmission lets the system handle concurrent transfer requests from many tasks and provides fault-tolerant control for the transfer of each individual task. Efficient data-stream processing improves the real-time behavior and reliability of the system.

(3) Support for a class of general-purpose peer-to-peer high-performance computing applications. Compared with the @home family of peer-to-peer high-performance computing systems, the invention provides a simple, easy-to-use programming model with rich and flexible task division methods, and it supports general high-performance computing applications.

Description of drawings

Fig. 1 is a schematic structural diagram of the peer-to-peer-network-based high-performance computing system of the present invention;

Fig. 2 is a schematic structural diagram of the client computer;

Fig. 3 is a schematic structural diagram of the monitor node;

Fig. 4 is a schematic structural diagram of the dispatch node;

Fig. 5 is a schematic structural diagram of the computing node;

Fig. 6 is a schematic structural diagram of the data server;

Fig. 7 shows the data transport protocol;

Fig. 8 shows the data stream format;

Fig. 9 is the data transmission state transition diagram.

Embodiment

The present invention is described in detail below with reference to the accompanying drawings.

By working principle, the system resources of the invention are divided into nodes with five roles, as shown in Figure 1:

- monitor node 1;
- dispatch nodes 2.1, 2.2, ..., 2.N (collectively, dispatch nodes 2);
- computing nodes 3.1, 3.2, ..., 3.K (collectively, computing nodes 3);
- data server 4;
- client computer 5.

Here N and K are positive integers. Each application is submitted to the system as a project, which consists of one main task and many subtasks. The main task runs on client computer 5 and takes part in controlling and executing the application project. The subtasks run on computing nodes 3, and each subtask performs part of the project's computation.

Monitor node 1, dispatch nodes 2 and computing nodes 3 form a hierarchical network. All dispatch nodes 2.1, 2.2, ..., 2.N are connected to monitor node 1. All computing nodes 3.1, 3.2, ..., 3.K form several working groups, and the computing nodes of each working group are connected to one dispatch node. The roles are as follows:

- Monitor node 1 manages system resources and application projects. It processes the project description files submitted by clients and monitors the running state of projects and the current system resources.
- Dispatch nodes 2.1, 2.2, ..., 2.N manage and schedule the idle computing nodes under them to execute computing tasks, and they monitor the state of those computing nodes.
- Computing nodes 3.1, 3.2, ..., 3.K are the actual executors of subtasks.
- Data server 4 stores and manages the data of application projects and supports the transfer of these data.
- The user provides a client computer 5, through which the user submits application projects to the system and on which the main task runs.

As shown in Figure 1, client computer 5 submits the initial data of the application project to data server 4. When the submission is complete, the client computer creates a project description file containing the address of data server 4, the address of the monitor node and the information of all subtasks. It submits this file to monitor node 1 and then starts the main task, which begins running the whole project.

Once the project has started, client computer 5 divides all subtasks of the project into task clusters, each containing a batch of subtasks. Client computer 5 applies to monitor node 1 to start the task clusters one after another. For each cluster, monitor node 1 selects the least-loaded dispatch node 2.i (i = 1, 2, ..., N) and redirects the task cluster to it. After receiving the task cluster redirected by monitor node 1, dispatch node 2.i uses its task distribution mechanism to assign the subtasks to computing nodes 3.s to 3.t under it (1 ≤ s ≤ t ≤ K).

After computing node 3.j (s ≤ j ≤ t) receives a subtask assigned by dispatch node 2.i, it requests the parameters and code needed by the subtask from data server 4, which sends the related data to computing node 3.j. Once the data have arrived, computing node 3.j starts running the subtask. When the computation finishes, computing node 3.j sends the subtask's result file to data server 4 and reports the task's completion to dispatch node 2.i. Dispatch node 2.i in turn reports the completion of the subtask to monitor node 1.

Monitor node 1 monitors the state of the submitted task clusters, and the client computer periodically queries monitor node 1 about the completion of the task clusters it has applied for. If some subtasks are still unfinished after a certain time, the client computer regroups them into new task clusters and applies to monitor node 1 again to start them. Dispatch nodes and computing nodes then complete these subtasks. When all subtasks are finished, client computer 5 obtains their results from data server 4, and the main task aggregates them into the final result of the application project. This completes the solution of the whole project.
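The project description file is said to carry the address of the data server, the address of the monitor node and the information of all subtasks, but the text gives no file format. The following is a minimal sketch that assumes JSON serialization. The class and field names (`ProjectDescription`, `SubtaskInfo`, `code_file`, `param_file`) are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SubtaskInfo:
    task_id: str       # hypothetical identifier of one subtask
    code_file: str     # hypothetical name of the code object on the data server
    param_file: str    # hypothetical name of the parameter object on the data server


@dataclass
class ProjectDescription:
    project_id: str
    data_server_addr: str   # "host:port" of data server 4
    monitor_addr: str       # "host:port" of monitor node 1
    subtasks: list[SubtaskInfo] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() also converts the nested SubtaskInfo objects into dicts.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "ProjectDescription":
        raw = json.loads(text)
        raw["subtasks"] = [SubtaskInfo(**s) for s in raw["subtasks"]]
        return cls(**raw)
```

The client would write this file after uploading the initial data and send it to the monitor node. A self-describing text format keeps the description independent of the platform the client runs on.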

The composition and function of each part of the system are described below. Persons skilled in the art may implement them in other ways according to the disclosure of the invention.

As shown in Figure 2, client computer 5 consists of project submission module 51 and main task run module 52. Project submission module 51 submits application projects programmed with the software development kit; each project contains one main task and multiple subtasks. Project submission module 51 first submits the initial data of the application project to data server 4 and creates a project description file. It then submits the project description file to monitor node 1 and notifies main task run module 52 of the submission-complete event.

Main task run module 52 is responsible for running the main task. On receiving the submission-complete event from project submission module 51, it starts the main task. While running, it carries out the following steps:

- It divides the project's subtasks into task clusters and applies to monitor node 1 to start them one after another.
- It periodically queries monitor node 1 about the completion state of the subtasks in the task clusters it has applied for. The query interval equals the estimated running time of each subtask.
- If some subtasks have not finished within the longest running time set by the user, it regroups these unfinished subtasks into task clusters and applies to monitor node 1 again to start them.
- Once all subtasks have finished, it requests their result data from data server 4. After receiving the result data, it notifies the main task to aggregate the results and obtain the final result of the application.
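The cluster bookkeeping of the main task run module (module 52) can be sketched as follows. This is a simplified, single-process illustration under several assumptions:

- `start_cluster` and `query_done` are hypothetical stand-ins for the requests to monitor node 1.
- The cluster size is a free parameter.
- Each loop round stands for one wait of the user-set longest running time; the actual waiting is omitted.
- `max_rounds` is an assumed retry bound that the text does not specify.

```python
from typing import Callable, Iterable


def partition(subtasks: list[str], cluster_size: int) -> list[list[str]]:
    """Split subtask IDs into task clusters of at most cluster_size subtasks."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be positive")
    return [subtasks[i:i + cluster_size] for i in range(0, len(subtasks), cluster_size)]


def run_main_task(
    subtasks: list[str],
    cluster_size: int,
    start_cluster: Callable[[list[str]], None],  # hypothetical: ask monitor node 1 to start a cluster
    query_done: Callable[[], Iterable[str]],     # hypothetical: IDs reported finished so far
    max_rounds: int = 10,                        # assumed retry bound
) -> bool:
    """Start clusters, then regroup unfinished subtasks into new clusters until all finish."""
    pending = list(subtasks)
    for _ in range(max_rounds):
        for cluster in partition(pending, cluster_size):
            start_cluster(cluster)
        # A real client would wait here for the longest running time set by the user.
        done = set(query_done())
        pending = [t for t in pending if t not in done]
        if not pending:
            return True  # next step: fetch results from data server 4 and aggregate them
    return False
```

Only unfinished subtasks are regrouped, so a slow or failed computing node delays a retry of its own subtasks without forcing the whole project to restart.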

As shown in Figure 3, monitor node 1 mainly consists of new node management module 11, dispatch node table 12, dispatch node monitoring module 13, project management module 14 and project table 15.

Dispatch node table 12 stores information about the dispatch nodes currently in the system. The basic information unit of each dispatch node contains the dispatch node's address, its time-to-live, its task cluster information and the number of computing nodes under it.

Project table 15 records all project description files submitted to the system, together with the dispatch nodes participating in each application project. It is maintained by project management module 14.

New node management module 11 uses dispatch node table 12 to manage newly joining nodes. A joining node sends a join request to the monitor node, which handles it as follows:

- If the node is a new dispatch node, new node management module 11 initializes a dispatch node information unit for it, puts the unit into dispatch node table 12 and notifies the node that it has joined successfully.
- If the node is a new computing node, new node management module 11 assigns it to the most heavily loaded dispatch node. Here the heaviest load means that the task clusters on the dispatch node contain the largest number of tasks. New node management module 11 first selects the most heavily loaded dispatch node from dispatch node table 12. It then tells the new computing node that dispatch node's address, so that the new node sends a join request to that dispatch node and joins its working group.
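The admission rule above can be sketched as follows. The dispatch node information unit holds the fields listed for dispatch node table 12. Load is computed as the number of tasks in the node's task clusters, which is the definition given in the text. The class and method names (`DispatchEntry`, `NewNodeManager`, `join_dispatch`, `place_compute`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DispatchEntry:
    """One dispatch node information unit of dispatch node table 12."""
    addr: str
    last_refresh: float = 0.0                                     # time-to-live bookkeeping
    clusters: dict[str, list[str]] = field(default_factory=dict)  # cluster id -> task ids
    n_compute: int = 0                                            # computing nodes under it

    def load(self) -> int:
        # Load = number of tasks contained in the task clusters on this dispatch node.
        return sum(len(tasks) for tasks in self.clusters.values())


class NewNodeManager:
    def __init__(self) -> None:
        self.dispatch_table: dict[str, DispatchEntry] = {}

    def join_dispatch(self, addr: str, now: float = 0.0) -> bool:
        """A new dispatch node: create its information unit and report success."""
        self.dispatch_table[addr] = DispatchEntry(addr, last_refresh=now)
        return True

    def place_compute(self) -> str:
        """A new computing node: return the address of the most heavily loaded
        dispatch node, to which the computing node then sends its join request."""
        if not self.dispatch_table:
            raise LookupError("no dispatch node has joined yet")
        return max(self.dispatch_table.values(), key=DispatchEntry.load).addr
```

Sending new computing nodes to the busiest dispatch node, rather than the idlest, presumably adds workers where the most tasks are waiting.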

Dispatch node monitoring module 13 uses dispatch node table 12 to monitor the state of the dispatch nodes in the system, including their liveness and the completion state of their task clusters. Dispatch node monitoring module 13 receives status report messages from dispatch node 2.i; each message contains the completion state of the task clusters on 2.i. The module uses this information to update the task cluster completion state and the node time-to-live in the corresponding entry of dispatch node table 12.

Dispatch node monitoring module 13 periodically (e.g., every 5 seconds) polls the time-to-live of each dispatch node in dispatch node table 12. Suppose a dispatch node has not refreshed its time-to-live within three polling cycles (e.g., 15 seconds). Dispatch node monitoring module 13 then considers that the node has left the system and deletes its entry from dispatch node table 12. It also notifies project management module 14 to redirect that dispatch node's unfinished task clusters to another, least-loaded dispatch node.
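The time-to-live check can be sketched with a plain table that maps each dispatch node address to the time of its last status report. The 5-second period and the 3-period limit are the example values from the text. Timestamps are passed in explicitly so the logic is testable; in practice they might come from `time.monotonic()`. Treating a gap of exactly 15 s as still alive is an assumption.

```python
POLL_PERIOD = 5.0    # seconds; the example polling period given in the text
MISSED_PERIODS = 3   # a dispatch node is dropped after 3 periods (15 s) without a refresh


def refresh(ttl_table: dict[str, float], addr: str, now: float) -> None:
    """Record a status report from dispatch node `addr` received at time `now`."""
    ttl_table[addr] = now


def poll(ttl_table: dict[str, float], now: float) -> list[str]:
    """One polling pass: remove and return the dispatch nodes whose time-to-live
    has not been refreshed for more than MISSED_PERIODS polling periods."""
    limit = POLL_PERIOD * MISSED_PERIODS
    dead = [addr for addr, last in ttl_table.items() if now - last > limit]
    for addr in dead:
        del ttl_table[addr]  # the caller then asks project management to redirect clusters
    return dead
```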

Project management module 14 manages the start-up and running of the task clusters of application projects. It handles four kinds of events:

- On receiving a project description file submitted by a client computer, it initializes a project table entry for the project and puts it into project table 15.
- On receiving a task cluster start request from the client computer, it selects the least-loaded dispatch node in dispatch node table 12 and redirects the task cluster to that dispatch node. It then records the dispatch node's address in the corresponding entry of project table 15.
- On receiving a task cluster status query from the client computer, it looks up in project table 15 the dispatch node to which the task cluster was assigned. It then obtains the completion state of the task cluster on that dispatch node from dispatch node table 12 and returns the result to the client computer.
- On receiving a dispatch-node-death event from dispatch node monitoring module 13, it again selects the least-loaded dispatch node from dispatch node table 12 and redirects the dead dispatch node's unfinished task clusters to it. It then updates the dispatch node information of the affected task clusters in project table 15 to the new dispatch node.
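The redirection logic of project management module 14 can be sketched as below. The sketch assumes that "least loaded" uses the same measure as the heaviest-load rule, namely the number of tasks a dispatch node holds. A plain dictionary stands in for dispatch node table 12, and the status query is omitted. All names are hypothetical.

```python
class ProjectManager:
    """Sketch of project management module 14. `dispatch_loads` stands in for
    dispatch node table 12 (address -> number of tasks currently held)."""

    def __init__(self, dispatch_loads: dict[str, int]) -> None:
        self.dispatch_loads = dispatch_loads
        # Project table 15: project id -> {task cluster id -> dispatch node address}
        self.project_table: dict[str, dict[str, str]] = {}

    def register_project(self, project_id: str) -> None:
        self.project_table.setdefault(project_id, {})

    def _lightest(self) -> str:
        # Ties go to the first address in insertion order.
        return min(self.dispatch_loads, key=self.dispatch_loads.__getitem__)

    def start_cluster(self, project_id: str, cluster_id: str, n_tasks: int) -> str:
        """Redirect a task cluster to the least-loaded dispatch node and record it."""
        target = self._lightest()
        self.dispatch_loads[target] += n_tasks
        self.project_table[project_id][cluster_id] = target
        return target

    def on_dispatch_dead(self, dead: str, unfinished: dict[tuple[str, str], int]) -> None:
        """Redirect each unfinished cluster of a dead dispatch node.
        `unfinished` maps (project id, cluster id) -> number of unfinished tasks."""
        self.dispatch_loads.pop(dead, None)
        for (project_id, cluster_id), n_tasks in unfinished.items():
            self.start_cluster(project_id, cluster_id, n_tasks)
```

Reusing `start_cluster` for recovery means a redirected cluster is placed exactly like a newly started one, and project table 15 is updated in the same step.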

As shown in Figure 4, a dispatch node mainly consists of:

(1) dispatch node start-up module 21;
(2) computing node table 22;
(3) computing node management module 23;
(4) task management module 24;
(5) initial task queue 25;
(6) waiting task queue 26;
(7) running task queue 27;
(8) error task queue 28.

Computing node table 22 records the computing nodes belonging to this dispatch node. Each computing node information unit contains the computing node identifier, IP address, communication socket, last polling time, polling failure count and running state information. When a computing node joins, the dispatch node creates a computing node information object for it and adds the object to computing node table 22.

When a dispatch node is to join the system, dispatch node start-up module 21 sends a join request to monitor node 1. After receiving the join-success response from monitor node 1, dispatch node start-up module 21 initializes computing node management module 23 and task management module 24.

Computing node management module 23 manages the joining of new computing nodes and monitors the state of the computing nodes in computing node table 22. On the one hand, when a new computing node sends a join request to the dispatch node, computing node management module 23 initializes a computing node information unit for it and puts this information into computing node table 22. On the other hand, it monitors the state of each computing node. Every computing node 3 periodically sends alive messages to its dispatch node. If computing node management module 23 receives no alive message from a computing node within 3 polling cycles, it considers that node to have left the system or failed. It then deletes the node from computing node table 22 and notifies task management module 24 to reassign the tasks on that node to other computing nodes.
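A sketch of computing node table 22 and the 3-cycle alive check follows. The information unit uses the fields listed for the table. The `tasks` list is a hypothetical addition so that a dropped node's tasks can be handed to the task management module. The caller is assumed to invoke `end_poll_cycle` once per polling cycle.

```python
from dataclasses import dataclass, field

MAX_MISSED = 3  # polling cycles without an alive message before a node is dropped


@dataclass
class ComputeNodeInfo:
    """Computing node information unit of computing node table 22."""
    node_id: str
    ip: str
    sock: object = None      # communication socket (not used in this sketch)
    last_poll: float = 0.0   # time of the last alive message
    fail_count: int = 0      # consecutive polling cycles without an alive message
    state: str = "idle"      # running state information
    tasks: list[str] = field(default_factory=list)  # hypothetical: tasks assigned to the node


class ComputeNodeManager:
    def __init__(self) -> None:
        self.table: dict[str, ComputeNodeInfo] = {}
        self._heard: set[str] = set()  # nodes heard from in the current cycle

    def join(self, node_id: str, ip: str, now: float) -> None:
        self.table[node_id] = ComputeNodeInfo(node_id, ip, last_poll=now)

    def alive(self, node_id: str, now: float) -> None:
        info = self.table[node_id]
        info.last_poll, info.fail_count = now, 0
        self._heard.add(node_id)

    def end_poll_cycle(self) -> dict[str, list[str]]:
        """Close one polling cycle; return {dropped node: its tasks} so that the
        task management module can reassign them."""
        dropped: dict[str, list[str]] = {}
        for node_id, info in list(self.table.items()):
            if node_id not in self._heard:
                info.fail_count += 1
                if info.fail_count >= MAX_MISSED:
                    dropped[node_id] = info.tasks
                    del self.table[node_id]
        self._heard.clear()
        return dropped
```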

Task management module 24 uses four task queues to manage and monitor the tasks in the task clusters redirected to this dispatch node:

- initial task queue 25 holds initial task information;
- waiting task queue 26 holds the information of tasks currently waiting to be assigned;
- running task queue 27 holds the information of tasks already assigned to computing nodes;
- error task queue 28 holds the information of failed tasks.

After receiving a task cluster redirected by the monitor node, task management module 24 first puts the tasks of the cluster into initial task queue 25. It then checks the validity of each task in turn. A task that passes the check is put into waiting task queue 26; a task that fails the check is discarded.

Thereafter, task management module 24 may receive an idle report from a computing node while waiting task queue 26 still contains unassigned tasks. In that case it assigns the first task in waiting task queue 26 to that computing node and puts the task into running task queue 27.

Computing nodes periodically report the state of their tasks to task management module 24. The module aggregates the completion state of the tasks in running task queue 27 and periodically reports the completion of computing tasks to the monitor node. When there are no tasks to compute, this status report serves as the dispatch node's liveness report. After reporting task states to the monitor node, task management module 24 removes the completed tasks from running task queue 27 and reports their completion to the monitor node.

A task may fail on a computing node, or computing node management module 23 may report that a computing node has left. In either case, the tasks assigned to that computing node are moved to error task queue 28. Later, on receiving an idle report from a computing node, task management module 24 reassigns a task from error task queue 28 to that idle computing node and moves the task back into running task queue 27.
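The four-queue flow can be sketched as a small state machine. The validity criteria are not specified in the text, so the check is passed in as a function. The text also does not say whether error-queue tasks or waiting tasks are assigned first. This sketch assumes error-queue tasks take priority so that failed tasks are retried promptly.

```python
from collections import deque
from typing import Callable, Optional


class TaskManager:
    """Sketch of task management module 24 with its four queues."""

    def __init__(self, is_valid: Callable[[str], bool]) -> None:
        self.initial: deque[str] = deque()  # initial task queue 25
        self.waiting: deque[str] = deque()  # waiting task queue 26
        self.running: dict[str, str] = {}   # running task queue 27: task -> computing node
        self.error: deque[str] = deque()    # error task queue 28
        self.is_valid = is_valid            # validity check; criteria are not specified

    def receive_cluster(self, tasks: list[str]) -> None:
        self.initial.extend(tasks)
        while self.initial:
            task = self.initial.popleft()
            if self.is_valid(task):
                self.waiting.append(task)   # tasks failing the check are discarded

    def on_idle(self, node: str) -> Optional[str]:
        """Assign one task to an idle computing node (error tasks first: an assumption)."""
        source = self.error if self.error else self.waiting
        if not source:
            return None
        task = source.popleft()
        self.running[task] = node
        return task

    def on_completed(self, task: str) -> None:
        """Remove a finished task once its completion has been reported to the monitor."""
        self.running.pop(task, None)

    def on_node_failure(self, node: str) -> None:
        """Task exception on `node`, or `node` left: move its tasks to the error queue."""
        for task in [t for t, n in self.running.items() if n == node]:
            del self.running[task]
            self.error.append(task)
```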

As shown in Figure 5, a computing node mainly consists of: (1) registration module 31; (2) task control module 34; (3) client transport module 33; (4) client storage module 32. Client storage module 32 provides working space for computing node 3 and supplies data to client transport module 33. It manages the program, data and results of each task as files, and it creates a temporary working directory for each project.
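The per-project working directory of client storage module 32 can be sketched with the standard `tempfile` and `pathlib` modules. The split into `code/`, `data/` and `result/` subdirectories is a hypothetical layout; the text says only that programs, data and results are managed as files in a temporary directory per project.

```python
import tempfile
from pathlib import Path

KINDS = ("code", "data", "result")  # hypothetical per-project subdirectories


class ClientStorage:
    """Sketch of client storage module 32: files under one temporary
    working directory per project."""

    def __init__(self) -> None:
        self._tmp = tempfile.TemporaryDirectory(prefix="p2p-hpc-")
        self.root = Path(self._tmp.name)

    def project_dir(self, project_id: str) -> Path:
        d = self.root / project_id
        for kind in KINDS:
            (d / kind).mkdir(parents=True, exist_ok=True)
        return d

    def put(self, project_id: str, kind: str, name: str, content: bytes) -> Path:
        if kind not in KINDS:
            raise ValueError(f"unknown kind: {kind}")
        path = self.project_dir(project_id) / kind / name
        path.write_bytes(content)
        return path

    def get(self, project_id: str, kind: str, name: str) -> bytes:
        return (self.root / project_id / kind / name).read_bytes()

    def cleanup(self) -> None:
        self._tmp.cleanup()
```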

Registration module 31 manages the joining and initialization of the computing node and reports the node's liveness. When the computing node is to join the system, registration module 31 sends a join request to monitor node 1. It then sends a join request to the dispatch node 2 whose address monitor node 1 returns. After dispatch node 2 returns a join-success response, registration module 31 initializes task control module 34. While the computing node runs, registration module 31 periodically reports its liveness to its dispatch node 2. If it cannot communicate with dispatch node 2, it sends a new join message to monitor node 1 and joins a new dispatch node.
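The two-step join and the rejoin-on-failure rule of registration module 31 can be sketched as below. The network requests are replaced by hypothetical callables: `ask_monitor`, `join_dispatch` and `report_alive`. The `max_attempts` bound is an assumption that the text does not specify.

```python
from typing import Callable, Optional


class Registration:
    """Sketch of registration module 31; the network calls are hypothetical callables."""

    def __init__(
        self,
        ask_monitor: Callable[[], str],        # join request to monitor node 1 -> dispatch address
        join_dispatch: Callable[[str], bool],  # join request to that dispatch node -> accepted?
        report_alive: Callable[[str], bool],   # liveness report -> reached the dispatch node?
        max_attempts: int = 3,                 # assumed retry bound, not from the text
    ) -> None:
        self.ask_monitor = ask_monitor
        self.join_dispatch = join_dispatch
        self.report_alive = report_alive
        self.max_attempts = max_attempts
        self.dispatch: Optional[str] = None

    def join(self) -> str:
        for _ in range(self.max_attempts):
            addr = self.ask_monitor()
            if self.join_dispatch(addr):
                self.dispatch = addr  # the task control module would be initialized here
                return addr
        raise ConnectionError("could not join any dispatch node")

    def heartbeat(self) -> str:
        """Periodic liveness report; rejoin through the monitor node if it fails."""
        if self.dispatch is None or not self.report_alive(self.dispatch):
            return self.join()
        return self.dispatch
```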

Task control module 34 manages the reception and execution of tasks on the computing node and reports their running state and results. It works as follows:

- When computing node 3 is idle, task control module 34 periodically sends task request messages to dispatch node 2. When dispatch node 2 has tasks waiting to run, it assigns one to the computing node.
- After receiving the task information from the dispatch node, task control module 34 instructs client transport module 33 to request the code and parameter data of the subtask from data server 4.
- When the code and parameter data have been transferred, task control module 34 passes the task's initial data in client storage module 32 to the task and starts its computation.
- During computation, task control module 34 periodically reports the task's running state to the dispatch node.
- When the computation finishes, task control module 34 puts the result files into client storage module 32, and client transport module 33 transfers them to data server 4.
- The computing node then returns to the idle state, and task control module 34 reports the task's completion and its own idle state to dispatch node 2.
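One cycle of task control module 34 can be sketched as a single function whose collaborators are passed in as hypothetical callables. These stand for the dispatch node, the client transport and storage modules, and the subtask itself. The periodic progress reports are reduced to one `"running"` report. Reporting `"error"` when the subtask raises is an assumption that matches the dispatch node's error queue.

```python
from typing import Callable, Optional, Tuple


def run_one_task(
    request_task: Callable[[], Optional[str]],    # ask dispatch node 2 for a task
    fetch: Callable[[str], Tuple[bytes, bytes]],  # code and parameters via the transport module
    execute: Callable[[bytes, bytes], bytes],     # run the subtask, return its result
    store_result: Callable[[str, bytes], str],    # write into client storage, return a path
    upload: Callable[[str], None],                # transfer the result file to data server 4
    report: Callable[[str, str], None],           # status report to dispatch node 2
) -> Optional[str]:
    """One request-fetch-run-upload-report cycle of task control module 34."""
    task = request_task()
    if task is None:
        return None               # stay idle; the caller retries after its request period
    code, params = fetch(task)
    report(task, "running")       # stands in for the periodic progress reports
    try:
        result = execute(code, params)
    except Exception:
        report(task, "error")     # the dispatch node moves the task to its error queue
        return task
    upload(store_result(task, result))
    report(task, "done")          # completion plus "idle again"
    return task
```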

Client transport module 33 receives data transfer commands from task control module 34. Following these commands, and with the help of client storage module 32, it transfers data to and from data server 4. On the one hand, it sends data requests to data server 4 and puts the returned data into client storage module 32. On the other hand, it takes the task's computation results from client storage module 32 and transfers them to data server 4. When a transfer completes, client transport module 33 notifies task control module 34 of the completion event.

Data server 4 provides data backup for projects. As shown in Figure 6, it consists mainly of three parts: (1) data service module 41; (2) server storage module 42; and (3) server transport module 43. Their functions and relationships are described below:

Data service module 41 provides a unified data service interface for computing node 3 and client computer 5 to call, and delivers data services together with server transport module 43 and server storage module 42. When computing node 3 or client computer 5 sends a data request to the data server, data service module 41 issues a data transfer command to server transport module 43. When computing node 3 or client computer 5 submits data to data server 4, data service module 41 issues a data receive command to server transport module 43.

Server transport module 43 provides the server-side transfer function for data transfers between computing node 3, client computer 5 and data server 4, and can serve multiple data requests concurrently. When server transport module 43 receives a data transfer command from data service module 41, it retrieves the requested data from server storage module 42 and transfers it to computing node 3. When server transport module 43 receives a data receive command from data service module 41, it receives the data from computing node 3 and places it in server storage module 42.
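Assuming a single-process setting, the interplay of data service module 41, server transport module 43 and server storage module 42 might look like the sketch below. A thread pool stands in for the transport module's ability to serve several requests at once, and a locked dictionary stands in for the Berkeley DB-backed storage; the class and method names are illustrative only.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ServerStorage:
    """In-memory stand-in for server storage module 42 (the patent uses Berkeley DB)."""

    def __init__(self):
        self._files = {}
        self._lock = threading.Lock()  # several transfers may touch storage at once

    def get(self, key):
        with self._lock:
            return self._files[key]

    def put(self, key, data):
        with self._lock:
            self._files[key] = data


class DataService:
    """Hypothetical data service module 41 issuing commands to a pooled transport."""

    def __init__(self, storage, workers=4):
        self.storage = storage
        self.transport = ThreadPoolExecutor(max_workers=workers)  # plays module 43

    def request(self, key):
        """Data transfer command: read from storage and hand back to the requester."""
        return self.transport.submit(self.storage.get, key)

    def submit(self, key, data):
        """Data receive command: accept uploaded data and store it."""
        return self.transport.submit(self.storage.put, key, data)
```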

Server storage module 42 manages the storage of application project data in the data server's working area. Built on the Berkeley DB database, it provides local access services to data service module 41 and server transport module 43, and uses a file resource pool to back up and store the code, parameters and result data of users and of each computing node.

Data transfer between data server 4 and computing node 3 uses the data transport protocol FDTP, which is implemented jointly by server transport module 43 and server storage module 42 of the data server and by client transport module 33 and client storage module 32 of the computing node. As shown in Figure 7, data server 4 acts as the server and computing node 3 acts as the client. Through server transport module 43 and client transport module 33 respectively, data server 4 and computing node 3 establish a message channel and a data channel to carry messages and data files, and each side manages the data locally through server storage module 42 and client storage module 32 respectively.

During transmission, data are formatted according to the data stream format shown in Figure 8. Each file stream consists of the file structure information FSI, the file begin tag FBT, the file size FS, the file data FD and the end tag ET. ET is either an end-of-file tag or an end-of-stream tag: when the file being transmitted is the last file in the requested data, ET is the end-of-stream tag ETT; otherwise ET is the end-of-file tag ETF, and the next file stream follows ET in the same format.
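The file stream layout can be illustrated with a small encoder and decoder. The patent names the fields FSI, FBT, FS, FD and ET but does not specify their byte encodings, so the one-byte tag values, the length-prefixed file name used as FSI, and the 8-byte big-endian FS below are all assumptions.

```python
import io
import struct

# Assumed tag values; the patent does not specify byte encodings.
FBT = b"\x01"  # file begin tag
ETF = b"\x02"  # end-of-file tag: another file stream follows
ETT = b"\x03"  # end-of-stream tag: this was the last requested file


def encode_stream(files):
    """Serialize (name, data) pairs as FSI | FBT | FS | FD | ET records."""
    if not files:
        raise ValueError("a data stream must carry at least one file")
    out = io.BytesIO()
    for i, (name, data) in enumerate(files):
        fsi = name.encode("utf-8")
        out.write(struct.pack(">H", len(fsi)) + fsi)    # FSI: length-prefixed name (assumed)
        out.write(FBT)
        out.write(struct.pack(">Q", len(data)))         # FS: 8-byte big-endian size (assumed)
        out.write(data)                                 # FD
        out.write(ETT if i == len(files) - 1 else ETF)  # ET
    return out.getvalue()


def decode_stream(blob):
    """Parse file streams until the end-of-stream tag ETT is reached."""
    stream = io.BytesIO(blob)
    files = []
    while True:
        (name_len,) = struct.unpack(">H", stream.read(2))
        name = stream.read(name_len).decode("utf-8")
        if stream.read(1) != FBT:
            raise ValueError("missing file begin tag")
        (size,) = struct.unpack(">Q", stream.read(8))
        data = stream.read(size)
        if len(data) != size:
            raise ValueError("truncated file data")
        files.append((name, data))
        end_tag = stream.read(1)
        if end_tag == ETT:
            return files
        if end_tag != ETF:
            raise ValueError("unknown end tag")
```

Because FS precedes FD, the receiver knows exactly how many bytes to read, so file contents never need escaping even if they happen to contain tag bytes.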

Figure 9 formally specifies the FDTP data transfer process as a finite state machine. For each transfer, both the client and the server start in the idle state IDLE. After sending a data request message to the server, the client enters the synchronous receive state SYN_RECV. On receiving the request message, the server verifies its legitimacy and correctness, sends the verification result to the client as a synchronization message (SEND/SYN), and enters the synchronous send state SYN_SENT. If verification fails, the server returns to IDLE; the client, on receiving the synchronization error message, also returns to IDLE, reports the cause of the error to the user, ends this transfer and prepares to handle the next request message. If verification succeeds, the server initializes the data channel (PREPARE) and enters the begin-transfer state BEGIN_TRANS; the client, on receiving the synchronization message, initializes its own data channel and also enters BEGIN_TRANS.

Once client and server have entered the begin-transfer state, one side acts as the data sender and the other as the data receiver, and the two carry out the data transfer (S/R). After a data file has been transferred, the server enters state SYN_SENT, determines whether this transfer succeeded, and sends the result to the client as a synchronization message; correspondingly, the client enters state SYN_RECV and receives the synchronization message. If the transfer failed, the server and client return from SYN_SENT and SYN_RECV respectively to IDLE, clear the data channel according to the cause of failure, and prepare for the next message. If the transfer succeeded, client and server have completed one transfer and enter state FINISH_ONCE.

After entering FINISH_ONCE, both sides check whether all requested data have been transferred. If files remain, both enter BEGIN_TRANS and start transferring the next file. If all files have been transferred, the server and client enter SYN_SENT and SYN_RECV respectively, and the server sends a completion acknowledgement to the client as a synchronization message. After the client receives this message, client and server both enter IDLE, completing this data request and preparing to handle the next one.
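The FDTP state machine can be expressed as one transition table per side. The state names come from the text; the event names (`verify_ok`, `file_ok`, `more_files` and so on) are hypothetical labels for the transitions described above, and the driver simply rejects any event that is not allowed in the current state.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    SYN_SENT = "SYN_SENT"
    SYN_RECV = "SYN_RECV"
    BEGIN_TRANS = "BEGIN_TRANS"
    FINISH_ONCE = "FINISH_ONCE"


# Server side (data server 4). Event names are hypothetical.
SERVER = {
    (State.IDLE, "request_received"): State.SYN_SENT,      # verify request, send SEND/SYN
    (State.SYN_SENT, "verify_failed"): State.IDLE,
    (State.SYN_SENT, "verify_ok"): State.BEGIN_TRANS,      # PREPARE the data channel
    (State.BEGIN_TRANS, "file_transferred"): State.SYN_SENT,  # send per-file result
    (State.SYN_SENT, "file_failed"): State.IDLE,           # clear the data channel
    (State.SYN_SENT, "file_ok"): State.FINISH_ONCE,
    (State.FINISH_ONCE, "more_files"): State.BEGIN_TRANS,
    (State.FINISH_ONCE, "all_done"): State.SYN_SENT,       # send completion sync message
    (State.SYN_SENT, "done_sent"): State.IDLE,
}

# Client side (computing node 3), mirroring the server through SYN_RECV.
CLIENT = {
    (State.IDLE, "request_sent"): State.SYN_RECV,
    (State.SYN_RECV, "sync_error"): State.IDLE,            # report the error cause to the user
    (State.SYN_RECV, "sync_ok"): State.BEGIN_TRANS,        # initialize client data channel
    (State.BEGIN_TRANS, "file_transferred"): State.SYN_RECV,
    (State.SYN_RECV, "file_failed"): State.IDLE,
    (State.SYN_RECV, "file_ok"): State.FINISH_ONCE,
    (State.FINISH_ONCE, "more_files"): State.BEGIN_TRANS,
    (State.FINISH_ONCE, "all_done"): State.SYN_RECV,
    (State.SYN_RECV, "done_received"): State.IDLE,
}


def run(table, events, state=State.IDLE):
    """Apply events in order and return the list of visited states."""
    trace = [state]
    for event in events:
        try:
            state = table[(state, event)]
        except KeyError:
            raise ValueError(f"event {event!r} not allowed in {state.name}") from None
        trace.append(state)
    return trace
```

SYN_SENT and SYN_RECV are each reused for three different synchronization points (request check, per-file result, final acknowledgement); in the table the incoming event, not the state alone, decides which transition applies.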

Claims

1. A high-performance computing system based on a peer-to-peer network, characterized in that the system comprises a monitor node (1), dispatch nodes (2.1, 2.3, ..., 2.N, where N is a positive integer), computing nodes (3.1, 3.3, ..., 3.K, where K is a positive integer), a data server (4) and a client computer (5), wherein

The monitor node (1) receives the application project description file submitted by the client computer (5) and redirects the tasks of the application, in the form of task clusters, to the dispatch nodes (2.1, 2.3, ..., 2.N); during the computation of the application, the monitor node (1) monitors the state of the dispatch nodes (2.1, 2.3, ..., 2.N) and the completion of the tasks in each task cluster;

Each dispatch node (2.1, 2.3, ..., 2.N) assigns the tasks of the task clusters it receives to the computing nodes under it, monitors the state of each computing node in its management domain and the completion of the assigned tasks, and reports task completion to the monitor node (1);

The computing nodes (3.1, 3.3, ..., 3.K) are attached to the dispatch nodes, and each is monitored and assigned tasks by the dispatch node it belongs to; after a computing node receives a computation task assigned by its dispatch node, it obtains the code and parameter data of that task from the data server (4) and then starts running the task; during computation, the computing node periodically reports task completion to its dispatch node, and after the computation finishes, it uploads the task's results to the data server (4); each computing node comprises a registration module (31), a client storage module (32), a client transport module (33) and a task control module (34);

The client storage module (32) provides a working area for the computing node and supplies data to the client transport module (33); it manages the program, data and results of each task as files, and creates a temporary working directory for each project;
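One way to realize the per-project temporary working directory is sketched below with Python's standard `tempfile` module. The directory prefix and the per-task `program`/`data`/`result` file names are assumptions; the claim only states that each task's program, data and results are managed as files.

```python
import os
import tempfile


def make_project_workdir(project_id, root=None):
    """Create the temporary working directory for one project."""
    return tempfile.mkdtemp(prefix=f"project-{project_id}-", dir=root)


def task_files(workdir, task_id):
    """Create one task's subdirectory and return its (assumed) file paths."""
    task_dir = os.path.join(workdir, str(task_id))
    os.makedirs(task_dir, exist_ok=True)
    return {kind: os.path.join(task_dir, kind) for kind in ("program", "data", "result")}
```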

The registration module (31) manages the joining and initialization of the computing node and reports its current state; when the computing node is to join the system, the registration module (31) sends a join request to the monitor node (1) and then sends a join request to the dispatch node whose address the monitor node (1) returned; after that dispatch node returns a message confirming that the join succeeded, the registration module (31) initializes the task control module (34); while the computing node is running, the registration module (31) periodically reports the node's current state to its dispatch node; if it cannot communicate with that dispatch node, it sends a join request to the monitor node (1) again and then joins a new dispatch node;

The task control module (34) manages the receipt and execution of tasks on the computing node and reports their running status and results; when the computing node is idle, the task control module (34) periodically sends task request messages to the dispatch node; when the dispatch node has a task waiting to run, it assigns that task to the computing node; after receiving the task information assigned by the dispatch node, the task control module (34) instructs the client transport module (33) to request the code and parameter data of the subtask from the data server (4); once the code and parameter data have been transferred, the task control module (34) passes the task's initial data in the client storage module (32) to the task and starts its computation; during computation, the task control module (34) periodically reports the task's running status to the dispatch node; after the computation finishes, the task control module (34) places the result files in the client storage module (32), and the client transport module (33) transfers these files to the data server (4); the module then returns to the idle state and reports the task's completion status and its own idle status to the dispatch node;

The client transport module (33) receives data transfer commands from the task control module (34) and, following these commands and working with the client storage module (32), exchanges data with the data server (4); after a transfer finishes, the client transport module (33) notifies the task control module (34) of the completion event;

The data server (4) receives the initial project data submitted by the client computer (5) in the application project submission stage; in the application project computation stage, it handles data requests from the computing nodes, transmits the code and parameter data of the subtasks, and receives the subtask result data uploaded by the computing nodes; after the subtask computations are finished, it accepts data requests from the client computer (5) and sends the subtask result data to the client computer (5);

After the client computer (5) submits the initial data of the application project to the data server (4), it creates an application project description file and submits it to the monitor node (1); after the subtasks of the application project have been computed, the client computer (5) obtains the subtask results from the data server (4) and notifies the main task to aggregate these results, yielding the final result of the application project.

2. The high-performance computing system according to claim 1, characterized in that the client computer (5) comprises a project submission module (51) and a main task running module (52);

The project submission module (51) submits application projects that the user has programmed with the software development kit; it submits the initial data of the application project to the data server (4), creates an application project description file and submits it to the monitor node (1), and then notifies the main task running module (52) that project submission has completed;

The main task running module (52) is responsible for running the main task; after receiving the submission-completed event from the project submission module (51), it divides the subtasks of the application project into task clusters and asks the monitor node (1) to start these task clusters one by one; the main task running module (52) periodically queries the monitor node (1) for the completion status of the subtasks in the task clusters it has submitted; if some subtasks have not finished within the maximum run time set by the user, these unfinished subtasks are regrouped into task clusters and the monitor node (1) is asked again to start them; once all subtasks have finished, the main task running module (52) requests their result data from the data server (4); after receiving the subtask result data sent by the data server (4), it notifies the main task to aggregate these results and obtain the final result of the application project.
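The submit-and-resubmit loop of the main task running module (52) can be sketched as follows. The `submit_cluster` and `query_done` callables are hypothetical stand-ins for the requests to the monitor node (1), `cluster_size` and `max_rounds` are assumed parameters, and each loop iteration models one user-set maximum run time.

```python
def make_clusters(subtask_ids, cluster_size):
    """Split subtask ids into task clusters of at most cluster_size each."""
    return [subtask_ids[i:i + cluster_size] for i in range(0, len(subtask_ids), cluster_size)]


def run_main_task(subtask_ids, cluster_size, submit_cluster, query_done, max_rounds=10):
    """Start all clusters, then regroup and restart whatever is still unfinished.

    Returns True once every subtask has finished, False if max_rounds is exhausted.
    """
    pending = list(subtask_ids)
    for _ in range(max_rounds):
        for cluster in make_clusters(pending, cluster_size):
            submit_cluster(cluster)        # ask the monitor node to start this cluster
        finished = query_done()            # set of finished subtask ids after the deadline
        pending = [t for t in pending if t not in finished]
        if not pending:
            return True                    # next step: fetch results from the data server
    return False
```

For example, with subtasks `[1, 2, 3, 4, 5]`, a cluster size of 2, and subtask 3 missing its first deadline, the clusters started are `[1, 2]`, `[3, 4]`, `[5]` and then, in the second round, `[3]` alone.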

3. The high-performance computing system according to claim 1 or 2, characterized in that the monitor node (1) comprises a new node management module (11), a dispatch node table (12), a dispatch node monitoring module (13), a project management module (14) and a project table (15);

The dispatch node table (12) stores information on the dispatch nodes currently in the system; the project table (15) records every application project description file submitted to the system together with the dispatch nodes participating in each application project, and is managed and maintained by the project management module (14);

The new node management module (11) uses the dispatch node table (12) to manage newly joined nodes; when a newly joined node is a new dispatch node, the new node management module (11) initializes a dispatch node information unit for it, puts this unit into the dispatch node table (12), and then notifies the node that it has joined the system successfully; when a newly joined node is a new computing node, the new node management module (11) assigns it to the most heavily loaded dispatch node: it first selects the most heavily loaded dispatch node from the dispatch node table (12), then gives that dispatch node's address to the new computing node, which sends a join request to that dispatch node and joins its working group;
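A minimal sketch of this placement rule follows. The claim does not define the load metric, so `load` is an assumed integer (for example, the number of queued tasks), and `DispatchInfo` and `NewNodeManager` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DispatchInfo:
    """One dispatch node information unit in the dispatch node table (12)."""
    address: str
    load: int = 0  # assumed load metric, e.g. number of queued tasks


class NewNodeManager:
    """Hypothetical sketch of the new node management module (11)."""

    def __init__(self):
        self.dispatch_table = {}  # address -> DispatchInfo

    def add_dispatch_node(self, address):
        """Initialize an information unit for a new dispatch node."""
        self.dispatch_table[address] = DispatchInfo(address)
        return True  # "joined successfully" is reported back to the node

    def place_computing_node(self):
        """Return the address of the most heavily loaded dispatch node; the new
        computing node then sends its own join request to that address."""
        if not self.dispatch_table:
            raise LookupError("no dispatch node has joined yet")
        return max(self.dispatch_table.values(), key=lambda d: d.load).address
```

Note that the claim sends new computing nodes to the most heavily loaded dispatch node, presumably so that new capacity goes where the most work is waiting, whereas task clusters are sent to the most lightly loaded one.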

The dispatch node monitoring module (13) monitors the state of the dispatch nodes in the system through the dispatch node table (12): it receives each dispatch node's status report messages and uses them to update the task cluster completion status and node time-to-live in the corresponding dispatch node information unit of the dispatch node table (12);

The dispatch node monitoring module (13) periodically polls the time-to-live of each dispatch node in the dispatch node table (12); if a dispatch node has not updated its time-to-live within the predetermined polling period, the dispatch node monitoring module (13) considers that dispatch node to have left the system, deletes its information unit from the dispatch node table (12), and then notifies the project management module (14) to redirect that dispatch node's unfinished task clusters to another dispatch node with the lightest load;
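The time-to-live polling can be sketched with a last-seen timestamp per dispatch node. Treating each status report as refreshing the time-to-live, the injectable `clock`, and the `on_dead` callback (which would notify the project management module) are assumptions made for illustration.

```python
import time


class DispatchMonitor:
    """Hypothetical sketch of the dispatch node monitoring module (13)."""

    def __init__(self, poll_period, clock=time.monotonic):
        self.poll_period = poll_period  # assumed to be in seconds
        self.clock = clock
        self.last_seen = {}             # dispatch address -> time of last status report

    def on_status_report(self, address):
        """A status report refreshes the node's time-to-live."""
        self.last_seen[address] = self.clock()

    def poll(self, on_dead):
        """Remove nodes silent for longer than one polling period and report each."""
        now = self.clock()
        dead = [a for a, t in self.last_seen.items() if now - t > self.poll_period]
        for address in dead:
            del self.last_seen[address]
            on_dead(address)  # e.g. ask project management to redirect its clusters
        return dead
```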

The project management module (14) manages the starting and running of application project task clusters; after receiving an application project description file submitted by the client computer (5), the project management module (14) initializes a project table information unit for that project and puts it into the project table (15); after receiving a task cluster start request from the client computer, it selects the most lightly loaded dispatch node from the dispatch node table (12), redirects the task cluster to that dispatch node, and records the dispatch node's address in the corresponding information unit of the project table (15); after receiving a task cluster status query from the client computer, it looks up in the project table (15) the dispatch node to which the task cluster was assigned, obtains that dispatch node's task cluster completion status from the dispatch node table (12), and returns the result to the client computer; if, during operation, it receives a message from the dispatch node monitoring module (13) that a dispatch node has died, the project management module (14) again selects the most lightly loaded dispatch node from the dispatch node table (12) and redirects the dead dispatch node's unfinished task clusters to it; the project management module (14) then updates the dispatch node information of the corresponding task clusters in the project table (15) to the new dispatch node.
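The project table bookkeeping can be sketched as below. `ProjectManager`, the plain `dispatch_load` dictionary standing in for the dispatch node table (12), the `completed` mapping passed to the status query, and the rule that each started cluster adds one unit of load are all hypothetical simplifications.

```python
class ProjectManager:
    """Hypothetical sketch of the project management module (14)."""

    def __init__(self, dispatch_load):
        self.dispatch_load = dispatch_load  # address -> load, standing in for table (12)
        self.projects = {}                  # project table (15): project -> {cluster: address}

    def submit_project(self, project_id):
        """Initialize a project table information unit for a new project."""
        self.projects[project_id] = {}

    def start_cluster(self, project_id, cluster_id):
        """Redirect a cluster to the most lightly loaded dispatch node and record it."""
        target = min(self.dispatch_load, key=self.dispatch_load.get)
        self.dispatch_load[target] += 1     # assumed: one cluster adds one unit of load
        self.projects[project_id][cluster_id] = target
        return target

    def cluster_status(self, project_id, cluster_id, completed):
        """`completed` maps a dispatch address to the clusters it reports as done."""
        address = self.projects[project_id][cluster_id]
        return cluster_id in completed.get(address, set())

    def on_dispatch_dead(self, dead, unfinished):
        """Drop the dead node, then redirect its unfinished (project, cluster) pairs."""
        self.dispatch_load.pop(dead, None)
        return [self.start_cluster(p, c) for p, c in unfinished]
```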

4. The high-performance computing system according to claim 1 or 2, characterized in that each dispatch node comprises a dispatch node startup module (21), a computing node table (22), a computing node management module (23), a task management module (24), a start task queue (25), a waiting task queue (26), a running task queue (27) and an error task queue (28);

The computing node table (22) records information on the computing nodes belonging to the dispatch node; the start task queue (25) holds information on started tasks; the waiting task queue (26) holds information on tasks currently waiting to be assigned; the running task queue (27) holds information on tasks already assigned to computing nodes; and the error task queue (28) holds information on tasks that have failed;
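The four queues can be modelled with standard containers. The claim names the queues but not the rules for moving tasks between them, so the transitions below (start queue to waiting queue, waiting to running on assignment, running to error on failure) and the `DispatchQueues` name are assumptions.

```python
from collections import deque


class DispatchQueues:
    """Hypothetical bookkeeping for queues (25) to (28) of one dispatch node."""

    def __init__(self):
        self.start = deque()    # (25) tasks from clusters redirected to this node
        self.waiting = deque()  # (26) tasks waiting to be assigned
        self.running = {}       # (27) task id -> computing node it was assigned to
        self.error = []         # (28) tasks that failed

    def accept_cluster(self, task_ids):
        self.start.extend(task_ids)

    def promote(self):
        """Move started tasks into the waiting queue (assumed transition)."""
        while self.start:
            self.waiting.append(self.start.popleft())

    def assign(self, node):
        """Answer an idle computing node's task request, or None if nothing waits."""
        if not self.waiting:
            return None
        task = self.waiting.popleft()
        self.running[task] = node
        return task

    def report(self, task, ok):
        """Remove a finished task from the running queue; failed ones go to (28)."""
        self.running.pop(task)
        if not ok:
            self.error.append(task)
```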

When the dispatch node is to join the system, the dispatch node startup module (21) sends a join request to the monitor node (1); after receiving the response confirming a successful join, the dispatch node startup module (21) initializes the computing node management module (23) and the task management module (24);

The computing node management module (23) manages the joining of new computing nodes and monitors the state of the computing nodes in the computing node table (22);

The task management module (24) manages the task clusters that the monitor node redirects to this dispatch node, and maintains the start task queue (25), the waiting task queue (26), the running task queue (27) and the error task queue (28).

5. The high-performance computing system according to claim 1 or 2, characterized in that the data server (4) comprises a data service module (41), a server storage module (42) and a server transport module (43);

The data service module (41) provides a unified data service interface for the computing nodes (3) and the client computer (5) to call, and completes data transport services together with the server transport module (43) and the server storage module (42); after a computing node (3) or the client computer (5) sends a data request to the data server, the data service module (41) issues a data transfer command to the server transport module (43); when a computing node (3) or the client computer (5) submits data to the data server (4), the data service module (41) issues a data receive command to the server transport module (43);

The server transport module (43) provides the server-side transfer function for data transfers between the computing nodes (3), the client computer (5) and the data server (4), and can serve multiple data requests concurrently; after the server transport module (43) receives a data transfer command from the data service module (41), it retrieves the requested data from the server storage module (42) and transfers it to the computing node (3); after the server transport module (43) receives a data receive command from the data service module (41), it receives the data from the computing node (3) and places it in the server storage module (42);

The server storage module (42) manages the storage of application project data in the data server's working area and provides local access services to the data service module (41) and the server transport module (43).