CN104301434A - High speed communication architecture and method based on trunking - Google Patents

High speed communication architecture and method based on trunking

Info

Publication number
CN104301434A
Authority
CN
China
Prior art keywords
trunking communication
communication node
trunking
node
task data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410602244.7A
Other languages
Chinese (zh)
Other versions
CN104301434B (en)
Inventor
高永虎
张广勇
张清
沈铂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201410602244.7A
Publication of CN104301434A
Application granted
Publication of CN104301434B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention provides a high-speed communication architecture and method based on trunking. The architecture comprises a plurality of trunking communication nodes and a shared storage system. The trunking communication nodes are connected to one another to form a ring-shaped trunking communication node communication structure, and each trunking communication node is connected to the shared storage system. Compared with the prior art, the architecture and method extend trunking communication across multiple trunking communication nodes, so that the computing devices both among and within the trunking communication nodes of the trunking communication system achieve a balanced computing load, and high fault tolerance of the trunking communication system is ensured. The overall operating efficiency of the trunking communication system is thereby improved and task processing time is greatly shortened.

Description

A cluster-based high-speed communication architecture and method
Technical field
The invention belongs to the field of trunking communication, and in particular relates to a cluster-based high-speed communication architecture and method.
Background
Data volumes are growing explosively, and the demands placed on data processing capability are growing with them. High-performance computing is needed not only in oil exploration, weather forecasting, aerospace, national defense, and scientific research; demand is also growing rapidly in a much wider range of fields such as finance, e-government, education, enterprise applications, and online gaming.
Computing speed is critical for high-performance computing. High-performance computing is moving toward multi-core and many-core processors, and heterogeneous parallelism is used to raise computing speed. CPU+GPU is now a mature heterogeneous cooperative computing model, well suited to applications and algorithms that require high-speed parallel computation. However, some applications operate on very large amounts of data, and, owing to constraints such as network bandwidth, adding multiple accelerator cards to a single server or extending a cluster of limited size over a network cannot meet current demand.
Summary of the invention
The invention provides a cluster-based high-speed communication architecture and method to solve the above problem.
The invention provides a cluster-based high-speed communication architecture comprising a plurality of trunking communication nodes and a shared storage system, wherein the trunking communication nodes are interconnected to form a ring-shaped trunking communication node communication structure, and each of the trunking communication nodes is connected to the shared storage system.
The invention also provides a cluster-based high-speed communication method, comprising the following steps:
After obtaining its corresponding portion of the calculation task data, each trunking communication node processes that data and sends the processing result to the adjacent trunking communication node according to a preset communication order;
The adjacent trunking communication node updates its own calculation according to the received processing result and sends the result to the next adjacent trunking communication node according to the preset communication order, and so on until the calculation ends.
Compared with the prior art, the cluster-based high-speed communication architecture and method provided by the invention extend trunking communication across multiple trunking communication nodes, so that the computing devices both among and within the trunking communication nodes of the trunking communication system achieve a balanced computing load, and high fault tolerance of the trunking communication system is ensured, thereby improving the overall operating efficiency of the system and greatly shortening task processing time.
The invention connects multiple trunking communication nodes through a high-speed network to form a ring-structured trunking communication system, which makes the scale of computation highly scalable. Ring communication and parallel computation execute asynchronously on this system, which improves its overall operating efficiency and meets the requirements of high-performance applications.
To ensure reliable operation of the trunking communication system, the invention proposes a fault-tolerance mechanism: one trunking communication node periodically collects data from the other trunking communication nodes and saves it to the shared storage system. If the system crashes during a long run, the program can resume from the breakpoint; and if a particular trunking communication node fails, its unfinished task can be continued by another trunking communication node.
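As an illustrative, non-limiting sketch of resuming from a breakpoint, the Python below checkpoints a running sum to a directory that stands in for the shared storage system. The file layout, the function names `save_checkpoint`, `load_checkpoint`, and `run`, and the `every` and `fail_at` parameters are all assumptions made for illustration; the patent does not specify a checkpoint format.

```python
import json
import tempfile
from pathlib import Path

def save_checkpoint(shared_dir: Path, node_id: int, next_item: int, partial: float) -> None:
    """Write one node's progress to the (hypothetical) shared storage directory."""
    path = shared_dir / f"node_{node_id}.json"
    path.write_text(json.dumps({"next_item": next_item, "partial": partial}))

def load_checkpoint(shared_dir: Path, node_id: int) -> dict:
    """Return the saved progress, or a fresh state if no checkpoint exists yet."""
    path = shared_dir / f"node_{node_id}.json"
    if path.exists():
        return json.loads(path.read_text())
    return {"next_item": 0, "partial": 0.0}

def run(shared_dir: Path, node_id: int, items, fail_at=None, every=2):
    """Sum `items`, checkpointing every `every` items; crash at index `fail_at` if given."""
    state = load_checkpoint(shared_dir, node_id)
    i, total = state["next_item"], state["partial"]
    while i < len(items):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated crash")
        total += items[i]
        i += 1
        if i % every == 0:
            save_checkpoint(shared_dir, node_id, i, total)
    save_checkpoint(shared_dir, node_id, i, total)
    return total
```

Because the checkpoint lives in shared storage and is keyed by the task owner's ID, any surviving node could in principle call `run` with the failed node's ID and continue from the last saved position; work done after the last checkpoint is simply repeated.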
The trunking communication system is highly scalable. The ring topology can in theory be extended to any number of trunking communication nodes, and any number of nodes can be added or removed, so the system can keep working when a trunking communication node fails suddenly or when a new node joins.
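A minimal sketch of why a ring tolerates membership changes: if the ring is held as an ordered list, adding or removing a node only changes the links of its immediate neighbours. The `Ring` class and its method names are hypothetical and are not taken from the patent.

```python
class Ring:
    """Ordered ring of node IDs; each node's successor is the next entry, wrapping around."""

    def __init__(self, node_ids):
        self.nodes = list(node_ids)

    def successor(self, node_id):
        """Return the node that `node_id` sends to."""
        i = self.nodes.index(node_id)
        return self.nodes[(i + 1) % len(self.nodes)]

    def add(self, node_id, after):
        """Insert a new node directly after an existing node."""
        self.nodes.insert(self.nodes.index(after) + 1, node_id)

    def remove(self, node_id):
        """Drop a node (for example after a failure); its predecessor now links to its successor."""
        self.nodes.remove(node_id)
```

For example, starting from nodes 0, 1, 2, adding node 3 after node 1 makes node 1 send to node 3 and node 3 send to node 2; no other link changes.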
To manage data efficiently, the invention also proposes a data storage design in which the data storage system is divided into a shared storage system and local storage. The calculation task data and backup data shared by the trunking communication nodes are stored in the shared storage system, which protects data security, keeps calculation data synchronized among the nodes, and simplifies data management. In addition, the local storage unit in each trunking communication node caches data from the shared storage system, further improving data access efficiency.
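One way to picture the local storage unit caching shared data is a read-through cache, sketched below. This is an assumption about how such a cache might behave, not the patent's stated design: a plain dictionary stands in for the shared storage system, and `LocalCache` and `shared_reads` are hypothetical names.

```python
class LocalCache:
    """Read-through cache: fetch from shared storage on a miss, serve locally afterwards."""

    def __init__(self, shared_store: dict):
        self.shared = shared_store   # stand-in for the shared storage system
        self.local = {}              # stand-in for the node's local storage unit
        self.shared_reads = 0        # counts how often shared storage was actually read

    def get(self, key):
        if key not in self.local:
            self.local[key] = self.shared[key]
            self.shared_reads += 1
        return self.local[key]
```

Repeated reads of the same key touch the shared store only once. Invalidation when shared data changes is deliberately left out of this sketch.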
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the invention and form part of this application. The illustrative embodiments of the invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:
Figure 1 is an architecture diagram of the cluster-based high-speed communication architecture of Embodiment 1 of the invention;
Figure 2 is a flowchart of the cluster-based high-speed communication method of Embodiment 2 of the invention.
Detailed description of the embodiments
The invention is described in detail below with reference to the drawings and in conjunction with the embodiments. It should be noted that, provided they do not conflict, the embodiments in this application and the features of those embodiments may be combined with one another.
Figure 1 shows the cluster-based high-speed communication architecture of Embodiment 1 of the invention, comprising: trunking communication node 1, trunking communication node 2, trunking communication node 3, ..., trunking communication node n-1, trunking communication node n; and shared storage system M.
Each trunking communication node (trunking communication node 1, trunking communication node 2, trunking communication node 3, ..., trunking communication node n-1, trunking communication node n) comprises a central processing unit (CPU), one or more graphics processing units (GPUs), and a local storage unit. The trunking communication nodes are connected to one another through a high-speed network to form a ring-shaped trunking communication node communication structure. The local storage unit caches data from the shared storage system M and also stores the communication data of its own trunking communication node.
The shared storage system M is connected to each trunking communication node through the high-speed network and stores the calculation task data and the backup data of each trunking communication node.
The high-speed network includes: fast switched Ethernet, Gigabit Ethernet, and 100G fiber-optic networks.
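The composition described for Embodiment 1 can be summarised as a small data model. The sketch below is illustrative only; the class and field names (`Node`, `Cluster`, `gpu_count`, `local_storage`, `shared_storage`) are assumptions, and dictionaries stand in for the storage units.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: int
    gpu_count: int                                      # one or more GPUs per node
    local_storage: dict = field(default_factory=dict)   # cached shared data and communication data

@dataclass
class Cluster:
    nodes: list                                         # ring order is the list order
    shared_storage: dict = field(default_factory=dict)  # task data and backup data

    def neighbors(self, node_id):
        """Return the (previous, next) node IDs of `node_id` in the ring."""
        ids = [n.node_id for n in self.nodes]
        i = ids.index(node_id)
        return ids[(i - 1) % len(ids)], ids[(i + 1) % len(ids)]

    def total_gpus(self):
        """Total GPU count across all nodes."""
        return sum(n.gpu_count for n in self.nodes)
```

With three nodes numbered 1 to 3, node 1's neighbours are node 3 and node 2, which is what closes the ring.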
Figure 2 shows the flowchart of the cluster-based high-speed communication method of Embodiment 2 of the invention, which comprises the following steps:
Step 201: after obtaining its corresponding portion of the calculation task data, each trunking communication node processes that data and sends the processing result to the adjacent trunking communication node according to a preset communication order;
The process by which each trunking communication node obtains its corresponding portion of the calculation task data is as follows:
A trunking communication management node or a third-party entity determines the size of the portion of calculation task data corresponding to each trunking communication node, based on the total size of the calculation task data obtained from the shared storage system and the number of graphics processing units (GPUs) obtained from each trunking communication node, and notifies each trunking communication node of that size;
Each trunking communication node then obtains its portion of the calculation task data from the shared storage system according to the size it was notified of; the trunking communication management node is one trunking communication node selected at random from among the trunking communication nodes.
The process by which the trunking communication management node or third-party entity determines the size of each node's portion and notifies each trunking communication node is as follows:
The trunking communication management node or third-party entity obtains the total size of the calculation task data from the shared storage system and obtains the number of GPUs from each trunking communication node;
It then computes, for each trunking communication node, the ratio of that node's GPU count to the total GPU count across all trunking communication nodes, takes the product of that ratio and the total size of the calculation task data as the size of the portion of calculation task data for that node, and notifies each trunking communication node of the result.
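The sizing rule above (node share = node GPU count / total GPU count, multiplied by the total task size) can be sketched as follows. The function name `partition_sizes` is hypothetical, and the handling of the remainder left by integer division is an assumption added so that the sizes sum exactly to the total; the patent does not say how rounding is handled.

```python
def partition_sizes(total_size: int, gpu_counts: list[int]) -> list[int]:
    """Split `total_size` units of task data across nodes in proportion to GPU count."""
    total_gpus = sum(gpu_counts)
    if total_gpus == 0:
        raise ValueError("at least one GPU is required")
    # Proportional share, rounded down: size_i = total_size * gpus_i / total_gpus
    sizes = [total_size * g // total_gpus for g in gpu_counts]
    # Assumption: hand out units lost to rounding one at a time, starting with the first node.
    remainder = total_size - sum(sizes)
    for i in range(remainder):
        sizes[i % len(sizes)] += 1
    return sizes
```

For instance, 1000 units of task data across nodes with 1, 2, and 2 GPUs gives shares of 200, 400, and 400 units.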
After a trunking communication node finishes processing the portion of calculation task data it obtained, it obtains a further portion of calculation task data, in the same ratio, from the shared storage system.
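The repeated fetching can be pictured as nodes drawing GPU-proportional chunks from a shared pool until it is empty. The sketch below is a simplified, round-based simulation under stated assumptions: real nodes would fetch asynchronously as they finish, and the `batch` parameter (the amount handed out per round) and the name `pull_rounds` are hypothetical.

```python
def pull_rounds(total_items: int, gpu_counts: list[int], batch: int) -> list[int]:
    """Simulate nodes repeatedly pulling GPU-proportional chunks until the pool is empty."""
    total_gpus = sum(gpu_counts)
    remaining = total_items
    processed = [0] * len(gpu_counts)
    while remaining > 0:
        for node, gpus in enumerate(gpu_counts):
            # Each pull takes the node's proportional share of one batch (at least one item).
            chunk = min(remaining, max(1, batch * gpus // total_gpus))
            processed[node] += chunk
            remaining -= chunk
            if remaining == 0:
                break
    return processed
```

With 100 items, a batch of 20, and nodes holding 1 and 3 GPUs, the nodes end up processing 25 and 75 items respectively, so the split stays proportional over the whole run.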
The preset communication order is either a clockwise communication order or a counterclockwise communication order.
Under the clockwise communication order, as shown in Figure 1 and with the nodes indexed 0 to N-1: trunking communication node 0 -> trunking communication node 1, trunking communication node 1 -> trunking communication node 2, ..., trunking communication node N-2 -> trunking communication node N-1, trunking communication node N-1 -> trunking communication node 0, and then again trunking communication node 0 -> trunking communication node 1, trunking communication node 1 -> trunking communication node 2, and so on.
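With nodes indexed 0 to N-1 as in the sequence above, the send target reduces to modular arithmetic. The sketch below assumes that "clockwise" means increasing index, which matches the listed sequence; `next_node` and `send_schedule` are hypothetical names.

```python
def next_node(i: int, n: int, clockwise: bool = True) -> int:
    """Index of the node that node `i` sends to in a ring of `n` nodes."""
    return (i + 1) % n if clockwise else (i - 1) % n

def send_schedule(n: int, clockwise: bool = True) -> list[tuple[int, int]]:
    """All (sender, receiver) pairs for one pass around the ring."""
    return [(i, next_node(i, n, clockwise)) for i in range(n)]
```

For a ring of four nodes, the clockwise schedule is 0 to 1, 1 to 2, 2 to 3, and 3 back to 0.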
Step 202: the adjacent trunking communication node updates its own calculation according to the received processing result and sends the result to the next adjacent trunking communication node according to the preset communication order, and so on until the calculation ends.
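The patent does not say what "updating its own calculation" consists of. As one possible instance, the sketch below assumes the update is an addition: each node forwards the value it most recently received, and after N-1 steps every node holds the sum of all local results. This is a simple ring accumulation chosen for illustration, simulated in a single process; `ring_accumulate` is a hypothetical name.

```python
def ring_accumulate(local_results: list) -> list:
    """Pass values around a ring for n-1 steps so every node ends with the global sum."""
    n = len(local_results)
    totals = list(local_results)      # each node's running calculation
    in_flight = list(local_results)   # the value each node will send next
    for _ in range(n - 1):
        # Node i receives what its predecessor (i - 1) sent in this step.
        received = [in_flight[(i - 1) % n] for i in range(n)]
        for i in range(n):
            totals[i] += received[i]  # assumed update rule: add the received result
        in_flight = received          # forward the received value on the next step
    return totals
```

In each step all nodes send and receive at once, so communication along the ring proceeds in parallel rather than one hop at a time.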
After each trunking communication node finishes processing the portion of calculation task data it obtained, it sends the processing result to the trunking communication management node;
The trunking communication management node stores the final result in the shared storage system.
Or
The trunking communication management node collects the completed calculation results of each trunking communication node at a preset time interval and stores them in the shared storage system.
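Interval-based collection by the management node might look like the sketch below. The caller supplies the current time so the logic stays deterministic; the class name `ManagementNode`, the `tick` method, and the use of a dictionary for the shared storage system are assumptions made for illustration.

```python
class ManagementNode:
    """Stores completed results into shared storage at most once per preset interval."""

    def __init__(self, interval_s: float, shared_storage: dict):
        self.interval_s = interval_s
        self.shared = shared_storage   # stand-in for the shared storage system
        self.last = None               # time of the most recent collection

    def tick(self, now_s: float, completed: dict) -> bool:
        """Collect `completed` ({node_id: result}) if the interval has elapsed; report whether it did."""
        if self.last is not None and now_s - self.last < self.interval_s:
            return False
        self.shared.update(completed)
        self.last = now_s
        return True
```

With a 10-second interval, a call at t=0 stores results, a call at t=5 is skipped, and a call at t=10 stores the newer results.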
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Those skilled in the art may make various modifications and variations to the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A cluster-based high-speed communication architecture, characterized by comprising a plurality of trunking communication nodes and a shared storage system; wherein the plurality of trunking communication nodes are interconnected to form a ring-shaped trunking communication node communication structure, and each of the plurality of trunking communication nodes is connected to the shared storage system.
2. The architecture according to claim 1, characterized in that each trunking communication node comprises a central processing unit (CPU), one or more graphics processing units (GPUs), and a local storage unit.
3. The architecture according to claim 2, characterized in that the shared storage system is configured to store the calculation task data of the plurality of trunking communication nodes and the backup data of the plurality of trunking communication nodes.
4. A method applied to the high-speed communication architecture of any one of claims 1 to 3, characterized in that:
After obtaining its corresponding portion of the calculation task data, each trunking communication node processes that data and sends the processing result to the adjacent trunking communication node according to a preset communication order;
The adjacent trunking communication node updates its own calculation according to the received processing result and sends the result to the next adjacent trunking communication node according to the preset communication order, and so on until the calculation ends.
5. The method according to claim 4, characterized in that the process by which each trunking communication node obtains its corresponding portion of the calculation task data is:
A trunking communication management node or a third-party entity determines the size of the portion of calculation task data corresponding to each trunking communication node, based on the total size of the calculation task data obtained from the shared storage system and the number of graphics processing units (GPUs) obtained from each trunking communication node, and notifies each trunking communication node of that size;
Each trunking communication node obtains its portion of the calculation task data from the shared storage system according to the size it was notified of; wherein the trunking communication management node is one trunking communication node selected at random from among the trunking communication nodes.
6. The method according to claim 5, characterized in that: the trunking communication management node or third-party entity obtains the total size of the calculation task data from the shared storage system and obtains the number of GPUs from each trunking communication node;
It computes, for each trunking communication node, the ratio of that node's GPU count to the total GPU count across all trunking communication nodes, takes the product of that ratio and the total size of the calculation task data as the size of the portion of calculation task data for that node, and notifies each trunking communication node of the result.
7. The method according to claim 6, characterized in that: after a trunking communication node finishes processing the portion of calculation task data it obtained, it obtains a further portion of calculation task data, in the same ratio, from the shared storage system.
8. The method according to claim 5, characterized in that: after each trunking communication node finishes processing the portion of calculation task data it obtained, it sends the processing result to the trunking communication management node;
The trunking communication management node stores the final result in the shared storage system.
9. The method according to claim 5, characterized in that: the trunking communication management node collects the completed calculation results of each trunking communication node at a preset time interval and stores them in the shared storage system.
10. The method according to claim 4, characterized in that: the preset communication order is either a clockwise communication order or a counterclockwise communication order.
CN201410602244.7A 2014-10-31 2014-10-31 A kind of high-speed communication framework and method based on cluster Active CN104301434B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410602244.7A CN104301434B (en) 2014-10-31 2014-10-31 A kind of high-speed communication framework and method based on cluster

Publications (2)

Publication Number Publication Date
CN104301434A (en) 2015-01-21
CN104301434B (en) 2018-06-15

Family

ID=52320997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410602244.7A Active CN104301434B (en) 2014-10-31 2014-10-31 A kind of high-speed communication framework and method based on cluster

Country Status (1)

Country Link
CN (1) CN104301434B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104580503A (en) * 2015-01-26 2015-04-29 浪潮电子信息产业股份有限公司 Efficient dynamic load balancing system and method for processing large-scale data
CN105183692A (en) * 2015-09-22 2015-12-23 浪潮(北京)电子信息产业有限公司 Method and system for data communication between cluster system devices
CN107181626A (en) * 2017-07-18 2017-09-19 郑州云海信息技术有限公司 Distributed storage group system network bandwidth monitoring method and system
CN107463448A (en) * 2017-09-28 2017-12-12 郑州云海信息技术有限公司 A kind of deep learning weight renewing method and system
CN108595670A (en) * 2018-04-28 2018-09-28 金蝶蝶金云计算有限公司 A kind of date storage method, device, computer installation and storage medium
CN108768794A (en) * 2018-07-27 2018-11-06 郑州云海信息技术有限公司 A kind of flow rate testing methods of network cluster, device, equipment and medium
CN112477781A (en) * 2019-09-12 2021-03-12 华为技术有限公司 System and method for realizing electronic control function in automobile and automobile
CN113556242A (en) * 2020-04-24 2021-10-26 中科寒武纪科技股份有限公司 Method and equipment for performing inter-node communication based on multi-processing nodes
CN114691591A (en) * 2020-12-31 2022-07-01 中科寒武纪科技股份有限公司 Circuit, method and system for inter-chip communication

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101969468A (en) * 2010-10-14 2011-02-09 广州从兴电子开发有限公司 Inquiry server cluster system and inquiry method
CN102215123A (en) * 2011-06-07 2011-10-12 南京邮电大学 Multi-ring-network-topology-structure-based large-scale trunking system
CN102571499A (en) * 2012-02-14 2012-07-11 广州亦云信息技术有限公司 Monitoring method of cloud database server cluster
CN103634277A (en) * 2012-08-23 2014-03-12 深圳市腾讯计算机系统有限公司 Memory sharing method, server and system
CN104125165A (en) * 2014-08-18 2014-10-29 浪潮电子信息产业股份有限公司 Job scheduling system and method based on heterogeneous cluster

Also Published As

Publication number Publication date
CN104301434B (en) 2018-06-15

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant