CN116366660A - Communication management intelligent system and method for distributed parallel simulation calculation - Google Patents


Info

Publication number
CN116366660A
Authority
CN
China
Prior art keywords
communication
server
manager
servers
performance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310337188.8A
Other languages
Chinese (zh)
Inventor
李荣辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University
Original Assignee
Guangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou University
Priority to CN202310337188.8A
Publication of CN116366660A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/542 Event management; Broadcasting; Multicasting; Notifications
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/1014 Server selection for load balancing based on the content of a request
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/1023 Server selection for load balancing based on a hash applied to IP addresses or costs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1029 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers using data related to the state of servers by a load balancer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/568 Storing data temporarily at an intermediate stage, e.g. caching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/508 Monitor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a communication management intelligent system and method for distributed parallel simulation computing. The system comprises a communication client that provides a communication interface, a communication server that broadcasts or forwards client messages, a communication manager that monitors, deploys and manages the communication servers, and a communication collector that receives and stores all data passing through the communication servers. The invention improves on the communication architecture of some existing distributed parallel simulation computing platforms: it selects a better-performing communication server for each communication client according to different allocation strategies, manages the communication servers dynamically and intelligently, and dynamically adds or removes serving communication servers according to the number of topics requested by the communication clients and the communication overhead, so as to make better use of the machine performance of the distributed parallel simulation computing platform.

Description

Communication management intelligent system and method for distributed parallel simulation calculation
Technical Field
The invention belongs to the technical field of software applications, and particularly relates to a communication management intelligent system and method for distributed parallel simulation computing.
Background
With the rise of artificial intelligence, the Internet of Things and the simulation industry, more and more simulation platforms need to integrate computing resources suited to different computing tasks, and they adopt distributed heterogeneous parallel computing architectures to improve computing efficiency. A robust and reliable communication architecture and communication management mechanism is therefore the core that ensures stable communication and interoperation among the multi-node, multi-service computing tasks of a distributed parallel simulation computing platform.
The communication architectures currently adopted by some distributed parallel simulation computing platforms in this field are divided into a communication server and a communication client. The communication server side starts a fixed number of processes in advance to serve external requests. Before communicating, a communication client must bind the address and port of the corresponding communication server; when a communication client communicates using several message bodies, it must itself choose different communication server addresses and ports in order to spread the communication pressure across the communication servers.
The disadvantages of this type of method are:
1. The load balancing strategy of the communication architecture of such a simulation platform is decided by the users of the individual communication clients, but these users do not know how other communication clients are using the servers. It is therefore likely that several communication client users send data to the same communication server, so that one communication server is overloaded while the others sit idle and are wasted;
2. After a communication server of such a simulation platform has been in use for some time, excessive message accumulation slows its consumption speed. The users of the communication clients are unaware of this and keep using the communication server, so message pushing slows down;
3. A communication server of such a simulation platform may fail to start, or a process in use may die. The communication client cannot learn the service state of the communication server, so it may use such a communication server and the messages it sends become unreachable.
Disclosure of Invention
The main purpose of the invention is to overcome the defects and shortcomings of the prior art by providing a communication management intelligent system and method for distributed parallel simulation computing.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the invention provides a communication management intelligent system for distributed parallel simulation computing, which comprises a communication client, a communication server, a communication manager and a communication collector, wherein the communication client is connected with the communication server;
the communication client is used for providing a communication interface and encapsulating, at the bottom layer, the details of address acquisition and the actual connection;
the communication server is used for broadcasting or forwarding client messages and pushing the messages to the communication collector, which stores the pushed messages;
the communication manager is used for deploying the required number of communication servers, monitoring the states of the communication servers, and sending management commands according to the performance usage state of each communication server;
the communication manager is used for receiving a communication client's request for a communication server address and port, and allocating a communication server to the communication client according to a load balancing strategy;
the communication manager is used for receiving information from the communication managers of other nodes of the distributed parallel simulation computing platform and synchronizing the address information of all communication servers on the distributed parallel simulation computing platform cluster;
the communication collector is used for receiving all data passing through the communication server and caching the data to a database for storage.
As a preferred technical solution, the communication manager monitors the performance states of all communication servers through a heartbeat mechanism between the communication manager and the communication servers, specifically:
the communication server periodically measures its own performance consumption and reports it to the communication manager;
the communication manager performs intelligent management operations by monitoring the performance state of the communication server;
if a communication server fails to report its performance consumption within the specified time and number of attempts, the communication manager regards it as unreachable, deletes it from the management queue, and excludes it from allocation.
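The heartbeat bookkeeping described above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the class names `ServerRecord` and `HeartbeatMonitor`, the 5-second reporting interval and the limit of 3 missed reports are all assumptions, since the text does not specify the limited time or number of attempts.

```python
import time
from dataclasses import dataclass


@dataclass
class ServerRecord:
    address: str
    last_report: float   # time of the most recent heartbeat
    load: float = 0.0    # last reported performance consumption (0.0 to 1.0)
    missed: int = 0      # whole heartbeat intervals elapsed without a report


class HeartbeatMonitor:
    """Hypothetical manager-side bookkeeping for the heartbeat mechanism."""

    def __init__(self, interval=5.0, max_missed=3):
        self.interval = interval       # assumed reporting period in seconds
        self.max_missed = max_missed   # assumed limit on missed reports
        self.queue = {}                # management queue: address -> ServerRecord

    def report(self, address, load, now=None):
        """Called when a communication server reports its performance consumption."""
        now = time.time() if now is None else now
        self.queue[address] = ServerRecord(address, last_report=now, load=load)

    def sweep(self, now=None):
        """Count missed intervals and evict servers that reach the limit."""
        now = time.time() if now is None else now
        evicted = []
        for address, rec in list(self.queue.items()):
            rec.missed = int((now - rec.last_report) // self.interval)
            if rec.missed >= self.max_missed:
                # Unreachable: remove from the queue so it is never allocated.
                del self.queue[address]
                evicted.append(address)
        return evicted
```

In this sketch the manager would call `sweep` on a timer; any server returned by it no longer appears in `queue` and so cannot be handed to a client.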
As a preferred technical solution, the communication manager deploys and allocates a reasonable number of communication servers in advance by means of an artificial intelligence historical communication service characteristic prejudging mechanism model, specifically:
collecting historical communication service characteristic data of the distributed parallel simulation computing platform;
constructing the artificial intelligence historical communication service characteristic prejudging mechanism model, training it on the historical communication service characteristic data, and using the trained model to predict, from the historical communication service characteristics, a reasonable number of communication servers to deploy in a distributed manner.
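The text does not say which model is trained, so the following is only a stand-in to make the collect, train, predict flow concrete. It assumes, hypothetically, that the history consists of pairs of topic count and the number of servers that turned out to be needed, and it fits a one-feature least-squares line; the function names and the choice of feature are illustrative and not taken from the source.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # All samples share one x value: fall back to the mean of y.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov / var_x
    return a, mean_y - a * mean_x


def predict_server_count(history, expected_topics, max_servers):
    """history: list of (topic_count, servers_needed) pairs from past runs.

    Returns a server count clamped to the range 1..max_servers.
    """
    a, b = fit_line([h[0] for h in history], [h[1] for h in history])
    estimate = math.ceil(a * expected_topics + b)
    return max(1, min(max_servers, estimate))
```

For example, `predict_server_count([(10, 2), (20, 4), (30, 6)], expected_topics=24, max_servers=8)` rounds the fitted estimate up to a whole server, and `max_servers` caps the result at what the machine can host.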
As a preferred technical solution, allocating a communication server to a communication client according to a load balancing strategy specifically includes:
judging, according to the communication message body used on the distributed simulation platform, whether the target communication client is already using a certain communication server, and if so allocating the same communication server to ensure that messages are reachable;
allocating an idle communication server that no communication client is using, according to the number of communication servers deployed in advance and the number of communication client users recorded by each communication server;
allocating the communication server with the lowest load and the best performance, according to the performance load of all communication servers on the current local machine;
if, according to the current local performance load, the local machine has reached its upper limit, requesting the communication managers of other machines of the distributed simulation platform to allocate the communication server with the lowest load and the best performance.
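The four allocation rules above can be sketched as a single selection function. The order in which the rules are tried, the `Server` fields, the 0.9 machine limit and the `ask_remote` callback are assumptions made for illustration; the text lists the rules without fixing their precedence or any threshold.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    address: str
    load: float = 0.0                          # reported performance load, 0.0 to 1.0
    clients: int = 0                           # number of clients currently assigned
    topics: set = field(default_factory=set)   # message bodies (topics) it forwards


def allocate(topic, servers, machine_load, machine_limit=0.9, ask_remote=None):
    """Return the address of the server a client should use for `topic`."""
    # Rule 1: a topic already in use stays on its server so messages stay reachable.
    for s in servers:
        if topic in s.topics:
            s.clients += 1
            return s.address
    # Rule 4: a saturated local machine delegates to another node's manager.
    if machine_load >= machine_limit and ask_remote is not None:
        return ask_remote(topic)
    if not servers:
        raise RuntimeError("no communication server available")
    # Rule 2: prefer a server that no client is using yet.
    idle = [s for s in servers if s.clients == 0]
    # Rule 3: otherwise take the server with the lowest reported load.
    chosen = idle[0] if idle else min(servers, key=lambda s: s.load)
    chosen.topics.add(topic)
    chosen.clients += 1
    return chosen.address
```

Checking the topic binding first is what keeps publishers and subscribers of one message body on the same server; the remaining rules only decide where a new topic lands.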
As a preferred technical solution, the communication manager receives information from the communication managers of other nodes of the distributed parallel simulation computing platform in order to synchronize the address information of all communication servers on the cluster, specifically:
each communication manager is configured in master mode or slave mode; the communication manager configured in master mode periodically receives the communication server address information of all communication managers configured in slave mode on the distributed parallel simulation computing platform cluster; a communication manager configured in slave mode periodically pushes the address information of all local communication servers it maintains to the communication manager configured in master mode; the communication manager configured in master mode holds the communication server addresses of all nodes of the cluster; a communication manager configured in slave mode holds the addresses of all its local communication servers.
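A minimal in-process sketch of the master/slave address synchronization follows. The class and method names are hypothetical, the periodic timer is left out, and a direct method call stands in for whatever network transport the platform actually uses.

```python
class CommunicationManager:
    """Illustrative manager that is configured in master or slave mode."""

    def __init__(self, node, mode):
        if mode not in ("master", "slave"):
            raise ValueError("mode must be 'master' or 'slave'")
        self.node = node
        self.mode = mode
        self.local_servers = set()   # addresses this node maintains itself
        self.cluster_view = {}       # master only: node name -> set of addresses

    def register(self, address):
        """Record a local communication server that registered with this manager."""
        self.local_servers.add(address)

    def push_to(self, master):
        """Slave side: push all locally maintained addresses to the master."""
        if self.mode != "slave":
            raise RuntimeError("only a slave pushes its addresses")
        master.receive(self.node, set(self.local_servers))

    def receive(self, node, addresses):
        """Master side: store the latest address set reported by a slave node."""
        if self.mode != "master":
            raise RuntimeError("only the master collects addresses")
        self.cluster_view[node] = addresses

    def all_addresses(self):
        """Master: every address in the cluster. Slave: local addresses only."""
        if self.mode == "slave":
            return set(self.local_servers)
        merged = set(self.local_servers)
        for addresses in self.cluster_view.values():
            merged |= addresses
        return merged
```

Because each push replaces the slave's previous entry in `cluster_view`, a server that the slave has dropped disappears from the master's view on the next push.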
The invention also provides a communication management intelligent method for distributed parallel simulation computing, which comprises the following steps:
the communication manager selects the number of communication servers to start initially according to a percentage of the upper limit of machine performance and according to the artificial intelligence historical communication service characteristic prejudging mechanism model, and sends an instruction to start the specified number of communication servers;
the communication server establishes a connection with the communication manager, registers its own address and port, and maintains a heartbeat mechanism that periodically pushes the performance state of its process;
each communication manager is configured in master mode or slave mode; the communication manager configured in master mode periodically receives the communication server address information of all communication managers configured in slave mode; a communication manager configured in slave mode periodically pushes the address information of all communication servers it maintains to the communication manager configured in master mode;
the communication manager adds the communication server to a management queue and manages it according to its performance condition;
the communication client requests an available communication server address from the communication manager, and the communication manager allocates a communication server to the communication client according to a load balancing strategy;
the machine performance load data of the communication manager is collected as part of the communication service characteristics of the distributed parallel simulation computing platform, the artificial intelligence historical communication service characteristic prejudging mechanism model is optimized, and a reasonable number of distributed communication servers is deployed and allocated in advance for the next communication session of the distributed parallel simulation computing platform.
As a preferred technical solution, selecting the number of communication servers to start initially according to the artificial intelligence historical communication service characteristic prejudging mechanism model specifically includes:
collecting historical communication service characteristic data of the distributed parallel simulation computing platform;
constructing the artificial intelligence historical communication service characteristic prejudging mechanism model, training it on the historical communication service characteristic data, and using the trained model to predict, from the historical communication service characteristics, a reasonable number of communication servers to deploy in a distributed manner.
As a preferred technical solution, the communication manager adds the communication servers to a management queue and manages them according to their performance condition, including restarting them, clearing accumulated messages, and dynamically adding or removing communication servers, specifically:
when a communication server suffers message accumulation or its process dies, the communication manager decides, according to the performance usage of that communication server, whether to restart it or clear its accumulated messages, sends the control instruction to the communication server for execution, and then updates the corresponding entry in its management queue;
when all communication servers are under high load, the communication manager decides, according to the machine performance, whether to start more communication servers locally or to ask the communication managers of other machines of the distributed parallel simulation computing platform to start more communication servers to provide service.
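The management decisions above can be sketched as a pure function that maps observed server states to control actions. The mapping of a dead process to a restart and of a large backlog to a clean-up, the threshold values and the action names are all assumptions for illustration; the text only says that the manager decides according to performance usage.

```python
def decide_actions(servers, machine_load, backlog_limit=10000,
                   high_load=0.8, machine_limit=0.9):
    """Return a list of (action, address) tuples for the manager to send.

    servers: list of dicts with keys 'address', 'alive', 'backlog', 'load'.
    All thresholds are hypothetical defaults, not values from the source.
    """
    actions = []
    for s in servers:
        if not s["alive"]:
            # Dead process: restart it.
            actions.append(("restart", s["address"]))
        elif s["backlog"] > backlog_limit:
            # Too many accumulated messages: ask the server to clear them.
            actions.append(("clear_backlog", s["address"]))
    alive = [s for s in servers if s["alive"]]
    if alive and all(s["load"] >= high_load for s in alive):
        if machine_load < machine_limit:
            # Local machine still has headroom: scale out locally.
            actions.append(("start_local_server", None))
        else:
            # Local machine is saturated: ask another node's manager.
            actions.append(("request_remote_server", None))
    return actions
```

Keeping the decision separate from the sending of commands makes the policy easy to test; after the commands run, the manager would update its management queue as the text describes.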
As a preferred technical solution, maintaining the heartbeat mechanism specifically includes:
the communication server periodically measures its own performance consumption and reports it to the communication manager;
the communication manager performs intelligent management operations by monitoring the performance state of the communication server;
if a communication server fails to report its performance consumption within the specified time and number of attempts, the communication manager regards it as unreachable, deletes it from the management queue, and excludes it from allocation.
As a preferred technical solution, the communication manager allocates a communication server to a communication client according to a load balancing strategy, specifically:
judging, according to the communication message body used on the distributed simulation platform, whether the target communication client is already using a certain communication server, and if so allocating the same communication server to ensure that messages are reachable;
allocating an idle communication server that no communication client is using, according to the number of communication servers deployed in advance and the number of communication client users recorded by each communication server;
allocating the communication server with the lowest load and the best performance, according to the performance load of all communication servers on the current local machine;
if, according to the current local performance load, the local machine has reached its upper limit, requesting the communication managers of other machines of the distributed simulation platform to allocate the communication server with the lowest load and the best performance.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) The invention improves on the communication architecture of some existing distributed parallel simulation computing platforms by monitoring and intelligently managing the performance of the communication servers. If a communication server is overloaded or cannot serve normally, the communication manager can restart it or instruct it to clear accumulated messages, among other operations. This replaces traditional manual monitoring and restarting, and provides monitoring and management that is highly real-time, highly reliable, and saves time and labor.
(2) The invention improves on the communication architecture of some existing distributed parallel simulation computing platforms by providing a load-balanced allocation service for communication clients, selecting a better-performing communication server for each communication client according to different allocation strategies. This simplifies the development of communication clients, reduces improper use of communication servers by communication clients, improves the utilization of multiple communication servers, and reduces the probability that a single communication server is overloaded.
(3) The invention improves on the communication architecture of some existing distributed parallel simulation computing platforms by managing the communication servers dynamically and intelligently, adding or removing serving communication servers according to the number of topics requested by the communication clients and the communication overhead, so as to make better use of the machine performance of the distributed parallel simulation computing platform.
(4) The invention improves on the communication architecture of some existing distributed parallel simulation computing platforms by providing an allocation strategy that supports dynamically adding and removing communication servers, which accommodates the different communication overheads of different topics on the platform. The artificial intelligence historical communication service characteristic prejudging mechanism model deploys and allocates a reasonable number of communication servers in advance, so as to make better use of the performance of the distributed parallel simulation computing platform.
Drawings
FIG. 1 is a block diagram of a communication management intelligent system for distributed parallel simulation computation according to an embodiment of the present invention;
FIG. 2 is a flow chart of a communication management intelligent system for distributed parallel simulation computation according to an embodiment of the invention;
FIG. 3 is a timing diagram of a communication management intelligent system for distributed parallel simulation computation according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a communication management intelligent system management architecture for distributed parallel simulation computing according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a communication management intelligent system information flow for distributed parallel simulation computation according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a communication management intelligent system deployment for distributed parallel simulation computing according to an embodiment of the present invention.
Detailed Description
In order to enable those skilled in the art to better understand the present application, the following description will make clear and complete descriptions of the technical solutions in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
Example 1
As shown in FIG. 1, this embodiment provides a communication management intelligent system for distributed parallel simulation computing, which includes a communication client (Client), a communication server (Forwarder), a communication manager (Tracker manager), and a data collector (Collector);
(1) Client communication client. For the distributed parallel simulation computing platform, it provides interfaces such as publish/subscribe and acknowledged send/acknowledged receive, and encapsulates at the bottom layer the details of address acquisition and the actual connection, so that upper-layer services can call it conveniently. It meets the one-to-one, one-to-many and many-to-one communication requirements of the distributed parallel simulation computing platform.
Further, the one-to-one, one-to-many and many-to-one communication requirements are met as follows:
communication between one Client communication client and another is forwarded by a Forwarder communication server according to topic. This supports one Client sending while one Client listens on the same topic (one-to-one); one Client sending the same topic to several Clients (one-to-many); and one Client listening while several Clients send the same topic (many-to-one).
(2) Forwarder communication server. It broadcasts or forwards the messages of the Client communication clients and pushes the messages to the Collector data collector, which stores them, meeting the requirements of the distributed parallel simulation computing platform such as high throughput, high concurrency and message decoupling.
Further, the requirements of the distributed parallel simulation computing platform such as high throughput, high concurrency and message decoupling are met as follows:
several Forwarder communication server processes can be deployed on several server nodes of the distributed parallel simulation computing platform to provide service, so that multiple topics can be forwarded by different Forwarder communication server processes at the same time, which satisfies high throughput and high concurrency; communication between Client communication clients is forwarded by the Forwarder communication server according to topic, so the communication is defined by topic rather than point to point, which satisfies message decoupling.
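Topic-based forwarding, as described above, can be illustrated with a small in-process sketch. It is not the platform's network implementation: the `Forwarder` class here uses plain callbacks in place of connected Clients and a Python list in place of the Collector, all of which are assumptions for illustration.

```python
class Forwarder:
    """Illustrative topic-based forwarder; callbacks stand in for Clients."""

    def __init__(self, collector=None):
        self.subscribers = {}        # topic -> list of subscriber callbacks
        self.collector = collector   # optional list that records every message

    def subscribe(self, topic, callback):
        """Register a listener for one topic."""
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        """Forward a message to every listener on the topic.

        Returns the number of listeners that received the message.
        """
        if self.collector is not None:
            # Every message passing through the forwarder is also collected.
            self.collector.append((topic, message))
        listeners = self.subscribers.get(topic, [])
        for callback in listeners:
            callback(message)
        return len(listeners)
```

A sender only names a topic and never addresses a receiver directly, which is the decoupling property; one-to-many and many-to-one fall out of how many callbacks subscribe to, or how many callers publish on, the same topic.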
(3) Tracker manager communication manager
The tracker manager communication manager realizes intelligent communication management of the distributed parallel simulation computing cluster, monitors and manages the performance state of the Forwarder communication server through a heartbeat mechanism, provides services such as load balancing policy allocation and the like for Client communication Client request communication addresses, introduces artificial intelligence to realize the establishment of a pre-judging mechanism according to the historical communication service characteristics of the distributed parallel simulation computing platform, optimizes the distributed deployment and allocation principle of the Forwarder communication server service, improves the overall performance and efficiency of the distributed parallel simulation computing platform communication architecture, and meets the requirements of high reliability and high expandability of the distributed parallel simulation computing platform communication architecture.
Further, the heartbeat mechanism specifically includes:
the Forwarder communication server periodically measures its own performance consumption and reports it to the tracker manager communication manager;
the tracker manager communication manager performs intelligent management operations by monitoring the performance state of the Forwarder communication server;
if the Forwarder communication server fails to report its performance consumption within the limited time and number of attempts, the tracker manager communication manager considers it unreachable, deletes it from the management queue, and no longer includes it in allocation.
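The heartbeat rule above can be illustrated with a minimal sketch. The class name `HeartbeatMonitor`, the reporting interval, and the number of tolerated missed reports are assumptions made for illustration; the source does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical manager-side record of Forwarder heartbeats."""
    interval: float = 5.0   # assumed reporting period, in seconds
    max_missed: int = 3     # assumed number of missed reports tolerated
    last_seen: dict = field(default_factory=dict)  # address -> last report time
    load: dict = field(default_factory=dict)       # address -> reported load

    def report(self, address: str, load: float, now: float) -> None:
        """Record a heartbeat carrying the server's performance consumption."""
        self.last_seen[address] = now
        self.load[address] = load

    def prune(self, now: float) -> list:
        """Drop servers silent for longer than max_missed intervals."""
        deadline = self.interval * self.max_missed
        dead = [a for a, t in self.last_seen.items() if now - t > deadline]
        for address in dead:
            del self.last_seen[address]
            del self.load[address]
        return dead

    def available(self) -> list:
        """Addresses still in the management queue and eligible for allocation."""
        return sorted(self.last_seen)
```

Passing the current time in explicitly keeps the sketch deterministic; a real manager would presumably read a monotonic clock inside its monitoring loop.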
(3.1) The tracker manager communication manager is responsible for intelligently managing all Forwarder communication servers: it deploys and allocates a reasonable number of Forwarder communication servers in advance by means of the artificial intelligence historical communication service characteristic prejudging mechanism model, and, according to the performance usage state of each Forwarder communication server, sends management commands such as restarting the service, restarting the connection, cleaning up accumulated messages, and intelligently adding or deleting services, so as to meet the high-reliability and high-scalability requirements of the distributed parallel simulation computing platform.
Further, the artificial intelligence historical communication service characteristic prejudging mechanism model specifically comprises the following steps:
collecting historical communication service characteristic data oriented to a distributed parallel simulation computing platform;
and constructing an artificial intelligent historical communication service characteristic prejudging mechanism model, training according to the historical communication service characteristic data, and predicting the reasonable number of the distributed deployment communication servers based on the historical communication service characteristics by utilizing the trained artificial intelligent historical communication service characteristic prejudging mechanism model.
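The source does not name a model type or the features used. As a purely illustrative sketch, the following assumes a single hypothetical feature (the number of topics requested in past runs) and swaps in an ordinary least-squares line as a stand-in for the prejudging model; `fit_line`, `predict_server_count`, and the `max_servers` cap are assumed names, with the cap standing in for the machine performance upper limit.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (stand-in model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_server_count(model, topics, max_servers):
    """Predict how many servers to pre-deploy, clamped to [1, max_servers]."""
    slope, intercept = model
    estimate = math.ceil(slope * topics + intercept)
    return max(1, min(max_servers, estimate))


# Hypothetical history: topics requested vs. servers that were needed.
history_topics = [10, 20, 30, 40]
history_servers = [1, 2, 3, 4]
model = fit_line(history_topics, history_servers)
```

Any regressor trained on the platform's recorded communication characteristics could replace the line fit; the clamp reflects the idea that the prediction should not exceed what the machine can host.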
(3.2) The tracker manager communication manager is responsible for receiving requests from Client communication clients for the address and port of a Forwarder communication server, and for allocating the best available Forwarder communication server to the Client communication client according to a load balancing policy, meeting the high-reliability requirement of the distributed parallel simulation computing platform.
Further, the load balancing policy specifically includes:
(1) according to the communication message bodies of the distributed simulation platform, judging whether the target Client communication client is already using a certain communication server, and if so allocating the same Forwarder communication server to ensure that the message is reachable;
(2) according to the number of pre-deployed communication servers and the number of Client communication client users recorded by each communication server, allocating an idle Forwarder communication server that no Client communication client is using;
(3) according to the performance load of all communication servers on the current local machine, allocating the Forwarder communication server with the lowest load and the best performance;
(4) according to the current local performance load, if the local performance load has reached the upper limit, requesting the tracker manager communication manager of another machine of the distributed simulation platform to allocate the Forwarder communication server with the lowest load and the best performance.
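The four rules can be read as an ordered fallback chain. The sketch below is one possible reading, not the patented implementation: the `Forwarder` record, the `load_limit` threshold, and the `ask_remote` callback (standing in for a request to another machine's manager) are all assumptions, and rule (4) is interpreted as "no local server is below the load limit".

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Forwarder:
    """Hypothetical record the manager keeps for each local Forwarder."""
    address: str
    load: float = 0.0   # reported performance load, 0.0 to 1.0
    clients: int = 0    # number of connected client users
    topics: set = field(default_factory=set)


def allocate(forwarders, topic: str, load_limit: float = 0.8,
             ask_remote: Optional[Callable[[str], str]] = None) -> Optional[str]:
    # (1) Reuse the server already carrying this topic so messages stay reachable.
    for f in forwarders:
        if topic in f.topics:
            return f.address
    # (2) Otherwise prefer an idle server with no client users.
    for f in forwarders:
        if f.clients == 0:
            return f.address
    # (3) Otherwise pick the least-loaded local server below the limit.
    usable = [f for f in forwarders if f.load < load_limit]
    if usable:
        return min(usable, key=lambda f: f.load).address
    # (4) Local machine is saturated: delegate to another machine's manager.
    return ask_remote(topic) if ask_remote else None
```

The function only selects an address; updating client counts and topic sets after allocation is left out to keep the ordering of the rules visible.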
(3.3) the tracker manager communication manager configured to be slave mode is responsible for pushing all the addresses of the Forwarder communication server maintained by the local machine to the master mode tracker manager communication manager at regular time according to the self mode decision; the tracker manager communication manager configured in master mode receives the addresses of the Forwarder communication servers pushed by all the tracker manager communication managers configured in slave mode in the distributed parallel simulation computing platform cluster at regular time.
The tracker manager communication manager configured in master mode holds the addresses of the Forwarder communication servers of all nodes of the distributed parallel simulation computing platform cluster; the tracker manager communication manager configured in slave mode holds all Forwarder communication server addresses locally.
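A minimal sketch of the master/slave address synchronization described above. The class `TrackerManager` and its method names are hypothetical, and the periodic timer is replaced by an explicit `push_to` call so the example stays self-contained.

```python
class TrackerManager:
    """Hypothetical manager that is configured as either master or slave."""

    def __init__(self, node: str, mode: str):
        if mode not in ("master", "slave"):
            raise ValueError("mode must be 'master' or 'slave'")
        self.node = node
        self.mode = mode
        self.local = []     # Forwarder addresses maintained on this node
        self.cluster = {}   # node -> addresses, filled only in master mode

    def push_to(self, master: "TrackerManager") -> None:
        """Slave side: push all locally maintained addresses to the master."""
        if self.mode != "slave":
            raise RuntimeError("only a slave-mode manager pushes addresses")
        master.receive(self.node, list(self.local))

    def receive(self, node: str, addresses: list) -> None:
        """Master side: store the latest address list pushed by a slave."""
        if self.mode != "master":
            raise RuntimeError("only a master-mode manager receives pushes")
        self.cluster[node] = addresses

    def known_addresses(self) -> list:
        """Master sees the whole cluster; a slave sees only its own node."""
        if self.mode == "slave":
            return list(self.local)
        merged = list(self.local)
        for node in sorted(self.cluster):
            merged.extend(self.cluster[node])
        return merged
```

Each push replaces the previous list for that node, so a Forwarder removed on a slave disappears from the master's view at the next push.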
(4) Collector communication collector. It receives all data passing through the Forwarder communication servers, caches it, and merges and pushes it to the database for storage according to elapsed time or cache size, which improves efficiency.
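The cache-then-merge behaviour can be sketched as a small batching buffer. The class name `Collector`, the `store` callback (standing in for the database write), and the thresholds `max_items` and `max_age` are assumptions for illustration only.

```python
class Collector:
    """Hypothetical collector that flushes on size or age of the cache."""

    def __init__(self, store, max_items: int = 100, max_age: float = 1.0):
        self.store = store          # callable that receives one merged batch
        self.max_items = max_items  # assumed cache-size threshold
        self.max_age = max_age      # assumed time threshold, in seconds
        self.buffer = []
        self.first_at = None        # arrival time of the oldest cached message

    def receive(self, message, now: float) -> None:
        """Cache a message and flush if either threshold is reached."""
        if not self.buffer:
            self.first_at = now
        self.buffer.append(message)
        too_many = len(self.buffer) >= self.max_items
        too_old = now - self.first_at >= self.max_age
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        """Push the cached messages to storage as a single batch."""
        if self.buffer:
            self.store(list(self.buffer))
            self.buffer.clear()
            self.first_at = None
```

In this sketch the age check only runs when a new message arrives; a deployed collector would presumably also flush from a timer so that a quiet period does not leave data in the cache.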
1. The communication flow of the communication management intelligent system facing to the distributed parallel simulation calculation is described below with reference to fig. 2, and the system is further described clearly and completely.
The communication flow of the communication management intelligent system facing the distributed parallel simulation calculation comprises the following steps:
After the tracker manager communication manager is initialized, it enters a listening loop and listens for messages from the Forwarder communication servers and the Client communication clients, providing the intelligent management service to the Forwarder communication servers and the load balancing service to the Client communication clients.
After the Forwarder communication server is initialized, it registers its own information with the tracker manager communication manager; if registration fails, it retries, and if registration succeeds, it enters the heartbeat loop and the listening loop. The heartbeat loop mainly reports performance and additional information, and the listening loop mainly listens for requests and communications from Client communication clients.
After the Client communication client is initialized, it requests the address of a Forwarder communication server from the tracker manager communication manager; if the request fails, it keeps requesting, and if the request succeeds, it connects to the Forwarder communication server. If the connection fails, it enters the reconnection procedure; if the connection succeeds, it starts publishing/subscribing to messages.
After the Collector data collector is initialized, it enters a listening loop, listens for messages from the Forwarder communication servers, and adds received messages to a cache; once the time interval is reached or the cache is large enough, the cached data is merged and pushed to the database.
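The client start-up path in this flow (request an address, retry on failure, then connect or reconnect) can be sketched as a bounded retry loop. The function name `connect_client`, the two callbacks, and the retry limit are assumptions; the source does not say whether retries are bounded.

```python
import time


def connect_client(request_address, connect, retries: int = 5, delay: float = 0.0):
    """Request a Forwarder address, then connect; retry either step on failure.

    request_address() returns an address or None; connect(address) returns a
    connection object or None. Both are hypothetical stand-ins.
    """
    for _ in range(retries):
        address = request_address()
        if address is None:          # request failed: ask the manager again
            time.sleep(delay)
            continue
        connection = connect(address)
        if connection is not None:   # connected: ready to publish/subscribe
            return connection
        time.sleep(delay)            # connection failed: reconnect on next pass
    raise ConnectionError("no Forwarder connection after %d attempts" % retries)
```

Re-requesting the address on every pass lets the manager hand out a different Forwarder if the first one has since been removed from its queue.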
2. The system sequence of the communication management intelligent system facing to the distributed parallel simulation calculation is described below with reference to fig. 3, and the system is further described clearly and completely.
1) The Forwarder communication server, acting as a communication node, first establishes a connection with the tracker manager communication manager, pushes its own address and port, and maintains a heartbeat that periodically pushes the performance state of its process;
2) Binding a subscriber and a publisher of the Forwarder communication server to a specified network port;
3) The Client communication Client A requests an available forwarding communication server as a communication node through a tracker manager communication manager;
4) The Client communication Client A connects to the subscriber of the communication node at the network address and network port returned by the tracker manager communication manager;
5) After the Client communication Client A establishes communication through the steps, the Client communication Client A sends a data packet to a communication node;
6) The Client communication Client B requests an available forwarding communication server through a tracker manager communication manager;
7) The Client communication Client B is connected with a publisher of the communication node according to the designated network address and the network port returned by the tracker manager communication manager;
8) After receiving the data, the communication server selects the clients subscribed to the topic according to the topic information in the data packet and forwards the data packet to the Client communication Client B;
9) The data packet forwarded by the communication server arrives at the Client communication Client B, and the Client communication Client B triggers the registered event according to the message details.
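Steps 5) to 9) amount to topic-based publish/subscribe through an intermediary. The in-memory sketch below shows only the routing idea; `TopicForwarder` and its callbacks are hypothetical, and the network sockets that the publisher and subscriber ports would use are omitted.

```python
from collections import defaultdict


class TopicForwarder:
    """Hypothetical in-process stand-in for a Forwarder communication node."""

    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> registered callbacks

    def subscribe(self, topic: str, callback) -> None:
        """Client B side: register interest in a topic."""
        self.subscribers[topic].append(callback)

    def publish(self, topic: str, payload) -> int:
        """Client A side: forward the packet to every subscriber of the topic."""
        callbacks = self.subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, payload)            # triggers the client's handler
        return len(callbacks)


# Usage: Client B registers a handler, Client A publishes on the same topic.
received = []
node = TopicForwarder()
node.subscribe("vehicle/pose", lambda topic, data: received.append((topic, data)))
delivered = node.publish("vehicle/pose", {"x": 1.0, "y": 2.0})
```

Neither client holds a reference to the other; they share only the topic name, which is the decoupling property the description attributes to the Forwarder.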
3. The management architecture of the distributed parallel simulation computing-oriented communication management intelligent system is described below with reference to fig. 4, and the system is further described clearly and completely.
The tracker communication manager is responsible for intelligently managing all the Forwarder communication servers in the cluster, monitoring the running state information of the Forwarder communication servers, and generating and storing logs. The tracker performs intelligent management by monitoring the performance condition of the Forwarder communication server and sends instructions to control the behavior of the Forwarder communication server, such as: closing, restarting and cleaning up pile-up information.
The Forwarder communication server is used as a communication node and is a centralized node responsible for mutual communication among a plurality of clients, and a lightweight communication mechanism is adopted among a plurality of application ends such as a virtual environment simulation node, a heterogeneous unmanned simulation node, a multi-physical field simulation node, a deep reinforcement learning calculation node and the like. Each Forwarder communication server operates in an independent process, and the Forwarder communication servers coordinate and cooperate with each other to realize a data interaction function.
4. The information flow of the communication management intelligent system facing to the distributed parallel simulation calculation is described below with reference to fig. 5, and the system is further described clearly and completely.
In the communication process of the distributed parallel simulation computing platform, the tracker communication manager is responsible for intelligently managing all Forwarder communication servers serving as communication nodes in the cluster. Before a Client communication client connects to a Forwarder communication server, it must first request an address from the tracker communication manager. After the address request succeeds, the Client communication client connects to the Forwarder communication server. After the connection succeeds, the Client communication client sends data to the corresponding Forwarder communication server through the publishing interface, and the Forwarder communication server performs broadcast or fixed-point forwarding, according to the communication message body, to the other clients subscribed to that message body. All data flowing through the Forwarder communication server is pushed to the communication collector for caching and storage.
5. The deployment of the communication management intelligent system for distributed parallel simulation calculation is described below with reference to fig. 6, and the system is further described clearly and completely.
The tracker communication manager is deployed on each server node and can be configured in master mode or slave mode. Each tracker communication manager maintains all Forwarder communication servers acting as communication nodes on its server node, with operations such as starting, closing, restarting, and cleaning up accumulated messages. It also maintains the number of Forwarder communication servers on the server node: the initial number is determined by a percentage of the upper limit of the machine performance together with the artificial intelligence historical communication service characteristic prejudging mechanism model, and the number of Forwarder communication servers subsequently added or deleted dynamically is determined by the number of topics requested by clients and the size of the communication overhead.
The tracker communication manager configured in master mode receives the list information of the maintained Forwarder communication servers of all tracker communication managers configured in slave mode in the cluster, but does not directly send a management command to operate the Forwarder communication server on other server nodes. The tracker communication manager configured in master mode holds addresses and ports of all the Forwarder communication servers in the distributed parallel simulation computing platform cluster, and when clients choose to request the Forwarder communication server addresses from the master-tracker communication manager, the Forwarder communication server addresses may be allocated to a plurality of different server nodes in the distributed parallel simulation computing platform cluster.
The tracker communication manager configured in slave mode maintains only the list of Forwarder communication servers on its own server node and directly sends management commands to operate the Forwarder communication servers on that node. It holds the addresses and ports of all Forwarder communication servers on its server node. When a Client communication client chooses to request a Forwarder communication server address from the slave-tracker communication manager, a Forwarder communication server on that server node is allocated preferentially; when the machine performance of that server node reaches the upper limit, the request is forwarded to the master-tracker communication manager, in which case the allocated Forwarder communication server may be on a different server node in the distributed parallel simulation computing platform cluster.
Example 2
In another embodiment of the present application, there is also provided a distributed parallel simulation computing oriented communication management intelligent method, including the following steps:
S1, the communication manager selects the number of communication servers to start initially, according to a percentage of the upper limit of the machine performance and the artificial intelligence historical communication service characteristic prejudging mechanism model, and sends an instruction to start the specified number of communication servers;
further, the artificial intelligence historical communication service characteristic prejudging mechanism model specifically comprises the following steps:
collecting historical communication service characteristic data oriented to a distributed parallel simulation computing platform;
and constructing an artificial intelligent historical communication service characteristic prejudging mechanism model, training according to the historical communication service characteristic data, and predicting the reasonable number of the distributed deployment communication servers based on the historical communication service characteristics by utilizing the trained artificial intelligent historical communication service characteristic prejudging mechanism model.
S2, the communication server establishes connection with the communication manager, registers own address and port, and maintains the performance state of the heartbeat mechanism timing push process;
further, the heartbeat mechanism specifically includes:
the communication server periodically measures its own performance consumption and reports it to the communication manager;
the communication manager performs intelligent management operations by monitoring the performance state of the communication server;
if the communication server fails to report its performance consumption within the limited time and number of attempts, the communication manager considers it unreachable, deletes it from the management queue, and no longer includes it in allocation.
S3, the communication manager adds the communication servers into a management queue, and performs different intelligent management services according to the performance conditions of the communication servers, including restarting, cleaning accumulation information, dynamically adding and deleting the number of the communication servers, and the like, and specifically comprises the following steps:
1) When all communication servers are in higher load, the communication manager decides whether to start more communication servers according to the performance condition of the machine or requires the communication manager of other machines of the distributed parallel simulation computing platform to start more communication servers to provide service;
2) Through the heartbeat mechanism, the communication manager monitors conditions such as message accumulation and process death that may occur while a communication server is providing service, judges from the performance usage of the communication server whether to restart it or clear its accumulated messages, notifies the communication server to execute the operation, and then maintains and updates the corresponding information in its own management queue;
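The two management cases in step S3 can be sketched as a function that turns the latest heartbeat data into a list of commands. The status fields (`alive`, `load`, `backlog`), the thresholds, and the action names are all assumptions for illustration; the source does not define them.

```python
def plan_actions(servers: dict, machine_load: float, high_load: float = 0.8,
                 machine_limit: float = 0.9, backlog_limit: int = 1000) -> list:
    """Return (action, address) pairs for the manager to send.

    servers maps an address to a hypothetical status dict with the keys
    "alive" (bool), "load" (float) and "backlog" (queued message count).
    """
    actions = []
    # Case 2): dead processes are restarted, large backlogs are cleared.
    for address, status in sorted(servers.items()):
        if not status["alive"]:
            actions.append(("restart", address))
        elif status["backlog"] > backlog_limit:
            actions.append(("clear_backlog", address))
    # Case 1): if every live server is heavily loaded, add capacity locally
    # when the machine has headroom, otherwise ask another machine's manager.
    alive = [s for s in servers.values() if s["alive"]]
    if alive and all(s["load"] >= high_load for s in alive):
        if machine_load < machine_limit:
            actions.append(("start_local", None))
        else:
            actions.append(("request_remote", None))
    return actions
```

Returning a plan rather than executing commands keeps the decision logic testable separately from whatever transport delivers the commands to the communication servers.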
S4, the communication client requests the communication manager to acquire the address of the available communication server, and the communication manager distributes the communication server to the communication client for use according to the load balancing strategy.
Further, the load balancing policy specifically includes:
(1) judging whether a target communication client is using a certain communication server according to different communication message bodies of the distributed simulation platform, and distributing the same communication server to ensure that the message is reachable;
(2) according to the number of pre-deployed communication servers and the number of communication client users recorded by each communication server, allocating an idle communication server that no communication client is using;
(3) distributing one of the communication servers with the lowest load and the optimal performance according to the performance load conditions of all the communication servers of the current local machine;
(4) according to the current local performance load, if the local performance load has reached the upper limit, requesting the communication manager of another machine of the distributed simulation platform to allocate the communication server with the lowest load and the best performance.
S5, the communication manager acts according to its own mode: if it is a slave-mode manager, it periodically pushes all communication server addresses it maintains locally to the master-mode manager; if it is the master-mode manager, it periodically receives the communication server addresses pushed by all slave-mode managers in the distributed parallel simulation computing platform cluster.
S6, collecting the machine performance load data of the communication manager from the communication service characteristics of the distributed parallel simulation computing platform, and optimizing the artificial intelligence historical communication service characteristic prejudging mechanism model, so that a reasonable number of communication servers can be deployed and allocated in advance for subsequent communication client requests.
It should be noted that the system provided in the foregoing embodiment is illustrated only by the above division of functional modules. In practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure may be divided into different functional modules to perform all or part of the functions described above. The system applies the communication management intelligent method for distributed parallel simulation computing described in the foregoing embodiment.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them. Any other change, modification, substitution, combination, or simplification that does not depart from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included within the protection scope of the present invention.

Claims (10)

1. The communication management intelligent system for distributed parallel simulation calculation is characterized by comprising a communication client, a communication server, a communication manager and a communication collector;
the communication client is used for providing a communication interface and encapsulating, at the bottom layer, the details of address acquisition and of the actual connection;
the communication server is used for broadcasting or forwarding the messages of the communication client and pushing the messages to the data collector for storage;
the communication manager is used for deploying the number of the communication servers, monitoring the states of the communication servers and sending management commands according to the performance use states of the communication servers;
the communication manager is used for receiving a communication client request communication server address port request and distributing a communication server to the communication client for use according to a load balancing strategy;
the communication manager is used for receiving information of communication managers of other nodes of the distributed parallel simulation computing platform and synchronizing address information of all communication servers on the distributed parallel simulation computing platform cluster;
the communication collector is used for receiving all data passing through the communication server and caching the data to a database for storage.
2. The distributed parallel simulation computing oriented communication management intelligent system according to claim 1, wherein the communication manager monitors performance states of all communication servers through a heartbeat mechanism with the communication servers, specifically:
the communication server counts the self performance consumption at regular time and reports the self performance consumption to the communication manager;
the communication manager manages by monitoring the performance state of the communication server;
if the communication server fails to report its performance consumption within the limited time and number of attempts, the communication manager considers the communication server unreachable, deletes it from the management queue, and no longer includes it in allocation.
3. The distributed parallel simulation computing oriented communication management intelligent system according to claim 1, wherein the communication manager distributes and distributes reasonable numbers of communication servers in advance through an artificial intelligence historical communication service characteristic prejudgement mechanism model, specifically:
collecting historical communication service characteristic data oriented to a distributed parallel simulation computing platform;
and constructing an artificial intelligent historical communication service characteristic prejudging mechanism model, training according to the historical communication service characteristic data, and predicting the reasonable number of the distributed deployment communication servers based on the historical communication service characteristics by utilizing the trained artificial intelligent historical communication service characteristic prejudging mechanism model.
4. The distributed parallel simulation computing oriented communication management intelligent system according to claim 1, wherein the distributing the communication server to the communication client according to the load balancing policy is specifically:
judging whether a target communication client is using a certain communication server according to different communication message bodies of the distributed simulation platform, and distributing the same communication server to ensure that the message can be sent;
allocating an idle communication server that no communication client is using, according to the number of pre-deployed communication servers and the number of communication client users recorded by each communication server;
distributing one of the communication servers with the lowest load and the optimal performance according to the performance load conditions of all the communication servers of the current local machine;
and according to the current local performance load condition, if the local performance load reaches the upper limit, requesting other machine communication managers of the distributed simulation platform to distribute the communication server with the lowest load and the optimal performance.
5. The distributed parallel simulation computing oriented communication management intelligent system according to claim 1, wherein the communication manager receives information of communication managers of other nodes of the distributed parallel simulation computing platform, and is used for synchronizing address information of all communication servers on a distributed parallel simulation computing platform cluster, specifically:
configuring a communication manager to a master mode or a slave mode; the communication manager configured in master mode receives the address information of the communication servers of all the communication managers configured in slave mode on the distributed parallel simulation computing platform cluster at fixed time; the communication manager configured in slave mode pushes all self-maintained local communication server address information to the communication manager configured in master mode in the distributed parallel simulation computing platform cluster at regular time;
the communication manager configured in master mode holds the communication server addresses of all nodes of the distributed parallel simulation computing platform cluster; a communication manager configured in slave mode holds all communication server addresses local.
6. The intelligent communication management method for distributed parallel simulation calculation is characterized by comprising the following steps of:
the communication manager selects the number of communication servers which are started initially according to the percentage of the upper limit of the machine performance and according to the artificial intelligence historical communication service characteristic prejudging mechanism model, and sends an instruction to start the communication servers with the specified number;
the communication server establishes connection with the communication manager, registers own address and port, and maintains the performance state of the heartbeat mechanism timing push process;
configuring a communication manager to a master mode or a slave mode; the communication manager configured in master mode will receive the address information of the communication server of all the communication managers configured in slave mode at fixed time; the communication manager configured in slave mode pushes all self-maintained address information of the communication servers to the communication manager configured in master mode at regular time;
the communication manager adds the communication server into a management queue and manages according to the performance condition of the communication server;
the communication client requests the communication manager to acquire an available communication server address, and the communication manager distributes the communication server to the communication client for use according to a load balancing strategy;
and collecting machine performance load condition data of a communication manager in communication service characteristics of the distributed parallel simulation computing platform, optimizing an artificial intelligent historical communication service characteristic prejudging mechanism model, and deploying and distributing reasonable distributed communication server quantity in advance for the next communication of the distributed parallel simulation computing platform.
7. The intelligent communication management method for distributed parallel simulation computing according to claim 6, wherein the selecting the number of communication servers to be started initially according to the artificial intelligence historical communication service characteristic prejudgement mechanism model is specifically as follows:
collecting historical communication service characteristic data oriented to a distributed parallel simulation computing platform;
and constructing an artificial intelligent historical communication service characteristic prejudging mechanism model, training according to the historical communication service characteristic data, and predicting the reasonable number of the distributed deployment communication servers based on the historical communication service characteristics by utilizing the trained artificial intelligent historical communication service characteristic prejudging mechanism model.
8. The intelligent communication management method for distributed parallel simulation computation according to claim 6, wherein the communication manager adds communication servers into a management queue and manages according to performance conditions, and the method comprises restarting, cleaning up accumulation information, and dynamically adding and deleting the number of the communication servers, specifically comprises the following steps:
when the communication server has the condition of message accumulation or process death, the communication manager judges whether to restart or clear accumulated information according to the performance use condition of the communication server, sends a control instruction to the communication server for execution, and then maintains and updates the corresponding information of the management queue of the communication manager;
when all communication servers are in higher load, the communication manager decides whether to start more communication servers according to the machine performance condition, or requires the communication manager of other machines of the distributed parallel simulation computing platform to start more communication servers to provide service.
9. The intelligent communication management method for distributed parallel simulation computing according to claim 6, wherein the heartbeat maintenance mechanism specifically comprises:
the communication server periodically measures its own performance consumption and reports it to the communication manager;
the communication manager manages the communication server by monitoring its performance state;
if the communication server fails to report its performance consumption within a limited time and number of attempts, the communication manager regards the communication server as unable to communicate, deletes it from the management queue, and excludes it from allocation.
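A minimal sketch of this heartbeat bookkeeping on the manager side follows. The class name `HeartbeatMonitor`, the reporting interval, and the allowed number of missed reports are assumptions; timestamps are passed in explicitly so the logic can be exercised without real clocks or network traffic.

```python
class HeartbeatMonitor:
    """Tracks the last performance report received from each server."""

    def __init__(self, interval=5.0, max_missed=3):
        self.interval = interval      # assumed reporting period, in seconds
        self.max_missed = max_missed  # assumed number of tolerated misses
        self.queue = {}               # server_id -> (last_seen, last_load)

    def report(self, server_id, load, now):
        """Record a performance report; this also registers new servers."""
        self.queue[server_id] = (now, load)

    def sweep(self, now):
        """Remove servers that stayed silent for too long; return their ids."""
        deadline = self.interval * self.max_missed
        silent = [sid for sid, (seen, _) in self.queue.items()
                  if now - seen > deadline]
        for sid in silent:
            del self.queue[sid]
        return silent


monitor = HeartbeatMonitor(interval=5.0, max_missed=3)
monitor.report("server-a", load=0.2, now=0.0)
monitor.report("server-b", load=0.4, now=0.0)
monitor.report("server-a", load=0.3, now=10.0)   # server-b stays silent
removed = monitor.sweep(now=16.0)
```

In this run only `server-b` exceeds the 15-second silence limit, so it is dropped from the queue and would no longer be offered to communication clients.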
10. The intelligent communication management method for distributed parallel simulation computing according to claim 6, wherein the communication manager allocating a communication server to a communication client according to a load balancing policy specifically comprises:
determining, according to the communication message body of the distributed simulation platform, whether the target communication client is already using a communication server and, if so, allocating that same communication server to ensure that the message can be delivered;
allocating an idle communication server that is not in use by any communication client, according to the number of communication servers pre-allocated and deployed and the number of communication client users recorded by each communication server;
allocating the communication server with the lowest load and the best performance, according to the performance load of all communication servers on the current local machine;
and, according to the current performance load of the local machine, if the local performance load has reached its upper limit, requesting the communication managers on other machines of the distributed simulation platform to allocate the communication server with the lowest load and the best performance.
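The four allocation rules can be sketched as a single function. The claim lists the rules without fixing how they interleave, so the order used here (target's server first, then the local load limit check, then an idle server, then the least loaded server) is an assumption, as are the names `Server` and `allocate`, the scalar `load` field, and the `load_limit` value. Returning `None` stands in for forwarding the request to a communication manager on another machine.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    server_id: str
    load: float                                 # assumed scalar load, 0.0 to 1.0
    clients: set = field(default_factory=set)   # ids of clients using it


def allocate(servers, client_id, target_id, local_load, load_limit=0.9):
    """Pick a local server for client_id, or return None to go remote."""
    # Rule 1: reuse the server the target client is already connected to.
    chosen = next((s for s in servers if target_id in s.clients), None)
    if chosen is None:
        # Rule 4: the local machine is saturated, so defer to another manager.
        if local_load >= load_limit:
            return None
        # Rule 2: prefer a server that no client is using yet.
        chosen = next((s for s in servers if not s.clients), None)
    if chosen is None:
        # Rule 3: otherwise take the least loaded local server.
        chosen = min(servers, key=lambda s: s.load, default=None)
    if chosen is not None:
        chosen.clients.add(client_id)
    return chosen


servers = [
    Server("s1", 0.7, {"c1"}),
    Server("s2", 0.2),
    Server("s3", 0.1, {"c9"}),
]
same_as_target = allocate(servers, "c2", target_id="c1", local_load=0.5)
idle_server = allocate(servers, "c3", target_id="cX", local_load=0.5)
least_loaded = allocate(servers, "c4", target_id="cY", local_load=0.5)
go_remote = allocate(servers, "c5", target_id="cZ", local_load=0.95)
```

Checking the target's server before anything else keeps sender and receiver on the same communication server, which is the condition the claim gives for guaranteeing that the message can be delivered.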
CN202310337188.8A 2023-03-31 2023-03-31 Communication management intelligent system and method for distributed parallel simulation calculation Pending CN116366660A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310337188.8A CN116366660A (en) 2023-03-31 2023-03-31 Communication management intelligent system and method for distributed parallel simulation calculation

Publications (1)

Publication Number Publication Date
CN116366660A (en) 2023-06-30

Family

ID=86930787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310337188.8A Pending CN116366660A (en) 2023-03-31 2023-03-31 Communication management intelligent system and method for distributed parallel simulation calculation

Country Status (1)

Country Link
CN (1) CN116366660A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106484713A (en) * 2015-08-27 2017-03-08 中国石油化工股份有限公司 A kind of based on service-oriented Distributed Request Processing system
CN108322548A (en) * 2018-03-07 2018-07-24 浙江大学 A kind of industrial process data analyzing platform based on cloud computing
KR101916799B1 (en) * 2018-06-18 2018-11-08 주식회사 에프아이티 Apparatus And Method For Big Data Server Load Balancing Control
CN111400036A (en) * 2020-03-05 2020-07-10 张晏铭 Cloud application management system, method, device and medium based on server cluster
CN113407426A (en) * 2021-06-17 2021-09-17 北京字跳网络技术有限公司 Server cluster capacity evaluation method and device, electronic equipment and storage medium
CN113572815A (en) * 2021-06-25 2021-10-29 广州大学 Communication technology method, system and medium for crossing heterogeneous platforms
CN114448983A (en) * 2022-03-09 2022-05-06 南京凌华微电子科技有限公司 ZooKeeper-based distributed data exchange method

Similar Documents

Publication Publication Date Title
CN110191148B (en) Statistical function distributed execution method and system for edge calculation
WO2021190482A1 (en) Computing power processing network system and computing power processing method
CN111615066B (en) Distributed micro-service registration and calling method based on broadcast
JP5557840B2 (en) Distributed database monitoring mechanism
CN113364850B (en) Software-defined cloud-edge collaborative network energy consumption optimization method and system
CN105357296A (en) Elastic caching system based on Docker cloud platform
CN101014002A (en) Cluster message transmitting method and distributed cluster system
CN113852693B (en) Migration method of edge computing service
CN109756474B (en) Service cross-region calling method and device for power dispatching automation system
WO2018121201A1 (en) Distributed cluster service structure, node cooperation method and device, terminal and medium
CN111343237A (en) Server cluster communication method, communication device and computer storage medium
WO2007041899A1 (en) A system and method of managing the dynamic adaptive distributed resource
WO2021043124A1 (en) Kbroker distributed operating system, storage medium, and electronic device
CN110011984B (en) REST and RPC-based distributed cluster system and method
CN111404818A (en) Routing protocol optimization method for general multi-core network processor
WO2013097363A1 (en) Method and system for scheduling data sharing device
CN116366660A (en) Communication management intelligent system and method for distributed parallel simulation calculation
CN102130968A (en) Water resource monitoring communication system and method
CN101989918A (en) Peer-to-peer network management system and method
CN111422078A (en) Electric vehicle charging data allocation monitoring method based on block chain
CN116614517A (en) Container mirror image preheating and distributing method for edge computing scene
CN114615268B (en) Service network, monitoring node, container node and equipment based on Kubernetes cluster
CN115514651A (en) Cloud-edge data transmission path planning method and system based on software-defined stacked network
WO2014036715A1 (en) System and method for controlling real-time resource supply process based on delivery point
CN104301240B (en) Data transmission method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination