CN116233129A - Multi-node publishing and subscribing system for spacecraft on-orbit data safety transmission - Google Patents

Publication number
CN116233129A
Authority
CN
China
Prior art keywords
data
node
spacecraft
orbit
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310146378.1A
Other languages
Chinese (zh)
Inventor
覃润楠
彭晓东
谢文明
惠建江
冯渭春
姜加红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Space Science Center of CAS
Original Assignee
National Space Science Center of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Space Science Center of CAS
Priority to CN202310146378.1A
Publication of CN116233129A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1464 Management of the backup or restore process for networked environments
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/22 Arrangements for detecting or preventing errors in the information received using redundant apparatus to increase reliability
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04Q SELECTING
    • H04Q9/00 Arrangements in telecontrol or telemetry systems for selectively calling a substation from a main station, in which substation desired apparatus is selected for applying a control signal thereto or for obtaining measured values therefrom

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Cardiology (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a multi-node publishing and subscribing system for the safe transmission of spacecraft on-orbit data, comprising a data subscription layer, a data distribution layer, a data processing layer and a data caching layer. The data subscription layer and the data distribution layer jointly realize multi-node subscription and publication of the on-orbit data based on distributed cloud service nodes managed by ZooKeeper; the data processing layer processes and analyzes high-concurrency remote data using a priority election algorithm based on heartbeat detection and a multi-node data consistency communication technology; the data caching layer stores the spacecraft on-orbit data through a distributed cache technology. By means of big data and cloud computing platform technology, the invention breaks through several key technologies, ensures data safety, consistency and computing efficiency, and provides technical support for the transmission, management and caching of massive spacecraft on-orbit test data.

Description

Multi-node publishing and subscribing system for spacecraft on-orbit data safety transmission
Technical Field
The invention relates to computer communication technology, and in particular to a multi-node publishing and subscribing system for the safe transmission of spacecraft on-orbit data.
Background
The space environment in which a spacecraft operates on orbit is complex and changeable. Environmental effects such as particle radiation, electromagnetic radiation and temperature alternation can interfere with the spacecraft, and the resulting faults can cause serious damage. Accurately and intuitively displaying the on-orbit working condition of the spacecraft, discovering hidden dangers in time for fault early warning, and rapidly locating a fault and carrying out the fault plan when one occurs are therefore of great significance for guaranteeing long-term stable on-orbit operation. Consequently, technology for comprehensive detection, monitoring and display of the spacecraft running state is urgently needed in aerospace and has high practical value.
In field applications, monitoring the on-orbit running state of a spacecraft is inseparable from analysis of its telemetry parameters and data-transmission images, and spacecraft test data are the only basis on which ground monitoring personnel and experts can understand the running state of the spacecraft. With the rapid development of aerospace technology, spacecraft types, test frequencies and measurement sites keep increasing and more novel payloads are carried, so the volume of spacecraft test data has grown sharply. How to use a small number of servers to receive, manage and statistically analyze massive test data efficiently, reliably and in real time during on-orbit state monitoring, so that the running condition of the spacecraft is represented accurately while machine-room space and expenditure are reduced, has become a focus of attention for ground monitoring personnel.
The advent and development of distributed cloud computing allows time-consuming processing of massive data to be carried out on clusters of servers that individually have limited computing power; by sharing software and hardware resources, computing can be provided to computers and other devices on demand. Hadoop is an open-source cloud computing platform widely used in recent years. It is composed of the core distributed file system (Hadoop Distributed File System, HDFS), the big data computing engine MapReduce, and sub-projects such as Pig, ZooKeeper, HBase and Hive, and is characterized by universality, high reliability, scalability and dynamic interaction. In the face of continuously growing massive spacecraft test data, introducing the distributed cloud computing platform Hadoop undoubtedly brings convenience to the analysis and display of the spacecraft on-orbit test process.
However, although the distributed file system HDFS used for data transmission between computing nodes on the Hadoop platform offers high fault tolerance, batch processing and PB-level data transmission, it cannot meet requirements such as efficient storage of small files and multithreaded concurrent reading and writing, so a high-concurrency data publish-subscribe system needs to be introduced into the cloud computing platform. The asynchronous communication paradigm of a publish-subscribe system supports large-scale distributed communication applications: a data sender sends data content to logical channels divided by topic name, and a data receiver freely subscribes to the topics of interest and synchronously receives all data sent to those topic channels; the paradigm is characterized by multithreaded concurrency, loose coupling and high scalability. Current mainstream publish-subscribe systems include topic-based systems such as iBus, SCRIBE and CORBA Notification Service, as well as content-based, channel-based and type-based publish-subscribe systems.
Some research has been done on combining cloud computing platforms with publish-subscribe systems. For example, one document proposes publish-subscribe middleware for a cloud platform integrated with a wireless sensor network (Wireless Sensor Networks, WSN), which provides communities with rich resources, services and storage related to WSN-driven data, and also proposes a content-based matching algorithm that analyzes subscription intent so that appropriate content can be published to subscribers. In practical applications, however, Hadoop's massively parallel computing depends on offline data stored on local disks and cannot meet the requirements of real-time online access, processing, analysis and display of spacecraft on-orbit test data; real-time interactive data processing therefore has to be carried out outside Hadoop in order to transmit big data safely and reliably. Because the cloud computing nodes in a cloud environment process data in inconsistent order, many classical data security mechanisms have been developed, such as the Paxos data consistency algorithm, the Raft priority election mechanism and Practical Byzantine Fault Tolerance (PBFT). However, these algorithms are limited in use to different degrees by their respective characteristics: the Paxos and Raft consensus algorithms do not consider Byzantine nodes in the system, while the PBFT algorithm scales poorly.
Therefore, how to transmit the massive data of the whole spacecraft life cycle in real time, safely and stably is the main problem faced in designing a publish-subscribe system in a cloud environment.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a multi-node publishing and subscribing system for the safe transmission of spacecraft on-orbit data.
In order to achieve the above purpose, the present invention provides a multi-node publish-subscribe system for the secure transmission of on-orbit data of a spacecraft, the system comprising: a data subscription layer, a data distribution layer, a data processing layer and a data caching layer; wherein,
the data subscription layer and the data distribution layer are used for jointly realizing multi-node subscription and publication of the on-orbit data of the spacecraft based on distributed cloud service nodes managed by ZooKeeper;
the data processing layer is used for processing and analyzing high-concurrency remote data by adopting a priority election algorithm based on heartbeat detection and a multi-node data consistency communication technology;
the data caching layer is used for storing the on-orbit data of the spacecraft through a distributed fast cache technology.
As an improvement of the above system, the multi-node subscription and publication framework for the spacecraft on-orbit data comprises a plurality of distributed cloud service nodes, a plurality of publisher nodes and a plurality of subscriber nodes; wherein,
one of the distributed cloud service nodes is a master control node, and the rest are service agent slave nodes;
the master control node is used for receiving the on-orbit data sent by each publisher node and forwarding the on-orbit data to the service agent slave nodes;
the service agent slave node is used for storing the received on-orbit data in memory or on disk, in blocks by topic;
the publisher node is used for sending the on-orbit data of the spacecraft to the master control node;
the subscriber node is used for subscribing to and consuming the on-orbit data from the service agent slave nodes by topic.
As an improvement of the above system, the publisher node includes a structure port that subscribes to spacecraft multi-source test data and a character port that subscribes to configuration parameters;
the publisher node employs a processing mechanism of a main thread and a sending thread, wherein,
the main thread is used for receiving on-orbit data created by a user and, after processing by the interceptor, the serializer and the partitioner, caching the on-orbit data at the tail of the double-ended queue of the data accumulator;
the sending thread is used for acquiring the on-orbit data from the head of the double-ended queue of the data accumulator and sending the acquired on-orbit data to the service agent slave node through the responder.
As an improvement of the above system, the subscriber nodes subscribe to the on-orbit data from the service agent slave nodes by topic; each subscriber node has a corresponding subscription group; the publisher node delivers on-orbit data, according to its topic, to the subscription groups subscribed to that topic; each subscriber node in a subscription group can repeatedly consume the on-orbit data, and different subscription groups do not affect each other.
As an improvement of the above system, the subscriber node uses a pull mode to read data, according to the water line, from the data block of the corresponding topic on the service agent slave node, and stores a consumption offset; wherein the water line is used to identify the on-orbit data that can be consumed by subscriber nodes; the consumption offset is the position from which the subscriber node pulls the next piece of on-orbit data; and the consumption rate of the subscriber node is determined by the data volume and the consumption rate in the service agent slave node.
As an improvement of the above system, the priority election algorithm based on heartbeat detection specifically includes:
assigning priorities to all service agent slave nodes and setting a threshold according to the current cluster scale, wherein nodes above the threshold are placed in an election area and nodes below the threshold are placed in a waiting area; only the service agent slave nodes in the election area are qualified to participate in the election of the master control node, and service agent slave nodes newly added to the cluster or recovered from faults are placed in the waiting area;
when the master control node stops working due to unexpected process interruption or unexpected halt, an election is initiated in the election area; if more than half of the service agent slave nodes vote in agreement, the service agent slave node with the highest priority is set as the new master control node, and data consistency interpretation among the nodes is carried out;
after the election is finished, the priority of the elected service agent slave node is increased; meanwhile, the threshold is dynamically adjusted according to actual conditions: when the number of service agent slave nodes in the election area exceeds a preset value, the service agent slave node with the lowest priority in the election area is demoted to the waiting area; and when the number of service agent slave nodes in the waiting area exceeds a preset value, the service agent slave node with the highest priority in the waiting area is promoted to the election area.
As an improvement of the above system, each of the distributed cloud service nodes holds epoch information and a data offset, wherein,
the epoch information records the number of times the master control node has changed; its initial value is 0, and it is incremented by 1 each time the master control node changes;
the data offset records the offset at which the first piece of data is written under the current epoch information; each service agent slave node requests data synchronization from the master control node at a fixed frequency, and when a node's state changes, data truncation or filling is performed at the data offset positions of the respective nodes so as to protect the data integrity of the service agent slave nodes.
As an improvement of the above system, the data consistency interpretation between the nodes specifically includes:
each service agent slave node independently sends a message to the master control node to request synchronization;
if a service agent slave node goes down and restarts, it compares its own epoch information and data offset and performs data truncation or repair according to the water line of the master control node;
if the master control node goes down and restarts, the newly elected master control node increments its own epoch information by 1 when new data is written and shifts the recorded data offset accordingly; the original master control node is demoted to a service agent slave node and executes the corresponding synchronization request processing.
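The truncate-or-fill step described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `Node`, `promote` and `synchronize` are hypothetical, the log is modelled as an in-memory list whose index is the offset, and the water line comparison is reduced to comparing epoch values and the offset recorded at the start of the master's epoch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of one distributed cloud service node."""
    log: list = field(default_factory=list)  # stored records; list index = offset
    epoch: int = 0                           # number of master changes seen so far
    epoch_start: int = 0                     # offset of the first record of `epoch`


def promote(node: Node) -> None:
    """Newly elected master: epoch + 1, data offset moves to the next write position."""
    node.epoch += 1
    node.epoch_start = len(node.log)


def synchronize(follower: Node, master: Node) -> None:
    """Truncate, then fill, so that the follower's log matches the master's."""
    if follower.epoch < master.epoch:
        # Records at or beyond the master's epoch start were written under an
        # older master and may never have been replicated: cut them off.
        del follower.log[master.epoch_start:]
    # Fill in everything the follower is still missing.
    follower.log.extend(master.log[len(follower.log):])
    follower.epoch = master.epoch
    follower.epoch_start = master.epoch_start
```

In this sketch a restarted former master that still holds an unreplicated record loses that record during truncation and then receives the records written under the new epoch, so both logs end up identical.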
As an improvement of the above system, the distributed cache technology combines a distributed node reading and append-writing mechanism and completes the distributed data cache by designing checkpoints and performing snapshot backups at the checkpoints; a checkpoint is the current consistency checkpoint of the data, formed by caching the states of all tasks at the moment when they have all processed the same input data.
As an improvement of the above system, the distributed node reading and append-writing mechanism specifically includes: a data backup daemon thread runs in each distributed cloud service node and uses an append-writing mechanism to write the edit file of the real-time data to disk, which ensures the writing efficiency and safety of the data;
performing snapshot backups at the checkpoints to complete the distributed data cache specifically includes:
when a write operation encounters a checkpoint, a new file is created and append writing continues in it; meanwhile, the daemon thread adds an index to the edit file header, and the unique identifier and the backup image file are merged into the source data storage block of the distributed node to form a data file consistent with the data source.
Compared with the prior art, the invention has the following advantages:
1. the invention addresses the heavy pressure that transmission, sharing, processing, analysis and storage of massive test data place on the many links of spacecraft test task process monitoring, such as on-orbit fault diagnosis and state detection, health state assessment and spacecraft service life prediction, and constructs a multi-node publish-subscribe system framework for managing multi-source heterogeneous spacecraft data;
2. by means of big data and cloud computing platform technology, the invention breaks through key technologies such as multi-node publishing and subscribing, priority election based on heartbeat detection, and multi-node data consistency communication, ensures data safety, consistency and computing efficiency, and provides technical support for the transmission, management and caching of massive spacecraft on-orbit test data;
3. the invention relieves the processing pressure of massive test data during transmission, sharing, processing, analysis and storage in the spacecraft test task process, and supports real-time monitoring of the full life cycle of the spacecraft test task, health detection of operating conditions, and rapid forecasting and early warning of risks and faults;
4. the invention promotes the development of publish-subscribe systems and communication security technology based on big data technology, and addresses problems of the prior art such as packet loss in big data transmission, repeated computation and low system availability.
Drawings
FIG. 1 is a technical roadmap of a multi-node publish-subscribe system for the secure transmission of spacecraft-oriented data;
FIG. 2 is a diagram of a multi-node publish and subscribe technology architecture;
FIG. 3 is a publisher pattern design;
FIG. 4 is a subscriber pattern design;
FIG. 5 is a consumption offset schematic;
FIG. 6 is a distributed cache storage mode design;
FIG. 7 is a schematic diagram of a multi-node data coherency communication technique;
FIG. 8 is a test hardware environment deployment schematic;
FIG. 9 is a response delay of each publish/subscribe system under different data packages, where FIG. 9 (a) is a Kafka data subscription response delay, FIG. 9 (b) is an ActiveMQ data subscription response delay, FIG. 9 (c) is a RabbitMQ data subscription response delay, FIG. 9 (d) is a RocketMQ data subscription response delay, and FIG. 9 (e) is a BPSS data subscription response delay;
FIG. 10 is a comparison of the throughput performance of different publish/subscribe systems.
Detailed Description
The invention provides a multi-node publishing and subscribing system for the safe transmission of spacecraft on-orbit data, comprising: a data subscription layer, a data distribution layer, a data processing layer and a data caching layer; wherein,
the data subscription layer and the data distribution layer are used for jointly realizing multi-node subscription and publication of the on-orbit data of the spacecraft based on distributed cloud service nodes managed by ZooKeeper;
the data processing layer is used for processing and analyzing high-concurrency remote data by adopting a priority election algorithm based on heartbeat detection and a multi-node data consistency communication technology;
the data caching layer is used for storing the on-orbit data of the spacecraft through a distributed fast cache technology.
The technical scheme of the invention is described in detail below with reference to the accompanying drawings and examples.
Example 1
The embodiment of the invention provides a multi-node publishing and subscribing system for safety transmission of on-orbit data of a spacecraft, and the technical route is shown in figure 1.
The multi-node publishing and subscribing system comprises a data distribution layer, a data subscription layer, a data processing layer and a data caching layer, and mainly involves four key technologies: multi-node publishing and subscribing, priority election based on heartbeat detection, multi-node data consistency communication, and distributed caching.
The data distribution layer and the data subscription layer jointly realize the multi-node publishing and subscribing technology based on distributed cloud service nodes managed by ZooKeeper: the data distribution layer records data offsets through two data structures, a timestamp-type water line and a checkpoint, to guarantee the order of data distribution; the data subscription layer consumes by partition according to the type of space environment data and guarantees the order of data processing according to the consumption offset. The data processing layer ensures data safety, consistency and computing efficiency through the priority election technology based on heartbeat detection and the multi-node data consistency communication technology, and performs high-concurrency remote data processing and analysis. The data caching layer stores data safely through the distributed fast cache technology, achieving memory-level high throughput and strong fault tolerance.
(1) Multi-node publishing and subscribing technology
The main technical architecture of the multi-node publish and subscribe technology is shown in fig. 2.
The multi-node publish and subscribe technology comprises a plurality of publisher nodes, a plurality of subscriber nodes, a master control node and a plurality of service agent nodes. The publisher nodes send data to the master control node; the master control node assigns service agent nodes to store the received data on disk; and the subscriber nodes subscribe to and consume the data from the service agent nodes.
A publisher node is run by two cooperating threads, a main thread and a sending thread. The user creates data in the main thread; after passing through an optional interceptor, the serializer and the partitioner, the data is cached at the tail of the double-ended queue of the message accumulator so that the sending thread can send it in batches, which reduces the resource consumption of network transmission and improves performance. The sending thread retrieves data from the head of the double-ended queue of the message accumulator and sends it through the responder to the service agent node, as shown in fig. 3.
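The two-thread publisher described above can be sketched in Python as follows. This is an illustrative sketch only: `Accumulator`, `publish` and `sender_loop` are hypothetical names, the interceptor simply tags each record with an assumed source field, the serializer is JSON, the partitioner is a byte-sum of the topic name, and the responder is modelled as a plain `send` callable supplied by the caller.

```python
import json
import threading
from collections import deque


class Accumulator:
    """Thread-safe double-ended queue shared by the main and sending threads."""

    def __init__(self):
        self._queue = deque()
        self._cond = threading.Condition()
        self._closed = False

    def append(self, record):
        """Main thread: cache a record at the tail of the queue."""
        with self._cond:
            self._queue.append(record)
            self._cond.notify()

    def close(self):
        """Signal that no further records will be appended."""
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def drain(self, max_batch):
        """Sending thread: block until data arrives, then pop a batch from the head.

        Returns an empty list once the accumulator is closed and empty.
        """
        with self._cond:
            while not self._queue and not self._closed:
                self._cond.wait()
            batch = []
            while self._queue and len(batch) < max_batch:
                batch.append(self._queue.popleft())
            return batch


def publish(acc, topic, value, partitions=3):
    """Main-thread path: interceptor -> serializer -> partitioner -> accumulator."""
    value = dict(value, source="publisher-1")                     # interceptor (assumed)
    payload = json.dumps(value, sort_keys=True).encode("utf-8")   # serializer
    partition = sum(topic.encode("utf-8")) % partitions           # partitioner (assumed)
    acc.append((topic, partition, payload))


def sender_loop(acc, send, max_batch=2):
    """Sending-thread path: forward batches until the accumulator is closed."""
    while True:
        batch = acc.drain(max_batch)
        if not batch:
            return
        send(batch)
```

A caller would start `sender_loop` in a `threading.Thread`, call `publish` from the main thread, and call `acc.close()` followed by `join()` when finished. Batching at the head of the queue is what lets several records share one network send.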
To ensure flexibility and practicality, the publisher node provides two data subscription interfaces: a structure port for subscribing to spacecraft multi-source test data and a character port for subscribing to other data such as configuration parameters. In addition, port types can be customized by implementing the event type interface, which improves generality.
The subscriber node is responsible for subscribing to topics on the service agent slave nodes and pulling data from the subscribed topics. Each subscriber node has a corresponding subscription group. After data is published to a topic, it is delivered to the subscription groups subscribed to that topic; each subscriber in a group can repeatedly consume the data, and different subscription groups do not affect each other, as shown in fig. 4.
The subscriber node uses a pull mode to read data from the service agent node based on the water line and stores a consumption offset, as shown in fig. 5. The water line defines the visibility of the data, that is, it identifies which data can be consumed by the subscriber node. The consumption offset is the position from which the subscriber node pulls the next piece of data, so the data between the consumption offset and the water line is the data that can be pulled.
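The relationship between the water line, the consumption offset and independent subscription groups can be illustrated with a short Python sketch. The class `TopicLog` and its methods are hypothetical, and keeping one consumption offset per subscription group is an assumption made for the illustration.

```python
class TopicLog:
    """Hypothetical in-memory topic block held by a service agent node."""

    def __init__(self):
        self.records = []      # all records appended to the topic
        self.water_line = 0    # records[:water_line] are visible to subscribers
        self.offsets = {}      # subscription group -> consumption offset

    def append(self, record):
        self.records.append(record)

    def advance_water_line(self, new_mark):
        """Move the water line forward, never past the end of the log."""
        self.water_line = min(max(new_mark, self.water_line), len(self.records))

    def pull(self, group, max_records=10):
        """Return records in [consumption offset, water line) and advance the offset."""
        start = self.offsets.get(group, 0)
        end = min(self.water_line, start + max_records)
        batch = self.records[start:end]
        self.offsets[group] = start + len(batch)
        return batch
```

Because each group keeps its own offset, a pull by one group never changes what another group sees, and records beyond the water line stay invisible until the water line advances.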
The consumption rate of a subscriber node is mainly determined by the data volume and the consumption rate in the service agent node. The pull mode also allows consumption by partition according to the type of space environment data: data received at a fixed frequency can be consumed in a data-driven mode that maintains long polling for continuous consumption, and the offsets of the various heterogeneous data are evaluated by a unified daemon process, which avoids repeated configuration for multi-source data and effectively relieves pressure on the server.
(2) Distributed cache technology
The data caching layer of the multi-node publish/subscribe system implements a distributed cache technology that combines a distributed node reading and append-writing mechanism and completes the distributed data cache by designing checkpoints and performing snapshot backups at the checkpoints. The overall technical idea is shown in fig. 6.
To avoid data damage during faults, the distributed cache technology serializes the data flowing into memory and persists it to disk, so that when data loss occurs the data content in memory can be restored through a data recovery mechanism. A data backup daemon thread runs in each distributed cloud service node and writes the edit file to disk using an append-writing mechanism, which ensures the writing efficiency and safety of the data.
At the same time, repeated write operations make the file too large and produce useless disk operations, so a checkpoint snapshot backup storage mechanism is adopted. A checkpoint means the following: at the moment when all tasks have just processed the same input data, their states are cached, which forms the current consistency checkpoint of the data. A checkpoint avoids storing any additional information besides the state and improves data caching efficiency. Therefore, when a write operation encounters a checkpoint, a new file is created and append writing continues in it; meanwhile, the daemon thread adds an index to the edit file header; finally, the unique identifier and the backup image file are merged into the source data storage block of the distributed node to form a data file consistent with the data source. This preserves efficiency while allowing the data to be recovered when the system fails or a server loses power.
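The append-write and checkpoint snapshot mechanism can be sketched in Python as below. This is a simplified illustration under stated assumptions: the class `EditLog`, the function `recover` and the file naming scheme are hypothetical, the state is a plain key-value dictionary, and the index in the edit file header is replaced by a segment number embedded in each file name.

```python
import json
import os


class EditLog:
    """Append-only edit file that rolls over to a new file at each checkpoint."""

    def __init__(self, directory):
        self.directory = directory
        self.segment = 0
        self._file = self._open_segment()

    def _open_segment(self):
        path = os.path.join(self.directory, "edits-%06d.log" % self.segment)
        return open(path, "a", encoding="utf-8")

    def append(self, key, value):
        """Append one edit and force it to disk before returning."""
        self._file.write(json.dumps({"key": key, "value": value}) + "\n")
        self._file.flush()
        os.fsync(self._file.fileno())

    def checkpoint(self, state):
        """Snapshot the state, then continue appending in a fresh edit file."""
        self._file.close()
        path = os.path.join(self.directory, "snapshot-%06d.json" % self.segment)
        with open(path, "w", encoding="utf-8") as snapshot:
            json.dump(state, snapshot)
        self.segment += 1
        self._file = self._open_segment()

    def close(self):
        self._file.close()


def recover(directory):
    """Rebuild the state from the latest snapshot plus the edits written after it."""
    names = sorted(os.listdir(directory))  # zero-padded names sort in segment order
    snapshots = [n for n in names if n.startswith("snapshot-")]
    state, first_segment = {}, 0
    if snapshots:
        latest = snapshots[-1]
        with open(os.path.join(directory, latest), encoding="utf-8") as f:
            state = json.load(f)
        first_segment = int(latest[len("snapshot-"):-len(".json")]) + 1
    for name in names:
        if not name.startswith("edits-"):
            continue
        if int(name[len("edits-"):-len(".log")]) < first_segment:
            continue  # already covered by the snapshot
        with open(os.path.join(directory, name), encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                state[entry["key"]] = entry["value"]
    return state
```

Rolling to a new edit file at each checkpoint keeps any single file small, and recovery only replays the edits written after the most recent snapshot.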
(3) Priority election technology based on heartbeat detection
To ensure safe data transmission among cloud computing nodes, reduce repeated computation and improve data processing efficiency, the invention introduces a heartbeat detection mechanism that periodically detects the communication state of each cloud service node.
Many heartbeat detection protocols exist, such as the accelerated heartbeat protocol, multi-machine system heartbeat detection mechanisms and adaptive heartbeat detection, but most of them cannot recover after a node abnormality and suffer recovery delays caused by single-point failure of the master node. A common remedy for single-point failure is dual-machine fault tolerance: when the master node fails, a backup machine smoothly takes over its work, but this reduces the flexibility of a distributed system. Election mechanisms were developed to address this shortcoming: when a slave node detects a master node fault, it initiates an election in which the other slave nodes participate, and a slave node is elected to take over as the new master node.
In practical engineering applications, however, the more cloud computing nodes there are, the longer the election mechanism takes to execute. Moreover, when the cloud platform performs load balancing, new nodes frequently join the cluster and each trigger another election, so the master node changes continually and system availability is greatly reduced. To address these problems, the invention designs a priority election algorithm based on heartbeat detection that dynamically regulates the number of nodes eligible to participate in an election; its pseudocode is shown in table 1.
Table 1 priority election algorithm based on heartbeat detection
Figure BDA0004089357370000081
Figure BDA0004089357370000091
As can be seen from Table 1, the priority election algorithm based on heartbeat detection first assigns a priority to every node, sets a threshold according to the current cluster size, and places nodes above the threshold in the election area and nodes below it in the waiting area. Only slave nodes in the election area are qualified to take part in the election of the master node, and after an election finishes, the priority of the slave nodes that took part is further increased, so that the master node is, as far as possible, the best-performing node in the cluster. The threshold can also be adjusted dynamically according to actual conditions: when there are too many slave nodes in the election area, the node with the lowest priority in the election area is demoted to the waiting area, and when there are too many slave nodes in the waiting area, the slave node with the highest priority in the waiting area is promoted to the election area. In addition, nodes that newly join the cluster or recover from a fault are placed directly in the waiting area, which reduces the number of elections triggered when nodes join the cluster. When the master node stops working because of an unexpected process interruption or an unexpected halt, an election is initiated in the election area; if more than half of the slave nodes vote in agreement, the slave node with the highest priority takes over the work of the master node, and data consistency interpretation among the nodes is carried out.
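The pseudocode of Table 1 is available only as an image, so the following Python sketch is a reconstruction from the prose above rather than the patented algorithm itself. The names `Node`, `PriorityElection`, `master_is_down`, `max_election` and `max_waiting` are hypothetical, the threshold is modelled as a capacity limit on each area, the majority is counted over the election area, and the waiting-area limit is applied after the election-area limit.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    priority: int


def master_is_down(last_heartbeat, now, timeout=3.0):
    # Heartbeat check: the master is presumed down once no heartbeat
    # has arrived within `timeout` seconds (the timeout value is an assumption).
    return now - last_heartbeat > timeout


class PriorityElection:
    def __init__(self, slaves, max_election, max_waiting):
        self.max_election = max_election
        self.max_waiting = max_waiting
        ranked = sorted(slaves, key=lambda n: n.priority, reverse=True)
        self.election = ranked[:max_election]   # nodes allowed to take part
        self.waiting = ranked[max_election:]    # nodes kept out of elections
        self.rebalance()

    def join(self, node):
        # New or recovered nodes go straight to the waiting area,
        # so joining the cluster does not trigger an election.
        self.waiting.append(node)
        self.rebalance()

    def rebalance(self):
        # Too many nodes in the election area: demote the lowest priority.
        while len(self.election) > self.max_election:
            lowest = min(self.election, key=lambda n: n.priority)
            self.election.remove(lowest)
            self.waiting.append(lowest)
        # Too many nodes in the waiting area: promote the highest priority.
        while len(self.waiting) > self.max_waiting:
            highest = max(self.waiting, key=lambda n: n.priority)
            self.waiting.remove(highest)
            self.election.append(highest)

    def elect(self, agreeing):
        """Run one election; `agreeing` is the set of node names voting yes."""
        voters = self.election
        agree = [n for n in voters if n.name in agreeing]
        if not voters or len(agree) * 2 <= len(voters):
            return None                      # no majority, no new master
        winner = max(voters, key=lambda n: n.priority)
        for node in voters:
            node.priority += 1               # participants gain priority
        self.election.remove(winner)         # the winner becomes the master
        return winner
```

For example, with slave nodes of priority 5, 3, 1 and 4 and `max_election=2`, only the two highest-priority nodes vote, and the node with priority 5 becomes the master when both agree.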
(4) Multi-node data consistency communication technology
Because of performance differences among the service computing nodes, different slave nodes are synchronized to different degrees, so when an election takes place the data may be inconsistent after the master node changes, which causes data loss. To further improve the reliability of data transmission, the invention adds a multi-node data consistency communication technology to the priority election algorithm based on heartbeat detection; its pseudocode is shown in Table 2.
Table 2 Multi-node data consistency communication technique
Figure BDA0004089357370000101
As can be seen from Table 2, each node records the number of master node changes as epoch information. The initial value of the epoch information is 0, and it is increased by 1 each time the master node changes, which is equivalent to attaching a version number to the master node. Each node also records a data offset, namely the offset coordinate at which the first piece of data was written under the current epoch information, which is used for synchronization. The slave nodes request data synchronization from the master node at a fixed frequency and, when node states change, compare the positions of their data offsets in order to truncate or fill data, thereby protecting the data integrity of each node.
A visual example of the multi-node data consistency communication technology is shown in Fig. 7. Suppose slave node B goes down before its water line has been updated. After restarting, slave node B must send a request to the master node to query the data offset value of master node A, which is 2 in the illustration. Slave node B then performs a data truncation operation that adjusts its data offset to the previous water line value, so the piece of data with the displacement value "1" is deleted from slave node B. If slave node B were to take over and become the master node at this point, that piece of data would be lost completely.
To solve this problem, a data synchronization mechanism is added to the multi-node data consistency communication technology: after performing the truncation, slave node B compares its own epoch information and data offset and pulls the piece of data with the displacement value "1" from master node A to fill the gap. Suppose that, after this synchronization, master node A goes down and slave node B becomes master node C through the priority election policy. Likewise, after master node A (now the new slave node D) restarts, it performs the same logical judgment, so the data with the displacement value "1" is retained on both nodes.
Finally, when a publisher node writes new data to slave node B (now the new master node C), the master node epoch information in the cache of slave node B must also be incremented by 1 and the data offset reset to the water line position "2", after which the node continues to determine whether to perform a data truncation or filling operation. In this way the data consistency communication mechanism largely avoids data loss when master and slave nodes switch state.
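The truncate-then-fill behaviour walked through above can be sketched as follows. Table 2 is available only as an image, so this is a reconstruction under assumptions: the class `Replica` and its fields are hypothetical names, the epoch is incremented when a node becomes master rather than on its first write, and the truncation point for a node with an older epoch is taken as the smaller of its own water line and the new master's epoch start offset.

```python
class Replica:
    """One cloud service node holding a replicated log (hypothetical sketch)."""

    def __init__(self, name):
        self.name = name
        self.log = []          # list of (epoch, payload) entries
        self.high_water = 0    # water line: entries known to be synchronized
        self.epoch = 0         # number of master changes seen, starts at 0
        self.epoch_start = 0   # offset of the first entry under self.epoch

    def append(self, payload):
        # Called on the master when a publisher writes new data.
        self.log.append((self.epoch, payload))

    def become_master(self):
        # A master change bumps the epoch and resets the data offset
        # to the current water line position.
        self.epoch += 1
        self.epoch_start = self.high_water

    def resync(self, master):
        # Decide how much of the local log can be trusted.
        if self.epoch < master.epoch:
            keep = min(self.high_water, master.epoch_start)
        else:
            keep = self.high_water
        del self.log[keep:]                            # data truncation
        self.log.extend(master.log[len(self.log):])    # data filling
        self.epoch = master.epoch
        self.epoch_start = master.epoch_start
        self.high_water = len(self.log)


# Scenario from the text: B restarts with a stale water line, resyncs from A,
# then B becomes master and the old master A resyncs from B.
a, b = Replica("A"), Replica("B")
a.append("m0")
a.append("m1")
a.high_water = 2
b.log = list(a.log)
b.high_water = 1          # B went down before its water line was updated
b.resync(a)               # truncates offset 1, then pulls it back from A
b.become_master()
b.append("m2")
a.resync(b)               # the old master rejoins as a slave
```

After the two `resync` calls both replicas hold the same three entries, and the entry at offset 1 survives the master change.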
The innovation points are as follows:
(1) A multi-node publish/subscribe management method and system for multi-source heterogeneous spacecraft data is constructed using big data and cloud computing platform technology; it offers fast response to big data subscriptions and stable, high throughput.
(2) In practical applications, performance differences among the cloud nodes of a cloud computing platform leave data out of sync when master and slave nodes change, and the multi-node election mechanisms currently used at home and abroad suffer from low election efficiency and frequent changes of network topology, which further reduce system availability.
The technical effects are as follows:
(1) Preparation of experimental environment
The distributed publish-subscribe management system BPSS designed by the invention was field-tested in a large spacecraft test simulation system. Its hardware deployment is shown in Fig. 8: a large-capacity data exchange environment was built from a cloud platform virtual node cluster, a general server cluster, a Langchao distributed cloud storage device, the Galangal kylin operating system and a China general domestic database.
(2) Analysis of experimental results
The system is applied to the transmission, interaction and display of multi-source data such as measurement and control telemetry, data transmission images and remote control instructions in on-orbit spacecraft operation monitoring. Two experiments, one on big data subscription throughput performance and one on big data transmission safety performance, were designed to jointly verify the actual performance of BPSS.
First, data was published using spacecraft on-orbit transmission data packets with a single-packet size of 1M, in batches of different magnitudes such as 10, 100 and 1000 packets. The data content includes measurement and control telemetry, data transmission images, control instructions and other interactive data of inter-satellite communication, and the data types include heterogeneous types such as json, txt, jpeg and xml. The data subscription response time delay of BPSS was compared with that of current mainstream open-source publish/subscribe systems at home and abroad, namely Kafka, ActiveMQ, RabbitMQ and RocketMQ.
The data subscription response time delay is the time that elapses from when the data is published until the subscriber has obtained all of the data, as shown in formula (1).
$$T_{delay} = T_{subscribe} - T_{publish} \tag{1}$$
where $T_{delay}$ is the data subscription response time delay, $T_{subscribe}$ is the data subscription timestamp, and $T_{publish}$ is the data publication timestamp.
As shown in Table 3, across data transmissions of different magnitudes and types, the data subscription response time delay is generally positively correlated with the amount of subscribed data. The average data subscription response time delay of BPSS is 0.05s, while those of the other publish/subscribe systems Kafka, ActiveMQ, RabbitMQ and RocketMQ are 28.64s, 0.07s, 0.14s and 0.11s respectively, so BPSS responds to data subscriptions faster than the other systems.
Table 3 Comparison of response delays for different publish/subscribe systems
Figure BDA0004089357370000121
Next, spacecraft on-orbit transmission data packets with single-packet sizes of 1M, 10M and 100M were used to test whether the single-packet size and the data type (json, txt, jpeg, xml and so on) affect the subscription rate of each publish/subscribe system. The experimental results are shown in Fig. 9, where Fig. 9(a) is the Kafka data subscription response time delay, Fig. 9(b) is the ActiveMQ data subscription response time delay, Fig. 9(c) is the RabbitMQ data subscription response time delay, Fig. 9(d) is the RocketMQ data subscription response time delay, and Fig. 9(e) is the BPSS data subscription response time delay. Analysis shows that in every publish/subscribe system the data subscription response time delay is positively correlated with the single-packet size of the subscribed data. In addition, every publish/subscribe system has its longest subscription response time delay for jpeg image data, but BPSS shows the smallest difference in response speed across the various kinds of multi-source heterogeneous data.
The data throughput of the system is then analyzed; the data throughput capacity of the system is calculated as defined in formula (2).
$$Tps = \frac{Num_{subscribe}}{t} \tag{2}$$
where $Num_{subscribe}$ is the count of data subscribed while data is being published continuously, and $t$ is the time spent subscribing to that data, in seconds.
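The two metrics defined above can be computed with a pair of small helper functions. This is a minimal sketch: the function names are hypothetical, and the throughput helper assumes that formula (2) is the ratio of the subscribed data count to the elapsed time in seconds, as the definitions of its symbols suggest.

```python
def response_delay(t_publish, t_subscribe):
    # Formula (1): time from publication until all data has been obtained.
    return t_subscribe - t_publish


def throughput(num_subscribed, seconds):
    # Assumed reading of formula (2): items subscribed per second.
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_subscribed / seconds
```

For instance, 300000 packets subscribed over a 300 second run correspond to a throughput of 1000 packets per second.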
The experiment used spacecraft on-orbit transmission data packets with a single-packet size of 1M, published continuously for 5 minutes at the inherent data transmission frequency of 1000 packets/second, to test the throughput performance of publish/subscribe systems such as BPSS, Kafka, ActiveMQ and RabbitMQ; the results are shown in Fig. 10. The higher the throughput, the larger the Tps value and the smoother the curve, so BPSS has the best throughput performance among the publish/subscribe systems tested. Meanwhile, comparison with Table 3 shows that the data subscription response time delay is inversely related to the big data throughput performance of a publish/subscribe system: the longer the data subscription response time delay, the worse the system throughput performance.
Finally, spacecraft on-orbit data packets with a single-packet size of 1M were used. Each packet was given an auto-incrementing sequence number and then published at a frequency of 1 packet/second. For continuous subscription durations such as 1 hour, 6 hours, 12 hours and 24 hours, the number of missing subscribed packets and the integrity of the data content were counted in order to test the actual safe data transmission performance of the different publish/subscribe systems.
The experimental metrics are the frame loss rate and the data breakage rate of data transmission, which are defined in formulas (3) and (4); the experimental results are shown in Table 4.
Figure BDA0004089357370000132
Figure BDA0004089357370000133
Table 4 comparison of transmission data integrity for different publish/subscribe systems
Figure BDA0004089357370000141
The results show that when the single-day throughput reaches about 80GB, the BPSS data transmission frame loss rate is held to 0.025% and the data breakage rate to 0.018%, and the frame loss and breakage remain essentially stable across all periods of a single day, so BPSS has the advantage of stable big data transmission.
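The sequence-number check used in this experiment can be sketched as follows. Formulas (3) and (4) are available only as images, so the definitions used here are assumptions: the frame loss rate is taken as missing packets divided by published packets, and the data breakage rate as damaged packets divided by received packets. The function name `transmission_stats` and the use of a SHA-256 digest to detect damaged content are likewise hypothetical.

```python
import hashlib


def transmission_stats(published, received):
    """Compare published and received packets keyed by sequence number.

    Both arguments map an auto-incrementing sequence number to the
    packet payload as bytes. Returns (frame_loss_rate, breakage_rate).
    """
    if not published:
        raise ValueError("no packets were published")
    # A packet is lost if its sequence number never reached the subscriber.
    lost = [seq for seq in published if seq not in received]
    # A packet is damaged if it arrived but its content digest differs.
    damaged = [
        seq for seq, payload in received.items()
        if seq in published
        and hashlib.sha256(payload).digest()
        != hashlib.sha256(published[seq]).digest()
    ]
    frame_loss_rate = len(lost) / len(published)
    breakage_rate = len(damaged) / len(received) if received else 0.0
    return frame_loss_rate, breakage_rate
```

With four published packets, one missing and one altered among the three received, this sketch reports a frame loss rate of 0.25 and a breakage rate of one third.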
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present invention and are not limiting. Although the present invention has been described in detail with reference to the embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the appended claims.

Claims (10)

1. A multi-node publish-subscribe system for the secure transmission of spacecraft-oriented data, the system comprising: a data subscription layer, a data distribution layer, a data processing layer and a data caching layer; wherein,
the data subscription layer and the data distribution layer are used for jointly realizing multi-node subscription and release of on-orbit data of the spacecraft based on a distributed cloud service node managed by the Zookeeper;
the data processing layer is used for processing and analyzing high-concurrency remote data by adopting a priority election algorithm based on heartbeat detection and a multi-node data consistency communication technology;
the data caching layer is used for realizing the storage of the on-orbit data of the spacecraft through a distributed quick caching technology.
2. The multi-node publish-subscribe system for the safe transmission of spacecraft-oriented data according to claim 1, wherein the architecture of multi-node subscription and publication of spacecraft-oriented data comprises a plurality of distributed cloud service nodes, a plurality of publisher nodes and a plurality of subscriber nodes; wherein,
one of the distributed cloud service nodes is a master control node, and the rest are service agent slave nodes;
the main control node is used for receiving the on-orbit data sent by each publisher node and forwarding the on-orbit data to the service agent slave node;
the service agent slave node is used for storing the received on-orbit data into a memory or a magnetic disk according to the topic blocks;
the publisher node is used for sending the on-orbit data of the spacecraft to the main control node;
the subscriber node is configured to subscribe to and consume the on-orbit data from the service agent slave nodes by topic.
3. The multi-node publish-subscribe system for the secure transmission of spacecraft-oriented data according to claim 2, wherein,
the publisher node comprises a structure port for subscribing spacecraft multisource test data and a character port for subscribing configuration parameters;
the publisher node employs a processing mechanism of a main thread and a sending thread, wherein,
the main thread is used for receiving on-orbit data created by a user, and buffering the on-orbit data to the tail part of a double-end queue of the data accumulator after being processed by the interceptor, the serializer and the partitioner;
the sending thread is used for acquiring the on-orbit data from the head of the double-end queue of the data accumulator and sending the acquired on-orbit data to the service agent slave node through the responder.
4. The multi-node publish-subscribe system for safe transmission of spacecraft-oriented data according to claim 2, wherein said subscriber nodes subscribe to the on-orbit data from the service agent slave nodes by topic, each subscriber node has a corresponding subscription group, the publisher node delivers the corresponding on-orbit data, according to the topic, to the subscription groups subscribing to that topic, each subscriber node in a subscription group can repeatedly consume the on-orbit data, and different subscription groups do not affect each other.
5. The multi-node publish-subscribe system for safe transmission of spacecraft-oriented data according to claim 2, wherein said subscriber node adopts a pull mode to read, from the data block of the corresponding topic on the service agent slave node, the data from the water line to the consumption offset, and stores the consumption offset; wherein the water line is used to identify on-orbit data that has been consumed by a subscriber node; the consumption offset is the displacement value at which the subscriber node pulls the next piece of on-orbit data, and the consumption rate is determined by the data quantity and the consumption rate of the service agent slave node.
6. The multi-node publish-subscribe system for the secure transmission of spacecraft-oriented data according to claim 1, wherein the heartbeat detection-based priority election algorithm specifically comprises:
prioritizing all service agent slave nodes, setting a threshold according to the current cluster scale, placing the service agent slave nodes above the threshold in an election area, and placing those below the threshold in a waiting area; the service agent slave nodes in the election area are qualified to participate in the election of the master control node, and service agent slave nodes that newly join the cluster or recover from faults are placed in the waiting area;
when the master control node stops working due to unexpected process interruption or unexpected halt, an election is initiated in the election area, and if more than half of the service agent slave nodes vote in agreement, the service agent slave node with the highest priority is set as the new master control node, and data consistency interpretation among the nodes is carried out;
after the election is finished, the priority of the selected service agent slave node is increased; meanwhile, the threshold value is dynamically adjusted according to actual conditions: when the number of service agent slave nodes in the election area exceeds a preset value, the service agent slave node with the lowest priority in the election area is demoted to the waiting area; and when the number of service agent slave nodes in the waiting area exceeds a preset value, the service agent slave node with the highest priority in the waiting area is promoted to the election area.
7. The spacecraft-oriented multi-node publish-subscribe system for secure transmission of data on-orbit of claim 6, wherein each of said distributed cloud service nodes includes epoch information and data offsets, wherein,
the epoch information is used for recording the change times of the main control node, the initial value of the epoch information is 0, and the value of the epoch information is added with 1 each time the main control node is changed;
the data offset records the offset coordinate at which the first piece of data is written under the current epoch information and is used for synchronization; the service agent slave node requests data synchronization from the master control node at a fixed frequency, and when the node state is changed, the positions of the data offsets of the respective nodes are compared and a data truncation or filling operation is carried out so as to protect the data integrity of the service agent slave node.
8. The multi-node publish-subscribe system for the secure transmission of spacecraft-oriented data according to claim 7, wherein the data consistency interpretation between the nodes specifically comprises:
each service agent slave node independently sends a message to the master control node to request synchronization;
if a service agent slave node goes down and restarts, it performs a data truncation or filling operation according to the water line of the master control node by comparing its own epoch information and data offset;
if the master control node goes down and restarts, the newly selected master control node adds 1 to its own epoch information according to the newly written data and moves the recorded data offset downwards; the original master control node is demoted to a service agent slave node and executes the corresponding request synchronization processing.
9. The multi-node publish-subscribe system for safe transmission of spacecraft-oriented data according to claim 1, wherein said distributed caching technique combines a distributed node reading and append-writing mechanism, and completes distributed data caching by designing checkpoints and performing snapshot backup at the checkpoints; a checkpoint is the current consistency checkpoint of the data, formed by caching the states of all tasks at the moment they have just processed the same input data.
10. The spacecraft on-orbit data security transmission oriented multinode publish-subscribe system of claim 9, wherein,
the distributed node reading and append-writing mechanism specifically comprises: a data backup daemon thread runs in each distributed cloud service node, and an append-writing mechanism is adopted to write the real-time data into the edit file on disk, so that the writing efficiency and the safety of the data are ensured;
performing snapshot backup at the checkpoint to complete the distributed data cache specifically comprises the following steps:
when the writing operation encounters a checkpoint, a new file is created to continue the append writing, meanwhile, the daemon thread adds an index into the edit file header, and the unique identification and the backup image file are combined into a source data storage block of the distributed node to form a data file consistent with the data source.
CN202310146378.1A 2023-02-08 2023-02-08 Multi-node publishing and subscribing system for spacecraft on-orbit data safety transmission Pending CN116233129A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310146378.1A CN116233129A (en) 2023-02-08 2023-02-08 Multi-node publishing and subscribing system for spacecraft on-orbit data safety transmission

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310146378.1A CN116233129A (en) 2023-02-08 2023-02-08 Multi-node publishing and subscribing system for spacecraft on-orbit data safety transmission

Publications (1)

Publication Number Publication Date
CN116233129A true CN116233129A (en) 2023-06-06

Family

ID=86578202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310146378.1A Pending CN116233129A (en) 2023-02-08 2023-02-08 Multi-node publishing and subscribing system for spacecraft on-orbit data safety transmission

Country Status (1)

Country Link
CN (1) CN116233129A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination