CN114844906A - Method for processing multiple data streams and related system - Google Patents


Info

Publication number
CN114844906A
Authority
CN
China
Prior art keywords
network service
distribution
data
cluster
service instance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110509707.5A
Other languages
Chinese (zh)
Inventor
李海峰
付萌
朱小平
杨永强
黄登辉
郜忠华
苑威
贾正义
Current Assignee
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Cloud Computing Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Cloud Computing Technologies Co Ltd
Priority to PCT/CN2022/075256 (published as WO2022161501A1)
Publication of CN114844906A

Classifications

    • H04L67/1095: Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • G06F16/182: Distributed file systems
    • G06F16/1824: Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G06F16/183: Provision of network file services by network file servers, e.g. by using NFS, CIFS
    • H04L41/342: Signalling channels for network management communication between virtual entities, e.g. orchestrators, SDN or NFV entities
    • H04L61/2503: Translation of Internet protocol [IP] addresses
    • H04L67/1004: Server selection for load balancing
    • H04L67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H04L67/56: Provisioning of proxy services

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application provides a method for processing multiple data streams, which includes the following steps: the controller elastically scales the network service cluster and then sends an updated distribution rule to the distribution entity; the distribution entity instructs the network service instances in the scaled cluster to redistribute the storage of the state data of the multiple data streams according to the updated rule, and sends the multiple data streams to those instances according to the updated rule; each network service instance in the scaled cluster receives at least one data stream and processes it according to the state data it stores for that stream. The method reduces the resource consumption required to scale the network service cluster.

Description

Method for processing multiple data streams and related system
The present application claims priority to Chinese patent application No. 202110138515.8, entitled "A Network Service Resilient Extension System and Method", filed with the China National Intellectual Property Administration on February 1, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a method for processing multiple data streams, a data stream processing system, a distribution entity, and a related computer-readable storage medium and computer program product.
Background
Network service middleware is an important component of today's network infrastructure. Through network service middleware, network operators can provide additional capabilities such as traffic inspection, modification, blocking, redirection, address translation, rate limiting, firewalling, and intrusion detection. Traditional network service middleware is dedicated hardware; with the continuous development of Network Function Virtualization (NFV), its functions are gradually moving from dedicated hardware devices to virtual machines or containers (instances) on general-purpose servers.
One important advantage of network function virtualization is elastic scalability. Network function software runs in instances, and the number of instances can change dynamically with the load (for example, the data streams the instances must process): when the load is heavy and the instances are under processing pressure, the number of instances needs to be increased; when the load is light, the number of instances can be reduced.
However, most network services keep state data (for example, per-flow data), and instances need to read or update this state frequently while processing a data stream. Therefore, when a network service cluster is elastically scaled, it is not enough to add or remove instances; the state data held by the instances running the network function must also be updated synchronously so that the multiple data streams are distributed evenly and correctly to the corresponding instances. How to synchronize state data with low overhead thus becomes an urgent problem.
Disclosure of Invention
The present application provides a method for processing multiple data streams. By introducing a distribution entity that controls both data stream distribution and state synchronization according to a distribution rule, the method coordinates the two. When the network service cluster is elastically scaled, the distribution entity redistributes the storage of the state data of the multiple data streams according to the updated distribution rule, ensuring that each data stream is still delivered to a network service instance that holds its state data, so processing continues without interruption. This improves the scalability of the network service cluster and supports its elastic scaling without requiring the network service instances to perform large-scale state data synchronization, thereby avoiding heavy resource consumption when the cluster is scaled.
In a first aspect, the present application provides a method for processing multiple data streams. The method may be applied to a data stream processing system, which processes multiple stateful data streams. The state data may be flow processing data, such as the five-tuple of a data flow: source Internet Protocol (IP) address, destination IP address, source port, destination port, and protocol type.
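As a minimal sketch of the idea above, per-flow state can be keyed by the five-tuple; the structure below is illustrative only, since the application does not prescribe a concrete layout:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FiveTuple:
    """Flow key: the five-tuple identifying one data stream."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str  # e.g. "TCP" or "UDP"

# Per-flow state table kept by one network service instance
# (hypothetical layout for illustration).
state_table = {}

flow = FiveTuple("10.0.0.1", "192.168.1.9", 40222, 443, "TCP")
state_table[flow] = {"packets": 0}   # create state when the flow is first seen
state_table[flow]["packets"] += 1    # update state on each packet
```

Because the dataclass is frozen, instances are hashable and can serve directly as dictionary keys.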
The data stream processing system includes a controller, a distribution entity, and a network service cluster. The network service cluster includes at least one network service instance, and the instances store, in a distributed manner, the state data corresponding to each of the multiple data flows. Each network service instance stores the state data of only some of the data streams, which reduces the storage space that would otherwise be needed for every instance to hold the full state data of all streams.
Specifically, the controller elastically scales the network service cluster, for example by increasing or decreasing the number of network service instances, or by strengthening or weakening the processing capability of each instance (that is, raising or lowering its specification: number of processors, memory capacity, and so on). After scaling completes, the controller sends an updated distribution rule to the distribution entity. The distribution entity then instructs the network service instances in the scaled cluster to redistribute the storage of the state data of the multiple data streams according to the updated rule, thereby synchronizing state data: if a data stream is switched to another network service instance as a result of the rule update, its state data is also migrated to that instance. Finally, the distribution entity sends the multiple data streams to the instances in the scaled cluster according to the updated rule; each instance receives at least one data stream and processes it according to the state data it stores for that stream.
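The redistribution step described above can be sketched as comparing, per flow, the instance chosen by the original rule with the one chosen by the updated rule, and migrating state only for flows whose instance changes. The modulo rules and flow identifiers below are hypothetical:

```python
def plan_state_migration(flows, old_rule, new_rule):
    """Return (flow, source_instance, target_instance) for every flow whose
    assigned network service instance changes under the updated rule;
    only these flows need their state data synchronized."""
    migrations = []
    for flow in flows:
        src, dst = old_rule(flow), new_rule(flow)
        if src != dst:
            migrations.append((flow, src, dst))
    return migrations

# Illustrative rules: hash flows over 2 instances before scaling out to 3.
old_rule = lambda flow_id: flow_id % 2
new_rule = lambda flow_id: flow_id % 3
moves = plan_state_migration([101, 102, 103], old_rule, new_rule)
# Only flow 101 changes instance (1 -> 2), so only its state is migrated.
```

This illustrates why instances can synchronize state as needed instead of performing a full-cluster synchronization: unchanged flows require no migration at all.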
In this method, the distribution entity controls both data stream distribution and state synchronization according to the distribution rule, so the two remain coordinated. Even when the number or the processing capability of the network service instances in the cluster changes, the distribution entity redistributes the storage of the state data of the multiple data streams according to the updated rule, and every data stream is still delivered to an instance that holds its state data, so the streams are processed without interruption. The scheme improves the scalability of the network service cluster, supports elastic scaling without interrupting the network service during the scaling process, and enables large-scale network services with high concurrency and high connection-setup rates. Moreover, the network service instances synchronize state data only as needed rather than performing large-scale state data synchronization, which avoids heavy resource consumption and reduces the cost of scaling the cluster.
In some possible implementations, before the controller elastically scales the network service cluster, it may also send an original distribution rule to the distribution entity; the distribution entity may accordingly send the multiple data streams to the network service instances in the cluster according to that original rule.
The original distribution rule may be the distribution rule corresponding to the topology of the network service cluster at initialization, that is, the topology before the controller elastically scales the cluster.
In this method, the distribution entity distributes each data stream to a network service instance holding its state data according to the current distribution rule: according to the original rule before the network service cluster is elastically scaled, and according to the updated rule afterwards. This allows uninterrupted processing of the data streams.
In some possible implementations, when sending the multiple data streams according to the updated distribution rule, the distribution entity may first decide to enable the updated rule, that is, switch the rule in use from the original rule to the updated one, and then delete the original rule. This avoids data stream distribution conflicts that could arise if both rules were in effect at the same time.
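The switchover described above (staging the updated rule, making it the only rule in effect, then deleting the original) can be sketched as follows; the class and method names are assumptions, not the application's API:

```python
class DistributionEntity:
    """Keeps exactly one active distribution rule at a time."""

    def __init__(self, original_rule):
        self.active_rule = original_rule
        self.pending_rule = None

    def receive_updated_rule(self, rule):
        # Stage the updated rule; it does not take effect yet.
        self.pending_rule = rule

    def enable_updated_rule(self):
        # Switch to the updated rule and delete the original in one step,
        # so the two rules can never both be in effect and conflict.
        self.active_rule, self.pending_rule = self.pending_rule, None

    def dispatch(self, flow):
        return self.active_rule(flow)
```

A stream dispatched before the switch follows the original rule; after `enable_updated_rule()` every stream follows the updated rule, and the original rule no longer exists.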
In some possible implementations, after elastically scaling the network service cluster, the controller may modify the usage scope of the traffic resources of the network service instances in the scaled cluster. The instances in one network service cluster are generally of the same type: for example, all elastic load balancing gateways, or all network address translation gateways. Instances providing different network services may belong to different clusters. By modifying the usage scope of the traffic resources of the scaled instances, the controller avoids traffic resource conflicts between different network service clusters.
In some possible implementations, the traffic resource may be at least one of an IP address and a port number. This avoids allocation conflicts of IP addresses or port numbers that would prevent data streams from being processed normally and interrupt the service, thereby improving the service experience of the network service.
In some possible implementations, the distribution entity may send an ingress data stream to a network service instance in the scaled network service cluster according to the updated distribution rule. An ingress data stream is a data stream flowing into the data stream processing system, such as a request stream; correspondingly, an egress data stream is a data stream flowing out of the system, such as a response stream.
After the network service instance in the scaled cluster receives at least one data stream and processes it according to the stored state data, the distribution entity may receive the egress data stream corresponding to the ingress stream and send it to that network service instance according to the updated distribution rule.
In this way both the ingress data stream and its corresponding egress data stream are distributed, meeting the needs of the network service with high applicability.
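One possible way to keep an ingress (import) stream and its corresponding egress (export) stream on the same network service instance, sketched below as an assumption rather than the application's stated mechanism, is to normalize the five-tuple so that both directions of a connection map to the same key:

```python
def reverse_key(five_tuple):
    """The egress direction of a connection is the ingress five-tuple
    with source and destination swapped."""
    src_ip, dst_ip, src_port, dst_port, proto = five_tuple
    return (dst_ip, src_ip, dst_port, src_port, proto)

def normalize(five_tuple):
    # Order the two endpoints so a flow and its reverse share one key;
    # distributing on this key sends both directions to the same instance.
    return min(five_tuple, reverse_key(five_tuple))

ingress = ("10.0.0.1", "8.8.8.8", 40000, 53, "UDP")   # request stream
egress = reverse_key(ingress)                          # response stream
```

Since `normalize` yields the same key for a flow and its reverse, any distribution rule applied to the normalized key delivers both directions to the same instance.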
In some possible implementations, before the network service cluster is elastically scaled, the distribution entity may send the ingress data stream to a network service instance in the cluster according to the original distribution rule. After that instance receives at least one data stream and processes it according to the stored state data, the distribution entity may receive the corresponding egress data stream and send it to the instance according to the original rule. This supports the distribution of ingress and egress data streams before the cluster is elastically scaled.
In some possible implementations, the network service cluster includes at least one service pair, each consisting of two network service instances operating in active-active or active-standby mode. In active-active mode, each instance in the pair (for example, the first and the second network service instance) has independent traffic resources and provides the network service independently. In active-standby mode, each instance in the pair likewise has independent traffic resources, but only one instance (for example, the first) works at a time, while the other (for example, the second) serves as a backup that receives the state data generated by the working instance.
The updated distribution rule includes a distribution table and a forwarding table. The distribution table is used to look up the service pair, and the forwarding table is used to determine one or more network service instances within that pair. Specifically, the distribution entity determines the service pair corresponding to a data stream in the scaled cluster from the distribution table, determines the target network service instance within that pair from the forwarding table, and then sends the data stream to the target instance.
This two-level lookup quickly determines the target network service instance, so the data stream can be delivered efficiently and processed without interruption, improving service efficiency.
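The two-level lookup can be sketched as follows; the table layouts (hash buckets mapping to service pairs, and a forwarding table naming the instance currently serving each pair) are assumptions for illustration:

```python
def pick_instance(flow_hash, distribution_table, forwarding_table):
    """First-level lookup: the distribution table maps a flow to a service
    pair. Second-level lookup: the forwarding table maps the pair to its
    target network service instance."""
    pair_id = distribution_table[flow_hash % len(distribution_table)]
    return forwarding_table[pair_id]

# Four hash buckets spread across two service pairs; in active/standby
# mode the forwarding table names the instance currently active.
distribution_table = ["pair-0", "pair-1", "pair-0", "pair-1"]
forwarding_table = {"pair-0": "instance-0a", "pair-1": "instance-1a"}

target = pick_instance(7, distribution_table, forwarding_table)  # bucket 3 -> pair-1
```

A failover within a pair then only rewrites one forwarding-table entry, leaving the distribution table and the flow-to-pair mapping untouched.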
In some possible implementations, the at least one network service instance includes any one of an elastic load balancing gateway, a network address translation gateway, a network intrusion detection gateway, a traffic scrubbing gateway, a virtual private network gateway, a multicast gateway, and an anycast gateway. Data streams of different network services can therefore be processed, meeting the needs of different network services.
In a second aspect, the present application provides a data stream processing system. The system includes a controller, a distribution entity, and a network service cluster, where the network service cluster includes at least one network service instance and each instance stores the state data of some of the data streams;
the controller is configured to elastically scale the network service cluster and send the updated distribution rule to the distribution entity;
the distribution entity is configured to instruct the network service instances in the scaled cluster, according to the updated distribution rule, to redistribute the storage of the state data of the multiple data streams, and to send the multiple data streams to those instances according to the updated rule;
and the network service instances in the scaled cluster are configured to receive at least one data stream and process it according to the stored state data of that stream.
In some possible implementations, the controller is further configured to:
send the original distribution rule to the distribution entity;
the distribution entity is further configured to:
send the multiple data streams to the network service instances in the network service cluster according to the original distribution rule.
In some possible implementations, the distribution entity is further configured to enable the updated distribution rule for distributing the multiple data streams and to delete the original distribution rule.
In some possible implementations, the controller is further configured to modify the usage scope of the traffic resources of the network service instances in the scaled network service cluster.
In some possible implementations, the distribution entity is configured to:
send an ingress data stream to a network service instance in the scaled network service cluster according to the updated distribution rule;
and receive the egress data stream corresponding to the ingress data stream, and send the egress data stream to the network service instance according to the updated distribution rule.
In some possible implementations, the network service cluster includes at least one service pair, each service pair includes two network service instances operating in active-active or active-standby mode, and the updated distribution rule includes a distribution table and a forwarding table;
the distribution entity is configured to determine, from the distribution table, the service pair corresponding to a data stream in the scaled network service cluster; determine, from the forwarding table, the target network service instance within that service pair; and send the data stream to the target network service instance.
In a third aspect, the present application provides a method of processing multiple data streams, which may be performed by a distribution entity. The distribution entity may be software deployed on a general-purpose device such as a server, or hardware, for example a server with a data stream distribution function.
Specifically, the distribution entity receives the updated distribution rule sent by the controller, instructs the network service instances in the scaled network service cluster to redistribute the storage of the state data of the multiple data streams according to that rule, and then sends the data streams to those instances according to the updated rule.
In some possible implementations, before the controller elastically scales the network service cluster, the distribution entity may receive an original distribution rule from the controller and send the multiple data streams to the network service instances according to it, so that the instances can synchronize state data based on the distribution rule and process the data streams based on the synchronized state data.
In some possible implementations, the distribution entity may delete the original distribution rule when determining to enable the updated distribution rule to distribute the plurality of data streams.
In some possible implementations, the distribution entity sends an ingress data stream to a network service instance in the scaled network service cluster according to the updated distribution rule. After the instance receives at least one data stream and processes it according to the stored state data, the distribution entity receives the egress data stream corresponding to the ingress stream and sends it to the instance according to the updated rule.
In a fourth aspect, the present application provides a distribution entity. The distribution entity specifically comprises:
a communication unit for receiving the updated distribution rule transmitted by the controller;
a distribution unit, configured to instruct the network service instances in the scaled network service cluster, according to the updated distribution rule, to redistribute the storage of the state data of the multiple data streams;
the distribution unit is further configured to invoke the communication unit to send the multiple data streams to the network service instances in the scaled network service cluster according to the updated distribution rule.
In some possible implementations, before the controller elastically scales the network service cluster, the communication unit of the distribution entity may receive the original distribution rule sent by the controller. The distribution unit is specifically configured to invoke the communication unit to send the multiple data streams to the network service instances according to the original rule, so that the instances can synchronize state data based on the distribution rule and process the data streams based on the synchronized state data.
In some possible implementations, the distribution entity may further include a rule management unit, configured to store distribution rules, for example the original distribution rule or the updated distribution rule. The rule management unit is further configured to delete the original distribution rule upon determining that the updated distribution rule is enabled for distributing the multiple data streams.
In some possible implementations, the distribution unit is specifically configured to invoke the communication unit to send an ingress data stream to a network service instance in the scaled network service cluster according to the updated distribution rule. After the instance receives at least one data stream and processes it according to the stored state data, the communication unit is further configured to receive the egress data stream corresponding to the ingress stream, and the distribution unit is further configured to invoke the communication unit to send the egress stream to the instance according to the updated rule.
In a fifth aspect, the present application provides a computing device. The computing device includes a processor and a memory, the memory stores computer-readable instructions, and the processor executes the instructions to perform the steps performed by the distribution entity in the method of the first or third aspect of the application. For example, the computing device may be a Linux server or a white-box switch.
In a sixth aspect, the present application provides a cluster of computing devices. The cluster of computing devices includes at least one computing device. The computing device comprises a processor and a memory, the memory having stored therein computer readable instructions, the processor executing the computer readable instructions to perform the steps performed by the distribution entity in the method according to the first aspect of the application.
In a seventh aspect, the present application provides a computer-readable storage medium. The storage medium stores computer-readable instructions which, when run on a computer, cause the computer to perform the steps performed by the distribution entity in the method of the first or third aspect of the application.
In an eighth aspect, the present application provides a computer program product. The product comprises computer readable instructions which, when run on a computer, cause the computer to perform the steps performed by the distribution entity in the method according to any of the first or third aspects of the present application.
On the basis of the implementations provided by the above aspects, the present application can further combine them to provide more implementations.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below.
Fig. 1 is a system architecture diagram of a data stream processing system according to an embodiment of the present application;
fig. 2 is a system architecture diagram of a data stream processing system according to an embodiment of the present application;
fig. 3 is a system architecture diagram of a data stream processing system according to an embodiment of the present application;
fig. 4 is a flowchart of a method for processing multiple data streams according to an embodiment of the present application;
fig. 5 is a schematic flowchart of a method for processing multiple data streams according to an embodiment of the present application;
fig. 6 is a schematic flowchart of a method for processing multiple data streams according to an embodiment of the present application;
fig. 7 is a schematic view of an application scenario of a method for processing multiple data streams according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a computing device according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a computing device cluster according to an embodiment of the present application.
Detailed Description
The terms "first", "second" in the embodiments of the present application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implying any indication of the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature.
Some technical terms referred to in the embodiments of the present application will be first described.
An instance is a computing unit with various types of resources that can run software modules to support specific functions. Common instances include virtual machines and containers. An instance typically runs on a computer, such as a server.
Network Services (NS), also referred to as Network Functions (NFs), specifically refer to additional capabilities provided for data flows in a network. The additional capabilities may include traffic inspection, modification, blocking, redirection, address translation, rate limiting, firewalling, and intrusion detection. Conventional network services are typically provided by dedicated hardware devices.
Network Function Virtualization (NFV) specifically refers to a virtualization technology that divides a network service function into a plurality of functional blocks, each of which runs as software and is not limited to dedicated hardware devices (e.g., a customized switch or a customized router). On this basis, network services can be gradually migrated from dedicated hardware devices to general-purpose hardware devices, for example, to instances on servers. A computer program implementing a network service or network function may be deployed in an instance to implement the corresponding network service or network function. An instance implementing a network function may be referred to as a network service instance.
Network services can be divided into two types: stateful and stateless. Specifically, whether a network service is stateful or stateless is determined by whether two data packets from the same data flow have a context relationship in the network service. If they do, the network service is a stateful service; if they do not, it is a stateless service.
The state data includes stream processing data for the data stream. The stream processing data includes data generated by the network service processing the data stream based on the resource data. The streaming data generated by different network services may be different. In some possible implementations, the flow processing data may include a five-tuple of the data flow including a source Internet Protocol (IP) address, a destination IP address, a source port, a destination port, and a Protocol type.
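For illustration only (not part of the claimed embodiments), the five-tuple portion of the flow processing data could be modeled as follows; the class and field names are assumptions made for this sketch:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FiveTuple:
    """Five-tuple identifying a data flow; field names are illustrative."""
    src_ip: str    # source Internet Protocol (IP) address
    dst_ip: str    # destination IP address
    src_port: int  # source port
    dst_port: int  # destination port
    protocol: str  # protocol type, e.g. "TCP" or "UDP"

# Two packets carrying the same five-tuple belong to the same data flow,
# so the five-tuple can serve as the lookup key for flow processing data.
flow = FiveTuple("10.0.0.1", "10.0.0.2", 12345, 80, "TCP")
same_flow = FiveTuple("10.0.0.1", "10.0.0.2", 12345, 80, "TCP")
assert flow == same_flow and hash(flow) == hash(same_flow)
```

Because the dataclass is frozen, it is hashable, which is convenient when the five-tuple is used as the key under which state data is stored.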
Since a single network service instance often cannot provide enough processing power to process data streams, for various network services a plurality of network service instances supporting the same network service are generally deployed in a network, and the plurality of network service instances form a network service cluster. When a network service cluster is elastically scaled (elastic scaling), not only does the number of instances need to be increased or decreased, but the state data in each network service instance also needs to be updated synchronously, so that any data stream can still be processed normally when it is switched to a new network service instance for processing.
In view of this, the present application provides a data stream processing system. The data stream processing system is used for processing a plurality of data streams (for example, a plurality of data streams with state data). The data stream processing system includes a controller, a distribution entity, and a network service cluster. The network service cluster includes at least one network service instance. At least one network service instance of the network service cluster distributively stores state data for each of a plurality of data streams. Wherein each network service instance may store state data for a portion of the data stream.
Specifically, the controller is configured to elastically scale the network service cluster, for example, by changing the number of network service instances in the network service cluster (horizontal elastic scaling) or by changing the processing capability of the network service instances in the network service cluster (vertical elastic scaling). The controller is further configured to send the updated distribution rule to the distribution entity. The distribution entity is configured to instruct the network service instances in the elastically scaled network service cluster to redistribute the storage of the state data of the plurality of data streams according to the updated distribution rule, and then to send the plurality of data streams to the network service instances in the elastically scaled network service cluster according to the updated distribution rule. A network service instance in the elastically scaled network service cluster receives at least one data stream and processes the at least one data stream according to the state data of that data stream.
The system controls data stream distribution and state synchronization according to the distribution rule by introducing a distribution entity, thereby coordinating data stream distribution with state synchronization. Based on this, even if the number or the processing capability of the network service instances in the network service cluster changes, the distribution entity can redistribute the storage of the state data of the plurality of data streams according to the updated distribution rule, and the plurality of data streams can be correctly distributed to the network service instances holding their state data, so that uninterrupted processing of the data streams is achieved. The scheme improves the scalability of the network service cluster and supports elastic scaling of the network service cluster without interrupting the network service during scaling, realizing network services at large scale, with high concurrency and high connection setup rates.
Further, the network service instances in the network service cluster provided by the application can synchronize the state data as required, and do not need to perform full state data synchronization, thereby avoiding consuming a large amount of resources, improving the expandability of the network service cluster, and reducing the resource consumption of the expanded network service cluster.
In order to make the technical solution of the present application clearer and more comprehensible, a system architecture of a data stream processing system is described below.
Referring to the system architecture diagram of a data stream processing system shown in fig. 1, a data stream processing system 100 includes a controller 10, a distribution entity 20, and a network service cluster 300, where the network service cluster 300 includes at least one network service instance 30. The at least one network service instance 30 in the network service cluster 300 distributively stores state data of a plurality of data streams, where each network service instance 30 may store state data of a portion of the data streams. The controller 10 establishes communication with each distribution entity 20 and with each network service instance 30 of the network service cluster 300. At least one distribution entity 20 establishes communication with each of the at least one network service instance 30.
The controller 10 is used to elastically scale the network service cluster 300. For example, the controller 10 may increase or decrease the number of network service instances 30 in the network service cluster 300 to achieve horizontal elastic scalability; for another example, the controller 10 may increase or decrease the processing capacity of the network service instances 30 in the network service cluster 300 to achieve vertical elastic scaling. The controller 10 is also arranged to send updated distribution rules to the distribution entity 20.
The distribution entity 20 is configured to receive the updated distribution rule issued by the controller 10 and, according to the updated distribution rule, instruct the network service instances in the elastically scaled network service cluster 300 to redistribute the storage of the state data of the plurality of data streams. For example, if the network service cluster 300 includes 2 network service instances 30 before elastic scaling and 4 network service instances 30 after elastic scaling, the distribution entity 20 may rearrange the state data stored in the 2 network service instances 30 according to the updated distribution rule, so that the state data is stored in a distributed manner across the 4 network service instances 30 after elastic scaling. State data being stored in a distributed manner across a plurality of network service instances 30 means that each network service instance 30 stores the state data of a portion of the data streams among the state data of the plurality of data streams.
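As a minimal sketch of this redistribution, assuming a simple hash-modulo distribution rule (one possible rule; the embodiments do not mandate it), scaling from 2 to 4 instances changes the owner of only some flows, and only the state data of those flows needs to be moved:

```python
import zlib

def owner(flow_key: str, num_instances: int) -> int:
    """Map a flow to an instance index via a stable hash modulo the cluster size."""
    return zlib.crc32(flow_key.encode()) % num_instances

flows = [f"flow-{i}" for i in range(8)]
# Only the flows whose owner changes after scaling from 2 to 4 instances
# need their state data migrated; the rest stay in place, which is the
# on-demand (non-full) state synchronization described above.
moved = [f for f in flows if owner(f, 2) != owner(f, 4)]
kept = [f for f in flows if owner(f, 2) == owner(f, 4)]
```

The `owner` function and flow keys are made up for illustration; the point is that redistribution is a per-flow decision, not a full copy of every instance's state table.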
The distribution entity 20 is further configured to send the plurality of data flows to the network service instances 30 in the elastically scaled network service cluster 300 according to the updated distribution rule. It should be noted that, because distribution entities 20 are generally stateless, elastic scaling of the distribution entities 20 can be realized directly by increasing or decreasing their number, which prevents an overloaded distribution entity 20 from affecting normal service operation and prevents idle distribution entities 20 from wasting resources.
The network service instance 30 may be stateful; for example, the network service instance 30 may provide the user with stateful services such as a Source Network Address Translation (SNAT) service, a load balancing service, and a traffic intrusion detection service.
The network service instances 30 in the elastically scaled network service cluster 300 are used to process data flows based on synchronized state data. Specifically, a network service instance 30 in the elastically scaled network service cluster 300 may receive at least one data flow of the plurality of data flows and process the at least one data flow according to the stored state data of the at least one data flow.
The system 100 coordinates data stream distribution with state data synchronization by introducing the distribution entity 20 to control both according to the distribution rule. Even if the network service cluster 300 is elastically scaled, the distribution entity 20 can synchronize the state data according to the updated distribution rule to achieve state data rebalancing, thereby ensuring that data streams can be correctly distributed to the network service instances 30 holding the corresponding state data and avoiding interruption of the network service. Moreover, because the state data is synchronized on demand during redistribution, the resource overhead of synchronizing state data is reduced and the elastic scaling efficiency of the network service instances is improved.
Next, the structure and interactive process of the key components such as the controller 10, the distribution entity 20, and the network service instance 30 in the data stream processing system 100 will be described in detail with reference to fig. 1.
In the embodiment shown in fig. 1, the controller 10 includes a communication unit 102 and a management unit 104. The communication unit 102 is configured to send the distribution rule to the distribution entity 20, where the distribution rule may be the original distribution rule or the updated distribution rule. In this embodiment, before the network service cluster 300 is elastically scaled, the communication unit 102 is configured to send the original distribution rule to the distribution entity 20; after the network service cluster 300 is elastically scaled, the communication unit 102 is configured to send the updated distribution rule. The management unit 104 is used for managing the network service cluster 300. In particular, the management unit 104 may create a network service instance 30 or delete a network service instance 30 in the network service cluster 300. In this way, the management unit 104 can elastically scale the network service cluster 300.
Elastic scaling includes horizontal elastic scaling and vertical elastic scaling. Horizontal elastic scaling specifically changes the number of network service instances 30 in the network service cluster 300. The management unit 104 can create a network service instance 30 to increase the number of network service instances 30 in the network service cluster 300, or delete a network service instance 30 to decrease that number. Vertical elastic scaling specifically changes the processing capability of a network service instance 30 in the network service cluster 300; for example, the processing capability of a network service instance 30 can be extended from two virtual central processing units (2 vCPUs) to 4 vCPUs. The management unit 104 may change the processing capability of the network service instance 30 by switching virtual machines or containers.
The distribution entity 20 comprises a communication unit 202 and a distribution unit 204. The communication unit 202 is configured to receive the distribution rule issued by the controller 10. For example, the communication unit 202 may receive the original distribution rule before the network service cluster 300 is elastically scaled, and may receive the updated distribution rule after the network service cluster 300 is elastically scaled. The distribution unit 204 is configured to distribute data streams according to the distribution rule. For example, before the network service cluster 300 is elastically scaled, the distribution unit 204 may invoke the communication unit 202 to send the plurality of data streams to the network service instances 30 in the network service cluster 300 according to the original distribution rule; specifically, the distribution unit 204 may determine the target network service instance corresponding to a data stream according to the original distribution rule and send the data stream to the target network service instance through the communication unit 202. For another example, after the network service cluster 300 is elastically scaled, the distribution unit 204 may invoke the communication unit 202 to distribute the plurality of data streams to the network service instances 30 in the elastically scaled network service cluster 300 according to the updated distribution rule.
It should be noted that, after the network service cluster 300 is elastically scaled, the distribution unit 204 is further configured to instruct the network service instances 30 in the elastically scaled network service cluster 300 to redistribute the storage of the state data of the plurality of data flows according to the updated distribution rule. In this manner, the network service instances 30 in the elastically scaled network service cluster 300 can process data flows based on the redistributed state data.
The distribution unit 204 may include a data stream distribution subunit and a status data management subunit, and the controller 10 may issue the updated distribution rule to the status data management subunit in the distribution unit 204. For example, the controller 10 may issue a status data rebalancing command generated based on the updated distribution rule to the status data management subunit. Wherein the updated distribution rules may be invisible to the data stream distribution subunit and visible to the state data management subunit. In this way, the status data management subunit may trigger status data synchronization based on the updated distribution rules. The data stream distribution subunit still performs data stream distribution based on the original distribution rule. When the state data synchronization is completed, the state data management subunit or the distribution unit 204 may send the updated distribution rule to the data stream distribution subunit, and accordingly, the data stream distribution subunit may perform data stream distribution according to the updated distribution rule.
The updated distribution rule may also include a scope field. When the scope field takes a first value, for example, 1 or true, the updated distribution rule is used for state data synchronization and not for data stream distribution; when the scope field takes a second value, for example, 0 or false, the updated distribution rule is used for both state data synchronization and data stream distribution. In this manner, the distribution entity 20 may issue a state data rebalancing command generated from the updated distribution rule to the network service instance 30, so that the network service instance 30 performs state data synchronization based on that command while data stream distribution is still performed based on the original distribution rule. When the state data synchronization is completed, the value of the scope field of the updated distribution rule may be modified so that the distribution entity 20 distributes data streams according to the updated distribution rule.
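A hedged sketch of how a distribution entity might interpret the scope field; the concrete values 1 and 0 for the first and second values come from the description above, while the rule structure and function name are assumptions:

```python
def rule_usage(rule: dict) -> tuple:
    """Return (use_for_state_sync, use_for_distribution) for an updated rule.

    scope == 1 (first value): state data synchronization only.
    scope == 0 (second value): state data synchronization and distribution.
    """
    return (True, False) if rule["scope"] == 1 else (True, True)

updated_rule = {"version": 2, "scope": 1}
sync, distribute = rule_usage(updated_rule)  # sync only; the original rule still distributes
# Once state data synchronization completes, the scope field is modified so
# the updated rule also governs data stream distribution.
updated_rule["scope"] = 0
sync2, distribute2 = rule_usage(updated_rule)
```

The two-phase flip of the scope field is what lets synchronization and distribution switch over without a window in which neither rule applies.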
In some possible implementations, the distribution entity 20 further comprises a rule management unit 206. The rule management unit 206 is configured to store distribution rules, for example, the original distribution rule or the updated distribution rule. It should be noted that, after the network service cluster 300 is elastically scaled and the distribution unit 204 enables the updated distribution rule to distribute the plurality of data streams, the rule management unit 206 may delete the original distribution rule. This avoids data flow distribution conflicts caused by the original distribution rule and the updated distribution rule being in effect simultaneously.
Network service instance 30 includes a communication unit 302 and a service unit 304. The communication unit 302 is configured to receive at least one data stream, for example, at least one data stream of a plurality of data streams. The service unit 304 is configured to process the at least one data flow according to the stored state data of the at least one data flow.
Also included in the network service instance 30 is a state management unit 306. The status management unit 306 stores status data. After the network service cluster 300 is elastically scaled, the state management unit 306 may also synchronize the state data. Specifically, the communication unit 302 in the network service instance 30 in the elastically scaled network service cluster 300 may receive an indication of the distribution entity 20, which may be, for example, a status data rebalancing command generated by the updated distribution rule, and the status management unit 306 in the network service instance 30 may synchronize the status data to other network service instances 30 as needed according to the status data rebalancing command. The status data rebalancing command may carry an identifier of the network service instance and an identifier of the status data, and the status management unit 306 may obtain the status data corresponding to the identifier of the status data from the corresponding network service instance 30 according to the identifier of the network service instance.
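A hypothetical shape for the state data rebalancing command described above; all field names, function names, and stored values are assumptions made for illustration:

```python
# The command carries the identifier of the source network service instance
# and the identifiers of the state data entries to synchronize.
rebalance_cmd = {
    "source_instance": "ns-instance-1",
    "state_ids": ["flow-42", "flow-77"],
}

def handle_rebalance(cmd: dict, remote_store: dict) -> dict:
    """Pull only the named state entries from the source instance's state table."""
    return {sid: remote_store[sid] for sid in cmd["state_ids"] if sid in remote_store}

# State table of the source instance (contents are made up for the sketch).
store_on_instance_1 = {
    "flow-42": {"nat_port": 3001},
    "flow-77": {"nat_port": 3002},
    "flow-99": {"nat_port": 3003},
}
pulled = handle_rebalance(rebalance_cmd, store_on_instance_1)
# Only the requested entries move; "flow-99" is not synchronized.
```

This mirrors the on-demand synchronization described earlier: the receiving instance fetches only the entries named in the command rather than the full state table.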
The foregoing is an illustration of data stream processing system 100 including a network service cluster 300. In some possible implementations, data stream processing system 100 may also include multiple network service clusters 300. Multiple network service clusters 300 may be used to implement different network services. For example, network service instances 30 of one network service cluster 300 are used to implement SNAT and network service instances 30 of another network service cluster 300 are used to implement load balancing. Network service instances 30 in different network service clusters 300 may make service calls to form a network service chain.
In some possible implementations, to avoid resource conflicts among different network service clusters 300, the management unit 104 in the controller 10 may also be used to manage configuration information of the network service instance 30. Specifically, the management unit 104 may divide the network service instances 30 having the same network service into one service group, for example, the network service instances 30 in the same network service cluster 300 may be divided into one service group, and the management unit 104 may configure the usage scope of the service resources of the network service instances 30 in the network service cluster 300, so that the scope of the service resources used by different service groups is isolated from each other. The service resource may include resources such as an IP address or a port number. After the controller 10 elastically scales the network service cluster 300, the controller 10 may further modify the service resource usage range of the network service instance 30 in the network service cluster 300 after elastic scaling.
For ease of understanding, the present application also provides a specific example for illustration. In this example, the network service instances 30 provide the SNAT network service. The service resources used by the network service cluster 300 include User Datagram Protocol (UDP) and Transmission Control Protocol (TCP) port numbers, and the range of port numbers is specifically 1 to 6000. The management unit 104 may set configuration information of the network service instances 30, for example, configure the range of port numbers used by the network service instances 30 in service group 1 as 1 to 3000 (excluding the right endpoint), and configure the range of port numbers used by the network service instances 30 in service group 2 as 3000 to 6000. Thus, the service resources of the two service groups do not overlap and are independent. Correspondingly, in the case that the network service instances 30 are divided into service groups, the distribution rules (e.g., the original distribution rule and the updated distribution rule) in the distribution entity 20 may also include the usage range of the service resources, so as to distribute a data stream using a specific service resource, or the state data of that data stream, to the correct network service instance 30.
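The port-number partition in this example could be sketched as follows; the helper function is illustrative and not part of the embodiments:

```python
def service_group_for_port(port: int) -> int:
    """Map a UDP/TCP port in the cluster's range 1..6000 to its service group.

    Service group 1 uses ports [1, 3000) (right endpoint excluded);
    service group 2 uses ports [3000, 6000].
    """
    if not 1 <= port <= 6000:
        raise ValueError("port outside the cluster's service resource range")
    return 1 if port < 3000 else 2

# The two groups' resources are disjoint: every port maps to exactly one group,
# so a distribution rule that includes the usage range can route a data stream
# using a given port (or its state data) to the correct service group.
assert service_group_for_port(2999) == 1
assert service_group_for_port(3000) == 2
```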
Fig. 1 is illustrated with a distribution entity 20 and a network service instance 30 deployed independently. In some possible implementations, distribution entity 20 and network service instance 30 may also be deployed embedded. For example, distribution entity 20 may also be embodied in a software module deployed in network service instance 30.
Referring to a schematic structural diagram of a data stream processing system 100 shown in fig. 2, the data stream processing system 100 includes a controller 10, a distribution entity 20 and a network service cluster 300. The network service cluster 300 includes at least one network service instance 30. Where distribution entities 20 are embedded in network service instances 30, for example, one distribution entity 20 may be embedded in one network service instance 30 as a unit of a network service instance 30.
The components and functions of the controller 10, the distribution entity 20, and the network service instance 30 in the embodiment shown in fig. 2 may refer to the description of the related contents in the embodiment shown in fig. 1, and are not described again here. It should be noted that, when the distribution entity 20 is embedded and deployed in the network service instance 30, the resource overhead generated by communication between the distribution entity 20 and the network service instance 30 can be reduced, saving resources.
Referring to fig. 3, another schematic diagram of a data stream processing system 100 is shown, where the data stream processing system 100 includes a controller 10, a plurality of distribution entities 20, and a network service cluster 300, and the network service cluster 300 includes at least one network service instance 30. Wherein at least one distribution entity 20 of the plurality of distribution entities 20 is independent of the network service instance 30, the remaining distribution entities 20 may be deployed embedded in the network service instance 30.
A distribution entity 20 independent of the network service instances 30 may be used to distribute external data flows, that is, data flows entering the data flow processing system 100 from outside. The distribution entities 20 embedded and deployed in the network service instances 30 may be used to distribute internal data flows, that is, data flows between network service instances 30.
The components and functions of the controller 10, the distribution entity 20, and the network service instance 30 in the embodiment shown in fig. 3 may refer to the description of the relevant contents in the embodiment shown in fig. 1, and are not described again here. It should be noted that at least one distribution entity 20 is independent of the network service instance 30, and the remaining distribution entities 20 are embedded and deployed in the network service instance 30, so that a suitable distribution entity 20 can be selected for data stream distribution according to the type of data stream, such as external data stream and internal data stream, thereby improving the distribution efficiency, reducing the resource overhead generated by communication between the distribution entity 20 and the network service instance 30, and satisfying the user requirements.
In some possible implementations, data flow forwarding may also be performed between network service instances 30 implementing different network services (e.g., network service instances 30 implementing SNAT and network service instances 30 implementing load balancing) to form a network service chain. The following describes the process of forwarding data streams between network service instances 30 under data stream processing system 100 with different architectures.
In the data stream processing system 100 shown in fig. 1, the distribution entity 20 is deployed independently from the network service instances 30, and data stream distribution between the network service instances 30 can be realized by the distribution entity 20. For example, the data stream 1 is first distributed by the distribution entity 20 and transmitted to a first network service instance (any one of the 2 network service instances 30 in fig. 1), after the first network service instance completes processing the data stream 1, the processed data stream 1 may be sent to the distribution entity 20 again, and the distribution entity 20 distributes the processed data stream 1 to a second network service instance (another one of the 2 network service instances 30 in fig. 1) according to the distribution rule. Wherein the first network service instance and the second network service instance are used to implement different network services. It should be noted that, when the second network service instance synchronizes the state data stored in the first network service instance, the state data stored in the first network service instance may be sent to the distribution entity 20 first, and then the distribution entity 20 synchronizes the state data to the second network service instance according to the distribution rule.
In the data stream processing system 100 shown in fig. 2, since there is no independently deployed distribution entity 20, data stream distribution between different network service instances 30 is implemented by the distribution entity 20 embedded within the network service instances 30. For example, after the first network service instance completes processing the data stream 1, the distribution entity 20 embedded and deployed in the first network service instance may send the processed data stream 1 to the second network service instance according to the distribution rule. It should also be noted that external data streams of the data stream processing system 100, such as data streams from a switching device (not shown in the figure), may also be sent to the network service instance 30, and distributed by the distribution entity 20 embedded in the network service instance 30.
In the data stream processing system 100 shown in fig. 3, since the independent distribution entity 20 and the distribution entity 20 embedded in the network service instance 30 are deployed at the same time, the distribution of data streams between the network service instances 30 can be realized by the distribution entity 20 within the network service instance 30, and the distribution of external data streams of the data stream processing system 100 can be realized by the independently deployed distribution entity 20. The data stream distribution efficiency can be improved by combining the independently deployed distribution entity 20 and the embedded distribution entity 20 for data stream distribution.
The embodiment shown in fig. 1 to fig. 3 describes in detail an organization structure of a data stream processing system 100 provided in the embodiment of the present application, and the embodiment of the present application further provides a method for processing multiple data streams. The processing method of the multiple data streams will be described in detail below.
Referring to fig. 4, a flow chart of a method for processing multiple data streams is shown, the method comprising:
S402: the controller 10 sends the original distribution rules to the distribution entity 20.
A distribution rule refers to a rule for distributing data streams. In some embodiments, the distribution rule may be to hash the flow processing data of the data flow, such as the five-tuple, and determine the network service instance 30 corresponding to the data flow according to the remainder of the hash value. For example, if the remainder is 0, the data flow is distributed to the first network service instance; if the remainder is 1, the data flow is distributed to the second network service instance. Further, the distribution rule may also be used to indicate state data synchronization.
Upon initialization of the web services cluster 300, the controller 10 may issue the original distribution rules to the distribution entity 20. The original distribution rule may be a distribution rule corresponding to the topology of the network service cluster 300 before elastic scaling. The original distribution rules may be used to control data flow distribution and state data synchronization in the pre-elastic network service cluster 300.
S404: the distribution entity 20 sends a plurality of data streams to the network service instances 30 in the network service cluster 300 according to the original distribution rules.
For any one of the multiple data streams, the distribution entity 20 may determine, according to the original distribution rule, the network service instance 30 corresponding to the data stream, for example, a target network service instance in the network service cluster 300, and then send the data stream to that target network service instance.
Specifically, when the original distribution rule is to hash flow processing data of the data stream, such as a five-tuple, and determine the network service instance 30 corresponding to the data stream according to the hash value remainder result, the distribution entity 20 may hash the flow processing data of each of the multiple data streams, perform the remainder operation on the hash value to obtain the hash value remainder result, and then determine the network service instance 30 corresponding to each data stream according to that result, so that the distribution entity 20 can send each data stream to its corresponding network service instance 30.
The network service instance 30 may specifically be any one or more of an elastic load balance (ELB) gateway, a NAT gateway, a network intrusion detection gateway, a traffic cleaning gateway, a virtual private network gateway, a multicast gateway, and an anycast gateway. A gateway is a device that implements network interconnection above the network layer, and is therefore also called an internetwork connector or a protocol converter. In practical applications, the gateway may be a server, for example, a server providing stateful services such as elastic load balancing, network address translation, and network intrusion detection.
In some possible implementations, the original distribution rule may also be other rules. Distribution entity 20 may determine, based on other rules, a network service instance 30 to which the plurality of data flows correspond and send the plurality of data flows to the corresponding network service instance 30. The embodiments of the present application do not limit this.
S406: the network service instances 30 in the network service cluster 300 process the plurality of data flows according to their state data.
S402 to S406 describe a specific implementation of processing the data streams before the network service cluster 300 is elastically scaled; the method for processing multiple data streams in the embodiments of the present application does not necessarily include these steps.
S408: controller 10 elastically scales network service cluster 300.
Specifically, as the load of the network service cluster 300 (for example, the number of data streams that need to be processed) changes, the controller 10 may elastically scale the network service cluster 300. Elastic scaling includes horizontal elastic scaling and vertical elastic scaling. The controller 10 may choose to perform horizontal or vertical elastic scaling on the network service cluster 300 according to service demands.
In some embodiments, the controller 10 may create a network service instance 30 in the network service cluster 300, thereby increasing the number of network service instances 30 in the network service cluster 300; the controller 10 may also delete network service instances 30 in the network service cluster 300, thereby reducing their number. In this manner, the controller 10 implements horizontal elastic scaling of the network service cluster 300.
In other embodiments, for a network service instance 30 in the network service cluster 300, the controller 10 may switch the virtual machine or container corresponding to the network service instance 30 to change its processing capability. For example, if the virtual machine corresponding to the network service instance 30 includes 2 vCPUs, the controller 10 may switch the network service instance 30 to a virtual machine including 4 vCPUs, thereby changing the processing capability of the network service instance 30. In this way, the controller 10 implements vertical elastic scaling of the network service cluster 300.
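A minimal policy for choosing between the two scaling directions might look like the following. The thresholds, the preference for scaling out over scaling up, and the instance-count cap are all assumptions for illustration; the patent only states that the controller 10 selects a direction according to service demand:

```python
def scale_decision(load, capacity, can_add_instance):
    # Toy policy: prefer horizontal scaling (add/remove instances) and fall
    # back to vertical scaling (a larger vCPU flavor) when the instance count
    # is capped. Thresholds 0.8 / 0.3 are illustrative, not from the patent.
    utilization = load / capacity
    if utilization > 0.8:
        return "scale-out" if can_add_instance else "scale-up"
    if utilization < 0.3:
        return "scale-in"
    return "hold"
```

Such a decision would precede S410, since either direction changes the cluster topology and therefore requires an updated distribution rule.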
S410: the controller 10 sends the updated distribution rules to the distribution entity 20.
After the network service cluster 300 is elastically scaled, for example, after the number of network service instances 30 in the network service cluster 300 changes or the processing capability of the network service instances 30 changes, the network service instances 30 in the network service cluster 300 generally need to synchronize state data in order to provide the network service based on that state data. To this end, the controller 10 may issue updated distribution rules to the distribution entity 20 for state data synchronization.
The updated distribution rule is specifically a distribution rule corresponding to the topology of the elastically scaled network service cluster 300. The updated distribution rules may be used for state data synchronization and data flow distribution in the elastically scaled network service cluster 300.
S412: the distribution entity 20 instructs the network service instances 30 in the elastically scaled network service cluster 300 to redistribute the storage of the state data of the plurality of data flows according to the updated distribution rules.
State data for the multiple data streams is stored in a distributed manner across at least one network service instance 30 of the network service cluster 300; each network service instance 30 may store the state data of a subset of the data streams. When the number or processing capability of the network service instances 30 changes, for example, when the number of network service instances 30 increases from 1 to 2, the distribution entity 20 may generate a state data rebalancing command according to the updated distribution rule, and then send the state data rebalancing command to the network service instances 30 in the elastically scaled network service cluster 300, for example, the newly added network service instances 30, so that these instances execute the command and state data rebalancing is achieved. State data rebalancing may be understood as state data redistribution, which ensures that the network service instances 30 can provide network services based on the redistributed state data.
Specifically, the distribution entity 20 may determine, based on the updated distribution rule, the state data that each network service instance 30 in the elastically scaled network service cluster 300 should store, and generate the state data rebalancing command from that information together with the state data actually stored by the network service instances 30 before elastic scaling. For example, the distribution entity 20 may generate a state data rebalancing command carrying the identifier of the state data that the newly added network service instance 30 should store and the identifier of the network service instance 30 that actually stores that state data. The state data rebalancing command can thus instruct the newly added network service instance 30 to obtain the state data from the network service instance 30 that actually stores it, thereby implementing state data synchronization.
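The diff between "should store" and "actually stores" described above can be sketched as follows. The dictionary shapes and the tuple form of a command are assumptions; the patent only specifies that a command carries the identifier of the state data and of the instance that currently holds it:

```python
def build_rebalance_commands(desired, actual):
    # desired: {instance: set of state-data ids it should store after scaling}
    # actual:  {instance: set of state-data ids it stored before scaling}
    # Returns {instance: [(state_id, source_instance), ...]} telling each
    # instance which state data to fetch and from which peer.
    location = {sid: inst for inst, sids in actual.items() for sid in sids}
    commands = {}
    for inst, sids in desired.items():
        missing = sids - actual.get(inst, set())
        commands[inst] = sorted((sid, location[sid]) for sid in missing)
    return commands

before = {"nsi1": {"s1", "s2"}}                  # one instance held everything
after = {"nsi1": {"s1"}, "nsi3": {"s2"}}         # topology after scaling out
cmds = build_rebalance_commands(after, before)
# nsi3 must fetch state s2 from nsi1; nsi1 already holds s1 and fetches nothing
```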
When the number of network service instances 30 increases, the distribution entity 20 may send a state data rebalancing command to the newly added network service instance 30 to synchronize the state data of the newly added network service instance 30 from the network service instance 30 before the change. For example, when the network service cluster 300 is changed to include a first network service instance and a second network service instance, and a third network service instance and a fourth network service instance are added to the network service cluster 300, the distribution entity 20 may send a status data rebalancing command to the third network service instance and the fourth network service instance, so that the status data of the multiple data flows are redistributed on the first network service instance, the second network service instance, the third network service instance and the fourth network service instance.
When the number of network service instances 30 decreases, distribution entity 20 may send a status data rebalancing command to the remaining network service instances 30 to cause the remaining network service instances 30 to synchronize status data from the decreased network service instances 30. For example, when the network service cluster 300 is changed to include a first network service instance, a second network service instance, a third network service instance, and a fourth network service instance, and when the number of network service instances 30 in the network service cluster 300 decreases and the first network service instance and the second network service instance remain, the distribution entity 20 may send a status data rebalancing command to the first network service instance and the second network service instance, so that the status data of the multiple data flows are redistributed over the first network service instance and the second network service instance.
When the processing capability of a network service instance 30 changes, the distribution entity 20 may send a state data rebalancing command to the changed network service instance 30 to cause it to synchronize state data from the network service instance 30 before the change. For example, before the change the network service cluster 300 includes a first network service instance and a second network service instance, each with 2 vCPUs; after the change it includes a third network service instance and a fourth network service instance, each with 4 vCPUs. The distribution entity 20 may send a state data rebalancing command to the third and fourth network service instances, so that the state data of the multiple data streams is redistributed over the third network service instance and the fourth network service instance.
S414: the distribution entity 20 sends a plurality of data flows to the network service instances 30 in the elastically scaled network service cluster 300 according to the updated distribution rules.
When state data synchronization is complete, the distribution entity 20 may determine to enable the updated distribution rules to distribute the multiple data streams. Further, the distribution entity 20 may also delete the original distribution rules. This prevents the original and updated distribution rules from being in effect at the same time, which would cause data stream distribution conflicts.
Specifically, the distribution entity 20 determines the network service instance 30 corresponding to each of the multiple data streams according to the updated distribution rule, where the network service instance 30 corresponding to a data stream is specifically the one storing the state data of that data stream. The distribution entity 20 may then send each of the multiple data streams to its corresponding network service instance in the elastically scaled network service cluster.
Similar to the original distribution rule, the updated distribution rule may hash flow processing data of the data stream, such as a five-tuple, and then determine the network service instance 30 corresponding to the data stream according to the hash value remainder result, so as to distribute the data stream to that network service instance 30. The hash value remainder result is the remainder obtained by dividing the hash value by a divisor, where the divisor may be the number of network service instances 30 in the network service cluster 300. That number may change when the controller 10 horizontally scales the network service cluster 300. Based on this, the updated distribution rule may differ from the original distribution rule in the divisor used in the remainder operation.
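The effect of changing the divisor can be quantified with a short sketch: for uniformly spread hash values, moving from a modulus of 2 to 4 reassigns half of the flows, which is exactly why state data must be redistributed before the updated rule takes effect. The helper name is illustrative:

```python
def remapped_fraction(hashes, old_n, new_n):
    # Fraction of flows whose target instance changes when the modulo divisor
    # moves from old_n (before scaling) to new_n (after scaling).
    moved = sum(1 for h in hashes if h % old_n != h % new_n)
    return moved / len(hashes)

frac = remapped_fraction(range(1000), 2, 4)  # half the flows change instance
```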
In this embodiment, the data streams may be of different types, such as ingress data streams and egress data streams. An ingress data stream is a data stream flowing into the data stream processing system 100, such as a request data stream; an egress data stream is a data stream flowing out of the data stream processing system 100, such as a response data stream.
Due to the change in divisor in the hash remainder operation, in some cases, the ingress data stream before elastic scaling and the egress data stream after elastic scaling may be distributed to different network service instances 30. To this end, distribution entity 20 may also synchronize state data between different network service instances 30.
In some possible implementations, some network services, such as the NAT service, may require that an ingress data stream and its corresponding egress data stream be distributed to the same network service instance. Based on this, the distribution rules may include distribution rules for different types of data streams. For example, for an ingress data stream, the distribution rule may be to hash flow processing data of the data stream, such as a five-tuple, and then determine the network service instance 30 corresponding to the data stream according to the hash value remainder result, so as to distribute the data stream to that network service instance 30. For an egress data stream, the distribution rule may be that the distribution entity 20 finds the corresponding network service instance 30 according to the identifier of the service resource used for sending the ingress data stream corresponding to the egress data stream, where the identifier of the service resource may include at least one of an IP address and a port number.
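The direction-dependent routing above might be sketched as follows. The flow type, the CRC32 stand-in hash, and the assumption that an egress (response) stream is addressed to the translated (IP, port) of the owning instance — so that the lookup key is the egress flow's destination address and port — are all illustrative, not specified by the patent:

```python
import zlib
from collections import namedtuple

Flow = namedtuple("Flow", "src_ip dst_ip proto src_port dst_port")

def route(flow, direction, instances, resource_owner):
    # Ingress: hash-remainder rule over the five-tuple.
    # Egress: look up which instance owns the service resource (IP, port)
    # that was used when the matching ingress flow was sent, so that both
    # directions land on the same instance.
    if direction == "ingress":
        h = zlib.crc32(repr(flow).encode())
        return instances[h % len(instances)]
    return resource_owner[(flow.dst_ip, flow.dst_port)]

instances = ["nsi1", "nsi2"]
owner = {("203.0.113.7", 3500): "nsi2"}          # nsi2 owns this SNAT (IP, port)
resp = Flow("192.0.2.9", "203.0.113.7", "udp", 80, 3500)
target = route(resp, "egress", instances, owner)  # routed back to nsi2
```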
Further, the state data in each network service instance 30 may also be backed up in view of the reliability of the state data. In particular, network service cluster 300 may include at least one service pair, each service pair including two network service instances 30 operating in either master-master mode or master-slave mode.
The master-master mode refers to that each network service instance 30 (e.g., the first network service instance and the second network service instance) in the service pair has independent service resources and independently provides the network service. The active/standby mode means that each network service instance 30 (for example, a first network service instance and a second network service instance) in a service pair has independent service resources, only one network service instance 30 (for example, the first network service instance) works at the same time, and the other network service instance 30 (for example, the second network service instance) serves as a backup to receive state data generated by the network service instance 30 in a working state.
When the first network service instance and the second network service instance operate in master-master mode, they each store the state data of different data streams in order to process those data streams, and they may mutually synchronize their state data, thereby ensuring the security and reliability of the state data. When the first network service instance and the second network service instance operate in active/standby mode, the second network service instance can back up the state data stored in the first network service instance, thereby ensuring data security.
Accordingly, the updated distribution rules may include a distribution table and a forwarding table. In this embodiment, the distributing entity 20 may determine a service pair corresponding to a data flow in the elastically scaled network service cluster 300 according to the distribution table, and then the distributing entity 20 determines a target network service instance in the service pair based on the forwarding table.
In a specific implementation, the distribution entity 20 may determine the hash value of the flow processing data of the data stream, such as a five-tuple, determine the service pair corresponding to the data stream according to the remainder of the hash value and the mapping between hash value remainders and service pairs in the distribution table, and then determine the operating mode of the network service instances 30 in that service pair based on the forwarding table. When the operating mode of the network service instances 30 in the service pair is active/standby, the active network service instance 30 is determined to be the target network service instance. The forwarding table may also include a forwarding policy, for example a load balancing policy. When the operating mode of the network service instances 30 in the service pair is master-master, the distribution entity 20 may determine one network service instance 30 in the service pair to be the target network service instance through the forwarding policy (e.g., a load balancing policy).
It should be further noted that in some application scenarios, for example an ELB scenario, the distribution table corresponding to the ingress data stream and the distribution table corresponding to the egress data stream have the same matching field and may be merged into one distribution table; in other application scenarios, such as NAT scenarios, the two distribution tables have different matching fields and cannot be merged. That is, in scenarios such as NAT, the distribution table may include a first distribution table and a second distribution table, where the first distribution table is used to determine the service pair corresponding to the ingress data stream and the second distribution table is used to determine the service pair corresponding to the egress data stream.
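The two-stage lookup of S412–S414 — distribution table to find the service pair, forwarding table to pick an instance inside it — can be sketched as below. The table shapes, field names, and the parity-based stand-in for the load-balancing policy are assumptions:

```python
def pick_target(hash_remainder, distribution_table, forwarding_table):
    pair_id = distribution_table[hash_remainder]   # stage 1: find the service pair
    entry = forwarding_table[pair_id]              # stage 2: inspect its mode
    if entry["mode"] == "active-standby":
        return entry["active"]                     # only the active instance serves
    # master-master: a forwarding policy picks one member; selecting by the
    # remainder's parity stands in for the load-balancing policy in the text
    return entry["members"][hash_remainder % len(entry["members"])]

distribution_table = {0: "pair1", 1: "pair2"}
forwarding_table = {
    "pair1": {"mode": "active-standby", "active": "nsi1", "members": ["nsi1", "nsi2"]},
    "pair2": {"mode": "master-master", "members": ["nsi3", "nsi4"]},
}
```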
S416: the network service instance 30 in the elastically scaled network service cluster 300 receives at least one data flow and processes the at least one data flow according to the stored state data of the at least one data flow.
Specifically, when the multiple data streams in S414 are ingress data streams, after a network service instance 30 in the elastically scaled network service cluster 300 receives at least one ingress data stream and processes it according to the stored state data, the distribution entity 20 may further receive the egress data stream corresponding to the ingress data stream and send it to the corresponding network service instance 30 according to the updated distribution rule.
In this method, the distribution entity 20 stores the distribution rules for controlling data stream distribution and state data synchronization, and can therefore coordinate the two. Even if the number or processing capability of the network service instances 30 in the network service cluster 300 changes, the distribution entity 20 can control the redistribution of state data on the network service instances 30 through the updated distribution rules, and the multiple data streams can be correctly distributed to the network service instances 30 that hold their state data, realizing uninterrupted data stream processing. The method improves the scalability of the network service cluster 300 and supports its elastic scaling without interrupting the network service during scaling, enabling network services with large scale, high concurrency, and high connection-establishment rates.
In addition, the network service instances 30 in the network service cluster 300 can synchronize the state data as required without performing large-scale state data synchronization, thereby avoiding consuming a large amount of resources, improving the expandability of the network service cluster 300, and reducing the resource consumption of the expanded network service cluster 300.
In order to make the technical solution of the present application clearer and easier to understand, the processing method for multiple data streams provided by the embodiment of the present application will be described below from the perspective of the expansion or contraction of the network service cluster 300.
Referring to the flowchart of the method for processing multiple data streams shown in fig. 5: in the example of fig. 5, the data stream processing system 100 adopts the architecture shown in fig. 3 and includes a controller 10, distribution entities 20, and a network service cluster 300. The network service cluster 300 includes at least one service pair, each service pair including two network service instances. There are multiple distribution entities 20: at least one distribution entity 20 is deployed independently of the network service instances, and the remaining distribution entities 20 are embedded in the network service instances.
In an initial phase, the network service cluster 300 includes one service pair, for example service pair 1, which includes network service instance 1 and network service instance 2 operating in master-master mode. The controller 10 may send the original distribution rule to the distribution entity 20, the distribution entity 20 may send a data stream to network service instance 1 or network service instance 2 in service pair 1 based on the original distribution rule, and network service instance 1 or network service instance 2 may process the data stream based on the stored state data.
When the load of the network service cluster 300 increases, the controller 10 may expand the network service cluster 300. The expanded network service cluster 300 may include two service pairs, for example, service pair 1 and service pair 2, where service pair 2 is a new service pair. Service pair 2 includes a network service instance 3 and a network service instance 4 operating in master-master mode. Wherein the controller 10 may send the updated distribution rule to the distribution entity 20, and the distribution entity 20 may instruct the network service instance in the elastically scaled network service cluster 300 to redistribute the storage of the state data of the plurality of data flows according to the updated distribution rule.
In particular, the distribution entity 20 may generate a state data rebalancing command according to the updated distribution rules and then send it to network service instance 3 in service pair 2. Network service instance 3 executes the state data rebalancing command to synchronize state data from network service instance 1 in service pair 1 as needed. For example, if network service instance 1 stores the state data of 100 data streams, network service instance 3 may obtain the state data of 50 of those data streams from network service instance 1 according to the state data identifiers in the state data rebalancing command. Further, network service instance 4 in service pair 2 can synchronize the state data from network service instance 3 to back it up, thereby avoiding state data loss in case network service instance 3 fails.
The synchronization process mainly concerns the existing state data (also called stock state data); during synchronization, new state data (also called incremental state data) is generated. The new state data specifically refers to the state data generated after the distribution entity 20 issues the state data rebalancing command. The distribution entity 20 may also synchronize the new state data based on both the updated and the original distribution rules, thereby ensuring the reliability of state data storage through redundancy.
It should be noted that, when synchronizing the state data, the distribution entity 20 still uses the original distribution rule to distribute the data stream, for example, the distribution entity 20 may send the data stream to the network service instance 1 or the network service instance 2 according to the original distribution rule, and the network service instance 1 or the network service instance 2 processes the data stream.
After network service instance 3 completes synchronizing the state data as required, the controller 10 may further modify the service resource usage ranges of the network service instances 30 in the scaled-out network service cluster 300. For example, if the network service cluster 300 provides the SNAT service and the service resources it uses include UDP port numbers ranging from 1 to 6000, the controller 10 may configure the UDP port numbers used by the network service instances in service pair 1 to range from 1 to 3000 (excluding 3000), and those used by the network service instances in service pair 2 to range from 3000 to 6000. In this way, the service resources used by the two service pairs do not overlap. In addition, the distribution rule may carry the usage range of the service resources, so that when distributing data streams, the distribution entity 20 can deliver a data stream that uses a service resource in a specific range, or the state data of that data stream, to the correct network service instance 30.
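The port-range partitioning in this SNAT example can be sketched as a simple split into non-overlapping half-open ranges; the helper and the half-open convention are illustrative:

```python
def split_port_range(start, stop, num_pairs):
    # Split [start, stop) into num_pairs contiguous half-open sub-ranges
    # [lo, hi), mirroring the SNAT example: ports 1..5999 split across two
    # service pairs as [1, 3000) and [3000, 6000), so resources never overlap.
    step = (stop - start) // num_pairs
    bounds = [start + i * step for i in range(num_pairs)] + [stop]
    return [(bounds[i], bounds[i + 1]) for i in range(num_pairs)]

ranges = split_port_range(1, 6000, 2)  # [(1, 3000), (3000, 6000)]
```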
The distribution entity 20 may then determine to enable the updated distribution rules to distribute the data streams, and, to avoid data stream distribution conflicts, delete the original distribution rules. Specifically, the updated distribution rule includes a scope field; modifying the scope field brings the updated distribution rule into effect, so that it can be used for data stream distribution.
Referring next to the flowchart of the method for processing multiple data streams shown in fig. 6: in the example of fig. 6, the data stream processing system 100 adopts the architecture shown in fig. 3 and includes a controller 10, distribution entities 20, and a network service cluster 300. The network service cluster 300 includes at least one service pair, each service pair including two network service instances. There are multiple distribution entities 20: at least one distribution entity 20 is deployed independently of the network service instances, and the remaining distribution entities 20 are embedded in the network service instances.
In an initial phase, the network service cluster 300 includes two service pairs, for example, service pair 1 and service pair 2, where service pair 1 includes network service instance 1 and network service instance 2 operating in the master-master mode, and service pair 2 includes network service instance 3 and network service instance 4 operating in the master-master mode.
The controller 10 may send the original distribution rule to the distribution entity 20, the distribution entity 20 may send the data stream to the network service instance 1, the network service instance 2 in the service pair 1 or the network service instance 3, the network service instance 4 in the service pair 2 based on the original distribution rule, and the network service instance 1, the network service instance 2, the network service instance 3 or the network service instance 4 may process the data stream based on the stored state data.
When the load of the network service cluster 300 decreases, the controller 10 may scale in the network service cluster 300. The scaled-in network service cluster 300 may include one service pair, for example service pair 1, with service pair 2 being deleted.
The controller 10 may send the updated distribution rules to the distribution entity 20, and the distribution entity 20 may instruct the network service instances 30 in the elastically scaled network service cluster 300 to redistribute the storage of the state data of the plurality of data flows according to the updated distribution rules.
In particular, the distribution entity 20 may generate a state data rebalancing command according to the updated distribution rules and then send it to network service instance 1 in service pair 1. Network service instance 1 executes the state data rebalancing command to synchronize state data from network service instance 3 in service pair 2 as needed. Further, network service instance 2 in service pair 1 can synchronize the state data from network service instance 1 to back it up, thereby avoiding state data loss in case network service instance 1 fails.
Further, new state data may also be generated after the distribution entity 20 issues the state data rebalancing command. The distribution entity 20 may also synchronize the new state data based on both the updated and the original distribution rules, thereby ensuring the reliability of the state data through redundancy.
It should be noted that, when synchronizing the state data, the distribution entity 20 still uses the original distribution rule to distribute the data stream, for example, the distribution entity 20 may send the data stream to the network service instance 1, the network service instance 2, the network service instance 3, or the network service instance 4 according to the original distribution rule, and the data stream is processed by these network service instances.
In some possible implementations, the updated distribution rule may include a scope field, and the distribution entity 20 may directly modify the scope field so that the updated distribution rule can be used for data stream distribution. It should be noted that the updated distribution rule may also be modified by the controller 10, which then issues the modified distribution rule to the distribution entity 20, so that the distribution entity 20 distributes the data streams based on the modified rule.
The controller 10 may then modify the usage ranges of the service resources of the network service instances 30 in the scaled-in network service cluster 300. It should be noted that the controller 10 may wait for a period of time (e.g., 10 seconds) to drain the data streams in service pair 2 before modifying the usage ranges. For example, if the network service cluster 300 provides the SNAT service and the service resources it uses include UDP port numbers ranging from 1 to 6000, then after the network service cluster 300 is scaled in, the controller 10 may modify the range of UDP port numbers used by the network service instances in service pair 1 to 1 to 6000, that is, service pair 1 monopolizes all the port resources.
Finally, to avoid data stream distribution conflicts, the distribution entity 20 may delete the original distribution rules and distribute the data streams via the updated distribution rules.
The dashed connection lines in fig. 5 and 6 are used to indicate the transfer of control information (e.g., distribution rules and status data), and the solid connection lines in fig. 5 and 6 are used to indicate the transfer of data streams.
For stateful network services, the state data of a data flow typically has a lifecycle. During this lifecycle, the state data generally needs to be maintained to avoid it being aged out and deleted by mistake, which would leave the network service instance 30 unable to process the data stream.
Specifically, the network service instance 30 may update the status data when a data stream arrives. Because the ingress data flow and the egress data flow may be distributed to different network service instances 30 when the network service cluster 300 is scaled, when the arriving data flow is an egress data flow, the network service instance 30 may further determine whether the egress data flow and the corresponding ingress data flow are processed by network service instances in the same service pair.
In some possible implementations, the network service instance 30 may look up the distribution rule according to the key of the egress data flow (e.g., five-tuple: source IP, destination IP, protocol number, source port, destination port) and the key of the ingress data flow (e.g., five-tuple: source IP, destination IP, protocol number, source port, destination port), and determine the Identity (ID) of the service pair corresponding to the ingress data flow and the ID of the service pair corresponding to the egress data flow based on the distribution rule.
If the IDs are consistent, the egress data flow and the corresponding ingress data flow are processed by network service instances in the same service pair. In that case, the network service instance only needs to send a status update message, containing the updated status data, to the other network service instance within its service pair. If the IDs are not consistent, the two flows are processed by network service instances in different service pairs, and the network service instance needs to send the status update message not only to the other network service instance within its own service pair but also to a network service instance within the other service pair.
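The service-pair consistency check described above can be sketched as follows. This is a minimal illustration under assumed data shapes (the FlowKey type, the table contents, and the function names are hypothetical, not from the patent): the distribution rule maps a flow's destination IP to a service pair ID, and comparing the ingress and egress pair IDs decides where the status update message must be sent.

```python
from typing import NamedTuple

class FlowKey(NamedTuple):
    """Five-tuple key of a flow, as used to look up the distribution rule."""
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int

# Simplified distribution rule: destination IP -> service pair ID.
DISTRIBUTION_RULE = {
    "VIP": 1,    # ingress flows addressed to the cluster VIP
    "LIP1": 1,   # egress flows translated by service pair 1
    "LIP2": 2,   # egress flows translated by service pair 2
}

def service_pair_id(key: FlowKey) -> int:
    """Look up the service pair that owns a flow by its destination IP."""
    return DISTRIBUTION_RULE[key.dst_ip]

def state_sync_targets(ingress: FlowKey, egress: FlowKey) -> list:
    """Return the service pairs that must receive the status update message.

    If ingress and egress map to the same pair, only the peer instance in
    that pair needs the update; otherwise the other pair needs it as well.
    """
    in_id = service_pair_id(ingress)
    out_id = service_pair_id(egress)
    return [in_id] if in_id == out_id else [out_id, in_id]
```

With the table above, an egress flow whose destination IP is LIP1 yields one sync target (its own pair), while one whose destination IP is LIP2 yields two.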
Next, the method is described using an example in which the network service instance 30 is an ELB gateway providing load balancing and NAT services.
Referring to the flowchart of the method for processing multiple data streams shown in fig. 7, the data stream processing system 100 includes a controller 10, a distribution entity 20, and a network service cluster 300. Here, the network service cluster 300 is specifically an ELB gateway cluster, which includes at least one ELB gateway pair. Each ELB gateway pair includes two ELB gateways, which may operate in master-master mode.
A client on the external network and servers on the internal network each establish communication with the data stream processing system 100. The client on the external network (IP address denoted EIP) accesses the servers on the internal network; because the access volume is large, multiple servers are required, with load balancing, to provide the service to external users. In this embodiment, three servers are used as an example, with IP addresses SIP1 (server IP), SIP2, and SIP3, respectively.
In the initial stage, the data stream processing system 100 includes one ELB gateway pair, for example ELB gateway pair 1, which includes ELB gateway 1 and ELB gateway 2. The client on the external network can access the servers on the internal network through the public network IP (denoted VIP) provided by ELB gateway pair 1. Specifically, when the client triggers an operation that accesses a server on the internal network, a data stream is generated, which is first distributed by the distribution entity 20 to ELB gateway pair 1 (e.g., to ELB gateway 1 in ELB gateway pair 1). At this point, the source IP, destination IP, source port number, and destination port number of the packets in the data stream are assumed to be EIP, VIP, EPORT, and VPORT, respectively.
ELB gateway 1 converts the source IP of the packets in the data flow to its local IP (LIP), selects an available port number (denoted LPORT) from the TCP or UDP port number resources ranging from 0 to 65535 as the source port number, and, based on load balancing, converts the destination IP (VIP) of the packets to a server IP (SIP).
In particular, the ELB gateway may balance different data flows arriving at the same gateway to different servers according to a scheduling algorithm, such as round robin. Assume that after conversion by the ELB gateway, the source IP, destination IP, source port number, and destination port number of the packet are LIP, SIP, LPORT, and VPORT, respectively, where SIP is one of SIP1, SIP2, and SIP3. In this way, the client on the external network can access the servers on the internal network.
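As an illustration of the address translation just described, the following sketch rewrites a packet's four address fields the way an ELB gateway would: the source becomes (LIP, LPORT), the destination VIP becomes a round-robin-selected SIP, and VPORT is preserved. The scheduler, port pool, and all names are assumptions made for this example, not the patent's implementation.

```python
import itertools

SERVERS = ["SIP1", "SIP2", "SIP3"]      # intranet server IPs
_round_robin = itertools.cycle(SERVERS)  # round-robin scheduling algorithm
_next_lport = itertools.count(1024)      # assumed pool of free local ports

def translate(packet: dict, local_ip: str = "LIP") -> dict:
    """Rewrite (EIP, VIP, EPORT, VPORT) into (LIP, SIP, LPORT, VPORT)."""
    return {
        "src_ip": local_ip,                # source IP -> gateway local IP
        "dst_ip": next(_round_robin),      # destination VIP -> a chosen SIP
        "src_port": next(_next_lport),     # fresh LPORT from the port pool
        "dst_port": packet["dst_port"],    # VPORT is preserved
    }
```

Successive flows arriving at the same gateway are thus spread across SIP1, SIP2, and SIP3 in turn.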
When there are more clients of the external network accessing the servers of the internal network, the controller 10 in the data stream processing system 100 may further expand the network service cluster 300 (e.g., the ELB gateway cluster). The expanded network service cluster 300 may include two ELB gateway pairs, such as ELB gateway pair 1 and ELB gateway pair 2. The ELB gateway pair 2 is a newly added ELB gateway pair, and the newly added ELB gateway pair includes two ELB gateways operating in the master-master mode, specifically, an ELB gateway 3 and an ELB gateway 4.
Prior to capacity expansion, the distribution entity 20 receives the data stream and distributes it based on the original distribution rules. The original rules comprise the following distribution table and forwarding table:
Table 1 Distribution table before expansion
Match field: destination IP | Action field: service pair ID | Scope
Virtual IP of ELB gateways (VIP) | ID of service pair 1 | False
Local IP of ELB gateway (LIP1) | ID of service pair 1 | False
Table 2 Forwarding table before expansion
Match field: service pair ID | Action field: destination IP list, policy
ID of service pair 1 | [IP of ELB gateway 1, IP of ELB gateway 2], policy: load balancing
The distribution entity 20 may find the ID of the service pair to which a data flow belongs, such as the ID of service pair 1, by matching the virtual IP (VIP) or local IP (LIP) of the ELB gateway against the distribution table in Table 1. The distribution entity 20 then searches the forwarding table by that service pair ID to obtain the corresponding forwarding entry, which stores the IPs of all ELB gateways in the service pair. Finally, the distribution entity 20 distributes the data flow to one of the ELB gateways corresponding to these IP addresses based on the forwarding policy set in the forwarding table, such as a load balancing policy.
During capacity expansion, the data stream processing system 100 may perform the following steps:
Step 1: the controller 10 creates ELB gateways.
Specifically, the controller may create ELB gateway 3 and ELB gateway 4. The source IP of ELB gateway 3 and ELB gateway 4 may be configured as LIP2, and the destination IP may be configured as the IP of a server on the internal network, such as SIP1, SIP2, or SIP3. The TCP and UDP source ports range from 0 to 6000.
Step 2: the controller 10 issues the updated distribution rule.
The updated distribution rules include the following distribution tables and forwarding tables:
Table 3 Distribution table after step 2 is performed
Match field: destination IP | Action field: service pair ID | Scope
Virtual IP of ELB gateways (VIP) | ID of service pair 1 | False
Local IP of ELB gateway (LIP1) | ID of service pair 1 | False
Virtual IP of ELB gateways (VIP) | ID of service pair 1 / ID of service pair 2 (five-tuple hash) | True
Local IP of ELB gateway (LIP1) | ID of service pair 1 | True
Local IP of ELB gateway (LIP2) | ID of service pair 2 | True
Compared with the distribution table before expansion, the distribution table after step 2 has three additional entries, whose scope is configured as True, identifying that they are used for state data synchronization and not for data stream distribution. The first new entry (entry 3 in Table 3) distributes data flows whose destination IP is the VIP evenly to service pair 1 and service pair 2 by five-tuple hashing. The second and third new entries (entries 4 and 5 in Table 3) match egress data flows. If a data flow reaches service pair 1, LIP1 is used as the source address when converting the flow; the corresponding egress data flow then has destination IP LIP1, is processed at service pair 1, and is distributed to service pair 1 by the distribution entity 20. If a data flow reaches service pair 2, LIP2 is used as the source address; the corresponding egress data flow then has destination IP LIP2, is processed at service pair 2, and is distributed to service pair 2 by the distribution entity 20.
Table 4 Forwarding table after step 2 is performed
Match field: service pair ID | Action field: destination IP list, policy
ID of service pair 1 | [IP of ELB gateway 1, IP of ELB gateway 2], policy: load balancing
ID of service pair 2 | [IP of ELB gateway 3, IP of ELB gateway 4], policy: load balancing
Compared with the forwarding table before expansion, the forwarding table after step 2 has one additional entry, whose function is to evenly send data streams arriving at service pair 2 to ELB gateway 3 and ELB gateway 4.
Step 3: the distribution entity 20 issues a status data rebalancing command, and the ELB gateways execute it to synchronize the status data.
The distribution entity 20 may generate the status data rebalancing command from the table entries added in step 2 and issue it to the network service instances, i.e., the ELB gateways. The status data rebalancing command is used to balance inventory status data (i.e., status data generated before the command was issued).
On executing the status data rebalancing command, an ELB gateway traverses the inventory status data, re-hashes the protocol fields of each entry, and redistributes and stores the inventory status data across the service pairs, thereby rebalancing the inventory status data.
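The traverse, re-hash, and redistribute procedure can be sketched as follows, under assumed data shapes (state entries keyed by their flow five-tuples; all names are illustrative, not from the patent).

```python
import hashlib

def rehash_pair(five_tuple: tuple, pairs: list) -> str:
    """Map a flow five-tuple to a service pair with a stable hash."""
    digest = hashlib.sha256(repr(five_tuple).encode()).digest()
    return pairs[digest[0] % len(pairs)]

def rebalance(inventory: dict, pairs: list) -> dict:
    """Traverse the stored (inventory) state data and redistribute every
    entry to the service pair the new hash assigns it to."""
    buckets = {p: {} for p in pairs}
    for five_tuple, state in inventory.items():
        buckets[rehash_pair(five_tuple, pairs)][five_tuple] = state
    return buckets
```

After a scale-out from one pair to two, running this over the existing state splits it between the pairs exactly as new flows will be split by the updated distribution rule.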
Further, the status data rebalancing command is also used to balance incremental status data (status data generated after the command is issued). Specifically, an ELB gateway may synchronize the incremental state data according to both the original entries and the newly added entries in the distribution table.
Step 4: the controller 10 modifies the configuration information of the ELB gateways.
After the inventory status data has been redistributed, the controller 10 may modify the configuration information of service pair 1. In this embodiment, the configuration information of the ELB gateways in service pair 1 does not need to change: they still use the internal servers SIP1, SIP2, and SIP3 as destination IPs, LIP1 as the source IP, and source ports 0-6000 for the load balancing service.
It should be noted that step 4 is optional. When the network service instance is an ELB gateway, the controller 10 does not need to modify the configuration information; when the network service instance is a NAT gateway, the controller 10 generally does need to, for example by narrowing the source port range of service pair 1 from 0-6000 to 0-3000 (excluding 3000), with port numbers in the range 3000-6000 allocated to service pair 2.
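The port-range split described for the NAT case can be illustrated as follows; the helper and its signature are assumptions made for the example.

```python
def split_port_range(full: range, num_pairs: int) -> list:
    """Divide a source-port range evenly among service pairs.

    With full = range(0, 6000) and two pairs, pair 1 keeps 0-3000
    (excluding 3000) and pair 2 gets 3000-6000, as in step 4 above.
    """
    width = len(full) // num_pairs
    return [range(full.start + i * width,
                  full.start + (i + 1) * width if i < num_pairs - 1 else full.stop)
            for i in range(num_pairs)]
```

Disjoint port ranges ensure that translated flows from different service pairs can never collide on the same (LIP, LPORT) source.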
Step 5: the controller 10 modifies the updated distribution rule and issues the modified distribution rule.
After the configuration information of the ELB gateways has been successfully modified, the controller 10 may delete the entries whose scope is False from the distribution table produced in step 2, and change the scope of the entries whose scope is True to False, obtaining the modified distribution table. Together with the forwarding table, this yields the modified distribution rule, which the controller 10 may issue for data stream forwarding and state synchronization.
The distribution table after step 5 is performed is as follows:
Table 5 Distribution table after step 5 is performed
Match field: destination IP | Action field: service pair ID | Scope
Virtual IP of ELB gateways (VIP) | ID of service pair 1 / ID of service pair 2 (five-tuple hash) | False
Local IP of ELB gateway (LIP1) | ID of service pair 1 | False
Local IP of ELB gateway (LIP2) | ID of service pair 2 | False
The forwarding table after performing step 5 is as follows:
Table 6 Forwarding table after step 5 is performed
Match field: service pair ID | Action field: destination IP list, policy
ID of service pair 1 | [IP of ELB gateway 1, IP of ELB gateway 2], policy: load balancing
ID of service pair 2 | [IP of ELB gateway 3, IP of ELB gateway 4], policy: load balancing
Compared with the forwarding table after step 2, the match field of the second entry in the forwarding table after step 5 is changed from the ID of service pair 1 to the ID of service pair 2.
It should be noted that, instead of issuing a new distribution rule, the controller 10 may have the distribution entity 20 directly modify the field values of the distribution table produced in step 2 to obtain the modified distribution table, and then derive the modified distribution rule from the modified distribution table and the forwarding table.
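The rule modification in step 5 (deleting the scope-False entries and flipping the scope-True entries to False) can be sketched as follows, assuming each distribution-table entry is represented as a dict with match, action, and scope keys (an illustrative shape, not from the patent).

```python
def finalize_rules(distribution_table: list) -> list:
    """Delete the pre-expansion entries (scope == False) and activate the
    newly added entries by flipping their scope from True to False."""
    return [dict(entry, scope=False)
            for entry in distribution_table
            if entry["scope"]]
```

Applying this to a table holding Table 3's entries leaves exactly Table 5's entries: the synchronization-only entries become the active distribution entries in one pass.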
Tables 5 and 6 are the distribution table and forwarding table after capacity expansion. The distribution entity 20 may distribute data streams to the corresponding ELB gateway based on these tables, and after address translation by the ELB gateway, the data streams are balanced to the back-end servers, thereby implementing server access.
In some possible implementations, when fewer clients on the external network are accessing the servers on the internal network, the controller 10 in the data stream processing system 100 may also scale in the network service cluster 300 (e.g., the ELB gateway cluster). The scaled-in network service cluster 300 includes one ELB gateway pair, such as ELB gateway pair 1, with ELB gateway pair 2 being deleted.
The scale-in process of the network service cluster 300 is as follows:
step 1: the controller 10 issues the updated distribution rule.
The updated distribution rule includes a distribution table and a forwarding table, where the forwarding table is unchanged from before the scale-in, as shown in Table 6. The distribution table is as follows:
Table 7 Distribution table after step 1 is performed
Match field: destination IP | Action field: service pair ID | Scope
Virtual IP of ELB gateways (VIP) | ID of service pair 1 / ID of service pair 2 (five-tuple hash) | False
Local IP of ELB gateway (LIP1) | ID of service pair 1 | False
Local IP of ELB gateway (LIP2) | ID of service pair 2 | False
Virtual IP of ELB gateways (VIP) | ID of service pair 1 | True
Local IP of ELB gateway (LIP1) | ID of service pair 1 | True
Local IP of ELB gateway (LIP2) | ID of service pair 1 | True
In this distribution table, in addition to the entries from before the scale-in, three entries are added, whose scope value is True, identifying that they are used for state data synchronization and not for data stream distribution. The match field of each new entry is a destination IP, taking the values VIP, LIP1, and LIP2 respectively; the action field is a service pair ID, in each case the ID of service pair 1.
Step 2: the distribution entity 20 issues a status data rebalancing command, and the ELB gateway executes it.
After the controller 10 issues the updated distribution rule to the distribution entity 20, the distribution entity 20 may generate a status data rebalancing command according to the updated distribution rule, and then issue the status data rebalancing command to an ELB gateway, such as the ELB gateway 1.
On executing the state data rebalancing command, ELB gateway 1 traverses the inventory state data according to the updated distribution rule, re-hashes the protocol fields of each entry, and redistributes and stores the inventory state data across the service pairs, thereby rebalancing the inventory state data.
Further, the state data rebalancing command is also used to balance incremental state data. Specifically, ELB gateway 1 may synchronize the incremental state data according to both the original entries and the newly added entries in the distribution table.
Step 3: the controller 10 issues the modified distribution rule.
After the inventory status data has been rebalanced, the controller 10 may modify the updated distribution rules. For example, the controller may delete the entries whose scope is False from the distribution table in Table 7, and then change the scope of the entries whose value is True to False, obtaining the modified distribution table. The controller 10 may also delete the forwarding table entry associated with service pair 2 to obtain the modified forwarding table. The modified distribution rule includes the modified distribution table and the modified forwarding table, as follows:
Table 8 Modified distribution table
Match field: destination IP | Action field: service pair ID | Scope
Virtual IP of ELB gateways (VIP) | ID of service pair 1 | False
Local IP of ELB gateway (LIP1) | ID of service pair 1 | False
Local IP of ELB gateway (LIP2) | ID of service pair 1 | False
Table 9 Modified forwarding table
Match field: service pair ID | Action field: destination IP list, policy
ID of service pair 1 | [IP of ELB gateway 1, IP of ELB gateway 2], policy: load balancing
The distribution entity 20 may distribute data streams to the ELB gateways in the scaled-in network service cluster 300 based on the modified distribution rule, and the ELB gateways send the data streams to the servers on the internal network through the load balancing policy.
The method for processing multiple data streams provided by the embodiment of the present application is described in detail above with reference to fig. 1 to 7, and the data stream processing system 100 and the distribution entity 20 provided by the embodiment of the present application are described below with reference to the accompanying drawings.
Referring to the schematic structural diagram of the data stream processing system 100 shown in fig. 1, the data stream processing system 100 includes: a controller 10, a distribution entity 20, and a network service cluster 300, where the network service cluster 300 includes at least one network service instance 30, each storing state data of part of the data streams. The functions of the components of the data stream processing system 100, such as the controller 10, the distribution entity 20, and the network service instances 30 in the network service cluster 300, are described in detail below.
The controller 10 is configured to perform elastic expansion and contraction on the network service cluster, and send an updated distribution rule to the distribution entity;
the distribution entity 20 is configured to instruct, according to the updated distribution rule, the network service instance in the network service cluster after elastic scaling to redistribute storage of state data of the multiple data flows, and send the multiple data flows to the network service instance in the network service cluster after elastic scaling according to the updated distribution rule;
the network service instances 30 in the elastically scaled network service cluster 300 are configured to receive at least one data flow and process the at least one data flow according to the stored state data of the at least one data flow.
In some possible implementations, the controller 10 is further configured to:
before the network service cluster 300 is elastically scaled, sending an original distribution rule to the distribution entity 20;
the distribution entity 20 is further configured to:
the plurality of data streams are sent to the network service instances 30 in the network service cluster 300 according to the original distribution rules.
In some possible implementations, the distribution entity 20 is specifically configured to:
the distribution entity determines to enable the updated distribution rule to distribute the plurality of data streams, deleting the original distribution rule.
In some possible implementations, the controller 10 is further configured to:
after the network service cluster 300 is elastically scaled, modifying the service range of the traffic resources of the network service instances 30 in the elastically scaled network service cluster 300.
In some possible implementations, the traffic resource includes at least one of an internet protocol IP address and a port number.
In some possible implementations, the distribution entity 20 is specifically configured to:
sending the ingress data stream to a network service instance 30 in the elastically scaled network service cluster 300 according to the updated distribution rule;
the network service instances 30 in the elastically scaled network service cluster 300 are configured to:
receive at least one data stream and process the at least one data stream according to the stored state data of the at least one data stream; and
the distribution entity 20 is further specifically configured to:
receive an egress data flow corresponding to the ingress data flow, and send the egress data flow to the network service instance 30 according to the updated distribution rule.
In some possible implementations, the web service cluster 300 includes at least one service pair, each service pair includes two web service instances 30 operating in a master-master mode or a master-slave mode, and the updated distribution rule includes a distribution table and a forwarding table;
the distribution entity 20 is specifically configured to:
determining a service pair corresponding to a data stream in the elastically scaled network service cluster according to the distribution table;
determining a target network service instance in the service pair according to the forwarding table;
and sending the data stream to the target network service instance.
In some possible implementations, the at least one network service instance 30 includes any one or more of a resilient load balancing gateway, a network address translation gateway, a network intrusion detection gateway, a traffic flushing gateway, a virtual private network gateway, a multicast gateway, and an anycast gateway.
It should be noted that fig. 1 illustrates the distribution entity 20 as independent of the network service instances 30. In some possible implementations, the distribution entity 20 of the data stream processing system 100 may instead be deployed embedded in a network service instance 30, as shown in fig. 2. In other possible implementations, the data stream processing system 100 may include multiple distribution entities 20, with at least one distribution entity 20 independent of the network service instances 30 and the remaining distribution entities 20 deployed embedded in network service instances 30.
The data stream processing system 100 according to the embodiment of the present application may correspond to perform the method described in the embodiment of the present application, and the above and other operations and/or functions of the components of the data stream processing system 100, such as the controller 10, the distribution entity 20, and the network service instance 30 in the network service cluster 300, are respectively for implementing the corresponding flows of the methods in the embodiment shown in fig. 4, and are not described herein again for brevity.
Next, referring to the structural schematic diagram of the distribution entity 20 shown in fig. 1, the distribution entity 20 includes:
a communication unit 202 for receiving the updated distribution rule transmitted by the controller 10.
A distribution unit 204, configured to instruct, according to the updated distribution rule, the network service instances 30 in the elastically scaled network service cluster 300 to redistribute the storage of the state data of the multiple data flows.
The distribution unit 204 is further configured to invoke the communication unit 202 to send the multiple data streams to the network service instances 30 in the elastically scaled network service cluster 300 according to the updated distribution rule.
In some possible implementations, the communication unit 202 is further configured to: the original distribution rules sent by the controller 10 are received.
The distributing unit 204 is further configured to invoke the communicating unit 202 to send the multiple data streams to the network service instances 30 in the network service cluster 300 according to the original distribution rule.
In some possible implementations, the distribution entity 20 further includes:
a rule management unit 206, configured to store the distribution rule.
In some possible implementations, the rule management unit 206 is further configured to:
deleting the original distribution rule upon determining that the updated distribution rule is enabled to distribute the plurality of data streams.
In some possible implementations, the distribution unit 204 is specifically configured to:
invoking the communication unit 202 to send the ingress data stream to a network service instance 30 in the elastically scaled network service cluster 300 according to the updated distribution rule.
The communication unit 202 is further configured to: the network service instance 30 in the network service cluster 300 after elastic expansion receives at least one data flow, and receives an egress data flow corresponding to the ingress data flow after processing the at least one data flow according to the stored state data of the at least one data flow.
The distributing unit 204 is further configured to:
invoking the communication unit 202 to send the egress data stream to the network service instance 30 according to the updated distribution rule.
In some possible implementations, the web service cluster 300 includes at least one service pair, each service pair includes two web service instances 30 operating in a master-master mode or a master-slave mode, and the updated distribution rule includes a distribution table and a forwarding table.
The distributing unit 204 is specifically configured to:
determine, according to the distribution table, the service pair corresponding to a data stream in the elastically scaled network service cluster 300; determine a target network service instance in the service pair according to the forwarding table; and invoke the communication unit 202 to send the data stream to the target network service instance.
In some possible implementations, the at least one network service instance 30 includes any one or more of a resilient load balancing gateway, a network address translation gateway, a network intrusion detection gateway, a traffic flushing gateway, a virtual private network gateway, a multicast gateway, and an anycast gateway.
The distribution entity 20 according to the embodiment of the present application may correspond to perform the method described in the embodiment of the present application, and the above and other operations and/or functions of each module/unit of the distribution entity 20 are respectively for implementing corresponding flows of each method performed by the distribution entity 20 in the embodiment shown in fig. 4, and are not described herein again for brevity.
The embodiment of the application also provides a computing device 800. The computing device 800 may be a server, such as a cloud server in a cloud environment, or a local server in a local data center. In some embodiments, the computing device 800 may be a desktop, laptop, etc. terminal. The computing device 800 is particularly adapted to carry out the functions of the distribution entity 20 in the embodiment shown in fig. 1.
Fig. 8 provides a schematic diagram of a computing device 800, as shown in fig. 8, the computing device 800 including a bus 801, a processor 802, a communication interface 803, and a memory 804. The processor 802, memory 804, and communication interface 803 communicate over a bus 801.
The bus 801 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 8, but that does not indicate only one bus or one type of bus.
The processor 802 may be any one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Micro Processor (MP), a Digital Signal Processor (DSP), and the like.
The communication interface 803 is used for communication with the outside. For example, the communication interface 803 may be used to receive updated distribution rules sent by the controller 10, or to receive original distribution rules sent by the controller 10, or the like.
The memory 804 may include volatile memory (volatile memory), such as Random Access Memory (RAM). The memory 804 may also include a non-volatile memory (non-volatile memory), such as a read-only memory (ROM), a flash memory, a Hard Disk Drive (HDD), or a Solid State Drive (SSD).
The memory 804 stores executable code that the processor 802 executes to perform a method of processing multiple data streams.
Specifically, where the embodiment shown in fig. 1 is implemented and the units of the distribution entity 20 described in that embodiment are implemented in software, the software or program code of those units may be stored in the memory 804. The processor 802 executes the program code corresponding to each unit stored in the memory 804 to perform the aforementioned method of processing multiple data streams.
The embodiment of the present application further provides a computing device cluster 80. Referring to the structural schematic diagram of the computing device cluster 80 shown in fig. 9, the computing device cluster 80 may include at least one computing device 800 shown in fig. 8 and is used to implement the functions of the data stream processing system 100 shown in fig. 1. At least one computing device 800 in the computing device cluster 80 is configured to implement the controller 10, at least one computing device 800 is configured to implement the distribution entity 20, and at least one computing device 800 is configured to implement the network service instances 30 of the data stream processing system 100 shown in fig. 1. Any two, or all three, of the controller 10, the distribution entity 20, and the network service instance 30 may run on the same computing device 800 in the computing device cluster 80.
The embodiment of the application also provides a computer-readable storage medium. The computer-readable storage medium may be any available medium that a computing device can store, or a data storage device such as a data center containing one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state drive), among others. The computer-readable storage medium includes instructions that instruct a computing device to perform the above method for processing multiple data streams.
The embodiment of the application also provides a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computing device, the processes or functions described in the embodiments of the present application are produced, in whole or in part.
The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, or data center to another website, computer, or data center over a wired connection (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, or microwave).
The computer program product may be a software installation package, which can be downloaded and executed on a computing device when any of the aforementioned methods for processing multiple data streams is required.
The description of the flow or structure corresponding to each of the above drawings has its own emphasis; for a part not described in detail in a certain flow or structure, reference may be made to the related description of the other flows or structures.
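The core mechanism described above - the controller elastically scales the cluster and pushes an updated distribution rule, after which the distribution entity moves per-flow state data to the instances the new rule assigns and then forwards streams under that rule - can be sketched as follows. This is a minimal, non-normative Python illustration; all names (`ServiceInstance`, `DistributionEntity`, `flow_state`) are hypothetical and are not taken from the patent.

```python
class ServiceInstance:
    """A stateful network service instance holding state data per data stream."""
    def __init__(self, name):
        self.name = name
        self.flow_state = {}  # flow_id -> state data for that stream

    def process(self, flow_id, packet):
        # Stateful processing requires that this instance owns the flow's state.
        state = self.flow_state.setdefault(flow_id, {"packets": 0})
        state["packets"] += 1
        return state


class DistributionEntity:
    """Forwards each data stream to an instance according to the current rule."""
    def __init__(self, rule):
        self.rule = rule  # distribution rule: flow_id -> ServiceInstance

    def redistribute_state(self, updated_rule):
        # Move each flow's stored state to the instance the updated rule
        # assigns it to, so processing stays consistent after scaling.
        for flow_id, old_inst in self.rule.items():
            new_inst = updated_rule[flow_id]
            if new_inst is not old_inst:
                new_inst.flow_state[flow_id] = old_inst.flow_state.pop(flow_id, {"packets": 0})
        # Enable the updated rule and drop the original one.
        self.rule = updated_rule

    def dispatch(self, flow_id, packet):
        return self.rule[flow_id].process(flow_id, packet)
```

Scaling out from one instance to two then amounts to building an updated rule that maps some flows to the new instance and calling `redistribute_state` before dispatching further packets.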

Claims (19)

1. A method for processing a plurality of data streams, the method being applied to a data stream processing system, the data stream processing system comprising a controller, a distribution entity and a network service cluster, wherein the network service cluster comprises at least one network service instance, and each network service instance stores state data of a portion of the data streams, the method comprising:
the controller elastically scales the network service cluster;
the controller sends an updated distribution rule to the distribution entity;
the distribution entity instructs the network service instances in the elastically scaled network service cluster to redistribute the storage of the state data of the plurality of data streams according to the updated distribution rule;
the distribution entity sends the plurality of data streams to the network service instances in the elastically scaled network service cluster according to the updated distribution rule; and
a network service instance in the elastically scaled network service cluster receives at least one data stream and processes the at least one data stream according to the stored state data of the at least one data stream.
2. The method of claim 1, wherein before the controller elastically scales the network service cluster, the method further comprises:
the controller sends the original distribution rule to the distribution entity;
and the distribution entity sends the plurality of data streams to the network service instances in the network service cluster according to the original distribution rule.
3. The method of claim 2, wherein the sending, by the distribution entity, the plurality of data streams to the network service instances in the elastically scaled network service cluster according to the updated distribution rule comprises:
the distribution entity determines to enable the updated distribution rule to distribute the plurality of data streams, and deletes the original distribution rule.
4. The method of any of claims 1 to 3, wherein after the controller elastically scales the network service cluster, the method further comprises:
and the controller modifies the service resource use range of the network service instance in the network service cluster after elastic expansion.
5. The method of claim 4, wherein the service resources comprise at least one of an Internet Protocol (IP) address and a port number.
6. The method of any of claims 1 to 5, wherein the sending, by the distribution entity, the plurality of data streams to the network service instances in the elastically scaled network service cluster according to the updated distribution rule comprises:
the distribution entity sends an ingress data stream to a network service instance in the elastically scaled network service cluster according to the updated distribution rule;
and after the network service instance in the elastically scaled network service cluster receives at least one data stream and processes the at least one data stream according to the stored state data of the at least one data stream, the method further comprises:
the distribution entity receives an egress data stream corresponding to the ingress data stream, and sends the egress data stream to the network service instance according to the updated distribution rule.
7. The method of any of claims 1 to 6, wherein the network service cluster comprises at least one service pair, each service pair comprising two network service instances operating in active-active mode or active-standby mode, and the updated distribution rule comprises a distribution table and a forwarding table;
the sending, by the distribution entity, the plurality of data streams to the network service instances in the elastically scaled network service cluster according to the updated distribution rule comprises:
the distribution entity determines, according to the distribution table, a service pair corresponding to a data stream in the elastically scaled network service cluster;
the distribution entity determines a target network service instance in the service pair according to the forwarding table;
the distribution entity sends the data stream to the target network service instance.
8. The method of any of claims 1 to 7, wherein the at least one network service instance comprises any of an elastic load balancing gateway, a network address translation gateway, a network intrusion detection gateway, a traffic scrubbing gateway, a virtual private network gateway, a multicast gateway, and an anycast gateway.
9. A data stream processing system, the system comprising a controller, a distribution entity and a network service cluster, wherein the network service cluster comprises at least one network service instance, and each network service instance stores state data of a portion of the data streams;
the controller is configured to elastically scale the network service cluster and send an updated distribution rule to the distribution entity;
the distribution entity is configured to instruct, according to the updated distribution rule, the network service instances in the elastically scaled network service cluster to redistribute the storage of the state data of the plurality of data streams, and to send the plurality of data streams to the network service instances in the elastically scaled network service cluster according to the updated distribution rule;
and the network service instance in the elastically scaled network service cluster is configured to receive at least one data stream and process the at least one data stream according to the stored state data of the at least one data stream.
10. The system of claim 9,
the controller is further configured to send the original distribution rule to the distribution entity;
the distribution entity is further configured to send the plurality of data streams to the network service instances in the network service cluster according to the original distribution rule.
11. The system of claim 10,
the distribution entity is further configured to determine to enable the updated distribution rule to distribute the plurality of data streams, and delete the original distribution rule.
12. The system of any of claims 9 to 11,
the controller is further configured to modify a service resource usage range of the network service instances in the elastically scaled network service cluster.
13. The system of any of claims 9 to 12,
the distribution entity is configured to send an ingress data stream to a network service instance in the elastically scaled network service cluster according to the updated distribution rule; and to receive an egress data stream corresponding to the ingress data stream, and send the egress data stream to the network service instance according to the updated distribution rule.
14. The system of any of claims 9 to 13, wherein the network service cluster comprises at least one service pair, each service pair comprising two network service instances operating in active-active mode or active-standby mode, and the updated distribution rule comprises a distribution table and a forwarding table;
the distribution entity is configured to determine, according to the distribution table, a service pair corresponding to a data stream in the elastically scaled network service cluster; determine a target network service instance in the service pair according to the forwarding table; and send the data stream to the target network service instance.
15. A method of processing a plurality of data streams, comprising:
the distribution entity receives the updated distribution rule sent by the controller;
the distribution entity instructs the network service instances in the elastically scaled network service cluster to redistribute the storage of the state data of the plurality of data streams according to the updated distribution rule;
and the distribution entity sends the plurality of data streams to the network service instances in the elastically scaled network service cluster according to the updated distribution rule.
16. A distribution entity, comprising:
a communication unit for receiving the updated distribution rule transmitted by the controller;
a distribution unit, configured to instruct, according to the updated distribution rule, the network service instances in the elastically scaled network service cluster to redistribute the storage of the state data of the plurality of data streams;
the distribution unit is further configured to invoke the communication unit to send the plurality of data streams to the network service instances in the elastically scaled network service cluster according to the updated distribution rule.
17. A computing device comprising a processor and a memory, the memory having stored therein computer-readable instructions, the processor executing the computer-readable instructions to perform the steps of the method of any one of claims 1 to 8 performed by a distribution entity.
18. A computer readable storage medium comprising computer readable instructions which, when run on a computer, cause the computer to perform the steps of the method of any one of claims 1 to 8 performed by a distribution entity.
19. A computer program product comprising computer readable instructions which, when run on a computer, cause the computer to perform the steps of the method of any one of claims 1 to 8 performed by a distribution entity.
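The two-level lookup recited in claims 7 and 14 - the distribution table maps a data stream to a service pair, and the forwarding table selects the target instance within that pair - can be sketched as follows. This Python fragment is an illustration only; the hashing scheme, table layouts, and all names are assumptions not specified by the patent.

```python
import hashlib

def flow_key(five_tuple):
    # Stable hash of a flow's 5-tuple (src, dst, sport, dport, proto),
    # so packets of the same stream always map to the same bucket.
    digest = hashlib.sha256(repr(five_tuple).encode()).hexdigest()
    return int(digest, 16)

def pick_instance(five_tuple, distribution_table, forwarding_table):
    # Level 1: the distribution table maps a hash bucket to a service pair.
    pair_id = distribution_table[flow_key(five_tuple) % len(distribution_table)]
    # Level 2: the forwarding table names the target instance within the pair
    # (e.g., the active member of an active-standby pair).
    return forwarding_table[pair_id]
```

Because the hash is deterministic, all packets of one stream reach the same service pair, and failover or rebalancing within a pair only requires updating the forwarding-table entry, not the distribution table.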
CN202110509707.5A 2021-02-01 2021-05-10 Method for processing multiple data streams and related system Pending CN114844906A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/075256 WO2022161501A1 (en) 2021-02-01 2022-01-30 Method for processing multiple data flows, and related system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110138515 2021-02-01
CN2021101385158 2021-02-01

Publications (1)

Publication Number Publication Date
CN114844906A true CN114844906A (en) 2022-08-02

Family

ID=82562569

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110509707.5A Pending CN114844906A (en) 2021-02-01 2021-05-10 Method for processing multiple data streams and related system

Country Status (2)

Country Link
CN (1) CN114844906A (en)
WO (1) WO2022161501A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3216194B1 (en) * 2014-11-04 2020-09-30 Telefonaktiebolaget LM Ericsson (publ) Network function virtualization service chaining
CN106209402B (en) * 2015-04-30 2019-10-22 华为技术有限公司 A kind of telescopic method and equipment of virtual network function
US10999155B2 (en) * 2017-10-26 2021-05-04 Cisco Technology, Inc. System and method for hybrid and elastic services
CN111880929B (en) * 2020-07-07 2024-02-02 腾讯科技(深圳)有限公司 Instance management method and device and computer equipment
CN112000448B (en) * 2020-07-17 2023-08-25 北京计算机技术及应用研究所 Application management method based on micro-service architecture

Also Published As

Publication number Publication date
WO2022161501A1 (en) 2022-08-04

Similar Documents

Publication Publication Date Title
EP3355553B1 (en) Reliable load-balancer using segment routing and real-time application monitoring
CN111464592B (en) Load balancing method, device, equipment and storage medium based on micro-service
US20220377045A1 (en) Network virtualization of containers in computing systems
CN109937401B (en) Live migration of load-balancing virtual machines via traffic bypass
US9405571B2 (en) Method and system for abstracting virtual machines in a network comprising plurality of hypervisor and sub-hypervisors
US11153194B2 (en) Control plane isolation for software defined network routing services
US9729444B2 (en) High speed packet processing using a distributed hash table
WO2015058626A1 (en) Virtual network function network elements management method, device and system
US20110019531A1 (en) Method and system for fault tolerance and resilience for virtualized machines in a network
US20210011780A1 (en) Exchanging runtime state information between datacenters using a controller bridge
US11558478B2 (en) Scaling service discovery in a micro-service environment
US11824765B2 (en) Fast redirect of traffic when pods fail
US10827042B2 (en) Traffic optimization for multi-node applications
Shin et al. IRIS-HiSA: highly scalable and available carrier-grade SDN controller cluster
US20230342182A1 (en) Exchanging runtime state information between datacenters with a gateway using a controller bridge
CN112655185B (en) Apparatus, method and storage medium for service allocation in a software defined network
CN103684965A (en) Exchanging device allocated based on VDs and message transmitting method allocated based on VDs
EP3343879B1 (en) A system and method of managing flow state in stateful applications
US10791088B1 (en) Methods for disaggregating subscribers via DHCP address translation and devices thereof
CN114844906A (en) Method for processing multiple data streams and related system
JP2017126976A (en) Packet transfer system and packet transfer method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination