WO2023134643A1 - Stream data processing method, system, node, electronic device and storage medium - Google Patents

Stream data processing method, system, node, electronic device and storage medium

Info

Publication number
WO2023134643A1
Authority
WO
WIPO (PCT)
Prior art keywords
real
processed
time data
monitoring
monitoring task
Prior art date
Application number
PCT/CN2023/071419
Other languages
English (en)
French (fr)
Inventor
陈小云
刘学生
李小进
龚辉
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2023134643A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W28/00 Network traffic management; Network resource management
    • H04W28/02 Traffic management, e.g. flow control or congestion control
    • H04W28/0252 Traffic management, e.g. flow control or congestion control per individual bearer or channel
    • H04W28/0263 Traffic management, e.g. flow control or congestion control per individual bearer or channel involving mapping traffic to individual bearers or channels, e.g. traffic flow template [TFT]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L67/025 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] for remote control or remote monitoring of applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks
    • H04L67/1042 Peer-to-peer [P2P] networks using topology management mechanisms
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks
    • H04L67/1044 Group management mechanisms
    • H04L67/1051 Group master selection mechanisms
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W28/00 Network traffic management; Network resource management
    • H04W28/02 Traffic management, e.g. flow control or congestion control
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W84/00 Network topologies
    • H04W84/02 Hierarchically pre-organised networks, e.g. paging networks, cellular networks, WLAN [Wireless Local Area Network] or WLL [Wireless Local Loop]
    • H04W84/04 Large scale networks; Deep hierarchical networks
    • H04W84/08 Trunked mobile radio systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present invention relates to the field of data processing, and in particular to a stream data processing method, system, node, electronic device, and storage medium.
  • Common stream computing frameworks such as Spark, Flink, and JStorm are all heavyweight.
  • These frameworks provide relatively complete traffic distribution and rate-limiting strategies, as well as cluster management and observability. They are better suited to scenarios with large data volumes, such as the Internet, where stream processing has to be abstracted into a basic capability. In 5G network management and control scenarios, however, the computing resources available to the deployed system are limited, and heavyweight stream computing frameworks are not suitable. Operation and maintenance systems in these scenarios, such as edge cloud, toB, and network management, therefore need a lightweight real-time stream computing technology to provide real-time processing.
  • The purpose of the present invention is to solve the above problems by providing a stream data processing method, system, node, electronic device, and storage medium that reduce the system resources consumed by real-time stream data processing and thereby process real-time stream data in a lightweight manner.
  • An embodiment of this application provides a stream data processing method applied to a computing node in a computing node cluster.
  • The method includes: after the computing node is started, sending an application to run for master node to the distributed coordination service cluster; when selected as the master node, and after reading batches of real-time monitoring task information, generating the calculation rule corresponding to each monitoring task and sending the generated calculation rules to the distributed coordination service cluster, so that the other computing nodes in the computing node cluster can process their monitoring tasks based on the calculation rules of the monitoring tasks to be processed obtained from the distributed coordination service cluster; and obtaining the real-time data of the monitoring task to be processed by the computing node and processing the real-time data according to the calculation rule corresponding to that monitoring task.
  • An embodiment of the present application provides a computing node, including: an election module, configured to send an application to run for master node to the distributed coordination service cluster after the computing node is started; a generating module, configured to, when the computing node is selected as the master node and after reading batches of real-time monitoring task information, generate the calculation rule corresponding to each monitoring task and send the generated calculation rules to the distributed coordination service cluster, so that the other computing nodes in the computing node cluster can process their monitoring tasks based on the calculation rules of the monitoring tasks to be processed obtained from the distributed coordination service cluster; and a processing module, configured to obtain the real-time data of the monitoring task to be processed by the computing node and process the real-time data according to the calculation rule corresponding to that monitoring task.
  • An embodiment of the present application also provides a stream data processing system, including: an external system, a distributed coordination service cluster, a message middleware cluster, and a computing node cluster including at least one of the above computing nodes; wherein the external system is used to send the established monitoring tasks to the computing nodes in the computing node cluster and to monitor the processing results of the monitoring tasks; the distributed coordination service cluster is used to process the computing nodes' applications to run for master node, store the calculation rules, and notify the computing nodes in the computing node cluster of the calculation rules; and the message middleware cluster is used to store the real-time data of the monitoring tasks and the processing results of the monitoring tasks.
  • An embodiment of the present application also provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the above stream data processing method.
  • An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above stream data processing method.
  • In the embodiments of the present application, the computing node selected as the master node by the distributed coordination service cluster generates the calculation rule corresponding to each monitoring task and synchronizes the rules to the other computing nodes through the distributed coordination service cluster. Each computing node processes the acquired real-time data according to the calculation rules of the monitoring tasks it is to process, and obtains the final result of each such task from those rules and the real-time data. This greatly reduces the resources the system consumes in processing monitoring tasks and provides a lightweight way to process real-time stream data.
  • FIG. 1 is a schematic diagram of a stream data processing system provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of functional units of a computing node provided by an embodiment of the present application.
  • FIG. 3 is a flow chart of a method for processing stream data provided by an embodiment of the present application.
  • FIG. 4 is a flow chart of generating calculation rules provided by an embodiment of the present application.
  • FIG. 5 is a flow chart of processing real-time data provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a computing node provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • An embodiment of the present application relates to a stream data processing method, which is applied to a computing node in a computing node cluster.
  • The method includes: after the computing node is started, sending an application to run for master node to the distributed coordination service cluster; when selected as the master node, and after reading batches of real-time monitoring task information, generating the calculation rule corresponding to each monitoring task and sending the generated calculation rules to the distributed coordination service cluster, so that the other computing nodes in the computing node cluster can process their monitoring tasks based on the calculation rules of the monitoring tasks to be processed obtained from the distributed coordination service cluster; and obtaining the real-time data of the monitoring task to be processed by the computing node and processing the real-time data according to the calculation rule corresponding to that monitoring task.
  • The stream data processing system provided by the embodiment of this application consists of an external system, a distributed coordination service cluster, a computing node cluster, and a message middleware cluster, as shown in FIG. 1:
  • The external system is the consumer of real-time stream data computing. It is mainly responsible for delivering monitoring tasks to the real-time computing node cluster and monitoring the calculation results of the monitoring tasks.
  • The distributed coordination service cluster completes the master election among computing nodes and notifies the computing nodes of the election result. It also stores the calculation rules and notifies computing nodes of changes to them.
  • The computing node cluster cooperates with the distributed coordination service cluster to: register master election participants, watch for the election result, update the calculation rules (initiated by the elected master node), and watch for calculation rule updates. It cooperates with the external system to: update monitoring tasks and output the calculation results of monitoring tasks. It cooperates with the message middleware cluster to: obtain real-time data from the data source, broadcast monitoring task changes, obtain monitoring task changes, forward data that this node does not need, obtain data that this node needs to process, and send monitoring results.
  • The message middleware cluster includes the following message topics:
  • a monitoring task change topic, which stores information about monitoring task changes;
  • a real-time data source topic, which stores the original monitoring data reported by the monitoring objects (a monitoring object is usually a specific device or other entity on the system);
  • a real-time data forwarding topic, which stores the real-time data that real-time computing nodes have labeled with task numbers and forwarded;
  • a real-time monitoring result topic, which stores the calculation results of monitoring tasks.
  • The computing node in the embodiment of this application consists of the following functional units: distributed coordination listener, calculation rule manager, data routing table, calculation engine, monitoring task manager, data preprocessing, and temporary data buffer, as shown in FIG. 2.
  • The distributed coordination listener does the following: registers this node to participate in the master election; registers to watch the global calculation rules; when this node is the master, saves the calculation rules it has decided on to the distributed coordination cluster; and watches for changes to the global calculation rules and saves this node's calculation rules to the calculation rule manager.
  • The calculation rule manager does the following: in response to calls from the distributed coordination listener, provides the function for defining the global calculation rules; in response to calls from the distributed coordination listener, saves the global calculation rules defined by the master node to this node and adapts them; and synchronously updates the data routing table, the operators, and the data preprocessing strategy according to the configured calculation rules.
  • The data routing table does the following: provides an interface for the calculation rule manager to update the data routing table, and provides an interface for the calculation engine to read routing information, which guides the calculation engine in handling calculation results.
  • The routing table has two attributes: the monitoring task number and the routing link information. The monitoring task number identifies the monitoring task and is usually a string or a number. The routing link information is a one-way linked list that describes the data flow direction of the monitoring task; the value of each linked-list node is a table name in the data buffer.
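As an illustration of the routing table structure described above, the following is a minimal Python sketch, not the patent's implementation. The class names `RouteNode` and `RoutingTable`, the method names, and the route `A -> B -> C` are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RouteNode:
    """One hop of a task's data flow; `table` is a table name in the data buffer."""
    table: str
    next: Optional["RouteNode"] = None


class RoutingTable:
    """Maps a monitoring task number to a one-way linked list of table names."""

    def __init__(self) -> None:
        self._routes: Dict[str, Optional[RouteNode]] = {}

    def update(self, task_no: str, tables: List[str]) -> None:
        # Build the linked list back to front so each node points at its successor.
        head: Optional[RouteNode] = None
        for name in reversed(tables):
            head = RouteNode(name, head)
        self._routes[task_no] = head

    def has_task(self, task_no: str) -> bool:
        return task_no in self._routes

    def next_table(self, task_no: str, current: str) -> Optional[str]:
        """Return the table that follows `current`, or None if `current` is the last hop."""
        node = self._routes[task_no]
        while node is not None:
            if node.table == current:
                return node.next.table if node.next is not None else None
            node = node.next
        raise KeyError(f"table {current!r} is not on the route of task {task_no!r}")


routes = RoutingTable()
# Hypothetical route for illustration only; it is not taken from the source tables.
routes.update("task-1", ["A", "B", "C"])
```

A `None` result from `next_table` corresponds to the "next node is empty" condition, which the description uses to recognize a final result.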
  • The calculation engine consists of operators and an operator scheduler. An operator comprises the buffer data table it computes on, its algorithm logic, and its execution cycle information.
  • An operator does the following: obtains the data of its data buffer table, performs the calculation, queries the routing table, and either sends the calculation result to another data buffer table or sends it out as the final result (the routing table is queried by the name of the data buffer table just processed; if the next node is empty, the result is final and is sent out).
  • The operator scheduler periodically executes the algorithm logic of each operator according to the operator's execution cycle.
  • The monitoring task manager does the following: responds to monitoring task updates initiated by the external system and sends monitoring task change information so that other computing nodes can adapt.
  • Data preprocessing is used to accomplish the following tasks:
  • according to the data preprocessing strategy, data is sent to the designated partition of the real-time data forwarding topic, so that other computing nodes can obtain and process it.
  • The data preprocessing strategy governs how data preprocessing pulls data and sends it to the real-time data forwarding topic: it stipulates the partition in which each piece of data is stored, specifying by task number the partition of the real-time data forwarding topic to which the data must be sent.
  • The temporary data buffer consists of N data tables, where each table name matches a node value in the routing table.
  • The temporary data buffer does the following: supports data storage and extraction, and periodically calculates the throughput ratio of reads from the buffer, for example by recording the current throughput ratio every minute.
  • In step 301, after the computing node is started, it sends an application to run for master node to the distributed coordination service cluster.
  • The real-time computing nodes are started, and each computing node registers with the distributed coordination service cluster to participate in the master node election.
  • In step 302, when the computing node is selected as the master node, and after reading batches of real-time monitoring task information, it generates the calculation rule corresponding to each monitoring task and sends the generated calculation rules to the distributed coordination service cluster, so that the other computing nodes in the computing node cluster can process their monitoring tasks based on the calculation rules of the monitoring tasks to be processed obtained from the distributed coordination service cluster.
  • When the computing node is selected as the master node, and after reading batches of real-time monitoring task information from the external system, it formulates calculation rules for all monitoring tasks and sends them to the distributed coordination service cluster. The distributed coordination service cluster provides storage for the calculation rules: the master node stores the calculation rules there so that the other computing nodes can obtain the calculation rules of their monitoring tasks to be processed through the distributed coordination service cluster.
  • The calculation rules corresponding to the monitoring tasks to be processed are obtained through the distributed coordination service cluster.
  • A real-time computing node that is not the master node registers with the distributed coordination cluster to watch the global calculation rules. After the calculation rules are updated, it obtains the calculation rules of the monitoring tasks to be processed by this node through the distributed coordination service cluster and saves them to the calculation rule manager.
  • The calculation rules include: indication information indicating the storage location of the real-time data of the corresponding monitoring task, operator information representing the algorithm logic of the corresponding monitoring task, and routing table information indicating the data flow direction of the corresponding monitoring task.
  • All computing nodes save the calculation rules to their own calculation rule manager and update their data routing table, operators, and data preprocessing strategy according to the calculation rules: the indication information indicating the storage location of the real-time data of the corresponding monitoring task is placed in the data preprocessing strategy; the operator information representing the algorithm logic of the corresponding monitoring task is placed in the operators; and the routing table information indicating the data flow direction of the corresponding monitoring task is placed in the data routing table.
  • In step 303, the real-time data of the monitoring task to be processed by the computing node is obtained, and the real-time data is processed according to the calculation rule corresponding to the monitoring task to be processed.
  • Real-time data is obtained from the message middleware cluster, the task number of the monitoring task to which the real-time data belongs is identified, and label information including the task number is added to the real-time data. Real-time data whose task number matches that of a monitoring task to be processed is taken as the real-time data of that task, and real-time data of monitoring tasks to be processed by other computing nodes is sent to the message middleware cluster according to the indication information, so that the other computing nodes can obtain the real-time data of their own monitoring tasks.
  • The computing node obtains real-time data and monitoring task information from the message middleware cluster, and labels the real-time data with a task number according to the real-time data and the monitoring requirements of the monitoring tasks.
  • The real-time data covers multiple monitoring objects. The computing node queries the data routing table and, by matching the task number label of the real-time data against the task numbers in the data routing table, finds the real-time data belonging to the monitoring tasks to be processed by this node. Real-time data labeled with the task number of a task processed by this node is sent to the node's temporary data buffer for the calculation engine to process. Real-time data that this node does not process is sent, according to the data preprocessing strategy issued by the calculation rule manager, to the designated partition of the real-time data forwarding topic in the message middleware cluster, so that other nodes can obtain it and process their tasks.
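The label-then-route step described above can be sketched as follows. This is a simplified illustration under stated assumptions: each record belongs to exactly one task, records are plain dictionaries, and the names `dispatch`, `identify_task`, `local_tasks`, and `partition_of` are hypothetical. No real message middleware client is used; the function only decides where each record should go.

```python
from typing import Callable, Dict, List, Set, Tuple

Record = Dict[str, object]


def dispatch(
    records: List[Record],
    local_tasks: Set[str],
    partition_of: Dict[str, int],
    identify_task: Callable[[Record], str],
) -> Tuple[Dict[str, List[Record]], Dict[int, List[Record]]]:
    """Label each record with its task number, then keep it locally or mark it for forwarding."""
    local: Dict[str, List[Record]] = {}    # task number -> records for this node's temporary buffer
    forward: Dict[int, List[Record]] = {}  # forwarding-topic partition -> records for other nodes
    for record in records:
        task_no = identify_task(record)
        labeled = {**record, "task_no": task_no}  # the label information carries the task number
        if task_no in local_tasks:
            local.setdefault(task_no, []).append(labeled)
        else:
            # The preprocessing strategy names the partition for each task number.
            forward.setdefault(partition_of[task_no], []).append(labeled)
    return local, forward


# Hypothetical input: this node handles task-1 only; task-2 data goes to partition 1.
sample = [{"object": "cell-1", "value": 3}, {"object": "cell-2", "value": 5}]
local, forward = dispatch(
    sample,
    local_tasks={"task-1"},
    partition_of={"task-2": 1},
    identify_task=lambda r: "task-1" if r["object"] == "cell-1" else "task-2",
)
```

In a deployment, the `forward` groups would be published to the real-time data forwarding topic and the `local` groups written to the temporary data buffer tables.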
  • The computing node can also determine the storage location of the real-time data of the monitoring task to be processed according to the indication information, and obtain the real-time data of the monitoring task to be processed from the determined storage location.
  • The computing node can also obtain, from the designated partition of the real-time data forwarding topic, the data that other nodes have sent there, according to the data preprocessing strategy issued with the calculation rules.
  • After real-time data is obtained from the message middleware cluster, the real-time data of the monitoring task to be processed is stored in the temporary data buffer, and the data in the temporary data buffer is processed; when the throughput ratio of the temporary data buffer is greater than 1, the speed of obtaining real-time data from the message middleware cluster is reduced.
  • The computing node periodically calculates the throughput ratio of the temporary data buffer. When the throughput ratio is greater than 1, the speed at which the computing node obtains real-time data from the message middleware cluster is reduced. When the throughput ratio exceeds N (N > 1; the value of N is not limited), the acquisition of real-time data from the message middleware cluster is suspended. For other throughput ratios, the acquisition speed of real-time data is not limited.
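The throttling rule above can be sketched in a few lines of Python. The source does not define how the throughput ratio is computed; this sketch assumes it is the number of records written to the buffer divided by the number read from it during the same interval. The function names and the returned labels are hypothetical.

```python
def throughput_ratio(written: int, read: int) -> float:
    """Assumed definition: records written to the buffer per record read in one interval."""
    if read == 0:
        # Nothing was consumed: treat any backlog as unbounded pressure.
        return float("inf") if written > 0 else 0.0
    return written / read


def fetch_policy(ratio: float, pause_threshold: float) -> str:
    """Map a throughput ratio to a fetch decision; `pause_threshold` plays the role of N."""
    if pause_threshold <= 1:
        raise ValueError("pause_threshold (N) must be greater than 1")
    if ratio > pause_threshold:
        return "pause"      # suspend pulling from the message middleware cluster
    if ratio > 1:
        return "slow"       # reduce the pull speed
    return "unlimited"      # no limit on the pull speed
```

For example, with N = 3 a ratio of 1.5 yields `"slow"` and a ratio of 3.5 yields `"pause"`. How much to slow down is left open in the source and is not modeled here.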
  • The real-time data of the monitoring task to be processed is processed according to the operator information of that task until the processing result is the final result; whether a processing result is final is determined from the routing table information.
  • The computing node calculates the processing result from the real-time data and the algorithm logic of the monitoring task to be processed. It then determines whether the calculation is finished by querying the data flow direction of the monitoring task in the data routing table: if the next node in the data routing table is empty, the calculation is finished and the processing result is sent to the message middleware cluster; otherwise, the data is put into the temporary data buffer and the above operations are repeated until the calculation is finished.
  • The processing results are sent to the message middleware cluster so that external systems can obtain them through the message middleware cluster.
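The compute-until-final loop described above might look like the following sketch. It assumes each operator is a plain function keyed by the name of the buffer table it reads, and that the route is a simple mapping from a table name to the next table name (`None` meaning "next node is empty"). The names `run_task`, `operators`, and `next_table` are hypothetical, and periodic scheduling is omitted.

```python
from typing import Callable, Dict, List, Optional

Rows = List[float]


def run_task(
    first_table: str,
    rows: Rows,
    operators: Dict[str, Callable[[Rows], Rows]],
    next_table: Callable[[str], Optional[str]],
) -> Rows:
    """Apply operators table by table until the route reports no next hop."""
    table, data = first_table, rows
    while True:
        data = operators[table](data)   # run the operator bound to this buffer table
        following = next_table(table)   # query the route by the table just processed
        if following is None:
            return data                 # next node empty: this is the final result
        table = following               # otherwise the result becomes the next table's input


# Hypothetical two-hop route A -> B, with toy operators.
route = {"A": "B", "B": None}
result = run_task(
    "A",
    [1.0, 2.0, 3.0],
    operators={"A": lambda r: [sum(r)], "B": lambda r: [r[0] * 2]},
    next_table=route.get,
)
```

In the described system the final result would be sent to the real-time monitoring result topic rather than returned, and intermediate results would pass through the temporary data buffer between operator runs.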
  • In step 401, the real-time computing node registers with the distributed coordination cluster to watch the node in which the calculation rules are saved.
  • In step 402, the real-time computing node registers with the distributed coordination cluster to participate in the distributed master election.
  • In step 403, the distributed coordination cluster sends a master election result notification to the real-time computing nodes.
  • In step 404, the real-time computing node determines whether it has been selected as the master node; if so, step 405 is executed.
  • In step 405, when the node is selected as the master node, it reads the batch real-time monitoring task information from the external system and formulates the calculation rules.
  • In step 406, the real-time computing node sends the calculation rules to the distributed coordination cluster, where they are saved.
  • In step 407, the distributed coordination cluster sends the rule update to the real-time computing nodes.
  • In step 408, each real-time computing node reads the calculation rules of its own node and notifies the calculation rule manager to perform adaptation processing.
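Steps 401 to 408 can be walked through with a small in-memory simulation. This is only a sketch: `Coordinator` stands in for the distributed coordination cluster and is not a real coordination-service client, the "first registrant wins" election and the round-robin assignment of tasks to nodes are assumptions made here, and all class and method names are hypothetical.

```python
from typing import Callable, Dict, List


class Coordinator:
    """In-memory stand-in for the distributed coordination cluster (hypothetical)."""

    def __init__(self) -> None:
        self.candidates: List[str] = []
        self.watchers: List[Callable[[Dict[str, dict]], None]] = []
        self.rules: Dict[str, dict] = {}

    def watch_rules(self, callback: Callable[[Dict[str, dict]], None]) -> None:
        self.watchers.append(callback)

    def register_candidate(self, node_id: str) -> None:
        self.candidates.append(node_id)

    def elect(self) -> str:
        return self.candidates[0]  # assumption: the first registrant wins

    def save_rules(self, rules: Dict[str, dict]) -> None:
        self.rules = dict(rules)
        for notify in self.watchers:  # step 407: push the rule update to every watcher
            notify(self.rules)


class Node:
    def __init__(self, node_id: str, coordinator: Coordinator) -> None:
        self.node_id = node_id
        self.coordinator = coordinator
        self.my_rules: Dict[str, dict] = {}

    def register(self) -> None:
        self.coordinator.watch_rules(self.on_rules_changed)  # step 401
        self.coordinator.register_candidate(self.node_id)    # step 402

    def on_election_result(self, master_id: str, tasks: List[str]) -> None:
        if master_id != self.node_id:                         # step 404
            return
        nodes = self.coordinator.candidates
        # Step 405: formulate rules; round-robin assignment is an assumption of this sketch.
        rules = {t: {"node": nodes[i % len(nodes)]} for i, t in enumerate(tasks)}
        self.coordinator.save_rules(rules)                    # step 406

    def on_rules_changed(self, rules: Dict[str, dict]) -> None:
        # Step 408: keep only the rules assigned to this node.
        self.my_rules = {t: r for t, r in rules.items() if r["node"] == self.node_id}


coordinator = Coordinator()
n1, n2 = Node("n1", coordinator), Node("n2", coordinator)
n1.register()
n2.register()
master = coordinator.elect()                                  # step 403
for node in (n1, n2):
    node.on_election_result(master, ["task-1", "task-2"])
```

After the run, `n1` holds the rule for `task-1` and `n2` the rule for `task-2`; a real rule would also carry the indication, operator, and routing table information.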
  • In step 501, data preprocessing obtains task information from the monitoring task manager.
  • In step 502, data preprocessing pulls real-time data from the real-time data source topic.
  • In step 503, data preprocessing adds task label information to the real-time data.
  • In step 504, data preprocessing queries the data routing table for data routing information.
  • In step 505, the data routing table returns routing information to data preprocessing.
  • In step 506, it is determined whether the real-time data is to be processed by this node.
  • In step 507, if the data is not to be processed by this node, the real-time data is sent to the real-time data forwarding topic, and the real-time data this node needs is obtained from the real-time data forwarding topic.
  • In step 508, if the data is to be processed by this node, the real-time data is put into the corresponding table of the temporary data buffer.
  • In step 509, the calculation engine reads data from the temporary data buffer.
  • In step 510, the calculation engine calculates a processing result based on the read data.
  • In step 511, the calculation engine reads routing table information from the data routing table.
  • In step 512, the data routing table returns routing table information to the calculation engine.
  • In step 513, the calculation engine determines the destination to which the processing result is to be sent.
  • In step 514, it is determined whether the destination is the real-time monitoring result topic.
  • In step 515, if the destination is the real-time monitoring result topic, the calculation engine sends the final processing result to the real-time monitoring result topic.
  • In step 516, if the destination is not the real-time monitoring result topic, the calculation engine sends the data to the corresponding table of the temporary data buffer, then reads the data from the temporary data buffer and calculates again, until the calculated destination is the real-time monitoring result topic.
  • The stream data processing method of the embodiment of the present application is applied to a single-node scenario, for example in the field of communication network management.
  • Base station A includes cell 1# and cell 2#, and these two cells report the number of connections and the number of dropped calls, from which the call drop rate indicator is calculated as: call drop rate = number of dropped calls / number of connections × 100%.
  • The original reported data for one minute is shown in Table 1.
  • The user has two requirements: the call drop rate of base station A per minute, and the call drop rate of cell 1# every 30 seconds.
  • Task 1 calculation requirement: count the call drop rate of base station A per minute.
  • Task 2 calculation requirement: count the call drop rate of cell 1# every 30 seconds.
  • The routing table information of each task is shown in Table 4:
  • Data preprocessing generates table A and table D, and the intermediate data calculated by the operators are table B and table C.
  • Table B is the calculation result of operator a, table C is the calculation result of operator b, the final result of task 1 is the calculation result of operator c, and the final result of task 2 is the calculation result of operator d.
  • The final results are: for task 1, the call drop rate of base station A per minute is 0.67%; for task 2, the call drop rates of cell 1# for the two 30-second periods are 1% and 0%.
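The call drop rate formula used in this example (dropped calls / connections × 100%) can be written as a short Python sketch. The function names are hypothetical, and the sample counts below are invented for illustration; they are not the values of Table 1 and do not reproduce the results quoted above.

```python
from typing import Iterable, Tuple


def call_drop_rate(dropped: int, connected: int) -> float:
    """Call drop rate in percent: dropped calls / connections * 100."""
    if connected <= 0:
        # Assumption of this sketch: a rate is undefined without any connections.
        raise ValueError("connected must be positive")
    return 100.0 * dropped / connected


def aggregate_drop_rate(samples: Iterable[Tuple[int, int]]) -> float:
    """Drop rate over several (connected, dropped) reports, e.g. all cells of one base station."""
    samples = list(samples)
    connected = sum(c for c, _ in samples)
    dropped = sum(d for _, d in samples)
    return call_drop_rate(dropped, connected)


# Hypothetical reports from two cells of one base station over the same interval.
station_rate = aggregate_drop_rate([(150, 3), (50, 1)])
```

Summing the counts before dividing, rather than averaging per-cell rates, weights each cell by its number of connections.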
  • The method of the embodiment of the present application can also be applied to multi-node scenarios.
  • Continuing the previous example with two nodes taking part in real-time stream computation, the master node's decision assigns task 1 to node 1, which performs the calculations related to task 1.
  • Node 2 processes task 2 and performs the calculations related to task 2. The routing table and operator information in node 1 then contain only information related to task 1, and those in node 2 contain only information related to task 2.
  • The method provided by the embodiment of this application can also be applied to business scenarios related to edge computing.
  • For example, a city has deployed a computing center to implement intelligent transportation; because its resources are limited, the method in the embodiments of this application can be used for processing.
  • The raw data in the intelligent transportation scenario are camera photos and license plate numbers.
  • The intelligent transportation system defines three monitoring tasks: determining whether a vehicle runs a red light, whether a vehicle keeps to the lane markings, and whether the driver is wearing a seat belt.
  • To spread the processing load, the system can plan three real-time computing nodes, one for each of the three tasks.
  • In the stream data processing method provided by the embodiments of the present application, the computing node elected master node by the distributed coordination service cluster generates the calculation rule corresponding to each monitoring task and synchronizes the rules to the other computing nodes through the distributed coordination service cluster.
  • The computing nodes process the acquired real-time data according to the calculation rules of their pending monitoring tasks, and each computing node finally obtains the final results of its monitoring tasks from those rules and the real-time data.
  • The method also uses the native load balancing of the message middleware cluster to distribute the real-time data, uses the distributed coordination mechanism of the distributed coordination service cluster to balance the load of real-time stream computation across the computing nodes, and uses the elasticity of the computing nodes themselves to support horizontal scaling of the system's processing capacity.
  • The result is a lightweight way of processing stream data: it greatly reduces the resources the system consumes in processing monitoring tasks, needs no additional heavyweight stream processing framework, and is simple to deploy, easy to manage and widely applicable.
  • The division of the above methods into steps is only for clarity of description. In implementation, several steps may be merged into one, or one step may be split into several; as long as the same logical relationship is included, they fall within the scope of protection of this patent. Adding insignificant modifications to the algorithm or flow, or introducing insignificant designs into it, without changing its core design also falls within the scope of protection of this patent.
  • The embodiment of the present application also relates to a computing node, as shown in FIG. 6, including: an election module 601, a generation module 602 and a processing module 603.
  • The election module 601 is used to send to the distributed coordination service cluster, after the computing node starts, an application to run for master node.
  • The generation module 602 is used, when the node is elected master node and after a batch of real-time monitoring task information has been read, to generate the calculation rule corresponding to each monitoring task and send the generated calculation rules to the distributed coordination service cluster, so that the other computing nodes in the computing node cluster process monitoring tasks based on the calculation rules of their pending monitoring tasks obtained from the distributed coordination service cluster.
  • The processing module 603 is used to obtain the real-time data of the monitoring tasks to be processed by the computing node and to process the real-time data according to the calculation rules corresponding to those monitoring tasks.
  • In one example, the real-time computing node is started and the election module 601 registers with the distributed coordination service cluster to take part in the master election.
  • In one example, when the computing node is elected master node and has read a batch of real-time monitoring task information from the external system, the generation module 602 formulates the calculation rules for all monitoring tasks and sends them to the distributed coordination service cluster. The distributed coordination service cluster provides storage for the calculation rules; by storing the rules there, the master node lets the other computing nodes obtain the calculation rules of their pending monitoring tasks through the distributed coordination service cluster.
  • In one example, the computing node provided by the embodiment of the present application further includes a search module (not shown in the figure). The computing node obtains real-time data and monitoring task information from the message middleware cluster and, according to the real-time data and the monitoring requirements of the tasks, labels the real-time data with a task number.
  • The real-time data is the data of multiple monitored objects. The node queries the data routing table and, by comparing the task number labels of the real-time data with the task numbers in the data routing table, finds the real-time data belonging to the monitoring tasks this node is to process.
  • Real-time data labeled with the task number of a task processed by this node is sent to the node's temporary data buffer for the calculation engine to process. Real-time data that is not to be processed by this node is sent, according to the data preprocessing strategy issued by the calculation rule manager, to the designated partition of the real-time data forwarding topic of the message middleware cluster, so that other nodes can obtain and process it.
  • In one example, the processing module calculates a processing result from the real-time data and the algorithm logic of the pending monitoring task. Each time a new processing result is obtained, the module queries the data flow of the task in the data routing table to decide whether the calculation has ended. If the next node in the data routing table is empty, the calculation has ended and the processing result is sent to the message middleware cluster; otherwise the data is put into the temporary data buffer and the calculation continues, repeating the above operations until the calculation ends.
  • This embodiment is a device embodiment corresponding to the above embodiment of the stream data processing method applied to computing nodes, and the two can be implemented in cooperation with each other.
  • The relevant technical details mentioned in the above method embodiment remain valid in this embodiment and are not repeated here.
  • Correspondingly, the relevant technical details mentioned in this embodiment can also be applied to the above method embodiment.
  • The modules involved in the above embodiments of the present application are all logical modules.
  • In practical applications, a logical unit can be a physical unit, a part of a physical unit, or a combination of multiple physical units.
  • To highlight the innovative part of the present application, units that are not closely related to solving the technical problem posed by the present application are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.
  • An embodiment of the present application also provides an electronic device, as shown in FIG. 7, including at least one processor 701 and a memory 702 communicatively connected to the at least one processor 701. The memory 702 stores instructions executable by the at least one processor 701, and the instructions are executed by the at least one processor 701 so that the at least one processor can execute the above stream data processing method.
  • The memory and the processor are connected by a bus.
  • The bus may include any number of interconnected buses and bridges, and connects the various circuits of the one or more processors and the memory together.
  • The bus may also connect various other circuits such as peripherals, voltage regulators and power management circuits, all of which are well known in the art and therefore are not described further herein.
  • The bus interface provides an interface between the bus and the transceiver.
  • The transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other devices over a transmission medium.
  • Data processed by the processor is transmitted over a wireless medium through an antenna; the antenna also receives data and passes it to the processor.
  • The processor is responsible for managing the bus and general processing, and can also provide various functions, including timing, peripheral interfaces, voltage regulation, power management and other control functions. The memory, in turn, can be used to store the data that the processor uses when performing operations.
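  • Embodiments of the present application also provide a computer-readable storage medium storing a computer program.
  • The above method embodiments are implemented when the computer program is executed by a processor.
  • Those skilled in the art will understand that all or some of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or some of the steps of the methods in the embodiments of the present application.
  • The aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), magnetic disks, optical discs and other media that can store program code.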

Abstract

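Embodiments of the present application disclose a stream data processing method, system, node, electronic device and storage medium. The method includes: after a computing node starts, sending to a distributed coordination service cluster an application to run for master node; when elected master node, and after reading a batch of real-time monitoring task information, generating the calculation rule corresponding to each monitoring task and sending the generated calculation rules to the distributed coordination service cluster, so that the other computing nodes in the computing node cluster process monitoring tasks based on the calculation rules of their pending monitoring tasks obtained from the distributed coordination service cluster; and obtaining the real-time data of the monitoring tasks to be processed by the computing node, and processing the real-time data according to the calculation rules corresponding to those monitoring tasks.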

Description

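Stream data processing method, system, node, electronic device and storage medium
Related Application
This application claims priority to Chinese patent application No. 202210028930.2, filed on January 11, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of data processing, and in particular to a stream data processing method, system, node, electronic device and storage medium.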
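Background
Common stream computing frameworks such as Spark, Flink and JStorm are all heavyweight. They provide fairly complete traffic-splitting and rate-limiting strategies as well as cluster management and observability, and are well suited to scenarios such as the Internet, where data volumes are large and stream processing needs to be abstracted into base capabilities. In 5G network management and control scenarios, however, the computing resources available to the deployed system are limited, so a heavyweight stream computing framework is not appropriate. Operation and maintenance systems in such scenarios, for example edge cloud, toB and network management systems, therefore need a lightweight real-time stream computing technique to meet their real-time processing requirements.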
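Summary
The purpose of the present invention is to solve the above problem by providing a stream data processing method, system, node, electronic device and storage medium that reduce the system resources consumed by real-time stream data processing and thereby achieve lightweight processing of real-time stream data.
To solve the above problem, an embodiment of the present application provides a stream data processing method applied to a computing node in a computing node cluster. The method includes: after the computing node starts, sending to a distributed coordination service cluster an application to run for master node; when elected master node, and after reading a batch of real-time monitoring task information, generating the calculation rule corresponding to each monitoring task and sending the generated calculation rules to the distributed coordination service cluster, so that the other computing nodes in the computing node cluster process monitoring tasks based on the calculation rules of their pending monitoring tasks obtained from the distributed coordination service cluster; and obtaining the real-time data of the monitoring tasks to be processed by the computing node, and processing the real-time data according to the calculation rules corresponding to those monitoring tasks.
To solve the above problem, an embodiment of the present application provides a computing node, including: an election module, used to send to a distributed coordination service cluster, after the computing node starts, an application to run for master node; a generation module, used, when the node is elected master node and after a batch of real-time monitoring task information has been read, to generate the calculation rule corresponding to each monitoring task and send the generated calculation rules to the distributed coordination service cluster, so that the other computing nodes in the computing node cluster process monitoring tasks based on the calculation rules of their pending monitoring tasks obtained from the distributed coordination service cluster; and a processing module, used to obtain the real-time data of the monitoring tasks to be processed by the computing node and to process the real-time data according to the calculation rules corresponding to those monitoring tasks.
To solve the above problem, an embodiment of the present application further provides a stream data processing system, including: an external system, a distributed coordination service cluster, a message middleware cluster, and a computing node cluster including at least one computing node as described above. The external system is used to send the monitoring tasks it has established to the computing nodes in the computing node cluster and to listen for the processing results of the monitoring tasks. The distributed coordination service cluster is used to handle the computing nodes' applications to run for master node, to store the calculation rules and to notify the computing nodes in the computing node cluster of the calculation rules. The message middleware cluster is used to store the real-time data of the monitoring tasks and the processing results of the monitoring tasks.
To solve the above problem, an embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the above stream data processing method.
To solve the above problem, an embodiment of the present application further provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the above stream data processing method is implemented.
In the embodiments of the present application, the computing node elected master node by the distributed coordination service cluster generates the calculation rule for each monitoring task and synchronizes the rules to the other computing nodes through the distributed coordination service cluster, so that each computing node processes the real-time data it obtains according to the calculation rules of its pending monitoring tasks. Each computing node finally obtains the final results of its monitoring tasks from those calculation rules and the real-time data. This greatly reduces the resources the system consumes in processing monitoring tasks and achieves lightweight processing of real-time stream data.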
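Brief Description of the Drawings
One or more embodiments are illustrated by the figures in the corresponding drawings. These illustrations do not limit the embodiments. Elements with the same reference numerals in the drawings denote similar elements, and unless otherwise stated the figures are not drawn to scale.
FIG. 1 is a schematic diagram of a stream data processing system provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of the functional units of a computing node provided by an embodiment of the present application;
FIG. 3 is a flowchart of a stream data processing method provided by an embodiment of the present application;
FIG. 4 is a flowchart of generating calculation rules provided by an embodiment of the present application;
FIG. 5 is a flowchart of processing real-time data provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a computing node provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.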
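Detailed Description
To make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the embodiments are described in detail below with reference to the drawings. Those of ordinary skill in the art will understand that many technical details are given in the embodiments to help the reader understand the application; the technical solutions claimed in this application can nevertheless be implemented without these details and with various changes and modifications based on the following embodiments.
An embodiment of the present application relates to a stream data processing method applied to a computing node in a computing node cluster. The method includes: after the computing node starts, sending to a distributed coordination service cluster an application to run for master node; when elected master node, and after reading a batch of real-time monitoring task information, generating the calculation rule corresponding to each monitoring task and sending the generated calculation rules to the distributed coordination service cluster, so that the other computing nodes in the computing node cluster process monitoring tasks based on the calculation rules of their pending monitoring tasks obtained from the distributed coordination service cluster; and obtaining the real-time data of the monitoring tasks to be processed by the computing node, and processing the real-time data according to the calculation rules corresponding to those monitoring tasks.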
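The stream data processing system provided by the embodiments of the present application consists of an external system, a distributed coordination service cluster, a computing node cluster and a message middleware cluster, as shown in FIG. 1:
The external system is the consumer of the real-time stream computation. It is mainly responsible for issuing monitoring tasks to the real-time computing node cluster and listening for the calculation results of those tasks.
The distributed coordination service cluster carries out the master election among the computing nodes and notifies them of the election result. It also provides storage for the calculation rules and notifies the computing nodes when the rules change.
The computing node cluster works with the distributed coordination service cluster to register election participants, listen for the election result, update the calculation rules (initiated by the node elected master) and listen for calculation rule updates; with the external system to update monitoring tasks and output their calculation results; and with the message middleware cluster to obtain the real-time data source, broadcast monitoring task changes, receive monitoring task changes, forward data that this node does not need, obtain the data this node must process, and send monitoring results.
The message middleware cluster contains the following message topics: the monitoring task change topic, which stores information about changes to monitoring tasks; the real-time data source topic, which stores the raw monitoring data reported by monitored objects (a monitored object is usually a specific device or system); the real-time data forwarding topic, which stores data that the real-time computing nodes have labeled by monitoring task number; and the real-time monitoring result topic, which stores the calculation results of monitoring tasks.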
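A computing node in the embodiments of the present application consists of the following functional units: a distributed coordination listener, a calculation rule manager, a data routing table, a calculation engine, a monitoring task manager, data preprocessing and a temporary data buffer, as shown in FIG. 2.
The distributed coordination listener does the following: registers this node for the master election; registers to listen for the global calculation rules; when the node is elected master, decides the calculation rules and saves the result to the distributed coordination cluster; and listens for changes to the global calculation rules and saves this node's calculation rules to the calculation rule manager.
The calculation rule manager does the following: in response to calls from the distributed coordination listener, provides the function of defining the global calculation rules; in response to calls from the distributed coordination listener, saves the global calculation rules defined by the master node to this node; and, following the calculation rules, synchronously updates the data routing table, the operators and the data preprocessing strategy.
The data routing table does the following: provides an interface through which the calculation rule manager updates the table, and an interface through which the calculation engine reads the routing information that tells it how to handle calculation results.
In addition, a routing table entry has two attributes: a monitoring task number and route link information. The monitoring task number identifies the monitoring task and is usually a string or a number. The route link information is a singly linked list that describes the data flow of the monitoring task; the value of each list node is the name of a table in the data buffer.
The calculation engine consists of operators and an operator scheduler. An operator is made up of the buffer data table it computes over, its algorithm logic and its execution period.
An operator does the following: it fetches the data of its data buffer table, performs the calculation, queries the routing table and either sends the result to another data buffer table or emits it as a result. (The routing table is queried by the name of the buffer table just processed; if the next node is empty, the data is final and is sent out.)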
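The routing table described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation in the application: the names `RouteNode`, `RoutingTable`, `set_route` and `next_table` are hypothetical, and the two routes are the ones listed for tasks 1 and 2 in Table 4 of the worked example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RouteNode:
    """One node of the singly linked route; its value is a buffer table name."""
    table: str
    next: Optional["RouteNode"] = None


class RoutingTable:
    """Maps a monitoring task number to the head of its route link (hypothetical API)."""

    def __init__(self) -> None:
        self._routes: Dict[int, RouteNode] = {}

    def set_route(self, task_id: int, tables: List[str]) -> None:
        if not tables:
            raise ValueError("a route needs at least one table")
        # Build the list back to front so each node points at its successor.
        head: Optional[RouteNode] = None
        for name in reversed(tables):
            head = RouteNode(name, head)
        self._routes[task_id] = head

    def next_table(self, task_id: int, current: str) -> Optional[str]:
        """Return the table after `current`, or None when `current` is the last hop."""
        node = self._routes.get(task_id)
        while node is not None:
            if node.table == current:
                return node.next.table if node.next else None
            node = node.next
        raise KeyError(f"table {current!r} is not on the route of task {task_id}")


routes = RoutingTable()
routes.set_route(1, ["A", "B", "C"])  # head->A->B->C->empty
routes.set_route(2, ["D"])            # head->D->empty
```

An operator that has just processed table `C` of task 1 would call `routes.next_table(1, "C")`, receive `None`, and treat its output as the final result.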
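The operator scheduler periodically executes each operator's algorithm logic according to the operator's scheduled execution period.
The monitoring task manager does the following: responds to monitoring task updates initiated by the external system, and sends monitoring task change information so that the other computing nodes can adapt.
Data preprocessing does the following:
1. Reads the real-time data and the monitoring task information and, according to the real-time data and the monitoring requirements of the tasks, labels the data with a task number.
2. Reads the routing table information and puts the real-time data that this computing node should process into the corresponding table of the data buffer. Data that this node should not process is sent, according to the data preprocessing strategy issued by the calculation rule manager, to the designated partition of the real-time data forwarding topic, so that other computing nodes can fetch and process it.
3. According to the data preprocessing strategy issued by the calculation rule manager, reads the designated partitions of the real-time data forwarding topic to which other computing nodes have sent data, and processes that data in real time. The data preprocessing strategy governs how data preprocessing pulls data from and sends data to the real-time data forwarding topic: it fixes the partition in which each kind of data is stored and designates, by task number, the partition of the forwarding topic to which data must be sent.
4. Reads the throughput ratio of the processing buffer. When the ratio is between 1 and N, first-level backpressure is started and the rate of pulling data from the real-time data source topic is reduced; when the ratio is N or more, second-level backpressure is started and pulling from the real-time data source topic is paused; otherwise rate limiting is lifted.
Backpressure means the following: if an operator becomes a bottleneck during data processing, that is, its processing rate cannot keep up with the rate at which upstream sends data, the upstream must be rate-limited or temporarily cut off so that the data backlog does not crash the system.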
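The two-level backpressure decision can be sketched as follows. The text does not define the throughput ratio, so this sketch assumes it is the number of rows written into the buffer divided by the number of rows consumed from it over the same interval. The names `Backpressure`, `throughput_ratio` and `backpressure_level` are hypothetical, and treating a ratio exactly equal to N as first-level is also an assumption, since the description says "N or more" in one place and "exceeds N" in another.

```python
from enum import Enum


class Backpressure(Enum):
    NONE = "none"        # no throttling
    SLOW_DOWN = "slow"   # level 1: pull from the data source topic more slowly
    PAUSE = "pause"      # level 2: stop pulling from the data source topic


def throughput_ratio(rows_in: int, rows_out: int) -> float:
    """Assumed definition: rows written to the buffer / rows consumed from it."""
    if rows_out == 0:
        return float("inf") if rows_in > 0 else 0.0
    return rows_in / rows_out


def backpressure_level(ratio: float, n: float) -> Backpressure:
    """Pick the backpressure level for a throughput ratio and threshold N (N > 1)."""
    if n <= 1:
        raise ValueError("N must be greater than 1")
    if ratio > n:
        return Backpressure.PAUSE
    if ratio > 1:
        return Backpressure.SLOW_DOWN
    return Backpressure.NONE
```

A preprocessing loop could call `backpressure_level` each time the buffer records a new ratio (for example once a minute) and adjust its pull rate accordingly.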
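The temporary data buffer consists of N data tables, whose names match the node values of the routing table.
The temporary data buffer supports storing and retrieving data and periodically computes the throughput ratio of the processing buffer, for example recording the current throughput ratio once a minute.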
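The implementation details of the stream data processing method in this embodiment are described below. They are provided only to aid understanding and are not required for implementing the solution. The flow is shown in FIG. 3 and may include the following steps:
In step 301, after the computing node starts, it sends to the distributed coordination service cluster an application to run for master node.
In one example, the real-time computing nodes are started and each registers with the distributed coordination service cluster to take part in the master election.
In step 302, when the node is elected master node, and after it has read a batch of real-time monitoring task information, it generates the calculation rule corresponding to each monitoring task and sends the generated calculation rules to the distributed coordination service cluster, so that the other computing nodes in the computing node cluster process monitoring tasks based on the calculation rules of their pending monitoring tasks obtained from the distributed coordination service cluster.
In one example, when the computing node is elected master node and has read a batch of real-time monitoring task information from the external system, it formulates the calculation rules for all monitoring tasks and sends them to the distributed coordination service cluster. The distributed coordination service cluster provides storage for the calculation rules; by storing the rules there, the master node lets the other computing nodes obtain the calculation rules of their pending monitoring tasks through the distributed coordination service cluster.
In addition, when the computing node is not elected master node, it obtains the calculation rules corresponding to its pending monitoring tasks through the distributed coordination service cluster.
In one example, a real-time computing node that is not the master registers with the distributed coordination cluster to listen for the global calculation rules. After the rules are updated, it obtains the calculation rules of its own pending monitoring tasks through the distributed coordination service cluster and saves them to its calculation rule manager.
A calculation rule includes: indication information indicating where the real-time data of the corresponding monitoring task is stored, operator information representing the algorithm logic of the corresponding monitoring task, and routing table information representing the data flow of the corresponding monitoring task.
In one example, after the master node has formulated the calculation rules of the monitoring tasks, every computing node saves the rules to its own calculation rule manager and updates its data routing table, operators and data preprocessing strategy accordingly. The indication information indicating where the real-time data of the corresponding monitoring task is stored is placed in the data preprocessing strategy; the operator information representing the algorithm logic of the corresponding monitoring task is placed in the operators; and the routing table information representing the data flow of the corresponding monitoring task is placed in the data routing table.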
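In step 303, the real-time data of the monitoring tasks to be processed by the computing node is obtained, and the real-time data is processed according to the calculation rules corresponding to those monitoring tasks.
Specifically, the node obtains real-time data from the message middleware cluster, identifies the task number of the monitoring task to which the data belongs, and adds label information containing the task number to the data. Real-time data whose task number matches that of a pending monitoring task is taken as the real-time data of that task. According to the indication information, the real-time data of monitoring tasks to be processed by other computing nodes is sent to the message middleware cluster, from which those nodes obtain the real-time data of their pending monitoring tasks.
In one example, the computing node obtains real-time data and monitoring task information from the message middleware cluster and, according to the real-time data and the monitoring requirements of the tasks, labels the real-time data with a task number. The real-time data is the data of multiple monitored objects. The node queries the data routing table and, by comparing the task number labels of the real-time data with the task numbers in the data routing table, finds the real-time data belonging to the monitoring tasks this node is to process. Real-time data labeled with the task number of a task processed by this node is sent to the node's temporary data buffer for the calculation engine to process. Real-time data that is not to be processed by this node is sent, according to the data preprocessing strategy issued by the calculation rule manager, to the designated partition of the real-time data forwarding topic of the message middleware cluster, so that other nodes can obtain and process it.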
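The labeling and dispatch step can be sketched as follows. This is an in-memory illustration only: the message middleware is replaced by plain dictionaries, and the names `tag_record`, `dispatch`, `matchers`, `source_table` and `partition_of` are hypothetical. It also assumes that a record wanted by several tasks is copied once per task, which is how the worked example appears to fill its source tables.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


def tag_record(record: Record,
               matchers: Dict[int, Callable[[Record], bool]]) -> List[Record]:
    """Return one labeled copy of the record per monitoring task that wants it."""
    return [dict(record, task_id=task_id)
            for task_id, wants in matchers.items() if wants(record)]


def dispatch(tagged: List[Record],
             source_table: Dict[int, str],
             partition_of: Dict[int, int]
             ) -> Tuple[Dict[str, List[Record]], Dict[int, List[Record]]]:
    """Split labeled records into local buffer tables and forwarding partitions."""
    local: Dict[str, List[Record]] = {}
    forwarded: Dict[int, List[Record]] = {}
    for rec in tagged:
        task_id = rec["task_id"]
        if task_id in source_table:   # task handled by this node
            local.setdefault(source_table[task_id], []).append(rec)
        else:                         # hand over through the forwarding topic
            forwarded.setdefault(partition_of[task_id], []).append(rec)
    return local, forwarded


# This node handles task 1 only; task 2 data goes to forwarding partition 0.
matchers = {
    1: lambda r: r["station"] == "A",  # task 1: every cell of base station A
    2: lambda r: r["cell"] == "1#",    # task 2: cell 1# only
}
raw = [
    {"station": "A", "cell": "1#", "drops": 1, "connections": 100},
    {"station": "A", "cell": "2#", "drops": 0, "connections": 300},
]
tagged = [t for rec in raw for t in tag_record(rec, matchers)]
local, forwarded = dispatch(tagged, source_table={1: "A"}, partition_of={2: 0})
```

Here both records land in local table `A` with label 1, while the copy of the cell 1# record labeled 2 is queued for partition 0, where the node responsible for task 2 would pick it up.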
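In addition, the computing node can determine from the indication information where the real-time data of a pending monitoring task is stored, and obtain the real-time data of that task from the determined storage location.
In one example, the computing node can also, according to the data preprocessing strategy issued with the calculation rules, obtain from the designated partition of the real-time data forwarding topic the data that other nodes have sent there.
In an embodiment, after real-time data is obtained from the message middleware cluster, the real-time data of the pending monitoring tasks is stored in the temporary data buffer, and the real-time data in the temporary data buffer is processed according to the calculation rules corresponding to the pending monitoring tasks. When the throughput ratio of the temporary data buffer is greater than 1, the rate at which real-time data is obtained from the message middleware cluster is reduced.
In one example, the computing node periodically computes the throughput ratio of the temporary data buffer. When the ratio is greater than 1, the node lowers the rate at which it obtains real-time data from the message middleware cluster; when the ratio exceeds N (N > 1, with no other restriction on its value), the node pauses obtaining real-time data from the message middleware cluster; at any other ratio, the acquisition rate is not limited.
In an embodiment, the real-time data of a pending monitoring task is processed according to the task's operator information until the processing result is the final result; whether a processing result is final is determined from the routing table information.
In one example, the computing node calculates a processing result from the real-time data and the algorithm logic of the pending monitoring task. Each time a new processing result is obtained, the node queries the data flow of the task in the data routing table to decide whether the calculation has ended. If the next node in the data routing table is empty, the calculation has ended and the processing result is sent to the message middleware cluster; otherwise the data is put into the temporary data buffer and the calculation continues, repeating the above operations until the calculation ends.
In one example, after the real-time data has been processed according to the calculation rules corresponding to the pending monitoring tasks, the processing result is sent to the message middleware cluster, from which the external system obtains it.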
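To make the stream data processing method provided by the embodiments of the present application clearer, the calculation rule generation flow is described below with reference to FIG. 4. The steps are as follows:
In step 401, the real-time computing node registers with the distributed coordination cluster a listener on the node where the calculation rules are saved.
In step 402, the real-time computing node registers with the distributed coordination cluster to take part in the distributed master election.
In step 403, the distributed coordination cluster notifies the real-time computing node of the election result.
In step 404, the real-time computing node determines whether it has been elected master; if so, step 405 is executed.
In step 405, having been elected master node, the real-time computing node reads a batch of real-time monitoring task information from the external system and formulates the calculation rules.
In step 406, the real-time computing node sends the calculation rules to the distributed coordination cluster, where they are saved.
In step 407, the distributed coordination cluster sends the rule update to the real-time computing nodes.
In step 408, the real-time computing node reads its own calculation rules and notifies the calculation rule manager to adapt accordingly.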
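Steps 401 to 408 can be walked through with a toy, single-process stand-in for the coordination service. Everything here is a hypothetical sketch: `InMemoryCoordinator`, `ComputeNode` and `round_robin_plan` are illustrative names, the "first candidate wins" election ignores failures, and a real deployment would rely on an actual distributed coordination service rather than this in-memory object.

```python
from typing import Callable, Dict, List, Optional

Rules = Dict[str, Dict[int, dict]]  # node id -> {task number -> rule}


class InMemoryCoordinator:
    """Toy stand-in for the distributed coordination service cluster."""

    def __init__(self) -> None:
        self.leader: Optional[str] = None
        self.rules: Rules = {}
        self._watchers: List[Callable[[Rules], None]] = []

    def watch_rules(self, callback: Callable[[Rules], None]) -> None:  # step 401
        self._watchers.append(callback)

    def campaign(self, node_id: str) -> bool:  # steps 402-403
        # First candidate wins; leader failure is not modeled in this sketch.
        if self.leader is None:
            self.leader = node_id
        return self.leader == node_id

    def publish_rules(self, node_id: str, rules: Rules) -> None:  # step 406
        if node_id != self.leader:
            raise PermissionError("only the master node may publish rules")
        self.rules = rules
        for notify in self._watchers:  # step 407
            notify(rules)


class ComputeNode:
    def __init__(self, node_id: str, coordinator: InMemoryCoordinator) -> None:
        self.node_id = node_id
        self.coordinator = coordinator
        self.my_rules: Dict[int, dict] = {}
        coordinator.watch_rules(self._on_rules)

    def _on_rules(self, rules: Rules) -> None:  # step 408
        self.my_rules = rules.get(self.node_id, {})

    def start(self, tasks: Dict[int, List[str]],
              plan: Callable[[Dict[int, List[str]]], Rules]) -> None:
        if self.coordinator.campaign(self.node_id):  # step 404
            self.coordinator.publish_rules(self.node_id, plan(tasks))  # 405-406


def round_robin_plan(nodes: List[str]) -> Callable[[Dict[int, List[str]]], Rules]:
    """Hypothetical rule planner: spread tasks over nodes in task-number order."""
    def plan(tasks: Dict[int, List[str]]) -> Rules:
        rules: Rules = {node: {} for node in nodes}
        for i, (task_id, route) in enumerate(sorted(tasks.items())):
            rules[nodes[i % len(nodes)]][task_id] = {"route": route}
        return rules
    return plan


coord = InMemoryCoordinator()
n1, n2 = ComputeNode("node-1", coord), ComputeNode("node-2", coord)
tasks = {1: ["A", "B", "C"], 2: ["D"]}
plan = round_robin_plan(["node-1", "node-2"])
n1.start(tasks, plan)
n2.start(tasks, plan)
```

After both nodes start, `node-1` is the master and holds only the rule for task 1, while `node-2` holds only the rule for task 2, which mirrors the two-node split in the worked example.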
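To make the stream data processing method provided by the embodiments of the present application clearer, the real-time data calculation flow is described below, building on the above example and with reference to FIG. 5. The steps are as follows:
In step 501, data preprocessing obtains the task information from the monitoring task manager.
In step 502, data preprocessing pulls real-time data from the real-time data source topic.
In step 503, data preprocessing adds task label information to the real-time data.
In step 504, data preprocessing queries the data routing table for data routing information.
In step 505, the data routing table returns the routing information to data preprocessing.
In step 506, it is determined whether the real-time data is to be processed by this node.
In step 507, if the real-time data is not to be processed by this node, it is sent to the real-time data forwarding topic, and the real-time data this node needs is fetched from the real-time data forwarding topic.
In step 508, if the real-time data is to be processed by this node, it is put into the corresponding table of the temporary data buffer.
In step 509, the calculation engine reads data from the temporary data buffer.
In step 510, the calculation engine computes a processing result from the data read.
In step 511, the calculation engine reads routing table information from the data routing table.
In step 512, the data routing table returns the routing table information to the calculation engine.
In step 513, the calculation engine works out the destination of the processing result.
In step 514, it is determined whether the destination is the real-time monitoring result topic.
In step 515, if the destination is the real-time monitoring result topic, the calculation engine sends the final processing result to that topic.
In step 516, if the destination is not the real-time monitoring result topic, the calculation engine sends the data to the corresponding table of the temporary data buffer, then reads the data back from the buffer and computes again, until the computed destination is the real-time monitoring result topic.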
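The loop formed by steps 509 to 516 can be sketched as follows. This is a simplified, synchronous illustration: operator scheduling periods are ignored, operators are keyed by the buffer table they read, and `run_task`, `operators` and `next_table` are hypothetical names. Returning the rows stands in for sending them to the real-time monitoring result topic.

```python
from typing import Callable, Dict, List, Optional

Rows = List[dict]
Operator = Callable[[Rows], Rows]


def run_task(task_id: int,
             first_table: str,
             rows: Rows,
             operators: Dict[str, Operator],
             next_table: Callable[[int, str], Optional[str]]) -> Rows:
    """Push rows through the operators along the route until the route ends."""
    buffers: Dict[str, Rows] = {first_table: rows}
    table: Optional[str] = first_table
    result: Rows = rows
    while table is not None:
        result = operators[table](buffers.pop(table))  # steps 509-510
        table = next_table(task_id, table)             # steps 511-513
        if table is not None:                          # step 516: not final yet
            buffers[table] = result
    return result                                      # step 515: final result


# Toy route head->A->B->empty with two toy operators.
route = {1: {"A": "B", "B": None}}
operators = {
    "A": lambda rows: [{"total": sum(r["v"] for r in rows)}],
    "B": lambda rows: [{"doubled": rows[0]["total"] * 2}],
}
final = run_task(1, "A", [{"v": 1}, {"v": 2}], operators,
                 lambda task, table: route[task][table])
```

In this toy run the operator on table `A` sums the values, its output is buffered in table `B`, and the operator on `B` produces the final rows because the route has no further hop.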
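In one example, the stream data processing method of the embodiments of the present application is applied to a single-node scenario, such as the field of communication network management. Suppose base station A includes cell 1# and cell 2#, and every 30 seconds these two cells report two collected items: the number of connections and the number of dropped calls. The user is interested in the call drop rate, whose formula is: dropped calls / connections * 100%. The raw data reported during one minute is shown in Table 1.
Table 1
Time | Collection period | Base station | Cell | Dropped calls | Connections
2021-06-24 00:00:00 | 30 s | A | 1# | 1 | 100
2021-06-24 00:00:30 | 30 s | A | 1# | 0 | 200
2021-06-24 00:00:00 | 30 s | A | 2# | 0 | 300
2021-06-24 00:00:30 | 30 s | A | 2# | 5 | 300
The user has two requirements: the call drop rate of base station A per minute, and the call drop rate of cell 1# every 30 seconds.
To meet these requirements, two real-time operators need to be established in the system, divided into task 1 and task 2:
Task 1, calculation requirement: compute the call drop rate of base station A per minute.
Task 2, calculation requirement: compute the call drop rate of cell 1# every 30 seconds.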
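According to the calculation rules, the system generates four buffer tables, as shown in Table 2:
Table 2
Table name | Data stored
A | Source data buffer table of task 1
B | Time-aggregation intermediate data buffer table of task 1
C | Space-aggregation intermediate data buffer table of task 1
D | Source data buffer table of task 2
According to the calculation rules, the system generates four operators, as shown in Table 3:
Table 3
Figure PCTCN2023071419-appb-000001
The routing table information of each task is shown in Table 4:
Table 4
Task number | Route link
1 | head->A->B->C->empty
2 | head->D->empty
From the raw data, data preprocessing generates the data of two tables, Table A and Table D, and the intermediate data calculated by the operators are Table B and Table C.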
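Table A
Task number | Time | Collection period | Base station | Cell | Dropped calls | Connections
1 | 2021-06-24 00:00:00 | 30 s | A | 1# | 1 | 100
1 | 2021-06-24 00:00:30 | 30 s | A | 1# | 0 | 200
1 | 2021-06-24 00:00:00 | 30 s | A | 2# | 0 | 300
1 | 2021-06-24 00:00:30 | 30 s | A | 2# | 5 | 300
Table D
Figure PCTCN2023071419-appb-000002
Figure PCTCN2023071419-appb-000003
The operators then produce their results: the data in Table B is the calculation result of operator a, Table C is the calculation result of operator b, the final result of task 1 is the calculation result of operator c, and the final result of task 2 is the calculation result of operator d. The contents of Table B and Table C are as follows:
Table B
Task number | Time | Collection period | Base station | Cell | Dropped calls | Connections
1 | 2021-06-24 00:01:00 | 1 min | A | 1# | 1 | 300
1 | 2021-06-24 00:01:00 | 1 min | A | 2# | 5 | 600
Table C
Task number | Time | Collection period | Base station | Dropped calls | Connections
1 | 2021-06-24 00:01:00 | 1 min | A | 6 | 900
From the intermediate data in the above tables, the final result of task 1 is that the call drop rate of base station A per minute is 0.67% (6 / 900 * 100%); the final result of task 2 is that the call drop rate of cell 1# in the two 30-second periods is 1% and 0%.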
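The figures in this worked example can be re-derived with a short script. The operator definitions are given only as an image (Table 3), so the aggregation steps below are an assumption based on the table descriptions: operator a is taken to sum each cell over the minute (Table B) and operator b to sum the cells of a base station (Table C). The variable names are illustrative.

```python
from collections import defaultdict

# (time, base station, cell, dropped calls, connections), copied from Table 1
raw = [
    ("00:00:00", "A", "1#", 1, 100),
    ("00:00:30", "A", "1#", 0, 200),
    ("00:00:00", "A", "2#", 0, 300),
    ("00:00:30", "A", "2#", 5, 300),
]


def drop_rate(drops: int, connections: int) -> float:
    """Call drop rate in percent: dropped calls / connections * 100."""
    return drops / connections * 100


# Assumed operator a: sum each cell over the minute -> Table B
table_b = defaultdict(lambda: [0, 0])
for _, station, cell, drops, conns in raw:
    table_b[(station, cell)][0] += drops
    table_b[(station, cell)][1] += conns

# Assumed operator b: sum the cells of each base station -> Table C
table_c = defaultdict(lambda: [0, 0])
for (station, _), (drops, conns) in table_b.items():
    table_c[station][0] += drops
    table_c[station][1] += conns

# Task 1: per-minute rate of base station A; task 2: per-30-second rate of cell 1#
task1 = round(drop_rate(*table_c["A"]), 2)
task2 = [round(drop_rate(d, c), 2) for _, _, cell, d, c in raw if cell == "1#"]
```

Under these assumptions `table_b` and `table_c` reproduce the dropped-call and connection totals of Table B and Table C, and `task1` and `task2` match the stated results of 0.67% and of 1% and 0%.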
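In addition, the method of the embodiments of the present application can also be applied to a multi-node scenario. Continuing the previous example with two nodes taking part in real-time stream computation, the master node's decision assigns task 1 to node 1, which performs the calculations related to task 1, and task 2 to node 2, which performs the calculations related to task 2. The routing table and operator information in node 1 then contain only the information related to task 1, and those in node 2 contain only the information related to task 2.
In one example, the method provided by the embodiments of the present application can also be applied to business scenarios related to edge computing. For example, a city has deployed a computing center to implement intelligent transportation; because its resources are limited, the method in the embodiments of this application can be used for processing. The raw data in the intelligent transportation scenario are camera photos and license plate numbers. The intelligent transportation system defines the following three monitoring tasks:
1. Determine whether a vehicle runs a red light.
2. Determine whether a vehicle keeps to the lane markings.
3. Determine whether the driver is wearing a seat belt.
To spread the processing load, the system can plan three real-time computing nodes, one for each of the three tasks.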
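In the stream data processing method provided by the embodiments of the present application, the computing node elected master node by the distributed coordination service cluster generates the calculation rule corresponding to each monitoring task and synchronizes the rules to the other computing nodes through the distributed coordination service cluster, so that each computing node processes the real-time data it obtains according to the calculation rules of its pending monitoring tasks; each computing node finally obtains the final results of its monitoring tasks from those rules and the real-time data. The method also uses the native load balancing of the message middleware cluster to distribute the real-time data, uses the distributed coordination mechanism of the distributed coordination service cluster to balance the load of real-time stream computation across the computing nodes, and uses the elasticity of the computing nodes themselves to support horizontal scaling of the system's processing capacity. The result is a lightweight way of processing stream data: it greatly reduces the resources the system consumes in processing monitoring tasks, needs no additional heavyweight stream processing framework, and is simple to deploy, easy to manage and widely applicable.
The division of the above methods into steps is only for clarity of description. In implementation, several steps may be merged into one, or one step may be split into several; as long as the same logical relationship is included, they fall within the scope of protection of this patent. Adding insignificant modifications to the algorithm or flow, or introducing insignificant designs into it, without changing its core design also falls within the scope of protection of this patent.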
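An embodiment of the present application also relates to a computing node, as shown in FIG. 6, including: an election module 601, a generation module 602 and a processing module 603.
Specifically, the election module 601 is used to send to the distributed coordination service cluster, after the computing node starts, an application to run for master node. The generation module 602 is used, when the node is elected master node and after a batch of real-time monitoring task information has been read, to generate the calculation rule corresponding to each monitoring task and send the generated calculation rules to the distributed coordination service cluster, so that the other computing nodes in the computing node cluster process monitoring tasks based on the calculation rules of their pending monitoring tasks obtained from the distributed coordination service cluster. The processing module 603 is used to obtain the real-time data of the monitoring tasks to be processed by the computing node and to process the real-time data according to the calculation rules corresponding to those monitoring tasks.
In one example, the real-time computing node is started and the election module 601 registers with the distributed coordination service cluster to take part in the master election.
In one example, when the computing node is elected master node and has read a batch of real-time monitoring task information from the external system, the generation module 602 formulates the calculation rules for all monitoring tasks and sends them to the distributed coordination service cluster. The distributed coordination service cluster provides storage for the calculation rules; by storing the rules there, the master node lets the other computing nodes obtain the calculation rules of their pending monitoring tasks through the distributed coordination service cluster.
In one example, the computing node provided by the embodiments of the present application further includes a search module (not shown in the figure). The computing node obtains real-time data and monitoring task information from the message middleware cluster and, according to the real-time data and the monitoring requirements of the tasks, labels the real-time data with a task number. The real-time data is the data of multiple monitored objects. The node queries the data routing table and, by comparing the task number labels of the real-time data with the task numbers in the data routing table, finds the real-time data belonging to the monitoring tasks this node is to process. Real-time data labeled with the task number of a task processed by this node is sent to the node's temporary data buffer for the calculation engine to process. Real-time data that is not to be processed by this node is sent, according to the data preprocessing strategy issued by the calculation rule manager, to the designated partition of the real-time data forwarding topic of the message middleware cluster, so that other nodes can obtain and process it.
In one example, the processing module calculates a processing result from the real-time data and the algorithm logic of the pending monitoring task. Each time a new processing result is obtained, the module queries the data flow of the task in the data routing table to decide whether the calculation has ended. If the next node in the data routing table is empty, the calculation has ended and the processing result is sent to the message middleware cluster; otherwise the data is put into the temporary data buffer and the calculation continues, repeating the above operations until the calculation ends.
This embodiment is a device embodiment corresponding to the above embodiment of the stream data processing method applied to computing nodes, and the two can be implemented in cooperation with each other. The relevant technical details mentioned in the above method embodiment remain valid in this embodiment and, to reduce repetition, are not repeated here. Correspondingly, the relevant technical details mentioned in this embodiment can also be applied to the above method embodiment.
The modules involved in the above embodiments of the present application are all logical modules. In practical applications, a logical unit can be a physical unit, a part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of the present application, units that are not closely related to solving the technical problem posed by the present application are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.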
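An embodiment of the present application also provides an electronic device, as shown in FIG. 7, including at least one processor 701 and a memory 702 communicatively connected to the at least one processor 701. The memory 702 stores instructions executable by the at least one processor 701, and the instructions are executed by the at least one processor 701 so that the at least one processor can execute the above stream data processing method.
The memory and the processor are connected by a bus. The bus may include any number of interconnected buses and bridges, and connects the various circuits of the one or more processors and the memory together. The bus may also connect various other circuits such as peripherals, voltage regulators and power management circuits, all of which are well known in the art and therefore are not described further herein. The bus interface provides an interface between the bus and the transceiver. The transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other devices over a transmission medium. Data processed by the processor is transmitted over a wireless medium through an antenna; the antenna also receives data and passes it to the processor.
The processor is responsible for managing the bus and general processing, and can also provide various functions, including timing, peripheral interfaces, voltage regulation, power management and other control functions. The memory can be used to store the data that the processor uses when performing operations.
The above product can execute the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present application.
An embodiment of the present application also provides a computer-readable storage medium storing a computer program. The above method embodiments are implemented when the computer program is executed by a processor.
Those skilled in the art will understand that all or some of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or some of the steps of the methods in the embodiments of the present application. The aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), magnetic disks, optical discs and other media that can store program code.
The above embodiments are provided to enable those of ordinary skill in the art to implement and use this application. Those of ordinary skill in the art may make various modifications or changes to the above embodiments without departing from the inventive idea of this application, so the scope of protection of this application is not limited by the above embodiments but should be the broadest scope consistent with the innovative features recited in the claims.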

Claims (11)

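  1. A stream data processing method, applied to a computing node in a computing node cluster, comprising:
    after the computing node starts, sending to a distributed coordination service cluster an application to run for master node;
    when elected master node, and after reading a batch of real-time monitoring task information, generating the calculation rule corresponding to each monitoring task, and sending the generated calculation rules to the distributed coordination service cluster, so that the other computing nodes in the computing node cluster process monitoring tasks based on the calculation rules of their pending monitoring tasks obtained from the distributed coordination service cluster;
    obtaining the real-time data of the monitoring tasks to be processed by the computing node, and processing the real-time data according to the calculation rules corresponding to the pending monitoring tasks.
  2. The stream data processing method according to claim 1, wherein the calculation rule comprises: indication information indicating where the real-time data of the corresponding monitoring task is stored;
    the obtaining of the real-time data of the monitoring tasks to be processed by the computing node comprises:
    determining, according to the indication information, the storage location of the real-time data of the pending monitoring task;
    obtaining the real-time data of the pending monitoring task from the determined storage location.
  3. The stream data processing method according to claim 2, wherein the obtaining of the real-time data of the monitoring tasks to be processed by the computing node further comprises:
    obtaining real-time data from a message middleware cluster, identifying the task number of the monitoring task to which the real-time data belongs, and adding label information to the real-time data, wherein the label information contains the task number;
    taking the real-time data whose task number is the same as that of the pending monitoring task as the real-time data of the pending monitoring task, and sending, according to the indication information, the real-time data of the monitoring tasks to be processed by the other computing nodes to the message middleware cluster, so that the other computing nodes obtain the real-time data of their respective pending monitoring tasks.
  4. The stream data processing method according to claim 3, wherein the method further comprises:
    after the obtaining of real-time data from the message middleware cluster, storing the real-time data of the pending monitoring task in a temporary data buffer;
    the processing of the real-time data according to the calculation rules corresponding to the pending monitoring tasks comprises:
    processing the real-time data in the temporary data buffer according to the calculation rules corresponding to the pending monitoring tasks;
    wherein, when the throughput ratio of the temporary data buffer is greater than 1, the rate at which real-time data is obtained from the message middleware cluster is reduced.
  5. The stream data processing method according to claim 1, wherein the calculation rule comprises: operator information representing the algorithm logic of the corresponding monitoring task and routing table information representing the data flow of the corresponding monitoring task; the processing of the real-time data according to the calculation rules corresponding to the pending monitoring tasks comprises:
    processing the real-time data of the pending monitoring task according to the operator information of the pending monitoring task until the processing result is the final result, wherein whether the processing result is the final result is determined from the routing table information.
  6. The stream data processing method according to any one of claims 1 to 5, wherein the method further comprises:
    when not elected master node, obtaining the calculation rules corresponding to the pending monitoring tasks through the distributed coordination service cluster;
    processing the monitoring tasks based on the calculation rules of the pending monitoring tasks obtained from the distributed coordination service cluster.
  7. The stream data processing method according to any one of claims 1 to 5, wherein the method further comprises:
    after the real-time data has been processed according to the calculation rules corresponding to the pending monitoring tasks, sending the processing result to the message middleware cluster, so that an external system obtains the processing result through the message middleware cluster.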
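  8. A computing node, comprising:
    an election module, configured to send to a distributed coordination service cluster, after the computing node starts, an application to run for master node;
    a generation module, configured, when the node is elected master node and after a batch of real-time monitoring task information has been read, to generate the calculation rule corresponding to each monitoring task and send the generated calculation rules to the distributed coordination service cluster, so that the other computing nodes in the computing node cluster process monitoring tasks based on the calculation rules of their pending monitoring tasks obtained from the distributed coordination service cluster;
    a processing module, configured to obtain the real-time data of the monitoring tasks to be processed by the computing node and to process the real-time data according to the calculation rules corresponding to the pending monitoring tasks.
  9. A stream data processing system, comprising: an external system, a distributed coordination service cluster, a message middleware cluster, and a computing node cluster including at least one computing node according to claim 8;
    wherein the external system is used to send the monitoring tasks it has established to the computing nodes in the computing node cluster and to listen for the processing results of the monitoring tasks;
    the distributed coordination service cluster is used to handle the computing nodes' applications to run for master node, to store the calculation rules and to notify the computing nodes in the computing node cluster of the calculation rules;
    the message middleware cluster is used to store the real-time data of the monitoring tasks and the processing results of the monitoring tasks.
  10. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the stream data processing method according to any one of claims 1 to 7.
  11. A computer-readable storage medium storing a computer program, wherein the stream data processing method according to any one of claims 1 to 7 is implemented when the computer program is executed by a processor.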
PCT/CN2023/071419 2022-01-11 2023-01-09 Stream data processing method, system, node, electronic device and storage medium WO2023134643A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210028930.2A CN116471627A (zh) 2022-01-11 2022-01-11 Stream data processing method, system, node, electronic device and storage medium
CN202210028930.2 2022-01-11

Publications (1)

Publication Number Publication Date
WO2023134643A1 true WO2023134643A1 (zh) 2023-07-20

Family

ID=87182969

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/071419 WO2023134643A1 (zh) 2022-01-11 2023-01-09 Stream data processing method, system, node, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN116471627A (zh)
WO (1) WO2023134643A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117118939A (zh) * 2023-10-24 2023-11-24 腾讯科技(深圳)有限公司 Data processing method, apparatus, device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180336023A1 (en) * 2017-05-16 2018-11-22 Bank Of America Corporation Distributed storage framework information server platform architecture
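CN110247954A (zh) * 2019-05-15 2019-09-17 南京苏宁软件技术有限公司 Distributed task scheduling method and system
CN111708627A (zh) * 2020-06-22 2020-09-25 中国平安财产保险股份有限公司 Task scheduling method and apparatus based on a distributed scheduling framework
CN112104751A (zh) * 2020-11-10 2020-12-18 中国电力科学研究院有限公司 Dispatching and control cloud data processing method, apparatus and system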

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117118939A (zh) * 2023-10-24 2023-11-24 腾讯科技(深圳)有限公司 Data processing method, apparatus, device and storage medium
CN117118939B (zh) * 2023-10-24 2024-01-30 腾讯科技(深圳)有限公司 Data processing method, apparatus, device and storage medium

Also Published As

Publication number Publication date
CN116471627A (zh) 2023-07-21

Similar Documents

Publication Publication Date Title
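WO2017063441A1 (zh) Database access control method and apparatus
CN107545338B (zh) Service data processing method and service data processing system
US8463809B2 (en) Method and computing system for distributed stream data processing using plural of computers
EP2445238B1 (en) Method and system for providing user service data
CN112148484B (zh) Coupling-degree-based online microservice allocation method and system
WO2023134643A1 (zh) Stream data processing method, system, node, electronic device and storage medium
WO2022007781A1 (zh) Task processing method, edge computing device, computer device and medium
CN108509280B (zh) Push-model-based locality scheduling method for distributed computing clusters
CN116389491B (zh) Adaptive computing system for cloud-edge computing power resources
CN106293933A (zh) Cluster resource configuration and scheduling method supporting multiple big data computing frameworks
CN114710571A (zh) Data packet processing system
CN114138434A (zh) Big data task scheduling system
CN116777182A (zh) Task dispatching method for semiconductor wafer manufacturing execution
CN109976873B (zh) Scheduling scheme acquisition method and scheduling method for a containerized distributed computing framework
CN111404818A (zh) Routing protocol optimization method for general-purpose multi-core network processors
Li et al. Efficient adaptive matching for real-time city express delivery
CN111782627B (zh) Task and data co-scheduling method for wide-area high-performance computing environments
CN116684418B (zh) Computing power orchestration and scheduling method based on a computing power service gateway, computing power network and apparatus
US11700189B2 (en) Method for performing task processing on common service entity, common service entity, apparatus and medium for task processing
CN113347430B (zh) Distributed scheduling apparatus for hardware transcoding acceleration devices and method of using same
WO2022111466A1 (zh) Task scheduling method, control method, electronic device and computer-readable medium
CN114666226B (zh) Large-scale edge cluster management method and system
US20220245474A1 (en) Implementation of Rules in a Computing System
CN111767043B (zh) Cross-system service scheduling method and system based on a service scheduling engine
CN112905351B (zh) GPU and CPU load scheduling method, apparatus, device and medium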

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23739985

Country of ref document: EP

Kind code of ref document: A1