CN114489598A - Storm task management and scheduling method - Google Patents

Storm task management and scheduling method

Info

Publication number
CN114489598A
CN114489598A
Authority
CN
China
Prior art keywords
storm
node
task
information
parameter
Prior art date
Legal status
Granted
Application number
CN202210053528.XA
Other languages
Chinese (zh)
Other versions
CN114489598B (en)
Inventor
高晨
Current Assignee
XCMG Hanyun Technologies Co Ltd
Original Assignee
XCMG Hanyun Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by XCMG Hanyun Technologies Co Ltd
Priority to CN202210053528.XA
Publication of CN114489598A
Application granted
Publication of CN114489598B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 - Arrangements for software engineering
    • G06F 8/30 - Creation or generation of source code
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 - Task transfer initiation or dispatching
    • G06F 9/4843 - Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 - Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Abstract

The invention discloses a storm task management and scheduling method. A preset spout dedicated to consuming kafka messages and different types of bolts are built in, so that only business processing logic needs to be added to the bolts; topology design and parameter configuration are completed on a web page by dragging nodes and drawing connection lines between them; an improved dolphinscheduler is used as the scheduling service to submit the storm task; and flume collects the storm log files while rabbitmq publishes and subscribes to real-time storm log messages. The storm task management and scheduling method provided by the invention improves development efficiency, ensures stable task operation, provides a log viewing function, and facilitates debugging.

Description

Storm task management and scheduling method
Technical Field
The invention relates to a storm task management and scheduling method, and belongs to the technical field of big data processing.
Background
In the field of industrial big data, storm real-time computing tasks are often used to complete operations such as data message parsing, terminal alarming, index processing and data storage. To develop a storm task, the corresponding spout and bolt classes usually need to be inherited and implemented in code, the topology constructed, distribution strategies specified and various parameters set; the code is then packaged into a jar, the public jar packages it depends on are added to storm's extlib directory, and the task is submitted by means of storm jar parameters.
This development mode is inefficient: a large amount of work has to be implemented in code, for example consuming kafka data in the spout and letting bolts interact with other big data components; the topology design is realized in code through an API, which is not intuitive; parameter setting is not flexible; and a third-party jar package placed under the extlib directory easily conflicts with the jar package versions the business depends on, a problem that is extremely difficult to eliminate. In addition, because a storm task may run on multiple machines and each machine may run multiple storm worker processes, multiple log files are produced, which makes storm log viewing, task debugging, operation and maintenance inconvenient.
Submitting and running storm tasks by manually executing commands is prone to misoperation and brings hidden dangers to the security of the server system; moreover, no scheduling strategy is used to guarantee stable and reliable submission and running of the tasks.
Disclosure of Invention
The purpose is as follows: in order to overcome the defects of the storm task in the processes of development, operation, debugging and the like, the invention provides a storm task management and scheduling method, which realizes the functions of storm topology construction, parameter configuration, task scheduling, jar package resource isolation, log viewing and the like, and solves the problems of low storm real-time processing task development efficiency, difficult deployment, troublesome operation and maintenance and the like in the industrial big data field.
The technical scheme is as follows: in order to solve the technical problems, the technical scheme adopted by the invention is as follows:
a storm task management and scheduling method comprises the following steps:
parameters of data structures of the spout node and the bolt node are set in the web page, and the spout node and the bolt node are connected on the web page to complete storm topological structure abstraction and form a storm task.
The storm task is submitted to dolphinscheduler through an interface, and dolphinscheduler judges whether the submitted task is a storm task according to the jobType field of the task type. If it is a storm task, dolphinscheduler selects a suitable machine to submit the storm task according to the actual running condition of the machines and the scheduling strategy, downloads the jar package resources from hdfs to the selected machine by extracting the information of the mainJar parameter and the jobResourceList parameter in the storm task, and executes a command to submit the storm task to the storm cluster.
When the storm cluster receives the submitted storm task, the main method defined by mainClass is executed first; params information is obtained in the main method, the node information defined by nodes is extracted, and all nodes are traversed to map the storm topology abstraction into the storm topological structure.
The operation parameters of the storm topology are set according to the parameters defined by stormTopologyConfig, and the storm topology is submitted to the storm cluster for operation through the storm topology submitter.
Preferably, the method further comprises the following steps:
For a running storm task, host and port in the storm rest interface are set to the host and port information of the storm UI respectively; overview information of the storm task is obtained through the storm rest interface, the storm topology id is obtained from the overview information, the storm topology id is used as the value of the topologyid parameter in the storm rest interface, and the running worker information of the storm task is obtained.
According to the worker information, flume agent configuration files are constructed on the selected machine, one flume agent configuration file being generated for each worker; the source type in the flume agent configuration file uses flume's built-in exec source.
For the flume sink type, RabbitmqPublishsink is adopted; RabbitmqPublishsink reads data from the flume channel and publishes it to a rabbitmq topic whose name is the storm task name, and when the exchange type is declared in rabbitmq, fanout is selected, i.e. the publish-subscribe mode.
The flume agent configuration file is uploaded to the machine where the storm log file is located using python's paramiko module, the flume-ng agent command is called remotely to start the flume agent, and the flume agent runs in nohup mode; the command to start the flume agent is sent using the paramiko module's invoke_shell, and the paramiko module sleeps for a certain time to wait for the execution result of the command, which is then returned to the paramiko module.
The web page connects to rabbitmq through websocket, subscribes to the storm task log information sent to rabbitmq by the RabbitmqPublishsink in the flume agent, and displays the subscribed storm task log information on the web page.
Preferably, the data structure of the spout node and the bolt node is as follows:
{
"nodeId": "",
"preNodeArray": "",
"nodeName": "",
"nodeBusinessConfig": {},
"nodeDescription": "",
"className": "",
"nodeType": "",
"grouping": "",
"taskNum": ,
"nodeResourceList": ""
}
wherein nodeId represents the unique identifier of the node; preNodeArray represents the parent node ids, multiple parent node ids being separated by commas; nodeName represents the node name; nodeBusinessConfig represents the node service configuration information; nodeDescription represents the node description; className represents the complete class name run by the node; nodeType represents the node type, 1 being a spout node and 2 being a bolt node; grouping represents the grouping mode of data from the parent node to this node, 1 being shuffleGrouping (random grouping), 2 being fieldsGrouping (grouping by field), and 3 being localOrShuffleGrouping (local or random grouping); taskNum represents the number of executors running the node; nodeResourceList represents the list of jar package resources used by the node.
Preferably, the nodeBusinessConfig data structure of the spout node is as follows:
"nodeBusinessConfig": {
"kafkaSpoutConfig": {
"groupField": "",
"needDeserialize": "",
"deserializeClass": "",
"zookeeperConnect": "",
"groupId": "",
"topic": "",
"autoOffsetReset": ""
}
}
wherein kafkaSpoutConfig is the data structure of the kafka connection information; groupField indicates the name of the grouping field for data sent to the next node; needDeserialize indicates whether to deserialize the kafka message data, 1 meaning deserialize and 2 meaning do not deserialize; deserializeClass represents the class name of the deserialization class for the kafka message data; zookeeperConnect represents the zookeeper connection information; groupId represents the consumer group id; topic represents the message topic in kafka; autoOffsetReset denotes the kafka consumption policy.
Preferably, the nodeResourceList data structure of the bolt node is as follows:
"nodeResourceList": [{
"res": "",
"name": "",
"id": ""
}]
wherein: res represents the file name of the jar package on hdfs, name is the alias of the jar package file, and id is the unique identifier of the jar package.
Preferably, the storm task data structure is as follows:
{
"jobName ": "",
"jobDescription": "",
"jobType": "",
"params": {
"nodes": [],
"mainClass": "",
"jobBusinessConfig": {
"stormTopologyConfig": {
"topologyName": "",
"disableFallBackOnJavaSerialization": "",
"registerSerializeClass": "",
"serializeImplementClass": "",
"reliableAck": "",
"topologyWorkers": ""
}
},
"mainJar": {
"id":
},
"jobResourceList": []
}
}
wherein: jobName represents the task name; jobDescription represents the task description information; jobType represents the task type and defaults to the storm type; params is the task parameter, in which nodes represents the node information (the data type is a json array), mainClass represents the main method to run, jobBusinessConfig represents the service configuration of the task, mainJar represents the jar package information to which the main method belongs, and jobResourceList represents all jar package resource information required by the bolt nodes. The stormTopologyConfig parameter in jobBusinessConfig represents the storm task running information, where topologyName represents the topology name; disableFallBackOnJavaSerialization represents whether the JVM's own serialization mechanism is disabled, 1 meaning yes and 2 meaning no; registerSerializeClass represents the registered serialization class; serializeImplementClass represents the serialization implementation class; reliableAck represents whether the reliable transmission mechanism is enabled, 1 meaning yes and 2 meaning no; and topologyWorkers represents the number of workers the task runs with.
Preferably, the command for executing the command to submit the storm task to the storm cluster is as follows:
storm jar mainJar mainClass params --jar dependJars
wherein: mainJar represents the mainJar parameter information in the storm task; mainClass represents the mainClass parameter information in the storm task; params represents the params parameter information in the storm task; dependJars is the jar package information from jobResourceList in the storm task, multiple jar packages being separated by commas.
As a preferred scheme, the method for mapping the storm topology abstraction to the storm topology by traversing all nodes includes the following steps:
a. judging the node type nodeType, if the node type nodeType is 1, indicating that the node is an spout node, and executing operation b; otherwise, the node is a bolt node, and operation c is executed.
b. Defining a spout node and setting the spout node with the parameter information of the node, nodeId being the unique identifier of the node; instantiating a kafka consumer in the spout node according to the zookeeper connection address, topic name and autoOffsetReset consumption strategy in the kafkaSpoutConfig parameter, consuming the kafka data, and deserializing the message data in kafka according to the needDeserialize and deserializeClass parameters.
c. Defining a bolt node, setting the bolt node by using the parameter information of the node, wherein the nodeId is the unique identifier of the node, setting a father node of the bolt node according to a preNodeArray parameter, and simultaneously setting a data grouping mode from the father node to the node according to a grouping parameter.
Preferably, the source type in the flume agent configuration file uses flume's built-in exec source, obtained by executing the following command:
tail -F /stormLogs/stormId/workerPort/worker.log
stormLogs is the log directory information obtained through the interface, stormId is the topology id information obtained through the interface, workerPort is the occupied port information obtained through the interface, and worker.log is the log file of the storm worker process.
Preferably, for the operation of stopping the flume agent that collects the storm logs, the paramiko module's exec_command is used to send a kill command to stop the flume agent.
Beneficial effects: the storm task management and scheduling method provided by the invention has the following beneficial effects:
1. Storm topology design and task management are completed through the web page, and the predefined spout dedicated to consuming kafka messages reduces the amount of code business personnel need to develop, greatly improving development efficiency.
2. By modifying dolphinscheduler, the method supports storm task scheduling, and ensures stable and reliable submission of the storm task.
3. Jar packages that the task depends on are submitted via the --jar parameter, avoiding dependency package version conflicts.
4. Using python's paramiko module, distribution of the flume agent configuration file and remote command invocation are realized, problems such as failure of password-free ssh login are avoided, and correct startup of the flume agent and collection of the storm running logs are ensured.
5. By using flume to collect the contents of the storm task's running log files and rabbitmq to publish and subscribe to them, the contents of multiple log files on multiple machines can be viewed in real time, greatly improving the convenience of debugging, operation and maintenance.
Drawings
Fig. 1 is a schematic flow chart of the system.
Fig. 2 is a schematic diagram of a storm topology corresponding to the embodiment.
Detailed Description
The present invention will be further described with reference to the following specific embodiments.
As shown in fig. 1, a storm task management and scheduling method includes the following steps:
step 1: designing the data structure of the spout node and the bolt node in the web page as follows:
{
"nodeId": "",
"preNodeArray": "",
"nodeName": "",
"nodeBusinessConfig": {},
"nodeDescription": "",
"className": "",
"nodeType": "",
"grouping": "",
"taskNum": ,
"nodeResourceList": ""
}
wherein nodeId represents the unique identifier of the node; preNodeArray represents the parent node ids, multiple parent node ids being separated by commas; nodeName represents the node name; nodeBusinessConfig represents the node service configuration information; nodeDescription represents the node description; className represents the complete class name run by the node; nodeType represents the node type, 1 being a spout node and 2 being a bolt node; grouping represents the grouping mode of data from the parent node to this node, 1 being shuffleGrouping (random grouping), 2 being fieldsGrouping (grouping by field), and 3 being localOrShuffleGrouping (local or random grouping); taskNum represents the number of executors running the node, i.e. the running concurrency of the node; nodeResourceList represents the list of jar package resources used by the node.
For a spout node, the data structure parameters set on the web page include the nodeName node name, the nodeDescription node description, the nodeBusinessConfig node service configuration information, and the taskNum running concurrency of the node (which should be consistent with the partition number of the kafka topic); the other data structure parameters are set automatically by the system without manual setting. The node service configuration information is mainly the kafka connection information, and the nodeBusinessConfig data structure of the spout node is as follows:
"nodeBusinessConfig": {
"kafkaSpoutConfig": {
"groupField": "",
"needDeserialize": "",
"deserializeClass": "",
"zookeeperConnect": "",
"groupId": "",
"topic": "",
"autoOffsetReset": ""
}
}
wherein kafkaSpoutConfig is the data structure of the kafka connection information; groupField indicates the name of the grouping field for data sent to the next node; needDeserialize indicates whether to deserialize the kafka message data, 1 meaning deserialize and 2 meaning do not deserialize; deserializeClass represents the class name of the deserialization class for the kafka message data (when deserialization is required, the message in kafka is a json string by default); zookeeperConnect represents the zookeeper connection information; groupId represents the consumer group id; topic represents the message topic in kafka; autoOffsetReset represents the kafka consumption policy, divided into smallest and largest.
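For illustration, a nodeBusinessConfig filled with purely hypothetical values (the zookeeper address, topic, group id, grouping field and deserialization class below are examples, not values from the patent) might look as follows:
"nodeBusinessConfig": {
    "kafkaSpoutConfig": {
        "groupField": "deviceId",
        "needDeserialize": "1",
        "deserializeClass": "com.example.WorkConditionDeserializer",
        "zookeeperConnect": "zk1:2181,zk2:2181,zk3:2181",
        "groupId": "storm_demo_group",
        "topic": "demo_topic",
        "autoOffsetReset": "largest"
    }
}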
For a bolt node, the data structure parameters are likewise set on the web page, and the parameters of the other data structures are set automatically by the system without manual setting. The data structure of the jar package resource list used by the bolt node is as follows:
"nodeResourceList": [{
"res": "",
"name": "",
"id": ""
}]
wherein: res represents the file name of the jar package on hdfs (distributed file system), the name is the alias of the jar package file, and the id is the unique identifier of the jar package.
The storm topology abstraction is completed by connecting the spout node and the bolt nodes on the web page, forming a storm task. If node A is set as the parent node of node B through a connection line, the preNodeArray attribute value of node B contains the nodeId attribute value of node A; that is, each child node only needs to store the nodeId information of its parent nodes, and a parent node does not need to store the nodeId information of its child nodes.
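As a minimal illustration of this parent-child relationship (all identifiers and names below are hypothetical), a spout node A connected to a bolt node B could be abstracted as:
{
    "nodeId": "node_A",
    "preNodeArray": "",
    "nodeName": "kafka spout",
    "nodeType": "1"
}
{
    "nodeId": "node_B",
    "preNodeArray": "node_A",
    "nodeName": "parse bolt",
    "nodeType": "2",
    "grouping": "1"
}
Here node_B stores only the nodeId of its parent node_A in preNodeArray, while node_A stores nothing about its child.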
The storm task is saved on the web page, and the data structure of the storm task is designed as follows:
{
"jobName ": "",
"jobDescription": "",
"jobType": "",
"params": {
"nodes": [],
"mainClass": "",
"jobBusinessConfig": {
"stormTopologyConfig": {
"topologyName": "",
"disableFallBackOnJavaSerialization": "",
"registerSerializeClass": "",
"serializeImplementClass": "",
"reliableAck": "",
"topologyWorkers": ""
}
},
"mainJar": {
"id":
},
"jobResourceList": []
}
}
wherein: jobName represents the task name; jobDescription represents the task description information; jobType represents the task type and defaults to the storm type; params is the task parameter, in which nodes represents the node information (the data type is a json array), mainClass represents the main method to run, jobBusinessConfig represents the service configuration of the task, mainJar represents the jar package information to which the main method belongs, and jobResourceList represents all jar package resource information required by the bolt nodes, in a format consistent with the nodeResourceList of the bolt node. The stormTopologyConfig parameter in jobBusinessConfig represents the storm task running information, where topologyName represents the topology name; disableFallBackOnJavaSerialization represents whether the JVM's own serialization mechanism is disabled, 1 meaning yes and 2 meaning no; registerSerializeClass represents the registered serialization class (used when the JVM's own serialization mechanism is disabled); serializeImplementClass represents the serialization implementation class (used when the JVM's own serialization mechanism is disabled); reliableAck represents whether the reliable transmission mechanism is enabled, 1 meaning yes and 2 meaning no; and topologyWorkers represents the number of workers the task runs with.
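As an illustration, a stormTopologyConfig filled with hypothetical values (the topology name and worker count are examples only) could be:
"stormTopologyConfig": {
    "topologyName": "demo_topology",
    "disableFallBackOnJavaSerialization": "2",
    "registerSerializeClass": "",
    "serializeImplementClass": "",
    "reliableAck": "1",
    "topologyWorkers": "2"
}
i.e. the JVM's own serialization is not disabled, the reliable transmission mechanism is enabled, and the topology runs with 2 workers.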
Step 2: submitting the storm task to a dolphinscheudler through an interface, judging whether the storm task is the submitted storm task or not by the dolphinscheudler according to a jobType field of the task type, and executing other types of tasks if the storm task is not the storm task; if the storm task is the storm task, the dolphinscheduler selects a proper machine to submit the storm task according to the actual running condition and the scheduling strategy of the machine, downloads jar package resources from hdfs to the selected machine on the selected machine by extracting the information of the mainJar parameters and the jobResourceList parameters in the storm task, and executes the following command to submit the storm task to the storm cluster.
storm jar mainJar mainClass params --jar dependJars
Wherein: mainJar represents the mainJar parameter information in the storm task; mainClass represents the mainClass parameter information in the storm task; params represents the params parameter information in the storm task; dependJars is the jar package information from jobResourceList in the storm task, multiple jar packages being separated by commas.
The invention modifies the handling of the jobType task-type field in the dolphinscheduler interface so that it judges whether the submitted task is a storm task, enabling dolphinscheduler to receive storm tasks. A suitable machine is selected to submit the storm task according to strategies such as lowest CPU load, largest free memory and retry on failure, and the dependent jar packages are submitted through the --jar parameter as dependJars corresponding to jobResourceList in the storm task, multiple jar packages being separated by commas, to avoid conflicts with the jar package versions of other users or systems.
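Following the command format given above, the submission command built by dolphinscheduler might therefore look like the following; the jar paths, main class name and params content are hypothetical and only illustrative:
storm jar /tmp/exec/storm-task-main.jar com.example.StormTaskMain '{"nodes":[...],"mainClass":"...","jobBusinessConfig":{...}}' --jar /tmp/exec/parse-bolt.jar,/tmp/exec/sink-bolt.jar
Here the third argument is the params json of the storm task passed as a single string, and the jar paths after --jar are the jobResourceList resources previously downloaded from hdfs.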
Step 3: when the storm cluster receives the submitted storm task, the main method defined by mainClass is executed first; params information is obtained in the main method, the node information defined by nodes is extracted, and all nodes are traversed to map the storm topology abstraction into the storm topological structure by performing the following operations:
a. judging the node type nodeType, if the node type nodeType is 1, indicating that the node is an spout node, and executing operation b; otherwise, the node is a bolt node, and operation c is executed.
b. Defining a spout node and setting the spout node with the parameter information of the node, nodeId being the unique identifier of the node; instantiating a kafka consumer in the spout node according to the zookeeper connection address, topic name and autoOffsetReset consumption strategy in the kafkaSpoutConfig parameter, consuming the kafka data, and deserializing the message data in kafka according to the needDeserialize and deserializeClass parameters.
c. Defining a bolt node and setting the bolt node with the parameter information of the node, nodeId being the unique identifier of the node; setting the parent nodes of the bolt node according to the preNodeArray parameter (parent node id information), and setting the data grouping mode from the parent node to this node according to the grouping parameter.
Through the above operations, the storm topological structure is constructed, and the operation parameters of the storm topology are set according to the parameters defined by stormTopologyConfig. Finally, the storm topology is submitted to the storm cluster for operation through the storm topology submitter.
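The traversal in steps a-c can be sketched in Java roughly as follows. This is only a minimal illustration of the mapping logic under stated assumptions: TaskParams, NodeDef, ConfigurableKafkaSpout and ConfigurableBolt are hypothetical helper classes (not classes disclosed by the patent), while TopologyBuilder, Config and StormSubmitter are standard storm APIs.
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.BoltDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class StormTaskMain {
    public static void main(String[] args) throws Exception {
        // the params json of the storm task is passed as the first program argument
        TaskParams params = TaskParams.parse(args[0]);      // hypothetical json wrapper
        TopologyBuilder builder = new TopologyBuilder();
        for (NodeDef node : params.getNodes()) {             // hypothetical node object
            if ("1".equals(node.getNodeType())) {
                // b. spout node: consumes kafka according to kafkaSpoutConfig
                builder.setSpout(node.getNodeId(),
                        new ConfigurableKafkaSpout(node.getNodeBusinessConfig()),
                        node.getTaskNum());
            } else {
                // c. bolt node: wraps the business class named by className
                BoltDeclarer declarer = builder.setBolt(node.getNodeId(),
                        new ConfigurableBolt(node.getClassName(), node.getNodeBusinessConfig()),
                        node.getTaskNum());
                // connect to every parent listed in preNodeArray using the grouping parameter
                for (String parentId : node.getPreNodeArray().split(",")) {
                    if ("2".equals(node.getGrouping())) {
                        // grouping by field; the field name comes from the parent's groupField
                        declarer.fieldsGrouping(parentId, new Fields(params.groupFieldOf(parentId)));
                    } else if ("3".equals(node.getGrouping())) {
                        declarer.localOrShuffleGrouping(parentId);
                    } else {
                        declarer.shuffleGrouping(parentId);
                    }
                }
            }
        }
        // operation parameters taken from stormTopologyConfig
        Config conf = new Config();
        conf.setNumWorkers(params.getTopologyWorkers());
        StormSubmitter.submitTopology(params.getTopologyName(), conf, builder.createTopology());
    }
}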
Step 4: for the running storm task, host and port in the storm rest interface are set to the host and port information of the storm UI respectively. The storm rest interface
http://host:port/api/v1/topology/summary
is used to obtain overview information of the storm task, from which the storm topology id is obtained; the storm topology id is then used as the value of the topologyid parameter in the storm rest interface
http://host:port/api/v1/topology/{topologyid}
to obtain the running worker information of the storm task, including the running machine, occupied port, log directory and other information.
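For example, with a hypothetical storm UI address, the two rest calls could simply be issued as:
curl http://192.168.0.11:8080/api/v1/topology/summary
curl http://192.168.0.11:8080/api/v1/topology/demo_topology-1-1642483200
where 192.168.0.11:8080 is an assumed storm UI host and port and demo_topology-1-1642483200 is an illustrative topology id taken from the summary response.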
According to the obtained running information and log directory information of the storm task workers, flume agent configuration files are constructed on the selected machine, one flume agent configuration file being generated for each worker; the source type in the flume agent configuration file uses flume's built-in exec source.
The executed command is
tail -F /stormLogs/stormId/workerPort/worker.log
stormLogs is the log directory information obtained through the interface, stormId is the topology id information obtained through the interface, workerPort is the occupied port information obtained through the interface, and worker.log is the log file of the storm worker process.
For the flume sink type, a self-defined RabbitmqPublishsink is adopted; it reads the data in the flume channel and publishes it to a rabbitmq topic whose name is the storm task name, and when the exchange type is declared in rabbitmq, fanout is selected, i.e. the publish-subscribe mode.
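A flume agent configuration file generated for one worker might look roughly like the following; the agent name, channel capacity and the sink's fully qualified class name and properties are assumptions for illustration, and the custom sink class is assumed to be available on flume's classpath:
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# exec source tailing the log file of this storm worker
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /stormLogs/stormId/workerPort/worker.log
a1.sources.r1.channels = c1

# in-memory channel buffering the collected log lines
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# custom sink publishing each log line to the rabbitmq fanout exchange named after the storm task
a1.sinks.k1.type = com.example.RabbitmqPublishsink
a1.sinks.k1.channel = c1
a1.sinks.k1.exchange = demo_topology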
The flume agent configuration file is uploaded to the machine where the storm log file is located using python's paramiko module, the flume-ng agent command is called remotely to start the flume agent, and the flume agent runs in nohup mode; the command to start the flume agent is sent using the paramiko module's invoke_shell, the paramiko module sleeps for 5 seconds to wait for the execution result of the command, and the execution result is returned to the paramiko module. For the operation of stopping the flume agent that collects the storm logs, the paramiko module's exec_command is used directly to send a kill command to stop the flume agent.
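The remote start command sent through invoke_shell could take roughly the following form; the agent name, configuration directory and file path are illustrative assumptions, not values from the patent:
nohup flume-ng agent -n a1 -c /opt/flume/conf -f /opt/flume/conf/storm_worker_6700.conf > /dev/null 2>&1 &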
The front end connects to rabbitmq through websocket, subscribes to the storm task log information sent to rabbitmq by the sink in the flume agent, and displays the subscribed storm task log information on the front-end page; only the most recently subscribed log content is displayed, and the page does not need to cache previously viewed content.
Example 1:
Taking the protocol parsing of working conditions of large-tonnage and small-tonnage cranes as an example: the working-condition messages of the large-tonnage and small-tonnage cranes exist in different kafka message topics, the messages need to be consumed from their respective topics and parsed according to different protocols, and finally the parsed working-condition data is written into redis, kafka and postgres. Using this system, the working-condition parsing and data storage of the large-tonnage and small-tonnage cranes is completed, and the storm topology is shown in fig. 2, where the large-tonnage working-condition node and the small-tonnage working-condition node are spout nodes and the remaining nodes are bolt nodes. To run the storm task and query its logs, a user only needs to configure the information of each node on the web page and implement the business logic of each bolt, and then run the method described above.
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims (10)

1. A storm task management and scheduling method is characterized in that: the method comprises the following steps:
parameters of data structures of the spout node and the bolt node are set in the web page, and the spout node and the bolt node are connected on the web page to complete storm topological structure abstraction and form a storm task;
submitting the storm task to dolphinscheduler through an interface, dolphinscheduler judging whether the submitted task is a storm task according to the jobType field of the task type; if it is a storm task, dolphinscheduler selecting a suitable machine to submit the storm task according to the actual running condition of the machines and the scheduling strategy, downloading the jar package resources from hdfs to the selected machine by extracting the information of the mainJar parameter and the jobResourceList parameter in the storm task, and submitting the storm task to the storm cluster by executing a command;
when the storm cluster receives the submitted storm task, executing the main method defined by mainClass, obtaining params information in the main method, extracting the node information defined by nodes, and traversing all nodes to map the storm topology abstraction into the storm topological structure;
and setting the operation parameters of the storm topological structure according to the parameters defined by stormTopologyConfig, and submitting the storm topological structure to the storm cluster for operation through the storm topology submitter.
2. A storm task management and scheduling method as claimed in claim 1, characterised in that: also comprises the following steps:
for a running storm task, setting host and port in the storm rest interface to the host and port information of the storm UI respectively, obtaining overview information of the storm task through the storm rest interface, obtaining the storm topology id according to the overview information, using the storm topology id as the value of the topologyid parameter in the storm rest interface, and obtaining the running worker information of the storm task;
according to the worker information, constructing flume agent configuration files on the selected machine, generating one flume agent configuration file for each worker, the source type in the flume agent configuration file using flume's built-in exec source;
for the flume sink type, adopting RabbitmqPublishsink, which reads the data in the flume channel and publishes it to a rabbitmq topic whose name is the storm task name, and selecting fanout, i.e. the publish-subscribe mode, when the exchange type is declared in rabbitmq;
uploading the flume agent configuration file to the machine where the storm log file is located using python's paramiko module, remotely calling the flume-ng agent command to start the flume agent, running the flume agent in nohup mode, sending the start command through the paramiko module's invoke_shell, and meanwhile the paramiko module sleeping for a certain time to wait for the execution result of the command, which is returned to the paramiko module;
the web page connecting to rabbitmq through websocket, subscribing to the storm task log information sent to rabbitmq by the RabbitmqPublishsink in the flume agent, and displaying the subscribed storm task log information on the web page.
3. A storm task management and scheduling method according to claim 1 or 2, characterised in that: the spout node and bolt node data structures are as follows:
{
"nodeId": "",
"preNodeArray": "",
"nodeName": "",
"nodeBusinessConfig": {},
"nodeDescription": "",
"className": "",
"nodeType": "",
"grouping": "",
"taskNum": ,
"nodeResourceList": ""
}
wherein nodeId represents the unique identifier of the node; preNodeArray represents the parent node ids, multiple parent node ids being separated by commas; nodeName represents the node name; nodeBusinessConfig represents the node service configuration information; nodeDescription represents the node description; className represents the complete class name run by the node; nodeType represents the node type, 1 being a spout node and 2 being a bolt node; grouping represents the grouping mode of data from the parent node to this node, 1 being shuffleGrouping (random grouping), 2 being fieldsGrouping (grouping by field), and 3 being localOrShuffleGrouping (local or random grouping); taskNum represents the number of executors running the node; nodeResourceList represents the list of jar package resources used by the node.
4. A storm task management and scheduling method according to claim 3, characterized in that: the nodeBusinessConfig data structure of the spout node is as follows:
"nodeBusinessConfig": {
"kafkaSpoutConfig": {
"groupField": "",
"needDeserialize": "",
"deserializeClass": "",
"zookeeperConnect": "",
"groupId": "",
"topic": "",
"autoOffsetReset": ""
}
}
wherein kafkaSpoutConfig is the data structure of the kafka connection information; groupField indicates the name of the grouping field for data sent to the next node; needDeserialize indicates whether to deserialize the kafka message data, 1 meaning deserialize and 2 meaning do not deserialize; deserializeClass represents the class name of the deserialization class for the kafka message data; zookeeperConnect represents the zookeeper connection information; groupId represents the consumer group id; topic represents the message topic in kafka; autoOffsetReset denotes the kafka consumption policy.
5. A storm task management and scheduling method as claimed in claim 3, characterised in that: the nodeResourceList data structure of the bolt node is as follows:
"nodeResourceList": [{
"res": "",
"name": "",
"id": ""
}]
wherein: res represents the file name of the jar package on hdfs, name is the alias of the jar package file, and id is the unique identifier of the jar package.
6. A storm task management and scheduling method according to claim 1 or 2, characterised in that: storm task data structure is as follows:
{
"jobName ": "",
"jobDescription": "",
"jobType": "",
"params": {
"nodes": [],
"mainClass": "",
"jobBusinessConfig": {
"stormTopologyConfig": {
"topologyName": "",
"disableFallBackOnJavaSerialization": "",
"registerSerializeClass": "",
"serializeImplementClass": "",
"reliableAck": "",
"topologyWorkers": ""
}
},
"mainJar": {
"id":
},
"jobResourceList": []
}
}
wherein: jobName represents the task name; jobDescription represents the task description information; jobType represents the task type and defaults to the storm type; params is the task parameter, in which nodes represents the node information (the data type is a json array), mainClass represents the main method to run, jobBusinessConfig represents the service configuration of the task, mainJar represents the jar package information to which the main method belongs, and jobResourceList represents all jar package resource information required by the bolt nodes; the stormTopologyConfig parameter in jobBusinessConfig represents the storm task running information, where topologyName represents the topology name; disableFallBackOnJavaSerialization represents whether the JVM's own serialization mechanism is disabled, 1 meaning yes and 2 meaning no; registerSerializeClass represents the registered serialization class; serializeImplementClass represents the serialization implementation class; reliableAck represents whether the reliable transmission mechanism is enabled, 1 meaning yes and 2 meaning no; and topologyWorkers represents the number of workers the task runs with.
7. A storm task management and scheduling method according to claim 1 or 2, characterised in that: executing the command to submit the storm task to the storm cluster is as follows:
storm jar mainJar mainClass params --jar dependJars
wherein: mainJar represents the mainJar parameter information in the storm task; mainClass represents the mainClass parameter information in the storm task; params represents the params parameter information in the storm task; dependJars is the jar package information from jobResourceList in the storm task, multiple jar packages being separated by commas.
8. A storm task management and scheduling method according to claim 1 or 2, characterised in that: the method for mapping the storm topological structure abstraction to the storm topological structure by traversing all nodes comprises the following steps:
a. judging the node type nodeType, if the node type nodeType is 1, indicating that the node is an spout node, and executing operation b; otherwise, the node is a bolt node, and operation c is executed;
b. defining a spout node and setting the spout node with the parameter information of the node, nodeId being the unique identifier of the node, instantiating a kafka consumer in the spout node according to the zookeeper connection address, topic name and autoOffsetReset consumption strategy in the kafkaSpoutConfig parameter, consuming the kafka data, and deserializing the message data in kafka according to the needDeserialize and deserializeClass parameters;
c. defining a bolt node, setting the bolt node by using the parameter information of the node, wherein nodeId is the unique identifier of the node, setting a father node of the bolt node according to a preNodeArray parameter, and setting a data grouping mode from the father node to the node according to a grouping parameter.
9. A storm task management and scheduling method as claimed in claim 2, characterised in that: the source type in the flume agent configuration file uses flume's built-in exec source, obtained by executing the following command:
tail -F /stormLogs/stormId/workerPort/worker.log
stormLogs is the log directory information obtained through the interface, stormId is the topology id information obtained through the interface, workerPort is the occupied port information obtained through the interface, and worker.log is the log file of the storm worker process.
10. A storm task management and scheduling method as claimed in claim 2, characterised in that: for the operation of stopping the flume agent that collects the storm logs, the paramiko module's exec_command is used to send a kill command to stop the flume agent.
CN202210053528.XA 2022-01-18 2022-01-18 Storm task management and scheduling method Active CN114489598B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210053528.XA CN114489598B (en) 2022-01-18 2022-01-18 Storm task management and scheduling method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210053528.XA CN114489598B (en) 2022-01-18 2022-01-18 Storm task management and scheduling method

Publications (2)

Publication Number Publication Date
CN114489598A true CN114489598A (en) 2022-05-13
CN114489598B CN114489598B (en) 2023-03-28

Family

ID=81511671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210053528.XA Active CN114489598B (en) 2022-01-18 2022-01-18 Storm task management and scheduling method

Country Status (1)

Country Link
CN (1) CN114489598B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107517131A (en) * 2017-08-31 2017-12-26 四川长虹电器股份有限公司 A kind of analysis and early warning method based on log collection
EP3401788A2 (en) * 2017-05-02 2018-11-14 Salesforce.com, Inc. Event stream processing system
CN110427210A (en) * 2019-06-27 2019-11-08 苏州浪潮智能科技有限公司 A kind of fast construction method and device of storm topology task
CN112363774A (en) * 2020-11-06 2021-02-12 苏宁云计算有限公司 Storm real-time task configuration method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3401788A2 (en) * 2017-05-02 2018-11-14 Salesforce.com, Inc. Event stream processing system
CN107517131A (en) * 2017-08-31 2017-12-26 四川长虹电器股份有限公司 A kind of analysis and early warning method based on log collection
CN110427210A (en) * 2019-06-27 2019-11-08 苏州浪潮智能科技有限公司 A kind of fast construction method and device of storm topology task
CN112363774A (en) * 2020-11-06 2021-02-12 苏宁云计算有限公司 Storm real-time task configuration method and device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
STEFANOS ANTARIS等: "In-Memory Stream Indexing of Massive and Fast Incoming Multimedia Content" *
WENJUN QIAN等: "S-Storm: A Slot-Aware Scheduling Strategy for Even Scheduler in Storm" *
屈国庆: "Design and Implementation of a Storm-based Real-time Log Analysis System" *
李成露: "A Traffic Flow Data Analysis Platform Based on Cloud Architecture" *

Also Published As

Publication number Publication date
CN114489598B (en) 2023-03-28

Similar Documents

Publication Publication Date Title
US11025512B2 (en) Automated service-oriented performance management
EP2661014B1 (en) Polling sub-system and polling method for communication network system and communication apparatus
CN105183452B (en) Spring AOP-based remote protocol service system for monitoring power distribution equipment
US20040117452A1 (en) XML-based network management system and method for configuration management of heterogeneous network devices
CN101799751A (en) Method for building monitoring agent software of host machine
CN101146127B (en) A client buffer update method and device in distributed system
CN107908488B (en) Message request interface interaction method and device, computer equipment and storage medium
EP2400725A1 (en) User interface communication
CN111381983A (en) Lightweight message middleware system and method of virtual test target range verification system
US11381638B1 (en) System and method for parallel execution of activites in an integration flow
CN103516735A (en) Method and apparatus for upgrading network node
CN113703997A (en) Bidirectional asynchronous communication middleware system integrating multiple message agents and implementation method
JP5268589B2 (en) Information processing apparatus and information processing apparatus operating method
CN113609048B (en) Cloud edge service collaborative interaction method for electric power Internet of things
CN114489598B (en) Storm task management and scheduling method
CN112698973A (en) System, method and device for automatic registration and management of modbus equipment
CN115964151A (en) Flow calculation task scheduling system and method for big data processing
CN113281594B (en) System and method for realizing remote intelligent automatic test for relay protection
CN114385541A (en) Intelligent manufacturing-oriented OPC UA aggregation server and design method thereof
CN109669979A (en) The processing method and processing device of data, storage medium
CN113986462A (en) K8S-based operation and maintenance system, application method and storage medium
CN114443293A (en) Deployment system and method for big data platform
CN108259527B (en) Proxy-based service processing method and device and network element equipment
KR100581140B1 (en) A method for managing ingelligent network process using distributed middleware
CN113992690B (en) Message transmission method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant