CN109697112B - Distributed intensive one-stop operating system and implementation method - Google Patents

Info

Publication number
CN109697112B
Authority
CN
China
Prior art keywords
job
node
information
module
execution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811582185.6A
Other languages
Chinese (zh)
Other versions
CN109697112A (en
Inventor
谢阳
何广柏
刘礼铭
刘树聪
徐一品
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Eshore Technology Co Ltd
Original Assignee
Guangdong Eshore Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Eshore Technology Co Ltd
Priority to CN201811582185.6A
Publication of CN109697112A
Application granted
Publication of CN109697112B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/465 Distributed object oriented systems
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • G06F9/52 Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F9/524 Deadlock detection or avoidance
    • G06F9/54 Interprogram communication
    • G06F9/542 Event management; Broadcasting; Multicasting; Notifications
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5011 Pool
    • G06F2209/508 Monitor
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application relates to a distributed intensive one-stop operating system, an implementation method, a computer device and a storage medium. The system comprises: a client unit for acquiring an operation instruction of a user; a service unit for providing data operations according to the operation instruction of the user and responding to the corresponding job instruction; an execution unit for scheduling job nodes according to the operation instruction of the user; a storage unit for storing configuration information, resource information and job information of the nodes; and a monitoring unit for monitoring the resource information, job information and job logs of the nodes. With the invention, job developers need only attend to the business implementation, while job operators can manage jobs visually from the front-end pages and monitor the running state of the whole cluster, including cluster node resource monitoring, job monitoring and log monitoring, thereby achieving genuinely one-stop, integrated management and monitoring of large-scale distributed cluster jobs.

Description

Distributed intensive one-stop operating system and implementation method
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a distributed intensive one-stop operating system, an implementation method, a computer device, and a storage medium.
Background
Currently, as businesses expand, data volumes grow and systems scale up, many business processes have become difficult to process and analyze in real time. Offline data processing and analysis has become part of system functionality, and under massive business and data volumes, traditional single-node job scheduling based on operating-system facilities, such as crontab on Linux, AIX and HP-UX, can no longer meet business demands.
In the prior art, the Java-based open-source framework Quartz is widely used and integrates well with the Spring framework, forming Spring-Quartz, which implements a database-based cluster mode. However, the database design of the original Quartz cluster mode is overly complex: it uses CLOB columns, physical foreign keys and the like, which makes it unsuitable for a high-concurrency, high-performance sharded distributed MYSQL deployment. Under such a sharded-table system architecture, it also lacks support for distribution and balancing of cluster JOBs, resource monitoring, unified logging, instruction control and the like.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a distributed intensive one-stop operating system, an implementation method, a computer device, and a storage medium.
A distributed, intensive, one-stop operating system, the system comprising:
the client unit is used for acquiring an operation instruction of a user;
the service unit is used for providing data operation according to the operation instruction of the user and responding to the corresponding operation instruction;
the execution unit is used for scheduling the job node according to the operation instruction of the user;
the storage unit is used for storing configuration information, resource information and job information of the node;
and the monitoring unit is used for monitoring the resource information, the job information and the job log of the node.
In one embodiment, the service unit includes:
the job adding module is used for adding jobs, storing data in the MYSQL storage module and sending an adding instruction to the execution unit cluster;
the job modification module is used for modifying jobs, storing data in the MYSQL storage module and sending a modification instruction to the execution unit cluster;
the job deleting module is used for deleting jobs, storing the data in the MYSQL storage module and sending a deleting instruction to the execution unit cluster;
the job inquiry module is used for querying job configurations, jobs in execution, execution history, the ready-loaded jobs of every execution unit node in the cluster, the distribution of jobs across the cluster job nodes, and job logs;
the first communication module is used for broadcasting to all the job nodes through the watch mechanism of the Zookeeper;
a job ignition module for immediately executing a job in the ready state;
a job interrupt module for interrupting a job being executed;
and the job recovery module is used for recovering the job in the stopped state.
In one embodiment, the execution unit includes:
the second communication module is used for receiving the scheduling instruction of the service unit;
the single job module is used for jobs that need no advance client configuration and form no planned job: the client temporarily defines job parameter information for a specific business scenario and sends it to the job cluster for one-time execution;
the timing job module is used for jobs whose execution plan and job parameter information are configured by the client, and which the job node loads automatically at startup or adds dynamically while running;
the resident job module is used for jobs configured with a plan that will not be executed automatically;
the resource reporting module is used for periodically writing the node's memory usage, working thread pool usage and data source thread pool usage into the REDIS cache;
the information reporting module is used for writing the node's ready jobs into the REDIS cache by node dimension and by job dimension;
and the execution reporting module is used for periodically writing information on the node's executing jobs into the REDIS cache after the job node is started.
In one embodiment, the storage unit includes:
the MYSQL storage module is used for storing configuration information of the job and execution history information of the job;
the REDIS storage module is used for caching resource information of the job cluster nodes and job information of the nodes through REDIS;
the Zookeeper storage module is used for storing lock information of an executing job, wherein the lock information comprises job basic information, an executing node, execution starting time and an operator.
In one embodiment, the monitoring unit comprises:
the resource monitoring module is used for monitoring the usage of resources such as memory and threads of the cluster nodes, based on resource information actively reported by the nodes;
the job monitoring module is used for monitoring whether the resident job is executing or not, and whether the conventional job execution time exceeds the maximum execution duration of job configuration or not;
and the log monitoring module is used for monitoring the output of the job log.
A method for implementing a distributed intensive one-stop job, the method comprising:
acquiring an operation instruction of a user through a client unit;
starting a job node according to the operation instruction of the user, and loading the grouped jobs of the job node;
acquiring a node configuration file stored by a storage unit, and initializing the job node;
adding the job node into a scheduling cluster and establishing monitoring through a monitoring unit;
and scheduling the job node through an execution unit according to the operation instruction of the user.
In one embodiment, the method further comprises the step of automatically scheduling the job node according to the operation instruction of the user:
when the job's timer comes due, the different nodes holding the same job ignite at the same time;
the nodes then compete to preempt a Zookeeper-based distributed lock;
the node that preempts the lock successfully starts to execute the business logic of the job, and the nodes that fail give up executing it;
the node that holds the lock also records the job information, the node information and the execution start time in the lock content;
after execution is completed, the execution result and completion time are recorded, the information is written into the MYSQL storage module, and the lock is released.
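The automatic scheduling steps above can be illustrated with a minimal sketch. It assumes an in-memory stand-in for the Zookeeper lock (`InMemoryLockStore`) and a plain list in place of the MYSQL history table; these names, the `fire` helper, and the node addresses are hypothetical and do not come from the patent.

```python
import threading
import time


class InMemoryLockStore:
    """Stand-in for Zookeeper: a key can be created only while it is absent."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def try_acquire(self, key, content):
        with self._guard:
            if key in self._locks:
                return False
            self._locks[key] = content
            return True

    def release(self, key):
        with self._guard:
            self._locks.pop(key, None)


def fire(store, history, job, node, business_logic):
    """One ignition of `job` on `node`; returns True if this node ran it."""
    content = {"job": job, "node": node, "start": time.time()}
    if not store.try_acquire(job, content):
        return False  # preemption failed: give up the business logic
    try:
        result = business_logic()
        # Stand-in for writing the execution record to MYSQL.
        history.append({**content, "result": result, "end": time.time()})
    finally:
        store.release(job)
    return True


store, history = InMemoryLockStore(), []
# The barrier lets the winner hold the lock until every node has tried.
all_tried = threading.Barrier(3)


def node_main(node):
    def logic():
        all_tried.wait(timeout=5)
        return "ok"

    if not fire(store, history, "report:daily", node, logic):
        all_tried.wait(timeout=5)


threads = [
    threading.Thread(target=node_main, args=(f"10.0.0.{i}:8080",))
    for i in (1, 2, 3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(history), history[0]["node"])
```

Three nodes ignite the same job at once, yet only one execution record is produced; which node wins is not deterministic.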
In one embodiment, the method further comprises the step of manually scheduling the job node according to the user's operation instruction:
the user checks the ready job through the client module;
accessing a service unit through a network, and detecting the actual state of the job through the service unit;
sending an ignition instruction to the cluster job nodes;
each cluster job node, upon receiving and recognizing the ignition instruction, starts the ignition operation;
and executing the step of automatically scheduling the job node.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any one of the methods described above when the computer program is executed.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of any of the methods described above.
With the above distributed intensive one-stop operating system, implementation method, computer device and storage medium, the client unit acquires the operation instruction of a user; the service unit provides data operations according to the operation instruction of the user and responds to the corresponding job instruction; the execution unit schedules the job node according to the operation instruction of the user; the storage unit stores configuration information, resource information and job information of the node; and the monitoring unit monitors the resource information, job information and job logs of the nodes. With the invention, job developers need only attend to the business implementation, while job operators can manage jobs visually from the front-end pages and monitor the running state of the whole cluster, including cluster node resource monitoring, job monitoring and log monitoring, thereby achieving genuinely one-stop, integrated management and monitoring of large-scale distributed cluster jobs.
Drawings
FIG. 1 is a block diagram of a distributed, intensive, one-stop operating system in one embodiment;
FIG. 2 is a block diagram of a service unit in a distributed intensive one-stop operating system in one embodiment;
FIG. 3 is a block diagram of an execution unit in a distributed intensive one-stop operating system in one embodiment;
FIG. 4 is a block diagram illustrating the structure of a storage unit in a distributed, intensive, one-stop operating system in one embodiment;
FIG. 5 is a block diagram of a monitor unit in a distributed intensive one-stop operating system in one embodiment;
FIG. 6 is a flow diagram of a method for implementing distributed intensive one-stop job in one embodiment;
FIG. 7 is a flow chart of a method for implementing a distributed intensive one-stop job in another embodiment;
FIG. 8 is a flow chart of a method for implementing a distributed intensive one-stop job in yet another embodiment;
FIG. 9 is a flow chart of the operational relationship of a client unit with a service unit and a storage unit in one embodiment;
FIG. 10 is a workflow diagram of an execution unit in one embodiment;
FIG. 11 is an internal block diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In one embodiment, as shown in FIG. 1, a distributed, intensive, one-stop operating system 100 is provided, comprising:
a client unit 101 for acquiring an operation instruction of a user;
a service unit 102, configured to provide data operations according to operation instructions of a user and respond to corresponding job instructions;
an execution unit 103, configured to schedule the job node according to an operation instruction of a user;
a storage unit 104 for storing configuration information, resource information, and job information of the node;
and a monitoring unit 105 for monitoring resource information, job information, and job log of the node.
In the present embodiment, the client unit 101 is used to acquire an operation instruction of a user. Specifically, the client may be a PC browser or a mobile terminal supporting HTML5. The client mainly performs centralized job configuration, job monitoring and job operation, providing one-stop service management and monitoring pages that cover everything from configuration to run-time monitoring.
The service unit 102 is configured to provide data operations according to operation instructions of a user and respond to corresponding job instructions, and provide capabilities of the data operations, the job instructions, monitoring queries, log queries, and the like for use by the client.
The execution unit 103 is configured to schedule the job node according to the operation instruction of the user. Specifically, based on the watch event mechanism of the Zookeeper, the scheduling instruction from the front end is delivered to each job node, which completes the corresponding action. For an immediate-execution action, the operator may either pick a specific node in the cluster or leave the choice of node to the cluster.
The storage unit 104 is configured to store configuration information, resource information, and job information of the node; for example, the execution plan, execution parameters, and owning group of a configured job are stored in MYSQL.
The monitoring unit 105 is configured to monitor resource information, JOB information, and JOB logs of the nodes, and specifically, may monitor a resource and a JOB condition of each node from a dimension of the node, or monitor a distribution condition of the JOB in the cluster node with a dimension of the JOB, where monitored data is actively reported by each node and written into the cache REDIS.
With this embodiment, job developers need only attend to the business implementation, while job operators can manage jobs visually from the front-end pages and monitor the running state of the whole cluster, including cluster node resource monitoring, job monitoring and log monitoring, thereby achieving genuinely one-stop, integrated management and monitoring of large-scale distributed cluster jobs.
In one embodiment, as shown in fig. 2, a distributed intensive one-stop operating system is provided, wherein a service unit 200 in the system includes:
the job newly-adding module 201 is used for newly-adding a job, storing data in the MYSQL storage module, and sending a newly-adding instruction to the execution unit cluster;
the job modification module 202 is configured to modify a job, store data in the MYSQL storage module, and send a modification instruction to the execution unit cluster;
the job deleting module 203 is configured to delete a job, store data in the MYSQL storage module, and send a deleting instruction to the execution unit cluster;
the job query module 204 is used for querying job configurations, jobs in execution, execution history, the ready-loaded jobs of every execution unit node in the cluster, the distribution of jobs across the cluster job nodes, and job logs;
a first communication module 205, configured to broadcast to all job nodes through the watch mechanism of the Zookeeper;
a job ignition module 206 for immediately executing a job in the ready state;
a job interrupt module 207 for interrupting a job in execution;
a job recovery module 208 for recovering the job in the stopped state.
In this embodiment, job addition by the service unit means that a job is newly added, its data is stored in MYSQL, and an add instruction is sent to the execution unit (job node) cluster; job modification and job deletion work the same way. The service unit exposes standard RESTful HTTP interfaces for clients to call.
The JOB query covers query services such as JOB configurations, JOBs in execution, execution history, the ready-loaded JOBs of every execution unit node in the cluster, the distribution of JOBs across the cluster JOB nodes, and JOB logs.
The instruction communication module (the first communication module 205) delivers job instruction information from the service unit node to the job node cluster through the Zookeeper watch mechanism; the action is equivalent to broadcasting a notification to all job nodes.
The JOB ignition module is used to immediately execute a JOB in the ready state.
The JOB interrupt module is used for interrupting a JOB in execution. A JOB node in the new cluster mode can host multiple JOB programs, and these JOB programs share one process.
Interruption differs from stopping a job: stopping invalidates the job's schedule so that it no longer fires, while an execution that has already started continues as usual.
The job recovery module is used for recovering a job of which the job plan is in a stop state.
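The distinction between interrupting, stopping, and recovering a job can be sketched as follows. This is only an illustrative model under stated assumptions: the `Job` class, its `plan_active` flag, and the cooperative interruption check between work steps are hypothetical and are not described in the patent.

```python
import threading


class Job:
    def __init__(self, name):
        self.name = name
        # False after stop(): the schedule no longer fires this job.
        self.plan_active = True
        # Set by interrupt() to abort the execution currently in progress.
        self._interrupted = threading.Event()

    def stop(self):
        self.plan_active = False

    def recover(self):
        self.plan_active = True

    def interrupt(self):
        self._interrupted.set()

    def run(self, steps, on_step=None):
        """Run `steps` units of work, checking for interruption before each."""
        self._interrupted.clear()
        done = 0
        for i in range(steps):
            if self._interrupted.is_set():
                break
            done += 1
            if on_step:
                on_step(i)
        return done


job = Job("billing")
job.stop()
# stop() only disables the plan; run() never consults it, so an execution
# that is under way completes all of its steps.
stopped_run = job.run(3)
job.recover()
# interrupt() is issued during step 1, so the run ends before step 2.
interrupted_run = job.run(5, on_step=lambda i: job.interrupt() if i == 1 else None)
print(stopped_run, interrupted_run, job.plan_active)
```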
In the embodiment, the capabilities of providing data operation, operation instructions, monitoring inquiry, log inquiry and the like are realized for the client.
In one embodiment, as shown in FIG. 3, a distributed intensive one-stop operating system is provided, wherein execution units 300 in the system comprise:
a second communication module 301, configured to receive a scheduling instruction of a service unit;
the single job module 302 is used for jobs that need no advance client configuration and form no planned job: the client temporarily defines job parameter information for a specific business scenario and sends it to the job cluster for one-time execution;
the timing job module 303 is used for jobs whose execution plan and job parameter information are configured by the client, and which the job node loads automatically at startup or adds dynamically while running;
a resident job module 304 for jobs configured with a plan that will not be executed automatically;
the resource reporting module 305 is configured to periodically write the node's memory usage, working thread pool usage, and data source thread pool usage into the REDIS cache;
the information reporting module 306 is configured to write the node's ready jobs into the REDIS cache by node dimension and by job dimension;
and the execution reporting module 307 is configured to periodically write information on the node's executing jobs into the REDIS cache after the job node is started.
In this embodiment, the execution unit refers to the capability of one node of the job scheduling cluster.
Specifically, the instruction communication module (the second communication module 301) automatically connects to the Zookeeper and registers a watch event when the job node starts, in order to receive scheduling instructions from the service unit.
A single job needs no advance client configuration and forms no planned job: the client temporarily defines information such as job parameters for a specific business scenario and sends it to the job cluster for one-time execution.
The timing job module handles jobs for which the client configures information such as the execution plan and job parameters; the job node loads them automatically at startup or adds them dynamically while running.
A resident job is generally configured with a plan that will practically never fire automatically, for example execution at 00:00 on December 31, 2099. When execution is required, the client ignites the job; when it must be halted, the client interrupts it.
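As a minimal sketch of the resident-job idea, the check below treats a job as startable either when its plan is due or when the client ignites it. The function name `should_start` and the `ignited` flag are assumptions made for illustration; the patent does not specify how the plan is encoded or evaluated.

```python
from datetime import datetime

# Plan from the text: 00:00 on December 31, 2099, i.e. effectively never.
RESIDENT_PLAN = datetime(2099, 12, 31, 0, 0)


def should_start(plan, now, ignited=False):
    """A job starts when its plan is due or when the client ignites it."""
    return ignited or now >= plan


now = datetime(2024, 1, 1)
print(should_start(RESIDENT_PLAN, now))                # False
print(should_start(RESIDENT_PLAN, now, ignited=True))  # True
```

If plans were expressed as Quartz cron expressions, which accept an optional year field, the same plan could presumably be written as `0 0 0 31 12 ? 2099`; the source does not say which encoding is used.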
Node resource reporting means that, after the job node is started, the node's memory usage and JOB thread pool usage are periodically written into the REDIS cache.
JOB information reporting means that, after the JOB node is started, its ready JOBs are periodically written into the REDIS cache by node dimension (generally IP:PORT) and by JOB dimension (JOB group:JOB).
In-execution reporting means that, after the job node is started, information on the node's executing JOBs is periodically written into the REDIS cache.
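The three kinds of reporting can be sketched with a plain dictionary standing in for the REDIS cache. The key prefixes (`resource:`, `node_jobs:`, `job_nodes:`, `running:`) and the field names are hypothetical; the patent only states that data is keyed by node dimension (IP:PORT) and job dimension (JOB group:JOB).

```python
import time


def report(cache, node, ready_jobs, running_jobs,
           memory_used_mb, pool_active, pool_size):
    """Write one node's periodic report into `cache` (a dict standing in for REDIS)."""
    # Node resource report: memory and thread pool usage.
    cache[f"resource:{node}"] = {
        "memory_used_mb": memory_used_mb,
        "pool_active": pool_active,
        "pool_size": pool_size,
        "reported_at": time.time(),
    }
    # Node dimension: which jobs are ready on this node.
    cache[f"node_jobs:{node}"] = sorted(ready_jobs)
    # Job dimension: which nodes hold each job.
    for job in ready_jobs:
        cache.setdefault(f"job_nodes:{job}", set()).add(node)
    # In-execution report: jobs currently running on this node.
    cache[f"running:{node}"] = sorted(running_jobs)


cache = {}
report(cache, "10.0.0.1:8080", ["billing:invoice", "report:daily"],
       ["report:daily"], 512, 3, 10)
report(cache, "10.0.0.2:8080", ["report:daily"], [], 256, 0, 10)

print(sorted(cache["job_nodes:report:daily"]))
```

Keeping both indexes lets a monitor answer "what runs on this node" and "where is this job deployed" with a single lookup each.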
In this embodiment, based on the Zookeeper's watch event mechanism, the front-end scheduling instruction is delivered to each job node, which completes the corresponding action. For an immediate-execution action, the operator may either pick a specific node in the cluster or leave the choice of node to the cluster.
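The delivery of an ignition instruction, with or without a designated node, might look like the following sketch. `InstructionChannel` is an in-memory stand-in for a watched Zookeeper node, and the instruction fields (`type`, `job`, `target_node`) are assumptions for illustration. Real Zookeeper watches are typically one-time triggers that must be re-registered, which this sketch ignores.

```python
class InstructionChannel:
    """In-memory stand-in for a watched Zookeeper node."""

    def __init__(self):
        self._watchers = []

    def watch(self, callback):
        self._watchers.append(callback)

    def publish(self, instruction):
        # Every registered job node is notified, as in a broadcast.
        for callback in list(self._watchers):
            callback(instruction)


class JobNode:
    def __init__(self, address, channel):
        self.address = address
        self.ignited = []
        channel.watch(self.on_instruction)

    def on_instruction(self, instruction):
        # A missing target means "any node in the cluster may take it".
        target = instruction.get("target_node")
        if target is not None and target != self.address:
            return
        if instruction["type"] == "Ignite":
            self.ignited.append(instruction["job"])


channel = InstructionChannel()
a = JobNode("10.0.0.1:8080", channel)
b = JobNode("10.0.0.2:8080", channel)

# Operator designates node b for this job.
channel.publish({"type": "Ignite", "job": "report:daily",
                 "target_node": "10.0.0.2:8080"})
# Operator leaves the choice to the cluster: every node ignites.
channel.publish({"type": "Ignite", "job": "billing:invoice"})

print(a.ignited, b.ignited)
```

When no node is designated, all nodes ignite and the distributed lock decides which one actually executes the job.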
In one embodiment, as shown in FIG. 4, a distributed, intensive, one-stop operating system is provided, wherein a storage unit 400 in the system comprises:
a MYSQL storage module 401, configured to store configuration information of a job and execution history information of the job;
a REDIS storage module 402, configured to cache resource information of the job cluster nodes and job information of the nodes through REDIS;
the Zookeeper storage module 403 is configured to store lock information of an executing job, where the lock information includes job basic information, an execution node, a start execution time, and an operator.
Specifically, the MYSQL storage module 401 mainly stores JOB configuration information and JOB execution history information. The REDIS storage module 402 mainly uses REDIS to cache the resource information of the JOB cluster nodes and the JOB information of each node. The Zookeeper storage module 403 stores lock information of executing jobs; the lock content includes the job's basic information, the execution node (ip:port), the execution start time, and the operator (automatic execution by the system or manual ignition).
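The lock content described above can be pictured as a small record serialized into the lock node's data. The sketch below assumes JSON encoding and the field names shown; neither is specified in the patent, and `LockInfo`, `encode`, and `decode` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class LockInfo:
    job_group: str   # job basic information: group
    job_name: str    # job basic information: name
    node: str        # execution node as ip:port
    start_time: str  # execution start time, ISO 8601 text
    operator: str    # "system" for automatic execution, "manual" for ignition


def encode(info):
    """Serialize the lock content to bytes, as a lock node's data might be."""
    return json.dumps(asdict(info), sort_keys=True).encode("utf-8")


def decode(data):
    """Rebuild the lock content from stored bytes."""
    return LockInfo(**json.loads(data.decode("utf-8")))


info = LockInfo("report", "daily", "10.0.0.1:8080",
                "2024-01-01T00:00:00", "system")
payload = encode(info)
print(decode(payload))
```

Storing the node and operator alongside the lock lets a query service answer where a job is running and who started it by reading the lock alone.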
In this embodiment, the underlying layer uses Quartz in single-node in-memory mode. The JOB configuration table is redesigned as a single table, and Zookeeper is introduced to serve both as the communication bridge and as the distributed JOB lock in cluster mode. A JOB grouping concept is also defined, so that nodes in the cluster can deploy different JOBs and be allocated different resources (memory and threads) according to business scenario requirements, making the most of the available resources. In the new cluster mode, besides letting cluster nodes compete freely for the lock to execute a JOB, users can also designate the node that executes it, which covers more scenario requirements.
In one embodiment, as shown in fig. 5, a distributed intensive one-stop operating system is provided, wherein a monitoring unit 500 in the system includes:
the resource monitoring module 501 is configured to monitor the usage of resources such as memory and threads of the cluster nodes, based on resource information actively reported by the nodes;
a job monitoring module 502, configured to monitor whether a resident job is executing, and whether a conventional job execution time exceeds a maximum execution duration of job configuration;
the log monitoring module 503 is configured to monitor output of the job log.
Specifically, the resource monitoring module 501 monitors the usage of resources such as memory and threads on each cluster node, mainly through the resource information actively reported by the nodes. The job monitoring module 502 mainly monitors whether a resident job is executing and whether the execution time of a regular job exceeds the maximum execution duration configured for the job. The log monitoring module 503 monitors the output of the job log; log keywords can be defined to raise alarms. All monitoring information of the monitoring unit is provided to the client for display as a service, and the unit can additionally connect to an alarm center to push alarms by WeChat or short message.
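The three monitoring checks described above can be sketched as small pure functions. The record layout (`name`, `resident`, `started_at`, `max_seconds`) and the function names are hypothetical; the patent states only what is monitored, not how.

```python
def overtime_jobs(running, now):
    """Names of regular jobs whose run time exceeds their configured maximum."""
    return [
        job["name"]
        for job in running
        if not job["resident"] and now - job["started_at"] > job["max_seconds"]
    ]


def idle_resident_jobs(resident_names, running):
    """Resident jobs that are expected to be executing but are not."""
    active = {job["name"] for job in running}
    return [name for name in resident_names if name not in active]


def log_alarms(lines, keywords):
    """Log lines that contain any of the configured alarm keywords."""
    return [line for line in lines if any(k in line for k in keywords)]


running = [
    {"name": "report:daily", "resident": False, "started_at": 0, "max_seconds": 60},
    {"name": "sync:stream", "resident": True, "started_at": 0, "max_seconds": 0},
]
print(overtime_jobs(running, now=120))
print(idle_resident_jobs(["sync:stream", "sync:audit"], running))
print(log_alarms(["INFO start", "ERROR db timeout"], ["ERROR"]))
```

Each result list could then be forwarded to an alarm center; that integration is outside this sketch.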
In this embodiment, the resource and JOB condition of each node may be monitored from the dimension of the node, or the distribution condition of the JOB in the cluster node may be monitored with the dimension of the JOB, and the monitored data is actively reported by each node and written into the cache REDIS. The method and the device realize monitoring of the operation nodes and specific operations in the cluster according to various dimensions, and early warning of the execution time of the operations is realized.
In one embodiment, as shown in fig. 6, a method for implementing a distributed intensive one-stop job is provided, where the method is applied to the distributed intensive one-stop job system in the above embodiment, and the method includes:
step 602, obtaining an operation instruction of a user through a client unit;
step 604, starting the job node according to the operation instruction of the user, and loading the grouped jobs of the job node;
step 606, obtaining a node configuration file stored in a storage unit, and initializing the job node;
step 608, joining the job node into a scheduling cluster and establishing monitoring through a monitoring unit;
in step 610, the execution unit schedules the job node according to the operation instruction of the user.
Specifically, referring to fig. 9 and 10, the user accesses the job configuration page through a PC browser or a mobile browser and adds, modifies, or deletes a job. The page calls the job data operation service over HTTP; after receiving a request and performing a series of basic verifications, the service writes the data into MYSQL and modifies the content of the watched node in Zookeeper, where the content includes all the information of the job and the instruction type (Add, Update, Delete).
After a job node is started, it loads its node GROUP JOBs: the JOBs of the designated GROUPs (one or more GROUPs can be selected) are automatically read from MYSQL and initialized to the ready state. At the same time, the Zookeeper connection is initialized and a node watch is established for receiving scheduling instructions. After initialization succeeds, when an add, modify, or delete instruction from the previous step arrives, the job node receives and identifies the instruction and judges whether the JOB's group is one of the node's groups. If so, it adds or modifies the JOB so that the JOB becomes ready again, or deletes the JOB by removing it from the ready plan; otherwise, it ignores the instruction. At this point, the basic configuration of the JOB and the initialization of the scheduling node are complete, and the job node is scheduled by the execution unit according to the operation instruction of the user.
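The group check that a job node applies to incoming instructions can be sketched as follows. `Instruction` and `JobNode` are hypothetical names, the instruction is modeled as a plain object rather than Zookeeper node content, and loading the initial jobs from MYSQL is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Instruction:
    kind: str      # "Add", "Update" or "Delete"
    job_id: str
    group: str
    config: dict = field(default_factory=dict)


class JobNode:
    def __init__(self, groups):
        self.groups = set(groups)   # groups this node was started with
        self.ready = {}             # job_id -> config of jobs in the ready plan

    def on_instruction(self, ins: Instruction) -> bool:
        """Apply an instruction; return False when it is ignored."""
        if ins.group not in self.groups:
            return False            # job belongs to another node group
        if ins.kind in ("Add", "Update"):
            self.ready[ins.job_id] = ins.config   # (re-)ready the job
        elif ins.kind == "Delete":
            self.ready.pop(ins.job_id, None)      # remove from the ready plan
        else:
            return False            # unknown instruction type
        return True
```

Because every node receives every instruction and filters locally, the service side does not need to know which nodes serve which groups.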
It can be understood that the JOBs on each node of the JOB cluster can be queried through the client's JOB monitoring module, and the distribution of a given JOB across the cluster nodes can also be queried, so the deployment of all JOBs can be monitored along two dimensions. During scheduling, the JOBs currently executing can be checked through the executing-JOB query service, including information such as which node a JOB is executing on, when it started executing, and whether it was triggered automatically or manually; the execution log of the JOB can be viewed in real time through the log service. After execution completes, the historical execution records and run logs of the job can be checked through the history query service.
In this embodiment, the execution plan, execution parameters, and group assignment of a job are configured, and the data is stored in MYSQL. In addition, in multi-node cluster mode each node can load the JOBs of one or more groups, and the number of JOB nodes can be defined according to application requirements.
In one embodiment, as shown in fig. 7, there is provided a method for implementing a distributed intensive one-stop job, the method further comprising the step of automatically scheduling job nodes according to an operation instruction of a user:
step 702, when the job timer reaches its scheduled time, the different nodes of the same job fire at the same time;
step 704, each node immediately tries to preempt the lock through the Zookeeper-based distributed lock;
step 706, the node that preempts the lock successfully starts to execute the business logic of the job, and the nodes that fail give up executing it;
step 708, the node that holds the lock also records the job information, the node information, and the execution start time in the lock content;
step 710, after execution is completed, recording the execution result and the completion time, writing the information into the MYSQL storage module, and releasing the lock.
Specifically, in connection with fig. 10, after the JOB nodes in the above embodiment are successfully initialized and the JOB timer reaches its scheduled time, the different nodes of the same JOB (JOBID) fire at the same time (provided that, if the nodes are on different machines, their clocks are synchronized). At the start of execution, each node immediately tries to preempt the lock through the Zookeeper-based distributed lock. The node that succeeds starts to execute the JOB logic; the nodes that fail abandon this execution and wait for the next execution plan. The node that holds the lock also records the JOB information, the node information (IP and port), the execution start time, and similar information in the lock content. After execution is completed (whether it succeeds or fails), the execution result and completion time are recorded, the information is written into the job history table of MYSQL, and the lock is released.
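The preempt, execute, record, and release sequence described above can be sketched without a real Zookeeper ensemble. In the sketch, `LockStore` is a hypothetical in-memory stand-in whose atomic create-if-absent plays the role of the Zookeeper-based distributed lock, and its `history` list stands in for the MYSQL job history table; the field names in the lock content are assumptions.

```python
import threading
import time


class LockStore:
    """In-memory stand-in for Zookeeper: create-if-absent is atomic."""

    def __init__(self):
        self._guard = threading.Lock()
        self.locks = {}      # job_id -> lock content
        self.history = []    # stand-in for the MYSQL job history table

    def try_acquire(self, job_id, node):
        with self._guard:
            if job_id in self.locks:
                return False                      # another node won
            # Lock content: job, node (ip:port) and execution start time.
            self.locks[job_id] = {"job": job_id, "node": node,
                                  "started_at": time.time()}
            return True

    def release(self, job_id, result):
        with self._guard:
            content = self.locks.pop(job_id)
            # Record result and completion time, then drop the lock.
            self.history.append({**content, "result": result,
                                 "finished_at": time.time()})


def fire(store, job_id, node, logic):
    """Run one firing on one node; return True if this node executed."""
    if not store.try_acquire(job_id, node):
        return False          # give up and wait for the next plan
    result = "failure"
    try:
        logic()
        result = "success"
    except Exception:
        pass                  # result stays "failure"
    finally:
        store.release(job_id, result)
    return True
```

Releasing the lock in a `finally` block mirrors the statement that the result is recorded and the lock released whether the job succeeds or fails.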
In one embodiment, as shown in fig. 8, there is provided a method for implementing a distributed intensive one-stop job, the method further comprising the step of manually scheduling the job node according to an operation instruction of a user:
step 802, the user selects a ready job through the client module;
step 804, accessing the service unit through the network, and detecting the actual state of the job through the service unit;
step 806, sending a fire instruction to the cluster job nodes;
step 808, each cluster job node, after receiving and identifying the fire instruction, starts to fire the job;
step 810, executing the step of automatically scheduling the job node.
Specifically, in connection with fig. 10, manual scheduling of a JOB requires that the JOB is ready on a cluster JOB node. The user selects a ready JOB at the client and chooses immediate execution; the request reaches the instruction operation service over the network. The server checks the actual state of the JOB (the JOB is ready on some node and is not currently executing) and then sends the action instruction FIRE to the cluster nodes. Each cluster JOB node, after receiving and identifying the FIRE instruction, starts to fire the JOB; from there the flow is the same as the firing start of automatic scheduling described above, and the whole JOB process is then completed.
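The server-side state check that precedes a manual FIRE can be sketched as follows. The function names, the shape of the `ready_nodes` and `executing` inputs, and the fields of the broadcast message are all hypothetical; the embodiment only states that the job must be ready on a node and must not be executing.

```python
def can_fire(job_id, ready_nodes, executing):
    """ready_nodes: job_id -> set of nodes that have the job ready.
    executing: set of job_ids that currently hold an execution lock."""
    return bool(ready_nodes.get(job_id)) and job_id not in executing


def manual_fire(job_id, ready_nodes, executing, broadcast):
    """Broadcast a FIRE instruction if the job's actual state allows it."""
    if not can_fire(job_id, ready_nodes, executing):
        return False
    # Every cluster node receives the instruction; lock preemption then
    # decides which single node runs the job, as in automatic scheduling.
    broadcast({"type": "FIRE", "job_id": job_id, "trigger": "manual"})
    return True
```

Here `broadcast` is any callable that delivers the message to the cluster job nodes, for example a function that updates the watched node content.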
It should be understood that, although the steps in the flowcharts of fig. 6-10 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to this order of execution and may be executed in other orders. Moreover, at least some of the steps in fig. 6-10 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
In one embodiment, a computer device is provided, the internal structure of which may be as shown in FIG. 11. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for running the operating system and the computer programs stored in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a distributed intensive one-stop job implementation method.
It will be appreciated by those skilled in the art that the structure shown in fig. 11 is merely a block diagram of a portion of the structure associated with the present application and is not limiting of the computer device to which the present application applies, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided that includes a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the method embodiments above when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the above method embodiments.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium; when executed, the program may include the flows of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include Read-Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered to be within the scope of this description.
The above examples merely represent a few embodiments of the present application; although they are described in relative detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application is to be determined by the claims appended hereto.

Claims (8)

1. A distributed intensive one-stop job system, the system comprising:
a client unit, configured to acquire an operation instruction of a user;
a service unit, configured to provide data operations according to the operation instruction of the user and to respond to the corresponding operation instruction;
an execution unit, configured to schedule the job node according to the operation instruction of the user;
a storage unit, configured to store configuration information, resource information, and job information of the node;
a monitoring unit, configured to monitor the resource information, the job information, and the job log of the node;
the service unit includes:
a job adding module, configured to add jobs, store the data in the MYSQL storage module, and send an add instruction to the execution unit cluster;
a job modification module, configured to modify jobs, store the data in the MYSQL storage module, and send a modify instruction to the execution unit cluster;
a job deleting module, configured to delete jobs, store the data in the MYSQL storage module, and send a delete instruction to the execution unit cluster;
a job query module, configured to query job configurations, executing jobs, execution history, the ready-loaded jobs of every execution unit node in the cluster, the distribution of jobs across the cluster job nodes, and job logs;
a first communication module, configured to broadcast to all job nodes through the watch (listening) mechanism of Zookeeper;
a job firing module, configured to immediately execute a job that is ready;
a job interrupt module, configured to interrupt a job that is executing;
a job recovery module, configured to recover a job in a stopped state;
the execution unit includes:
a second communication module, configured to receive scheduling instructions from the service unit;
a single-run job module, configured for the case where, for a specific service scenario, the client temporarily defines job parameter information without configuring a planned job in advance and sends the job parameter information to the job cluster for one-time execution;
a timed job module, configured for jobs whose execution plan and run parameter information are configured by the client, and which are loaded automatically when the job node starts or added dynamically while the job node is running;
a resident job module, configured to configure a plan that cannot be executed automatically;
a resource reporting module, configured to periodically write the node's memory usage, worker thread pool usage, and data source thread pool usage into the REDIS cache;
an information reporting module, configured to write the node's ready jobs into the REDIS cache by node dimension and by job dimension;
and an execution reporting module, configured to periodically write information about the jobs the node is executing into the REDIS cache after the job node is started.
2. The distributed intensive one-stop job system of claim 1, wherein the storage unit comprises:
a MYSQL storage module, configured to store configuration information of the jobs and execution history information of the jobs;
a REDIS storage module, configured to cache, through REDIS, resource information of the job cluster nodes and job information of the nodes;
a Zookeeper storage module, configured to store lock information of an executing job, wherein the lock information comprises basic job information, the executing node, the execution start time, and the operator.
3. The distributed intensive one-stop job system of claim 2, wherein the monitoring unit comprises:
a resource monitoring module, configured to monitor the memory and thread resource usage of the cluster nodes through the resource information actively reported by the nodes;
a job monitoring module, configured to monitor whether a resident job is executing and whether the execution time of a regular job exceeds the maximum execution duration configured for the job;
and a log monitoring module, configured to monitor the output of the job log.
4. A method for implementing a distributed intensive one-stop job, wherein the method is applied to the distributed intensive one-stop job system as claimed in any one of claims 1 to 3, and comprises:
acquiring an operation instruction of a user through the client unit;
starting a job node according to the operation instruction of the user, and loading the grouped jobs of the job node;
acquiring the node configuration file stored in the storage unit, and initializing the job node;
joining the job node to the scheduling cluster and establishing monitoring through the monitoring unit;
and scheduling the job node through the execution unit according to the operation instruction of the user.
5. The method of claim 4, further comprising the step of automatically scheduling the job node according to the user's operation instruction:
when the job timer reaches its scheduled time, the different nodes of the same job fire at the same time;
each node then tries to preempt the lock through the Zookeeper-based distributed lock;
the node that preempts the lock successfully starts to execute the business logic of the job, and the nodes that fail give up executing the business logic of the job;
the node that holds the lock also records the job information, the node information, and the execution start time in the lock content;
after execution is completed, recording the execution result and the completion time, writing the information into the MYSQL storage module, and releasing the lock.
6. The method of claim 5, further comprising the step of manually scheduling the job node according to the user's operation instruction:
the user selects a ready job through the client module;
accessing the service unit through the network, and detecting the actual state of the job through the service unit;
sending a fire instruction to the cluster job nodes;
each cluster job node, after receiving and identifying the fire instruction, starts to fire the job;
and executing the step of automatically scheduling the job node.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 4 to 6 when the computer program is executed by the processor.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 4 to 6.
CN201811582185.6A 2018-12-24 2018-12-24 Distributed intensive one-stop operating system and implementation method Active CN109697112B (en)

Publications (2)

Publication Number Publication Date
CN109697112A CN109697112A (en) 2019-04-30
CN109697112B true CN109697112B (en) 2023-05-16

Family

ID=66231928

Country Status (1)

Country Link
CN (1) CN109697112B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant