CN114116181A

CN114116181A - Distributed data analysis task scheduling system and method

Info

Publication number: CN114116181A
Application number: CN202210065289.XA
Authority: CN
Inventors: 胡艳平; 蔡鑫莹; 舒展
Original assignee: Hunan Yunchang Network Technology Co ltd
Current assignee: Hunan Yunchang Network Technology Co ltd
Priority date: 2022-01-20
Filing date: 2022-01-20
Publication date: 2022-03-01
Anticipated expiration: 2042-01-20
Also published as: CN114116181B

Abstract

The invention relates to the technical field of task scheduling and discloses a distributed data analysis task scheduling system and method. The system comprises: an input processing node for acquiring signals of various interface modes and preprocessing them; a central node for realizing data interaction between the nodes and controlling communication between any two nodes; an output processing node for handling data decoding and the final display effect of the task result; a control management terminal for receiving task information, splitting a large task into a plurality of subtasks, and sending the subtasks to the input processing nodes; and a redundant node configuration module for receiving redundant node information and selecting an existing input processing node to pair with the redundant node. By binding the current input processing node to the redundant node, a plurality of nodes (the input processing node and the redundant node) process a given subtask together, which improves the processing efficiency of the task.

Description

Distributed data analysis task scheduling system and method

Technical Field

The invention relates to the technical field of task scheduling, in particular to a distributed data analysis task scheduling system and a distributed data analysis task scheduling method.

Background

A distributed system is a software system built on top of a network. In a distributed system, a set of independent computers appears to the user as a unified whole, as if it were a single system. The system has a variety of general-purpose physical and logical resources, can allocate tasks dynamically, and exchanges information among the dispersed physical and logical resources through a computer network. A distributed operating system within the system manages computer resources globally. Typically, a distributed system presents only one model or paradigm to the user, and a layer of software middleware above the operating system is responsible for implementing this model. A well-known example of a distributed system is the World Wide Web, in which everything looks like a document (a Web page).

A distributed scheduling system differs from earlier systems chiefly in its composition: the whole system is physically distributed in structure, and communication between any two nodes must be controlled by a central node. Exploiting this parallelism, a distributed scheduling system may divide a large task into several subtasks, send the subtasks to the nodes for execution, and exchange the resulting data through the central node. Because the central node implements a centralized communication strategy, it is considerably more complex and more heavily loaded than the individual nodes. The results are integrated through the interactions within the system, and the large task is thereby executed. In existing distributed scheduling systems, a large task is split into a plurality of subtasks that are distributed to different processing nodes. If a blank processing node is newly added during this process, the new node remains idle because the task has already been distributed. As a result, the processing speed of the task is reduced and the operating pressure on the other processing nodes is increased.

Disclosure of Invention

The present invention provides a distributed data analysis task scheduling system and method to solve the problem raised in the background art: when a blank processing node is newly added, the new node remains idle because the task has already been distributed, which reduces the processing speed of the task and increases the operating pressure on the other processing nodes.

In order to achieve the purpose, the invention provides the following technical scheme: a distributed data analysis task scheduling system, said system comprising:

the input processing node is used for acquiring signals of various interface modes and performing signal preprocessing, wherein the signal preprocessing comprises the step of encoding the acquired data to generate a data stream for transmission;

the central node is in communication connection with each node and is used for realizing data interaction between the nodes and controlling communication between any two nodes;

the output processing node is used for processing the data decoding and the final display effect of the task result;

the control management terminal is used for receiving the task information, splitting the large task into a plurality of subtasks and then sending the subtasks to each input processing node;

and the redundant node configuration module is used for receiving redundant node information, selecting the existing input processing node to be matched with the redundant node, commonly processing the subtasks of the input processing node and generating a virtual node IP, wherein the virtual node IP replaces the IP address of the input processing node to be used as a communication address with the central node.

As a further aspect of the present invention, the redundant node configuration module includes:

the node matching unit is used for receiving information of the input processing node and the redundant node, selecting a free input processing node to be paired with the redundant node, wherein the free input processing node is an input processing node which is not paired with the redundant node;

the data interaction unit is used for communicating with the input processing node and the redundant node which are successfully paired;

and the virtual IP generating unit is used for generating a virtual node IP, the virtual node IP address points to the data interaction unit, and the central node can realize communication with the data interaction unit through the virtual node IP address.
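The pairing and virtual IP behaviour described above can be illustrated with a minimal sketch. It is not the patented implementation: the class names, the `10.255.0.x` address range, and the "first free node" selection rule are all assumptions made for illustration, since the text does not specify how a free node is chosen or how the virtual address is formed.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional


@dataclass
class InputNode:
    """An input processing node with its real IP address (hypothetical model)."""
    name: str
    ip: str
    paired: bool = False


@dataclass
class Pairing:
    """One input processing node bound to one or more redundant nodes."""
    input_node: InputNode
    redundant_ips: List[str]
    virtual_ip: str


class RedundantNodeConfigModule:
    """Combines node matching and virtual IP generation (illustrative only)."""

    def __init__(self, input_nodes: List[InputNode]) -> None:
        self.input_nodes = input_nodes
        self.pairings: Dict[str, Pairing] = {}  # keyed by virtual IP
        self._next_host = count(1)

    def _new_virtual_ip(self) -> str:
        # Assumption: virtual IPs are drawn from a reserved private range.
        return f"10.255.0.{next(self._next_host)}"

    def pair(self, redundant_ip: str) -> Optional[Pairing]:
        """Pair a redundant node with the first free input processing node."""
        free = next((n for n in self.input_nodes if not n.paired), None)
        if free is None:
            return None  # no free input processing node is available
        free.paired = True
        pairing = Pairing(free, [redundant_ip], self._new_virtual_ip())
        self.pairings[pairing.virtual_ip] = pairing
        return pairing

    def address_for_central_node(self, node: InputNode) -> str:
        """Return the address the central node should use for this node."""
        for pairing in self.pairings.values():
            if pairing.input_node is node:
                return pairing.virtual_ip  # replaces the node's own IP
        return node.ip


nodes = [InputNode("in-1", "192.168.0.11"), InputNode("in-2", "192.168.0.12")]
module = RedundantNodeConfigModule(nodes)
pairing = module.pair("192.168.0.50")
```

In this sketch the paired node `in-1` is reached by the central node through the virtual address, while the unpaired node `in-2` keeps its own IP address.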

As a further aspect of the present invention, the redundant node configuration module further includes:

and the subtask dividing unit is used for extracting the subtasks successfully matched in the input processing node, sending the subtasks to the control management terminal, receiving the secondary subtasks divided by the subtasks, and respectively sending the secondary subtasks to the successfully matched input processing node and the redundant node.

As a further aspect of the present invention, the control management terminal includes a secondary management terminal corresponding to the sub-task dividing unit, and the secondary management terminal divides the sub-task into a plurality of secondary sub-tasks according to a task dividing logic of the control management terminal after receiving the sub-task information transmitted by the sub-task dividing unit.
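As a hedged illustration of the two-level division, the sketch below applies one and the same splitting function first to the large task and then to a single subtask. The function `split`, the contiguous chunking rule, and the integer work items are assumptions; the text only requires that the secondary management terminal follow the task dividing logic of the control management terminal.

```python
from typing import List, Sequence


def split(items: Sequence[int], parts: int) -> List[List[int]]:
    """Split items into `parts` contiguous chunks of near-equal size."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, extra = divmod(len(items), parts)
    chunks: List[List[int]] = []
    start = 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


large_task = list(range(12))

# Control management terminal: large task -> subtasks for three input nodes.
subtasks = split(large_task, 3)

# Secondary management terminal: one subtask -> secondary subtasks,
# using the same dividing logic, for the input node and one redundant node.
secondary = split(subtasks[0], 2)
```

Because both levels use the same rule, concatenating the secondary subtasks reproduces the original subtask exactly.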

As a further aspect of the present invention, in the process of matching the input processing node with the redundant node, there are one or more redundant nodes.

As a further scheme of the present invention, when the secondary management terminal cannot split the subtask and divide the subtask into a plurality of secondary subtasks, a pairing error instruction is sent to the node matching unit, the node matching unit cancels the pairing, and the input processing node and the redundant node in the pairing process return to the previous state.

As a further aspect of the present invention, when pairing of an input processing node fails, the node matching unit marks that input processing node as an unmatchable node, and an unmatchable node is not counted as a free input processing node.
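The pairing error path can be sketched as a small state change. The names `NodeState`, `SplitError`, `split_subtask` and `pair_and_split`, the fixed split into two parts, and the rule that a subtask with fewer items than parts cannot be split are hypothetical; they stand in for whatever criterion the secondary management terminal actually applies.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class NodeState(Enum):
    FREE = "free"
    PAIRED = "paired"
    UNMATCHABLE = "unmatchable"


class SplitError(Exception):
    """Raised when a subtask cannot be divided into secondary subtasks."""


def split_subtask(subtask: List[int], parts: int) -> List[List[int]]:
    # Assumption: a subtask with fewer items than parts cannot be split.
    if len(subtask) < parts:
        raise SplitError("subtask is too small to divide")
    return [subtask[i::parts] for i in range(parts)]


def pair_and_split(
    states: Dict[str, NodeState],
    waiting: List[str],
    node: str,
    subtask: List[int],
) -> Optional[Tuple[str, List[List[int]]]]:
    """Pair `node` with a waiting redundant node and split its subtask."""
    if states[node] is not NodeState.FREE or not waiting:
        return None
    redundant = waiting.pop(0)
    states[node] = NodeState.PAIRED
    try:
        return redundant, split_subtask(subtask, 2)
    except SplitError:
        # Pairing error: the redundant node returns to the waiting list and
        # the input node is excluded from future matching.
        waiting.insert(0, redundant)
        states[node] = NodeState.UNMATCHABLE
        return None


states = {"in-1": NodeState.FREE, "in-2": NodeState.FREE}
waiting = ["r-1", "r-2"]
ok = pair_and_split(states, waiting, "in-1", [1, 2, 3, 4])
failed = pair_and_split(states, waiting, "in-2", [7])
```

Here the first pairing succeeds, while the second is cancelled because the single-item subtask cannot be divided, leaving `r-2` available for another input processing node.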

As a further scheme of the present invention, the successfully paired input processing node and redundant node transmit a data stream to the data interaction unit, and implement data interaction with the central node through the data interaction unit.

As a further aspect of the present invention, after the current task is completed, the virtual node IP is cancelled, the pairing is cancelled, and the redundant node becomes a new input processing node.
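A minimal sketch of this teardown step follows, assuming the pairings are tracked in a dictionary keyed by virtual IP. The `Cluster` class and its field names are hypothetical and serve only to show the order of operations: the virtual IP is released, the pairing is dropped, and each redundant node joins the pool of input processing nodes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Cluster:
    """Tracks input processing nodes and active pairings (illustrative)."""
    input_nodes: List[str]
    # virtual IP -> (paired input node, its redundant nodes)
    pairings: Dict[str, Tuple[str, List[str]]] = field(default_factory=dict)

    def finish_task(self) -> None:
        """Cancel every virtual IP and pairing once the task is complete."""
        for virtual_ip in list(self.pairings):
            _input_node, redundant_nodes = self.pairings.pop(virtual_ip)
            for node in redundant_nodes:
                if node not in self.input_nodes:
                    # The redundant node becomes a new input processing node.
                    self.input_nodes.append(node)


cluster = Cluster(
    input_nodes=["in-1", "in-2"],
    pairings={"10.255.0.1": ("in-1", ["r-1"])},
)
cluster.finish_task()
```

After `finish_task` runs, the next large task can be split across three input processing nodes instead of two.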

A distributed data analysis task scheduling method comprises the following steps:

collecting signals of various interface modes and carrying out signal preprocessing, wherein the signal preprocessing comprises encoding the collected data to generate a data stream for transmission;

receiving redundant node information, selecting an existing input processing node to be paired with the redundant node, processing the subtasks of the input processing node together, and generating a virtual node IP, wherein the virtual node IP replaces the IP address of the input processing node to serve as a communication address with a central node;

receiving task information, splitting a large task into a plurality of subtasks, and sending the subtasks to each input processing node;

and carrying out data decoding and final display effect processing of the task result.
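The method steps can be traced end to end in a short sketch. Everything concrete here is an assumption made for illustration: JSON is used as the stream encoding, the "analysis" is a simple sum, and the function names are invented. The redundant node pairing step is omitted to keep the example focused on encoding, splitting, and decoding.

```python
import json
from typing import List


def encode(samples: List[int]) -> bytes:
    """Preprocessing: encode collected data into a stream for transmission."""
    return json.dumps(samples).encode("utf-8")


def decode(stream: bytes) -> List[int]:
    """Decode a transmitted stream back into data."""
    return json.loads(stream.decode("utf-8"))


def split_task(samples: List[int], parts: int) -> List[List[int]]:
    """Split a large task into `parts` subtasks (round-robin assignment)."""
    return [samples[i::parts] for i in range(parts)]


def analyze(stream: bytes) -> bytes:
    """Placeholder analysis run on one node: sum the samples it received."""
    return encode([sum(decode(stream))])


def run(samples: List[int], node_count: int) -> int:
    subtasks = split_task(samples, node_count)   # control management terminal
    streams = [encode(s) for s in subtasks]      # input processing nodes
    results = [analyze(s) for s in streams]      # exchanged via central node
    partials = [decode(r)[0] for r in results]   # output processing node
    return sum(partials)


total = run([1, 2, 3, 4, 5, 6], node_count=3)
```

With three nodes the subtasks are `[1, 4]`, `[2, 5]` and `[3, 6]`, and the combined result equals the sum over the whole input.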

Compared with the prior art, the invention has the following beneficial effects. A distributed system contains many processing nodes, some of which join after a task has started. Because the task is already under way and cannot be restarted, these newly joined nodes become redundant nodes in the system. In the invention, the redundant node configuration module acquires redundant node information in real time and pairs the current input processing node with redundant nodes in a one-to-one or one-to-many manner. After pairing is complete, the subtask of the current input processing node is split into a plurality of secondary subtasks, which the current input processing node and its paired redundant nodes process together. This speeds up task completion and reduces the overall burden on the system.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention.

Fig. 1 is a schematic structural diagram of a distributed data analysis task scheduling system according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a redundant node configuration module according to another preferred embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a sub-task dividing flow provided in a preferred embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that, if there is a directional indication (such as up, down, left, right, front, and back) in the embodiment of the present invention, it is only used to explain the relative position relationship between the components, the motion situation, and the like in a certain posture, and if the certain posture is changed, the directional indication is changed accordingly.

In addition, if the description of "first", "second", etc. is referred to in the present invention, it is used for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.

The following detailed description of specific implementations of the present invention is provided in conjunction with specific embodiments:

A distributed system contains many processing nodes, some of which join after a task has started. Because the task is already under way and cannot be restarted, these newly joined nodes become redundant nodes in the system. In the invention, the redundant node configuration module acquires redundant node information in real time and pairs the current input processing node with redundant nodes in a one-to-one or one-to-many manner. After pairing is complete, the subtask of the current input processing node is split into a plurality of secondary subtasks, which the current input processing node and its paired redundant nodes process together. This speeds up task completion and reduces the overall burden on the system.

Fig. 1 shows a distributed data analysis task scheduling system according to the present invention. The system is applied to a device capable of connecting to the internet in real time; the device may be a mobile phone, a tablet computer, a computer, or a communication device, and is not specifically limited herein. The distributed data analysis task scheduling system is detailed as follows:

Through the arrangement, the existing input processing node and the redundant node are bound through the redundant node configuration module, and a plurality of nodes (the input processing node and the redundant node) are enabled to process a certain subtask together, so that the processing efficiency of the task is improved.

In addition, as shown in fig. 2, in another preferred embodiment of the present invention, the redundant node configuration module includes:

The input processing node and the redundant node after the pairing are finished jointly process the same subtask, and the obtained processing data of the input processing node and the redundant node are collected in the data interaction unit and transmitted to the central node through the data interaction unit, so that the subsequent data interaction is finished.
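The collecting role of the data interaction unit can be sketched as follows. The `DataInteractionUnit` class, its `submit` method, and the rule of forwarding only once every paired node has reported are assumptions; the text states only that the processing data of both nodes is collected in the unit and transmitted to the central node through it.

```python
from typing import Callable, Dict, List


class DataInteractionUnit:
    """Collects results from paired nodes and forwards them as one sender."""

    def __init__(
        self,
        virtual_ip: str,
        expected_nodes: List[str],
        send_to_central: Callable[[str, List[int]], None],
    ) -> None:
        self.virtual_ip = virtual_ip
        self.expected = set(expected_nodes)
        self.send_to_central = send_to_central
        self.received: Dict[str, List[int]] = {}

    def submit(self, node: str, data: List[int]) -> bool:
        """Store one node's result; forward when all nodes have reported."""
        if node not in self.expected:
            raise KeyError(f"{node} is not part of this pairing")
        self.received[node] = data
        if set(self.received) != self.expected:
            return False  # still waiting for another paired node
        # Merge in a fixed node order so the output is deterministic.
        merged = [x for name in sorted(self.received) for x in self.received[name]]
        self.send_to_central(self.virtual_ip, merged)
        return True


outbox = []
unit = DataInteractionUnit(
    "10.255.0.1", ["in-1", "r-1"], lambda ip, data: outbox.append((ip, data))
)
first = unit.submit("r-1", [3, 4])
second = unit.submit("in-1", [1, 2])
```

From the central node's point of view, the merged result arrives from a single address, the virtual node IP, regardless of how many nodes contributed to it.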

It can be understood that, as shown in fig. 3, in another case of this embodiment, the redundant node configuration module further includes:

In another case of this embodiment, the control management terminal includes a secondary management terminal corresponding to the sub-task dividing unit, and the secondary management terminal divides the sub-task into a plurality of secondary sub-tasks according to a task dividing logic of the control management terminal after receiving the sub-task information transmitted by the sub-task dividing unit.

It should be noted here that, in the process of dividing the sub-task into a plurality of secondary sub-tasks by the secondary management terminal, the task dividing logic is the same as the dividing logic for dividing the large task into a plurality of sub-tasks by the control management terminal, and only in this case, the central node can still recognize when receiving the output of the plurality of secondary sub-tasks and can serve as the interactive content together with the data of other parallel nodes.

In one preferred embodiment of the present invention, the number of redundant nodes is one or more in the process of matching the input processing node with the redundant nodes.

It can be understood that, in the present invention, the tasks executed by the redundant nodes are not directly dispatched by the control management terminal, but are shared with the subtasks of the input processing nodes paired therewith, so that in the pairing process, one input processing node is necessary in the most basic pairing, and the specific number of the redundant nodes can be determined according to how many secondary subtasks the subtasks can be divided into.
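One way to read this sizing rule is sketched below, under the assumption that each node in a pairing handles exactly one secondary subtask and the input processing node always keeps one for itself. The function name and both parameters are hypothetical.

```python
def redundant_nodes_needed(max_secondary_subtasks: int, available_redundant: int) -> int:
    """Number of redundant nodes to pair with one input processing node.

    Assumes one secondary subtask per node, with the input processing node
    keeping one of them.
    """
    if max_secondary_subtasks < 2:
        return 0  # the subtask cannot be divided, so no pairing is useful
    return min(available_redundant, max_secondary_subtasks - 1)
```

For example, a subtask divisible into four secondary subtasks can use up to three redundant nodes, and fewer if fewer are waiting.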

In one case of this embodiment, when the secondary management terminal cannot split the subtask and divide the subtask into a plurality of secondary subtasks, a pairing error instruction is sent to the node matching unit, the node matching unit cancels the pairing, and the input processing node and the redundant node in the pairing process return to the previous state.

In another case of this embodiment, when the input processing node fails to pair, the node matching unit marks the input processing node as an unmatchable node, the unmatchable node does not belong to a free input processing node, the successfully paired input processing node and the redundant node transmit a data stream to the data interaction unit, and data interaction is implemented with the central node through the data interaction unit.

It should be noted that, in another case of this embodiment, after the current task is completed, the virtual node IP is cancelled, the pairing is cancelled, and the redundant node becomes a new input processing node.

The functions that can be realized by the distributed data analysis task scheduling system and method are all completed by computer equipment, the computer equipment comprises one or more processors and one or more memories, at least one program code is stored in the one or more memories, and the program code is loaded and executed by the one or more processors to realize the functions of the distributed data analysis task scheduling system and method.

The processor fetches instructions from the memory one by one, analyzes them, and then completes the corresponding operations according to the instruction requirements. It generates a series of control commands so that all parts of the computer act automatically, continuously, and in coordination as an organic whole, realizing program input, data input, operation, and result output; the arithmetic or logic operations arising in this process are completed by the arithmetic unit. The memory comprises a Read-Only Memory (ROM) for storing a computer program, and a protection device is arranged outside the memory.

Illustratively, a computer program can be partitioned into one or more modules, which are stored in memory and executed by a processor to implement the present invention. One or more of the modules may be a series of computer program instruction segments capable of performing certain functions, which are used to describe the execution of the computer program in the terminal device.

Those skilled in the art will appreciate that the above description of the service device is merely exemplary and not limiting of the terminal device, and may include more or less components than those described, or combine certain components, or different components, such as may include input output devices, network access devices, buses, etc.

The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the terminal device and connects the various parts of the entire user terminal using various interfaces and lines.

The memory may be used to store computer programs and/or modules, and the processor may implement various functions of the terminal device by operating or executing the computer programs and/or modules stored in the memory and calling data stored in the memory. The memory mainly comprises a storage program area and a storage data area, wherein the storage program area can store an operating system, application programs (such as an information acquisition template display function, a product information publishing function and the like) required by at least one function and the like; the storage data area may store data created according to the use of the berth-state display system (e.g., product information acquisition templates corresponding to different product types, product information that needs to be issued by different product providers, etc.), and the like. In addition, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.

The modules/units integrated in the terminal device, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the modules/units in the system according to the above embodiment may be implemented by a computer program, which may be stored in a computer-readable storage medium and used by a processor to implement the functions of the embodiments of the system. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution media, and the like.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A distributed data analysis task scheduling system, said system comprising:

2. The distributed data analysis task scheduling system of claim 1, wherein the redundant node configuration module comprises:

3. The distributed data analysis task scheduling system of claim 2, wherein the redundant node configuration module further comprises:

4. The distributed data analysis task scheduling system according to claim 3, wherein the control management terminal includes a secondary management terminal corresponding to the sub-task dividing unit, and the secondary management terminal divides the sub-task into a plurality of secondary sub-tasks according to a task dividing logic of the control management terminal after receiving the sub-task information transmitted by the sub-task dividing unit.

5. The distributed data analysis task scheduling system of claim 2 wherein there are one or more redundant nodes in matching an input processing node with a redundant node.

6. The distributed data analysis task scheduling system of claim 4, wherein when the secondary management terminal cannot split the subtask and divide the subtask into a plurality of secondary subtasks, a pairing error instruction is sent to the node matching unit, the node matching unit cancels the pairing, and the input processing node and the redundant node in the pairing process return to a previous state.

7. The distributed data analysis task scheduling system of claim 6, wherein when pairing of the input processing node fails, the node matching unit marks the input processing node as an unmatchable node, the unmatchable node not belonging to the free input processing nodes.

8. The distributed data analysis task scheduling system of claim 2, wherein the successfully paired input processing node and redundant node transmit data streams to the data interaction unit, and implement data interaction with the central node through the data interaction unit.

9. The distributed data analysis task scheduling system of claim 2 wherein after completion of the current task, the virtual node IP is cancelled, the pairing is cancelled, and the redundant node becomes the new input processing node.

10. A distributed data analysis task scheduling method is characterized by comprising the following steps: