CN113986519A - Data scheduling processing method, device, equipment and storage medium - Google Patents

Data scheduling processing method, device, equipment and storage medium

Info

Publication number
CN113986519A
CN113986519A, CN202111625744.9A
Authority
CN
China
Prior art keywords
data
processing
execution
splitting
queue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111625744.9A
Other languages
Chinese (zh)
Other versions
CN113986519B (en)
Inventor
魏朝凌
冉体松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Bimernet Technology Co ltd
Original Assignee
Shenzhen Bimernet Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Bimernet Technology Co ltd filed Critical Shenzhen Bimernet Technology Co ltd
Priority to CN202111625744.9A priority Critical patent/CN113986519B/en
Publication of CN113986519A publication Critical patent/CN113986519A/en
Application granted granted Critical
Publication of CN113986519B publication Critical patent/CN113986519B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/546Message passing systems or structures, e.g. queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5017Task decomposition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/54Indexing scheme relating to G06F9/54
    • G06F2209/548Queue

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of data processing and discloses a data scheduling processing method, apparatus, device and storage medium. The method comprises the following steps: acquiring target processing data; judging whether the target processing data needs to be split; if splitting is needed, splitting the target processing data according to a preset splitting algorithm to obtain N queue execution data, wherein N is a positive integer; converting the N queue execution data into N JSON file data; performing distribution and execution processing on the N JSON file data according to a preset distribution execution algorithm to generate N queue message processing data; and monitoring the N queue message processing data to generate a data processing result, wherein the data processing result comprises a processing continuation result and a processing completion result.

Description

Data scheduling processing method, device, equipment and storage medium
Technical Field
The present invention relates to the field of data processing, and in particular to a data scheduling processing method, apparatus, device and storage medium.
Background
Task scheduling, as part of a software system or platform, is typically used to run programs that must execute periodically or on demand. Because the execution results usually do not need to be returned to the user immediately, such tasks can simply be understood as "programs running in the background".
The existing task scheduler has the following problems:
A) Stand-alone implementation. Many software products or platforms support only stand-alone or single-node execution. However, large systems usually adopt a distributed architecture to support higher concurrency, and in that case a task scheduler that only supports single-machine execution can suffer fatal problems such as repeated execution.
B) Only timed tasks are supported. Common task schedulers such as Quartz support only timed execution; for ad-hoc operations triggered by the user, a separate executable program has to be developed.
C) Heterogeneous systems are not supported. Most task schedulers cannot run tasks written in different development languages; for example, a scheduler developed in Java can only execute task programs developed in Java. Some schedulers do support executing tasks developed in different languages, but only locally, and cannot execute tasks on different servers or different operating systems.
Therefore, facing the data processing and scheduling needs of multi-language systems, the technical problem that current data processing scheduling systems lack universality and reusability needs to be solved.
Disclosure of Invention
The invention mainly aims to solve the technical problem that data processing scheduling systems lack universality and reusability.
A first aspect of the present invention provides a data scheduling processing method, where the data scheduling processing method includes:
acquiring target processing data;
judging whether the target processing data needs to be split or not;
if the splitting processing is needed, splitting the target processing data according to a preset splitting algorithm to obtain N queue execution data, wherein N is a positive integer;
converting the N queue execution data into N JSON file data;
according to a preset distribution execution algorithm, carrying out distribution execution processing on the N JSON file data to generate N queue message processing data;
monitoring the message processing data of the N queues to generate a data processing result, wherein the data processing result comprises: a process continuation result and a process completion result.
Optionally, in a first implementation manner of the first aspect of the present invention, the acquiring target processing data includes:
based on a timing acquisition mode, capturing task data from a preset task table;
receiving a parameter setting instruction, and reading parameter data in the parameter setting instruction;
and matching and combining the task data and the parameter data to obtain target processing data.
Optionally, in a second implementation manner of the first aspect of the present invention, the determining whether the target processing data needs to be split includes:
judging whether parameter data in the target processing data belong to preset splitting setting data or not;
if the data belongs to the splitting setting data, determining that the target processing data needs to be split;
and if the data does not belong to the splitting setting data, determining that the target processing data does not need to be split.
Optionally, in a third implementation manner of the first aspect of the present invention, the splitting the target processing data according to a preset splitting algorithm to obtain N queue execution data includes:
splitting the parameter data of the target processing data into a plurality of minimum execution data based on the minimum splitting unit to obtain a minimum execution data set;
combining the data in the minimum execution data set to obtain an execution combination set;
and determining the elements in the execution combination set as queue execution data to obtain N queue execution data.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the performing, according to a preset allocation execution algorithm, allocation execution processing on the N JSON file data, and generating N queue message processing data includes:
analyzing the N JSON file data based on a preset Java execution database to obtain N execution data sets;
and inputting the N execution data sets into M processing threads based on the execution data set as an object to obtain N queue message processing data, wherein M is a positive integer.
Optionally, in a fifth implementation manner of the first aspect of the present invention, after analyzing the N JSON file data based on the preset Java execution database to obtain N execution data sets, before inputting the N execution data sets into M processing threads and obtaining N queue message processing data, taking the execution data set as an object, the method further includes:
periodically receiving state uploading data of Y processing threads according to a preset maintenance period, wherein Y is a positive integer not less than M;
and analyzing the state uploading data to obtain M executable processing threads in the Y processing threads.
Optionally, in a sixth implementation manner of the first aspect of the present invention, the monitoring the N queue message processing data, and generating a data processing result includes:
sequentially reading the N queue message processing data to obtain N status characters;
judging whether the N state characters belong to processed state characters or not;
if the data processing results belong to the same category, determining the data processing results as processing completion results;
and if the data processing results do not belong to the same category, determining the data processing results as processing continuation results.
A second aspect of the present invention provides a data scheduling processing apparatus, including:
the acquisition module is used for acquiring target processing data;
the judging module is used for judging whether the target processing data needs to be split or not;
the splitting module is used for splitting the target processing data according to a preset splitting algorithm if the splitting processing is needed, so as to obtain N queue execution data, wherein N is a positive integer;
the conversion module is used for converting the N queue execution data into N JSON file data;
the distribution execution module is used for carrying out distribution execution processing on the N JSON file data according to a preset distribution execution algorithm to generate N queue message processing data;
a monitoring module, configured to monitor the N queue message processing data and generate a data processing result, where the data processing result includes: a process continuation result and a process completion result.
A third aspect of the present invention provides a data scheduling processing apparatus, including: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line; the at least one processor calls the instructions in the memory to enable the data scheduling processing equipment to execute the data scheduling processing method.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to execute the above-described data scheduling processing method.
In the embodiments of the invention, the universal JSON data format is used, the design is based on a producer-consumer mode, and task splitting is supported, so a distributed architecture can be supported. When the number of users grows and horizontal scaling is needed, only producers or consumers (or both) need to be added, making scaling simple and convenient. Messages are distributed through a message queue, so whether a task is triggered by a timer or manually by a user, only a task message needs to be added; the task starting mode is therefore flexible, and the technical problem that data processing and scheduling systems lack universality and reusability is solved.
Drawings
Fig. 1 is a schematic diagram of an embodiment of a data scheduling processing method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of an embodiment of a data scheduling processing apparatus according to an embodiment of the present invention;
fig. 3 is a schematic diagram of another embodiment of a data scheduling processing apparatus according to an embodiment of the present invention;
fig. 4 is a schematic diagram of an embodiment of a data scheduling processing apparatus in an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a data scheduling processing method, a device, equipment and a storage medium.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a specific flow of an embodiment of the present invention is described below. Referring to fig. 1, one embodiment of the data scheduling processing method in the embodiment of the present invention includes:
101. acquiring target processing data;
In this embodiment, the scheduling center (Schedule Center / Schedule-service) mainly comprises scheduling management and executor management. Executor management is mainly responsible for communicating with the executors. For different executors (for example, the C# part of the Check Engine can be understood as an executor), the backend only needs to implement a corresponding Java executor later, and all executors are managed uniformly by executor management. Scheduling management is mainly responsible for initiating tasks, selecting executors and processing results; it is only responsible for scheduling tasks and does not care about how a task is actually executed (the concrete execution is handed to the executors).
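By way of illustration only, the following minimal Java sketch shows this separation between scheduling management and executor management; all interface, class and method names here are hypothetical and are not taken from the patent.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: the scheduling center only selects a registered executor and
// hands over the task; the concrete execution is left entirely to the executor.
interface Executor {
    String serviceName();              // name reported at registration
    void execute(String taskJson);     // task payload exchanged as JSON
}

class ExecutorManager {
    private final Map<String, Executor> registered = new ConcurrentHashMap<>();

    void register(Executor executor) {             // executor reports itself at registration
        registered.put(executor.serviceName(), executor);
    }

    Executor select(String serviceName) {          // scheduling management asks for a target executor
        return registered.get(serviceName);
    }
}

class ScheduleCenter {
    private final ExecutorManager executors = new ExecutorManager();

    ExecutorManager executorManager() { return executors; }

    // Scheduling management: start the task and hand it to the chosen executor.
    void dispatch(String serviceName, String taskJson) {
        Executor target = executors.select(serviceName);
        if (target != null) {
            target.execute(taskJson);
        }
    }
}

Because the scheduling center never touches the task body, either side can be replaced or scaled without affecting the other, which is the low-coupling property discussed later in this description.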
Further, at 101, the following steps may be performed:
1011. based on a timing acquisition mode, capturing task data from a preset task table;
1012. receiving a parameter setting instruction, and reading parameter data in the parameter setting instruction;
1013. and matching and combining the task data and the parameter data to obtain target processing data.
In steps 1011 to 1013, task data is first captured from the preset task table in a timed acquisition mode. The parameter setting instruction may carry data such as a storage address and whether splitting is required; for example, in the splitting parameter item, 0 indicates splitting and 1 indicates no splitting. The task data describing the execution and the parameter data are then matched and combined to obtain the target data to be processed.
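As a minimal sketch of steps 1011 to 1013 only, the Java listing below combines captured task data with parameter data into one target record; the field names ("taskId", "storageAddress", "split") are illustrative assumptions and do not come from the patent.

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: task data grabbed from the task table is matched and combined
// with the parameter data read from a parameter setting instruction
// (here 0 = split, 1 = do not split, as in the example above).
class TargetProcessingData {
    final Map<String, Object> taskData;
    final Map<String, Object> parameterData;

    TargetProcessingData(Map<String, Object> taskData, Map<String, Object> parameterData) {
        this.taskData = new HashMap<>(taskData);
        this.parameterData = new HashMap<>(parameterData);
    }
}

class TargetDataBuilder {
    static TargetProcessingData combine(Map<String, Object> taskData,
                                        Map<String, Object> parameterData) {
        return new TargetProcessingData(taskData, parameterData);
    }

    public static void main(String[] args) {
        Map<String, Object> task = new HashMap<>();
        task.put("taskId", "T-001");                 // grabbed from the preset task table

        Map<String, Object> params = new HashMap<>();
        params.put("storageAddress", "/data/out");   // from the parameter setting instruction
        params.put("split", 0);                      // 0: split, 1: do not split

        TargetProcessingData target = combine(task, params);
        System.out.println(target.parameterData);    // {storageAddress=/data/out, split=0}
    }
}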
102. Judging whether the target processing data needs to be split or not;
In this embodiment, it is checked whether the split parameter in the target processing data is 0 or 1, and based on the predefined meaning of 0 and 1, an indication of whether the target processing data needs to be split is obtained.
Further, at 102, the following steps may be performed:
1021. judging whether parameter data in target processing data belong to preset splitting setting data or not;
1022. if the data belongs to the splitting setting data, determining that the target processing data needs to be split;
1023. and if the data does not belong to the splitting setting data, determining that the target processing data does not need to be split.
In steps 1021 to 1023, suppose the preset splitting setting data are the characters "R", "P" and "Q". If the splitting parameter in the parameter data is one of the characters "K", "M" or "Z", which do not belong to the splitting setting data, it is determined that the target processing data does not need to be split; on the contrary, if the splitting parameter is one of "R", "P" or "Q", it is determined that the target processing data needs to be split.
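A minimal Java sketch of this membership test follows; the concrete characters used as splitting setting data are only the example values from the paragraph above and are otherwise an assumption.

import java.util.Set;

// Hypothetical sketch of steps 1021-1023: the split parameter is compared with a
// preset set of splitting setting data; membership decides whether splitting is needed.
class SplitDecision {
    private static final Set<String> SPLIT_SETTING_DATA = Set.of("R", "P", "Q"); // example preset values

    static boolean needsSplitting(String splitParameter) {
        return SPLIT_SETTING_DATA.contains(splitParameter);
    }

    public static void main(String[] args) {
        System.out.println(needsSplitting("P")); // true  -> split
        System.out.println(needsSplitting("K")); // false -> do not split
    }
}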
103. If the splitting processing is needed, splitting the target processing data according to a preset splitting algorithm to obtain N queue execution data, wherein N is a positive integer;
In this embodiment, the parameters delivered by the service are split into minimum execution units. For example, if the minimum execution units are a and b, and the parameters delivered by the user are [a1, a2, a3] and [b1, b2, b3], the data can be split into 9 combinations, a1b1, a1b2, a1b3, a2b1, a2b2, a2b3, a3b1, a3b2 and a3b3, before the messages are sent. Such splitting allows the resulting tasks to execute concurrently and therefore faster.
Further, 103 may perform the following steps:
1031. splitting parameter data of the target processing data into a plurality of minimum execution data based on the minimum splitting unit to obtain a minimum execution data set;
1032. combining the data in the minimum execution data set to obtain an execution combination set;
1033. and determining the elements in the execution combination set as queue execution data to obtain N queue execution data.
In steps 1031 to 1033, for example, the parameters of the target data are [a1, a2, a3] and [b1, b2, b3], and the minimum execution units are a and b, so the parameters are split into a1, a2, a3, b1, b2 and b3. The split data are then combined by permutation to obtain the execution combination set {a1b1, b1a1, a1b2, b2a1, a1b3, b3a1, a2b1, b1a2, a2b2, b2a2, a2b3, b3a2, b1a3, a3b1, b2a3, a3b2, b3a3, a3b3}; its 18 elements are determined as queue execution data, giving 18 queue execution data.
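The following Java sketch reproduces only this worked example (ordered pairs over two groups of three values, giving the 18 queue execution data above); the class and method names are hypothetical.

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of steps 1031-1033: split parameters into minimum execution
// units and form ordered pairs as the queue execution data.
class ParameterSplitter {
    static List<String> split(List<String> groupA, List<String> groupB) {
        List<String> queueExecutionData = new ArrayList<>();
        for (String a : groupA) {
            for (String b : groupB) {
                queueExecutionData.add(a + b);   // e.g. "a1b1"
                queueExecutionData.add(b + a);   // e.g. "b1a1" (ordered pair)
            }
        }
        return queueExecutionData;
    }

    public static void main(String[] args) {
        List<String> result = split(List.of("a1", "a2", "a3"), List.of("b1", "b2", "b3"));
        System.out.println(result.size());   // 18
        System.out.println(result);
    }
}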
104. Converting the N queue execution data into N JSON file data;
In this embodiment, the data exchange problem of heterogeneous systems is solved by using JSON (JavaScript Object Notation), a universal data format. After the executors execute the tasks, the results are put back into the message queue to be combined. The most important advantages of this design are low coupling and high scalability:
1) Low coupling: task scheduling and task execution are completely separated. This design ensures that the implementations of the scheduler and the executors do not affect each other: the implementation logic at either end can be upgraded or changed without affecting the other end, as long as the JSON format of the exchanged data is not modified. This is also the basis of the high scalability below.
2) High scalability: according to actual operating conditions, executor nodes can be added dynamically or the splitting strategy can be modified. For example, when user concurrency is high and the executors process tasks more slowly than tasks are added, executor nodes can be added to increase execution speed, or the splitting policy can be adjusted, for example by distributing two sets of parameters at a time, such as sending [a1b1, a2b1] to an executor at once.
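As a sketch of step 104 only, the listing below serializes each queue execution datum to a JSON message. It assumes the Jackson library is available on the classpath (any JSON library would serve the same purpose), and the message field names are illustrative, not taken from the patent.

import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each queue execution datum becomes a JSON message so that
// executors written in any language can parse it.
class JsonConverter {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    static String toJson(String taskId, String executionData) throws Exception {
        Map<String, String> message = Map.of(
                "taskId", taskId,
                "executionData", executionData);
        return MAPPER.writeValueAsString(message);
    }

    public static void main(String[] args) throws Exception {
        // Prints one JSON object per queue execution datum, e.g. {"taskId":"T-001","executionData":"a1b1"}
        for (String data : List.of("a1b1", "a1b2")) {
            System.out.println(toJson("T-001", data));
        }
    }
}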
105. According to a preset distribution execution algorithm, carrying out distribution execution processing on the N JSON file data to generate N queue message processing data;
In this embodiment, the N JSON file data are allocated to multiple threads for processing. If the number of threads is not less than N, each JSON file is allocated to a different thread and no thread is reused, yielding N queue messages. If the number of threads is less than N, then after the JSON files are sorted and allocated, at least one thread is allocated more than one JSON file. A processing-state record is generated for each JSON file, giving N queue message processing data.
Further, 105 may perform the following steps:
1051. analyzing N JSON file data based on a preset Java execution database to obtain N execution data sets;
1052. and inputting the N execution data sets into M processing threads based on the execution data set as an object to obtain N queue message processing data, wherein M is a positive integer.
In steps 1051 and 1052, the N file data are distributed to M processing threads by the Java executors, and N queue message processing data are generated from the processing of the JSON file data.
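A minimal Java sketch of this fan-out over a fixed pool of M threads follows; the placeholder processing logic and class names are assumptions for illustration.

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of steps 1051-1052: the N parsed execution data sets are
// submitted to a fixed pool of M processing threads; when M < N, some threads
// naturally handle more than one item, as described above.
class DistributionExecutor {
    static void distribute(List<String> executionDataSets, int m) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(m);
        for (String dataSet : executionDataSets) {
            pool.submit(() -> {
                // Placeholder for the real executor logic; here we only record a processing state.
                System.out.println(Thread.currentThread().getName() + " processed " + dataSet);
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws InterruptedException {
        distribute(List.of("a1b1", "a1b2", "a2b1", "a2b2"), 2);   // N = 4 items, M = 2 threads
    }
}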
Preferably, between 1051 and 1052, the following steps may also be performed:
1053. periodically receiving state uploading data of Y processing threads according to a preset maintenance period, wherein Y is a positive integer not less than M;
1054. and analyzing the state uploading data to obtain M executable processing threads in the Y processing threads.
In steps 1053 and 1054, not all of the Y threads are necessarily available. After starting, an executor (such as the Check Engine) must call a heartbeat interface every minute and report its basic information (such as service name and state). Executor management issues tasks according to the maintained executor states. When a task is issued, it carries the service name of the target executor (the name provided at registration), so that each executor can determine whether the task should be executed by it; if an executor receives a task that does not belong to it, the task is returned (for example, put back into the MQ). The Y threads are then analyzed to obtain the M threads that can execute tasks.
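The listing below is a sketch of the executor-side heartbeat only, under the assumption that the report is some call to the scheduling side (the actual transport is not specified here); class and method names are hypothetical.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of steps 1053-1054: after starting, an executor reports its
// service name and state once per minute; the scheduling side keeps only the
// executors whose heartbeats mark them as available.
class HeartbeatReporter {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final String serviceName;

    HeartbeatReporter(String serviceName) {
        this.serviceName = serviceName;
    }

    void start() {
        scheduler.scheduleAtFixedRate(
                () -> report(serviceName, "AVAILABLE"),   // basic information: name and state
                0, 1, TimeUnit.MINUTES);                  // every 1 minute, as in the description
    }

    private void report(String name, String state) {
        // Placeholder for the real heartbeat call (e.g. an HTTP or MQ message to the scheduler).
        System.out.println("heartbeat: " + name + " is " + state);
    }

    public static void main(String[] args) {
        new HeartbeatReporter("check-engine").start();    // demo: keeps reporting until the process stops
    }
}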
106. Monitoring N queue message processing data, and generating a data processing result, wherein the data processing result comprises: a process continuation result and a process completion result.
In this embodiment, the result of task distributed execution is processed in a message queue manner, and the processing conditions indicated by the message processing data in the N queues are periodically monitored to obtain the data processing result.
Further, 106 may perform the following steps:
1061. sequentially reading N queue message processing data to obtain N state characters;
1062. judging whether the N state characters belong to the processed state characters;
1063. if the data processing results belong to the same category, determining the data processing results as processing completion results;
1064. and if the unevenness belongs to the same, determining the data processing result as a processing continuation result.
In steps 1061 to 1064, the N queue message processing data are parsed. If a processed item is recorded as 0 and an unprocessed item as 1, a length-N sequence such as "00010…00" is produced, and it is then checked whether all N characters are 0. The sequence can also be converted into a numeric value and tested against zero: if the value is not zero, processing continues; if it is zero, processing is complete.
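A short Java sketch of this status check follows; the class and method names are hypothetical, and the two-line test in main simply re-uses the example sequence above.

// Hypothetical sketch of steps 1061-1064: each queue message contributes one status
// character (0 = processed, 1 = not yet processed); if every character is 0 the
// result is "processing complete", otherwise "processing continues".
class ResultMonitor {
    static String checkStatus(String statusCharacters) {
        // Equivalent to converting the sequence to a number and testing it for zero.
        boolean allProcessed = statusCharacters.chars().allMatch(c -> c == '0');
        return allProcessed ? "processing complete" : "processing continues";
    }

    public static void main(String[] args) {
        System.out.println(checkStatus("00010000"));   // processing continues
        System.out.println(checkStatus("00000000"));   // processing complete
    }
}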
In the embodiments of the invention, the universal JSON data format is used, the design is based on a producer-consumer mode, and task splitting is supported, so a distributed architecture can be supported. When the number of users grows and horizontal scaling is needed, only producers or consumers (or both) need to be added, making scaling simple and convenient. Messages are distributed through a message queue, so whether a task is triggered by a timer or manually by a user, only a task message needs to be added; the task starting mode is therefore flexible, and the technical problem that data processing and scheduling systems lack universality and reusability is solved.
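To illustrate the producer-consumer mode mentioned above, the following sketch uses an in-process BlockingQueue in place of a real message queue; this is a simplifying assumption for demonstration only, and the JSON message bodies are the illustrative ones used earlier.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: producers (a timer or a manual trigger) only add task messages
// to a queue, and any number of consumers (executors) can be added to scale out.
class ProducerConsumerDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> messageQueue = new LinkedBlockingQueue<>();

        // Producer: adding a task is the same whether it is timed or user-triggered.
        messageQueue.put("{\"taskId\":\"T-001\",\"executionData\":\"a1b1\"}");
        messageQueue.put("{\"taskId\":\"T-001\",\"executionData\":\"a1b2\"}");

        // Consumer: an executor takes messages and processes them independently.
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String message = messageQueue.take();
                    System.out.println("consumed " + message);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.setDaemon(true);
        consumer.start();

        Thread.sleep(200);   // give the consumer time to drain the queue in this demo
    }
}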
With reference to fig. 2, the data scheduling processing method in the embodiment of the present invention is described above, and a data scheduling processing apparatus in the embodiment of the present invention is described below, where an embodiment of the data scheduling processing apparatus in the embodiment of the present invention includes:
an obtaining module 201, configured to obtain target processing data;
a judging module 202, configured to judge whether the target processing data needs to be split;
the splitting module 203 is configured to split the target processing data according to a preset splitting algorithm if splitting processing is required, so as to obtain N queue execution data, where N is a positive integer;
a conversion module 204, configured to convert the N queue execution data into N JSON file data;
the allocation execution module 205 is configured to perform allocation execution processing on the N JSON file data according to a preset allocation execution algorithm, and generate N queue message processing data;
a monitoring module 206, configured to monitor the N queue message processing data, and generate a data processing result, where the data processing result includes: a process continuation result and a process completion result.
In the embodiments of the invention, the universal JSON data format is used, the design is based on a producer-consumer mode, and task splitting is supported, so a distributed architecture can be supported. When the number of users grows and horizontal scaling is needed, only producers or consumers (or both) need to be added, making scaling simple and convenient. Messages are distributed through a message queue, so whether a task is triggered by a timer or manually by a user, only a task message needs to be added; the task starting mode is therefore flexible, and the technical problem that data processing and scheduling systems lack universality and reusability is solved.
Referring to fig. 3, another embodiment of a data scheduling processing apparatus according to the embodiment of the present invention includes:
an obtaining module 201, configured to obtain target processing data;
a judging module 202, configured to judge whether the target processing data needs to be split;
the splitting module 203 is configured to split the target processing data according to a preset splitting algorithm if splitting processing is required, so as to obtain N queue execution data, where N is a positive integer;
a conversion module 204, configured to convert the N queue execution data into N JSON file data;
the allocation execution module 205 is configured to perform allocation execution processing on the N JSON file data according to a preset allocation execution algorithm, and generate N queue message processing data;
a monitoring module 206, configured to monitor the N queue message processing data, and generate a data processing result, where the data processing result includes: a process continuation result and a process completion result.
The obtaining module 201 is specifically configured to:
based on a timing acquisition mode, capturing task data from a preset task table;
receiving a parameter setting instruction, and reading parameter data in the parameter setting instruction;
and matching and combining the task data and the parameter data to obtain target processing data.
The determining module 202 is specifically configured to:
judging whether parameter data in the target processing data belong to preset splitting setting data or not;
if the data belongs to the splitting setting data, determining that the target processing data needs to be split;
and if the data does not belong to the splitting setting data, determining that the target processing data does not need to be split.
Wherein the splitting module 203 is specifically configured to:
splitting the parameter data of the target processing data into a plurality of minimum execution data based on the minimum splitting unit to obtain a minimum execution data set;
combining the data in the minimum execution data set to obtain an execution combination set;
and determining the elements in the execution combination set as queue execution data to obtain N queue execution data.
The allocation execution module 205 is specifically configured to:
analyzing the N JSON file data based on a preset Java execution database to obtain N execution data sets;
and inputting the N execution data sets into M processing threads based on the execution data set as an object to obtain N queue message processing data, wherein M is a positive integer.
The data scheduling processing apparatus further includes a state analysis module 207, where the state analysis module 207 is specifically configured to:
periodically receiving state uploading data of Y processing threads according to a preset maintenance period, wherein Y is a positive integer not less than M;
and analyzing the state uploading data to obtain M executable processing threads in the Y processing threads.
The monitoring module 206 is specifically configured to:
sequentially reading the N queue message processing data to obtain N status characters;
judging whether the N state characters belong to processed state characters or not;
if the data processing results belong to the same category, determining the data processing results as processing completion results;
and if the data processing results do not belong to the same category, determining the data processing results as processing continuation results.
In the embodiments of the invention, the universal JSON data format is used, the design is based on a producer-consumer mode, and task splitting is supported, so a distributed architecture can be supported. When the number of users grows and horizontal scaling is needed, only producers or consumers (or both) need to be added, making scaling simple and convenient. Messages are distributed through a message queue, so whether a task is triggered by a timer or manually by a user, only a task message needs to be added; the task starting mode is therefore flexible, and the technical problem that data processing and scheduling systems lack universality and reusability is solved.
Fig. 2 and fig. 3 describe the data scheduling processing apparatus in the embodiment of the present invention in detail from the perspective of the modular functional entity, and the data scheduling processing apparatus in the embodiment of the present invention is described in detail from the perspective of hardware processing.
Fig. 4 is a schematic structural diagram of a data scheduling processing apparatus 400 according to an embodiment of the present invention. The data scheduling processing apparatus 400 may vary considerably with configuration or performance, and may include one or more processors (CPUs) 410 (e.g., one or more processors), a memory 420, and one or more storage media 430 (e.g., one or more mass storage devices) storing applications 433 or data 432. The memory 420 and the storage medium 430 may be transient or persistent storage. The program stored on the storage medium 430 may include one or more modules (not shown), each of which may include a series of instructions operating on the data scheduling processing apparatus 400. Further, the processor 410 may be configured to communicate with the storage medium 430 to execute a series of instruction operations in the storage medium 430 on the data scheduling processing apparatus 400.
The data scheduling processing apparatus 400 may also include one or more power supplies 440, one or more wired or wireless network interfaces 450, one or more input/output interfaces 460, and/or one or more operating systems 431, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the apparatus structure shown in fig. 4 does not constitute a limitation of the data scheduling processing apparatus, which may include more or fewer components than illustrated, combine some components, or arrange the components differently.
The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium, and which may also be a volatile computer-readable storage medium, having stored therein instructions, which, when executed on a computer, cause the computer to perform the steps of the data scheduling processing method.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses, and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A data scheduling processing method is characterized by comprising the following steps:
acquiring target processing data;
judging whether the target processing data needs to be split or not;
if the splitting processing is needed, splitting the target processing data according to a preset splitting algorithm to obtain N queue execution data, wherein N is a positive integer;
converting the N queue execution data into N JSON file data;
according to a preset distribution execution algorithm, carrying out distribution execution processing on the N JSON file data to generate N queue message processing data;
monitoring the message processing data of the N queues to generate a data processing result, wherein the data processing result comprises: a process continuation result and a process completion result.
2. The data scheduling processing method according to claim 1, wherein the acquiring target processing data includes:
based on a timing acquisition mode, capturing task data from a preset task table;
receiving a parameter setting instruction, and reading parameter data in the parameter setting instruction;
and matching and combining the task data and the parameter data to obtain target processing data.
3. The data scheduling processing method according to claim 2, wherein the determining whether the target processing data needs to be split includes:
judging whether parameter data in the target processing data belong to preset splitting setting data or not;
if the data belongs to the splitting setting data, determining that the target processing data needs to be split;
and if the data does not belong to the splitting setting data, determining that the target processing data does not need to be split.
4. The data scheduling processing method according to claim 1, wherein the splitting the target processing data according to a preset splitting algorithm to obtain N queue execution data includes:
splitting the parameter data of the target processing data into a plurality of minimum execution data based on the minimum splitting unit to obtain a minimum execution data set;
combining the data in the minimum execution data set to obtain an execution combination set;
and determining the elements in the execution combination set as queue execution data to obtain N queue execution data.
5. The data scheduling processing method according to claim 1, wherein the performing, according to a preset allocation execution algorithm, allocation execution processing on the N JSON file data, and generating N queue message processing data includes:
analyzing the N JSON file data based on a preset Java execution database to obtain N execution data sets;
and inputting the N execution data sets into M processing threads based on the execution data set as an object to obtain N queue message processing data, wherein M is a positive integer.
6. The data scheduling processing method according to claim 5, wherein after the preset Java-based execution database analyzes the N JSON file data to obtain N execution data sets, before the execution data sets are used as objects, the N execution data sets are input into M processing threads to obtain N queue message processing data, the method further comprises:
periodically receiving state uploading data of Y processing threads according to a preset maintenance period, wherein Y is a positive integer not less than M;
and analyzing the state uploading data to obtain M executable processing threads in the Y processing threads.
7. The data scheduling processing method according to claim 1, wherein the monitoring the N queue messages to process data, and the generating the data processing result includes:
sequentially reading the N queue message processing data to obtain N status characters;
judging whether the N state characters belong to processed state characters or not;
if the data processing results belong to the same category, determining the data processing results as processing completion results;
and if the data processing results do not belong to the same category, determining the data processing results as processing continuation results.
8. A data scheduling processing apparatus, characterized in that the data scheduling processing apparatus comprises:
the acquisition module is used for acquiring target processing data;
the judging module is used for judging whether the target processing data needs to be split or not;
the splitting module is used for splitting the target processing data according to a preset splitting algorithm if the splitting processing is needed, so as to obtain N queue execution data, wherein N is a positive integer;
the conversion module is used for converting the N queue execution data into N JSON file data;
the distribution execution module is used for carrying out distribution execution processing on the N JSON file data according to a preset distribution execution algorithm to generate N queue message processing data;
a monitoring module, configured to monitor the N queue message processing data and generate a data processing result, where the data processing result includes: a process continuation result and a process completion result.
9. A data schedule processing apparatus, characterized in that the data schedule processing apparatus comprises: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line;
the at least one processor invokes the instructions in the memory to cause the data schedule processing apparatus to perform the data schedule processing method of any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a data scheduling processing method according to any one of claims 1 to 7.
CN202111625744.9A 2021-12-29 2021-12-29 Data scheduling processing method, device, equipment and storage medium Active CN113986519B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111625744.9A CN113986519B (en) 2021-12-29 2021-12-29 Data scheduling processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111625744.9A CN113986519B (en) 2021-12-29 2021-12-29 Data scheduling processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113986519A true CN113986519A (en) 2022-01-28
CN113986519B CN113986519B (en) 2022-06-03

Family

ID=79734793

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111625744.9A Active CN113986519B (en) 2021-12-29 2021-12-29 Data scheduling processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113986519B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105138592A (en) * 2015-07-31 2015-12-09 武汉虹信技术服务有限责任公司 Distributed framework-based log data storing and retrieving method
CN108959292A (en) * 2017-05-19 2018-12-07 北京京东尚科信息技术有限公司 A kind of data uploading method, system and computer readable storage medium
WO2019039958A1 (en) * 2017-08-25 2019-02-28 Константин Андреевич МАЛЫШЕВ Method of describing a composite data type
CN110019214A (en) * 2017-12-04 2019-07-16 北京京东尚科信息技术有限公司 The method and apparatus that data split result is verified
CN110689377A (en) * 2019-09-30 2020-01-14 北京达佳互联信息技术有限公司 Data detection method and device and electronic equipment
CN111143468A (en) * 2019-12-11 2020-05-12 浙江华云信息科技有限公司 Multi-database data management method based on MPP distributed technology
CN111309722A (en) * 2018-12-11 2020-06-19 北京京东尚科信息技术有限公司 Data processing method and device
CN111475309A (en) * 2019-01-24 2020-07-31 北京京东尚科信息技术有限公司 Data processing method, device, block chain service system and storage medium
CN112115680A (en) * 2020-08-18 2020-12-22 天津洪恩完美未来教育科技有限公司 Font splitting method and device, computer equipment and computer readable storage medium
CN112328607A (en) * 2020-12-04 2021-02-05 四三九九网络股份有限公司 Asynchronous compression processing method of large-volume JSON data
CN113064987A (en) * 2021-04-30 2021-07-02 中国工商银行股份有限公司 Data processing method, apparatus, electronic device, medium, and program product
CN113535436A (en) * 2021-07-27 2021-10-22 苏州摩联通信技术有限公司 Data processing method, device and storage medium


Also Published As

Publication number Publication date
CN113986519B (en) 2022-06-03

Similar Documents

Publication Publication Date Title
US6411982B2 (en) Thread based governor for time scheduled process execution
US7689996B2 (en) Method to distribute programs using remote Java objects
US6571215B1 (en) System and method for generating a schedule based on resource assignments
US8392572B2 (en) Method for scheduling cloud-computing resource and system applying the same
US20060268967A1 (en) Supplying instruction to operational stations
CN111625331B (en) Task scheduling method, device, platform, server and storage medium
CN112162865A (en) Server scheduling method and device and server
CN111984390A (en) Task scheduling method, device, equipment and storage medium
US8266624B2 (en) Task dispatch utility coordinating the execution of tasks on different computers
CN104915259A (en) Task scheduling method applied to distributed acquisition system
CN111125444A (en) Big data task scheduling management method, device, equipment and storage medium
CN107992362A (en) The method, apparatus and system of automated performance testing
CN101751288A (en) Method, device and system applying process scheduler
CN112882828B (en) Method for managing and scheduling a processor in a processor-based SLURM operation scheduling system
CN112307066B (en) Distributed data aggregation method, system, device and storage medium
CN112365157A (en) Intelligent dispatching method, device, equipment and storage medium
CN111913784B (en) Task scheduling method and device, network element and storage medium
CN115470915A (en) Server system of quantum computer and implementation method thereof
CN113986519B (en) Data scheduling processing method, device, equipment and storage medium
CN112527642A (en) Performance test index data display method, device, equipment and storage medium
CN111082964B (en) Distribution method and device of configuration information
CN112307046A (en) Data acquisition method and device, computer readable storage medium and electronic equipment
CN109829005A (en) A kind of big data processing method and processing device
Ostermann et al. Workflow monitoring and analysis tool for ASKALON
CN114297042A (en) Interface pressure performance testing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant