WO2020253116A1 - Batch data execution method, device, storage medium, and member host in cluster - Google Patents

Batch data execution method, device, storage medium, and member host in cluster Download PDF

Info

Publication number
WO2020253116A1
WO2020253116A1 (PCT/CN2019/121210)
Authority
WO
WIPO (PCT)
Prior art keywords
subtasks
data
cluster
subtask
preset
Prior art date
Application number
PCT/CN2019/121210
Other languages
French (fr)
Chinese (zh)
Inventor
符修亮
叶松
梁群峰
吕林澧
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司 filed Critical 深圳壹账通智能科技有限公司
Publication of WO2020253116A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5044Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/546Message passing systems or structures, e.g. queues

Definitions

  • This application relates to the technical field of big data, and in particular to a data batch method, device, storage medium, and member hosts in a cluster.
  • At present, when large-scale data processing is involved, the batch-running application is deployed separately; when no batch is running, the idle application wastes resources. Large-scale tasks are executed single-threaded, which slows task execution. Data partitioning for large-scale tasks relies on Oracle's hash partitioning: once a table in Oracle reaches hundreds of millions of rows, or a single table reaches 2 GB in size, query efficiency drops significantly, and the table must be divided along the row dimension to avoid an oversized single table. For data with little regularity, or whose value range is hard to determine, partitioning is forced via hashing, and the number of partitions must be set to a power of two, which limits scalability. Therefore, how to improve the efficiency of large-scale data processing and maximize the use of resources is a technical problem to be solved urgently.
  • the main purpose of this application is to provide a data running batch method, device, storage medium, and member hosts in a cluster, which aims to solve the technical problems of low efficiency of large-batch data processing and low resource utilization in the prior art.
  • this application provides a data running batch method, which includes the following steps:
  • obtain sample abnormal data and corresponding sample abnormal points; train the convolutional neural network model according to the sample abnormal data and the corresponding sample abnormal points to obtain an abnormality recognition model; obtain the log records of executing each subtask, and identify abnormal data in the log records through the abnormality recognition model; locate the abnormal point according to the abnormal data, and process the abnormal point to complete the subtask successfully.
  • after the subtasks corresponding to the partitions are generated according to the preset number of partition data, the preset number of subtasks are stored in a preset queue list in the form of a queue, and a task processing message is sent through the message middleware to each member host in the cluster, so that each member host in the cluster, upon listening to the task processing message, obtains subtasks from the preset queue list for processing, the data batch-running method further includes:
  • when a subtask is obtained from the preset queue list, reducing the number of subtasks in the preset queue list by one to obtain the number of remaining subtasks.
  • after the number of remaining subtasks is obtained, the data batch-running method further includes:
  • when the number of remaining subtasks is zero, determining that the acquired subtask is the last subtask, monitoring the processing progress of the last subtask, and, when the processing of the last subtask is complete, sending the message that all subtasks are processed to the cluster through the message middleware in cluster consumption mode.
  • the data batching method further includes:
  • after the subtasks corresponding to the partitions are generated according to the preset number of partition data, the preset number of subtasks are stored in a preset queue list in the form of a queue, and a task processing message is sent through the message middleware to each member host in the cluster, so that each member host in the cluster, upon listening to the task processing message, obtains subtasks from the preset queue list for processing, the data batch-running method further includes:
  • monitoring in broadcast consumption mode, and, when the task processing message is heard, obtaining subtasks from the preset queue list for processing.
  • the monitoring in broadcast consumption mode and, when the task processing message is heard, obtaining a subtask from the preset queue list for processing includes:
  • obtaining a subtask from the preset queue list for processing, which includes:
  • calculating the CPU occupancy rate, obtaining multiple subtasks from the preset queue list according to the CPU occupancy rate, and using multiple threads to process the multiple subtasks concurrently.
  • this application also proposes a member host in a cluster.
  • the member host in the cluster includes a memory, a processor, and computer-readable instructions stored on the memory and runnable on the processor, the computer-readable instructions being configured to implement the steps of the data batch-running method described above.
  • this application also proposes a storage medium on which computer-readable instructions are stored; when the computer-readable instructions are executed by a processor, they implement the steps of the data batch-running method described above.
  • this application also proposes a data batch running device, which includes:
  • the random partition module is used to obtain the batch of to-be-processed data corresponding to the current task node, and randomly partition the batch of to-be-processed data to obtain a preset number of partitioned data;
  • the generating module is configured to generate subtasks corresponding to the partitions according to the preset number of partition data, store the preset number of subtasks as a queue in a preset queue list, and send a task processing message through the message middleware to each member host in the cluster, so that each member host in the cluster, when listening to the task processing message, obtains a subtask from the preset queue list for processing;
  • the obtaining module is used to obtain sample abnormal data and corresponding sample abnormal points;
  • the training module is used to train the convolutional neural network model according to the sample abnormal data and corresponding sample abnormal points to obtain an abnormal recognition model
  • the identification module is used to obtain log records of executing each subtask, and identify abnormal data in the log records through the abnormal recognition model;
  • the positioning module is used to locate abnormal points according to the abnormal data, and process the abnormal points to successfully complete the subtasks.
  • In this application, the batch of to-be-processed data is randomly partitioned to obtain a preset number of partitioned data; the data partitioning does not depend on the database, so the number of subtasks can be expanded arbitrarily, increasing the concurrent processing speed of big data.
  • Subtasks corresponding to the partitions are generated according to the preset number of partition data, the preset number of subtasks are stored as a queue in the preset queue list, and a task processing message is sent through the message middleware to each member host in the cluster, so that each member host in the cluster, when listening to the task processing message, obtains subtasks from the preset queue list for processing.
  • Large quantities of data are processed through the cluster based on big data; each member host can both distribute tasks and execute tasks, maximizing the use of resources.
  • FIG. 1 is a schematic structural diagram of member hosts in a cluster of hardware operating environments involved in a solution of an embodiment of the present application
  • FIG. 2 is a schematic flowchart of the first embodiment of the application data running method
  • FIG. 3 is a schematic flowchart of a second embodiment of the application data running method
  • FIG. 4 is a schematic flowchart of a third embodiment of the application data running method
  • Fig. 5 is a structural block diagram of the first embodiment of the data running and batching device of this application.
  • FIG. 1 is a schematic diagram of a member host structure in a cluster of a hardware operating environment involved in a solution of an embodiment of the application.
  • the member hosts in the cluster may include a processor 1001, such as a central processing unit (Central Processing Unit, CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005.
  • the communication bus 1002 is used to implement connection and communication between these components.
  • the user interface 1003 may include a display screen (Display), and the optional user interface 1003 may also include a standard wired interface and a wireless interface.
  • the wired interface of the user interface 1003 may be a USB interface in this application.
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wireless-Fidelity (Wi-Fi) interface).
  • the memory 1005 may be a high-speed Random Access Memory (RAM), or a stable Non-Volatile Memory (NVM) such as disk storage.
  • the memory 1005 may also be a storage device independent of the foregoing processor 1001.
  • The structure shown in FIG. 1 does not constitute a limitation on the member hosts in the cluster, which may include more or fewer components than shown, combine some components, or use a different arrangement of components.
  • a memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and computer readable instructions.
  • In the member hosts in the cluster shown in FIG. 1, the network interface 1004 is mainly used to connect to a back-end server and exchange data with it; the user interface 1003 is mainly used to connect to user equipment; and the member host calls, through the processor 1001, the computer-readable instructions stored in the memory 1005 and executes the data batch-running method provided in the embodiments of the present application.
  • FIG. 2 is a schematic flow chart of the first embodiment of the data running method of this application, and the first embodiment of the data running method of this application is proposed.
  • the data batch method includes the following steps:
  • Step S10 Obtain the batch of to-be-processed data corresponding to the current task node, and randomly partition the batch of to-be-processed data to obtain a preset number of partitioned data.
  • the execution body of this embodiment is a member host in the cluster, where the member host may be an electronic device such as a personal computer or a server.
  • the cluster includes multiple member hosts, and all member hosts have the same function, and can distribute tasks and perform tasks at the same time, so as to maximize the use of resources.
  • the execution subject of this embodiment is any member host in the cluster, and the batch running task is started periodically through a preset open source project.
  • the preset open source project is quartz.
  • the current task node is obtained, and the batch of to-be-processed data corresponding to the current task node is randomly partitioned, usually into 62 partitions labeled a–z, A–Z, and 0–9; that is, the preset number is 62.
  • the partition can be expanded and divided into a larger number of partitions.
  • the data partition does not depend on the database, and the number of subtasks can be expanded arbitrarily to increase the concurrent processing speed of data.
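  • The random partitioning described above can be sketched as follows; this is a minimal Python illustration, where the record set and the assignment logic are assumptions based only on the 62 labels (a–z, A–Z, 0–9) the text mentions:

```python
import random
import string

# The 62 partition labels the text describes: a-z, A-Z, 0-9.
PARTITION_LABELS = list(string.ascii_lowercase + string.ascii_uppercase + string.digits)

def random_partition(records, labels=PARTITION_LABELS):
    """Randomly assign each record to one of the preset partitions.

    Because assignment is random rather than a hash of a database column,
    the scheme does not depend on the database, and `labels` can be
    extended freely (unlike a hash partition count fixed to a power of 2).
    """
    partitions = {label: [] for label in labels}
    for record in records:
        partitions[random.choice(labels)].append(record)
    return partitions

partitions = random_partition(range(10_000))
assert len(partitions) == 62
assert sum(len(p) for p in partitions.values()) == 10_000
```

  • Extending the partition count is then just a matter of passing a longer `labels` list, which matches the claim that the number of subtasks can be expanded arbitrarily.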
  • Step S20 Generate subtasks corresponding to the partitions according to the preset number of partition data, store the preset number of subtasks in the preset queue list in the form of a queue, and send a task processing message to each member host in the cluster, so that each member host in the cluster, when listening to the task processing message, obtains a subtask from the preset queue list for processing.
  • a preset number of subtasks corresponding to the partitions are generated. The preset number is 62, that is, 62 subtasks, with task labels a–z, A–Z, and 0–9. The number of subtasks can also be expanded into a larger number; the 62 subtasks are placed, as a queue, into the preset queue list, which is a Redis queue list.
  • the cluster listens in broadcast consumption mode, so every member host in the cluster receives the task processing message, obtains a subtask from the Redis queue list, and calls the processing class corresponding to the obtained subtask to perform processing. The processing class is usually the algorithm or rule the subtask executes, and the subtask is executed according to the processing class to obtain result data. For example, if the subtask is calculating the commission for order A, and order A is of the consumer-loan type, then the processing class is the commission calculation formula for consumer-loan orders; the commission for order A is calculated by that formula, and the subtask is thereby processed.
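  • The queue-plus-processing-class arrangement of Step S20 can be sketched as below. This is a stand-in, not the application's actual API: a plain in-memory deque plays the role of the Redis queue list, a dict of callables plays the role of the processing classes, and the consumer-loan commission rate is purely illustrative:

```python
from collections import deque

# Stand-in for the Redis queue list: one subtask per partition label.
# The text uses 62 labels; three are shown here for brevity.
subtask_queue = deque(f"subtask-{label}" for label in "abc")

# Stand-in for the "processing classes": the rule each kind of subtask executes.
# The 1.5% consumer-loan commission formula below is an invented example.
PROCESSING_CLASSES = {
    "consumer_loan": lambda order: round(order["amount"] * 0.015, 2),
}

def process_subtask(order):
    """Look up the processing class by order type and execute the subtask with it."""
    handler = PROCESSING_CLASSES[order["type"]]
    return handler(order)

commission = process_subtask({"type": "consumer_loan", "amount": 10_000})  # 150.0
```

  • Dispatching on a type key keeps the member hosts generic: any host that hears the task processing message can execute any subtask, because the rule lives in the processing-class table rather than in the host.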
  • the method further includes:
  • Step S50 Obtain sample abnormal data and corresponding sample abnormal points.
  • Step S60 Training the convolutional neural network model according to the sample abnormal data and corresponding sample abnormal points to obtain an abnormal recognition model.
  • Step S70 Obtain log records of executing each subtask, and identify abnormal data in the log records through the abnormal recognition model.
  • Step S80 Locate an abnormal point according to the abnormal data, and process the abnormal point to successfully complete the subtask.
  • when each subtask is executed, processing errors or abnormal data are recorded in the log. A large number of sample abnormal data and corresponding sample abnormal points are obtained in advance, and the convolutional neural network model learns the sample abnormal data and the corresponding abnormal points to obtain an abnormality recognition model. Errors or abnormal data in the log records are identified through the abnormality recognition model, so that the abnormal point can be located and handled and the subtask completed successfully.
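  • The shape of the S50–S80 loop can be outlined as below. The application specifies a convolutional neural network; here a trivial token matcher stands in for the trained model purely to show the pipeline (train on samples, scan logs, locate the abnormal point, reprocess), and every name, log format, and rule is an illustrative assumption:

```python
def train_recognizer(samples):
    """Stand-in for CNN training: collect tokens that mark abnormal log lines.

    `samples` is a list of (log_line, abnormal_point) pairs; a real model
    would learn features from them instead of memorizing leading tokens.
    """
    markers = {line.split()[0] for line, _ in samples}
    def recognize(log_lines):
        return [line for line in log_lines if line.split()[0] in markers]
    return recognize

# S50/S60: obtain samples and "train" the recognizer.
recognize = train_recognizer([("ERROR timeout on partition q", "partition q")])

# S70: scan the execution log for abnormal data.
log = ["INFO subtask a done", "ERROR timeout on partition q"]
abnormal = recognize(log)

# S80: locate the abnormal point (here: the partition named last in the line)
# so the corresponding subtask can be reprocessed to completion.
def locate_and_handle(lines):
    return [line.rsplit(" ", 1)[-1] for line in lines]

retry_partitions = locate_and_handle(abnormal)  # partitions to reprocess
```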
  • the batch of to-be-processed data is randomly partitioned to obtain a preset number of partitioned data.
  • the data partitioning does not depend on the database, and the number of subtasks can be arbitrarily expanded.
  • FIG. 3 is a schematic flowchart of a second embodiment of the data running method of this application. Based on the first embodiment shown in FIG. 2 above, a second embodiment of the data running method of this application is proposed.
  • the method further includes:
  • Step S30 When a subtask is obtained from the preset queue list, the number of subtasks in the preset queue list is reduced by one to obtain the number of remaining subtasks.
  • A redis+lua script implements a distributed lock for the member hosts obtaining subtasks from the preset queue list: when any member host B is acquiring a subtask, the other member hosts are blocked from acquiring subtasks, that is, no other member host may obtain a subtask at that moment, which achieves data synchronization while greatly reducing the time overhead of locking. Once member host B obtains a subtask from the preset queue list, the preset number in the list is reduced by one to obtain the number of remaining subtasks; for example, if the preset number is 62 and member host B obtains one subtask, the number of remaining subtasks is 61. After member host B has obtained its subtask, subtask acquisition is reopened to the other member hosts, which can then obtain subtasks from the preset queue list for processing, until all subtasks in the preset queue list have been acquired and executed.
  • the method further includes:
  • Step S40 When the number of remaining subtasks is zero, determine that the acquired subtask is the last subtask, monitor the processing progress of the last subtask, and, when its processing completes, send the message that all subtasks are processed to the cluster through the message middleware in cluster consumption mode.
  • Each time a subtask is obtained, the number of subtasks in the preset queue list is reduced by one to obtain the number of remaining subtasks. If the number of remaining subtasks is zero, the subtask obtained by member host C from the preset queue list is the last subtask; when member host C finishes processing it, the message that all subtasks are processed can be sent to the cluster through the message middleware in cluster consumption mode.
  • To enable each host in the cluster to keep processing tasks continuously and efficiently, when member host C has processed the last subtask, it sends the message that all subtasks are processed to the cluster through the message middleware, so that each host in the cluster can obtain the next task node and process its tasks. Because the message that all subtasks are processed uses the cluster consumption mode, only one host will hear it.
  • the method further includes:
  • member host C is the member host that processed the last subtask in the cluster, and member host C sends the message that all subtasks are processed through the message middleware. Because the consumption mode of this message is cluster consumption, only one member host, say member host D (any member host in the cluster), hears the message that all subtasks are processed.
  • When member host D hears the message, it judges whether all the subtasks were processed successfully. If so, it automatically enters the next task node for execution; if not, the next task node is suspended and the problem is investigated. Once the problem has been found and handled, the task processing mechanism is awakened again, the next task node is acquired, and the next task is executed. Usually, the subtask processing data is monitored; if abnormal data is detected, an alarm is issued and manual intervention is performed to troubleshoot the error.
  • In this embodiment, each time a subtask is obtained, the number of subtasks in the preset queue list is reduced by one to obtain the number of remaining subtasks; when the number of remaining subtasks is zero, the acquired subtask is determined to be the last subtask, and its processing progress is monitored; when its processing completes, the message that all subtasks are processed is sent to the cluster through the message middleware in cluster consumption mode. By counting the subtasks in the queue list, the execution progress of the subtasks can be tracked, and when all subtasks are processed a completion message is sent promptly, so that the member hosts in the cluster can obtain the next task node in time and process the next task, improving the efficiency of large-scale data processing.
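  • The two consumption modes this flow relies on — broadcast for the "task processing" message, cluster for the "all subtasks processed" message — differ only in who receives a delivery. A toy sketch of the distinction (the `Middleware` class is invented for illustration; real middleware such as RocketMQ exposes the same choice, e.g. as broadcasting vs. clustering message models):

```python
import itertools

class Middleware:
    """Toy message middleware distinguishing broadcast vs cluster consumption."""

    def __init__(self, hosts):
        self.hosts = hosts
        self._rr = itertools.cycle(range(len(hosts)))  # round-robin for cluster mode

    def broadcast(self, msg):
        # Broadcast consumption: every member host receives the message.
        return [host + " got " + msg for host in self.hosts]

    def cluster_send(self, msg):
        # Cluster consumption: exactly one member host receives the message.
        return self.hosts[next(self._rr)] + " got " + msg

mw = Middleware(["hostA", "hostB", "hostC"])
everyone = mw.broadcast("task processing")            # all three hosts react
only_one = mw.cluster_send("all subtasks processed")  # a single host advances the flow
```

  • Sending the completion message in cluster mode is what guarantees that only one host (member host D in the text) evaluates whether to enter the next task node, avoiding duplicate transitions.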
  • FIG. 4 is a schematic flowchart of a third embodiment of the data running method of this application. Based on the second embodiment shown in FIG. 3, a third embodiment of the data running method of this application is proposed.
  • the method further includes:
  • Step S201 Monitoring is performed in a broadcast consumption mode, and when the task processing message is monitored, subtasks are obtained from the preset queue list for processing.
  • the cluster includes multiple member hosts, and all member hosts have the same function. They can distribute tasks and perform tasks at the same time, so as to maximize the use of resources.
  • Each member host in the cluster listens in broadcast consumption mode, so all hosts in the cluster receive the task processing message, obtain subtasks from the Redis queue list, call the corresponding processing class for processing, and log any processing errors or abnormal data.
  • step S201 includes:
  • All hosts in the cluster can distribute tasks and execute tasks at the same time.
  • A redis+lua script implements a distributed lock for the member hosts obtaining subtasks from the preset queue list: while one member host is obtaining a subtask, the other member hosts in the cluster are blocked from doing so. Once it has obtained its subtask, subtask acquisition is reopened to the other member hosts, which can then obtain subtasks from the preset queue list for processing, until all subtasks in the preset queue list have been acquired and executed.
  • Implementing the distributed lock with a redis+lua script greatly reduces the time overhead caused by locking while achieving data synchronization.
  • obtaining a subtask from the preset queue list for processing includes:
  • the CPU occupancy rate is calculated, multiple subtasks are obtained from the preset queue list according to the CPU occupancy rate, and multiple threads are used to concurrently process multiple subtasks.
  • each member host in the cluster can process subtasks of different partitions at the same time, and each member host uses multi-threaded concurrent processing, so the processing speed is fast. The number of concurrently processed tasks is determined according to the CPU occupancy rate; usually 5 threads run concurrently to improve the processing speed of the batch subtasks.
  • Each host in the cluster calculates its own CPU occupancy rate, determines the number of subtasks to obtain according to that rate, and obtains multiple subtasks without affecting processing speed, so that multiple threads can concurrently process the acquired subtasks and improve the processing efficiency of the subtasks in the preset queue list.
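  • The CPU-occupancy heuristic above can be sketched like this. The mapping from occupancy to subtask count is an assumption: the text only says the count is derived from the occupancy rate, with about 5 concurrent threads as the usual figure, so a simple "scale the usual 5 down by the busy fraction" rule is used here:

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

def subtask_budget(cpu_occupancy, usual=5):
    """Scale the usual 5 concurrent subtasks down as the CPU gets busier.

    cpu_occupancy is the host's current load in [0.0, 1.0]; the scaling
    rule itself is an illustrative assumption, not taken from the text.
    """
    free = max(0.0, 1.0 - cpu_occupancy)
    return max(1, round(usual * free))

def take_and_run(queue_list, cpu_occupancy):
    """Take as many subtasks as the budget allows and process them concurrently."""
    n = subtask_budget(cpu_occupancy)
    batch = [queue_list.popleft() for _ in range(min(n, len(queue_list)))]
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(lambda t: t + ":done", batch))

queue_list = deque(f"subtask-{i}" for i in range(10))
results = take_and_run(queue_list, cpu_occupancy=0.2)  # lightly loaded host takes 4 subtasks
```

  • Tying the batch size to the occupancy rate keeps a busy host from over-committing while still letting an idle host drain the queue quickly.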
  • In this embodiment, monitoring is performed through broadcast consumption, and subtasks are obtained from the preset queue list for processing. All member hosts in the cluster receive the task processing message, and each member host in the cluster can distribute tasks at the same time, so as to maximize the use of resources and improve the efficiency of subtask processing.
  • the embodiment of the present application also proposes a storage medium, and the computer-readable storage medium may be a non-volatile readable storage medium.
  • Computer-readable instructions are stored on the storage medium; when the computer-readable instructions are executed by a processor, they implement the steps of the data batch-running method described above.
  • an embodiment of the present application also proposes a data batching device, and the data batching device includes:
  • the random partition module 10 is used to obtain batch data to be processed corresponding to the current task node, and randomly partition the batch data to be processed to obtain a preset number of partition data.
  • the generating module 20 is configured to generate subtasks corresponding to the partitions according to the preset number of partition data, store the preset number of subtasks in a preset queue list in the form of a queue, and send the tasks through the message middleware Process the message to each member host in the cluster, so that when each member host in the cluster listens to the task processing message, it obtains a subtask from the preset queue list for processing.
  • the obtaining module 30 is used to obtain sample abnormal data and corresponding sample abnormal points.
  • the training module 40 is used to train the convolutional neural network model according to the sample abnormal data and corresponding sample abnormal points to obtain an abnormal recognition model.
  • the identification module 50 is configured to obtain log records of executing each subtask, and identify abnormal data in the log records through the abnormal identification model.
  • the positioning module 60 is configured to locate abnormal points according to the abnormal data, and process the abnormal points to successfully complete the subtasks.
  • the data batching device further includes:
  • the calculation module is configured to reduce the number of subtasks in the preset queue list by one when obtaining a subtask from the preset queue list to obtain the number of remaining subtasks.
  • the data batching device further includes:
  • the sending module is used to determine, when the number of remaining subtasks is zero, that the acquired subtask is the last subtask, monitor its processing progress, and, when its processing completes, send the message that all subtasks are processed to the cluster through the message middleware in cluster consumption mode.
  • the obtaining module 30 is also used to judge, when the message that all subtasks are processed is heard, whether all subtasks were processed successfully, and if so, to obtain the next task node and execute the next task.
  • the obtaining module 30 is further configured to monitor in broadcast consumption mode and, when the task processing message is heard, obtain subtasks from the preset queue list for processing.
  • the obtaining module 30 is further configured to monitor through broadcast consumption; when the task processing message is heard, obtain subtasks from the preset queue list for processing; and, while one member host is obtaining a subtask, block the other member hosts in the cluster from obtaining subtasks.
  • the obtaining module 30 is further configured to calculate the CPU occupancy rate when the task processing message is heard, obtain multiple subtasks from the preset queue list according to the CPU occupancy rate, and use multiple threads to process the multiple subtasks concurrently.
  • The storage medium (such as a ROM/RAM, magnetic disk, or optical disk) includes several instructions to cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the methods described in the embodiments of this application.

Abstract

A batch data execution method, a device, a storage medium, and a member host in a cluster. The method comprises: acquiring batch data to be processed corresponding to a current task node, performing random partitioning on the batch data, and obtaining a pre-determined number of data partitions (S10); generating sub-tasks corresponding to the partitions according to the pre-determined number of data partitions, storing the pre-determined number of sub-tasks in a pre-determined queue list in the form of a queue, and sending, via a message middleware, a task processing message to each member host in a cluster, such that the member host in the cluster acquires, upon detecting the task processing message, a sub-task from the pre-determined queue list to process (S20); acquiring sample outlier data and corresponding sample outlier points (S50); training a convolutional neural network model according to the sample outlier data and the corresponding sample outlier points, and obtaining an outlier identification model (S60); acquiring a log record for execution of each sub-task, and identifying outlier data in the log record by means of the outlier identification model (S70); and locating an outlier point according to the outlier data, and processing the outlier point to successfully complete sub-task processing (S80). The invention is based on big data and achieves batch data processing by means of a cluster. Each member host in the cluster is capable of both distributing a task and executing a task, thereby achieving a maximum resource utilization rate. In addition, data partitioning is not dependent on a database, thereby enabling unrestricted expansion of the number of sub-tasks and increasing the concurrent data processing speed.

Description

数据跑批方法、装置、存储介质及集群中的成员主机 Data running batch method, device, storage medium and member host in cluster To
This application claims priority to Chinese patent application No. 201910553729.4, filed with the Chinese Patent Office on June 20, 2019 and entitled "Batch data execution method, device, storage medium, and member host in a cluster", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of big data, and in particular to a batch data execution method, a batch data execution apparatus, a storage medium, and a member host in a cluster.
Background
At present, when large batches of data are processed, the batch execution application is deployed separately, so it sits idle and wastes resources whenever no batch is running; large batch tasks are executed in a single thread, which makes execution slow; and data partitioning for large batch tasks relies on Oracle hash partitioning. Once a single Oracle table reaches hundreds of millions of rows, or about 2 GB in size, query efficiency drops noticeably, so the table must be divided along the row dimension to keep any single table from growing too large. For data with little regularity, or whose value range is hard to determine, partitioning is forced through hashing, and the number of partitions must be set to a power of two, which limits scalability. Therefore, how to improve the efficiency of large-scale batch data processing and maximize resource utilization is a technical problem in urgent need of a solution.
The above content is only intended to assist in understanding the technical solution of this application and does not constitute an admission that it is prior art.
Summary
The main purpose of this application is to provide a batch data execution method, apparatus, storage medium, and member host in a cluster, aiming to solve the prior-art technical problems of low efficiency and low resource utilization in large-scale batch data processing.
To achieve the above objective, this application provides a batch data execution method, which includes the following steps:
obtaining the batch of pending data corresponding to the current task node, and randomly partitioning the batch of pending data to obtain a preset number of data partitions;
generating a subtask for each partition according to the preset number of data partitions, storing the preset number of subtasks as a queue in a preset queue list, and sending a task processing message through a message middleware to the member hosts in the cluster, so that each member host in the cluster, upon hearing the task processing message, fetches a subtask from the preset queue list for processing; and
obtaining sample anomaly data and the corresponding sample anomaly points; training a convolutional neural network model with the sample anomaly data and the corresponding sample anomaly points to obtain an anomaly recognition model; obtaining the log records produced while executing each subtask, and identifying anomalous data in the log records through the anomaly recognition model; and locating the anomaly point according to the anomalous data and handling the anomaly point, so that the subtask completes successfully.
Preferably, after the step of generating a subtask for each partition according to the preset number of data partitions, storing the preset number of subtasks as a queue in a preset queue list, and sending a task processing message through the message middleware to the member hosts in the cluster so that each member host, upon hearing the task processing message, fetches a subtask from the preset queue list for processing, the batch data execution method further includes:
when a subtask is fetched from the preset queue list, decrementing the number of subtasks in the preset queue list by one to obtain the number of remaining subtasks.
Preferably, after the step of decrementing the number of subtasks in the preset queue list by one when a subtask is fetched from the preset queue list to obtain the number of remaining subtasks, the batch data execution method further includes:
when the number of remaining subtasks is zero, determining that the fetched subtask is the last subtask, monitoring the processing progress of the last subtask, and, upon hearing that the last subtask has been processed, sending a message that all subtasks have been processed to the cluster through the message middleware in a clustering consumption mode.
Preferably, after the step of determining, when the number of remaining subtasks is zero, that the fetched subtask is the last subtask, monitoring the processing progress of the last subtask, and, upon hearing that the last subtask has been processed, sending the message that all subtasks have been processed to the cluster through the message middleware in a clustering consumption mode, the batch data execution method further includes:
upon hearing the message that all subtasks have been processed, judging whether all subtasks have been processed successfully, and if all subtasks have been processed successfully, obtaining the next task node and executing the next task.
Preferably, after the step of generating a subtask for each partition according to the preset number of data partitions, storing the preset number of subtasks as a queue in a preset queue list, and sending a task processing message through the message middleware to the member hosts in the cluster so that each member host, upon hearing the task processing message, fetches a subtask from the preset queue list for processing, the batch data execution method further includes:
listening in a broadcast consumption mode, and, upon hearing the task processing message, fetching a subtask from the preset queue list for processing.
Preferably, listening in the broadcast consumption mode and fetching a subtask from the preset queue list for processing upon hearing the task processing message includes:
listening in a broadcast consumption mode;
upon hearing the task processing message, fetching a subtask from the preset queue list for processing; and
blocking the subtask-fetching function on every member host in the cluster other than the member host that is fetching the subtask.
Preferably, fetching a subtask from the preset queue list for processing upon hearing the task processing message includes:
upon hearing the task processing message, calculating the CPU occupancy rate, fetching multiple subtasks from the preset queue list according to the CPU occupancy rate, and processing the multiple subtasks concurrently with multiple threads.
In addition, to achieve the above objective, this application further provides a member host in a cluster. The member host includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the computer-readable instructions are configured to implement the steps of the batch data execution method described above.
In addition, to achieve the above objective, this application further provides a storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of the batch data execution method described above.
In addition, to achieve the above objective, this application further provides a batch data execution apparatus, which includes:
a random partitioning module, configured to obtain the batch of pending data corresponding to the current task node and randomly partition the batch of pending data to obtain a preset number of data partitions;
a generating module, configured to generate a subtask for each partition according to the preset number of data partitions, store the preset number of subtasks as a queue in a preset queue list, and send a task processing message through a message middleware to the member hosts in the cluster, so that each member host in the cluster, upon hearing the task processing message, fetches a subtask from the preset queue list for processing;
an obtaining module, configured to obtain sample anomaly data and the corresponding sample anomaly points;
a training module, configured to train a convolutional neural network model with the sample anomaly data and the corresponding sample anomaly points to obtain an anomaly recognition model;
a recognition module, configured to obtain the log records produced while executing each subtask and identify anomalous data in the log records through the anomaly recognition model; and
a locating module, configured to locate the anomaly point according to the anomalous data and handle the anomaly point, so that the subtask completes successfully.
In this application, the batch of pending data corresponding to the current task node is obtained and randomly partitioned to obtain a preset number of data partitions; since this partitioning does not depend on the database, the number of subtasks can be extended at will, which increases the concurrency of data processing. A subtask is generated for each partition according to the preset number of data partitions, the preset number of subtasks are stored as a queue in a preset queue list, and a task processing message is sent through the message middleware to the member hosts in the cluster, so that each member host, upon hearing the task processing message, fetches a subtask from the preset queue list for processing. Large batches of data are thus processed by the cluster on a big-data basis, and a host can execute tasks while also distributing them, maximizing resource utilization.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a member host, in a cluster, of the hardware operating environment involved in the solutions of the embodiments of this application;
FIG. 2 is a schematic flowchart of a first embodiment of the batch data execution method of this application;
FIG. 3 is a schematic flowchart of a second embodiment of the batch data execution method of this application;
FIG. 4 is a schematic flowchart of a third embodiment of the batch data execution method of this application;
FIG. 5 is a structural block diagram of a first embodiment of the batch data execution apparatus of this application.
The realization of the objectives, the functional features, and the advantages of this application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a member host, in a cluster, of the hardware operating environment involved in the solutions of the embodiments of this application.
As shown in FIG. 1, a member host in the cluster may include a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display (Display); optionally, the user interface 1003 may also include standard wired and wireless interfaces, and in this application the wired interface of the user interface 1003 may be a USB interface. The network interface 1004 may optionally include standard wired and wireless interfaces (such as a Wireless Fidelity (WI-FI) interface). The memory 1005 may be a high-speed random access memory (RAM), or a stable non-volatile memory (NVM) such as disk storage. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
Those skilled in the art can understand that the structure shown in FIG. 1 does not limit the member hosts in the cluster; a member host may include more or fewer components than shown, or combine certain components, or arrange the components differently.
As shown in FIG. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and computer-readable instructions.
In the member host shown in FIG. 1, the network interface 1004 is mainly used to connect to a back-end server and exchange data with it; the user interface 1003 is mainly used to connect to user equipment; and the member host calls, through the processor 1001, the computer-readable instructions stored in the memory 1005 and executes the batch data execution method provided in the embodiments of this application.
Based on the above hardware structure, an embodiment of the batch data execution method of this application is proposed.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a first embodiment of the batch data execution method of this application, on which basis the first embodiment of the method is proposed.
In the first embodiment, the batch data execution method includes the following steps:
Step S10: obtain the batch of pending data corresponding to the current task node, and randomly partition the batch of pending data to obtain a preset number of data partitions.
It should be understood that the executing entity of this embodiment is a member host in the cluster, which may be an electronic device such as a personal computer or a server. The cluster includes multiple member hosts, all with identical functions; each can distribute tasks while also executing them, maximizing resource utilization. The executing entity of this embodiment is any member host in the cluster. A batch execution task is started periodically through a preset open-source project, quartz. After the batch task starts, the current task node is obtained, and the batch of pending data corresponding to the current task node is randomly partitioned, typically into 62 partitions labeled a-z, A-Z, and 0-9, that is, the preset number is 62; the partitioning can also be extended to a larger number of partitions. The data partitioning does not depend on the database, the number of subtasks can be extended at will, and the concurrency of data processing is increased.
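The random partitioning of step S10 can be sketched as follows; this is a minimal illustration, and the function name and label list are assumptions rather than anything specified by the application:

```python
import random
import string

# The typical 62 partition labels described above: a-z, A-Z, 0-9.
# The list can be extended freely, since partitioning does not depend on the database.
PARTITION_LABELS = list(string.ascii_lowercase + string.ascii_uppercase + string.digits)

def random_partition(records, labels=PARTITION_LABELS):
    """Randomly assign each pending record to one of the preset partitions."""
    partitions = {label: [] for label in labels}
    for record in records:
        partitions[random.choice(labels)].append(record)
    return partitions
```

Because assignment is random rather than hash-based, no value-range analysis of the data is needed and the partition count is not constrained to a power of two.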
Step S20: generate a subtask for each partition according to the preset number of data partitions, store the preset number of subtasks as a queue in a preset queue list, and send a task processing message through the message middleware to the member hosts in the cluster, so that each member host, upon hearing the task processing message, fetches a subtask from the preset queue list for processing.
It is understandable that after the batch of pending data has been partitioned, the preset number of subtasks, one per partition, is generated; typically the preset number is 62, that is, 62 subtasks labeled a-z, A-Z, and 0-9, though the number of subtasks can also be extended. These 62 subtasks are placed, as a queue, into the preset queue list, which is a Redis queue list, and the task processing message is sent through the message middleware to inform every host in the cluster that tasks are pending.
It should be noted that the cluster listens in a broadcast consumption mode, so every member host in the cluster receives the task processing message, then fetches a subtask from the Redis queue list and invokes the processing class corresponding to the fetched subtask. The processing class is usually the algorithm or rule that the subtask executes; the subtask is executed according to the processing class to obtain result data. For example, if the subtask is to calculate the commission for order A, and order A is of the consumer-loan type, the processing class is the commission calculation formula for consumer-loan orders; the commission for order A is calculated according to that formula, and once the calculation finishes, the subtask is done.
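The dispatch side of step S20 can be sketched with plain Python stand-ins for the Redis queue list and the message middleware (a real deployment would use something like redis-py RPUSH plus a broker's broadcast publish); all names here are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class SubTask:
    label: str      # partition label, e.g. "a"
    records: list   # the partition's pending data
    handler: str    # name of the processing class, e.g. a commission formula

queue_list = []     # stand-in for the Redis queue list
subscribers = []    # stand-in for the cluster's broadcast listeners

def dispatch(partitions, handler):
    """Turn each partition into a subtask, enqueue it, and notify every host."""
    for label, records in partitions.items():
        queue_list.append(SubTask(label, records, handler))
    for notify in subscribers:  # broadcast consumption: every member host hears it
        notify("task-pending")
```

The notification deliberately carries no subtask payload: hosts only learn that work is pending and then compete to pull subtasks from the shared queue list.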
Further, in this embodiment, after step S20, the method further includes:
Step S50: obtain sample anomaly data and the corresponding sample anomaly points.
Step S60: train a convolutional neural network model with the sample anomaly data and the corresponding sample anomaly points to obtain an anomaly recognition model.
Step S70: obtain the log records produced while executing each subtask, and identify anomalous data in the log records through the anomaly recognition model.
Step S80: locate the anomaly point according to the anomalous data, and handle the anomaly point, so that the subtask completes successfully.
In a specific implementation, while each subtask is executed, errors or anomalous data encountered during processing are logged. A large amount of sample anomaly data and corresponding sample anomaly points can be collected in advance, and a convolutional neural network model learns from them to produce the anomaly recognition model. The anomaly recognition model then identifies errors or anomalous data in the log records, so that the anomaly point can be located and handled, allowing the subtask to complete successfully.
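The log-scanning and anomaly-point-locating step can be sketched as below. Note that a simple keyword matcher stands in for the trained convolutional anomaly-recognition model of steps S50 and S60, and the patterns are hypothetical examples, not anything specified by the application:

```python
import re

# Stand-in for the trained anomaly-recognition model: hypothetical keyword patterns.
ANOMALY_PATTERNS = [r"ERROR", r"Exception", r"timeout"]

def locate_anomaly_points(log_lines):
    """Return (line_number, line) pairs flagged as anomalous in the log records."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if any(re.search(pattern, line) for pattern in ANOMALY_PATTERNS):
            hits.append((number, line))
    return hits
```

In the application's design the matcher would be replaced by the model trained on sample anomaly data, but the surrounding flow, scanning each subtask's log and reporting the located anomaly points for handling, is the same.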
In this embodiment, the batch of pending data corresponding to the current task node is obtained and randomly partitioned into a preset number of data partitions; since the partitioning does not depend on the database, the number of subtasks can be extended at will, increasing the concurrency of data processing. A subtask is generated for each partition according to the preset number of data partitions, the preset number of subtasks are stored as a queue in the preset queue list, and a task processing message is sent through the message middleware to the member hosts in the cluster, so that each member host, upon hearing the message, fetches a subtask from the preset queue list for processing. Large batches of data are thus processed by the cluster on a big-data basis, and a host can execute tasks while also distributing them, maximizing resource utilization.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of a second embodiment of the batch data execution method of this application; on the basis of the first embodiment shown in FIG. 2, the second embodiment of the method is proposed.
In the second embodiment, after step S20, the method further includes:
Step S30: when a subtask is fetched from the preset queue list, decrement the number of subtasks in the preset queue list by one to obtain the number of remaining subtasks.
It should be understood that, to prevent a subtask from being fetched or executed twice, a distributed lock implemented with a Redis + Lua script governs each member host's fetching of subtasks from the preset queue list. When any member host B fetches a subtask from the queue list, the subtask-fetching function is blocked on the other member hosts, that is, no other member host is allowed to fetch a subtask at that moment; this achieves data synchronization while greatly reducing the time overhead of locking. Once member host B has fetched a subtask from the preset queue list, the preset number in the list is decremented by one to obtain the number of remaining subtasks; for example, if the preset number is 62 and member host B fetches one subtask, the number of remaining subtasks is 61. After member host B has fetched a subtask, the fetching function is reopened to the other member hosts, which can then fetch subtasks from the preset queue list for processing until every subtask in the list has been fetched and executed.
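The atomic "fetch one subtask and learn how many remain" step can be sketched as follows. In a real deployment this would run as a single Redis Lua script (via EVAL), so that no two member hosts can pop the same subtask; the Lua body below is a hypothetical illustration, and `fetch_subtask` mimics it on an in-memory list:

```python
# Hypothetical Lua body: pop one subtask and report the remaining queue length
# in one atomic step on the Redis server.
FETCH_SCRIPT = """
local task = redis.call('LPOP', KEYS[1])
local remaining = redis.call('LLEN', KEYS[1])
return {task, remaining}
"""

def fetch_subtask(queue):
    """In-memory equivalent: pop one subtask, return it with the remaining count."""
    if not queue:
        return None, 0
    task = queue.pop(0)
    return task, len(queue)
```

Bundling the pop and the count into one server-side script is what makes the decrement of step S30 safe without a separate client-side lock round-trip.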
In the second embodiment, after step S30, the method further includes:
Step S40: when the number of remaining subtasks is zero, determine that the fetched subtask is the last subtask, monitor the processing progress of the last subtask, and, upon hearing that the last subtask has been processed, send a message that all subtasks have been processed to the cluster through the message middleware in a clustering consumption mode.
It should be noted that when any member host C in the cluster fetches a subtask from the preset queue list, the number of subtasks in the list is decremented by one to obtain the number of remaining subtasks. If the number of remaining subtasks is zero, the subtask that member host C fetched from the preset queue list is the last subtask, and when member host C finishes processing it, a message that all subtasks have been processed can be sent to the cluster through the message middleware in a clustering consumption mode.
In a specific implementation, to allow the hosts in the cluster to keep processing more tasks continuously and efficiently, when member host C in the cluster finishes processing the last subtask, it can send the message that all subtasks have been processed through the message middleware, so that each host in the cluster can go on to obtain the next task node and process its tasks. The message that all subtasks have been processed is consumed in a clustering consumption mode, so only one host will hear it.
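The completion signal can be sketched as below. Unlike the broadcast notification of step S20, clustering consumption means exactly one member host receives the message, which a point-to-point queue models; the names here are hypothetical stand-ins for the middleware:

```python
import queue

# Point-to-point channel: exactly one consumer receives each message,
# mirroring the clustering consumption mode described above.
completion_channel = queue.Queue()

def on_subtask_finished(remaining_subtasks):
    """Called after each subtask completes, with the remaining-subtask count."""
    if remaining_subtasks == 0:  # the last subtask has just been processed
        completion_channel.put("all-subtasks-processed")
```

Only the host that popped the last subtask ever sees a remaining count of zero, so the completion message is sent exactly once.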
In the second embodiment, after step S40, the method further includes:
upon hearing the message that all subtasks have been processed, judging whether all subtasks have been processed successfully, and if all subtasks have been processed successfully, obtaining the next task node and executing the next task.
It is understandable that member host C is the member host in the cluster that processed the last subtask; member host C sends, through the message middleware, one message stating that all subtasks have been processed. Since this message is consumed in a clustering consumption mode, only one member host, say member host D, will hear it; member host D may be any member host in the cluster.
In a specific implementation, when member host D hears the message that all subtasks have been processed, it judges whether every subtask was processed successfully. If all subtasks succeeded, execution proceeds automatically to the next task node; if not, fetching the next task node is suspended pending troubleshooting, and once the problem has been investigated and resolved, the task-processing mechanism is awakened again, the next task node is obtained, and the next task is executed. The data processed by the subtasks is usually monitored; if anomalous data is detected, an alarm is raised and errors are investigated manually.
In the second embodiment, when a subtask is fetched from the preset queue list, the number of subtasks in the list is decremented by one to obtain the number of remaining subtasks; when the number of remaining subtasks is zero, the fetched subtask is determined to be the last subtask, whose processing progress is monitored, and once it has been processed, a message that all subtasks have been processed is sent to the cluster through the message middleware in a clustering consumption mode. By counting the subtasks in the queue list, the execution progress of the subtasks can be tracked, and once all of them have been processed, the completion message is sent promptly, so that the member hosts in the cluster obtain the next task node in time and process the next task, improving the efficiency of large-scale batch data processing.
Referring to FIG. 4, FIG. 4 is a schematic flowchart of a third embodiment of the batch data execution method of this application; on the basis of the second embodiment shown in FIG. 3, the third embodiment of the method is proposed.
In the third embodiment, after step S20, the method further includes:
Step S201: listen in a broadcast consumption mode, and, upon hearing the task processing message, fetch a subtask from the preset queue list for processing.
It should be understood that the cluster includes multiple member hosts, all with identical functions; each can distribute tasks while also executing them, maximizing resource utilization. Each member host in the cluster listens in a broadcast consumption mode, so every host in the cluster receives the task processing message, then fetches a subtask from the Redis queue list and invokes the corresponding processing class; errors or anomalous data encountered during processing are logged.
Further, step S201 includes:
listening in a broadcast consumption mode;
upon hearing the task processing message, fetching a subtask from the preset queue list for processing; and
blocking the subtask-fetching function on every member host in the cluster other than the member host that is fetching the subtask.
It should be noted that every host in the cluster can execute tasks while also distributing them. To prevent a subtask from being fetched or executed twice, a distributed lock implemented with a Redis + Lua script governs each member host's fetching of subtasks from the preset queue list. When any member host B fetches a subtask from the queue list, the subtask-fetching function is blocked on every member host in the cluster other than the fetching host, that is, no other member host is allowed to fetch a subtask at that moment. After member host B has fetched one or more subtasks from the preset queue list, the fetching function is reopened to the other member hosts, which can then fetch subtasks from the list for processing until every subtask in the preset queue list has been fetched and executed. Implementing the distributed lock with a Redis + Lua script achieves data synchronization while greatly reducing the time overhead of locking.
In this embodiment, fetching subtasks from the preset queue list for processing when the task processing message is detected includes:
when the task processing message is detected, calculating the CPU occupancy rate, fetching multiple subtasks from the preset queue list according to the CPU occupancy rate, and processing the multiple subtasks concurrently with multiple threads.
It can be understood that the member hosts in the cluster can process subtasks of different partitions simultaneously, and each member host processes tasks concurrently with multiple threads, so processing is fast. The degree of concurrency is determined by the CPU occupancy rate; typically five threads run concurrently, which speeds up batch subtask processing. Each host in the cluster can compute its own CPU occupancy rate and use it to decide how many subtasks to fetch, fetching multiple subtasks without degrading processing speed, so that multiple threads process the fetched subtasks concurrently and the subtasks in the preset queue list are handled more efficiently.
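One way to turn the CPU occupancy rate into a fetch count is to scale the nominal five-thread concurrency by the spare CPU capacity. The formula below is an illustrative assumption; the patent only says the count is determined from the CPU occupancy rate, not how.

```python
def subtasks_to_fetch(cpu_usage: float, max_concurrent: int = 5) -> int:
    """Return how many subtasks to fetch given CPU occupancy in [0, 1].

    max_concurrent defaults to the "usually 5 threads" mentioned in the
    text; the scaling itself is a hypothetical policy.
    """
    if not 0.0 <= cpu_usage <= 1.0:
        raise ValueError("cpu_usage must be in [0, 1]")
    spare = 1.0 - cpu_usage          # fraction of CPU still available
    return max(1, round(max_concurrent * spare))  # always make progress
```

An idle host would fetch a full batch of five, while a heavily loaded host falls back to one subtask at a time, so fetching never degrades the host's processing speed.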
In this embodiment, the hosts listen in broadcast-consumption mode and, when the task processing message is detected, fetch subtasks from the preset queue list for processing. All member hosts in the cluster receive the task processing message, and each member host can distribute tasks while also executing them, maximizing resource utilization and improving subtask processing efficiency.
In addition, an embodiment of the present application further provides a storage medium; the computer-readable storage medium may be a non-volatile readable storage medium. Computer-readable instructions are stored on the storage medium, and when executed by a processor, the computer-readable instructions implement the steps of the data batch-running method described above.
For the method implemented when the computer-readable instructions are executed, reference may be made to the embodiments of the data batch-running method of the present application, which will not be repeated here.
In addition, referring to FIG. 5, an embodiment of the present application further provides a data batch-running apparatus, which includes:
a random partition module 10, configured to obtain batch to-be-processed data corresponding to the current task node and randomly partition the batch to-be-processed data to obtain a preset number of pieces of partition data;
a generating module 20, configured to generate subtasks for the corresponding partitions according to the preset number of pieces of partition data, store the preset number of subtasks as a queue in a preset queue list, and send a task processing message to each member host in the cluster through message middleware, so that each member host in the cluster, upon detecting the task processing message, fetches subtasks from the preset queue list for processing;
an obtaining module 30, configured to obtain sample abnormal data and corresponding sample abnormal points;
a training module 40, configured to train a convolutional neural network model according to the sample abnormal data and the corresponding sample abnormal points to obtain an abnormality recognition model;
a recognition module 50, configured to obtain the log records of executing each subtask and identify abnormal data in the log records through the abnormality recognition model; and
a locating module 60, configured to locate abnormal points according to the abnormal data and handle the abnormal points so that the subtasks are completed successfully.
In an embodiment, the data batch-running apparatus further includes:
a calculation module, configured to, when a subtask is fetched from the preset queue list, decrement the number of subtasks in the preset queue list by one to obtain the number of remaining subtasks.
In an embodiment, the data batch-running apparatus further includes:
a sending module, configured to, when the number of remaining subtasks is zero, determine that the fetched subtask is the last subtask, monitor the processing progress of the last subtask, and, upon detecting that the last subtask has been processed, send a message that all subtasks have been processed to the cluster through the message middleware in cluster-consumption mode.
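The bookkeeping split between the calculation module and the sending module can be sketched as follows. Names are illustrative stand-ins: in the real system the counter would live in Redis (e.g. a DECR per pop) and the completion message would go through the message middleware, neither of which is simulated here.

```python
import threading

remaining = 8                   # number of subtasks in the preset queue list
count_lock = threading.Lock()
completion_messages = []

def pop_subtask() -> bool:
    """Decrement the remaining count; True means this was the last subtask."""
    global remaining
    with count_lock:
        remaining -= 1
        return remaining == 0

def send_completion():
    # Stand-in for sending the all-subtasks-done message via middleware
    # in cluster-consumption mode (exactly one consumer handles it).
    completion_messages.append("all-subtasks-done")

for _ in range(8):
    if pop_subtask():
        send_completion()
```

Exactly one pop observes the count reaching zero, so exactly one completion message is sent, matching the "last subtask" determination in the text.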
In an embodiment, the obtaining module 30 is further configured to, upon detecting the message that all subtasks have been processed, judge whether all subtasks have been processed successfully, and if all subtasks have been processed successfully, obtain the next task node and execute the next task.
In an embodiment, the obtaining module 30 is further configured to listen in broadcast-consumption mode and, when the task processing message is detected, fetch subtasks from the preset queue list for processing.
In an embodiment, the obtaining module 30 is further configured to listen in broadcast-consumption mode; when the task processing message is detected, fetch subtasks from the preset queue list for processing; and block the subtask-fetching function of the member hosts in the cluster other than the member host that is fetching subtasks.
In an embodiment, the obtaining module 30 is further configured to, when the task processing message is detected, calculate the CPU occupancy rate, fetch multiple subtasks from the preset queue list according to the CPU occupancy rate, and process the multiple subtasks concurrently with multiple threads.
For other embodiments or specific implementations of the data batch-running apparatus of the present application, reference may be made to the foregoing method embodiments, which will not be repeated here.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a read-only memory image (ROM)/random access memory (RAM), a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present application and do not thereby limit the patent scope of the present application. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of the present application.

Claims (20)

  1. A data batch-running method, characterized in that the data batch-running method comprises the following steps:
    obtaining batch to-be-processed data corresponding to a current task node, and randomly partitioning the batch to-be-processed data to obtain a preset number of pieces of partition data;
    generating subtasks for the corresponding partitions according to the preset number of pieces of partition data, storing the preset number of subtasks as a queue in a preset queue list, and sending a task processing message to each member host in a cluster through message middleware, so that each member host in the cluster, upon detecting the task processing message, fetches subtasks from the preset queue list for processing;
    obtaining sample abnormal data and corresponding sample abnormal points;
    training a convolutional neural network model according to the sample abnormal data and the corresponding sample abnormal points to obtain an abnormality recognition model;
    obtaining log records of executing each subtask, and identifying abnormal data in the log records through the abnormality recognition model; and
    locating abnormal points according to the abnormal data, and handling the abnormal points so that the subtasks are completed successfully.
  2. The data batch-running method of claim 1, characterized in that, after the step of generating the subtasks for the corresponding partitions according to the preset number of pieces of partition data, storing the preset number of subtasks as a queue in the preset queue list, and sending the task processing message to each member host in the cluster through the message middleware so that each member host in the cluster, upon detecting the task processing message, fetches subtasks from the preset queue list for processing, the data batch-running method further comprises:
    when a subtask is fetched from the preset queue list, decrementing the number of subtasks in the preset queue list by one to obtain the number of remaining subtasks.
  3. The data batch-running method of claim 2, characterized in that, after the step of decrementing the number of subtasks in the preset queue list by one when a subtask is fetched from the preset queue list to obtain the number of remaining subtasks, the data batch-running method further comprises:
    when the number of remaining subtasks is zero, determining that the fetched subtask is the last subtask, monitoring the processing progress of the last subtask, and, upon detecting that the last subtask has been processed, sending a message that all subtasks have been processed to the cluster through the message middleware in cluster-consumption mode.
  4. The data batch-running method of claim 3, characterized in that, after the step of determining, when the number of remaining subtasks is zero, that the fetched subtask is the last subtask, monitoring the processing progress of the last subtask, and, upon detecting that the last subtask has been processed, sending the message that all subtasks have been processed to the cluster through the message middleware in cluster-consumption mode, the data batch-running method further comprises:
    upon detecting the message that all subtasks have been processed, judging whether all subtasks have been processed successfully, and if all subtasks have been processed successfully, obtaining the next task node and executing the next task.
  5. The data batch-running method of claim 1, characterized in that, after the step of generating the subtasks for the corresponding partitions according to the preset number of pieces of partition data, storing the preset number of subtasks as a queue in the preset queue list, and sending the task processing message to each member host in the cluster through the message middleware so that each member host in the cluster, upon detecting the task processing message, fetches subtasks from the preset queue list for processing, the data batch-running method further comprises:
    listening in broadcast-consumption mode, and, when the task processing message is detected, fetching subtasks from the preset queue list for processing.
  6. The data batch-running method of claim 5, characterized in that the listening in broadcast-consumption mode and, when the task processing message is detected, fetching subtasks from the preset queue list for processing comprises:
    listening in broadcast-consumption mode;
    when the task processing message is detected, fetching subtasks from the preset queue list for processing; and
    blocking the subtask-fetching function of the member hosts in the cluster other than the member host that is fetching subtasks.
  7. The data batch-running method of claim 6, characterized in that the fetching subtasks from the preset queue list for processing when the task processing message is detected comprises:
    when the task processing message is detected, calculating a CPU occupancy rate, fetching multiple subtasks from the preset queue list according to the CPU occupancy rate, and processing the multiple subtasks concurrently with multiple threads.
  8. A member host in a cluster, characterized in that the member host in the cluster comprises: a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the computer-readable instructions, when executed by the processor, implement the following steps:
    obtaining batch to-be-processed data corresponding to a current task node, and randomly partitioning the batch to-be-processed data to obtain a preset number of pieces of partition data;
    generating subtasks for the corresponding partitions according to the preset number of pieces of partition data, storing the preset number of subtasks as a queue in a preset queue list, and sending a task processing message to each member host in the cluster through message middleware, so that each member host in the cluster, upon detecting the task processing message, fetches subtasks from the preset queue list for processing;
    obtaining sample abnormal data and corresponding sample abnormal points;
    training a convolutional neural network model according to the sample abnormal data and the corresponding sample abnormal points to obtain an abnormality recognition model;
    obtaining log records of executing each subtask, and identifying abnormal data in the log records through the abnormality recognition model; and
    locating abnormal points according to the abnormal data, and handling the abnormal points so that the subtasks are completed successfully.
  9. The member host in a cluster of claim 8, characterized in that, after the step of generating the subtasks for the corresponding partitions according to the preset number of pieces of partition data, storing the preset number of subtasks as a queue in the preset queue list, and sending the task processing message to each member host in the cluster through the message middleware so that each member host in the cluster, upon detecting the task processing message, fetches subtasks from the preset queue list for processing, the steps further comprise:
    when a subtask is fetched from the preset queue list, decrementing the number of subtasks in the preset queue list by one to obtain the number of remaining subtasks.
  10. The member host in a cluster of claim 9, characterized in that, after the step of decrementing the number of subtasks in the preset queue list by one when a subtask is fetched from the preset queue list to obtain the number of remaining subtasks, the steps further comprise:
    when the number of remaining subtasks is zero, determining that the fetched subtask is the last subtask, monitoring the processing progress of the last subtask, and, upon detecting that the last subtask has been processed, sending a message that all subtasks have been processed to the cluster through the message middleware in cluster-consumption mode.
  11. The member host in a cluster of claim 10, characterized in that, after the step of determining, when the number of remaining subtasks is zero, that the fetched subtask is the last subtask, monitoring the processing progress of the last subtask, and, upon detecting that the last subtask has been processed, sending the message that all subtasks have been processed to the cluster through the message middleware in cluster-consumption mode, the steps further comprise:
    upon detecting the message that all subtasks have been processed, judging whether all subtasks have been processed successfully, and if all subtasks have been processed successfully, obtaining the next task node and executing the next task.
  12. The member host in a cluster of claim 8, characterized in that, after the step of generating the subtasks for the corresponding partitions according to the preset number of pieces of partition data, storing the preset number of subtasks as a queue in the preset queue list, and sending the task processing message to each member host in the cluster through the message middleware so that each member host in the cluster, upon detecting the task processing message, fetches subtasks from the preset queue list for processing, the steps further comprise:
    listening in broadcast-consumption mode, and, when the task processing message is detected, fetching subtasks from the preset queue list for processing.
  13. A data batch-running apparatus, characterized in that the data batch-running apparatus comprises:
    a random partition module, configured to obtain batch to-be-processed data corresponding to a current task node and randomly partition the batch to-be-processed data to obtain a preset number of pieces of partition data;
    a generating module, configured to generate subtasks for the corresponding partitions according to the preset number of pieces of partition data, store the preset number of subtasks as a queue in a preset queue list, and send a task processing message to each member host in a cluster through message middleware, so that each member host in the cluster, upon detecting the task processing message, fetches subtasks from the preset queue list for processing;
    an obtaining module, configured to obtain sample abnormal data and corresponding sample abnormal points;
    a training module, configured to train a convolutional neural network model according to the sample abnormal data and the corresponding sample abnormal points to obtain an abnormality recognition model;
    a recognition module, configured to obtain log records of executing each subtask and identify abnormal data in the log records through the abnormality recognition model; and
    a locating module, configured to locate abnormal points according to the abnormal data and handle the abnormal points so that the subtasks are completed successfully.
  14. The data batch-running apparatus of claim 13, characterized in that the data batch-running apparatus further comprises:
    a calculation module, configured to, when a subtask is fetched from the preset queue list, decrement the number of subtasks in the preset queue list by one to obtain the number of remaining subtasks.
  15. The data batch-running apparatus of claim 14, characterized in that the data batch-running apparatus further comprises:
    a sending module, configured to, when the number of remaining subtasks is zero, determine that the fetched subtask is the last subtask, monitor the processing progress of the last subtask, and, upon detecting that the last subtask has been processed, send a message that all subtasks have been processed to the cluster through the message middleware in cluster-consumption mode.
  16. The data batch-running apparatus of claim 15, characterized in that the obtaining module is further configured to, upon detecting the message that all subtasks have been processed, judge whether all subtasks have been processed successfully, and if all subtasks have been processed successfully, obtain the next task node and execute the next task.
  17. The data batch-running apparatus of claim 13, characterized in that the obtaining module is further configured to listen in broadcast-consumption mode and, when the task processing message is detected, fetch subtasks from the preset queue list for processing.
  18. A storage medium, characterized in that computer-readable instructions are stored on the storage medium, and the computer-readable instructions, when executed by a processor, implement the following steps:
    obtaining batch to-be-processed data corresponding to a current task node, and randomly partitioning the batch to-be-processed data to obtain a preset number of pieces of partition data;
    generating subtasks for the corresponding partitions according to the preset number of pieces of partition data, storing the preset number of subtasks as a queue in a preset queue list, and sending a task processing message to each member host in a cluster through message middleware, so that each member host in the cluster, upon detecting the task processing message, fetches subtasks from the preset queue list for processing;
    obtaining sample abnormal data and corresponding sample abnormal points;
    training a convolutional neural network model according to the sample abnormal data and the corresponding sample abnormal points to obtain an abnormality recognition model;
    obtaining log records of executing each subtask, and identifying abnormal data in the log records through the abnormality recognition model; and
    locating abnormal points according to the abnormal data, and handling the abnormal points so that the subtasks are completed successfully.
  19. The storage medium of claim 18, characterized in that, after the step of generating the subtasks for the corresponding partitions according to the preset number of pieces of partition data, storing the preset number of subtasks as a queue in the preset queue list, and sending the task processing message to each member host in the cluster through the message middleware so that each member host in the cluster, upon detecting the task processing message, fetches subtasks from the preset queue list for processing, the steps further comprise:
    when a subtask is fetched from the preset queue list, decrementing the number of subtasks in the preset queue list by one to obtain the number of remaining subtasks.
  20. The storage medium of claim 19, characterized in that, after the step of decrementing the number of subtasks in the preset queue list by one when a subtask is fetched from the preset queue list to obtain the number of remaining subtasks, the steps further comprise:
    when the number of remaining subtasks is zero, determining that the fetched subtask is the last subtask, monitoring the processing progress of the last subtask, and, upon detecting that the last subtask has been processed, sending a message that all subtasks have been processed to the cluster through the message middleware in cluster-consumption mode.
PCT/CN2019/121210 2019-06-20 2019-11-27 Batch data execution method, device, storage medium, and member host in cluster WO2020253116A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910553729.4 2019-06-20
CN201910553729.4A CN110362401A (en) 2019-06-20 Batch data execution method, device, storage medium, and member host in cluster

Publications (1)

Publication Number Publication Date
WO2020253116A1 true WO2020253116A1 (en) 2020-12-24

Family

ID=68217029

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/121210 WO2020253116A1 (en) 2019-06-20 2019-11-27 Batch data execution method, device, storage medium, and member host in cluster

Country Status (2)

Country Link
CN (1) CN110362401A (en)
WO (1) WO2020253116A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114168275A (en) * 2021-10-28 2022-03-11 厦门国际银行股份有限公司 Task scheduling method, system, terminal device and storage medium

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110362401A (en) * 2019-06-20 2019-10-22 深圳壹账通智能科技有限公司 Data run the member host in batch method, apparatus, storage medium and cluster
CN113568761B (en) * 2020-04-28 2023-06-27 中国联合网络通信集团有限公司 Data processing method, device, equipment and storage medium
CN111679920A (en) * 2020-06-08 2020-09-18 中国银行股份有限公司 Method and device for processing batch equity data
CN112148505A (en) * 2020-09-18 2020-12-29 京东数字科技控股股份有限公司 Data batching system, method, electronic device and storage medium
CN113537937A (en) * 2021-07-16 2021-10-22 重庆富民银行股份有限公司 Task arrangement method, device and equipment based on topological sorting and storage medium
CN113485812B (en) * 2021-07-23 2023-12-12 重庆富民银行股份有限公司 Partition parallel processing method and system based on large-data-volume task
CN114240109A (en) * 2021-12-06 2022-03-25 中电金信软件有限公司 Method, device and system for cross-region processing batch running task
CN116501499B (en) * 2023-05-17 2023-09-19 建信金融科技有限责任公司 Data batch running method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8245081B2 (en) * 2010-02-10 2012-08-14 Vmware, Inc. Error reporting through observation correlation
CN107291911A (en) * 2017-06-26 2017-10-24 北京奇艺世纪科技有限公司 A kind of method for detecting abnormality and device
CN109144731A (en) * 2018-08-31 2019-01-04 中国平安人寿保险股份有限公司 Data processing method, device, computer equipment and storage medium
CN109461068A (en) * 2018-09-13 2019-03-12 深圳壹账通智能科技有限公司 Judgment method, device, equipment and the computer readable storage medium of fraud
CN110362401A (en) * 2019-06-20 2019-10-22 深圳壹账通智能科技有限公司 Data run the member host in batch method, apparatus, storage medium and cluster

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8190744B2 (en) * 2009-05-28 2012-05-29 Palo Alto Research Center Incorporated Data center batch job quality of service control
US9736034B2 (en) * 2012-09-19 2017-08-15 Oracle International Corporation System and method for small batching processing of usage requests
CN104092794B (en) * 2014-07-25 2017-08-11 中国工商银行股份有限公司 Batch process handling method and system
JP6532385B2 (en) * 2015-11-02 2019-06-19 キヤノン株式会社 INFORMATION PROCESSING SYSTEM, CONTROL METHOD THEREOF, AND PROGRAM
US10417038B2 (en) * 2016-02-18 2019-09-17 Red Hat, Inc. Batched commit in distributed transactions
CN108733477B (en) * 2017-04-20 2021-04-23 中国移动通信集团湖北有限公司 Method, device and equipment for data clustering processing
CN108255619B (en) * 2017-12-28 2019-09-17 新华三大数据技术有限公司 Data processing method and device
CN108564167B (en) * 2018-04-09 2020-07-31 杭州乾圆科技有限公司 Method for identifying abnormal data in data set
CN108985632A (en) * 2018-07-16 2018-12-11 国网上海市电力公司 Electricity consumption data abnormality detection model based on the isolation forest algorithm
CN109672627A (en) * 2018-09-26 2019-04-23 深圳壹账通智能科技有限公司 Business processing method, platform, equipment and storage medium based on cluster server
CN109298225B (en) * 2018-09-29 2020-10-09 国网四川省电力公司电力科学研究院 Automatic identification model system and method for abnormal state of voltage measurement data
CN109558600B (en) * 2018-11-14 2023-06-30 抖音视界有限公司 Translation processing method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114168275A (en) * 2021-10-28 2022-03-11 厦门国际银行股份有限公司 Task scheduling method, system, terminal device and storage medium
CN114168275B (en) * 2021-10-28 2022-10-18 厦门国际银行股份有限公司 Task scheduling method, system, terminal device and storage medium

Also Published As

Publication number Publication date
CN110362401A (en) 2019-10-22

Similar Documents

Publication Publication Date Title
WO2020253116A1 (en) Batch data execution method, device, storage medium, and member host in cluster
WO2018036167A1 (en) Test task executor assignment method, device, server and storage medium
WO2020233077A1 (en) System service monitoring method, device, and apparatus, and storage medium
WO2018076800A1 (en) Method and system for asynchronously updating data
WO2018155963A1 (en) Method of accelerating execution of machine learning based application tasks in a computing device
WO2020048047A1 (en) System fault warning method, apparatus, and device, and storage medium
WO2018014582A1 (en) Insurance policy data processing method, device, server and storage medium
WO2019037196A1 (en) Method and device for task assignment, and computer-readable storage medium
WO2020015064A1 (en) System fault processing method, apparatus, device and storage medium
WO2020119115A1 (en) Data verification method, device, apparatus, and storage medium
WO2016112558A1 (en) Question matching method and system in intelligent interaction system
WO2020224249A1 (en) Blockchain-based transaction processing method, device and apparatus, and storage medium
WO2018036168A1 (en) Method and device for executing data processing task, execution server, and storage medium
WO2020147264A1 (en) Method, apparatus and device for monitoring multi-system log data, and readable storage medium
WO2020119369A1 (en) Intelligent it operation and maintenance fault positioning method, apparatus and device, and readable storage medium
WO2020224251A1 (en) Block chain transaction processing method, device, apparatus and storage medium
WO2020019405A1 (en) Database monitoring method, device and apparatus, and computer storage medium
WO2020224250A1 (en) Method, apparatus, and device for smart contract triggering, and storage medium
WO2021034024A1 (en) Electronic apparatus and method for controlling thereof
WO2020098075A1 (en) Financial data processing method, apparatus and device, and storage medium
WO2020087981A1 (en) Method and apparatus for generating risk control audit model, device and readable storage medium
WO2018035929A1 (en) Method and apparatus for processing verification code
WO2020233073A1 (en) Blockchain environment test method, device and apparatus, and storage medium
WO2020073494A1 (en) Webpage backdoor detecting method, device, storage medium and apparatus
WO2016179880A1 (en) Network hospital platform, expert platform and emergency expert consultation request method based on expert platform

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19933258

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19933258

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 29/03/2022)