WO2020253116A1 - Batch data execution method, device, storage medium, and member host in cluster - Google Patents

Batch data execution method, device, storage medium, and member host in cluster Download PDF

Info

Publication number
WO2020253116A1
WO2020253116A1 (PCT/CN2019/121210)
Authority
WO
WIPO (PCT)
Prior art keywords
subtasks
data
cluster
subtask
preset
Prior art date
Application number
PCT/CN2019/121210
Other languages
French (fr)
Chinese (zh)
Inventor
符修亮
叶松
梁群峰
吕林澧
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司 filed Critical 深圳壹账通智能科技有限公司
Publication of WO2020253116A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5044Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/546Message passing systems or structures, e.g. queues

Definitions

  • This application relates to the technical field of big data, and in particular to a data batch method, device, storage medium, and member hosts in a cluster.
  • At present, when large-scale data processing is involved, the batch-running application is deployed separately; when no batch is running, the idle application wastes resources. Large-scale tasks are executed single-threaded, which slows task execution. Data partitioning for large-scale tasks relies on Oracle's hash partitioning: once a table in Oracle reaches hundreds of millions of rows, or a single table reaches 2 GB in size, query efficiency drops significantly, and the table must be divided along the row dimension to avoid an oversized single table. For data with little regularity, or whose value range is hard to determine, partitioning is forced via hashing, and the number of partitions must be set to a power of two, which limits scalability. Therefore, how to improve the efficiency of large-scale data processing and maximize the use of resources is a technical problem to be solved urgently.
  • the main purpose of this application is to provide a data running batch method, device, storage medium, and member hosts in a cluster, which aims to solve the technical problems of low efficiency of large-batch data processing and low resource utilization in the prior art.
  • this application provides a data running batch method, which includes the following steps:
  • obtain sample abnormal data and corresponding sample abnormal points; train the convolutional neural network model according to the sample abnormal data and the corresponding sample abnormal points to obtain an abnormality recognition model; obtain the log records of executing each subtask, and identify abnormal data in the log records through the abnormality recognition model; locate the abnormal point according to the abnormal data, and process the abnormal point to complete the subtask successfully.
  • after the subtasks corresponding to the partitions are generated according to the preset number of partition data, the preset number of subtasks are stored in a preset queue list in the form of a queue, and a task processing message is sent through the message middleware to each member host in the cluster, so that each member host in the cluster, upon listening to the task processing message, obtains subtasks from the preset queue list for processing, the data batch-running method further includes:
  • when a subtask is obtained from the preset queue list, reducing the number of subtasks in the preset queue list by one to obtain the number of remaining subtasks.
  • after the number of remaining subtasks is obtained, the data batch-running method further includes:
  • when the number of remaining subtasks is zero, determining that the acquired subtask is the last subtask, monitoring the processing progress of the last subtask, and, when the processing of the last subtask is complete, sending the message that all subtasks are processed to the cluster through the message middleware in cluster consumption mode.
  • the data batching method further includes:
  • after the subtasks corresponding to the partitions are generated according to the preset number of partition data, the preset number of subtasks are stored in a preset queue list in the form of a queue, and a task processing message is sent through the message middleware to each member host in the cluster, so that each member host in the cluster, upon listening to the task processing message, obtains subtasks from the preset queue list for processing, the data batch-running method further includes:
  • monitoring in broadcast consumption mode, and, when the task processing message is heard, obtaining subtasks from the preset queue list for processing.
  • the monitoring in broadcast consumption mode and, when the task processing message is heard, obtaining a subtask from the preset queue list for processing includes:
  • obtaining a subtask from the preset queue list for processing, which includes:
  • calculating the CPU occupancy rate, obtaining multiple subtasks from the preset queue list according to the CPU occupancy rate, and using multiple threads to process the multiple subtasks concurrently.
  • this application also proposes a member host in a cluster.
  • the member host in the cluster includes a memory, a processor, and computer-readable instructions stored on the memory and runnable on the processor, the computer-readable instructions being configured to implement the steps of the data batch-running method described above.
  • this application also proposes a storage medium on which computer-readable instructions are stored; when the computer-readable instructions are executed by a processor, they implement the steps of the data batch-running method described above.
  • this application also proposes a data batch running device, which includes:
  • the random partition module is used to obtain the batch of to-be-processed data corresponding to the current task node, and randomly partition the batch of to-be-processed data to obtain a preset number of partitioned data;
  • the generating module is configured to generate subtasks corresponding to the partitions according to the preset number of partition data, store the preset number of subtasks as a queue in a preset queue list, and send a task processing message through the message middleware to each member host in the cluster, so that each member host in the cluster, when listening to the task processing message, obtains a subtask from the preset queue list for processing;
  • the obtaining module is used to obtain sample abnormal data and corresponding sample abnormal points;
  • the training module is used to train the convolutional neural network model according to the sample abnormal data and corresponding sample abnormal points to obtain an abnormal recognition model
  • the identification module is used to obtain log records of executing each subtask, and identify abnormal data in the log records through the abnormal recognition model;
  • the positioning module is used to locate abnormal points according to the abnormal data, and process the abnormal points to successfully complete the subtasks.
  • In this application, the batch of to-be-processed data is randomly partitioned to obtain a preset number of partitioned data; the data partitioning does not depend on the database, so the number of subtasks can be expanded arbitrarily, increasing the concurrent processing speed of big data.
  • Subtasks corresponding to the partitions are generated according to the preset number of partition data, the preset number of subtasks are stored as a queue in the preset queue list, and a task processing message is sent through the message middleware to each member host in the cluster, so that each member host in the cluster, when listening to the task processing message, obtains subtasks from the preset queue list for processing.
  • Large quantities of data are processed through the cluster based on big data; each member host can both distribute tasks and execute tasks, maximizing the use of resources.
  • FIG. 1 is a schematic structural diagram of member hosts in a cluster of hardware operating environments involved in a solution of an embodiment of the present application
  • FIG. 2 is a schematic flowchart of the first embodiment of the application data running method
  • FIG. 3 is a schematic flowchart of a second embodiment of the application data running method
  • FIG. 4 is a schematic flowchart of a third embodiment of the application data running method
  • Fig. 5 is a structural block diagram of the first embodiment of the data running and batching device of this application.
  • FIG. 1 is a schematic diagram of a member host structure in a cluster of a hardware operating environment involved in a solution of an embodiment of the application.
  • the member hosts in the cluster may include a processor 1001, such as a central processing unit (Central Processing Unit, CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005.
  • the communication bus 1002 is used to implement connection and communication between these components.
  • the user interface 1003 may include a display screen (Display), and the optional user interface 1003 may also include a standard wired interface and a wireless interface.
  • the wired interface of the user interface 1003 may be a USB interface in this application.
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wireless-Fidelity (Wi-Fi) interface).
  • the memory 1005 may be a high-speed Random Access Memory (RAM), or a stable Non-Volatile Memory (NVM) such as disk storage.
  • the memory 1005 may also be a storage device independent of the foregoing processor 1001.
  • The structure shown in FIG. 1 does not constitute a limitation on the member hosts in the cluster, which may include more or fewer components than shown, combine some components, or use a different arrangement of components.
  • a memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and computer readable instructions.
  • In the member hosts in the cluster shown in FIG. 1, the network interface 1004 is mainly used to connect to a back-end server and exchange data with it; the user interface 1003 is mainly used to connect to user equipment; and the member host calls, through the processor 1001, the computer-readable instructions stored in the memory 1005 and executes the data batch-running method provided in the embodiments of the present application.
  • FIG. 2 is a schematic flow chart of the first embodiment of the data running method of this application, and the first embodiment of the data running method of this application is proposed.
  • the data batch method includes the following steps:
  • Step S10 Obtain the batch of to-be-processed data corresponding to the current task node, and randomly partition the batch of to-be-processed data to obtain a preset number of partitioned data.
  • the execution body of this embodiment is a member host in the cluster, where the member host may be an electronic device such as a personal computer or a server.
  • the cluster includes multiple member hosts, and all member hosts have the same function, and can distribute tasks and perform tasks at the same time, so as to maximize the use of resources.
  • the execution subject of this embodiment is any member host in the cluster, and the batch running task is started periodically through a preset open source project.
  • the preset open source project is quartz.
  • the current task node is obtained, and the batch of to-be-processed data corresponding to the current task node is randomly partitioned, usually into 62 partitions labeled a–z, A–Z, and 0–9; that is, the preset number is 62.
  • the partition can be expanded and divided into a larger number of partitions.
  • the data partition does not depend on the database, and the number of subtasks can be expanded arbitrarily to increase the concurrent processing speed of data.
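  • The random partitioning described above can be sketched as follows; this is a minimal Python illustration, where the record set and the assignment logic are assumptions based only on the 62 labels (a–z, A–Z, 0–9) the text mentions:

```python
import random
import string

# The 62 partition labels the text describes: a-z, A-Z, 0-9.
PARTITION_LABELS = list(string.ascii_lowercase + string.ascii_uppercase + string.digits)

def random_partition(records, labels=PARTITION_LABELS):
    """Randomly assign each record to one of the preset partitions.

    Because assignment is random rather than a hash of a database column,
    the scheme does not depend on the database, and `labels` can be
    extended freely (unlike a hash partition count fixed to a power of 2).
    """
    partitions = {label: [] for label in labels}
    for record in records:
        partitions[random.choice(labels)].append(record)
    return partitions

partitions = random_partition(range(10_000))
assert len(partitions) == 62
assert sum(len(p) for p in partitions.values()) == 10_000
```

  • Extending the partition count is then just a matter of passing a longer `labels` list, which matches the claim that the number of subtasks can be expanded arbitrarily.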
  • Step S20 Generate subtasks corresponding to the partitions according to the preset number of partition data, store the preset number of subtasks in the preset queue list in the form of a queue, and send a task processing message to each member host in the cluster, so that each member host in the cluster, when listening to the task processing message, obtains a subtask from the preset queue list for processing.
  • a preset number of subtasks corresponding to the partitions are generated. The preset number is 62, that is, 62 subtasks, with task labels a–z, A–Z, and 0–9. The number of subtasks can also be expanded into a larger number; the 62 subtasks are placed, as a queue, into the preset queue list, which is a Redis queue list.
  • the cluster listens in broadcast consumption mode, so every member host in the cluster receives the task processing message, obtains a subtask from the Redis queue list, and calls the processing class corresponding to the obtained subtask to perform processing. The processing class is usually the algorithm or rule the subtask executes, and the subtask is executed according to the processing class to obtain result data. For example, if the subtask is calculating the commission for order A, and order A is of the consumer-loan type, then the processing class is the commission calculation formula for consumer-loan orders; the commission for order A is calculated by that formula, and the subtask is thereby processed.
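  • The queue-plus-processing-class arrangement of Step S20 can be sketched as below. This is a stand-in, not the application's actual API: a plain in-memory deque plays the role of the Redis queue list, a dict of callables plays the role of the processing classes, and the consumer-loan commission rate is purely illustrative:

```python
from collections import deque

# Stand-in for the Redis queue list: one subtask per partition label.
# The text uses 62 labels; three are shown here for brevity.
subtask_queue = deque(f"subtask-{label}" for label in "abc")

# Stand-in for the "processing classes": the rule each kind of subtask executes.
# The 1.5% consumer-loan commission formula below is an invented example.
PROCESSING_CLASSES = {
    "consumer_loan": lambda order: round(order["amount"] * 0.015, 2),
}

def process_subtask(order):
    """Look up the processing class by order type and execute the subtask with it."""
    handler = PROCESSING_CLASSES[order["type"]]
    return handler(order)

commission = process_subtask({"type": "consumer_loan", "amount": 10_000})  # 150.0
```

  • Dispatching on a type key keeps the member hosts generic: any host that hears the task processing message can execute any subtask, because the rule lives in the processing-class table rather than in the host.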
  • the method further includes:
  • Step S50 Obtain sample abnormal data and corresponding sample abnormal points.
  • Step S60 Training the convolutional neural network model according to the sample abnormal data and corresponding sample abnormal points to obtain an abnormal recognition model.
  • Step S70 Obtain log records of executing each subtask, and identify abnormal data in the log records through the abnormal recognition model.
  • Step S80 Locate an abnormal point according to the abnormal data, and process the abnormal point to successfully complete the subtask.
  • when each subtask is executed, processing errors or abnormal data are recorded in the log. A large number of sample abnormal data and corresponding sample abnormal points are obtained in advance, and the convolutional neural network model learns the sample abnormal data and the corresponding abnormal points to obtain an abnormality recognition model. Errors or abnormal data in the log records are identified through the abnormality recognition model, so that the abnormal point can be located and handled and the subtask completed successfully.
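  • The shape of the S50–S80 loop can be outlined as below. The application specifies a convolutional neural network; here a trivial token matcher stands in for the trained model purely to show the pipeline (train on samples, scan logs, locate the abnormal point, reprocess), and every name, log format, and rule is an illustrative assumption:

```python
def train_recognizer(samples):
    """Stand-in for CNN training: collect tokens that mark abnormal log lines.

    `samples` is a list of (log_line, abnormal_point) pairs; a real model
    would learn features from them instead of memorizing leading tokens.
    """
    markers = {line.split()[0] for line, _ in samples}
    def recognize(log_lines):
        return [line for line in log_lines if line.split()[0] in markers]
    return recognize

# S50/S60: obtain samples and "train" the recognizer.
recognize = train_recognizer([("ERROR timeout on partition q", "partition q")])

# S70: scan the execution log for abnormal data.
log = ["INFO subtask a done", "ERROR timeout on partition q"]
abnormal = recognize(log)

# S80: locate the abnormal point (here: the partition named last in the line)
# so the corresponding subtask can be reprocessed to completion.
def locate_and_handle(lines):
    return [line.rsplit(" ", 1)[-1] for line in lines]

retry_partitions = locate_and_handle(abnormal)  # partitions to reprocess
```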
  • the batch of to-be-processed data is randomly partitioned to obtain a preset number of partitioned data.
  • the data partitioning does not depend on the database, and the number of subtasks can be arbitrarily expanded.
  • FIG. 3 is a schematic flowchart of a second embodiment of the data running method of this application. Based on the first embodiment shown in FIG. 2 above, a second embodiment of the data running method of this application is proposed.
  • the method further includes:
  • Step S30 When a subtask is obtained from the preset queue list, the number of subtasks in the preset queue list is reduced by one to obtain the number of remaining subtasks.
  • A redis+lua script implements a distributed lock for the member hosts obtaining subtasks from the preset queue list: when any member host B is acquiring a subtask, the other member hosts are blocked from acquiring subtasks, that is, no other member host may obtain a subtask at that moment, which achieves data synchronization while greatly reducing the time overhead of locking. Once member host B obtains a subtask from the preset queue list, the preset number in the list is reduced by one to obtain the number of remaining subtasks; for example, if the preset number is 62 and member host B obtains one subtask, the number of remaining subtasks is 61. After member host B has obtained its subtask, subtask acquisition is reopened to the other member hosts, which can then obtain subtasks from the preset queue list for processing, until all subtasks in the preset queue list have been acquired and executed.
  • the method further includes:
  • Step S40 When the number of remaining subtasks is zero, determine that the acquired subtask is the last subtask, monitor the processing progress of the last subtask, and, when its processing completes, send the message that all subtasks are processed to the cluster through the message middleware in cluster consumption mode.
  • Each time a subtask is obtained, the number of subtasks in the preset queue list is reduced by one to obtain the number of remaining subtasks. If the number of remaining subtasks is zero, the subtask obtained by member host C from the preset queue list is the last subtask; when member host C finishes processing it, the message that all subtasks are processed can be sent to the cluster through the message middleware in cluster consumption mode.
  • To enable each host in the cluster to keep processing tasks continuously and efficiently, when member host C has processed the last subtask, it sends the message that all subtasks are processed to the cluster through the message middleware, so that each host in the cluster can obtain the next task node and process its tasks. Because the message that all subtasks are processed uses the cluster consumption mode, only one host will hear it.
  • the method further includes:
  • member host C is the member host that processed the last subtask in the cluster, and member host C sends the message that all subtasks are processed through the message middleware. Because the consumption mode of this message is cluster consumption, only one member host, say member host D (any member host in the cluster), hears the message that all subtasks are processed.
  • When member host D hears the message, it judges whether all the subtasks were processed successfully. If so, it automatically enters the next task node for execution; if not, the next task node is suspended and the problem is investigated. Once the problem has been found and handled, the task processing mechanism is awakened again, the next task node is acquired, and the next task is executed. Usually, the subtask processing data is monitored; if abnormal data is detected, an alarm is issued and manual intervention is performed to troubleshoot the error.
  • In this embodiment, each time a subtask is obtained, the number of subtasks in the preset queue list is reduced by one to obtain the number of remaining subtasks; when the number of remaining subtasks is zero, the acquired subtask is determined to be the last subtask, and its processing progress is monitored; when its processing completes, the message that all subtasks are processed is sent to the cluster through the message middleware in cluster consumption mode. By counting the subtasks in the queue list, the execution progress of the subtasks can be tracked, and when all subtasks are processed a completion message is sent promptly, so that the member hosts in the cluster can obtain the next task node in time and process the next task, improving the efficiency of large-scale data processing.
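  • The two consumption modes this flow relies on — broadcast for the "task processing" message, cluster for the "all subtasks processed" message — differ only in who receives a delivery. A toy sketch of the distinction (the `Middleware` class is invented for illustration; real middleware such as RocketMQ exposes the same choice, e.g. as broadcasting vs. clustering message models):

```python
import itertools

class Middleware:
    """Toy message middleware distinguishing broadcast vs cluster consumption."""

    def __init__(self, hosts):
        self.hosts = hosts
        self._rr = itertools.cycle(range(len(hosts)))  # round-robin for cluster mode

    def broadcast(self, msg):
        # Broadcast consumption: every member host receives the message.
        return [host + " got " + msg for host in self.hosts]

    def cluster_send(self, msg):
        # Cluster consumption: exactly one member host receives the message.
        return self.hosts[next(self._rr)] + " got " + msg

mw = Middleware(["hostA", "hostB", "hostC"])
everyone = mw.broadcast("task processing")            # all three hosts react
only_one = mw.cluster_send("all subtasks processed")  # a single host advances the flow
```

  • Sending the completion message in cluster mode is what guarantees that only one host (member host D in the text) evaluates whether to enter the next task node, avoiding duplicate transitions.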
  • FIG. 4 is a schematic flowchart of a third embodiment of the data running method of this application. Based on the second embodiment shown in FIG. 3, a third embodiment of the data running method of this application is proposed.
  • the method further includes:
  • Step S201 Monitoring is performed in a broadcast consumption mode, and when the task processing message is monitored, subtasks are obtained from the preset queue list for processing.
  • the cluster includes multiple member hosts, and all member hosts have the same function. They can distribute tasks and perform tasks at the same time, so as to maximize the use of resources.
  • Each member host in the cluster listens in broadcast consumption mode, so all hosts in the cluster receive the task processing message, obtain subtasks from the Redis queue list, call the corresponding processing class for processing, and log any processing errors or abnormal data.
  • step S201 includes:
  • All hosts in the cluster can distribute tasks and execute tasks at the same time.
  • A redis+lua script implements a distributed lock for the member hosts obtaining subtasks from the preset queue list: while one member host is obtaining a subtask, the other member hosts in the cluster are blocked from doing so. Once it has obtained its subtask, subtask acquisition is reopened to the other member hosts, which can then obtain subtasks from the preset queue list for processing, until all subtasks in the preset queue list have been acquired and executed.
  • Implementing the distributed lock with a redis+lua script greatly reduces the time overhead caused by locking while achieving data synchronization.
  • obtaining a subtask from the preset queue list for processing includes:
  • the CPU occupancy rate is calculated, multiple subtasks are obtained from the preset queue list according to the CPU occupancy rate, and multiple threads are used to concurrently process multiple subtasks.
  • each member host in the cluster can process subtasks of different partitions at the same time, and each member host uses multi-threaded concurrent processing, so the processing speed is fast. The number of concurrently processed tasks is determined according to the CPU occupancy rate; usually 5 threads run concurrently to improve the processing speed of the batch subtasks.
  • Each host in the cluster calculates its own CPU occupancy rate, determines the number of subtasks to obtain according to that rate, and obtains multiple subtasks without affecting processing speed, so that multiple threads can concurrently process the acquired subtasks and improve the processing efficiency of the subtasks in the preset queue list.
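  • The CPU-occupancy heuristic above can be sketched like this. The mapping from occupancy to subtask count is an assumption: the text only says the count is derived from the occupancy rate, with about 5 concurrent threads as the usual figure, so a simple "scale the usual 5 down by the busy fraction" rule is used here:

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

def subtask_budget(cpu_occupancy, usual=5):
    """Scale the usual 5 concurrent subtasks down as the CPU gets busier.

    cpu_occupancy is the host's current load in [0.0, 1.0]; the scaling
    rule itself is an illustrative assumption, not taken from the text.
    """
    free = max(0.0, 1.0 - cpu_occupancy)
    return max(1, round(usual * free))

def take_and_run(queue_list, cpu_occupancy):
    """Take as many subtasks as the budget allows and process them concurrently."""
    n = subtask_budget(cpu_occupancy)
    batch = [queue_list.popleft() for _ in range(min(n, len(queue_list)))]
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(lambda t: t + ":done", batch))

queue_list = deque(f"subtask-{i}" for i in range(10))
results = take_and_run(queue_list, cpu_occupancy=0.2)  # lightly loaded host takes 4 subtasks
```

  • Tying the batch size to the occupancy rate keeps a busy host from over-committing while still letting an idle host drain the queue quickly.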
  • In this embodiment, monitoring is performed through broadcast consumption, and subtasks are obtained from the preset queue list for processing. All member hosts in the cluster receive the task processing message, and each member host in the cluster can distribute tasks at the same time, so as to maximize the use of resources and improve the efficiency of subtask processing.
  • the embodiment of the present application also proposes a storage medium, and the computer-readable storage medium may be a non-volatile readable storage medium.
  • Computer-readable instructions are stored on the storage medium; when the computer-readable instructions are executed by a processor, they implement the steps of the data batch-running method described above.
  • an embodiment of the present application also proposes a data batching device, and the data batching device includes:
  • the random partition module 10 is used to obtain batch data to be processed corresponding to the current task node, and randomly partition the batch data to be processed to obtain a preset number of partition data.
  • the generating module 20 is configured to generate subtasks corresponding to the partitions according to the preset number of partition data, store the preset number of subtasks in a preset queue list in the form of a queue, and send the tasks through the message middleware Process the message to each member host in the cluster, so that when each member host in the cluster listens to the task processing message, it obtains a subtask from the preset queue list for processing.
  • the obtaining module 30 is used to obtain sample abnormal data and corresponding sample abnormal points.
  • the training module 40 is used to train the convolutional neural network model according to the sample abnormal data and corresponding sample abnormal points to obtain an abnormal recognition model.
  • the identification module 50 is configured to obtain log records of executing each subtask, and identify abnormal data in the log records through the abnormal identification model.
  • the positioning module 60 is configured to locate abnormal points according to the abnormal data, and process the abnormal points to successfully complete the subtasks.
  • the data batching device further includes:
  • the calculation module is configured to reduce the number of subtasks in the preset queue list by one when obtaining a subtask from the preset queue list to obtain the number of remaining subtasks.
  • the data batching device further includes:
  • the sending module is used to determine, when the number of remaining subtasks is zero, that the acquired subtask is the last subtask, monitor its processing progress, and, when its processing completes, send the message that all subtasks are processed to the cluster through the message middleware in cluster consumption mode.
  • the obtaining module 30 is also used to judge, when the message that all subtasks are processed is heard, whether all subtasks were processed successfully, and if so, to obtain the next task node and execute the next task.
  • the obtaining module 30 is further configured to monitor in broadcast consumption mode and, when the task processing message is heard, obtain subtasks from the preset queue list for processing.
  • the obtaining module 30 is further configured to monitor through broadcast consumption; when the task processing message is heard, obtain subtasks from the preset queue list for processing; and, while one member host is obtaining a subtask, block the other member hosts in the cluster from obtaining subtasks.
  • the obtaining module 30 is further configured to calculate the CPU occupancy rate when the task processing message is heard, obtain multiple subtasks from the preset queue list according to the CPU occupancy rate, and use multiple threads to process the multiple subtasks concurrently.
  • The storage medium (such as a ROM/RAM, magnetic disk, or optical disk) includes several instructions to cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the methods described in the embodiments of this application.

Abstract

A batch data execution method, a device, a storage medium, and a member host in a cluster. The method comprises: acquiring batch data to be processed corresponding to a current task node, performing random partitioning on the batch data, and obtaining a pre-determined number of data partitions (S10); generating sub-tasks corresponding to the partitions according to the pre-determined number of data partitions, storing the pre-determined number of sub-tasks in a pre-determined queue list in the form of a queue, and sending, via a message middleware, a task processing message to each member host in a cluster, such that the member host in the cluster acquires, upon detecting the task processing message, a sub-task from the pre-determined queue list to process (S20); acquiring sample outlier data and corresponding sample outlier points (S50); training a convolutional neural network model according to the sample outlier data and the corresponding sample outlier points, and obtaining an outlier identification model (S60); acquiring a log record for execution of each sub-task, and identifying outlier data in the log record by means of the outlier identification model (S70); and locating an outlier point according to the outlier data, and processing the outlier point to successfully complete sub-task processing (S80). The invention is based on big data and achieves batch data processing by means of a cluster. Each member host in the cluster is capable of both distributing a task and executing a task, thereby achieving a maximum resource utilization rate. In addition, data partitioning is not dependent on a database, thereby enabling unrestricted expansion of the number of sub-tasks and increasing the concurrent data processing speed.

Description

数据跑批方法、装置、存储介质及集群中的成员主机 Data running batch method, device, storage medium and member host in cluster To
This application claims priority to Chinese patent application No. 201910553729.4, filed with the Chinese Patent Office on June 20, 2019 and entitled "Batch data execution method, device, storage medium, and member host in a cluster", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of big data, and in particular to a batch data execution method, a batch data execution apparatus, a storage medium, and a member host in a cluster.
Background
At present, when large batches of data are processed, the batch execution application is deployed separately, so it sits idle and wastes resources whenever no batch is running; large batch tasks are executed in a single thread, which makes execution slow; and data partitioning for large batch tasks relies on Oracle hash partitioning. Once a single Oracle table reaches hundreds of millions of rows, or about 2 GB in size, query efficiency drops noticeably, so the table must be divided along the row dimension to keep any single table from growing too large. For data with little regularity, or whose value range is hard to determine, partitioning is forced through hashing, and the number of partitions must be set to a power of two, which limits scalability. Therefore, how to improve the efficiency of large-scale batch data processing and maximize resource utilization is a technical problem in urgent need of a solution.
The above content is only intended to assist in understanding the technical solution of this application and does not constitute an admission that it is prior art.
Summary
The main purpose of this application is to provide a batch data execution method, apparatus, storage medium, and member host in a cluster, aiming to solve the prior-art technical problems of low efficiency and low resource utilization in large-scale batch data processing.
To achieve the above objective, this application provides a batch data execution method, which includes the following steps:
obtaining the batch of pending data corresponding to the current task node, and randomly partitioning the batch of pending data to obtain a preset number of data partitions;
generating a subtask for each partition according to the preset number of data partitions, storing the preset number of subtasks as a queue in a preset queue list, and sending a task processing message through a message middleware to the member hosts in the cluster, so that each member host in the cluster, upon hearing the task processing message, fetches a subtask from the preset queue list for processing; and
obtaining sample anomaly data and the corresponding sample anomaly points; training a convolutional neural network model with the sample anomaly data and the corresponding sample anomaly points to obtain an anomaly recognition model; obtaining the log records produced while executing each subtask, and identifying anomalous data in the log records through the anomaly recognition model; and locating the anomaly point according to the anomalous data and handling the anomaly point, so that the subtask completes successfully.
Preferably, after the step of generating a subtask for each partition according to the preset number of data partitions, storing the preset number of subtasks as a queue in a preset queue list, and sending a task processing message through the message middleware to the member hosts in the cluster so that each member host, upon hearing the task processing message, fetches a subtask from the preset queue list for processing, the batch data execution method further includes:
when a subtask is fetched from the preset queue list, decrementing the number of subtasks in the preset queue list by one to obtain the number of remaining subtasks.
Preferably, after the step of decrementing the number of subtasks in the preset queue list by one when a subtask is fetched from the preset queue list to obtain the number of remaining subtasks, the batch data execution method further includes:
when the number of remaining subtasks is zero, determining that the fetched subtask is the last subtask, monitoring the processing progress of the last subtask, and, upon hearing that the last subtask has been processed, sending a message that all subtasks have been processed to the cluster through the message middleware in a clustering consumption mode.
Preferably, after the step of determining, when the number of remaining subtasks is zero, that the fetched subtask is the last subtask, monitoring the processing progress of the last subtask, and, upon hearing that the last subtask has been processed, sending the message that all subtasks have been processed to the cluster through the message middleware in a clustering consumption mode, the batch data execution method further includes:
upon hearing the message that all subtasks have been processed, judging whether all subtasks have been processed successfully, and if all subtasks have been processed successfully, obtaining the next task node and executing the next task.
Preferably, after the step of generating a subtask for each partition according to the preset number of data partitions, storing the preset number of subtasks as a queue in a preset queue list, and sending a task processing message through the message middleware to the member hosts in the cluster so that each member host, upon hearing the task processing message, fetches a subtask from the preset queue list for processing, the batch data execution method further includes:
listening in a broadcast consumption mode, and, upon hearing the task processing message, fetching a subtask from the preset queue list for processing.
Preferably, listening in the broadcast consumption mode and fetching a subtask from the preset queue list for processing upon hearing the task processing message includes:
listening in a broadcast consumption mode;
upon hearing the task processing message, fetching a subtask from the preset queue list for processing; and
blocking the subtask-fetching function on every member host in the cluster other than the member host that is fetching the subtask.
Preferably, fetching a subtask from the preset queue list for processing upon hearing the task processing message includes:
upon hearing the task processing message, calculating the CPU occupancy rate, fetching multiple subtasks from the preset queue list according to the CPU occupancy rate, and processing the multiple subtasks concurrently with multiple threads.
In addition, to achieve the above objective, this application further provides a member host in a cluster. The member host includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the computer-readable instructions are configured to implement the steps of the batch data execution method described above.
In addition, to achieve the above objective, this application further provides a storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of the batch data execution method described above.
In addition, to achieve the above objective, this application further provides a batch data execution apparatus, which includes:
a random partitioning module, configured to obtain the batch of pending data corresponding to the current task node and randomly partition the batch of pending data to obtain a preset number of data partitions;
a generating module, configured to generate a subtask for each partition according to the preset number of data partitions, store the preset number of subtasks as a queue in a preset queue list, and send a task processing message through a message middleware to the member hosts in the cluster, so that each member host in the cluster, upon hearing the task processing message, fetches a subtask from the preset queue list for processing;
an obtaining module, configured to obtain sample anomaly data and the corresponding sample anomaly points;
a training module, configured to train a convolutional neural network model with the sample anomaly data and the corresponding sample anomaly points to obtain an anomaly recognition model;
a recognition module, configured to obtain the log records produced while executing each subtask and identify anomalous data in the log records through the anomaly recognition model; and
a locating module, configured to locate the anomaly point according to the anomalous data and handle the anomaly point, so that the subtask completes successfully.
In this application, the batch of pending data corresponding to the current task node is obtained and randomly partitioned to obtain a preset number of data partitions; since this partitioning does not depend on the database, the number of subtasks can be extended at will, which increases the concurrency of data processing. A subtask is generated for each partition according to the preset number of data partitions, the preset number of subtasks are stored as a queue in a preset queue list, and a task processing message is sent through the message middleware to the member hosts in the cluster, so that each member host, upon hearing the task processing message, fetches a subtask from the preset queue list for processing. Large batches of data are thus processed by the cluster on a big-data basis, and a host can execute tasks while also distributing them, maximizing resource utilization.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a member host, in a cluster, of the hardware operating environment involved in the solutions of the embodiments of this application;
FIG. 2 is a schematic flowchart of a first embodiment of the batch data execution method of this application;
FIG. 3 is a schematic flowchart of a second embodiment of the batch data execution method of this application;
FIG. 4 is a schematic flowchart of a third embodiment of the batch data execution method of this application;
FIG. 5 is a structural block diagram of a first embodiment of the batch data execution apparatus of this application.
The realization of the objectives, the functional features, and the advantages of this application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a member host, in a cluster, of the hardware operating environment involved in the solutions of the embodiments of this application.
As shown in FIG. 1, a member host in the cluster may include a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display (Display); optionally, the user interface 1003 may also include standard wired and wireless interfaces, and in this application the wired interface of the user interface 1003 may be a USB interface. The network interface 1004 may optionally include standard wired and wireless interfaces (such as a Wireless Fidelity (WI-FI) interface). The memory 1005 may be a high-speed random access memory (RAM), or a stable non-volatile memory (NVM) such as disk storage. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
Those skilled in the art can understand that the structure shown in FIG. 1 does not limit the member hosts in the cluster; a member host may include more or fewer components than shown, or combine certain components, or arrange the components differently.
As shown in FIG. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and computer-readable instructions.
In the member host shown in FIG. 1, the network interface 1004 is mainly used to connect to a back-end server and exchange data with it; the user interface 1003 is mainly used to connect to user equipment; and the member host calls, through the processor 1001, the computer-readable instructions stored in the memory 1005 and executes the batch data execution method provided in the embodiments of this application.
Based on the above hardware structure, an embodiment of the batch data execution method of this application is proposed.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a first embodiment of the batch data execution method of this application, on which basis the first embodiment of the method is proposed.
In the first embodiment, the batch data execution method includes the following steps:
Step S10: obtain the batch of pending data corresponding to the current task node, and randomly partition the batch of pending data to obtain a preset number of data partitions.
It should be understood that the executing entity of this embodiment is a member host in the cluster, which may be an electronic device such as a personal computer or a server. The cluster includes multiple member hosts, all with identical functions; each can distribute tasks while also executing them, maximizing resource utilization. The executing entity of this embodiment is any member host in the cluster. A batch execution task is started periodically through a preset open-source project, quartz. After the batch task starts, the current task node is obtained, and the batch of pending data corresponding to the current task node is randomly partitioned, typically into 62 partitions labeled a-z, A-Z, and 0-9, that is, the preset number is 62; the partitioning can also be extended to a larger number of partitions. The data partitioning does not depend on the database, the number of subtasks can be extended at will, and the concurrency of data processing is increased.
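The random partitioning of step S10 can be sketched as follows; this is a minimal illustration, and the function name and label list are assumptions rather than anything specified by the application:

```python
import random
import string

# The typical 62 partition labels described above: a-z, A-Z, 0-9.
# The list can be extended freely, since partitioning does not depend on the database.
PARTITION_LABELS = list(string.ascii_lowercase + string.ascii_uppercase + string.digits)

def random_partition(records, labels=PARTITION_LABELS):
    """Randomly assign each pending record to one of the preset partitions."""
    partitions = {label: [] for label in labels}
    for record in records:
        partitions[random.choice(labels)].append(record)
    return partitions
```

Because assignment is random rather than hash-based, no value-range analysis of the data is needed and the partition count is not constrained to a power of two.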
Step S20: generate a subtask for each partition according to the preset number of data partitions, store the preset number of subtasks as a queue in a preset queue list, and send a task processing message through the message middleware to the member hosts in the cluster, so that each member host, upon hearing the task processing message, fetches a subtask from the preset queue list for processing.
It is understandable that after the batch of pending data has been partitioned, the preset number of subtasks, one per partition, is generated; typically the preset number is 62, that is, 62 subtasks labeled a-z, A-Z, and 0-9, though the number of subtasks can also be extended. These 62 subtasks are placed, as a queue, into the preset queue list, which is a Redis queue list, and the task processing message is sent through the message middleware to inform every host in the cluster that tasks are pending.
It should be noted that the cluster listens in a broadcast consumption mode, so every member host in the cluster receives the task processing message, then fetches a subtask from the Redis queue list and invokes the processing class corresponding to the fetched subtask. The processing class is usually the algorithm or rule that the subtask executes; the subtask is executed according to the processing class to obtain result data. For example, if the subtask is to calculate the commission for order A, and order A is of the consumer-loan type, the processing class is the commission calculation formula for consumer-loan orders; the commission for order A is calculated according to that formula, and once the calculation finishes, the subtask is done.
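The dispatch side of step S20 can be sketched with plain Python stand-ins for the Redis queue list and the message middleware (a real deployment would use something like redis-py RPUSH plus a broker's broadcast publish); all names here are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class SubTask:
    label: str      # partition label, e.g. "a"
    records: list   # the partition's pending data
    handler: str    # name of the processing class, e.g. a commission formula

queue_list = []     # stand-in for the Redis queue list
subscribers = []    # stand-in for the cluster's broadcast listeners

def dispatch(partitions, handler):
    """Turn each partition into a subtask, enqueue it, and notify every host."""
    for label, records in partitions.items():
        queue_list.append(SubTask(label, records, handler))
    for notify in subscribers:  # broadcast consumption: every member host hears it
        notify("task-pending")
```

The notification deliberately carries no subtask payload: hosts only learn that work is pending and then compete to pull subtasks from the shared queue list.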
Further, in this embodiment, after step S20, the method further includes:
Step S50: obtain sample anomaly data and the corresponding sample anomaly points.
Step S60: train a convolutional neural network model with the sample anomaly data and the corresponding sample anomaly points to obtain an anomaly recognition model.
Step S70: obtain the log records produced while executing each subtask, and identify anomalous data in the log records through the anomaly recognition model.
Step S80: locate the anomaly point according to the anomalous data, and handle the anomaly point, so that the subtask completes successfully.
In a specific implementation, while each subtask is executed, errors or anomalous data encountered during processing are logged. A large amount of sample anomaly data and corresponding sample anomaly points can be collected in advance, and a convolutional neural network model learns from them to produce the anomaly recognition model. The anomaly recognition model then identifies errors or anomalous data in the log records, so that the anomaly point can be located and handled, allowing the subtask to complete successfully.
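The log-scanning and anomaly-point-locating step can be sketched as below. Note that a simple keyword matcher stands in for the trained convolutional anomaly-recognition model of steps S50 and S60, and the patterns are hypothetical examples, not anything specified by the application:

```python
import re

# Stand-in for the trained anomaly-recognition model: hypothetical keyword patterns.
ANOMALY_PATTERNS = [r"ERROR", r"Exception", r"timeout"]

def locate_anomaly_points(log_lines):
    """Return (line_number, line) pairs flagged as anomalous in the log records."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if any(re.search(pattern, line) for pattern in ANOMALY_PATTERNS):
            hits.append((number, line))
    return hits
```

In the application's design the matcher would be replaced by the model trained on sample anomaly data, but the surrounding flow, scanning each subtask's log and reporting the located anomaly points for handling, is the same.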
In this embodiment, the batch of pending data corresponding to the current task node is obtained and randomly partitioned into a preset number of data partitions; since the partitioning does not depend on the database, the number of subtasks can be extended at will, increasing the concurrency of data processing. A subtask is generated for each partition according to the preset number of data partitions, the preset number of subtasks are stored as a queue in the preset queue list, and a task processing message is sent through the message middleware to the member hosts in the cluster, so that each member host, upon hearing the message, fetches a subtask from the preset queue list for processing. Large batches of data are thus processed by the cluster on a big-data basis, and a host can execute tasks while also distributing them, maximizing resource utilization.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of a second embodiment of the batch data execution method of this application; on the basis of the first embodiment shown in FIG. 2, the second embodiment of the method is proposed.
In the second embodiment, after step S20, the method further includes:
Step S30: when a subtask is fetched from the preset queue list, decrement the number of subtasks in the preset queue list by one to obtain the number of remaining subtasks.
It should be understood that, to prevent a subtask from being fetched or executed twice, a distributed lock implemented with a Redis + Lua script governs each member host's fetching of subtasks from the preset queue list. When any member host B fetches a subtask from the queue list, the subtask-fetching function is blocked on the other member hosts, that is, no other member host is allowed to fetch a subtask at that moment; this achieves data synchronization while greatly reducing the time overhead of locking. Once member host B has fetched a subtask from the preset queue list, the preset number in the list is decremented by one to obtain the number of remaining subtasks; for example, if the preset number is 62 and member host B fetches one subtask, the number of remaining subtasks is 61. After member host B has fetched a subtask, the fetching function is reopened to the other member hosts, which can then fetch subtasks from the preset queue list for processing until every subtask in the list has been fetched and executed.
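The atomic "fetch one subtask and learn how many remain" step can be sketched as follows. In a real deployment this would run as a single Redis Lua script (via EVAL), so that no two member hosts can pop the same subtask; the Lua body below is a hypothetical illustration, and `fetch_subtask` mimics it on an in-memory list:

```python
# Hypothetical Lua body: pop one subtask and report the remaining queue length
# in one atomic step on the Redis server.
FETCH_SCRIPT = """
local task = redis.call('LPOP', KEYS[1])
local remaining = redis.call('LLEN', KEYS[1])
return {task, remaining}
"""

def fetch_subtask(queue):
    """In-memory equivalent: pop one subtask, return it with the remaining count."""
    if not queue:
        return None, 0
    task = queue.pop(0)
    return task, len(queue)
```

Bundling the pop and the count into one server-side script is what makes the decrement of step S30 safe without a separate client-side lock round-trip.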
In the second embodiment, after step S30, the method further includes:
Step S40: when the number of remaining subtasks is zero, determine that the fetched subtask is the last subtask, monitor the processing progress of the last subtask, and, upon hearing that the last subtask has been processed, send a message that all subtasks have been processed to the cluster through the message middleware in a clustering consumption mode.
It should be noted that when any member host C in the cluster fetches a subtask from the preset queue list, the number of subtasks in the list is decremented by one to obtain the number of remaining subtasks. If the number of remaining subtasks is zero, the subtask that member host C fetched from the preset queue list is the last subtask, and when member host C finishes processing it, a message that all subtasks have been processed can be sent to the cluster through the message middleware in a clustering consumption mode.
In a specific implementation, to allow the hosts in the cluster to keep processing more tasks continuously and efficiently, when member host C in the cluster finishes processing the last subtask, it can send the message that all subtasks have been processed through the message middleware, so that each host in the cluster can go on to obtain the next task node and process its tasks. The message that all subtasks have been processed is consumed in a clustering consumption mode, so only one host will hear it.
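The completion signal can be sketched as below. Unlike the broadcast notification of step S20, clustering consumption means exactly one member host receives the message, which a point-to-point queue models; the names here are hypothetical stand-ins for the middleware:

```python
import queue

# Point-to-point channel: exactly one consumer receives each message,
# mirroring the clustering consumption mode described above.
completion_channel = queue.Queue()

def on_subtask_finished(remaining_subtasks):
    """Called after each subtask completes, with the remaining-subtask count."""
    if remaining_subtasks == 0:  # the last subtask has just been processed
        completion_channel.put("all-subtasks-processed")
```

Only the host that popped the last subtask ever sees a remaining count of zero, so the completion message is sent exactly once.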
In the second embodiment, after step S40, the method further includes:
upon hearing the message that all subtasks have been processed, judging whether all subtasks have been processed successfully, and if all subtasks have been processed successfully, obtaining the next task node and executing the next task.
It is understandable that member host C is the member host in the cluster that processed the last subtask; member host C sends, through the message middleware, one message stating that all subtasks have been processed. Since this message is consumed in a clustering consumption mode, only one member host, say member host D, will hear it; member host D may be any member host in the cluster.
In a specific implementation, when member host D hears the message that all subtasks have been processed, it judges whether every subtask was processed successfully. If all subtasks succeeded, execution proceeds automatically to the next task node; if not, fetching the next task node is suspended pending troubleshooting, and once the problem has been investigated and resolved, the task-processing mechanism is awakened again, the next task node is obtained, and the next task is executed. The data processed by the subtasks is usually monitored; if anomalous data is detected, an alarm is raised and errors are investigated manually.
In the second embodiment, when a subtask is fetched from the preset queue list, the number of subtasks in the list is decremented by one to obtain the number of remaining subtasks; when the number of remaining subtasks is zero, the fetched subtask is determined to be the last subtask, whose processing progress is monitored, and once it has been processed, a message that all subtasks have been processed is sent to the cluster through the message middleware in a clustering consumption mode. By counting the subtasks in the queue list, the execution progress of the subtasks can be tracked, and once all of them have been processed, the completion message is sent promptly, so that the member hosts in the cluster obtain the next task node in time and process the next task, improving the efficiency of large-scale batch data processing.
Referring to FIG. 4, FIG. 4 is a schematic flowchart of a third embodiment of the batch data execution method of this application; on the basis of the second embodiment shown in FIG. 3, the third embodiment of the method is proposed.
In the third embodiment, after step S20, the method further includes:
Step S201: listen in a broadcast consumption mode, and, upon hearing the task processing message, fetch a subtask from the preset queue list for processing.
It should be understood that the cluster includes multiple member hosts, all with identical functions; each can distribute tasks while also executing them, maximizing resource utilization. Each member host in the cluster listens in a broadcast consumption mode, so every host in the cluster receives the task processing message, then fetches a subtask from the Redis queue list and invokes the corresponding processing class; errors or anomalous data encountered during processing are logged.
Further, step S201 includes:
listening in a broadcast consumption mode;
upon hearing the task processing message, fetching a subtask from the preset queue list for processing; and
blocking the subtask-fetching function on every member host in the cluster other than the member host that is fetching the subtask.
It should be noted that every host in the cluster can execute tasks while also distributing them. To prevent a subtask from being fetched or executed twice, a distributed lock implemented with a Redis + Lua script governs each member host's fetching of subtasks from the preset queue list. When any member host B fetches a subtask from the queue list, the subtask-fetching function is blocked on every member host in the cluster other than the fetching host, that is, no other member host is allowed to fetch a subtask at that moment. After member host B has fetched one or more subtasks from the preset queue list, the fetching function is reopened to the other member hosts, which can then fetch subtasks from the list for processing until every subtask in the preset queue list has been fetched and executed. Implementing the distributed lock with a Redis + Lua script achieves data synchronization while greatly reducing the time overhead of locking.
In this embodiment, fetching subtasks from the preset queue list for processing when the task processing message is detected includes:
when the task processing message is detected, calculating the CPU occupancy rate, fetching multiple subtasks from the preset queue list according to the CPU occupancy rate, and processing the multiple subtasks concurrently with multiple threads.
It can be understood that the member hosts in the cluster can process subtasks of different partitions simultaneously, and each member host processes tasks concurrently with multiple threads, so processing is fast. The degree of concurrency is determined by the CPU occupancy rate; typically five threads run concurrently, which speeds up batch subtask processing. Each host in the cluster can compute its own CPU occupancy rate and use it to decide how many subtasks to fetch, fetching multiple subtasks without degrading processing speed, so that multiple threads process the fetched subtasks concurrently and the subtasks in the preset queue list are handled more efficiently.
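One way to turn the CPU occupancy rate into a fetch count is to scale the nominal five-thread concurrency by the spare CPU capacity. The formula below is an illustrative assumption; the patent only says the count is determined from the CPU occupancy rate, not how.

```python
def subtasks_to_fetch(cpu_usage: float, max_concurrent: int = 5) -> int:
    """Return how many subtasks to fetch given CPU occupancy in [0, 1].

    max_concurrent defaults to the "usually 5 threads" mentioned in the
    text; the scaling itself is a hypothetical policy.
    """
    if not 0.0 <= cpu_usage <= 1.0:
        raise ValueError("cpu_usage must be in [0, 1]")
    spare = 1.0 - cpu_usage          # fraction of CPU still available
    return max(1, round(max_concurrent * spare))  # always make progress
```

An idle host would fetch a full batch of five, while a heavily loaded host falls back to one subtask at a time, so fetching never degrades the host's processing speed.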
In this embodiment, the hosts listen in broadcast-consumption mode and, when the task processing message is detected, fetch subtasks from the preset queue list for processing. All member hosts in the cluster receive the task processing message, and each member host can distribute tasks while also executing them, maximizing resource utilization and improving subtask processing efficiency.
In addition, an embodiment of the present application further provides a storage medium; the computer-readable storage medium may be a non-volatile readable storage medium. Computer-readable instructions are stored on the storage medium, and when executed by a processor, the computer-readable instructions implement the steps of the data batch-running method described above.
For the method implemented when the computer-readable instructions are executed, reference may be made to the embodiments of the data batch-running method of the present application, which will not be repeated here.
In addition, referring to FIG. 5, an embodiment of the present application further provides a data batch-running apparatus, which includes:
a random partition module 10, configured to obtain batch to-be-processed data corresponding to the current task node and randomly partition the batch to-be-processed data to obtain a preset number of pieces of partition data;
a generating module 20, configured to generate subtasks for the corresponding partitions according to the preset number of pieces of partition data, store the preset number of subtasks as a queue in a preset queue list, and send a task processing message to each member host in the cluster through message middleware, so that each member host in the cluster, upon detecting the task processing message, fetches subtasks from the preset queue list for processing;
an obtaining module 30, configured to obtain sample abnormal data and corresponding sample abnormal points;
a training module 40, configured to train a convolutional neural network model according to the sample abnormal data and the corresponding sample abnormal points to obtain an abnormality recognition model;
a recognition module 50, configured to obtain the log records of executing each subtask and identify abnormal data in the log records through the abnormality recognition model; and
a locating module 60, configured to locate abnormal points according to the abnormal data and handle the abnormal points so that the subtasks are completed successfully.
In an embodiment, the data batch-running apparatus further includes:
a calculation module, configured to, when a subtask is fetched from the preset queue list, decrement the number of subtasks in the preset queue list by one to obtain the number of remaining subtasks.
In an embodiment, the data batch-running apparatus further includes:
a sending module, configured to, when the number of remaining subtasks is zero, determine that the fetched subtask is the last subtask, monitor the processing progress of the last subtask, and, upon detecting that the last subtask has been processed, send a message that all subtasks have been processed to the cluster through the message middleware in cluster-consumption mode.
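The bookkeeping split between the calculation module and the sending module can be sketched as follows. Names are illustrative stand-ins: in the real system the counter would live in Redis (e.g. a DECR per pop) and the completion message would go through the message middleware, neither of which is simulated here.

```python
import threading

remaining = 8                   # number of subtasks in the preset queue list
count_lock = threading.Lock()
completion_messages = []

def pop_subtask() -> bool:
    """Decrement the remaining count; True means this was the last subtask."""
    global remaining
    with count_lock:
        remaining -= 1
        return remaining == 0

def send_completion():
    # Stand-in for sending the all-subtasks-done message via middleware
    # in cluster-consumption mode (exactly one consumer handles it).
    completion_messages.append("all-subtasks-done")

for _ in range(8):
    if pop_subtask():
        send_completion()
```

Exactly one pop observes the count reaching zero, so exactly one completion message is sent, matching the "last subtask" determination in the text.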
In an embodiment, the obtaining module 30 is further configured to, upon detecting the message that all subtasks have been processed, judge whether all subtasks have been processed successfully, and if all subtasks have been processed successfully, obtain the next task node and execute the next task.
In an embodiment, the obtaining module 30 is further configured to listen in broadcast-consumption mode and, when the task processing message is detected, fetch subtasks from the preset queue list for processing.
In an embodiment, the obtaining module 30 is further configured to listen in broadcast-consumption mode; when the task processing message is detected, fetch subtasks from the preset queue list for processing; and block the subtask-fetching function of the member hosts in the cluster other than the member host that is fetching subtasks.
In an embodiment, the obtaining module 30 is further configured to, when the task processing message is detected, calculate the CPU occupancy rate, fetch multiple subtasks from the preset queue list according to the CPU occupancy rate, and process the multiple subtasks concurrently with multiple threads.
For other embodiments or specific implementations of the data batch-running apparatus of the present application, reference may be made to the foregoing method embodiments, which will not be repeated here.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a read-only memory image (ROM)/random access memory (RAM), a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present application and do not thereby limit the patent scope of the present application. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of the present application.

Claims (20)

  1. A data batch-running method, characterized in that the data batch-running method comprises the following steps:
    obtaining batch to-be-processed data corresponding to a current task node, and randomly partitioning the batch to-be-processed data to obtain a preset number of pieces of partition data;
    generating subtasks for the corresponding partitions according to the preset number of pieces of partition data, storing the preset number of subtasks as a queue in a preset queue list, and sending a task processing message to each member host in a cluster through message middleware, so that each member host in the cluster, upon detecting the task processing message, fetches subtasks from the preset queue list for processing;
    obtaining sample abnormal data and corresponding sample abnormal points;
    training a convolutional neural network model according to the sample abnormal data and the corresponding sample abnormal points to obtain an abnormality recognition model;
    obtaining log records of executing each subtask, and identifying abnormal data in the log records through the abnormality recognition model; and
    locating abnormal points according to the abnormal data, and handling the abnormal points so that the subtasks are completed successfully.
  2. The data batch-running method of claim 1, characterized in that, after the step of generating the subtasks for the corresponding partitions according to the preset number of pieces of partition data, storing the preset number of subtasks as a queue in the preset queue list, and sending the task processing message to each member host in the cluster through the message middleware so that each member host in the cluster, upon detecting the task processing message, fetches subtasks from the preset queue list for processing, the data batch-running method further comprises:
    when a subtask is fetched from the preset queue list, decrementing the number of subtasks in the preset queue list by one to obtain the number of remaining subtasks.
  3. The data batch-running method of claim 2, characterized in that, after the step of decrementing the number of subtasks in the preset queue list by one when a subtask is fetched from the preset queue list to obtain the number of remaining subtasks, the data batch-running method further comprises:
    when the number of remaining subtasks is zero, determining that the fetched subtask is the last subtask, monitoring the processing progress of the last subtask, and, upon detecting that the last subtask has been processed, sending a message that all subtasks have been processed to the cluster through the message middleware in cluster-consumption mode.
  4. The data batch-running method of claim 3, characterized in that, after the step of determining, when the number of remaining subtasks is zero, that the fetched subtask is the last subtask, monitoring the processing progress of the last subtask, and, upon detecting that the last subtask has been processed, sending the message that all subtasks have been processed to the cluster through the message middleware in cluster-consumption mode, the data batch-running method further comprises:
    upon detecting the message that all subtasks have been processed, judging whether all subtasks have been processed successfully, and if all subtasks have been processed successfully, obtaining the next task node and executing the next task.
  5. The data batch-running method of claim 1, characterized in that, after the step of generating the subtasks for the corresponding partitions according to the preset number of pieces of partition data, storing the preset number of subtasks as a queue in the preset queue list, and sending the task processing message to each member host in the cluster through the message middleware so that each member host in the cluster, upon detecting the task processing message, fetches subtasks from the preset queue list for processing, the data batch-running method further comprises:
    listening in broadcast-consumption mode, and, when the task processing message is detected, fetching subtasks from the preset queue list for processing.
  6. The data batch-running method of claim 5, characterized in that the listening in broadcast-consumption mode and, when the task processing message is detected, fetching subtasks from the preset queue list for processing comprises:
    listening in broadcast-consumption mode;
    when the task processing message is detected, fetching subtasks from the preset queue list for processing; and
    blocking the subtask-fetching function of the member hosts in the cluster other than the member host that is fetching subtasks.
  7. The data batch-running method of claim 6, characterized in that the fetching subtasks from the preset queue list for processing when the task processing message is detected comprises:
    when the task processing message is detected, calculating a CPU occupancy rate, fetching multiple subtasks from the preset queue list according to the CPU occupancy rate, and processing the multiple subtasks concurrently with multiple threads.
  8. A member host in a cluster, characterized in that the member host in the cluster comprises: a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the computer-readable instructions, when executed by the processor, implement the following steps:
    obtaining batch to-be-processed data corresponding to a current task node, and randomly partitioning the batch to-be-processed data to obtain a preset number of pieces of partition data;
    generating subtasks for the corresponding partitions according to the preset number of pieces of partition data, storing the preset number of subtasks as a queue in a preset queue list, and sending a task processing message to each member host in the cluster through message middleware, so that each member host in the cluster, upon detecting the task processing message, fetches subtasks from the preset queue list for processing;
    obtaining sample abnormal data and corresponding sample abnormal points;
    training a convolutional neural network model according to the sample abnormal data and the corresponding sample abnormal points to obtain an abnormality recognition model;
    obtaining log records of executing each subtask, and identifying abnormal data in the log records through the abnormality recognition model; and
    locating abnormal points according to the abnormal data, and handling the abnormal points so that the subtasks are completed successfully.
  9. The member host in a cluster of claim 8, characterized in that, after the step of generating the subtasks for the corresponding partitions according to the preset number of pieces of partition data, storing the preset number of subtasks as a queue in the preset queue list, and sending the task processing message to each member host in the cluster through the message middleware so that each member host in the cluster, upon detecting the task processing message, fetches subtasks from the preset queue list for processing, the steps further comprise:
    when a subtask is fetched from the preset queue list, decrementing the number of subtasks in the preset queue list by one to obtain the number of remaining subtasks.
  10. The member host in a cluster of claim 9, characterized in that, after the step of decrementing the number of subtasks in the preset queue list by one when a subtask is fetched from the preset queue list to obtain the number of remaining subtasks, the steps further comprise:
    when the number of remaining subtasks is zero, determining that the fetched subtask is the last subtask, monitoring the processing progress of the last subtask, and, upon detecting that the last subtask has been processed, sending a message that all subtasks have been processed to the cluster through the message middleware in cluster-consumption mode.
  11. The member host in a cluster of claim 10, characterized in that, after the step of determining, when the number of remaining subtasks is zero, that the fetched subtask is the last subtask, monitoring the processing progress of the last subtask, and, upon detecting that the last subtask has been processed, sending the message that all subtasks have been processed to the cluster through the message middleware in cluster-consumption mode, the steps further comprise:
    upon detecting the message that all subtasks have been processed, judging whether all subtasks have been processed successfully, and if all subtasks have been processed successfully, obtaining the next task node and executing the next task.
  12. The member host in a cluster of claim 8, characterized in that, after the step of generating the subtasks for the corresponding partitions according to the preset number of pieces of partition data, storing the preset number of subtasks as a queue in the preset queue list, and sending the task processing message to each member host in the cluster through the message middleware so that each member host in the cluster, upon detecting the task processing message, fetches subtasks from the preset queue list for processing, the steps further comprise:
    listening in broadcast-consumption mode, and, when the task processing message is detected, fetching subtasks from the preset queue list for processing.
  13. A data batch-running apparatus, characterized in that the data batch-running apparatus comprises:
    a random partition module, configured to obtain batch to-be-processed data corresponding to a current task node and randomly partition the batch to-be-processed data to obtain a preset number of pieces of partition data;
    a generating module, configured to generate subtasks for the corresponding partitions according to the preset number of pieces of partition data, store the preset number of subtasks as a queue in a preset queue list, and send a task processing message to each member host in a cluster through message middleware, so that each member host in the cluster, upon detecting the task processing message, fetches subtasks from the preset queue list for processing;
    an obtaining module, configured to obtain sample abnormal data and corresponding sample abnormal points;
    a training module, configured to train a convolutional neural network model according to the sample abnormal data and the corresponding sample abnormal points to obtain an abnormality recognition model;
    a recognition module, configured to obtain log records of executing each subtask and identify abnormal data in the log records through the abnormality recognition model; and
    a locating module, configured to locate abnormal points according to the abnormal data and handle the abnormal points so that the subtasks are completed successfully.
  14. The data batch-running apparatus of claim 13, characterized in that the data batch-running apparatus further comprises:
    a calculation module, configured to, when a subtask is fetched from the preset queue list, decrement the number of subtasks in the preset queue list by one to obtain the number of remaining subtasks.
  15. The data batch-running apparatus of claim 14, characterized in that the data batch-running apparatus further comprises:
    a sending module, configured to, when the number of remaining subtasks is zero, determine that the fetched subtask is the last subtask, monitor the processing progress of the last subtask, and, upon detecting that the last subtask has been processed, send a message that all subtasks have been processed to the cluster through the message middleware in cluster-consumption mode.
  16. The data batch-running apparatus of claim 15, characterized in that the obtaining module is further configured to, upon detecting the message that all subtasks have been processed, judge whether all subtasks have been processed successfully, and if all subtasks have been processed successfully, obtain the next task node and execute the next task.
  17. The data batch-running apparatus of claim 13, characterized in that the obtaining module is further configured to listen in broadcast-consumption mode and, when the task processing message is detected, fetch subtasks from the preset queue list for processing.
  18. A storage medium, characterized in that computer-readable instructions are stored on the storage medium, and the computer-readable instructions, when executed by a processor, implement the following steps:
    obtaining batch to-be-processed data corresponding to a current task node, and randomly partitioning the batch to-be-processed data to obtain a preset number of pieces of partition data;
    generating subtasks for the corresponding partitions according to the preset number of pieces of partition data, storing the preset number of subtasks as a queue in a preset queue list, and sending a task processing message to each member host in a cluster through message middleware, so that each member host in the cluster, upon detecting the task processing message, fetches subtasks from the preset queue list for processing;
    obtaining sample abnormal data and corresponding sample abnormal points;
    training a convolutional neural network model according to the sample abnormal data and the corresponding sample abnormal points to obtain an abnormality recognition model;
    obtaining log records of executing each subtask, and identifying abnormal data in the log records through the abnormality recognition model; and
    locating abnormal points according to the abnormal data, and handling the abnormal points so that the subtasks are completed successfully.
  19. The storage medium of claim 18, characterized in that, after the step of generating the subtasks for the corresponding partitions according to the preset number of pieces of partition data, storing the preset number of subtasks as a queue in the preset queue list, and sending the task processing message to each member host in the cluster through the message middleware so that each member host in the cluster, upon detecting the task processing message, fetches subtasks from the preset queue list for processing, the steps further comprise:
    when a subtask is fetched from the preset queue list, decrementing the number of subtasks in the preset queue list by one to obtain the number of remaining subtasks.
  20. The storage medium of claim 19, characterized in that, after the step of decrementing the number of subtasks in the preset queue list by one when a subtask is fetched from the preset queue list to obtain the number of remaining subtasks, the steps further comprise:
    when the number of remaining subtasks is zero, determining that the fetched subtask is the last subtask, monitoring the processing progress of the last subtask, and, upon detecting that the last subtask has been processed, sending a message that all subtasks have been processed to the cluster through the message middleware in cluster-consumption mode.
PCT/CN2019/121210 2019-06-20 2019-11-27 Batch data execution method, device, storage medium, and member host in cluster WO2020253116A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910553729.4 2019-06-20
CN201910553729.4A CN110362401A (en) 2019-06-20 Batch data execution method, device, storage medium, and member host in cluster

Publications (1)

Publication Number Publication Date
WO2020253116A1 true WO2020253116A1 (en) 2020-12-24

Family

ID=68217029

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/121210 WO2020253116A1 (en) 2019-06-20 2019-11-27 Batch data execution method, device, storage medium, and member host in cluster

Country Status (2)

Country Link
CN (1) CN110362401A (en)
WO (1) WO2020253116A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114168275A (en) * 2021-10-28 2022-03-11 厦门国际银行股份有限公司 Task scheduling method, system, terminal device and storage medium

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110362401A (en) * 2019-06-20 2019-10-22 深圳壹账通智能科技有限公司 Data run the member host in batch method, apparatus, storage medium and cluster
CN113568761B (en) * 2020-04-28 2023-06-27 中国联合网络通信集团有限公司 Data processing method, device, equipment and storage medium
CN111679920A (en) * 2020-06-08 2020-09-18 中国银行股份有限公司 Method and device for processing batch equity data
CN112148505A (en) * 2020-09-18 2020-12-29 京东数字科技控股股份有限公司 Data batching system, method, electronic device and storage medium
CN113537937A (en) * 2021-07-16 2021-10-22 重庆富民银行股份有限公司 Task arrangement method, device and equipment based on topological sorting and storage medium
CN113485812B (en) * 2021-07-23 2023-12-12 重庆富民银行股份有限公司 Partition parallel processing method and system based on large-data-volume task
CN114240109A (en) * 2021-12-06 2022-03-25 中电金信软件有限公司 Method, device and system for cross-region processing batch running task
CN116501499B (en) * 2023-05-17 2023-09-19 建信金融科技有限责任公司 Data batch running method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8245081B2 (en) * 2010-02-10 2012-08-14 Vmware, Inc. Error reporting through observation correlation
CN107291911A (en) * 2017-06-26 2017-10-24 北京奇艺世纪科技有限公司 A kind of method for detecting abnormality and device
CN109144731A (en) * 2018-08-31 2019-01-04 中国平安人寿保险股份有限公司 Data processing method, device, computer equipment and storage medium
CN109461068A (en) * 2018-09-13 2019-03-12 深圳壹账通智能科技有限公司 Judgment method, device, equipment and the computer readable storage medium of fraud
CN110362401A (en) * 2019-06-20 2019-10-22 深圳壹账通智能科技有限公司 Data run the member host in batch method, apparatus, storage medium and cluster

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8190744B2 (en) * 2009-05-28 2012-05-29 Palo Alto Research Center Incorporated Data center batch job quality of service control
US9736034B2 (en) * 2012-09-19 2017-08-15 Oracle International Corporation System and method for small batching processing of usage requests
CN104092794B (en) * 2014-07-25 2017-08-11 中国工商银行股份有限公司 Batch process handling method and system
JP6532385B2 (en) * 2015-11-02 2019-06-19 キヤノン株式会社 INFORMATION PROCESSING SYSTEM, CONTROL METHOD THEREOF, AND PROGRAM
US10417038B2 (en) * 2016-02-18 2019-09-17 Red Hat, Inc. Batched commit in distributed transactions
CN108733477B (en) * 2017-04-20 2021-04-23 中国移动通信集团湖北有限公司 Method, device and equipment for data clustering processing
CN108255619B (en) * 2017-12-28 2019-09-17 新华三大数据技术有限公司 Data processing method and device
CN108564167B (en) * 2018-04-09 2020-07-31 杭州乾圆科技有限公司 Method for identifying abnormal data in data set
CN108985632A (en) * 2018-07-16 2018-12-11 国网上海市电力公司 Electricity consumption data abnormality detection model based on the isolation forest algorithm
CN109672627A (en) * 2018-09-26 2019-04-23 深圳壹账通智能科技有限公司 Business processing method, platform, equipment and storage medium based on cluster server
CN109298225B (en) * 2018-09-29 2020-10-09 国网四川省电力公司电力科学研究院 Automatic identification model system and method for abnormal state of voltage measurement data
CN109558600B (en) * 2018-11-14 2023-06-30 抖音视界有限公司 Translation processing method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114168275A (en) * 2021-10-28 2022-03-11 厦门国际银行股份有限公司 Task scheduling method, system, terminal device and storage medium
CN114168275B (en) * 2021-10-28 2022-10-18 厦门国际银行股份有限公司 Task scheduling method, system, terminal device and storage medium

Also Published As

Publication number Publication date
CN110362401A (en) 2019-10-22

Similar Documents

Publication Publication Date Title
WO2020253116A1 (en) Batch data execution method, device, storage medium, and member host in cluster
WO2018036167A1 (en) Test task executor assignment method, device, server and storage medium
WO2020233077A1 (en) System service monitoring method, device, and apparatus, and storage medium
WO2018076800A1 (en) Method and system for asynchronously updating data
WO2018155963A1 (en) Method of accelerating execution of machine learning based application tasks in a computing device
WO2020048047A1 (en) System fault warning method, apparatus, and device, and storage medium
WO2018014582A1 (en) Insurance policy data processing method, device, server and storage medium
WO2019037196A1 (en) Method and device for task assignment, and computer-readable storage medium
WO2020015064A1 (en) System fault processing method, apparatus, device and storage medium
WO2020119115A1 (en) Data verification method, device, apparatus, and storage medium
WO2016112558A1 (en) Question matching method and system in intelligent interaction system
WO2020224249A1 (en) Blockchain-based transaction processing method, device and apparatus, and storage medium
WO2018036168A1 (en) Method and device for executing data processing task, execution server, and storage medium
WO2020147264A1 (en) Method, apparatus and device for monitoring multi-system log data, and readable storage medium
WO2020119369A1 (en) Intelligent it operation and maintenance fault positioning method, apparatus and device, and readable storage medium
WO2020224251A1 (en) Block chain transaction processing method, device, apparatus and storage medium
WO2020019405A1 (en) Database monitoring method, device and apparatus, and computer storage medium
WO2020224250A1 (en) Method, apparatus, and device for smart contract triggering, and storage medium
WO2021034024A1 (en) Electronic apparatus and method for controlling thereof
WO2020098075A1 (en) Financial data processing method, apparatus and device, and storage medium
WO2020087981A1 (en) Method and apparatus for generating risk control audit model, device and readable storage medium
WO2018035929A1 (en) Method and apparatus for processing verification code
WO2020233073A1 (en) Blockchain environment test method, device and apparatus, and storage medium
WO2020073494A1 (en) Webpage backdoor detecting method, device, storage medium and apparatus
WO2016179880A1 (en) Network hospital platform, expert platform and emergency expert consultation request method based on expert platform

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19933258

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19933258

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 29/03/2022)