WO2019144552A1 - Data task processing method, application server and computer-readable storage medium - Google Patents

Data task processing method, application server and computer-readable storage medium

Info

Publication number
WO2019144552A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
data
running
dependency
synchronized
Prior art date
Application number
PCT/CN2018/089192
Other languages
French (fr)
Chinese (zh)
Inventor
陈龙
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019144552A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44505 Configuring for program initiating, e.g. using registry, configuration files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/448 Execution paradigms, e.g. implementations of programming paradigms
    • G06F9/4494 Execution paradigms, e.g. implementations of programming paradigms data driven
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45504 Abstract machines for programme code execution, e.g. Java virtual machine [JVM], interpreters, emulators
    • G06F9/45508 Runtime interpretation or emulation, e.g. emulator loops, bytecode interpretation
    • G06F9/45512 Command shells
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/48 Indexing scheme relating to G06F9/48
    • G06F2209/484 Precedence

Definitions

  • the present application relates to the field of data analysis, and in particular, to a data task processing method, an application server, and a computer readable storage medium.
  • Hadoop is an open source distributed infrastructure that allows users to develop distributed programs without knowing the underlying details of the distribution.
  • Hadoop implements a distributed file system that provides high transfer rates to access application data for applications with very large data sets.
  • Oozie is commonly used to implement task scheduling, but task scheduling based on Oozie alone cannot analyze the dependency relationship between data and tasks.
  • For example, a model task can only be executed after the data for a given period has finished synchronizing; however, the control of dependencies between data and tasks is disorganized, and problems are hard to detect once they occur.
  • the present application provides a data task processing method, an application server, and a computer readable storage medium, which can obtain a help document corresponding to a current page operation by analyzing the request help information, thereby improving a user experience.
  • the present application provides a data task processing method, which is applied to an application server, and the method includes:
  • The present application further provides an application server, where the application server includes a memory, a processor, and a data task processing system that is stored in the memory and executable on the processor; the data task processing system implements the following steps when executed by the processor:
  • The present application further provides a computer readable storage medium storing a data task processing system, the data task processing system being executable by at least one processor, such that the at least one processor performs the steps of the data task processing method described above.
  • The application server, the data task processing method, and the computer readable storage medium proposed by the present application first acquire a task list from a terminal device; then configure the task relier to analyze the dependency relationship between data and tasks; then record the execution process of the data synchronization; further, determine whether the data is synchronized according to that execution process and the dependency relationship between the data and the task; finally, if the data has been synchronized, execute the task whose data synchronization is complete, and if the data is not synchronized, issue an early warning message.
  • FIG. 1 is a schematic diagram of an optional application environment of each embodiment of the present application.
  • FIG. 2 is a schematic diagram of an optional hardware architecture of the application server of FIG. 1;
  • FIG. 3 is a schematic diagram of a program module of a first embodiment of a data task processing system of the present application.
  • FIG. 4 is a schematic diagram of a program module of a second embodiment of the data task processing system of the present application.
  • FIG. 5 is a schematic flowchart diagram of a first embodiment of a data task processing method according to the present application.
  • FIG. 6 is a schematic flowchart of a second embodiment of a data task processing method according to the present application.
  • Referring to FIG. 1, a schematic diagram of an optional application environment of each embodiment of the present application is shown.
  • The present application is applicable to an application environment including, but not limited to, the terminal device 1, the application server 2, and the network 3.
  • The terminal device 1 may be a mobile device such as a mobile phone, a smart phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a navigation device, or an in-vehicle device, or a fixed terminal such as a digital TV, a desktop computer, a notebook, or a server.
  • The application server 2 may be a computing device such as a rack server, a blade server, or a tower server.
  • The application server 2 may be a stand-alone server or a server cluster composed of multiple servers.
  • The network 3 may be a wireless or wired network such as an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or a call network.
  • the application server 2 is respectively connected to one or more of the terminal devices 1 through the network 3 for data transmission and interaction.
  • Referring to FIG. 2, a schematic diagram of an optional hardware architecture of the application server 2 in FIG. 1 is shown.
  • The application server 2 may include, but is not limited to, the memory 11, the processor 12, and the network interface 13, which are communicably connected to each other through a system bus. Note that FIG. 2 only shows the application server 2 with components 11-13; it should be understood that not all illustrated components are required, and more or fewer components may be implemented instead.
  • The memory 11 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read only memory (ROM), an electrically erasable programmable read only memory (EEPROM), a programmable read only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like.
  • The memory 11 may be an internal storage unit of the application server 2, such as a hard disk or memory of the application server 2.
  • The memory 11 may also be an external storage device of the application server 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the application server 2.
  • the memory 11 can also include both the internal storage unit of the application server 2 and its external storage device.
  • the memory 11 is generally used to store an operating system installed in the application server 2 and various types of application software, such as program code of the data task processing system 200. Further, the memory 11 can also be used to temporarily store various types of data that have been output or are to be output.
  • the processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments.
  • the processor 12 is typically used to control the overall operation of the application server 2.
  • the processor 12 is configured to run program code or process data stored in the memory 11, such as running the data task processing system 200 and the like.
  • the network interface 13 may comprise a wireless network interface or a wired network interface, which is typically used to establish a communication connection between the application server 2 and other electronic devices.
  • The network interface 13 is mainly used to connect the application server 2 to one or more of the terminal devices 1 through the network 3, and to establish a data transmission channel and a communication connection between the application server 2 and the one or more terminal devices 1.
  • the present application proposes a data task processing system 200.
  • Referring to FIG. 3, a program module diagram of the first embodiment of the data task processing system 200 of the present application is shown.
  • The data task processing system 200 includes a series of computer program instructions stored in the memory 11; when the computer program instructions are executed by the processor 12, the data task processing operations of the embodiments of the present application may be implemented.
  • The data task processing system 200 can be divided into one or more modules based on the particular operations implemented by the various portions of the computer program instructions. For example, in FIG. 3, the data task processing system 200 can be divided into an acquisition module 201, a configuration module 202, a recording module 203, a determination module 204, an execution module 205, and an early warning module 206, wherein:
  • The acquisition module 201 is configured to acquire a task list from the terminal device 1.
  • A Hadoop data platform center is built into the application server 2 and acquires data from the external terminal device 1. When the application server 2 processes the data acquired by the Hadoop data platform center, operations such as data acquisition, data cleaning, and data analysis are required. Each process may involve multiple tasks, some of which must be executed sequentially and some of which may be executed in parallel.
  • The application server 2 acquires a task list from the terminal device 1 through the acquisition module 201.
  • The application server 2 manages the execution and order of these tasks through Oozie.
  • Oozie is a Hadoop-based scheduler in which the scheduling process is written in XML. It can schedule mr, pig, hive, shell, jar, and other jobs.
  • The application server 2 executes the task flow nodes in the order defined in Oozie, which supports fork (splitting into multiple branches) and join (merging multiple nodes into one).
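The fork and join ordering described above can be illustrated with a minimal Python sketch. It does not use Oozie or any Oozie API; the node names (`acquire`, `clean_a`, `clean_b`, `analyze`) and the function `execution_waves` are hypothetical, chosen only to show how nodes after a fork may run in parallel while a join waits for all of its predecessors.

```python
from graphlib import TopologicalSorter


def execution_waves(predecessors):
    """Group workflow nodes into waves; nodes in the same wave may run in parallel."""
    sorter = TopologicalSorter(predecessors)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every node whose predecessors are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical flow: acquire -> fork(clean_a, clean_b) -> join -> analyze
flow = {
    "clean_a": {"acquire"},
    "clean_b": {"acquire"},
    "analyze": {"clean_a", "clean_b"},  # join: waits for both branches
}
waves = execution_waves(flow)
```

Each inner list is one step of the flow: the two cleaning nodes share a wave because they are the branches of the fork, and `analyze` appears only after both have finished.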
  • The configuration module 202 is configured to configure a relier of the tasks in the task list, so as to configure the dependency relationship between the data and the tasks.
  • The task relier is used to configure the dependency relationship between data and tasks, so that only a task whose data is complete can be executed.
  • The application server 2 obtains the valid relier (dependency) configuration of the task flow node fork (a node that branches into multiple nodes) and executes a resume state query statement to output the original result; it then merges the multiple task nodes, completes the dependency state, and de-duplicates the dependency results. Finally, the de-duplicated dependency results are labeled with the configuration slice tag, which completes the scheduling dependencies of all tasks.
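As a rough illustration of the merge, de-duplicate, and label sequence, the following Python sketch combines dependency records from several branches. The record layout (task, dataset pairs), the function name `merge_dependencies`, and the use of a date string as the slice tag are all assumptions made for the example; the source does not specify them.

```python
def merge_dependencies(branch_configs, slice_tag):
    """Merge per-branch (task, dataset) pairs, drop duplicates, and label each result."""
    seen = set()
    merged = []
    for config in branch_configs:
        for task, dataset in config:
            if (task, dataset) in seen:
                continue  # the same dependency was already contributed by another branch
            seen.add((task, dataset))
            merged.append({"task": task, "dataset": dataset, "slice": slice_tag})
    return merged


# Hypothetical dependency records from two forked branches
branch_a = [("risk_model", "orders"), ("risk_model", "customers")]
branch_b = [("risk_model", "orders"), ("daily_report", "orders")]
merged = merge_dependencies([branch_a, branch_b], slice_tag="2018-01-23")
```

The duplicate `("risk_model", "orders")` pair is kept once, and every surviving record carries the same slice label.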
  • The application server 2 loads the configuration file with a Hive load, overwriting the configuration table with the latest configuration file collected from the production environment. The configuration dependency is deployed with the following commands:
  • Step 1: Upload the script, then authorize and format it (allowing private users to operate on it); upload to the /tmp directory.
  • Step 2: Switch users (if your private user is allowed to execute the hive command, you do not need to switch).
  • Step 3: Execute the command.
  • The configuration dependency can be modified through the following configuration deployment commands:
  • Step 1: Switch users (if your private user is allowed to execute the hive command, you do not need to switch).
  • Step 2: Execute the command.
  • the recording module 203 is configured to record an execution process of data synchronization of the task.
  • The application server 2 records the execution process of data synchronization through the recording module 203.
  • The recording module 203 uses a shell to create a log and a status table, and records the execution process and the execution time of the data synchronization.
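The source creates the log and status table from a shell; the sketch below swaps in Python with an in-memory SQLite table purely to show what such a status table might hold. The table name `sync_status`, its columns, and the state names `STARTED` and `FINISHED` are assumptions, not taken from the source.

```python
import sqlite3
from datetime import datetime


def open_status_table(conn):
    """Create the (hypothetical) status table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sync_status ("
        "dataset TEXT, data_date TEXT, state TEXT, recorded_at TEXT)"
    )


def record_sync(conn, dataset, data_date, state, now=None):
    """Append one row describing a step of the synchronization run and its time."""
    recorded_at = (now or datetime.now()).isoformat(timespec="seconds")
    conn.execute(
        "INSERT INTO sync_status VALUES (?, ?, ?, ?)",
        (dataset, data_date, state, recorded_at),
    )


def latest_state(conn, dataset, data_date):
    """Return the most recently recorded state, or None if nothing was recorded."""
    row = conn.execute(
        "SELECT state FROM sync_status WHERE dataset = ? AND data_date = ? "
        "ORDER BY rowid DESC LIMIT 1",
        (dataset, data_date),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
open_status_table(conn)
record_sync(conn, "orders", "2018-01-23", "STARTED")
record_sync(conn, "orders", "2018-01-23", "FINISHED")
```

Keeping every step as its own row preserves the execution history, while `latest_state` gives the current status that a later readiness check would consult.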
  • the determining module 204 is configured to determine, according to the execution process of the data synchronization and the dependency relationship between the data and the task, whether the data is synchronized.
  • the application server 2 first determines whether the data has been synchronized by the determining module 204. The application server 2 determines whether the data is synchronized according to the execution process of the data synchronization recorded in the log and the status table, the execution time of the data synchronization, and the dependency relationship between the data and the task.
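A minimal sketch of this check, assuming the dependency relationship is a mapping from task to the datasets it needs and the status table has been reduced to a mapping from dataset to its latest state. The names `unsynced_inputs`, `risk_model`, `orders`, and `customers` are hypothetical.

```python
def unsynced_inputs(task, dependencies, sync_state):
    """Return the datasets the task depends on whose synchronization is not finished."""
    return [
        dataset
        for dataset in dependencies.get(task, [])
        if sync_state.get(dataset) != "FINISHED"
    ]


# Hypothetical dependency configuration and recorded synchronization states
dependencies = {"risk_model": ["orders", "customers"]}
sync_state = {"orders": "FINISHED", "customers": "STARTED"}
missing = unsynced_inputs("risk_model", dependencies, sync_state)
```

An empty result means the data the task relies on is complete; a non-empty result names exactly which inputs are still outstanding.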
  • The execution module 205 is configured to execute the task whose data synchronization is complete when the data has been synchronized.
  • The early warning module 206 is configured to issue an early warning message if the data is not synchronized.
  • The execution module 205 executes the task only when the data is synchronized, that is, when the data is complete.
  • The early warning module 206 sends the warning information when the data is not synchronized.
  • The warning information includes, but is not limited to, the data that has not been synchronized and the time of the last synchronization, so as to notify staff to intervene manually.
  • The data task processing system 200 proposed by the present application first acquires a task list from the terminal device 1; then configures the task relier to analyze the dependency relationship between the data and the task; then records the execution process of the data synchronization; further, determines whether the data is synchronized according to that execution process and the dependency relationship between the data and the task; finally, if the data has been synchronized, executes the task whose data synchronization is complete, and if the data is not synchronized, issues an early warning message.
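The execute-or-warn decision can be sketched as follows. The function `run_or_warn`, its parameters, and the message wording are hypothetical; the `run` and `notify` callbacks stand in for whatever actually launches a task and alerts staff.

```python
from datetime import datetime


def run_or_warn(task, missing, last_sync, run, notify):
    """Run the task if no input is missing; otherwise send a warning and skip it."""
    if not missing:
        run(task)
        return True
    notify(
        f"task {task} blocked; unsynchronized data: {', '.join(missing)}; "
        f"last synchronization: {last_sync.isoformat()}"
    )
    return False


executed, warnings = [], []
last = datetime(2018, 1, 23, 2, 0)
run_or_warn("daily_report", [], last, executed.append, warnings.append)
run_or_warn("risk_model", ["customers"], last, executed.append, warnings.append)
```

The warning carries the two items the text lists, the unsynchronized data and the time of the last synchronization, so that whoever receives it can intervene without further lookup.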
  • The data task processing system 200 further includes a sorting module 207, wherein:
  • The acquisition module 201 is further configured to acquire a waiting round running task.
  • The task is executed only when its data synchronization is completed.
  • The tasks include, but are not limited to, round running tasks and re-run tasks.
  • A round running task is a task that is executed cyclically within its effective dates.
  • A re-run task is a task that needs to be re-executed after a failed execution.
  • The application server 2 obtains the waiting round running task through the acquisition module 201 and determines whether the dependency configuration is satisfied; if it is satisfied, the application server 2 analyzes the task's series of valid round running dates.
  • The round running date series is based on either working days or natural (calendar) days, and by default the series reaches back at most 730 days into the past.
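A minimal sketch of building such a date series, under two stated assumptions: working days are taken to be Monday through Friday with no holiday calendar, and the 730-day limit is applied by clipping the start of the series relative to the current date. The function name `round_run_dates` and its parameters are hypothetical.

```python
from datetime import date, timedelta

MAX_LOOKBACK_DAYS = 730  # default maximum reach into the past, as stated in the text


def round_run_dates(start, end, today, working_days_only=False):
    """List the run dates between start and end, clipped to the look-back window."""
    earliest = today - timedelta(days=MAX_LOOKBACK_DAYS)
    day = max(start, earliest)
    last = min(end, today)
    dates = []
    while day <= last:
        # weekday() is 0-4 for Monday-Friday (assumption: no holiday calendar)
        if not working_days_only or day.weekday() < 5:
            dates.append(day)
        day += timedelta(days=1)
    return dates


series = round_run_dates(
    date(2018, 1, 19), date(2018, 1, 24), today=date(2018, 1, 24), working_days_only=True
)
```

With `working_days_only=False` the same call returns every calendar day in the range, which corresponds to the natural-day series.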
  • The code for implementing the round running task is:
  • Step 1: Switch users (if your private user is allowed to execute the hive command, you do not need to switch).
  • Step 2: Execute the command.
  • The acquisition module 201 is further configured to acquire a waiting re-run task.
  • Table 3 is a configuration requirement of the re-run task in an implementation of the present application:
  • The code for implementing the re-run task is:
  • Step 1: Switch users (if your private user is allowed to execute the hive command, you do not need to switch).
  • Step 2: Execute the command.
  • The sorting module 207 is configured to sort the round running tasks and the re-run tasks according to priority level.
  • The execution module 205 is further configured to preferentially execute tasks with a high priority level.
  • The sorting module 207 assigns priority to the round running tasks and the re-run tasks according to the chronological order in which they were acquired. It can be understood that, in other embodiments of the present application, the priority requirement can be set according to actual needs.
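One way to realize priority by acquisition order is a heap keyed on the acquisition time, sketched below. The class `TaskQueue`, the task names, and the kind labels `round` and `rerun` are hypothetical; a sequence counter breaks ties so that tasks acquired at the same instant keep their insertion order.

```python
import heapq
from datetime import datetime
from itertools import count


class TaskQueue:
    """Serve waiting tasks in the order they were acquired (earliest first)."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker for identical acquisition times

    def push(self, name, kind, acquired_at):
        heapq.heappush(self._heap, (acquired_at, next(self._seq), kind, name))

    def pop(self):
        _, _, kind, name = heapq.heappop(self._heap)
        return name, kind

    def __len__(self):
        return len(self._heap)


queue = TaskQueue()
queue.push("risk_model", "rerun", datetime(2018, 1, 24, 9, 30))
queue.push("daily_report", "round", datetime(2018, 1, 24, 8, 0))
queue.push("weekly_report", "round", datetime(2018, 1, 24, 9, 30))
order = [queue.pop() for _ in range(len(queue))]
```

Because the text allows other priority rules in other embodiments, replacing the first element of the heap key with a different priority value would change the policy without touching the rest of the queue.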
  • the warning module 206 is further configured to monitor a currently executed task, and issue an early warning when an abnormality occurs during the execution of the task.
  • the application server 2 monitors the currently executed task through the early warning module 206. When an abnormality occurs during the execution of the task, an early warning is issued to notify the staff to process in time.
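Monitoring a running task can be sketched as a wrapper that catches any abnormality and forwards it as a warning. The function `run_with_monitoring` and the callbacks are hypothetical stand-ins; a real deployment would watch the scheduler's job status rather than a Python callable.

```python
def run_with_monitoring(name, action, notify):
    """Run the task; if it raises, send an early warning and report failure."""
    try:
        action()
    except Exception as exc:  # any abnormality during execution triggers a warning
        notify(f"task {name} failed: {exc!r}")
        return False
    return True


def failing_task():
    raise RuntimeError("source table missing")


alerts = []
ok = run_with_monitoring("daily_report", lambda: None, alerts.append)
failed = run_with_monitoring("risk_model", failing_task, alerts.append)
```

Returning a boolean lets the caller decide whether the task should be queued again as a re-run task.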
  • The data task processing system 200 proposed by the present application can further sort the acquired round running tasks and re-run tasks by priority level, preferentially execute high-priority tasks, and monitor the currently executing task, issuing an early warning when an abnormality occurs during execution so that tasks are supervised.
  • the present application also proposes a data task processing method.
  • Referring to FIG. 5, a schematic flowchart of the first embodiment of the data task processing method of the present application is shown.
  • The order of execution of the steps in the flowchart shown in FIG. 5 may be changed according to different requirements, and some steps may be omitted.
  • Step S301: acquire a task list from the terminal device 1.
  • A Hadoop data platform center is built into the application server 2 and acquires data from the external terminal device 1. When the application server 2 processes the data acquired by the Hadoop data platform center, operations such as data acquisition, data cleaning, and data analysis are required. Each process may involve multiple tasks, some of which must be executed sequentially and some of which may be executed in parallel.
  • The application server 2 acquires a task list from the terminal device 1.
  • The application server 2 manages the execution and order of these tasks through Oozie.
  • Oozie is a Hadoop-based scheduler in which the scheduling process is written in XML. It can schedule mr, pig, hive, shell, jar, and other jobs.
  • The application server 2 executes the task flow nodes in the order defined in Oozie, which supports fork (splitting into multiple branches) and join (merging multiple nodes into one).
  • Step S302: configure a relier of the tasks in the task list, so as to configure the dependency relationship between the data and the tasks.
  • The task relier is used to configure the dependency relationship between data and tasks, so that only a task whose data is complete can be executed.
  • The application server 2 obtains the valid relier (dependency) configuration of the task flow node fork (a node that branches into multiple nodes) and executes a resume state query statement to output the original result; it then merges the multiple task nodes, completes the dependency state, and de-duplicates the dependency results. Finally, the de-duplicated dependency results are labeled with the configuration slice tag, which completes the scheduling dependencies of all tasks.
  • The application server 2 loads the configuration file with a Hive load, overwriting the configuration table with the latest configuration file collected from the production environment. The configuration dependency is deployed with the following commands:
  • Step 1: Upload the script, then authorize and format it (allowing private users to operate on it); upload to the /tmp directory.
  • Step 2: Switch users (if your private user is allowed to execute the hive command, you do not need to switch).
  • Step 3: Execute the command.
  • The configuration dependency can be modified through the following configuration deployment commands:
  • Step 1: Switch users (if your private user is allowed to execute the hive command, you do not need to switch).
  • Step 2: Execute the command.
  • Step S303: record the execution process of data synchronization of the task.
  • The application server 2 records the execution process of data synchronization.
  • The application server 2 uses a shell to create a log and a status table, and records the execution process and the execution time of the data synchronization.
  • Step S304: determine whether the data is synchronized according to the execution process of the data synchronization and the dependency relationship between the data and the task.
  • The application server 2 first determines whether the data has been synchronized. It does so according to the execution process of the data synchronization recorded in the log and the status table, the execution time of the data synchronization, and the dependency relationship between the data and the task.
  • Step S305: when the data has been synchronized, execute the task whose data synchronization is complete.
  • Step S306: if the data is not synchronized, issue an early warning message.
  • The application server 2 executes the task only when the data is synchronized, that is, when the data is complete.
  • When the data is not synchronized, the application server 2 sends an early warning message.
  • The warning information includes, but is not limited to, the data that has not been synchronized and the time of the last synchronization, so as to notify staff to intervene manually.
  • The data task processing method proposed by the present application first acquires a task list from the terminal device 1; then configures the task relier to analyze the dependency relationship between the data and the task; then records the execution process of the data synchronization; further, determines whether the data is synchronized according to that execution process and the dependency relationship between the data and the task; finally, if the data has been synchronized, executes the task whose data synchronization is complete, and if the data is not synchronized, issues an early warning message.
  • FIG. 6 is a schematic flowchart diagram of a second embodiment of a data task processing method of the present application. In this embodiment, the method further includes the following steps:
  • Step S401: acquire a waiting round running task.
  • The task is executed only when its data synchronization is completed.
  • The tasks include, but are not limited to, round running tasks and re-run tasks.
  • A round running task is a task that is executed cyclically within its effective dates.
  • A re-run task is a task that needs to be re-executed after a failed execution.
  • The application server 2 acquires the waiting round running task and determines whether the relier configuration is satisfied; if it is satisfied, the application server 2 analyzes the task's series of valid round running dates.
  • The round running date series is based on either working days or natural (calendar) days.
  • By default, the round running series reaches back at most 730 days.
  • The code for implementing the round running task is:
  • Step 1: Switch users (if your private user is allowed to execute the hive command, you do not need to switch).
  • Step 2: Execute the command.
  • Step S402: acquire a waiting re-run task.
  • Table 3 is a configuration requirement of the re-run task in an implementation of the present application:
  • The code for implementing the re-run task is:
  • Step 1: Switch users (if your private user is allowed to execute the hive command, you do not need to switch).
  • Step 2: Execute the command.
  • Step S403: sort the round running tasks and the re-run tasks according to priority level.
  • Step S404: preferentially execute tasks with a high priority level.
  • The application server 2 assigns priority to the round running tasks and the re-run tasks according to the chronological order in which they were acquired. It can be understood that, in other embodiments of the present application, the priority requirement can be set according to actual needs.
  • Step S405: monitor the currently executing task, and issue an early warning when an abnormality occurs during execution.
  • The application server 2 monitors the currently executing task; when an abnormality occurs during execution, an early warning is issued to notify staff to handle it in time.
  • The data task processing method proposed by the present application can further sort the acquired round running tasks and re-run tasks by priority level, preferentially execute high-priority tasks, and monitor the currently executing task, issuing an early warning when an abnormality occurs during execution so that tasks are supervised.
  • The methods of the foregoing embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course can also be implemented in hardware, but in many cases the former is the better implementation.
  • Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the methods described in the various embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Computer And Data Communications (AREA)

Abstract

Disclosed is a data task processing method, the method comprising: acquiring a task list from a terminal device; configuring a task relier so as to analyze a reliance relationship between data and a task; recording an execution process of data synchronization; judging whether data is synchronized according to the execution process of data synchronization, and the reliance relationship between the data and the task; if the data synchronization has been completed, executing the task for which the data synchronization has been completed; and if the data synchronization has not been completed, sending out pre-warning information. Also provided in the present application is an application server and a computer-readable storage medium. By means of the data task processing method, application server and computer-readable storage medium provided in the present application, whether data is synchronized can be judged according to the execution process of data synchronization and the reliance relationship between data and a task, so that the task can be executed only if the data synchronization is completed.

Description

Data task processing method, application server, and computer-readable storage medium
Priority claim
This application claims priority to Chinese Patent Application No. 201810066359.7, filed with the Chinese Patent Office on January 24, 2018 and entitled "Data Task Processing Method, Application Server, and Computer-Readable Storage Medium", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of data analysis, and in particular to a data task processing method, an application server, and a computer-readable storage medium.
Background
Hadoop is an open-source distributed infrastructure that lets users develop distributed programs without knowing the underlying details of the distribution. Hadoop implements a distributed file system that provides high-throughput access to application data and suits applications with very large data sets. Task scheduling is usually implemented with Oozie, but scheduling based on Oozie alone cannot analyze the dependency relationship between data and tasks. For example, a model task can be executed only after the data synchronization for a given period has finished running; yet the dependency control between data and tasks is disorganized, and problems are hard to detect once they occur.
Summary of the invention
In view of this, the present application proposes a data task processing method, an application server, and a computer-readable storage medium that can obtain a help document corresponding to the current page operation by analyzing the request help information, thereby improving the user experience.
First, to achieve the above object, the present application proposes a data task processing method applied to an application server, the method comprising:
acquiring a task list from a terminal device;
configuring the task relier so as to analyze the dependency relationship between data and tasks;
recording the execution process of data synchronization;
determining, according to the execution process of data synchronization and the dependency relationship between data and tasks, whether the data has been synchronized;
if data synchronization has been completed, executing the tasks whose data synchronization has been completed;
if data synchronization has not been completed, issuing warning information.
In addition, to achieve the above object, the present application further provides an application server comprising a memory and a processor, the memory storing a data task processing system executable on the processor, wherein the data task processing system, when executed by the processor, implements the following steps:
acquiring a task list from a terminal device;
configuring the task relier so as to analyze the dependency relationship between data and tasks;
recording the execution process of data synchronization;
determining, according to the execution process of data synchronization and the dependency relationship between data and tasks, whether the data has been synchronized;
if data synchronization has been completed, executing the tasks whose data synchronization has been completed;
if data synchronization has not been completed, issuing warning information.
Further, to achieve the above object, the present application also provides a computer-readable storage medium storing a data task processing system, the data task processing system being executable by at least one processor to cause the at least one processor to perform the steps of the data task processing method described above.
Compared with the prior art, the application server, data task processing method, and computer-readable storage medium proposed in the present application first acquire a task list from a terminal device; then configure the task relier so as to analyze the dependency relationship between data and tasks; next record the execution process of data synchronization; further determine, according to that execution process and the dependency relationship between data and tasks, whether the data has been synchronized; and finally, if data synchronization has been completed, execute the tasks whose data synchronization has been completed, or, if it has not been completed, issue warning information. This avoids the disorganized dependency control between data and tasks found in the prior art and, by judging synchronization from the recorded execution process and the data-task dependency relationship, ensures that a task is executed only once its data has been synchronized.
Description of the drawings
FIG. 1 is a schematic diagram of an optional application environment for the embodiments of the present application;
FIG. 2 is a schematic diagram of an optional hardware architecture of the application server in FIG. 1;
FIG. 3 is a schematic diagram of the program modules of a first embodiment of the data task processing system of the present application;
FIG. 4 is a schematic diagram of the program modules of a second embodiment of the data task processing system of the present application;
FIG. 5 is a schematic flowchart of a first embodiment of the data task processing method of the present application;
FIG. 6 is a schematic flowchart of a second embodiment of the data task processing method of the present application.
The implementation, functional features, and advantages of the present application will be further described with reference to the embodiments and the accompanying drawings.
Detailed description
To make the objects, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present application and are not intended to limit it. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application.
It should be noted that descriptions involving "first", "second", and the like in the present application are for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features referred to. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with one another, but only on the basis that a person of ordinary skill in the art can implement the combination; when a combination of technical solutions is contradictory or cannot be implemented, the combination should be considered not to exist and falls outside the scope of protection claimed by the present application.
FIG. 1 is a schematic diagram of an optional application environment for the embodiments of the present application.
In this embodiment, the present application can be applied in an application environment that includes, but is not limited to, a terminal device 1, an application server 2, and a network 3. The terminal device 1 may be a mobile device such as a mobile phone, a smartphone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a navigation device, or an in-vehicle device, or a fixed terminal such as a digital TV, a desktop computer, a notebook, or a server. The application server 2 may be a computing device such as a rack server, a blade server, a tower server, or a cabinet server, and may be a standalone server or a server cluster composed of multiple servers. The network 3 may be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, or a telephone network.
The application server 2 is communicatively connected to one or more terminal devices 1 through the network 3 for data transmission and interaction.
FIG. 2 is a schematic diagram of an optional hardware architecture of the application server 2 in FIG. 1.
In this embodiment, the application server 2 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 that are communicatively connected to one another through a system bus. It should be noted that FIG. 2 shows only the application server 2 with components 11-13; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
The memory 11 includes at least one type of readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disc. In some embodiments, the memory 11 may be an internal storage unit of the application server 2, such as its hard disk or main memory. In other embodiments, the memory 11 may be an external storage device of the application server 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the application server 2. The memory 11 may also include both an internal storage unit of the application server 2 and an external storage device. In this embodiment, the memory 11 is generally used to store the operating system and the various application software installed on the application server 2, such as the program code of the data task processing system 200. The memory 11 may also be used to temporarily store data that has been output or is to be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 12 is generally used to control the overall operation of the application server 2. In this embodiment, the processor 12 runs the program code stored in the memory 11 or processes data, for example by running the data task processing system 200.
The network interface 13 may include a wireless network interface or a wired network interface, and is generally used to establish communication connections between the application server 2 and other electronic devices. In this embodiment, the network interface 13 is mainly used to connect the application server 2 to one or more terminal devices 1 through the network 3, establishing a data transmission channel and a communication connection between the application server 2 and the one or more terminal devices 1.
So far, the application environment of the embodiments and the hardware structure and functions of the related devices have been described in detail. The embodiments of the present application are presented below on the basis of this application environment and these devices.
First, the present application proposes a data task processing system 200.
FIG. 3 is a program module diagram of the first embodiment of the data task processing system 200 of the present application.
In this embodiment, the data task processing system 200 comprises a series of computer program instructions stored in the memory 11; when these instructions are executed by the processor 12, the data task processing operations of the embodiments of the present application are carried out. In some embodiments, the data task processing system 200 may be divided into one or more modules according to the specific operations implemented by the respective parts of the computer program instructions. For example, in FIG. 3 the data task processing system 200 is divided into an acquisition module 201, a configuration module 202, a recording module 203, a determination module 204, an execution module 205, and a warning module 206, wherein:
The acquisition module 201 is configured to acquire a task list from the terminal device 1.
Specifically, a Hadoop data platform center is built into the application server 2 and acquires data from the external terminal device 1. When the application server 2 processes the data acquired by the Hadoop data platform center, it needs to perform operations such as data collection, data cleaning, and data analysis. Each of these processes may involve multiple tasks, some of which must be executed sequentially and some of which can be executed in parallel.
In this embodiment, the application server 2 acquires the task list from the terminal device 1 through the acquisition module 201 and manages the execution and ordering of these tasks through Oozie. Oozie is a Hadoop-based scheduler in which the scheduling flow is written as XML; it can schedule MapReduce (mr), Pig, Hive, shell, jar, and other jobs. The application server 2 executes the task flow nodes in sequence through Oozie, which supports fork (branching into multiple nodes) and join (merging multiple nodes into one).
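The fork and join semantics mentioned here can be illustrated outside Oozie. The following is a minimal Python sketch, not Oozie's XML workflow format or API: `run_fork_join` and the three task functions are hypothetical names introduced for illustration, and a thread pool merely stands in for Oozie's parallel branches.

```python
from concurrent.futures import ThreadPoolExecutor

def run_fork_join(branches, join_step):
    """Run every branch in parallel (fork), then hand all results to join_step (join)."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = [pool.submit(branch) for branch in branches]
        # result() blocks, so join_step starts only after every branch has finished
        results = [future.result() for future in futures]
    return join_step(results)

# Hypothetical branch tasks; real branches would be Hive, shell, or MapReduce actions
def collect_logs():
    return "logs"

def collect_orders():
    return "orders"

def merge(results):
    return " + ".join(results)

outcome = run_fork_join([collect_logs, collect_orders], merge)
print(outcome)
```

The results are gathered in the order the branches were listed, so the join step sees a stable input regardless of which branch finishes first.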
The configuration module 202 is configured to configure the relier of each task in the task list, so as to configure the dependency relationship between data and tasks.
Specifically, the task relier is configured in order to set up the dependency relationship between data and tasks; only tasks whose data is complete can be executed. In this embodiment, the application server 2 obtains the valid relier (dependency) configuration of the task flow fork nodes (which branch into multiple nodes), executes the relier state query statements, and outputs the raw results; it then merges the multiple task nodes, fills in missing dependency states, and deduplicates the dependency results; finally, it labels the deduplicated dependency results with dependency-configuration slice tags, completing the scheduling dependencies of all tasks.
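The resolution steps described above (query each relier's state, merge, fill in missing states, deduplicate, label with a slice tag) can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions: the row layout, the `'N'` default for missing states, and the name `resolve_dependencies` are hypothetical, and the in-memory lists stand in for the results of the Hive state queries.

```python
def resolve_dependencies(query_results, expected_dates, slice_tag):
    """Merge per-relier query results into one deduplicated, labelled list.

    query_results: dict mapping relier_name -> list of (datestr, state) rows
    expected_dates: dates every relier is expected to report on
    slice_tag: label attached to each resolved row (hypothetical field)
    """
    merged = {}
    for relier, rows in query_results.items():
        for datestr, state in rows:
            # Deduplicate on (relier, date); a 'Y' row wins over any other state
            key = (relier, datestr)
            if merged.get(key) != "Y":
                merged[key] = state
        for datestr in expected_dates:
            # Fill in dates the query did not return (assumed to mean "not synchronized")
            merged.setdefault((relier, datestr), "N")
    return sorted((r, d, s, slice_tag) for (r, d), s in merged.items())

rows = resolve_dependencies(
    {"i_jt-aml-999-cd": [("20180101", "Y"), ("20180101", "Y")]},
    expected_dates=["20180101", "20180102"],
    slice_tag="slice-1",
)
print(rows)
```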
Table 1 gives the dependency configuration format requirements in this embodiment.
Table 1
Figure PCTCN2018089192-appb-000001
In this embodiment, the application server 2 loads the configuration file with Hive's load command, overwriting the configuration table. The latest configuration file is collected from the production environment, modified, and then handed to operations for deployment. The deployment commands that configure the relier are:
Step 1: Upload the script to the /tmp directory, then grant permissions and format it (so that the personal user can operate on it)
chmod 777 /tmp/relier_config_all.txt
Step 2: Switch user (not needed if your personal user is allowed to run hive commands)
sudo su - hduser0006
Step 3: Execute the command
hive -e "use aml_awbs; set mapred.job.queue.name=queue_0006_02;
truncate table fm_relier_check_script;
load data local inpath '/tmp/relier_config_all.txt' into table aml_awbs.fm_relier_check_script;"
In another embodiment of the invention, the relier can be configured by modifying the configuration with the following deployment commands:
Step 1: Switch user (not needed if your personal user is allowed to run hive commands)
sudo su - hduser0006
Step 2: Execute the command
hive -e "set mapred.job.queue.name=queue_0006_02;
insert overwrite table aml_awbs.fm_relier_check_script
select relier_name,
src_job_name,
if(relier_name='i_jt-aml-999-cd',
'select concat(y,m,d) datestr, \'Y\' state from aml_awbs.JOB_STATE where JOB_NAME=\'jt-aml-999-cd\'',
relier_name) script_string,
fork
from aml_awbs.fm_relier_check_script"
The recording module 203 is configured to record the execution process of data synchronization for the tasks.
Specifically, as noted above, only tasks whose data is complete can be executed. Therefore, to ensure that the data the Hadoop data platform center acquires from the external terminal device 1 is complete, the application server 2 records the execution process of data synchronization through the recording module 203 whenever the data is updated or modified. In this embodiment, the recording module 203 uses shell to create a log and a state table that record the execution process of data synchronization and the time at which it was executed.
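A state table of the kind described above could look like the following minimal sketch. The text says the log and state table are created with shell; this Python version with an in-memory SQLite table is only an illustration, and the table name `job_state`, its columns, and the `record_sync` helper are assumptions made here rather than the patent's actual schema.

```python
import sqlite3
from datetime import datetime

# In-memory table standing in for the state table (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_state (job_name TEXT, datestr TEXT, state TEXT, finished_at TEXT)"
)

def record_sync(job_name, datestr, state, finished_at=None):
    """Append one row describing a synchronization run and when it ran."""
    finished_at = finished_at or datetime.now().isoformat(timespec="seconds")
    conn.execute(
        "INSERT INTO job_state VALUES (?, ?, ?, ?)",
        (job_name, datestr, state, finished_at),
    )

record_sync("jt-aml-999-cd", "20180101", "Y", "2018-01-02T03:00:00")
record_sync("jt-aml-999-cd", "20180102", "N", "2018-01-03T03:00:00")
history = conn.execute(
    "SELECT datestr, state FROM job_state ORDER BY datestr"
).fetchall()
print(history)
```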
The determination module 204 is configured to determine, according to the execution process of data synchronization and the dependency relationship between data and tasks, whether data synchronization has been completed.
Specifically, before a task is executed, the application server 2 first determines through the determination module 204 whether the data has been fully synchronized. The determination is based on the execution process and execution time of data synchronization recorded in the log and state table created by shell, together with the dependency relationship between data and tasks.
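The check described above amounts to looking up each of a task's dependencies in the recorded state. The sketch below is a hedged illustration in Python: the dictionary layouts, the `'Y'` marker for a finished synchronization, and the job name `jt-aml-998-cd` are assumptions introduced for the example.

```python
def is_synchronized(task, run_date, dependencies, sync_state):
    """Return (ok, missing); ok is True when every dependency of task is 'Y' for run_date."""
    missing = [
        job for job in dependencies.get(task, [])
        if sync_state.get((job, run_date)) != "Y"
    ]
    return (not missing, missing)

# Hypothetical data: one task depending on two synchronization jobs
dependencies = {"model_task": ["jt-aml-999-cd", "jt-aml-998-cd"]}
sync_state = {("jt-aml-999-cd", "20180101"): "Y"}

ok, missing = is_synchronized("model_task", "20180101", dependencies, sync_state)
print(ok, missing)
```

Returning the list of unsynchronized jobs alongside the boolean lets a caller put that detail straight into a warning message.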
The execution module 205 is configured to execute the tasks whose data synchronization has been completed once the data has been synchronized.
The warning module 206 is configured to issue warning information if data synchronization has not been completed.
Specifically, the execution module 205 executes a task only when its data has been synchronized, that is, when the data is complete. When data synchronization has not been completed, the warning module 206 issues warning information. In this embodiment, the warning information includes, but is not limited to, information about the data that has not been synchronized and the time of the last synchronization, so that staff can be notified to intervene manually.
Through the program modules 201-206 described above, the data task processing system 200 proposed in the present application first acquires a task list from the terminal device 1; then configures the task relier so as to analyze the dependency relationship between data and tasks; next records the execution process of data synchronization; further determines, according to that execution process and the dependency relationship between data and tasks, whether the data has been synchronized; and finally, if data synchronization has been completed, executes the tasks whose data synchronization has been completed, or, if it has not been completed, issues warning information. This avoids the disorganized dependency control between data and tasks found in the prior art and, by judging synchronization from the recorded execution process and the data-task dependency relationship, ensures that a task is executed only once its data has been synchronized.
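Taken together, the flow of modules 201-206 can be sketched as a single gating loop. This is a minimal Python sketch under stated assumptions, not the patented implementation: `process_tasks`, the callback parameters, and the layout of `sync_state` are hypothetical, and the warning payload simply carries the two items the text names (the unsynchronized data and the last synchronization time).

```python
def process_tasks(task_list, dependencies, sync_state, run_date, execute, alert):
    """Execute each task whose dependencies are synchronized; alert for the others."""
    executed, blocked = [], []
    for task in task_list:
        pending = [
            job for job in dependencies.get(task, [])
            if sync_state.get((job, run_date), {}).get("state") != "Y"
        ]
        if pending:
            # Report each unsynchronized job with its last recorded sync time
            last = {job: sync_state.get((job, run_date), {}).get("last_sync") for job in pending}
            alert(task, last)
            blocked.append(task)
        else:
            execute(task)
            executed.append(task)
    return executed, blocked

# Hypothetical example data
executed_log, alerts = [], []
sync_state = {
    ("jt-aml-999-cd", "20180101"): {"state": "Y", "last_sync": "2018-01-02T03:00:00"},
    ("jt-aml-998-cd", "20180101"): {"state": "N", "last_sync": "2017-12-31T03:00:00"},
}
dependencies = {"model_a": ["jt-aml-999-cd"], "model_b": ["jt-aml-998-cd"]}
executed, blocked = process_tasks(
    ["model_a", "model_b"], dependencies, sync_state, "20180101",
    execute=executed_log.append,
    alert=lambda task, last: alerts.append((task, last)),
)
print(executed, blocked, alerts)
```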
Further, a second embodiment of the present application (shown in FIG. 4) is proposed on the basis of the first embodiment of the data task processing system 200 described above. In this embodiment, the data task processing system 200 further includes a sorting module 207, wherein:
The acquisition module 201 is further configured to acquire waiting round-run tasks.
As noted above, in the first embodiment a task is executed only once its data synchronization is complete. In this embodiment, the tasks include, but are not limited to, round-run tasks and rerun tasks. A round-run task is a task executed cyclically within its valid date range; a rerun task is a task that must be executed again after a failed execution.
Specifically, the application server 2 acquires the waiting round-run tasks through the acquisition module 201 and determines whether the relier configuration is satisfied. If it is, the server analyzes the series of valid round-run dates for the task. In this embodiment, the round-run task sequence is based on a series of working days or natural (calendar) days, and by default the round-run sequence extends at most 730 days into the past.
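The valid round-run date series can be sketched as a bounded date range. The following Python sketch is illustrative only: the function name and parameters are hypothetical, "working day" is approximated as Monday to Friday (public holidays are not modeled), and the 730-day limit is taken from the default mentioned in the text.

```python
from datetime import date, timedelta

def round_run_dates(start, end, today, workdays_only=False, max_lookback_days=730):
    """List the valid run dates between start and end, limited to the lookback window."""
    earliest = max(start, today - timedelta(days=max_lookback_days))
    latest = min(end, today)
    dates = []
    current = earliest
    while current <= latest:
        # weekday() < 5 means Monday to Friday
        if not workdays_only or current.weekday() < 5:
            dates.append(current)
        current += timedelta(days=1)
    return dates

days = round_run_dates(
    date(2018, 1, 1), date(2018, 1, 7), today=date(2018, 1, 24), workdays_only=True
)
print([d.isoformat() for d in days])
```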
Table 2 gives the configuration requirements for round-run tasks:
Table 2
Figure PCTCN2018089192-appb-000002
Figure PCTCN2018089192-appb-000003
In this embodiment, the code implementing a round-run task is:
Insert into the configuration table:
Step 1: Switch user (not needed if your personal user is allowed to run hive commands)
sudo su - hduser0006
Step 2: Execute the command
Figure PCTCN2018089192-appb-000004
The acquisition module 201 is further configured to acquire waiting rerun tasks.
Specifically, Table 3 gives the configuration requirements for rerun tasks in one embodiment of the present application:
Table 3
Figure PCTCN2018089192-appb-000005
In this embodiment, the code implementing a rerun task is:
Insert into the configuration table:
Step 1: Switch user (not needed if your personal user is allowed to run hive commands)
sudo su - hduser0006
Step 2: Execute the command
hive -e "set mapred.job.queue.name=queue_0006_02;
       insert into table aml_awbs.fm_model_task_rerun_set
       select 'ky','zq','1214-25','20141202','y','y','1.0' from default.dual"
The sorting module 207 is configured to sort the round-run tasks and the rerun tasks by priority.
The execution module 205 is further configured to execute higher-priority tasks first.
Specifically, in this embodiment, the sorting module 207 ranks the round-run tasks and the rerun tasks by priority according to the chronological order in which the tasks were acquired. It can be understood that, in other embodiments of the present application, the priority rules may be set according to actual needs.
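Ordering by acquisition time can be sketched with a stable sort. This Python sketch is only an illustration: the `(name, acquired_at)` tuple layout and the task names are hypothetical, and "earlier acquisition means higher priority" is one reading of the chronological rule described above.

```python
def order_by_arrival(tasks):
    """Return task names sorted so that earlier-acquired tasks come first.

    tasks: list of (name, acquired_at) tuples; sorted() is stable, so tasks
    acquired at the same time keep their original relative order.
    """
    return [name for name, _ in sorted(tasks, key=lambda item: item[1])]

# Hypothetical mix of round-run and rerun tasks with acquisition timestamps
queue = [("rerun:1214-25", 30), ("round:daily_model", 10), ("rerun:1214-26", 20)]
ordered = order_by_arrival(queue)
print(ordered)
```

A different priority rule, as the text allows, would only change the `key` function.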
The warning module 206 is further configured to monitor the currently executing task and to issue a warning when an exception occurs during task execution.
Specifically, the application server 2 monitors the currently executing task through the warning module 206; when an exception occurs during task execution, a warning is issued so that staff can be notified to handle it promptly.
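Monitoring of this kind can be sketched as a wrapper that converts an exception into a warning. The Python sketch below is a simplified illustration: `run_with_monitoring` and the `alert` callback are hypothetical names, and a real deployment would deliver the warning through whatever notification channel the operators use.

```python
def run_with_monitoring(task_name, action, alert):
    """Run action; on any exception, send a warning and report failure."""
    try:
        action()
    except Exception as exc:
        alert(f"task {task_name} failed: {exc}")
        return False
    return True

def failing_action():
    # Hypothetical task body that fails partway through
    raise RuntimeError("source table missing")

warnings_sent = []
succeeded = run_with_monitoring("model_b", failing_action, warnings_sent.append)
print(succeeded, warnings_sent)
```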
Through the program module 207 described above, the data task processing system 200 proposed in the present application can also sort the acquired round-run tasks and rerun tasks by priority, execute higher-priority tasks first, and monitor the currently executing task, issuing a warning when an exception occurs during execution, thereby supervising the tasks.
In addition, the present application also proposes a data task processing method.
FIG. 5 is a schematic flowchart of the first embodiment of the data task processing method of the present application. In this embodiment, the order in which the steps in the flowchart of FIG. 5 are executed may be changed according to different needs, and some steps may be omitted.
Step S301: acquire a task list from the terminal device 1.
Specifically, a Hadoop data platform center is built into the application server 2 and acquires data from the external terminal device 1. When the application server 2 processes the data acquired by the Hadoop data platform center, it needs to perform operations such as data collection, data cleaning, and data analysis. Each of these processes may involve multiple tasks, some of which must be executed sequentially and some of which can be executed in parallel.
In this embodiment, the application server 2 acquires the task list from the terminal device 1 and manages the execution and ordering of these tasks through Oozie. Oozie is a Hadoop-based scheduler in which the scheduling flow is written as XML; it can schedule MapReduce (mr), Pig, Hive, shell, jar, and other jobs. The application server 2 executes the task flow nodes in sequence through Oozie, which supports fork (branching into multiple nodes) and join (merging multiple nodes into one).
Step S302: configure the relier of each task in the task list, so as to configure the dependency relationship between data and tasks.
Specifically, the task relier is configured in order to set up the dependency relationship between data and tasks; only tasks whose data is complete can be executed. In this embodiment, the application server 2 obtains the valid relier (dependency) configuration of the task flow fork nodes (which branch into multiple nodes), executes the relier state query statements, and outputs the raw results; it then merges the multiple task nodes, fills in missing dependency states, and deduplicates the dependency results; finally, it labels the deduplicated dependency results with dependency-configuration slice tags, completing the scheduling dependencies of all tasks.
Please refer to Table 1 for the dependency configuration format requirements in this embodiment.
Table 1
Figure PCTCN2018089192-appb-000006
In this embodiment, the application server 2 loads the configuration file with hive load and overwrites the configuration table. After the latest configuration file has been collected from the production environment and modified, operations staff are asked to deploy it. The deployment commands that configure the reliers are as follows:
Step 1: Upload the script to the /tmp directory, then set its permissions and format (so that private users can operate on it)
chmod 777 /tmp/relier_config_all.txt
Step 2: Switch user (this can be skipped if your private user is allowed to execute hive commands)
sudo su - hduser0006
Step 3: Execute the command
hive -e "use aml_awbs; set mapred.job.queue.name=queue_0006_02;
truncate table fm_relier_check_script;
load data local inpath '/tmp/relier_config_all.txt' into table aml_awbs.fm_relier_check_script;"
In another embodiment of the invention, the reliers can be configured by modifying the existing configuration with the following deployment commands:
Step 1: Switch user (this can be skipped if your private user is allowed to execute hive commands)
sudo su - hduser0006
Step 2: Execute the command
hive -e "set mapred.job.queue.name=queue_0006_02;
insert overwrite table aml_awbs.fm_relier_check_script
select relier_name,
src_job_name,
if(relier_name='i_jt-aml-999-cd',
'select concat(y,m,d) datestr, \'Y\' state from aml_awbs.JOB_STATE where JOB_NAME=\'jt-aml-999-cd\'',
relier_name) script_string,
fork
from aml_awbs.fm_relier_check_script"
Step S303: record the execution process of the data synchronization of the task.
Specifically, as described above, only a task whose data is complete can be executed. Therefore, to ensure that the data the Hadoop data platform center acquires from the external terminal device 1 is complete, the application server 2 records the execution process of data synchronization whenever the data is updated or modified. In this embodiment, the application server 2 uses a shell script to create a log and a status table, which record the execution process and the execution time of the data synchronization.
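The embodiment creates the log and status table with a shell script. As a self-contained illustration of the same idea, the Python sketch below keeps a status table in an in-memory SQLite database. The table name `sync_status`, its columns, and the state values are hypothetical and are not taken from the patent.

```python
import sqlite3
from datetime import datetime


def open_status_table() -> sqlite3.Connection:
    """Create an in-memory status table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sync_status (dataset TEXT, state TEXT, recorded_at TEXT)"
    )
    return conn


def record_sync(conn, dataset: str, state: str, when: datetime) -> None:
    """Append one step of the synchronization process, with its time."""
    conn.execute(
        "INSERT INTO sync_status VALUES (?, ?, ?)",
        (dataset, state, when.isoformat()),
    )
    conn.commit()


def last_sync(conn, dataset: str):
    """Return (state, recorded_at) of the most recent record, or None."""
    return conn.execute(
        "SELECT state, recorded_at FROM sync_status "
        "WHERE dataset = ? ORDER BY rowid DESC LIMIT 1",
        (dataset,),
    ).fetchone()
```

Because every step is appended rather than overwritten, the table serves both as a history of the synchronization process and as the source of the latest state.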
Step S304: determine whether the data has been fully synchronized, according to the execution process of the data synchronization and the dependency relationship between data and tasks.
Specifically, before executing a task, the application server 2 first determines whether the data has been fully synchronized. It makes this determination from the execution process and execution time of data synchronization recorded in the shell-created log and status table, together with the dependency relationship between data and tasks.
Step S305: when the data has been fully synchronized, execute the tasks whose data synchronization is complete.
Step S306: if the data has not been fully synchronized, issue alert information.
Specifically, the application server 2 executes a task only when its data has been synchronized, that is, when the data is complete. When the data has not been fully synchronized, the application server 2 issues alert information. In this embodiment, the alert information includes, but is not limited to, information about the data that has not been synchronized and the time of the last synchronization, so as to notify staff to intervene manually.
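The gate described in steps S304 to S306 can be sketched as follows. This is a minimal Python illustration under assumptions: the shape of `sync_status` (dataset name mapped to a state and a last-synchronization time), the `"DONE"` and `"MISSING"` state values, and the `run` and `alert` callbacks are all hypothetical.

```python
def run_if_synced(task, required, sync_status, run, alert):
    """Run `task` only if every required dataset has finished synchronizing.

    sync_status -- dict: dataset -> (state, time of last synchronization)
    run, alert  -- hypothetical callbacks for execution and notification
    """
    missing = ("MISSING", None)
    unsynced = [d for d in required if sync_status.get(d, missing)[0] != "DONE"]
    if unsynced:
        # The alert carries the unsynchronized data and the last sync time,
        # matching the alert contents described in the text.
        alert({
            "task": task,
            "unsynced": unsynced,
            "last_sync": {d: sync_status.get(d, missing)[1] for d in unsynced},
        })
        return False
    run(task)
    return True
```

Passing the callbacks in as parameters keeps the decision logic independent of how tasks are launched or how staff are notified.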
Through steps S301-S306 above, the data task processing method proposed by the present application first acquires a task list from the terminal device 1; then configures the task reliers to analyze the dependency relationship between data and tasks; next records the execution process of data synchronization; further determines, according to the execution process of the data synchronization and the dependency relationship between data and tasks, whether the data has been synchronized; and finally, if the data has been fully synchronized, executes the tasks whose data synchronization is complete, or, if the data has not been fully synchronized, issues alert information. In this way, the method avoids the prior-art defect of disordered dependency control between data and tasks, and, by judging synchronization from the recorded execution process and the dependency relationship, ensures that a task is executed only after its data has been synchronized.
Further, based on the above first embodiment of the data task processing method of the present application, a second embodiment of the data task processing method of the present application is proposed.
FIG. 6 is a schematic flowchart of the second embodiment of the data task processing method of the present application. In this embodiment, the method further includes the following steps:
Step S401: acquire the waiting round-run tasks;
As described above, in the first embodiment a task is executed only when data synchronization is complete. In this embodiment, the tasks include, but are not limited to, round-run tasks and rerun tasks. A round-run task is a task that is executed cyclically within its validity period; a rerun task is a task that must be executed again after a failed execution.
Specifically, the application server 2 acquires the waiting round-run tasks and determines whether the relier configuration is satisfied. If it is, the server analyzes the task's series of valid round-run dates. In this embodiment, the round-run task sequence is based on a series of working days or calendar days, and by default the sequence covers at most the past 730 days.
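A valid round-run date series of this kind could be generated as in the Python sketch below. The 730-day default comes from the text; everything else is an assumption made for illustration: the window is taken to end on the current day inclusive, and working days are approximated as Monday to Friday without a holiday calendar.

```python
from datetime import date, timedelta

MAX_LOOKBACK_DAYS = 730  # default maximum sequence length stated in the text


def round_run_dates(today, start, workdays_only=False, max_days=MAX_LOOKBACK_DAYS):
    """List candidate run dates from `start` to `today`, oldest first.

    The window never reaches back more than `max_days` calendar days
    (counting `today`); Monday-Friday stands in for a working-day calendar.
    """
    earliest = max(start, today - timedelta(days=max_days - 1))
    dates = []
    current = earliest
    while current <= today:
        if not workdays_only or current.weekday() < 5:  # 0-4 are Mon-Fri
            dates.append(current)
        current += timedelta(days=1)
    return dates
```

For example, with `today = date(2018, 1, 24)` and `start = date(2018, 1, 19)`, the calendar-day series has six dates and the working-day series has four, because the weekend of 20 and 21 January is skipped.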
Please refer to Table 2 for the round-run task configuration requirements:
Table 2
Figure PCTCN2018089192-appb-000007
Figure PCTCN2018089192-appb-000008
In this embodiment, the code for implementing the round-run task is:
Insert into the configuration table:
Step 1: Switch user (this can be skipped if your private user is allowed to execute hive commands)
sudo su - hduser0006
Step 2: Execute the command
Figure PCTCN2018089192-appb-000009
Figure PCTCN2018089192-appb-000010
Step S402: acquire the waiting rerun tasks;
Specifically, please refer to Table 3 for the configuration requirements of the rerun task in an embodiment of the present application:
Table 3
Figure PCTCN2018089192-appb-000011
Figure PCTCN2018089192-appb-000012
In this embodiment, the code for implementing the rerun task is:
Insert into the configuration table:
Step 1: Switch user (this can be skipped if your private user is allowed to execute hive commands)
sudo su - hduser0006
Step 2: Execute the command
hive -e "set mapred.job.queue.name=queue_0006_02;
insert into table aml_awbs.fm_model_task_rerun_set
select 'ky','zq','1214-25','20141202','y','y','1.0' from default.dual"
Step S403: sort the round-run tasks and the rerun tasks by priority.
Step S404: execute higher-priority tasks first.
Specifically, in this embodiment, the application server 2 prioritizes the round-run tasks and the rerun tasks according to the chronological order in which the tasks were acquired. It can be understood that, in other embodiments of the present application, the priority rules can be set according to actual needs.
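Ordering by acquisition time can be sketched in a few lines of Python. The `Task` record and its fields are hypothetical, and the sketch assumes that an earlier acquisition time means a higher priority, which is one reading of the chronological ordering described here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Task:
    name: str
    kind: str              # "round_run" or "rerun"
    acquired_at: datetime  # when the server picked the task up


def order_tasks(round_run, rerun):
    """Merge both queues; the task acquired earliest comes first."""
    # sorted() is stable, so tasks acquired at the same instant keep
    # their relative order (round-run tasks ahead of rerun tasks).
    return sorted([*round_run, *rerun], key=lambda t: t.acquired_at)
```

A different priority rule, as permitted for other embodiments, would only require changing the `key` function.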
Step S405: monitor the currently executing task, and issue an alert when an exception occurs during task execution.
Specifically, the application server 2 monitors the currently executing task and, when an exception occurs during task execution, issues an alert to notify staff to handle it promptly.
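In its simplest form, monitoring a task and alerting on failure amounts to wrapping the task's execution, as in the following Python sketch. The `job` and `alert` callbacks and the message format are hypothetical; a production monitor would typically also watch for timeouts and hung jobs, which this sketch does not cover.

```python
def run_monitored(name, job, alert):
    """Run `job`; on any exception, send an alert and report failure."""
    try:
        job()
    except Exception as exc:
        # Notify staff with the task name and the cause of the failure.
        alert(f"task {name} failed: {type(exc).__name__}: {exc}")
        return False
    return True
```

Returning a boolean lets the caller decide what to do next, for example queueing the failed task as a rerun task.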
Through steps S401-S405 above, the data task processing method proposed by the present application can also sort the acquired round-run tasks and rerun tasks by priority, execute higher-priority tasks first, and monitor the currently executing task, issuing an alert when an exception occurs during task execution, thereby supervising the tasks.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the part of the technical solution of the present application that is essential, or that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (20)

  1. A data task processing method, applied to an application server, characterized in that the method comprises:
    acquiring a task list from a terminal device;
    configuring reliers of the tasks in the task list, so as to analyze the dependency relationship between data and tasks;
    recording the execution process of the data synchronization of the tasks;
    determining, according to the execution process of the data synchronization and the dependency relationship between data and tasks, whether the data has been synchronized;
    if the data has been fully synchronized, executing the tasks whose data synchronization is complete;
    if the data has not been fully synchronized, issuing alert information.
  2. The data task processing method according to claim 1, characterized in that the step of configuring reliers of the tasks in the task list, so as to analyze the dependency relationship between data and tasks, specifically comprises the following steps:
    obtaining the valid dependency configuration of the flow nodes of the tasks;
    executing dependency state query statements and outputting raw dependency results;
    merging multiple task nodes, filling in missing dependency states, and deduplicating the dependency results;
    labeling the deduplicated dependency results with dependency configuration slice tags, completing the analysis of the dependencies of all tasks.
  3. The data task processing method according to claim 1, characterized in that the step of executing, if the data has been fully synchronized, the tasks whose data synchronization is complete specifically comprises the following steps:
    acquiring waiting round-run tasks and rerun tasks;
    executing the round-run tasks and the rerun tasks.
  4. The data task processing method according to claim 1, characterized in that the alert information includes information about the data that has not been synchronized and the time of the last synchronization.
  5. The data task processing method according to claim 3, characterized in that, before the step of executing the round-run tasks and the rerun tasks, the method further comprises the following steps:
    sorting the round-run tasks and the rerun tasks by priority;
    executing higher-priority tasks first.
  6. The data task processing method according to claim 5, characterized in that the round-run tasks and the rerun tasks are prioritized according to the chronological order in which the tasks were acquired.
  7. The data task processing method according to claim 5, characterized in that the method further comprises the following steps:
    monitoring the currently executing task;
    issuing an alert when an exception occurs during task execution.
  8. An application server, characterized in that the application server comprises a memory and a processor, the memory storing a data task processing system executable on the processor, wherein the data task processing system, when executed by the processor, implements the following steps:
    acquiring a task list from a terminal device;
    configuring the task reliers, so as to analyze the dependency relationship between data and tasks;
    recording the execution process of data synchronization;
    determining, according to the execution process of the data synchronization and the dependency relationship between data and tasks, whether the data has been synchronized;
    if the data has been fully synchronized, executing the tasks whose data synchronization is complete;
    if the data has not been fully synchronized, issuing alert information.
  9. The application server according to claim 8, characterized in that the step of configuring reliers of the tasks in the task list, so as to analyze the dependency relationship between data and tasks, specifically comprises the following steps:
    obtaining the valid dependency configuration of the flow nodes of the tasks;
    executing dependency state query statements and outputting raw dependency results;
    merging multiple task nodes, filling in missing dependency states, and deduplicating the dependency results;
    labeling the deduplicated dependency results with dependency configuration slice tags, completing the analysis of the dependencies of all tasks.
  10. The application server according to claim 8, characterized in that the step of executing, if the data has been fully synchronized, the tasks whose data synchronization is complete specifically comprises the following steps:
    acquiring waiting round-run tasks and rerun tasks;
    executing the round-run tasks and the rerun tasks.
  11. The application server according to claim 8, characterized in that the alert information includes information about the data that has not been synchronized and the time of the last synchronization.
  12. The application server according to claim 10, characterized in that, before the step of executing the round-run tasks and the rerun tasks, the following steps are further included:
    sorting the round-run tasks and the rerun tasks by priority;
    executing higher-priority tasks first.
  13. The application server according to claim 12, characterized in that the round-run tasks and the rerun tasks are prioritized according to the chronological order in which the tasks were acquired.
  14. The application server according to claim 12, characterized in that the data task processing system, when executed by the processor of the application server, further implements the following steps:
    monitoring the currently executing task;
    issuing an alert when an exception occurs during task execution.
  15. A computer-readable storage medium storing a data task processing system, the data task processing system being executable by at least one processor to cause the at least one processor to perform the following steps:
    acquiring a task list from a terminal device;
    configuring the task reliers, so as to analyze the dependency relationship between data and tasks;
    recording the execution process of data synchronization;
    determining, according to the execution process of the data synchronization and the dependency relationship between data and tasks, whether the data has been synchronized;
    if the data has been fully synchronized, executing the tasks whose data synchronization is complete;
    if the data has not been fully synchronized, issuing alert information.
  16. The computer-readable storage medium according to claim 15, characterized in that the step of configuring reliers of the tasks in the task list, so as to analyze the dependency relationship between data and tasks, specifically comprises the following steps:
    obtaining the valid dependency configuration of the flow nodes of the tasks;
    executing dependency state query statements and outputting raw dependency results;
    merging multiple task nodes, filling in missing dependency states, and deduplicating the dependency results;
    labeling the deduplicated dependency results with dependency configuration slice tags, completing the analysis of the dependencies of all tasks.
  17. The computer-readable storage medium according to claim 15, characterized in that the step of executing, if the data has been fully synchronized, the tasks whose data synchronization is complete specifically comprises the following steps:
    acquiring waiting round-run tasks and rerun tasks;
    executing the round-run tasks and the rerun tasks.
  18. The computer-readable storage medium according to claim 17, characterized in that, before the step of executing the round-run tasks and the rerun tasks, the following steps are further included:
    sorting the round-run tasks and the rerun tasks by priority;
    executing higher-priority tasks first.
  19. The computer-readable storage medium according to claim 18, characterized in that the round-run tasks and the rerun tasks are prioritized according to the chronological order in which the tasks were acquired.
  20. The computer-readable storage medium according to claim 18, characterized in that the data task processing system, when executed by the processor, further implements the following steps:
    monitoring the currently executing task;
    issuing an alert when an exception occurs during task execution.
PCT/CN2018/089192 2018-01-24 2018-05-31 Data task processing method, application server and computer-readable storage medium WO2019144552A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810066359.7 2018-01-24
CN201810066359.7A CN108427600B (en) 2018-01-24 2018-01-24 Data task processing method, application server and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2019144552A1 true WO2019144552A1 (en) 2019-08-01

Family

ID=63156041

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/089192 WO2019144552A1 (en) 2018-01-24 2018-05-31 Data task processing method, application server and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN108427600B (en)
WO (1) WO2019144552A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103116525A (en) * 2013-01-24 2013-05-22 贺海武 Map reduce computing method under internet environment
CN103873567A (en) * 2014-03-03 2014-06-18 北京智谷睿拓技术服务有限公司 Task-based data transmission method and device
CN104615486A (en) * 2014-12-26 2015-05-13 北京京东尚科信息技术有限公司 Multi-task scheduling and executing method, device and system for search promotion platform
CN105184470A (en) * 2015-08-28 2015-12-23 浪潮软件股份有限公司 Message mode-based method for integrating task lists of multiple business systems
CN106294496A (en) * 2015-06-09 2017-01-04 北京京东尚科信息技术有限公司 A kind of data migration method based on hadoop cluster and instrument
CN106980543A (en) * 2017-04-05 2017-07-25 福建智恒软件科技有限公司 The distributed task dispatching method and device triggered based on event

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102129390B (en) * 2011-03-10 2013-06-12 中国科学技术大学苏州研究院 Task scheduling system of on-chip multi-core computing platform and method for task parallelization
CN102750179B (en) * 2011-04-22 2014-10-01 中国移动通信集团河北有限公司 Method and device for scheduling tasks between cloud computing platform and data warehouse

Also Published As

Publication number Publication date
CN108427600B (en) 2021-03-16
CN108427600A (en) 2018-08-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18902842

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 10.11.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18902842

Country of ref document: EP

Kind code of ref document: A1