WO2020186809A1 - HIVE task scheduling method, apparatus, device, and storage medium based on a big data platform - Google Patents
HIVE task scheduling method, apparatus, device, and storage medium based on a big data platform
- Publication number
- WO2020186809A1 (PCT/CN2019/120594)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- task
- hive
- target
- file
- log
- Prior art date
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/2866—Architectures; Arrangements
- H04L67/30—Profiles
Definitions
- This application relates to the field of data processing technology, and in particular to a HIVE task scheduling method, device, equipment and storage medium based on a big data platform.
- HIVE is a data warehouse tool built on Hadoop. It maps structured data files to database tables, provides simple SQL query functions, and can convert SQL statements into MapReduce tasks for execution. Its advantage is a low learning cost: simple MapReduce statistics can be produced quickly through SQL-like statements, without developing dedicated MapReduce applications, which makes it well suited to statistical analysis of data warehouses.
- HIVE is a data warehouse infrastructure built on Hadoop. It provides a series of tools for the extraction, transformation, and loading (ETL) of data, and is a mechanism for storing, querying, and analyzing large-scale data stored in Hadoop.
- HIVE defines a simple SQL-like query language called HQL, which allows users familiar with SQL to query data.
- This language also allows developers familiar with MapReduce to develop custom mappers and reducers to handle complex analysis tasks that the built-in mappers and reducers cannot complete.
- However, HIVE tasks execute independently of one another, with no link between a task and its predecessors. During execution, the order of HIVE tasks therefore has to be serialized manually, which reduces the execution efficiency of HIVE tasks.
- the embodiments of the present application provide a method, device, equipment, and storage medium for scheduling HIVE tasks based on a big data platform, so as to solve the problem that the current HIVE task is not associated with its predecessor HIVE task, resulting in low task execution efficiency.
- a HIVE task scheduling method based on a big data platform includes:
- the original HIVE task is acquired from the client and includes a startup file, a configuration file, and a business file;
- if the pre-task log carries a task completion tag corresponding to the pre-task identifier, the pre-HIVE task corresponding to the pre-task identifier has completed successfully, and the business file in the target HIVE task is executed;
- if the business file is executed successfully, a task completion tag is generated and stored, associated with the task's own identifier, in the target task log corresponding to the target HIVE task.
- a HIVE task scheduling device based on a big data platform includes:
- the original task acquisition module is used to acquire the original HIVE task sent by the client, where the original HIVE task includes startup files, configuration files and business files;
- the task log table obtaining module is used to trigger a log program based on the startup file in the original HIVE task to obtain a task log table.
- the task log table includes at least one pending HIVE task, and each pending HIVE task corresponds to a task processing time;
- the target task acquisition module is configured to acquire a target HIVE task from at least one of the HIVE tasks to be processed based on the task processing time corresponding to each HIVE task to be processed;
- the configuration file reading module is used to read the configuration file in the target HIVE task by using a configuration file reading tool
- the task identification acquisition module is configured to, if the reading is successful, acquire the pre-task identification and its own task identification contained in the configuration file in the target HIVE task;
- a pre-task log obtaining module configured to query the task log table based on the pre-task identifier, and obtain the pre-task log corresponding to the pre-task identifier;
- the business file execution module is configured to, if the pre-task log carries a task completion tag corresponding to the pre-task identifier, determine that the pre-HIVE task corresponding to the pre-task identifier completed successfully and execute the business file in the target HIVE task;
- the task completion processing module is configured to generate a task completion tag if the business file is successfully executed, and store the task completion tag in association with the own task identifier in a target task log corresponding to the target HIVE task.
- a computer device includes a memory, a processor, and computer-readable instructions stored in the memory and capable of running on the processor, and the processor implements the following steps when the processor executes the computer-readable instructions:
- the original HIVE task is acquired from the client and includes a startup file, a configuration file, and a business file;
- if the pre-task log carries a task completion tag corresponding to the pre-task identifier, the pre-HIVE task corresponding to the pre-task identifier has completed successfully, and the business file in the target HIVE task is executed;
- if the business file is executed successfully, a task completion tag is generated and stored, associated with the task's own identifier, in the target task log corresponding to the target HIVE task.
- One or more readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
- the original HIVE task is acquired from the client and includes a startup file, a configuration file, and a business file;
- if the pre-task log carries a task completion tag corresponding to the pre-task identifier, the pre-HIVE task corresponding to the pre-task identifier has completed successfully, and the business file in the target HIVE task is executed;
- if the business file is executed successfully, a task completion tag is generated and stored, associated with the task's own identifier, in the target task log corresponding to the target HIVE task.
- FIG. 1 is a schematic diagram of an application environment of the HIVE task scheduling method based on a big data platform in an embodiment of the present application;
- FIG. 2 is a flowchart of the HIVE task scheduling method based on the big data platform in an embodiment of the present application
- FIG. 3 is another flowchart of the HIVE task scheduling method based on the big data platform in an embodiment of the present application
- FIG. 4 is a schematic diagram of an HIVE task scheduling device based on a big data platform in an embodiment of the present application
- Fig. 5 is a schematic diagram of a computer device in an embodiment of the present application.
- the HIVE task scheduling method based on the big data platform provided by the embodiment of the application can be applied to the application environment shown in FIG. 1.
- the HIVE task scheduling method based on a big data platform is applied to a big data platform system.
- the big data platform system includes a client and a server as shown in FIG. 1.
- the client and the server communicate over the network to chain HIVE tasks in series, automating the execution of HIVE tasks, eliminating the need to connect HIVE tasks manually, and improving the efficiency of HIVE task execution.
- the client, also called the user side, refers to the program that corresponds to the server and provides local services to the user.
- the client can be installed on, but not limited to, various personal computers, laptops, smart phones, tablet computers, and portable wearable devices.
- the server can be implemented as an independent server or a server cluster composed of multiple servers.
- a method for scheduling HIVE tasks based on a big data platform is provided. Taking the method applied to the server in FIG. 1 as an example for description, the method includes the following steps:
- S201 Obtain the original HIVE task sent by the client, where the original HIVE task includes a startup file, a configuration file, and a business file.
- the original HIVE task is the HIVE task sent by the client to the server.
- the startup file in the original HIVE task refers to the file used to start the HIVE task, specifically an SH startup file.
- an SH file is a script written by developers for the Bash application.
- an SH file is created and saved in the Bash language, because the instructions it contains are written in that language.
- an SH file can be typed and executed in the command-line interface of a text command shell.
- SH files are mostly used in program development; they are central to Bash applications, since such applications work mainly by executing scripts and commands. Because SH files hold the application's programming scripts and contain the commands that run its programs, they are indeed very important.
- although SH originated as a scripting language, it has since developed into an interactive command interpreter. Currently, most shells (such as C-Shell, Korn Shell, and Bourne Shell) also store scripts as SH files.
- the configuration file in the original HIVE task refers to the file used to configure the specific information of the HIVE task.
- the specific information includes but not limited to the variables configured in the configuration file, its own task ID, and the alarm target mailbox.
- the configuration file of the original HIVE task also includes the pre-task identifier.
- the configured variable is the variable applied in the business logic of the original HIVE task.
- the own task ID is used to uniquely identify an original HIVE task.
- the pre-task identifier is used to uniquely identify the pre-HIVE task corresponding to the original HIVE task.
- the alarm target mailbox refers to the mailbox, pre-configured by the developer, of the party to be alerted when a task fails; generally, it can be the mailbox of the operation and maintenance personnel.
- the business file in the original HIVE task is used to store the executable file that implements its business logic in the original HIVE task. Understandably, when the business file is executed, the corresponding business logic in the business file can be executed in the big data platform system to realize the processing of data in the big data platform system and obtain the corresponding data processing results.
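As a concrete illustration of the configuration file described above, the following sketch parses a simple key-value file carrying the configured variables, the task's own ID, an optional pre-task identifier, and the alarm target mailbox. All file contents, key names, and values here are hypothetical assumptions; the patent does not fix a configuration format.

```python
def parse_config(text):
    """Parse 'key=value' lines into a dict; pre_task_ids may be comma-separated."""
    config = {"variables": {}}
    for line in text.strip().splitlines():
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "pre_task_ids":
            config[key] = [t for t in value.split(",") if t]
        elif key.startswith("var."):          # configured business variables
            config["variables"][key[4:]] = value
        else:
            config[key] = value
    return config

sample = """
own_task_id=T1002
pre_task_ids=T1001
alarm_mailbox=ops@example.com
var.partition_date=2019-11-25
"""
cfg = parse_config(sample)
print(cfg["own_task_id"], cfg["pre_task_ids"], cfg["variables"]["partition_date"])
```

A task with no pre-HIVE task would simply omit the `pre_task_ids` line.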
- S202 Trigger a log program based on the startup file in the original HIVE task, and obtain a task log table.
- the task log table includes at least one pending HIVE task, and each pending HIVE task corresponds to a task processing time.
- the log program is a program used to monitor and obtain the log of each original HIVE task.
- the task log table is a data table of the start and end time (including the start time and the end time) and occupied resources of all the original HIVE tasks recorded by the log program.
- when the server receives each original HIVE task, it uses a log program preset in the server to trigger a log recording task for that original HIVE task, recording the data formed during its execution. That is, on receiving an original HIVE task, the server triggers a log recording task and records the task's status during execution in the task log file corresponding to that original HIVE task; the task log file thus records data such as the start time, end time, task progress, time consumed, resource usage, and completion status of the original HIVE task.
- the task log table is a data table used to store task log files recorded by log recording tasks corresponding to all original HIVE tasks.
- the server triggers the logging program based on the startup file of the original HIVE task
- the logging program assigns a logging task to the original HIVE task, so that the relevant data obtained by the logging task is stored in the corresponding task log file.
- the newly received original HIVE task is stored as a new pending HIVE task in the task log table, so that it and any earlier original HIVE tasks that have not yet been processed are all treated as pending HIVE tasks in the task log table, achieving orderly management of all unprocessed original HIVE tasks.
- the pending HIVE task refers to the original HIVE task that has not been processed and is recorded in the task log table.
- the task processing time of a pending HIVE task can be understood as its start time. It may be a time the user set on the client for executing the pending HIVE task independently (generally for timed tasks), or, by default, the time at which the server received the original HIVE task (generally for real-time tasks). Understandably, the server triggers the log program based on the startup file of the original HIVE task to obtain the task log table, so that all unprocessed pending HIVE tasks can be managed uniformly through the task log table and executed in a definite order, ensuring their execution efficiency.
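A minimal in-memory sketch of the task log table described above, with one entry per pending HIVE task carrying a task processing time. The field names and the dict-based table are illustrative assumptions, not the patent's data layout.

```python
from dataclasses import dataclass, field

@dataclass
class TaskLogEntry:
    task_id: str
    start_time: float          # task processing time (scheduled start)
    status: str = "pending"    # pending / running / done
    tags: list = field(default_factory=list)   # e.g. a task completion tag

# Receiving an original HIVE task triggers a log record and stores
# the task as a new pending entry in the task log table.
task_log_table = {}

def register_task(task_id, start_time):
    task_log_table[task_id] = TaskLogEntry(task_id, start_time)

register_task("T1001", start_time=100.0)
register_task("T1002", start_time=105.0)
print(sorted(task_log_table))   # ['T1001', 'T1002']
```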
- S203 Obtain a target HIVE task from at least one HIVE task to be processed based on the task processing time corresponding to each HIVE task to be processed.
- in this embodiment, the server determines the target HIVE task from the at least one pending HIVE task in order of start time, so that the pending HIVE task with the earliest start time is processed first, realizing orderly management of the HIVE tasks currently awaiting processing.
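Selecting the target HIVE task by earliest task processing time, as described, amounts to taking the minimum over the pending entries. A sketch (the dict-of-times representation is an assumption):

```python
def pick_target(pending):
    """pending: dict task_id -> task processing time; return the earliest task, or None."""
    if not pending:
        return None
    return min(pending, key=pending.get)

pending = {"T1003": 210.0, "T1001": 100.0, "T1002": 105.0}
print(pick_target(pending))   # T1001
```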
- the configuration file reading tool is a tool developed in advance and stored in the server for reading configuration files.
- the configuration file reading tool has a built-in regular expression for reading the configuration file.
- the server uses the regular expression in the configuration file reading tool to match the configuration file of the target HIVE task and determine whether its wording, layout, and file format conform to the preset format. If the configuration file of the target HIVE task conforms to the preset format, the read succeeds; if it does not, the read fails.
- the configuration file reading tool is used to read the configuration file of the target HIVE task and verify whether the configuration file written by the developer meets the file format requirements, that is, to verify the file format of the HIVE task, thereby ensuring the smooth execution of the target HIVE task.
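The regular-expression check built into the reading tool can be sketched as follows. The accepted line format (`key=value` with word-character keys) is an assumption for illustration; the patent does not specify the preset format.

```python
import re

# Assumed preset format: each non-empty line must be 'key=value'.
LINE_RE = re.compile(r"^[A-Za-z_][\w.]*=\S.*$")

def read_config(text):
    """Return the config lines if every non-empty line matches; else None (read failure)."""
    lines = [l.strip() for l in text.strip().splitlines() if l.strip()]
    if all(LINE_RE.match(l) for l in lines):
        return lines
    return None

print(read_config("own_task_id=T1002\npre_task_ids=T1001"))  # read succeeds
print(read_config("own_task_id T1002"))                      # malformed: None
```

A failed read would then trigger the file-error alarm path described later, rather than executing the task.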
- if the server successfully reads the configuration file in the target HIVE task with the configuration file reading tool, it determines that the configuration file meets the preset format requirements, and the pre-task identifier and the task's own identifier contained in the configuration file of the HIVE task can be matched and obtained.
- the configuration file of any target HIVE task always contains its own task identifier, but it may or may not contain a pre-task identifier; if it does, it can contain one or more of them. Each target HIVE task corresponds to a piece of business logic that involves at least one business parameter and performs logical processing on it. If all business parameters can be obtained directly from the same data table, there is no need to wait for other business logic to execute, and no pre-task identifier needs to be configured, so the configuration file of the target HIVE task contains none.
- otherwise, the configuration file of the task includes the pre-task identifier, and the pre-HIVE task corresponding to the pre-task identifier is configured to ensure the smooth execution of the configured target HIVE task.
- after the server successfully reads the configuration file of the target HIVE task, it can first match against a preset keyword (which may be a keyword set in the pre-task configuration module of the configuration file editing interface) to check whether the configuration file contains a pre-task identifier. If it does, the target HIVE task has a pre-HIVE task, and the pre-task identifier and the task's own identifier in the configuration file need to be obtained before performing step S205 and the subsequent steps. If it does not, the target HIVE task has no pre-HIVE task; the business file in the target HIVE task can then be executed directly, proceeding with the steps after step S207.
- S206 Query the task log table based on the pre-task identifier, and obtain the pre-task log corresponding to the pre-task identifier.
- each task log file is stored in the task log table in association with its own task ID.
- the server may query the task log table based on the predecessor task identifier contained in the configuration file of the target HIVE task to obtain the task log file corresponding to the predecessor task identifier as the predecessor task log.
- the pre-task log is used to record the start time, end time, task progress, time consumption, resource occupation, and completion status of the pre-HIVE task.
- the completion status of the pre-HIVE task can be determined by whether the task completion tag is included.
- if the task completion tag is included in the pre-task log, the business file of the pre-HIVE task was executed successfully; if the log does not contain the task completion tag, the business file of the pre-HIVE task was not executed successfully.
- in that case, the server can execute the business file in the target HIVE task, that is, execute the business logic in the business file of the target HIVE task. In other words, after the configuration file is successfully read, the pre-task log is queried according to the pre-task identifier in the configuration file.
- in this way the execution logic of the HIVE task can be verified, and the target HIVE task is executed only after its logical verification (that is, the completion of the pre-HIVE task) succeeds, ensuring the smooth execution of the target HIVE task.
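The pre-task check of steps S206–S207 can be sketched like this: look up each pre-task's log in the task log table and allow the business file to run only when every pre-task log carries the completion tag. The tag value and the dict layout are assumptions for illustration.

```python
COMPLETION_TAG = "task_complete"   # assumed tag value

def pre_tasks_done(task_log_table, pre_task_ids):
    """True only if every pre-task log carries the task completion tag."""
    return all(
        COMPLETION_TAG in task_log_table.get(tid, {}).get("tags", [])
        for tid in pre_task_ids
    )

task_log_table = {"T1001": {"tags": ["task_complete"]}}
print(pre_tasks_done(task_log_table, ["T1001"]))  # pre-task done: execute business file
print(pre_tasks_done(task_log_table, ["T1000"]))  # no such log: wait or error-handle
```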
- the server executes the business file of the target HIVE task
- the server can obtain the data processing result corresponding to the task's business logic from the business file, and generate a task completion tag identifying the successful execution of the business file.
- the server needs to associate the task completion tag with the target HIVE task's own task ID and store it in the target task log corresponding to the target HIVE task, so that the task completion tag recorded in the target task log shows that the target HIVE task has been executed, and executed successfully.
- the target task log is specifically a task log file corresponding to the target HIVE task in the log record table.
- the task completion tag is stored in the target task log in association with the task's own ID, so that any HIVE task for which the target HIVE task is the pre-HIVE task (i.e., a post-HIVE task of the target HIVE task) can later determine, from the task completion tag carried in the target task log of the target HIVE task, that its pre-HIVE task executed successfully.
- storing the task completion tag in the target task log helps the post-HIVE task execute smoothly and realizes the automated execution of the target HIVE task, the pre-HIVE task, and the post-HIVE task, eliminating manual serialization of HIVE tasks and improving the efficiency of HIVE task execution.
- the server also needs to delete the target HIVE task from the at least one pending HIVE task in the task log table, to avoid repeated execution that would reduce processing efficiency.
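The completion handling just described, generating the tag, storing it under the task's own ID, and removing the task from the pending set, can be sketched as follows (names and structures are illustrative assumptions):

```python
def complete_task(task_log_table, pending, own_task_id):
    """Record the completion tag in the target task log and drop the task from pending."""
    entry = task_log_table.setdefault(own_task_id, {"tags": []})
    entry["tags"].append("task_complete")          # assumed tag value
    pending.pop(own_task_id, None)                 # avoid repeated execution

task_log_table = {"T1002": {"tags": []}}
pending = {"T1002": 105.0}
complete_task(task_log_table, pending, "T1002")
print(task_log_table["T1002"]["tags"], pending)    # ['task_complete'] {}
```

A later post-HIVE task querying `task_log_table["T1002"]` would now find the tag and proceed.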
- in the method, the log program is first triggered by the startup file in the original HIVE task to obtain a task log table containing at least one pending HIVE task, realizing orderly management of all unprocessed HIVE tasks. The target HIVE task is then determined according to the task processing time of the pending HIVE tasks, realizing orderly management of the HIVE task currently being executed. Next, the configuration file of the target HIVE task is read through the configuration file reading tool to verify the file format of the HIVE task, ensuring the smooth execution of HIVE tasks that pass verification. When the configuration file is read successfully, the pre-task log is queried according to the pre-task identifier in the configuration file.
- in this way the execution logic of the HIVE task can be verified, ensuring that the target HIVE task is executed only after its logical verification succeeds.
- the pre-task log contains the task completion tag
- the business file of the target HIVE task is executed
- the task completion tag is generated when the business file is successfully executed
- the task completion tag and the task's own identifier are associated and stored in the target task log. This helps the post-HIVE tasks execute smoothly and realizes the automated execution of the target HIVE task, the pre-HIVE tasks, and the post-HIVE tasks, without manual serialization of HIVE tasks, which improves the efficiency of HIVE task execution.
- the HIVE task scheduling method based on the big data platform further includes:
- sending alarm information to the client may specifically be sending alarm information to the client corresponding to the alarm target mailbox configured in the configuration file.
- the file error information can record the configuration content that does not meet the preset format requirements along with the corresponding standard format, so that operation and maintenance personnel can quickly modify and maintain the configuration file of the target HIVE task based on the file error information, improving operation and maintenance efficiency.
- the alarm information formed based on the file error information specifically refers to the alarm information formed by filling the file alarm information in a preset alarm template. Since the target HIVE task cannot be executed because it fails to read the configuration file, the target HIVE task needs to be deleted from at least one pending HIVE task corresponding to the task log table to avoid repeated execution and reduce execution processing efficiency.
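Filling error information into a preset alarm template, as described here and reused below for timeout and retry errors, could look like the following sketch. The template text and field names are hypothetical.

```python
ALARM_TEMPLATE = (
    "HIVE task {task_id} failed: {error_info}. "
    "Alarm sent to {mailbox}."
)

def build_alarm(task_id, error_info, mailbox):
    """Form alarm information by filling error details into the preset template."""
    return ALARM_TEMPLATE.format(task_id=task_id, error_info=error_info, mailbox=mailbox)

msg = build_alarm("T1002", "configuration file does not match the preset format",
                  "ops@example.com")
print(msg)
```

The `mailbox` argument would come from the alarm target mailbox configured in the task's configuration file.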
- the pre-task log may include the task completion tag (in which case step S207 can be performed), or it may not include the task completion tag, in which case step S207 cannot proceed and corresponding error handling is required. That is, after step S206, i.e., after obtaining the pre-task log corresponding to the pre-task identifier, the HIVE task scheduling method based on the big data platform further includes:
- the event listener is a program pre-configured in the server to implement event monitoring; here it is dedicated to monitoring the pre-task log to obtain updated data in that log. Specifically, if the pre-task log does not carry the task completion tag corresponding to the pre-task identifier, the pre-HIVE task has not yet completed successfully. If the execution of the target HIVE task were simply terminated, the steps already performed would become wasted work and would have to be executed again the next time the target HIVE task runs, which affects efficiency.
- when the pre-task log does not carry the task completion tag, the server triggers the preset event listener to monitor the update data in the pre-task log, so as to obtain the data updated as the pre-HIVE task executes.
- the preset monitoring period is a preset period for monitoring the pre-task log.
- the preset monitoring period can be understood as the period during which the target HIVE task waits for the pre-HIVE task to finish after finding it incomplete. Specifically, if within the preset monitoring period the event listener does not detect that the update data contains the task completion tag corresponding to the pre-task identifier, the pre-HIVE task has been executing throughout the period without completing successfully. At that point, if the target HIVE task keeps waiting for the pre-HIVE task to complete, the waiting overhead becomes too large and reduces the efficiency of HIVE task scheduling.
- if it is not detected within the preset monitoring period that the update data contains the task completion tag corresponding to the pre-task identifier, a timeout error message is generated, the target HIVE task is terminated, and alarm information based on the timeout error message is sent to the client.
- the timeout error message may include detailed information about the waiting timeout for executing the target HIVE task, so that the operation and maintenance personnel can quickly modify and maintain based on the timeout error message, so as to maintain the business logic of the pre-HIVE task and improve the efficiency of operation and maintenance.
- the alarm information formed based on the timeout error information specifically refers to the alarm information formed by filling the timeout error information in a preset alarm template.
- sending alarm information to the client may specifically be sending alarm information to the client corresponding to the alarm target mailbox configured in the configuration file. Since the target HIVE task cannot be executed due to a timeout error report, the target HIVE task must be deleted from at least one pending HIVE task corresponding to the task log table to avoid repeated execution and reduce the execution processing efficiency.
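The event listener with a preset monitoring period can be sketched as a polling loop that watches the pre-task log and gives up once the period elapses. Polling is an assumption for illustration; the patent does not fix the listening mechanism, and the tag value and function names are hypothetical.

```python
import time

def wait_for_completion(read_tags, pre_task_id, period_s, poll_s=0.01):
    """Poll the pre-task log until the completion tag appears or the preset
    monitoring period elapses; return True on success, False on timeout."""
    deadline = time.monotonic() + period_s
    while time.monotonic() < deadline:
        if "task_complete" in read_tags(pre_task_id):   # assumed tag value
            return True
        time.sleep(poll_s)
    return False   # timeout: generate a timeout error and alert the client

# Toy run: the tag appears on the third poll, well inside the period.
calls = {"n": 0}
def fake_read(_):
    calls["n"] += 1
    return ["task_complete"] if calls["n"] >= 3 else []

print(wait_for_completion(fake_read, "T1001", period_s=1.0))   # True
```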
- the HIVE task scheduling method based on the big data platform further includes:
- if the event listener detects within the preset monitoring period that the update data contains the task completion tag corresponding to the pre-task identifier, the pre-HIVE task finished executing within the period and completed successfully. The target HIVE task's waiting within the preset monitoring period was therefore worthwhile, which effectively ensures the smooth execution of the target HIVE task and improves the execution efficiency of HIVE tasks.
- the event listener can thus be used to monitor the pre-task log and automatically execute the business file in the target HIVE task once the pre-HIVE task is determined to have completed successfully, keeping the HIVE task scheduling process automated and improving execution efficiency.
- the HIVE task scheduling method based on the big data platform further includes:
- the server executes the business file of the target HIVE task
- the error count of the target HIVE task is updated by adding 1. Understandably, the error count defaults to 0; each time the business file fails to execute, 1 is added to the count recorded after the previous unsuccessful execution.
- a task-incomplete tag is generated, and the error count of the target HIVE task is updated according to the task-incomplete tags in the target task log of the target HIVE task.
- the preset number threshold is a preset threshold for evaluating whether to retry, and the preset number threshold can be set to three or other times. Specifically, if the server, after updating the number of error reports of the target HIVE task, determines that the number of error reports is greater than the preset number threshold, it means that the target HIVE task has been repeatedly executed the preset number of times, but the result of each execution is that the business file has not been successfully executed. If the target HIVE task continues to be executed, it is very likely that it will not be executed successfully, which will affect the execution efficiency of the HIVE task.
- the retry error information may include specific information that still makes errors after multiple retries during the execution of the target HIVE task, so that the operation and maintenance personnel can modify and maintain the business files of the target HIVE task based on the retry error information, thereby improving the efficiency of operation and maintenance.
- the alarm information formed based on the retry error information specifically refers to the alarm information formed by filling the retry error information into a preset alarm template.
- the HIVE task scheduling method based on the big data platform further includes:
- S215 If the number of error reports is not greater than the preset number threshold, repeat the execution of the business file in the target HIVE task until the business file is successfully executed or the error number of the target HIVE task is greater than the preset number threshold.
- the retry mechanism: to prevent transient network problems from affecting the execution of the target HIVE task, when the business file of the HIVE task errors during execution, the retry mechanism can be activated to execute it repeatedly, ensuring the smooth execution of the target HIVE task. Specifically, if the error count of the target HIVE task is not greater than the preset number threshold, the target HIVE task can continue to be retried; the business file in the target HIVE task is therefore executed repeatedly, improving its execution efficiency.
- a stop condition for the repeated execution can be set: repeat until the business file executes successfully or the error count of the target HIVE task exceeds the preset number threshold, ensuring the efficiency of HIVE task scheduling.
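The retry mechanism of S213–S215 — incrementing an error count on each failure, retrying while the count stays within the preset threshold, and reporting a retry error otherwise — can be sketched as (names and the threshold value of 3 are assumptions):

```python
def run_with_retries(execute_business_file, max_errors=3):
    """Retry until success or until the error count exceeds the preset threshold.
    Returns (succeeded, error_count)."""
    errors = 0
    while True:
        if execute_business_file():
            return True, errors
        errors += 1                      # error count starts at 0, +1 per failure
        if errors > max_errors:
            return False, errors         # generate retry error info, alert the client

# Toy business file that fails twice and then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    return attempts["n"] >= 3

print(run_with_retries(flaky))   # (True, 2)
```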
- in the method, the log program is first triggered by the startup file in the original HIVE task to obtain a task log table containing at least one pending HIVE task, realizing orderly management of all unprocessed HIVE tasks. The target HIVE task is then determined according to the task processing time of the pending HIVE tasks, realizing orderly management of the HIVE task currently being executed. Next, the configuration file of the target HIVE task is read through the configuration file reading tool to verify the file format of the HIVE task, ensuring the smooth execution of HIVE tasks that pass verification. When the configuration file is read successfully, the pre-task log is queried according to the pre-task identifier in the configuration file.
- in this way the execution logic of the HIVE task can be verified, ensuring that the target HIVE task is executed only after its logical verification succeeds.
- the pre-task log contains the task completion tag
- the business file of the target HIVE task is executed
- the task completion tag is generated when the business file is successfully executed
- the task completion tag and the own task identifier are stored in association in the target task log. This helps the smooth execution of post-HIVE tasks and realizes the automated execution of the target, pre- and post-HIVE tasks without manual serialization of HIVE tasks, improving the efficiency of HIVE task execution.
- the method also provides a preset number of automatic retries when the target HIVE task errors out, to rule out the impact of network accidents on the target HIVE task and ensure its smooth execution.
- a corresponding error reporting mechanism is triggered to send an error message to the client.
- the error message may be formed based on the file error information, timeout error information or retry error information, reminding operation and maintenance personnel to maintain HIVE tasks and improving their work efficiency.
- before step S201, that is, before acquiring the original HIVE task sent by the client, the HIVE task scheduling method based on the big data platform further includes:
- the task configuration request is a request used to trigger the server to configure the HIVE task.
- the task type includes pre-dependent type or non-dependent type.
- the pre-dependency type specifically refers to tasks that can be executed only if they rely on the execution result data of the pre-HIVE task.
- the non-dependent type refers to tasks that do not need to rely on the data results of a pre-HIVE task and only need to obtain data directly from a single data table. Specifically, before configuring any original HIVE task, the user determines, according to its business logic, whether it needs to rely on the execution result data of a pre-HIVE task. If it does, the task type is selected as the pre-dependent type; if it does not, the task type is selected as the non-dependent type.
- the server controls the client, based on the task type in the task configuration request, to enter the configuration file editing interface corresponding to that task type. Specifically, if the task type is the pre-dependent type, the client is controlled to enter the first configuration file editing interface; if the task type is the non-dependent type, the client is controlled to enter the second configuration file editing interface.
- the first configuration file editing interface and the second configuration file editing interface both include a variable configuration module, an own-task configuration module and an alarm target mailbox configuration module, used respectively to configure the corresponding variables, own task identifier and alarm target mailbox.
- the first configuration file editing interface additionally has a pre-task configuration module, which is the module for configuring the task's pre-tasks.
- the server can obtain the configuration file formed based on the configuration file editing interface sent by the client.
- the user can configure variable assignment expressions in the client's variable configuration module.
- in such an expression, the variable name is on the left of "=" and the variable target value is on the right.
- for variables in the logic body, a specific format (such as "${}") may surround the variable target value.
- when a variable later changes, the variable target value can be modified directly in the variable configuration module of the configuration file without any change to the logic body of the task.
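The variable mechanism above can be illustrated with a short sketch that parses `name=value` assignments and substitutes `${name}` placeholders into a logic body; the helper names (`load_variables`, `substitute`) and the sample configuration keys are hypothetical, not taken from the patent.

```python
import re

def load_variables(config_text):
    """Parse 'name=value' variable assignments from a configuration file."""
    variables = {}
    for line in config_text.splitlines():
        if "=" in line and not line.lstrip().startswith("#"):
            name, value = line.split("=", 1)
            variables[name.strip()] = value.strip()
    return variables

def substitute(logic_body, variables):
    """Replace ${name} placeholders in the logic body with target values,
    so only the configuration changes and the logic body stays fixed."""
    return re.sub(r"\$\{(\w+)\}", lambda m: variables[m.group(1)], logic_body)

cfg = "dt=2019-03-19\ntable=orders"
sql = "SELECT * FROM ${table} WHERE dt='${dt}'"
result = substitute(sql, load_variables(cfg))
```

Editing `dt` or `table` in the configuration updates the query without touching the SQL logic body, which is the debugging convenience the text describes.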
- when the user configures the own task identifier through the client's own-task configuration module, a corresponding timestamp can also be added when the own task identifier is generated.
- the timestamp can be the current day's timestamp or the current month's timestamp, used respectively to check whether the day's or the month's own task has completed.
- the user can configure the alarm target's email address in the client's alarm target mailbox configuration module.
- a corresponding timestamp can also be added when the pre-task identifier is configured.
- the timestamp can be the current day's timestamp or the current month's timestamp, used respectively to check whether the day's or the month's pre-HIVE task has completed.
- a variable configuration module is provided in the configuration file editing interface, and the logic-body variables configured in the variable configuration module include variable names and variable target values (that is, the value determined for the variable in this configuration).
- S304 Use a preset regular expression to perform format matching on the configuration file, and if the matching is successful, send matching success information to the client, so that the client forms an original HIVE task based on the successfully matched configuration file.
- the server matches the configuration file with a preset regular expression to determine whether the words, format or file form in the configuration file conform to the preset format. If they do, the configuration succeeds and configuration success information is sent to the client so that the client can form the original HIVE task based on the successfully matched configuration file; if they do not, the configuration fails, reminder information is generated and sent to the client so that developers can modify the configuration file accordingly. That is, after the server obtains the configuration file, the pre-configured regular expression can be used to perform format matching on it, ensuring the accuracy of the final original HIVE task and ensuring that the configuration file can subsequently be read successfully by the configuration file reading tool.
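A minimal sketch of this format matching step is given below. The patent does not disclose the actual patterns, so `PRESET_PATTERNS` and the line formats are invented assumptions purely for illustration.

```python
import re

# Hypothetical preset patterns: one per allowed configuration entry.
PRESET_PATTERNS = [
    re.compile(r"^task_id=\w+$"),             # own task identifier
    re.compile(r"^pre_task_id=\w+(,\w+)*$"),  # optional pre-task identifiers
    re.compile(r"^alarm_mail=\S+@\S+$"),      # alarm target mailbox
    re.compile(r"^\w+=\S+$"),                 # generic variable assignment
]

def matches_preset_format(config_text):
    """Return True only if every non-empty line matches a preset pattern."""
    for line in config_text.splitlines():
        if line.strip() and not any(p.match(line) for p in PRESET_PATTERNS):
            return False
    return True

good = "task_id=t42\npre_task_id=t41\nalarm_mail=ops@example.com"
bad = "task_id=t42\n???not-a-valid-line"
```

A file that fails any pattern would follow the reminder path (configuration fails, developer is notified); a file that passes can form the original HIVE task.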
- in the HIVE task scheduling method based on a big data platform provided in this embodiment, the corresponding configuration file editing interface can be entered according to the task type of the task configuration request to obtain the corresponding configuration file, and regular expressions are used to perform format matching on the configuration file, ensuring the accuracy of the final configuration file so that it can be read successfully by the configuration file reading tool, improving the executability of the original HIVE task formed and avoiding termination due to file errors.
- a HIVE task scheduling device based on a big data platform is provided, corresponding one-to-one to the HIVE task scheduling method based on the big data platform in the above embodiments.
- the HIVE task scheduling device based on the big data platform includes an original task acquisition module 401, a task log table acquisition module 402, a target task acquisition module 403, a configuration file reading module 404, a task identifier acquisition module 405, a pre-task log acquisition module 406, a business file execution module 407 and a task completion processing module 408.
- the detailed description of each functional module is as follows:
- the original task obtaining module 401 is used to obtain the original HIVE task sent by the client.
- the original HIVE task includes a startup file, a configuration file, and a business file.
- the task log table obtaining module 402 is configured to trigger a log program based on the startup file in the original HIVE task to obtain a task log table.
- the task log table includes at least one pending HIVE task, and each pending HIVE task corresponds to a task processing time.
- the target task acquisition module 403 is configured to acquire a target HIVE task from at least one HIVE task to be processed based on the task processing time corresponding to each HIVE task to be processed.
- the configuration file reading module 404 is configured to use a configuration file reading tool to read the configuration file in the target HIVE task.
- the task identification obtaining module 405 is configured to obtain the pre-task identification and its own task identification contained in the configuration file in the target HIVE task if the reading is successful.
- the pre-task log obtaining module 406 is configured to query the task log table based on the pre-task identifier, and obtain the pre-task log corresponding to the pre-task identifier.
- the business file execution module 407 is configured to, if the task completion tag corresponding to the pre-task identifier is carried in the pre-task log, the pre-HIVE task corresponding to the pre-task identifier is successfully completed, and the business file in the target HIVE task is executed.
- the task completion processing module 408 is configured to generate a task completion tag if the business file is successfully executed, and store the task completion tag in association with its own task identifier in the target task log corresponding to the target HIVE task.
- the HIVE task scheduling device based on the big data platform further includes a file error processing module.
- the file error processing module is used to generate file error information if the reading fails, terminate the target HIVE task, and send the alarm information based on the file error information to the client.
- the HIVE task scheduling device based on the big data platform further includes an update data acquisition module and a timeout error reporting processing module.
- the timeout error processing module is used to, if the event listener does not detect within the preset listening period that the update data contains the task completion tag corresponding to the pre-task identifier, generate timeout error information, terminate the target HIVE task, and send alarm information formed based on the timeout error information to the client.
- the HIVE task scheduling device based on the big data platform further includes: a monitoring execution processing module.
- the monitoring execution processing module is used to, if the event listener detects within the preset listening period that the update data contains the task completion tag corresponding to the pre-task identifier, determine that the pre-HIVE task corresponding to the pre-task identifier has completed successfully, and execute the business file in the target HIVE task.
- the HIVE task scheduling device based on the big data platform further includes: an error report frequency acquisition module and a retry error report processing module.
- the error count acquisition module is used to update the error count of the target HIVE task if the business file does not execute successfully.
- the retry error processing module is used to generate retry error information if the error count is greater than the preset count threshold, terminate the target HIVE task, and send alarm information formed based on the retry error information to the client.
- the HIVE task scheduling device based on the big data platform further includes: a retry execution processing module.
- the retry execution processing module is configured to repeatedly execute the business file in the target HIVE task if the number of error reports is not greater than the preset number threshold, until the business file is successfully executed or the error number of the target HIVE task is greater than the preset number threshold.
- the HIVE task scheduling device based on the big data platform further includes a task configuration request acquisition unit, an editing interface entry unit, a configuration file acquisition unit, and a format matching processing unit.
- the task configuration request obtaining unit is used to obtain the task configuration request sent by the client, and the task configuration request includes the task type.
- the editing interface entry unit is used to control the client to enter the configuration file editing interface corresponding to the task type based on the task type.
- the configuration file obtaining unit is used to obtain the configuration file formed by the client based on the configuration file editing interface.
- the format matching processing unit is configured to use a preset regular expression to perform format matching on the configuration file, and if the matching is successful, send a matching success message to the client, so that the client forms the original HIVE task based on the successfully matched configuration file.
- the HIVE task scheduling device based on the big data platform can be implemented in whole or in part by software, hardware, and combinations thereof.
- the foregoing modules may be embedded in the form of hardware or independent of the processor in the computer device, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the foregoing modules.
- a computer device is provided.
- the computer device may be a server, and its internal structure diagram may be as shown in FIG. 5.
- the computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device is used to provide computing and control capabilities.
- the memory of the computer device includes a non-volatile storage medium and an internal memory.
- the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
- the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
- the database of the computer equipment is used to store data used or generated during the execution of the HIVE task scheduling method based on the big data platform, such as a task log table.
- the network interface of the computer device is used to communicate with an external terminal through a network connection.
- the computer-readable instructions are executed by the processor to realize a HIVE task scheduling method based on a big data platform.
- a computer device including a memory, a processor, and computer-readable instructions stored in the memory and capable of running on the processor.
- when the processor executes the computer-readable instructions, the HIVE task scheduling method based on the big data platform in the above embodiment is implemented, for example S201-S215 shown in FIG. 2 or the steps shown in FIG. 3; details are not repeated here to avoid repetition.
- when the processor executes the computer-readable instructions, the functions of the modules/units in this embodiment of the HIVE task scheduling device based on the big data platform are implemented, for example the functions of the original task acquisition module 401, task log table acquisition module 402, target task acquisition module 403, configuration file reading module 404, task identifier acquisition module 405, pre-task log acquisition module 406, business file execution module 407 and task completion processing module 408 shown in FIG. 4; details are not repeated here.
- the readable storage medium in this embodiment includes a nonvolatile readable storage medium and a volatile readable storage medium.
- one or more readable storage media storing computer-readable instructions are provided.
- when the computer-readable instructions are executed by one or more processors, the one or more processors implement the HIVE task scheduling method based on the big data platform in the above embodiment, for example S201-S215 shown in FIG. 2 or the steps shown in FIG. 3; details are not repeated here to avoid repetition.
- when the computer-readable instructions are executed by the processor, the functions of the modules/units in the embodiment of the HIVE task scheduling device based on the big data platform are implemented, for example the functions of the original task acquisition module 401, task log table acquisition module 402, target task acquisition module 403, configuration file reading module 404, task identifier acquisition module 405, pre-task log acquisition module 406, business file execution module 407 and task completion processing module 408 shown in FIG. 4; details are not repeated here to avoid duplication.
- Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
- Volatile memory may include random access memory (RAM) or external cache memory.
- RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Abstract
A HIVE task scheduling method, device, equipment and storage medium based on a big data platform. The method includes: acquiring an original HIVE task sent by a client, triggering a log program based on the startup file in the original HIVE task, and obtaining a task log table; acquiring a target HIVE task from the pending HIVE tasks; reading the configuration file in the target HIVE task with a configuration file reading tool; if the reading succeeds, acquiring the pre-task identifier and the own task identifier contained in the configuration file of the target HIVE task; if the pre-task log carries the task completion tag corresponding to the pre-task identifier, the pre-HIVE task corresponding to the pre-task identifier has completed successfully, and the business file in the target HIVE task is executed; if the business file executes successfully, generating a task completion tag and storing it, in association with the own task identifier, in the target task log corresponding to the target HIVE task. The method improves the efficiency of HIVE task execution.
Description
This application is based on, and claims priority from, Chinese invention application No. 201910208508.3, filed on March 19, 2019 and entitled "HIVE task scheduling method, device, equipment and storage medium based on a big data platform".
This application relates to the field of data processing technology, and in particular to a HIVE task scheduling method, device, equipment and storage medium based on a big data platform.
HIVE is a data warehouse tool based on Hadoop. It can map structured data files to a database table and provides simple SQL query functions, converting SQL statements into MapReduce tasks for execution. Its advantage is a low learning cost: simple MapReduce statistics can be implemented quickly through SQL-like statements without developing dedicated MapReduce applications, which makes it well suited to the statistical analysis of data warehouses. HIVE is a data warehouse infrastructure built on Hadoop. It provides a series of tools for extract-transform-load (ETL), a mechanism for storing, querying and analyzing large-scale data stored in Hadoop. HIVE defines a simple SQL-like query language called HQL, which allows users familiar with SQL to query the data. The language also allows developers familiar with MapReduce to plug in custom mappers and reducers to handle complex analysis work that the built-in mappers and reducers cannot complete. On the Hadoop big data platform, HIVE tasks execute independently of one another, with no association with their pre-tasks, so the execution order between HIVE tasks has to be serialized manually during execution, which reduces HIVE task execution efficiency.
Summary of the Invention
The embodiments of this application provide a HIVE task scheduling method, device, equipment and storage medium based on a big data platform, to solve the problem that HIVE tasks are currently not associated with their pre-HIVE tasks, resulting in low task execution efficiency.
A HIVE task scheduling method based on a big data platform includes:
acquiring an original HIVE task sent by a client, the original HIVE task including a startup file, a configuration file and a business file;
triggering a log program based on the startup file in the original HIVE task to obtain a task log table, the task log table including at least one pending HIVE task, each pending HIVE task corresponding to a task processing time;
acquiring a target HIVE task from the at least one pending HIVE task based on the task processing time corresponding to each pending HIVE task;
reading the configuration file in the target HIVE task with a configuration file reading tool;
if the reading succeeds, acquiring the pre-task identifier and the own task identifier contained in the configuration file of the target HIVE task;
querying the task log table based on the pre-task identifier to obtain the pre-task log corresponding to the pre-task identifier;
if the pre-task log carries the task completion tag corresponding to the pre-task identifier, the pre-HIVE task corresponding to the pre-task identifier has completed successfully, and the business file in the target HIVE task is executed;
if the business file executes successfully, generating a task completion tag, and storing the task completion tag, in association with the own task identifier, in the target task log corresponding to the target HIVE task.
A HIVE task scheduling device based on a big data platform includes:
an original task acquisition module, configured to acquire an original HIVE task sent by a client, the original HIVE task including a startup file, a configuration file and a business file;
a task log table acquisition module, configured to trigger a log program based on the startup file in the original HIVE task to obtain a task log table, the task log table including at least one pending HIVE task, each pending HIVE task corresponding to a task processing time;
a target task acquisition module, configured to acquire a target HIVE task from the at least one pending HIVE task based on the task processing time corresponding to each pending HIVE task;
a configuration file reading module, configured to read the configuration file in the target HIVE task with a configuration file reading tool;
a task identifier acquisition module, configured to, if the reading succeeds, acquire the pre-task identifier and the own task identifier contained in the configuration file of the target HIVE task;
a pre-task log acquisition module, configured to query the task log table based on the pre-task identifier to obtain the pre-task log corresponding to the pre-task identifier;
a business file execution module, configured to, if the pre-task log carries the task completion tag corresponding to the pre-task identifier, determine that the pre-HIVE task corresponding to the pre-task identifier has completed successfully, and execute the business file in the target HIVE task;
a task completion processing module, configured to, if the business file executes successfully, generate a task completion tag and store the task completion tag, in association with the own task identifier, in the target task log corresponding to the target HIVE task.
A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
acquiring an original HIVE task sent by a client, the original HIVE task including a startup file, a configuration file and a business file;
triggering a log program based on the startup file in the original HIVE task to obtain a task log table, the task log table including at least one pending HIVE task, each pending HIVE task corresponding to a task processing time;
acquiring a target HIVE task from the at least one pending HIVE task based on the task processing time corresponding to each pending HIVE task;
reading the configuration file in the target HIVE task with a configuration file reading tool;
if the reading succeeds, acquiring the pre-task identifier and the own task identifier contained in the configuration file of the target HIVE task;
querying the task log table based on the pre-task identifier to obtain the pre-task log corresponding to the pre-task identifier;
if the pre-task log carries the task completion tag corresponding to the pre-task identifier, the pre-HIVE task corresponding to the pre-task identifier has completed successfully, and the business file in the target HIVE task is executed;
if the business file executes successfully, generating a task completion tag, and storing the task completion tag, in association with the own task identifier, in the target task log corresponding to the target HIVE task.
One or more readable storage media storing computer-readable instructions, where the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
acquiring an original HIVE task sent by a client, the original HIVE task including a startup file, a configuration file and a business file;
triggering a log program based on the startup file in the original HIVE task to obtain a task log table, the task log table including at least one pending HIVE task, each pending HIVE task corresponding to a task processing time;
acquiring a target HIVE task from the at least one pending HIVE task based on the task processing time corresponding to each pending HIVE task;
reading the configuration file in the target HIVE task with a configuration file reading tool;
if the reading succeeds, acquiring the pre-task identifier and the own task identifier contained in the configuration file of the target HIVE task;
querying the task log table based on the pre-task identifier to obtain the pre-task log corresponding to the pre-task identifier;
if the pre-task log carries the task completion tag corresponding to the pre-task identifier, the pre-HIVE task corresponding to the pre-task identifier has completed successfully, and the business file in the target HIVE task is executed;
if the business file executes successfully, generating a task completion tag, and storing the task completion tag, in association with the own task identifier, in the target task log corresponding to the target HIVE task.
The details of one or more embodiments of this application are set forth in the accompanying drawings and the description below. Other features and advantages of this application will become apparent from the description, the drawings and the claims.
In order to explain the technical solutions of the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.
FIG. 1 is a schematic diagram of an application environment of the HIVE task scheduling method based on a big data platform in an embodiment of this application;
FIG. 2 is a flowchart of the HIVE task scheduling method based on a big data platform in an embodiment of this application;
FIG. 3 is another flowchart of the HIVE task scheduling method based on a big data platform in an embodiment of this application;
FIG. 4 is a schematic diagram of the HIVE task scheduling device based on a big data platform in an embodiment of this application;
FIG. 5 is a schematic diagram of a computer device in an embodiment of this application.
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments of this application. Obviously, the described embodiments are only some rather than all of the embodiments of this application. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of this application.
The HIVE task scheduling method based on a big data platform provided by the embodiments of this application can be applied in the application environment shown in FIG. 1. Specifically, the method is applied in a big data platform system that includes the client and the server shown in FIG. 1; the client and the server communicate over a network, which is used to chain HIVE tasks together so that HIVE task execution is automated, no manual serialization of HIVE tasks is required, and HIVE task execution efficiency is improved. The client, also called the user side, is a program that corresponds to the server and provides local services to the user. The client may be installed on, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers and portable wearable devices. The server may be implemented as an independent server or as a server cluster composed of multiple servers.
In an embodiment, as shown in FIG. 2, a HIVE task scheduling method based on a big data platform is provided. Taking the method applied to the server in FIG. 1 as an example, it includes the following steps:
S201: Acquire an original HIVE task sent by the client, the original HIVE task including a startup file, a configuration file and a business file.
The original HIVE task is the HIVE task that the client sends to the server.
The startup file in the original HIVE task is the file used to start the HIVE task, specifically an SH startup file. An SH file is a script for the Bash shell: it contains instructions written in the Bash language and can be typed and executed at the command line of a text command shell. SH files are mostly used in program development and are central to Bash applications, which work mainly by executing scripts and commands; because SH files store programming scripts and contain the commands that execute a program, they are very important. Although it began as a scripting language, the SH file format has developed into the input of interactive command interpreters, and most current shells (such as the C shell, the Korn shell and the Bourne shell) also use SH files to store scripts.
The configuration file in the original HIVE task is the file used to configure specific information of the HIVE task, including but not limited to the variables configured in the configuration file, the own task identifier and the alarm target mailbox. If, according to the business logic of the original HIVE task, a pre-HIVE task exists, the configuration file of the original HIVE task also includes a pre-task identifier. The configured variables are the variables used in the business logic of the original HIVE task. The own task identifier uniquely identifies a particular original HIVE task. The pre-task identifier uniquely identifies the pre-HIVE task corresponding to the original HIVE task. The alarm target mailbox is the mailbox, pre-configured by the developer, of the person to be alerted when a task errors out, generally the mailbox of operation and maintenance personnel.
The business file in the original HIVE task stores the executable file that implements the business logic of the original HIVE task. Understandably, when the business file is executed, the corresponding business logic in the business file is executed on the big data platform system to process the data in the system and obtain the corresponding data processing result.
S202: Trigger a log program based on the startup file in the original HIVE task to obtain a task log table, the task log table including at least one pending HIVE task, each pending HIVE task corresponding to a task processing time.
The log program is a program for monitoring and acquiring the log of each original HIVE task. The task log table is a data table of statistical logs, recorded by the log program, for all original HIVE tasks, such as start and end times (including the start time and the end time) and resource usage.
Specifically, when the server receives each original HIVE task, it triggers, through the log program pre-installed on the server, a log recording task for that original HIVE task to record the data formed during its execution. That is, on receiving an original HIVE task, the server triggers a log recording task and records all states during the execution of that original HIVE task in the corresponding task log file: the task log file records data such as the start time, end time, task progress, elapsed time, resource usage and completion status of the original HIVE task. The task log table is the data table used to store the task log files recorded by the log recording tasks corresponding to all original HIVE tasks.
Specifically, when the server triggers the log program based on the startup file of the original HIVE task, the log program assigns a log recording task to the original HIVE task, so that the data acquired by the log recording task is stored in the corresponding task log file. The newly received original HIVE task is then stored in the task log table as a new pending HIVE task, so that it and the other, not yet processed, original HIVE tasks become the pending HIVE tasks in the task log table, achieving orderly management of all unprocessed original HIVE tasks. A pending HIVE task is an original HIVE task recorded in the task log table that has not yet been processed. The task processing time corresponding to a pending HIVE task can be understood as its start time: it may be the time set autonomously by the user through the client for executing the pending HIVE task (generally for scheduled tasks), or, by default, the time at which the server received the original HIVE task (generally for real-time tasks). Understandably, the server triggers the log program based on the startup file of the original HIVE task to obtain the task log table, so that all unprocessed pending HIVE tasks can be managed uniformly through it and executed in a definite order, guaranteeing their execution efficiency.
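The task log table described above can be pictured as a keyed collection of per-task log records. The sketch below is an illustrative assumption only: the record fields (`start_time`, `completed`) and function names are invented for the example, not disclosed by the patent.

```python
from dataclasses import dataclass

@dataclass
class TaskLog:
    """One log record per HIVE task, keyed by its own task identifier."""
    task_id: str
    start_time: str          # task processing time (scheduled or receipt time)
    end_time: str = ""
    completed: bool = False  # set once the task completion tag is stored

task_log_table = {}

def register_pending_task(task_id, start_time):
    """Triggered on receipt of an original task: add it to the table
    as a new pending HIVE task with its own log record."""
    task_log_table[task_id] = TaskLog(task_id=task_id, start_time=start_time)

register_pending_task("t41", "2019-03-19 01:00")
register_pending_task("t42", "2019-03-19 02:00")
```

Keying the table by the own task identifier is what later lets a task look up its pre-task's log directly by the pre-task identifier.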
S203: Acquire a target HIVE task from the at least one pending HIVE task based on the task processing time corresponding to each pending HIVE task.
Specifically, based on the task processing time of each pending HIVE task in the task log table, that is, its start time, the server determines the pending HIVE task currently to be processed as the target HIVE task in order of start time. In other words, the target HIVE task is determined from the at least one pending HIVE task so that pending HIVE tasks with earlier start times are processed first, achieving orderly management of the HIVE task currently to be processed.
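The earliest-start-time selection above amounts to a minimum over the pending tasks. A minimal sketch, with invented field names (`task_id`, `processing_time`):

```python
def pick_target_task(pending_tasks):
    """Choose the pending HIVE task with the earliest processing time,
    so tasks with earlier start times are processed first."""
    return min(pending_tasks, key=lambda t: t["processing_time"])

# ISO-style timestamps compare correctly as strings.
pending = [
    {"task_id": "t42", "processing_time": "2019-03-19 02:00"},
    {"task_id": "t41", "processing_time": "2019-03-19 01:00"},
]
target = pick_target_task(pending)
```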
S204: Read the configuration file in the target HIVE task with a configuration file reading tool.
The configuration file reading tool is a tool, developed in advance and stored on the server, for reading configuration files. Specifically, the tool has built-in regular expressions for reading configuration files. In this embodiment, the server uses the regular expressions in the configuration file reading tool to match the target HIVE task in order to determine whether the words, format or file form of the target configuration file conform to the preset format. If the configuration file of the target HIVE task conforms to the preset format, the server reads it successfully; if it does not, the reading fails. Understandably, reading the configuration file of the target HIVE task with the configuration file reading tool checks whether the configuration file written by the developer meets its file-form requirements, that is, it verifies the file form of the HIVE task and thereby safeguards the smooth execution of the target HIVE task.
S205: If the reading succeeds, acquire the pre-task identifier and the own task identifier contained in the configuration file of the target HIVE task.
Specifically, if the server successfully reads the configuration file in the target HIVE task with the configuration file reading tool, the configuration file is deemed to conform to the preset format requirements, and the pre-task identifier and the own task identifier contained in the configuration file of the HIVE task can be obtained by matching.
Generally, the configuration file of any target HIVE task contains its own task identifier, but it may or may not contain a pre-task identifier; if it does, it may contain one or several. Each target HIVE task corresponds to a piece of business logic that involves at least one business parameter and performs logical processing on it. If all business parameters can be obtained directly from the same data table, there is no need to wait for other business logic to execute and no need to configure a corresponding pre-task identifier, so the configuration file of the target HIVE task contains no pre-task identifier. If at least one business parameter cannot be obtained directly from the same data table but must be obtained from another data table in the same database or from a data table in another database, then other business logic must be executed first; in that case the configuration file of the corresponding target HIVE task must contain a pre-task identifier, and the pre-HIVE task corresponding to that identifier must be configured, to guarantee the smooth execution of the configured target HIVE task.
In this embodiment, after successfully reading the configuration file of the target HIVE task, the server may first match, by a preset keyword (which may be a keyword set in the pre-task configuration module of the configuration file editing interface), whether the configuration file contains a pre-task identifier. If it does, the target HIVE task has a pre-HIVE task, so the pre-task identifier and the own task identifier contained in the configuration file of the target HIVE task are acquired, and step S205 and its subsequent steps are executed. If it does not, the target HIVE task has no pre-HIVE task; in that case the business file in the target HIVE task can be executed directly, and the steps after step S207 are executed.
S206: Query the task log table based on the pre-task identifier to obtain the pre-task log corresponding to the pre-task identifier.
Since the log program, on receiving each original HIVE task, triggers a log recording task for that original HIVE task and records the information it acquires in a task log file, each task log file is stored in the task log table in association with its own task identifier. In this embodiment, the server may query the task log table based on the pre-task identifier contained in the configuration file of the target HIVE task, and take the task log file corresponding to that pre-task identifier as the pre-task log. Understandably, the pre-task log records information such as the start time, end time, task progress, elapsed time, resource usage and completion status of the pre-HIVE task. In this embodiment, the completion status of the pre-HIVE task is determined by whether the log contains a task completion tag: if the pre-task log contains a task completion tag, the business file of the pre-HIVE task executed successfully; if not, it has not yet executed successfully.
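The lookup described above reduces to indexing the task log table by the pre-task identifier and checking for the completion tag. A minimal sketch, with a boolean `completed` flag standing in for the task completion tag (an assumption for illustration):

```python
def pre_task_completed(task_log_table, pre_task_id):
    """Look up the pre-task log by its identifier and check whether it
    carries the task completion tag."""
    log = task_log_table.get(pre_task_id)
    return log is not None and log.get("completed", False)

logs = {"t41": {"completed": True}, "t40": {"completed": False}}
```

Only when this check returns True does the scheduler proceed to execute the target task's business file; otherwise it falls through to the event-listener path described below.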
S207: If the pre-task log carries the task completion tag corresponding to the pre-task identifier, the pre-HIVE task corresponding to the pre-task identifier has completed successfully, and the business file in the target HIVE task is executed.
Specifically, if the pre-task log carries the task completion tag corresponding to the pre-task identifier, the pre-HIVE task has completed successfully; at this point the server can execute the business file in the target HIVE task, that is, the business logic in the business file of the target HIVE task. In other words, after the configuration file is read successfully, the pre-task log is queried according to the pre-task identifier in the configuration file, and by identifying whether the pre-task log contains a task completion tag, the execution logic of the HIVE task is verified, so that the target HIVE task is executed only after the logic verification succeeds (that is, after the pre-HIVE task has executed successfully), guaranteeing its smooth execution.
S208: If the business file executes successfully, generate a task completion tag, and store the task completion tag, in association with the own task identifier, in the target task log corresponding to the target HIVE task.
Specifically, when the server executes the business file of the target HIVE task and the business file executes successfully, the server can obtain, based on the business file, the data processing result corresponding to its business logic and generate a task completion tag identifying that its business file executed successfully. The server then stores the task completion tag, in association with the own task identifier of the target HIVE task, in the target task log corresponding to the target HIVE task, so that the task completion tag recorded in the target task log establishes that the target HIVE task has finished and succeeded. The target task log is specifically the task log file in the log record table corresponding to the target HIVE task. Understandably, storing the task completion tag in the target task log corresponding to its own task identifier allows any later HIVE task that takes this target HIVE task as a pre-HIVE task (that is, a post-HIVE task of the target HIVE task) to determine, from the task completion tag carried in the target task log, that its pre-HIVE task executed successfully. Storing the task completion tag in the target task log thus helps the smooth execution of post-HIVE tasks and realizes the automated execution of the target HIVE task, its pre-HIVE tasks and its post-HIVE tasks, without manual serialization of HIVE tasks, improving the efficiency of HIVE task execution.
Understandably, after the business file of the target HIVE task has finished executing and the task completion tag has been stored, in association with the own task identifier, in the target task log corresponding to the target HIVE task, the server needs to delete the target HIVE task from the at least one pending HIVE task corresponding to the task log table, to avoid repeated execution and reduced processing efficiency.
In the HIVE task scheduling method based on a big data platform provided in this embodiment, the log program is first triggered by the startup file in the original HIVE task to obtain a task log table containing at least one pending HIVE task, achieving orderly management of all unprocessed HIVE tasks. The target HIVE task is then determined according to the task processing times of the pending HIVE tasks, achieving orderly management of the HIVE task currently to be executed. Next, the configuration file of the target HIVE task is read with the configuration file reading tool, verifying the file form of the HIVE task and guaranteeing the smooth execution of HIVE tasks that pass verification. When the configuration file is read successfully, the pre-task log is queried according to the pre-task identifier in the configuration file, and by identifying whether the pre-task log contains a task completion tag, the execution logic of the HIVE task is verified, guaranteeing the smooth execution of target HIVE tasks that pass logic verification. Finally, when the pre-task log contains a task completion tag, the business file of the target HIVE task is executed; when the business file executes successfully, a task completion tag is generated and stored, in association with the own task identifier, in the target task log. This helps the smooth execution of post-HIVE tasks and realizes the automated execution of the target, pre- and post-HIVE tasks without manual serialization of HIVE tasks, improving the efficiency of HIVE task execution.
Further, since various errors may occur during HIVE task scheduling and affect it, a corresponding alarm mechanism needs to be configured for HIVE task scheduling so that, after a HIVE task errors out, the HIVE tasks that cannot continue are maintained promptly and the work efficiency of operation and maintenance personnel is improved. In an embodiment, reading the configuration file in the target HIVE task with the configuration file reading tool may fail, which is one error case; therefore the alarm mechanism of the big data platform needs to be triggered. Specifically, after step S204, that is, after reading the configuration file in the target HIVE task with the configuration file reading tool, the HIVE task scheduling method based on the big data platform further includes:
S209: If the reading fails, generate file error information, terminate the target HIVE task, and send alarm information formed based on the file error information to the client.
Specifically, if the server fails to read the configuration file in the target HIVE task with the configuration file reading tool, the configuration file is deemed not to conform to the preset format requirements and cannot be executed; therefore file error information is generated, the target HIVE task is terminated, and alarm information formed based on the file error information is sent to the client to remind operation and maintenance personnel to make corrections. In this embodiment, sending alarm information to the client may specifically mean sending it to the client corresponding to the alarm target mailbox configured in the configuration file. The file error information may record the configuration content that does not conform to the preset format requirements, together with its corresponding standard format, so that operation and maintenance personnel can quickly correct and maintain the configuration file of the target HIVE task, improving maintenance efficiency. In this embodiment, the alarm information formed based on the file error information specifically means the alarm information formed by filling the file error information into a preset alarm template. Since the target HIVE task cannot execute because its configuration file failed to be read, the target HIVE task must be deleted from the at least one pending HIVE task corresponding to the task log table, to avoid repeated execution and reduced processing efficiency.
Further, since various errors may occur during HIVE task scheduling and affect it, a corresponding alarm mechanism needs to be configured for HIVE task scheduling so that, after a HIVE task errors out, the HIVE tasks that cannot continue are maintained promptly and the work efficiency of operation and maintenance personnel is improved. In an embodiment, the pre-task log obtained for the pre-task identifier may contain a task completion tag (in which case step S207 can be executed) or may not, in which case step S207 cannot continue and corresponding error handling is required. That is, after step S206, that is, after obtaining the pre-task log corresponding to the pre-task identifier, the HIVE task scheduling method based on the big data platform further includes:
S210: If the pre-task log does not carry the task completion tag corresponding to the pre-task identifier, the pre-HIVE task corresponding to the pre-task identifier has not completed successfully, and an event listener is triggered to listen for update data of the pre-task log.
The event listener is a program pre-configured on the server for event listening; it is dedicated to listening to the pre-task log in order to obtain its update data. Specifically, if the pre-task log does not carry the task completion tag corresponding to the pre-task identifier, the pre-HIVE task has not executed successfully. If the execution of the target HIVE task were terminated directly, the steps already executed might become invalid operations, and the earlier steps would have to be re-executed the next time the target HIVE task runs, affecting work efficiency. To guarantee the execution efficiency of the target HIVE task, when the pre-task log carries no task completion tag, the server triggers the preset event listener to listen for update data in the pre-task log and thereby obtain the update data of the pre-HIVE task's execution.
S211: If the event listener does not detect, within the preset listening period, that the update data contains the task completion tag corresponding to the pre-task identifier, generate timeout error information, terminate the target HIVE task, and send alarm information formed based on the timeout error information to the client.
The preset listening period is a preset period for listening to the pre-task log. It can be understood as the period, after the pre-HIVE task has failed to complete, during which the system waits for the pre-HIVE task to be processed. Specifically, if the event listener does not detect within the preset listening period that the update data contains the task completion tag corresponding to the pre-task identifier, the pre-HIVE task was processed within the period but still did not complete successfully. If the target HIVE task kept waiting for the pre-HIVE task to complete, the time cost of waiting would be excessive and would reduce the efficiency of HIVE task scheduling. Therefore, when the task completion tag is not detected within the preset listening period, timeout error information is generated, the target HIVE task is terminated, and alarm information formed based on the timeout error information is sent to the client to remind operation and maintenance personnel to make corrections. The timeout error information may contain detailed information about the wait timeout of the target HIVE task, so that operation and maintenance personnel can quickly correct and maintain the business logic of its pre-HIVE task, improving maintenance efficiency. In this embodiment, the alarm information formed based on the timeout error information specifically means the alarm information formed by filling the timeout error information into a preset alarm template. Further, sending alarm information to the client may specifically mean sending it to the client corresponding to the alarm target mailbox configured in the configuration file. Since the target HIVE task cannot execute because of the timeout error, the target HIVE task must be deleted from the at least one pending HIVE task corresponding to the task log table, to avoid repeated execution and reduced processing efficiency.
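The listen-with-deadline behavior of S210/S211 can be sketched as a polling loop; the patent does not specify the listener's implementation, so the polling approach, function names and intervals below are illustrative assumptions.

```python
import time

def wait_for_completion(check_tag, deadline_seconds, poll_interval=0.01):
    """Poll the pre-task log for its completion tag until the preset
    listening period expires; True on success, False on timeout."""
    deadline = time.monotonic() + deadline_seconds
    while time.monotonic() < deadline:
        if check_tag():
            return True   # pre-HIVE task completed: run the business file (S212)
        time.sleep(poll_interval)
    return False          # timeout: raise the timeout-error alarm (S211)

# The pre-task "completes" on the third poll in this simulation.
polls = {"n": 0}
def check_tag():
    polls["n"] += 1
    return polls["n"] >= 3

finished = wait_for_completion(check_tag, deadline_seconds=1.0)
timed_out = not wait_for_completion(lambda: False, deadline_seconds=0.05)
```

A production listener would more likely be event-driven than polling, but the deadline logic (wait, succeed, or raise the timeout alarm) is the same.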
Further, after step S210, that is, after the event listener is triggered to listen for update data of the pre-task log, the HIVE task scheduling method based on the big data platform further includes:
S212: If the event listener detects, within the preset listening period, that the update data contains the task completion tag corresponding to the pre-task identifier, the pre-HIVE task corresponding to the pre-task identifier has completed successfully, and the business file in the target HIVE task is executed.
Specifically, if the event listener detects within the preset listening period that the update data contains the task completion tag corresponding to the pre-task identifier, the pre-HIVE task was processed within the period and completed successfully, so the target HIVE task's wait within the preset listening period was effective; this effectively guarantees the smooth execution of the target HIVE task and improves HIVE task execution efficiency. Moreover, during the execution of the target HIVE task, the event listener can listen on its own and, on determining that the pre-HIVE task completed successfully, automatically execute the business file in the target HIVE task, ensuring automation during HIVE task scheduling and improving execution efficiency.
Further, since various errors may occur during HIVE task scheduling and affect it, a corresponding retry mechanism is configured for HIVE tasks that can continue executing after an error (that is, whose business file did not execute successfully), so that HIVE tasks are retried and their execution efficiency is improved. After step S207, that is, after executing the business file in the target HIVE task, the HIVE task scheduling method based on the big data platform further includes:
S213: If the business file does not execute successfully, update the error count of the target HIVE task.
Specifically, when the server executes the business file of the target HIVE task and the business file does not execute successfully, the error count of the target HIVE task is updated by adding 1. Understandably, the error count defaults to 0; each time the business file fails to execute, 1 is added to the error count from the previous failure. In this embodiment, if the business file of the target HIVE task does not execute successfully, a task-not-completed tag is generated, and the error count of the target HIVE task is updated according to the task-not-completed tags in its target task log.
S214: If the error count is greater than the preset count threshold, generate retry error information, terminate the target HIVE task, and send alarm information formed based on the retry error information to the client.
The preset count threshold is a preset threshold for deciding whether to retry; it may be set to three or another number. Specifically, if, after updating the error count of the target HIVE task, the server determines that the error count is greater than the preset count threshold, the target HIVE task has already been executed repeatedly the preset number of times, each time with the business file failing to execute; continuing to execute the target HIVE task would very likely still fail and would affect HIVE task execution efficiency. Therefore, when the error count is greater than the preset count threshold, retry error information is generated, the target HIVE task is terminated, and alarm information formed based on the retry error information is sent to the client. The retry error information may contain specific information about the repeated failures during the execution of the target HIVE task, so that operation and maintenance personnel can correct and maintain the business file of the target HIVE task based on it, improving maintenance efficiency. In this embodiment, the alarm information formed based on the retry error information specifically means the alarm information formed by filling the retry error information into a preset alarm template.
Further, after step S213, that is, after updating the error count of the target HIVE task, the HIVE task scheduling method based on the big data platform further includes:
S215: If the error count is not greater than the preset count threshold, repeatedly execute the business file in the target HIVE task until the business file executes successfully or the error count of the target HIVE task is greater than the preset count threshold.
In this embodiment, to avoid the impact of network accidents on the execution of the target HIVE task, when executing the business file of a HIVE task errors out, the retry mechanism can be started for repeated execution, guaranteeing the smooth execution of the target HIVE task. Specifically, if the error count of the target HIVE task is not greater than the preset count threshold, the target HIVE task can continue to be executed repeatedly; therefore the business file in the target HIVE task is executed again, improving its execution efficiency. To prevent the target HIVE task from repeating indefinitely, a stop condition for repeated execution can be set, namely until the business file executes successfully or the error count of the target HIVE task is greater than the preset count threshold, guaranteeing the efficiency of HIVE task scheduling.
In the HIVE task scheduling method based on a big data platform provided in this embodiment, the log program is first triggered by the startup file in the original HIVE task to obtain a task log table containing at least one pending HIVE task, achieving orderly management of all unprocessed HIVE tasks. The target HIVE task is then determined according to the task processing times of the pending HIVE tasks, achieving orderly management of the HIVE task currently to be executed. Next, the configuration file of the target HIVE task is read with the configuration file reading tool, verifying the file form of the HIVE task and guaranteeing the smooth execution of HIVE tasks that pass verification. When the configuration file is read successfully, the pre-task log is queried according to the pre-task identifier in the configuration file, and by identifying whether the pre-task log contains a task completion tag, the execution logic of the HIVE task is verified, guaranteeing the smooth execution of target HIVE tasks that pass logic verification. Finally, when the pre-task log contains a task completion tag, the business file of the target HIVE task is executed; when the business file executes successfully, a task completion tag is generated and stored, in association with the own task identifier, in the target task log, which helps the smooth execution of post-HIVE tasks and realizes the automated execution of the target, pre- and post-HIVE tasks without manual serialization, improving HIVE task execution efficiency. In addition, when the target HIVE task errors out, it is automatically retried a preset number of times to rule out the impact of network accidents and guarantee its smooth execution. Further, when the target HIVE task errors out, the corresponding error reporting mechanism is triggered to send error information to the client; the error information may be formed from the file error information, the timeout error information or the retry error information, reminding operation and maintenance personnel to maintain HIVE tasks and improving their work efficiency.
In an embodiment, as shown in FIG. 3, before step S201, that is, before acquiring the original HIVE task sent by the client, the HIVE task scheduling method based on the big data platform further includes:
S301: Acquire a task configuration request sent by the client, the task configuration request including a task type.
The task configuration request is a request for triggering the server to configure a HIVE task. The task type is either the pre-dependent type or the non-dependent type. The pre-dependent type refers to tasks that can execute only by relying on the execution result data of a pre-HIVE task. The non-dependent type refers to tasks that do not need to rely on the data results of a pre-HIVE task and only need to obtain data directly from a single data table. Specifically, before configuring any original HIVE task, the user determines, according to its business logic, whether it needs to rely on the execution result data of a pre-HIVE task; if it does, the task type is selected as the pre-dependent type, and if it does not, as the non-dependent type.
S302: Based on the task type, control the client to enter the configuration file editing interface corresponding to the task type.
The server controls the client, based on the task type in the task configuration request, to enter the configuration file editing interface corresponding to that task type. Specifically, if the task type is the pre-dependent type, the client is controlled to enter the first configuration file editing interface; if it is the non-dependent type, the client is controlled to enter the second configuration file editing interface. In this embodiment, both interfaces include a variable configuration module, an own-task configuration module and an alarm target mailbox configuration module, used respectively to configure the corresponding variables, own task identifier and alarm target mailbox. The first configuration file editing interface additionally has a pre-task configuration module, which is the module for configuring the task's pre-tasks.
S303:获取客户端基于配置文件编辑界面形成的配置文件。
服务器可获取客户端发送的基于配置文件编辑界面形成的配置文件。例如,用户可在客户端的变量配置模块中配置变量赋值式,该变量赋值式中,“=”左边是变量名称,右边是变量目标值,对于逻辑主体中的变量,还可采用特定格式(如“${}”这一特定格式)包围其变量目标值。本实施例中,在变量配置模块中配置其配置文件中的变量时,在后续变量发生变化时,可直接在配置文件对变量配置模块中变量目标值进行修改,而不对逻辑主体有任务变动。在用户通过客户端的自身任务配置模块中配置自身任务标识时,还可在其自身任务标识生成时添加相应的时间戳,该时间戳可以为当日的时间戳和当月的时间戳,分别用于检查当日或者当月的自身任务是否完成。用户可在客户端的报警对象邮箱配置模块中配置报警对象的电子邮箱。而在第一配置文件编辑界面的前置任务配置模块中,除了配置前置任务标识,还可在前置任务标识配置完成时添加相应的时间戳,该时间戳可以为当日的时间戳和当月的时间戳,分别用于检查当日或者当月的前置HIVE任务是否完 成。
In this embodiment, the configuration-file editing interface provides a variable configuration module; a logic-body variable configured there consists of a variable name and a variable target value (the value assigned to the variable in this configuration). When the configuration file is edited or modified, only the target value corresponding to a variable name is configured, not the variable name (or the logic body) itself, avoiding the regression testing that a change to the logic body would require to confirm that other modules still work. This separation of the business logic body from its variables makes debugging easier.
S304: Match the format of the configuration file against a preset regular expression; if the match succeeds, send a match-success message to the client so that the client forms the original HIVE task from the matched configuration file.
Specifically, the server matches the configuration file against the preset regular expression to determine whether its tokens, format and file form conform to the expected format. If they conform, configuration succeeds, and a success message is sent to the client so that it can form the original HIVE task from the matched configuration file. If they do not conform, configuration fails; a reminder is generated and sent to the client so that developers can revise the configuration file accordingly. That is, after the server obtains the configuration file, format matching with the preset regular expression ensures the correctness of the resulting original HIVE task and guarantees that the configuration file can later be read successfully by the configuration-file reader tool.
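Format matching with a preset regular expression might look like the following (a sketch under assumptions: the patent does not disclose the actual pattern, so the hypothetical `CONFIG_LINE` below enforces a simple `key=value` line format):

```python
import re

# Hypothetical preset pattern: each non-empty config line must be
# 'key=value' with a word-character key and a whitespace-free value.
CONFIG_LINE = re.compile(r"^\w+=[^\s]+$")

def validate_config(text):
    """Return (ok, bad_lines): ok is True when every non-empty line
    matches the preset pattern; bad_lines lists the offenders so a
    reminder can be sent back to the client."""
    bad = [ln for ln in text.splitlines()
           if ln.strip() and not CONFIG_LINE.match(ln.strip())]
    return (not bad, bad)
```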
In the big-data-platform-based HIVE task scheduling method provided by this embodiment, the configuration-file editing interface is selected from the task type in the task configuration request, the corresponding configuration file is obtained, and its format is matched with a regular expression. This ensures the correctness of the resulting configuration file and makes it readable by the configuration-file reader tool, improving the executability of the resulting original HIVE task and avoiding termination due to file errors.
It should be understood that the step numbers in the above embodiments do not imply an execution order; the execution order of each process is determined by its function and internal logic, and does not limit the implementation of the embodiments of this application.
In one embodiment, a big-data-platform-based HIVE task scheduling apparatus is provided, corresponding one-to-one to the big-data-platform-based HIVE task scheduling method of the above embodiments. As shown in Fig. 4, the apparatus includes an original task acquisition module 401, a task log table acquisition module 402, a target task acquisition module 403, a configuration file reading module 404, a task identifier acquisition module 405, an upstream task log acquisition module 406, a business file execution module 407 and a task completion processing module 408. The functional modules are detailed as follows:
The original task acquisition module 401 is configured to obtain an original HIVE task sent by the client; the original HIVE task includes a startup file, a configuration file and a business file.
The task log table acquisition module 402 is configured to trigger the log program based on the startup file in the original HIVE task and obtain a task log table; the task log table includes at least one pending HIVE task, each pending HIVE task corresponding to a task processing time.
The target task acquisition module 403 is configured to obtain a target HIVE task from the at least one pending HIVE task based on the task processing time corresponding to each pending HIVE task.
The configuration file reading module 404 is configured to read the configuration file in the target HIVE task with a configuration-file reader tool.
The task identifier acquisition module 405 is configured to obtain, if the read succeeds, the upstream task identifier and the self task identifier contained in the configuration file of the target HIVE task.
The upstream task log acquisition module 406 is configured to query the task log table based on the upstream task identifier and obtain the upstream task log corresponding to the upstream task identifier.
The business file execution module 407 is configured to, if the upstream task log carries the completion tag corresponding to the upstream task identifier, treat the upstream HIVE task corresponding to that identifier as successfully completed and execute the business file in the target HIVE task.
The task completion processing module 408 is configured to, if the business file executes successfully, generate a completion tag and store it, in association with the self task identifier, in the target task log corresponding to the target HIVE task.
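Storing a completion tag keyed by the task's own identifier, and the corresponding upstream check, can be sketched as follows (an in-memory dict stands in for the task log table; `record_completion` and `upstream_done` are hypothetical names):

```python
import datetime

def record_completion(task_log, task_id):
    """Store a completion tag in the task log, associated with the
    task's own identifier, once its business file has succeeded."""
    task_log[task_id] = {
        "tag": "DONE",
        "finished_at": datetime.datetime.now().isoformat(),
    }

def upstream_done(task_log, upstream_id):
    """The check performed before running a dependent task: does the
    upstream task's log entry carry the completion tag?"""
    return task_log.get(upstream_id, {}).get("tag") == "DONE"
```

A downstream task consults `upstream_done` with the upstream task identifier from its configuration file; only when it returns true does the business file run.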
Preferably, after the configuration file reading module 404, the big-data-platform-based HIVE task scheduling apparatus further includes a file error processing module.
The file error processing module is configured to, if the read fails, generate file error information, terminate the target HIVE task, and send the client an alert formed from the file error information.
Preferably, after the upstream task log acquisition module 406, the big-data-platform-based HIVE task scheduling apparatus further includes an update data acquisition module and a timeout error processing module.
The update data acquisition module is configured to, if the upstream task log does not carry the completion tag corresponding to the upstream task identifier, treat the upstream HIVE task corresponding to that identifier as not yet successfully completed and trigger the event listener to monitor update data of the upstream task log.
The timeout error processing module is configured to, if the event listener does not, within the preset listening deadline, observe update data containing the completion tag corresponding to the upstream task identifier, generate timeout error information, terminate the target HIVE task, and send the client an alert formed from the timeout error information.
Preferably, after the update data acquisition module, the big-data-platform-based HIVE task scheduling apparatus further includes a listening execution processing module.
The listening execution processing module is configured to, if the event listener observes, within the preset listening deadline, update data containing the completion tag corresponding to the upstream task identifier, treat the upstream HIVE task corresponding to that identifier as successfully completed and execute the business file in the target HIVE task.
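The event listener with a preset listening deadline can be sketched as a polling loop (a minimal illustration; `wait_for_completion` and its parameters are hypothetical, and a real implementation would watch the log table's update data rather than poll a callback):

```python
import time

def wait_for_completion(check_tag, deadline_seconds, poll_interval=0.01):
    """Poll until the upstream task's completion tag appears or the
    preset listening deadline passes."""
    deadline = time.monotonic() + deadline_seconds
    while time.monotonic() < deadline:
        if check_tag():
            return True   # tag observed: run the target task's business file
        time.sleep(poll_interval)
    return False          # deadline reached: raise the timeout alert
```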
Preferably, after the business file execution module 407, the big-data-platform-based HIVE task scheduling apparatus further includes an error count acquisition module and a retry error processing module.
The error count acquisition module is configured to update the error count of the target HIVE task if the business file does not execute successfully.
The retry error processing module is configured to, if the error count exceeds the preset threshold, generate retry error information, terminate the target HIVE task, and send the client an alert formed from the retry error information.
Preferably, after the error count acquisition module, the big-data-platform-based HIVE task scheduling apparatus further includes a retry execution processing module.
The retry execution processing module is configured to, if the error count does not exceed the preset threshold, re-execute the business file in the target HIVE task until the business file executes successfully or the error count of the target HIVE task exceeds the preset threshold.
Preferably, before the original task acquisition module 401, the big-data-platform-based HIVE task scheduling apparatus further includes a task configuration request acquisition unit, an editing interface entry unit, a configuration file acquisition unit and a format matching processing unit.
The task configuration request acquisition unit is configured to obtain a task configuration request sent by the client, the task configuration request including a task type.
The editing interface entry unit is configured to, based on the task type, direct the client to the configuration-file editing interface corresponding to that task type.
The configuration file acquisition unit is configured to obtain the configuration file that the client produces through the configuration-file editing interface.
The format matching processing unit is configured to match the configuration file against a preset regular expression and, if the match succeeds, send a match-success message to the client so that the client forms the original HIVE task from the matched configuration file.
For the specific limitations of the big-data-platform-based HIVE task scheduling apparatus, see the limitations of the big-data-platform-based HIVE task scheduling method above; they are not repeated here. Each module of the apparatus may be implemented wholly or partly in software, hardware, or a combination of the two. The modules may be embedded in, or independent of, a processor in a computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided; it may be a server, and its internal structure may be as shown in Fig. 5. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor provides computing and control capability. The memory includes a non-volatile storage medium and internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions and a database; the internal memory provides the environment for running that operating system and those instructions. The database stores data used or generated while executing the big-data-platform-based HIVE task scheduling method, such as the task log table. The network interface communicates with external terminals over a network. The computer-readable instructions, when executed by the processor, implement a big-data-platform-based HIVE task scheduling method.
In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the instructions, it implements the big-data-platform-based HIVE task scheduling method of the above embodiments, for example S201-S215 shown in Fig. 2 or the steps shown in Fig. 3, which are not repeated here. Alternatively, when the processor executes the instructions, it implements the functions of the modules/units of the apparatus embodiment, for example the original task acquisition module 401, task log table acquisition module 402, target task acquisition module 403, configuration file reading module 404, task identifier acquisition module 405, upstream task log acquisition module 406, business file execution module 407 and task completion processing module 408 shown in Fig. 4, likewise not repeated here. The readable storage media in this embodiment include non-volatile readable storage media and volatile readable storage media.
In one embodiment, one or more readable storage media storing computer-readable instructions are provided. When executed by one or more processors, the instructions cause the one or more processors to implement the big-data-platform-based HIVE task scheduling method of the above embodiments, for example S201-S215 shown in Fig. 2 or the steps shown in Fig. 3, not repeated here. Alternatively, when executed by a processor, the instructions implement the functions of the modules/units of the apparatus embodiment, for example the modules 401 through 408 shown in Fig. 4, likewise not repeated here.
A person of ordinary skill in the art will understand that all or part of the processes of the above method embodiments can be completed by computer-readable instructions directing the relevant hardware; the instructions may be stored in a non-volatile or a volatile readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database or other media in the embodiments of this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional units and modules above is only illustrative; in practice the functions can be assigned to different functional units or modules as needed, that is, the internal structure of the apparatus can be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments only illustrate the technical solutions of this application and do not limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features replaced by equivalents; such modifications or replacements do not remove the essence of the corresponding technical solutions from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the protection scope of this application.
Claims (20)
- A HIVE task scheduling method based on a big data platform, comprising: obtaining an original HIVE task sent by a client, the original HIVE task comprising a startup file, a configuration file and a business file; triggering a log program based on the startup file in the original HIVE task to obtain a task log table, the task log table comprising at least one pending HIVE task, each pending HIVE task corresponding to a task processing time; obtaining a target HIVE task from the at least one pending HIVE task based on the task processing time corresponding to each pending HIVE task; reading the configuration file in the target HIVE task with a configuration-file reader tool; if the read succeeds, obtaining an upstream task identifier and a self task identifier contained in the configuration file of the target HIVE task; querying the task log table based on the upstream task identifier to obtain an upstream task log corresponding to the upstream task identifier; if the upstream task log carries a completion tag corresponding to the upstream task identifier, the upstream HIVE task corresponding to the upstream task identifier has completed successfully, and executing the business file in the target HIVE task; and if the business file executes successfully, generating a completion tag and storing the completion tag, in association with the self task identifier, in a target task log corresponding to the target HIVE task.
- The HIVE task scheduling method based on a big data platform of claim 1, wherein, after the reading of the configuration file in the target HIVE task with the configuration-file reader tool, the method further comprises: if the read fails, generating file error information, terminating the target HIVE task, and sending the client an alert formed from the file error information.
- The HIVE task scheduling method based on a big data platform of claim 1, wherein, after the obtaining of the upstream task log corresponding to the upstream task identifier, the method further comprises: if the upstream task log does not carry a completion tag corresponding to the upstream task identifier, the upstream HIVE task corresponding to the upstream task identifier has not completed successfully, and triggering an event listener to monitor update data of the upstream task log; and if the event listener does not, within a preset listening deadline, observe update data containing a completion tag corresponding to the upstream task identifier, generating timeout error information, terminating the target HIVE task, and sending the client an alert formed from the timeout error information.
- The HIVE task scheduling method based on a big data platform of claim 3, wherein, after the triggering of the event listener to monitor update data of the upstream task log, the method further comprises: if the event listener observes, within the preset listening deadline, update data containing a completion tag corresponding to the upstream task identifier, the upstream HIVE task corresponding to the upstream task identifier has completed successfully, and executing the business file in the target HIVE task.
- The HIVE task scheduling method based on a big data platform of claim 1, wherein, after the executing of the business file in the target HIVE task, the method further comprises: if the business file does not execute successfully, updating an error count of the target HIVE task; and if the error count is greater than a preset threshold, generating retry error information, terminating the target HIVE task, and sending the client an alert formed from the retry error information.
- The HIVE task scheduling method based on a big data platform of claim 5, wherein, after the updating of the error count of the target HIVE task, the method further comprises: if the error count is not greater than the preset threshold, re-executing the business file in the target HIVE task until the business file executes successfully or the error count of the target HIVE task is greater than the preset threshold.
- The HIVE task scheduling method based on a big data platform of claim 1, wherein, before the obtaining of the original HIVE task sent by the client, the method further comprises: obtaining a task configuration request sent by the client, the task configuration request comprising a task type; based on the task type, directing the client to a configuration-file editing interface corresponding to the task type; obtaining the configuration file that the client produces through the configuration-file editing interface; and matching the format of the configuration file against a preset regular expression, and if the match succeeds, sending a match-success message to the client so that the client forms the original HIVE task from the matched configuration file.
- A HIVE task scheduling apparatus based on a big data platform, comprising: an original task acquisition module configured to obtain an original HIVE task sent by a client, the original HIVE task comprising a startup file, a configuration file and a business file; a task log table acquisition module configured to trigger a log program based on the startup file in the original HIVE task to obtain a task log table, the task log table comprising at least one pending HIVE task, each pending HIVE task corresponding to a task processing time; a target task acquisition module configured to obtain a target HIVE task from the at least one pending HIVE task based on the task processing time corresponding to each pending HIVE task; a configuration file reading module configured to read the configuration file in the target HIVE task with a configuration-file reader tool; a task identifier acquisition module configured to, if the read succeeds, obtain an upstream task identifier and a self task identifier contained in the configuration file of the target HIVE task; an upstream task log acquisition module configured to query the task log table based on the upstream task identifier to obtain an upstream task log corresponding to the upstream task identifier; a business file execution module configured to, if the upstream task log carries a completion tag corresponding to the upstream task identifier, treat the upstream HIVE task corresponding to the upstream task identifier as successfully completed and execute the business file in the target HIVE task; and a task completion processing module configured to, if the business file executes successfully, generate a completion tag and store the completion tag, in association with the self task identifier, in a target task log corresponding to the target HIVE task.
- A computer device comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps: obtaining an original HIVE task sent by a client, the original HIVE task comprising a startup file, a configuration file and a business file; triggering a log program based on the startup file in the original HIVE task to obtain a task log table, the task log table comprising at least one pending HIVE task, each pending HIVE task corresponding to a task processing time; obtaining a target HIVE task from the at least one pending HIVE task based on the task processing time corresponding to each pending HIVE task; reading the configuration file in the target HIVE task with a configuration-file reader tool; if the read succeeds, obtaining an upstream task identifier and a self task identifier contained in the configuration file of the target HIVE task; querying the task log table based on the upstream task identifier to obtain an upstream task log corresponding to the upstream task identifier; if the upstream task log carries a completion tag corresponding to the upstream task identifier, the upstream HIVE task corresponding to the upstream task identifier has completed successfully, and executing the business file in the target HIVE task; and if the business file executes successfully, generating a completion tag and storing the completion tag, in association with the self task identifier, in a target task log corresponding to the target HIVE task.
- The computer device of claim 9, wherein, after the reading of the configuration file in the target HIVE task with the configuration-file reader tool, the processor, when executing the computer-readable instructions, further implements the following steps: if the read fails, generating file error information, terminating the target HIVE task, and sending the client an alert formed from the file error information.
- The computer device of claim 9, wherein, after the obtaining of the upstream task log corresponding to the upstream task identifier, the processor, when executing the computer-readable instructions, further implements the following steps: if the upstream task log does not carry a completion tag corresponding to the upstream task identifier, the upstream HIVE task corresponding to the upstream task identifier has not completed successfully, and triggering an event listener to monitor update data of the upstream task log; and if the event listener does not, within a preset listening deadline, observe update data containing a completion tag corresponding to the upstream task identifier, generating timeout error information, terminating the target HIVE task, and sending the client an alert formed from the timeout error information.
- The computer device of claim 11, wherein, after the triggering of the event listener to monitor update data of the upstream task log, the processor, when executing the computer-readable instructions, further implements the following steps: if the event listener observes, within the preset listening deadline, update data containing a completion tag corresponding to the upstream task identifier, the upstream HIVE task corresponding to the upstream task identifier has completed successfully, and executing the business file in the target HIVE task.
- The computer device of claim 9, wherein, after the executing of the business file in the target HIVE task, the processor, when executing the computer-readable instructions, further implements the following steps: if the business file does not execute successfully, updating an error count of the target HIVE task; and if the error count is greater than a preset threshold, generating retry error information, terminating the target HIVE task, and sending the client an alert formed from the retry error information.
- The computer device of claim 9, wherein, before the obtaining of the original HIVE task sent by the client, the processor, when executing the computer-readable instructions, further implements the following steps: obtaining a task configuration request sent by the client, the task configuration request comprising a task type; based on the task type, directing the client to a configuration-file editing interface corresponding to the task type; obtaining the configuration file that the client produces through the configuration-file editing interface; and matching the format of the configuration file against a preset regular expression, and if the match succeeds, sending a match-success message to the client so that the client forms the original HIVE task from the matched configuration file.
- One or more readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps: obtaining an original HIVE task sent by a client, the original HIVE task comprising a startup file, a configuration file and a business file; triggering a log program based on the startup file in the original HIVE task to obtain a task log table, the task log table comprising at least one pending HIVE task, each pending HIVE task corresponding to a task processing time; obtaining a target HIVE task from the at least one pending HIVE task based on the task processing time corresponding to each pending HIVE task; reading the configuration file in the target HIVE task with a configuration-file reader tool; if the read succeeds, obtaining an upstream task identifier and a self task identifier contained in the configuration file of the target HIVE task; querying the task log table based on the upstream task identifier to obtain an upstream task log corresponding to the upstream task identifier; if the upstream task log carries a completion tag corresponding to the upstream task identifier, the upstream HIVE task corresponding to the upstream task identifier has completed successfully, and executing the business file in the target HIVE task; and if the business file executes successfully, generating a completion tag and storing the completion tag, in association with the self task identifier, in a target task log corresponding to the target HIVE task.
- The readable storage media of claim 15, wherein, after the reading of the configuration file in the target HIVE task with the configuration-file reader tool, the computer-readable instructions, when executed by the one or more processors, further cause the one or more processors to perform the following steps: if the read fails, generating file error information, terminating the target HIVE task, and sending the client an alert formed from the file error information.
- The readable storage media of claim 15, wherein, after the obtaining of the upstream task log corresponding to the upstream task identifier, the computer-readable instructions, when executed by the one or more processors, further cause the one or more processors to perform the following steps: if the upstream task log does not carry a completion tag corresponding to the upstream task identifier, the upstream HIVE task corresponding to the upstream task identifier has not completed successfully, and triggering an event listener to monitor update data of the upstream task log; and if the event listener does not, within a preset listening deadline, observe update data containing a completion tag corresponding to the upstream task identifier, generating timeout error information, terminating the target HIVE task, and sending the client an alert formed from the timeout error information.
- The readable storage media of claim 17, wherein, after the triggering of the event listener to monitor update data of the upstream task log, the computer-readable instructions, when executed by the one or more processors, further cause the one or more processors to perform the following steps: if the event listener observes, within the preset listening deadline, update data containing a completion tag corresponding to the upstream task identifier, the upstream HIVE task corresponding to the upstream task identifier has completed successfully, and executing the business file in the target HIVE task.
- The readable storage media of claim 15, wherein, after the executing of the business file in the target HIVE task, the computer-readable instructions, when executed by the one or more processors, further cause the one or more processors to perform the following steps: if the business file does not execute successfully, updating an error count of the target HIVE task; and if the error count is greater than a preset threshold, generating retry error information, terminating the target HIVE task, and sending the client an alert formed from the retry error information.
- The readable storage media of claim 15, wherein, before the obtaining of the original HIVE task sent by the client, the computer-readable instructions, when executed by the one or more processors, further cause the one or more processors to perform the following steps: obtaining a task configuration request sent by the client, the task configuration request comprising a task type; based on the task type, directing the client to a configuration-file editing interface corresponding to the task type; obtaining the configuration file that the client produces through the configuration-file editing interface; and matching the format of the configuration file against a preset regular expression, and if the match succeeds, sending a match-success message to the client so that the client forms the original HIVE task from the matched configuration file.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910208508.3 | 2019-03-19 | |
CN201910208508.3A (granted as CN110069572B) (zh) | 2019-03-19 | 2019-03-19 | HIVE task scheduling method, apparatus, device and storage medium based on big data platform
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020186809A1 true WO2020186809A1 (zh) | 2020-09-24 |
Family
ID=67366392
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/120594 WO2020186809A1 (zh) | HIVE task scheduling method, apparatus, device and storage medium based on big data platform | 2019-03-19 | 2019-11-25
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110069572B (zh) |
WO (1) | WO2020186809A1 (zh) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112486982A (zh) * | 2020-11-17 | 2021-03-12 | 中信银行股份有限公司 | Data acquisition method, device and storage medium
CN112861496A (zh) * | 2021-03-22 | 2021-05-28 | 平安商业保理有限公司 | Report generation and display method and apparatus, computer device and readable storage medium
CN113268318A (zh) * | 2021-04-07 | 2021-08-17 | 北京思特奇信息技术股份有限公司 | Task scheduling method and distributed system
CN113342490A (zh) * | 2021-05-31 | 2021-09-03 | 北京顶象技术有限公司 | Method and device for executing modeling task scheduling
CN113779336A (zh) * | 2021-09-08 | 2021-12-10 | 五八同城信息技术有限公司 | Method and device for processing user behavior data, and electronic device
CN113780704A (zh) * | 2020-10-22 | 2021-12-10 | 北京京东振世信息技术有限公司 | Task processing method and device
CN113986380A (zh) * | 2021-10-27 | 2022-01-28 | 北京百度网讯科技有限公司 | Data processing method, apparatus and system, electronic device and storage medium
CN114710403A (zh) * | 2022-03-30 | 2022-07-05 | 中国建设银行股份有限公司 | Data scheduling method, apparatus, device, medium and program product
CN114816717A (zh) * | 2022-05-19 | 2022-07-29 | 广州有信科技有限公司 | Method, apparatus, device and storage medium for executing computer tasks
CN117009327A (zh) * | 2023-09-27 | 2023-11-07 | 腾讯科技(深圳)有限公司 | Data processing method and device, computer device, and medium
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110069572B (zh) * | 2019-03-19 | 2022-08-02 | 深圳壹账通智能科技有限公司 | HIVE task scheduling method, apparatus, device and storage medium based on big data platform
CN110490451A (zh) * | 2019-08-15 | 2019-11-22 | 中国平安财产保险股份有限公司 | Hadoop-based task data management and control method, apparatus and computer device
CN110764998B (zh) * | 2019-09-06 | 2024-04-02 | 平安健康保险股份有限公司 | Data comparison method, apparatus, device and storage medium based on the Django framework
CN110837509A (zh) * | 2019-11-08 | 2020-02-25 | 深圳市彬讯科技有限公司 | Scheduling dependency method, apparatus, device and storage medium
CN111090569A (zh) * | 2019-12-11 | 2020-05-01 | 深圳震有科技股份有限公司 | Scheduling system, and relation log generation method and medium based on the scheduling system
CN111158798A (zh) * | 2019-12-27 | 2020-05-15 | 中国银行股份有限公司 | Business data processing method and device
CN111930814B (zh) * | 2020-05-29 | 2024-02-27 | 武汉达梦数据库股份有限公司 | File event scheduling method based on an ETL system, and ETL system
CN112367205B (zh) * | 2020-11-12 | 2023-04-18 | 深圳前海微众银行股份有限公司 | Method for processing HTTP scheduling requests, and scheduling system
CN113064713A (zh) * | 2021-04-23 | 2021-07-02 | 中国工商银行股份有限公司 | Task execution method, apparatus and device
CN114968913A (zh) * | 2022-05-25 | 2022-08-30 | 中国平安财产保险股份有限公司 | Data management method, apparatus and computing device
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104536811A (zh) * | 2014-12-26 | 2015-04-22 | 广州华多网络科技有限公司 | Task scheduling method and device based on HIVE tasks
CN106201754A (zh) * | 2016-07-06 | 2016-12-07 | 乐视控股(北京)有限公司 | Task information analysis method and device
CN106528275A (zh) * | 2015-09-10 | 2017-03-22 | 网易(杭州)网络有限公司 | Data task processing method and task scheduler
US20170351620A1 (en) * | 2016-06-07 | 2017-12-07 | Qubole Inc | Caching Framework for Big-Data Engines in the Cloud
CN110069572A (zh) * | 2019-03-19 | 2019-07-30 | 深圳壹账通智能科技有限公司 | HIVE task scheduling method, apparatus, device and storage medium based on big data platform
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001063448A2 (en) * | 2000-02-25 | 2001-08-30 | Navic Systems, Inc. | Method and system of user profile generation |
US7379959B2 (en) * | 2002-09-07 | 2008-05-27 | Appistry, Inc. | Processing information using a hive of computing engines including request handlers and process handlers |
US20150084784A1 (en) * | 2013-09-25 | 2015-03-26 | Solutionbee, LLC | Apiary monitoring system |
CN104616205B (zh) * | 2014-11-24 | 2019-10-25 | 北京科东电力控制系统有限责任公司 | Power system operating state monitoring method based on distributed log analysis
US9886292B2 (en) * | 2015-10-26 | 2018-02-06 | Dell Products L.P. | Making user profile data portable across platforms |
CN107818112B (zh) * | 2016-09-13 | 2021-12-14 | 腾讯科技(深圳)有限公司 | Big data analysis job system and task submission method
CN107301214B (zh) * | 2017-06-09 | 2020-08-28 | 广州虎牙信息科技有限公司 | Data migration method, device and terminal equipment in HIVE
Prosecution timeline (2019):
- 2019-03-19: CN application CN201910208508.3A, granted as patent CN110069572B (zh), status: active
- 2019-11-25: WO application PCT/CN2019/120594, published as WO2020186809A1 (zh), status: application filing
Also Published As
Publication number | Publication date |
---|---|
CN110069572B (zh) | 2022-08-02 |
CN110069572A (zh) | 2019-07-30 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19920216; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 32PN | EP: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 21/01/2022)
| 122 | EP: PCT application non-entry in European phase | Ref document number: 19920216; Country of ref document: EP; Kind code of ref document: A1