WO2018126964A1 - Task execution method and apparatus and server - Google Patents


Info

Publication number
WO2018126964A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
file
database
script file
database script
Prior art date
Application number
PCT/CN2017/118957
Other languages
French (fr)
Chinese (zh)
Inventor
单立明
钟陈练
匡林林
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/362 Software debugging
    • G06F11/3628 Software debugging of optimised code
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/73 Program documentation
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs

Definitions

  • The present disclosure relates to the field of computer technology and, for example, to a task execution method, apparatus, and server.
  • Spark is a big data processing framework built around speed, ease of use, and sophisticated analytics. It takes MapReduce (a programming model) to a higher level, with less costly shuffles (redistribution of data across partitions) during data processing. By using in-memory data storage and near-real-time processing, Spark can be many times faster than other big data processing technologies.
  • In the related art, a data cleaning task often needs a large amount of code to support it.
  • As a result, the project package becomes bloated, which brings various drawbacks:
  • The code has a high repetition rate, which wastes considerable manpower.
  • Merging the code is complex, which also wastes considerable human resources.
  • The maintenance cost of the code is too high.
  • If a flaw is found in one task that needs to be adjusted, the entire project has to be stopped. Most healthy tasks then wait for one problem task, which wastes considerable resources.
  • The code is not flexible enough.
  • The present disclosure provides a task execution method, apparatus, and server to solve at least one of the problems of code duplication, excessive project size, and inconvenient task management in data cleaning tasks.
  • The present disclosure provides a task execution method, including:
  • reading a task description file of a task, where the task description file records the path of a database common package used to execute the task, the path of a database script file representing the entity of the task, and a first parameter corresponding to the task, the first parameter being used to replace a variable in the database script file.
  • A path of a task configuration file is also recorded in the task description file, where the task configuration file records a second parameter used to replace a threshold value in the database script file.
  • The method further includes:
  • Acquiring the database script file and replacing the variables in the database script file with the first parameter includes:
  • The method further includes:
  • The tasks are read from the task list according to the order of the tasks in the task list.
  • The task description file includes information about the input table and the output table corresponding to the task;
  • executing the database script file includes:
  • obtaining the input data of the database script from the input table, executing the database script file, and adding the obtained result to the output table.
  • The task execution apparatus of the present disclosure includes:
  • a reading module configured to read a task description file corresponding to a task, where the task description file records the path of a database common package used to execute the task, the path of a database script file representing the entity of the task, and a first parameter corresponding to the task, the first parameter being used to replace a variable in the database script file;
  • a push module configured to push the database script file to a compute node that has the database common package; and
  • an execution module configured to invoke the database common package on the compute node according to the path of the database common package, use the database common package to obtain the database script file according to the path of the database script file, replace a variable in the database script file with the first parameter, and execute the database script file to obtain the execution result of the task.
  • A path of a task configuration file is recorded in the task description file, where the task configuration file records a second parameter used to replace a threshold value in the database script file.
  • The push module is further configured to push the task configuration file to the compute node.
  • The execution module is further configured to use the database common package to acquire, according to the path of the task configuration file, the second parameter in the task configuration file to replace a threshold value in the database script file.
  • The execution module is configured to invoke the database common package on the compute node according to the path of the database common package; use the database common package to obtain the database script file according to the path of the database script file; generate a corresponding data exchange file according to a preset data exchange language; extract the first parameter from the data exchange file to replace a variable in the database script file; and execute the database script file to obtain the execution result of the task.
  • The apparatus further includes:
  • a task list module configured to, before the task description file of the task is read and when the data required by the task has been acquired, add the task, according to at least the priority of the task, to a task list that records at least one task, and to read the task from the task list according to the order of the tasks in the task list.
  • The task description file includes information about the input table and the output table corresponding to the task;
  • the execution module is configured to invoke the database common package on the compute node according to the path of the database common package, use the database common package to obtain the database script file according to the path of the database script file, replace a variable in the database script file with the first parameter, obtain the input data of the database script from the input table, execute the database script file, and add the obtained result to the output table.
  • The present disclosure also provides a server including any of the task execution apparatuses described above.
  • The present disclosure also provides a computer-readable storage medium storing computer-executable instructions for performing any of the methods described above.
  • The present disclosure also provides a server including one or more processors, a memory, and one or more programs stored in the memory that, when executed by the one or more processors, perform the above method.
  • The present disclosure also provides a computer program product including a computer program stored on a non-transitory computer-readable storage medium; the computer program includes program instructions that, when executed by a computer, cause the computer to perform any of the methods described above.
  • Compared with the related art, the task execution method, apparatus, and server provided by the present disclosure execute tasks based on multiple mutually independent files, which effectively improves code flexibility and makes code maintenance simple and convenient. Separating the main program from the database scripts solves the drawback that the main program is huge and difficult to maintain, and because the database common package is shared, the code repetition rate drops, the task code becomes simpler, and manpower is saved.
  • FIG. 1 is a flow chart of a task execution method according to an embodiment.
  • FIG. 2 is a flow chart of another task execution method of an embodiment.
  • FIG. 3 is a block diagram of a task execution apparatus of an embodiment.
  • FIG. 4 is a block diagram of another task execution apparatus of an embodiment.
  • FIG. 5 is a file structure diagram of a task execution method according to an embodiment.
  • FIG. 6 is a schematic diagram of a task execution method according to an embodiment.
  • FIG. 7 is a flow chart showing the operation of a task execution method according to an embodiment.
  • FIG. 8 is a flow chart showing the operation of another task execution method according to an embodiment.
  • FIG. 9 is a schematic diagram showing the hardware structure of a server according to an embodiment.
  • An embodiment of the present application provides a task execution method, including:
  • Step 110: Read the task description file corresponding to the task. The task description file records the path of the database common package used to execute the task, the path of the database script file representing the entity of the task, and the first parameter corresponding to the task.
  • The first parameter is used to replace variables in the database script file.
  • The task description file may be an Extensible Markup Language (XML) file.
  • The database script file may be an SQL file.
  • The tasks performed include, but are not limited to, data cleaning tasks.
  • The database common package may be a public jar package.
  • Step 120: Push the database script file to the computing node, which has the database common package.
  • Step 130: On the computing node, invoke the database common package according to its path; use the database common package to obtain the database script file according to the path of the database script file; replace the variables in the database script file with the first parameter; and execute the database script file to obtain the execution result of the task.
  • Each data cleaning task can be written as a separate algorithm and submitted with the application deployment tool spark-submit.
  • The processing shared by data cleaning tasks whose data source is a database table is extracted into a common public jar package file (for example, the database common package).
  • The developer can describe the input and output of a task by configuring the XML description file corresponding to that task.
  • The task execution method provided in this embodiment can solve problems such as code duplication, excessive project size, and inconvenient task management, and can also reduce the development and maintenance costs of data cleaning tasks whose data source is a database table.
  • A path of a task configuration file may also be recorded in the task description file, where the task configuration file records a second parameter used to replace a threshold value in the database script file.
  • The method further includes: pushing the task configuration file to the computing node; and using the database common package to obtain, according to the path of the task configuration file, the second parameter in the task configuration file to replace the threshold value in the database script file.
  • Acquiring the database script file and replacing the variables in the database script file with the first parameter includes: acquiring the database script file; generating a data exchange file from the task description file according to a preset data exchange language; and extracting the first parameter from the data exchange file to replace the variables in the database script file.
  • The method further includes: when the data required by a task has been acquired, adding the task, according to its priority, to a task list that records one or more tasks in order; and reading the tasks from the task list according to their order in the task list.
  • The task description file includes information about the input table and the output table corresponding to the task.
  • Executing the database script file includes: acquiring the input data of the database script from the input table, executing the database script file, and adding the obtained result to the output table.
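The input-table/output-table step can be illustrated with Python's built-in sqlite3 module standing in for the real database. This is a minimal sketch, assuming a fully substituted statement and input and output tables in the same database, so a single INSERT ... SELECT does the work; the helper run_task_sql and the table names cell_kpi_day and poor_cell_day are hypothetical.

```python
import sqlite3


def run_task_sql(conn: sqlite3.Connection, sql: str) -> None:
    """Execute one fully substituted task statement (hypothetical helper).

    When the input and output tables live in the same database, a single
    INSERT ... SELECT reads the input table and appends to the output table.
    """
    with conn:  # commits on success, rolls back if the statement raises
        conn.execute(sql)


# Hypothetical input and output tables for a daily cell-quality task.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cell_kpi_day (cell_id TEXT, drop_rate REAL)")
conn.execute("CREATE TABLE poor_cell_day (cell_id TEXT, drop_rate REAL)")
conn.executemany(
    "INSERT INTO cell_kpi_day VALUES (?, ?)",
    [("c1", 0.01), ("c2", 0.08), ("c3", 0.12)],
)

run_task_sql(
    conn,
    "INSERT INTO poor_cell_day "
    "SELECT cell_id, drop_rate FROM cell_kpi_day WHERE drop_rate > 0.05",
)
rows = conn.execute("SELECT cell_id FROM poor_cell_day ORDER BY cell_id").fetchall()
print(rows)  # [('c2',), ('c3',)]
```

If the output table is in a different database or is a file, the result set would instead be fetched and written out separately; this sketch covers only the same-database case.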
  • Another embodiment of the present application provides a task execution method, including:
  • Step 210: Read the task description file corresponding to the task. The task description file records the path of the database common package used to execute the task, the path of the database script file representing the entity of the task, and the first parameter corresponding to the task.
  • The first parameter is used to replace the variables in the database script file.
  • The task description file also records the path of the task configuration file.
  • The task configuration file records the second parameter used to replace the threshold values in the database script file.
  • The task description file also includes information about the input table and the output table corresponding to the task.
  • An original task can consist of three parts: a task.xml file (an XML file), a task.sql file (an SQL file), and a task.conf file (a conf file, which can be omitted if the task has no thresholds).
  • The task.xml file saves the task type (SQL task or RDD task); the input tables (database, table name, type, partition, and similar information, or files); the output tables (database, table name, type, partition, and similar information, or files); the execution time (for timed tasks); the jar package path used to execute the task; the path of the task.conf file; the path of the task.sql file; and so on.
  • The task.sql file holds the task entity, that is, SQL statements containing variables. Constant information such as task thresholds is saved in the task.conf file.
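The task.xml fields listed above can be loaded into an in-memory metadata record with Python's standard xml.etree.ElementTree module. The disclosure does not publish a schema, so the element and attribute names, the priority attribute, the database and table names, and the jar path in this sketch are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical task.xml layout: the disclosure lists the fields but not a
# schema, so every element and attribute name here is an assumption.
TASK_XML = """
<task name="lte_subject_poorquality_cell_day" type="sql" schedule="day" priority="1">
  <input alias="kpi" database="hive" table="cell_kpi_day" partition="p_date"/>
  <output alias="out" database="hive" table="poor_cell_day" partition="p_date"/>
  <jar>/opt/tasks/sql-public.jar</jar>
  <sql>lte_subject_poorquality_cell_day.sql</sql>
  <conf>lte_subject_poorquality_cell_day.conf</conf>
</task>
"""


@dataclass
class TaskMeta:
    name: str
    task_type: str             # "sql" or "rdd"
    schedule: str              # execution granularity, e.g. "day"
    priority: int
    jar_path: str
    sql_path: str
    conf_path: Optional[str]   # None when the task has no thresholds
    inputs: List[Dict[str, str]] = field(default_factory=list)
    outputs: List[Dict[str, str]] = field(default_factory=list)


def parse_task_xml(text: str) -> TaskMeta:
    """Turn one task.xml document into an in-memory metadata record."""
    root = ET.fromstring(text.strip())
    conf = root.find("conf")
    return TaskMeta(
        name=root.get("name"),
        task_type=root.get("type"),
        schedule=root.get("schedule"),
        priority=int(root.get("priority", "0")),
        jar_path=root.findtext("jar"),
        sql_path=root.findtext("sql"),
        conf_path=conf.text if conf is not None else None,
        inputs=[dict(e.attrib) for e in root.findall("input")],
        outputs=[dict(e.attrib) for e in root.findall("output")],
    )


task_metadata = [parse_task_xml(TASK_XML)]   # the in-memory metadata list
meta = task_metadata[0]
```

Treating the conf element as optional mirrors the statement that task.conf can be omitted when a task has no thresholds.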
  • Step 220: Push the database script file and the task configuration file to the computing node, which has the database common package.
  • A main program can be implemented for this purpose. After it starts, the main program reads the information in the task.xml files into an in-memory task metadata list and pushes the task.sql and task.conf files to at least one Spark computing node.
  • Step 230: Add the task to the task list, which records at least one task arranged in order, according to the priority of the task and whether the data required by the task has been acquired. Read the task from the task list according to the order of the tasks in the list. After the task is read, generate a corresponding data exchange file from the task description file according to the preset data exchange language, extract the first parameter from the data exchange file to replace the variables in the database script file, and invoke the database common package according to its path to execute the database script file and obtain the execution result of the task.
  • The main program may add a task to the task scheduling list (that is, the task list) according to the task metadata description (data-driven or time-driven), the data arrival status, the task priority, and similar factors.
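The scheduling rule (a task joins the list only after its data has arrived, and tasks are read in priority order) can be sketched with a binary heap from Python's heapq module. This is an illustrative sketch, not the disclosed implementation: the TaskList class, the convention that a smaller number means a higher priority, and first-in-first-out tie-breaking are assumptions, and time-driven triggering is not modeled.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class TaskList:
    """Sketch of the task scheduling list: ready tasks ordered by priority.

    Assumptions: a smaller number means a higher priority, and tasks with
    equal priority are read in the order they were added.
    """

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()  # tie-breaker preserving insertion order

    def add_if_ready(self, name: str, priority: int, data_arrived: bool) -> bool:
        # A task joins the list only once the data it needs has been acquired.
        if not data_arrived:
            return False
        heapq.heappush(self._heap, (priority, next(self._seq), name))
        return True

    def pop_next(self) -> Optional[str]:
        # Read tasks in list order; None when the list is empty.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


tasks = TaskList()
tasks.add_if_ready("task_b", priority=2, data_arrived=True)
tasks.add_if_ready("task_a", priority=1, data_arrived=True)
tasks.add_if_ready("task_c", priority=1, data_arrived=False)  # data not ready: skipped
order = [tasks.pop_next(), tasks.pop_next(), tasks.pop_next()]
print(order)  # ['task_a', 'task_b', None]
```

A heap keeps insertion and removal at O(log n), which suits a list that the main program refills as data for new tasks arrives.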
  • The main program can use spark-submit to submit tasks to the Spark cluster, where they are executed on the cluster's computing nodes.
  • The main program monitors the usage of Spark resources.
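One way the main program might build the spark-submit invocation is sketched below. It uses only standard spark-submit options (--master, --class, --files); --files places the listed files in each executor's working directory, which is one plausible way to push task.sql and task.conf to the computing nodes. The jar path, the entry class com.example.tasks.SqlTaskMain, the helper build_spark_submit, and passing task.json as the application argument are hypothetical.

```python
import shlex
from typing import List


def build_spark_submit(jar_path: str, entry_class: str, task_json: str,
                       side_files: List[str], master: str = "yarn") -> List[str]:
    """Build the spark-submit argument list for one task (sketch).

    --files ships the listed files to each executor's working directory;
    the entry class and the task.json argument convention are assumptions.
    """
    cmd = ["spark-submit", "--master", master, "--class", entry_class]
    if side_files:
        cmd += ["--files", ",".join(side_files)]
    cmd += [jar_path, task_json]  # application jar, then application arguments
    return cmd


cmd = build_spark_submit(
    jar_path="/opt/tasks/sql-public.jar",            # hypothetical path
    entry_class="com.example.tasks.SqlTaskMain",     # hypothetical class
    task_json="lte_subject_poorquality_cell_day.json",
    side_files=["lte_subject_poorquality_cell_day.sql",
                "lte_subject_poorquality_cell_day.conf"],
)
print(shlex.join(cmd))
```

Launching it would then be a matter of subprocess.run(cmd, check=True) on a host where Spark is installed; given the main program's resource monitoring, that call would be made only once the cluster has capacity.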
  • When resources allow, the main program generates the task.json file from the required parameters in the task metadata. The SQL public jar package then uses the parameters in the task.json file and the thresholds in the task.conf file to replace the variables in the task.sql file, generating a complete task SQL statement (the task entity).
  • According to the output type, the task execution result is saved to the corresponding database or file.
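The task.json generation described above can be sketched with JSON as the preset data exchange language, since the disclosure names a task.json file. This is a minimal sketch under assumptions: the helper build_task_json, the field names, the p_date partition key, and the table names are hypothetical, and only the first parameters go into task.json while the thresholds stay in task.conf, whose path is passed along.

```python
import json
from typing import Dict


def build_task_json(task_name: str, sql_path: str, conf_path: str,
                    table_aliases: Dict[str, str], partition_value: str) -> str:
    """Assemble a task.json (the data exchange file) for one run of a task.

    The key names are hypothetical. "params" holds the first parameters that
    fill $...$ variables; thresholds stay in task.conf, whose path is passed
    along so the SQL public jar package can read them at execution time.
    """
    params = dict(table_aliases)            # table alias -> actual table name
    params["p_date"] = partition_value      # output table partition value
    doc = {
        "name": task_name,
        "sql_path": sql_path,
        "conf_path": conf_path,
        "params": params,
    }
    return json.dumps(doc, indent=2, sort_keys=True)


task_json = build_task_json(
    task_name="lte_subject_poorquality_cell_day",
    sql_path="lte_subject_poorquality_cell_day.sql",
    conf_path="lte_subject_poorquality_cell_day.conf",
    table_aliases={"out": "poor_cell_day", "kpi": "cell_kpi_day"},
    partition_value="2018-01-01",
)
print(task_json)
```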
  • Step 240: Use the database common package to obtain the second parameter from the task configuration file and replace the threshold values in the database script file; obtain the input data of the database script from the input table, execute the database script file, and add the obtained result to the output table.
  • The public jar package (that is, the database common package) can return the task execution result to the main program, which determines the final execution result of the task.
  • Compared with the related art, making tasks independent of one another effectively improves code flexibility and makes code maintenance simple and convenient. Separating the main program from the algorithms solves the drawback that the main program is huge and difficult to maintain. In addition, extracting the common jar package reduces code duplication, makes data cleaning tasks simpler, and saves manpower.
  • An embodiment of the present application provides a task execution apparatus, including a reading module 310, a push module 320, and an execution module 330.
  • The reading module 310 is configured to read the task description file corresponding to a task. The task description file records the path of the database common package used to execute the task, the path of the database script file representing the entity of the task, and the first parameter corresponding to the task; the first parameter is used to replace variables in the database script file.
  • The task description file may be an XML file.
  • The database script file may be an SQL file.
  • The tasks performed include, but are not limited to, data cleaning tasks.
  • The database common package may be a public jar package.
  • The push module 320 is configured to push the database script file to the computing node, which has the database common package.
  • The execution module 330 is configured to invoke the database common package on the computing node according to its path, use the database common package to obtain the database script file according to the path of the database script file, replace the variables in the database script file with the first parameter, and execute the database script file to obtain the execution result of the task.
  • Each data cleaning task can be written as a separate algorithm and submitted with spark-submit, and the processing shared by data cleaning tasks whose data source is a database table is extracted into a common public jar package file.
  • The developer can describe the input and output of a task by configuring the XML description file corresponding to that task.
  • The task execution apparatus provided by this embodiment can solve the problems of duplicated cleaning-task code, excessive project size, and inconvenient task management, and can reduce the development and maintenance costs of data cleaning tasks whose data source is a database table.
  • A path of a task configuration file is recorded in the task description file, where the task configuration file records a second parameter used to replace a threshold value in the database script file. The push module 320 is further configured to push the task configuration file to the computing node, and the execution module 330 is further configured to use the database common package to acquire, according to the path of the task configuration file, the second parameter in the task configuration file to replace the threshold value in the database script file.
  • The execution module 330 is configured to invoke the database common package on the computing node according to its path; use the database common package to obtain the database script file according to the path of the database script file; generate a corresponding data exchange file according to a preset data exchange language; extract the first parameter from the data exchange file to replace the variables in the database script file; and execute the database script file to obtain the execution result of the task.
  • The apparatus shown in FIG. 4 further includes a task list module 430 configured to, before the task description file of a task is read and when the data required by the task has been acquired, add the task, according to its priority, to a task list that records one or more tasks in order, and to read the task from the task list according to the order of the tasks in the list.
  • The task description file includes information about the input table and the output table corresponding to the task.
  • The execution module 330 is configured to invoke the database common package on the computing node according to its path, use the database common package to obtain the database script file according to the path of the database script file, replace the variables in the database script file with the first parameter, obtain the input data of the database script from the input table, execute the database script file, and add the obtained result to the output table.
  • Another embodiment provides a task execution apparatus, including:
  • The reading module 410 is configured to read the task description file corresponding to a task. The task description file records the path of the database common package used to execute the task, the path of the database script file representing the entity of the task, and the first parameter corresponding to the task, which is used to replace the variables in the database script file. It also records the path of the task configuration file, which holds the second parameter used to replace the threshold values in the database script file, and it includes information about the input table and the output table corresponding to the task.
  • An original task can consist of three parts: a task.xml file, a task.sql file, and a task.conf file (which can be omitted if the task has no thresholds).
  • The task.xml file saves the task type (SQL task or RDD task); the input tables (database, table name, type, partition, and similar information, or files); the output tables (database, table name, type, partition, and similar information, or files); the execution time (for timed tasks); the jar package path used to execute the task; the path of the task.conf file; the path of the task.sql file; and so on.
  • The push module 420 is configured to push the database script file and the task configuration file to the computing node, which has the database common package.
  • The task list module 430 is configured to add the task to the task list, which records at least one task arranged in order, according to the priority of the task and whether the data required by the task has been acquired, and to read the task from the task list according to the order of the tasks in the list. After the task is read, a corresponding data exchange file is generated from the task description file according to the preset data exchange language, the first parameter is extracted from the data exchange file to replace the variables in the database script file, and the database common package is invoked according to its path to execute the database script file and obtain the execution result of the task.
  • The execution module 440 is configured to use the database common package to obtain the second parameter from the task configuration file and replace the threshold values in the database script file, obtain the input data of the database script from the input table, execute the database script file, and add the obtained result to the output table.
  • The public jar package returns the task execution result to the main program, which determines the final execution result of the task.
  • Compared with the related art, making tasks independent of one another effectively improves code flexibility and makes code maintenance simple and convenient.
  • Separating the main program from the algorithms solves the drawback that the main program is huge and difficult to maintain.
  • Extracting the common jar package reduces code duplication, makes data cleaning tasks simpler, and saves manpower.
  • An embodiment provides a server that includes either of the task execution apparatuses described in Embodiment 3 or Embodiment 4. Those skilled in the art will understand that the task execution apparatus may be included in the server, that is, each functional module of the apparatus may be implemented by server-based software and/or hardware, so that the server of this embodiment achieves the technical effects of the task execution apparatus in the above embodiments.
  • In an example, for a task named lte_subject_poorquality_cell_day, the lte_subject_poorquality_cell_day.xml file is first edited according to the template.
  • The XML file records the following: the task name; the execution granularity (here, the task is executed daily); indication information showing that the task contains SQL statements; and the entry class and entry function of the SQL public jar package (these can be omitted, because the main program has default values).
  • The variable replacement rule can be a name enclosed in a pair of '$' symbols, for example $name$.
  • The threshold replacement rule can be a name enclosed in a pair of '#' symbols, for example #name#.
  • The SQL statements may include an output table alias, an output table partition value, and threshold values. The output table alias corresponds to an alias in the task.xml file and is replaced with the actual name of the output table when the algorithm public jar package executes; the output table partition value corresponds to an alias in the task.xml file and is replaced with the actual value when the algorithm public jar package executes; and each threshold corresponds to a value in the task.conf file and is replaced with the actual value when the algorithm public jar package executes.
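The two replacement rules ($name$ for variables, #name# for thresholds) map naturally onto regular expressions. The sketch below uses Python's re module and raises an error for any placeholder without a value, so that no SQL runs with an unreplaced token; the helper names, the Spark SQL/Hive-style statement, and the table names are illustrative assumptions.

```python
import re
from typing import Callable, Mapping

# Placeholder syntax from the replacement rules: $name$ for variables (first
# parameters) and #name# for thresholds (second parameters).
VARIABLE = re.compile(r"\$(\w+)\$")
THRESHOLD = re.compile(r"#(\w+)#")


def _lookup(values: Mapping[str, object], kind: str) -> Callable[[re.Match], str]:
    def repl(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            # Fail loudly rather than run SQL with an unreplaced placeholder.
            raise KeyError(f"no value for {kind} {key!r}")
        return str(values[key])
    return repl


def render_sql(template: str, params: Mapping[str, object],
               thresholds: Mapping[str, object]) -> str:
    """Replace $...$ variables first, then #...# thresholds."""
    sql = VARIABLE.sub(_lookup(params, "variable"), template)
    return THRESHOLD.sub(_lookup(thresholds, "threshold"), sql)


template = ("INSERT INTO $out$ PARTITION (p_date='$p_date$') "
            "SELECT cell_id FROM $kpi$ WHERE drop_rate > #drop_rate_max#")
sql = render_sql(template,
                 params={"out": "poor_cell_day", "kpi": "cell_kpi_day",
                         "p_date": "2018-01-01"},
                 thresholds={"drop_rate_max": 0.05})
print(sql)
```

Because \w+ does not match spaces or quotes, a stray '$' or '#' elsewhere in the statement is left alone unless it forms a complete placeholder.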
  • Taking a task with thresholds as an example, the thresholds are extracted into a separate configuration file so that they can easily be modified later.
  • At startup, the main program scans all tasks in the agreed directory; individually added or modified tasks can also be added as patches through a dedicated interface.
  • The main program loads the original task files as shown in FIG. 6.
  • The main program reads the task.xml file into memory and adds it to the task metadata list, which is used later to generate the task.json file.
  • The task.conf and task.sql files are pushed to the corresponding compute nodes (such as the Spark task nodes in the figure) for use by the SQL public jar package when the task is executed.
  • A task whose data has arrived is a task that can be executed, and it actually runs when Spark has available computing resources. When all the preparation conditions are met, the task is submitted.
  • Step 710: Parse the task.json file.
  • The SQL public jar package reads the task.json file generated by the main program.
  • Step 720: Generate a parameter information replacement list.
  • Step 730: Read the SQL file and replace the items in it that appear in the parameter information replacement list. The task.sql and task.conf file paths are obtained, and the SQL statements in the task.sql file are read.
  • Each task can have multiple SQL statements.
  • The time conditions, input tables, output tables, and thresholds in these statements are written as variables in the format constrained by the framework and need to be replaced in the SQL public jar package.
  • The replacement values are obtained by reading the relevant parameters in the task.json file.
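Steps 710 to 730 can be sketched as: parse task.json, build a token-to-value replacement list that also covers the thresholds from task.conf, apply it to the SQL text, and split the result into individual statements. This is a sketch under stated assumptions: the "params" key, the name=value layout of task.conf, the helper names, and the sample statements are hypothetical, and the SQL and conf text are passed in as strings rather than read from the paths in task.json.

```python
import json
from typing import Dict, List


def parse_conf(text: str) -> Dict[str, str]:
    """Read thresholds from a task.conf assumed to hold `name=value` lines."""
    thresholds = {}
    for line in text.splitlines():
        line = line.strip()
        if line:
            key, _, value = line.partition("=")
            thresholds[key.strip()] = value.strip()
    return thresholds


def build_replacement_list(task_json: str, conf_text: str) -> Dict[str, str]:
    """Step 720 sketch: map each placeholder token to its replacement value."""
    params = json.loads(task_json).get("params", {})   # hypothetical key
    replacements = {f"${k}$": str(v) for k, v in params.items()}
    replacements.update({f"#{k}#": v for k, v in parse_conf(conf_text).items()})
    return replacements


def prepare_statements(sql_text: str, replacements: Dict[str, str]) -> List[str]:
    """Step 730 sketch: apply the replacement list, then split into statements.

    Splitting on ';' is deliberately naive and would break on semicolons
    inside string literals.
    """
    for token, value in replacements.items():
        sql_text = sql_text.replace(token, value)
    return [stmt.strip() for stmt in sql_text.split(";") if stmt.strip()]


task_json = json.dumps({"params": {"out": "poor_cell_day", "kpi": "cell_kpi_day"}})
conf_text = "drop_rate_max = 0.05\n"
sql_text = ("DELETE FROM $out$;\n"
            "INSERT INTO $out$ SELECT cell_id FROM $kpi$ "
            "WHERE drop_rate > #drop_rate_max#;")
statements = prepare_statements(sql_text, build_replacement_list(task_json, conf_text))
for stmt in statements:
    print(stmt)
```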
  • Step 740: According to the input type, create the corresponding driver and execute the complete SQL statement. After the replacement is completed, the SQL statement can be run directly: a connection is established with the database corresponding to the database type of the input table, and the SQL statement is executed.
  • Step 750: Determine whether the input table and the output table belong to the same database. If they do, the task ends (the SQL is an insert statement). If the output is a table in another database, go to step 760.
  • Step 760: Store the execution result of the SQL file by saving it into a corresponding file.
  • Step 770: Determine whether the output is in file format. If it is, the process ends. If not, go to step 780.
  • Step 780: Load the execution result file into the corresponding database. The overall flow of this example is shown in FIG. 8, in which tasks are executed in order under the control of the task list.
  • Step 810: Load the task description files to generate task metadata.
  • Step 820: Generate the task .json file and execute the task according to the task scheduling list.
  • Step 830: Determine whether the task is an sql task. If it is, perform step 840; otherwise, perform step 850.
  • Step 840: The sql public jar package runs the sql task, the number of tasks in the task list is decreased by 1, and step 860 is performed.
  • Step 850: The corresponding RDD public jar package runs the task, the number of tasks in the task list is decreased by 1, and step 860 is performed.
  • Step 860: Determine whether the task list is empty. If it is, the process ends; otherwise, return to step 820.
  • The embodiment further provides a computer readable storage medium storing computer executable instructions for performing the above method.
  • FIG. 9 is a schematic diagram of the hardware structure of a server according to an embodiment. As shown in FIG. 9, the server includes one or more processors 910 and a memory 920; FIG. 9 takes one processor 910 as an example.
  • The server may also include an input device 930 and an output device 940.
  • The processor 910, the memory 920, the input device 930, and the output device 940 in the server may be connected by a bus or by other means; FIG. 9 takes a bus connection as an example.
  • As a computer readable storage medium, the memory 920 can be used to store software programs, computer executable programs, and modules.
  • The processor 910 runs the software programs, instructions, and modules stored in the memory 920 to perform various functional applications and data processing, thereby implementing any of the methods in the above embodiments.
  • The memory 920 may include a program storage area and a data storage area. The program storage area may store an operating system and an application required by at least one function. The data storage area may store data created through use of the server, and the like.
  • The memory may include volatile memory such as random access memory (RAM), and may also include non-volatile memory such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device.
  • The memory 920 can be a non-transitory computer storage medium or a transitory computer storage medium.
  • The non-transitory computer storage medium may be, for example, at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device.
  • The memory 920 may optionally include memory located remotely from the processor 910, which can be connected to the server over a network. Examples of such networks include the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • Input device 930 can be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the server.
  • Output device 940 can include a display device such as a display screen.
  • The server of this embodiment may also include a communication apparatus 950 for transmitting and/or receiving information over a communication network.
  • A person skilled in the art can understand that all or part of the flow of the above method embodiments can be implemented by a computer program instructing relevant hardware, and the program can be stored in a non-transitory computer readable storage medium.
  • When executed, the program may include the flow of the method embodiments described above. The non-transitory computer readable storage medium may be a magnetic disk, an optical disk, a read only memory (ROM), a random access memory (RAM), or the like.
  • The task execution method, apparatus, and server provided by the present disclosure can solve problems in data cleaning tasks such as code duplication, oversized projects, and inconvenient task management, and reduce development and maintenance costs.
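The Figure 8 loop (steps 810 to 860) can be sketched as a queue that is drained in list order. Each task is dispatched by its type. This is a minimal sketch with several assumptions:

- The runner callables `run_sql` and `run_rdd` stand in for the sql and RDD public jar packages. They are hypothetical, as is the `type` key.
- Generating the task .json file in step 820 is left out.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Task = Dict[str, str]  # hypothetical task record, e.g. {"name": ..., "type": "sql"}


def run_task_list(tasks: List[Task],
                  run_sql: Callable[[Task], None],
                  run_rdd: Callable[[Task], None]) -> int:
    """Drain the task list in order, dispatching each task by its type.

    Returns how many tasks were executed.
    """
    pending: Deque[Task] = deque(tasks)  # metadata assumed loaded already (step 810)
    executed = 0
    while pending:                       # step 860: finish when the list is empty
        task = pending.popleft()         # step 820: next task in list order
        if task["type"] == "sql":        # step 830: is it an sql task?
            run_sql(task)                # step 840: sql public jar package
        else:
            run_rdd(task)                # step 850: RDD public jar package
        executed += 1                    # the list has shrunk by one
    return executed


# Usage with stand-in runners that only record what they were asked to do.
log: List[tuple] = []
count = run_task_list(
    [{"name": "clean_calls", "type": "sql"}, {"name": "merge_sessions", "type": "rdd"}],
    run_sql=lambda t: log.append(("sql", t["name"])),
    run_rdd=lambda t: log.append(("rdd", t["name"])),
)
print(count, log)
```

A deque keeps the list order explicit. Tasks added by the scheduler while the loop runs could simply be appended to `pending`.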

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Stored Programmes (AREA)

Abstract

Disclosed are a task execution method and apparatus and a server. The method comprises: reading a task description file corresponding to a task, wherein a path of a database common package for executing the task, a path of a database script file representing an entity of the task, and a first parameter corresponding to the task are recorded in the task description file, the first parameter being used for replacing a variable in the database script file (110); pushing the database script file to a computing node, the computing node being provided with the database common package (120); and invoking the database common package on the computing node according to the path of the database common package, acquiring the database script file via the database common package and according to the path of the database script file, replacing the variable in the database script file with the first parameter, and executing the database script file so as to obtain an execution result of the task (130).

Description

Task execution method, apparatus, and server
Technical field
The present disclosure relates to the field of computer technology, for example, to a task execution method, apparatus, and server.
Background
Spark is a big data processing framework built around speed, ease of use, and sophisticated analytics. By adopting a lower-cost shuffle (data redistribution) approach during data processing, it takes MapReduce (a programming model) to a higher level. Spark uses in-memory data storage and near real-time processing, and its performance can be many times faster than that of other big data processing technologies.
In the related art, data cleaning tasks often require a large amount of code. When there are many cleaning tasks, the project package becomes bloated, which causes several drawbacks:

1. High code repetition and a serious waste of manpower. In a large project, different developers write a considerable amount of duplicated functional code, and merging that code is complex, which greatly wastes human resources.
2. High maintenance cost. When a defect is found in one task of a running project, the entire project must be stopped to fix it. Most healthy tasks then wait on one problematic task, which seriously wastes resources.
3. Poor flexibility. At delivery time, a problem in the code of a few tasks prevents the whole project from being delivered.
4. Inconvenient debugging, running, and migration. A project that grows with the number of tasks takes longer to debug and run, and is harder to migrate.
Summary
The present disclosure provides a task execution method, apparatus, and server to solve at least one of the following problems in data cleaning tasks: code duplication, a large amount of engineering, and inconvenient task management.
The present disclosure provides a task execution method, including:
reading a task description file of a task, where the task description file records a path of a database common package for executing the task, a path of a database script file representing the entity of the task, and a first parameter corresponding to the task;
pushing the database script file to a computing node, where the computing node has the database common package; and
invoking the database common package on the computing node according to the path of the database common package; obtaining the database script file through the database common package according to the path of the database script file; replacing the variables in the database script file with the first parameter; and executing the database script file to obtain an execution result of the task.
Optionally, the task description file further records a path of a task configuration file, and the task configuration file records a second parameter used to replace a threshold value in the database script file. The method further includes:
pushing the task configuration file to the computing node; and
obtaining, through the database common package and according to the path of the task configuration file, the second parameter in the task configuration file to replace the threshold value in the database script file.
Optionally, obtaining the database script file and replacing the variables in the database script file with the first parameter includes:
obtaining the database script file, generating a data exchange file from the task description file according to a preset data exchange language, and extracting the first parameter from the data exchange file to replace the variables in the database script file.
Optionally, before the task description file of the task is read, the method further includes:
adding the task, according to its priority and once the data required by the task has been acquired, to a task list that records at least one task in order; and
reading the task from the task list according to the order of the tasks in the task list.
Optionally, the task description file includes information about an input table and an output table corresponding to the task, and
executing the database script file includes:
obtaining the input data of the database script from the input table, executing the database script file, and adding the obtained result to the output table.
The present disclosure further provides a task execution apparatus, including:
a reading module, configured to read a task description file corresponding to a task, where the task description file records a path of a database common package for executing the task, a path of a database script file representing the entity of the task, and a first parameter corresponding to the task, the first parameter being used to replace variables in the database script file;
a push module, configured to push the database script file to a computing node, where the computing node has the database common package; and
an execution module, configured to invoke the database common package on the computing node according to the path of the database common package, obtain the database script file through the database common package according to the path of the database script file, replace the variables in the database script file with the first parameter, and execute the database script file to obtain an execution result of the task.
Optionally, the task description file further records a path of a task configuration file, and the task configuration file records a second parameter used to replace a threshold value in the database script file;
the push module is further configured to push the task configuration file to the computing node; and
the execution module is further configured to obtain, through the database common package and according to the path of the task configuration file, the second parameter in the task configuration file to replace the threshold value in the database script file.
Optionally, the execution module is configured to:
- invoke the database common package on the computing node according to the path of the database common package;
- obtain the database script file through the database common package according to the path of the database script file;
- generate a corresponding data exchange file from the task description file according to a preset data exchange language;
- extract the first parameter from the data exchange file to replace the variables in the database script file; and
- execute the database script file to obtain an execution result of the task.
Optionally, the apparatus further includes:
a task list module, configured to, before the task description file of the task is read, add the task, according to its priority and once the data required by the task has been acquired, to a task list that records at least one task in order, and read the task from the task list according to the order of the tasks in the task list.
Optionally, the task description file includes information about an input table and an output table corresponding to the task; and
the execution module is configured to:
- invoke the database common package on the computing node according to the path of the database common package;
- obtain the database script file through the database common package according to the path of the database script file;
- replace the variables in the database script file with the first parameter;
- obtain the input data of the database script from the input table;
- execute the database script file; and
- add the obtained result to the output table.
The present disclosure further provides a server, including the task execution apparatus according to any one of the above.
The present disclosure further provides a computer readable storage medium storing computer executable instructions for performing any of the methods described above.
The present disclosure further provides a server including one or more processors, a memory, and one or more programs. The one or more programs are stored in the memory and, when executed by the one or more processors, perform the above method.
The present disclosure further provides a computer program product including a computer program stored on a non-transitory computer readable storage medium. The computer program includes program instructions that, when executed by a computer, cause the computer to perform any of the methods described above.
Compared with the related art, the task execution method, apparatus, and server provided by the present disclosure execute a task based on multiple files. This has the following benefits:
- Keeping these files independent of one another effectively improves code flexibility and makes code maintenance simple and convenient.
- Separating the database script text solves the problem of a huge main program that is hard to maintain.
- Using the database common package reduces the code repetition rate while making task code simpler and saving manpower.
Brief description of the drawings
FIG. 1 is a flowchart of a task execution method according to an embodiment.
FIG. 2 is a flowchart of another task execution method according to an embodiment.
FIG. 3 is a block diagram of a task execution apparatus according to an embodiment.
FIG. 4 is a block diagram of another task execution apparatus according to an embodiment.
FIG. 5 is a file structure diagram of a task execution method according to an embodiment.
FIG. 6 is a schematic diagram of the principle of a task execution method according to an embodiment.
FIG. 7 is a workflow diagram of a task execution method according to an embodiment.
FIG. 8 is a workflow diagram of another task execution method according to an embodiment.
FIG. 9 is a schematic diagram of the hardware structure of a server according to an embodiment.
Detailed description
Embodiment 1
As shown in FIG. 1, an embodiment of the present application provides a task execution method, including:
Step 110: Read the task description file corresponding to the task. The task description file records the following:
- the path of the database common package used to execute the task;
- the path of the database script file that represents the entity of the task; and
- the first parameter corresponding to the task, which is used to replace variables in the database script file.

In this embodiment:
- the task description file may be an Extensible Markup Language (XML) file;
- the database script file may be an sql file;
- the tasks performed include but are not limited to data cleaning tasks; and
- the database common package may be a public jar package.

Step 120: Push the database script file to the computing node, which has the database common package.
Step 130: On the computing node, invoke the database common package according to its path. Through the database common package, obtain the database script file according to its path, replace the variables in the database script file with the first parameter, and execute the database script file to obtain the execution result of the task.

With the technical solution of this embodiment, each data cleaning task can be written as a separate algorithm and submitted with the application deployment tool spark-submit. Data cleaning tasks whose data source is a database table structure are extracted into a general public jar package file (for example, the database common package). Developers describe the input and output of a task by configuring the XML description file corresponding to the task.

The task execution method provided in this embodiment can solve the problems of code duplication, oversized projects, and inconvenient task management in data cleaning tasks in the related art. It can also reduce the development and maintenance costs of data cleaning tasks whose data source is a database table structure.
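Step 130 replaces variables in the sql script with the first parameter. The patent does not define the variable syntax. This minimal sketch assumes `${name}` placeholders so that Python's standard `string.Template` can perform the substitution. `fill_script` and the table and parameter names are hypothetical.

```python
from string import Template
from typing import Mapping


def fill_script(sql_text: str, first_params: Mapping[str, str]) -> str:
    """Replace ${name} placeholders in a task .sql script with first-parameter values.

    Template.substitute raises KeyError when a placeholder has no value, so an
    incompletely configured task fails before it reaches the database.
    """
    return Template(sql_text).substitute(first_params)


script = (
    "INSERT INTO ${output_table} "
    "SELECT * FROM ${input_table} WHERE day = '${day}'"
)
filled = fill_script(script, {"output_table": "clean_calls",
                              "input_table": "raw_calls",
                              "day": "2017-12-01"})
print(filled)
# INSERT INTO clean_calls SELECT * FROM raw_calls WHERE day = '2017-12-01'
```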
Optionally, the task description file further records the path of a task configuration file, and the task configuration file records a second parameter used to replace the threshold value in the database script file. The method further includes:
- pushing the task configuration file to the computing node; and
- obtaining, through the database common package and according to the path of the task configuration file, the second parameter in the task configuration file to replace the threshold value in the database script file.
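The threshold substitution can be sketched the same way. This sketch makes two assumptions:

- The task .conf file uses simple `key = value` lines with `#` comments.
- Thresholds use the same `${name}` placeholder form as other variables.

`parse_conf` and `apply_thresholds` are hypothetical helpers. `safe_substitute` leaves placeholders it has no value for untouched, so thresholds can be filled independently of the first-parameter pass.

```python
from string import Template
from typing import Dict


def parse_conf(text: str) -> Dict[str, str]:
    """Parse assumed 'key = value' lines; blank lines and '#' comments are skipped."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a 'key = value' line: {raw!r}")
        values[key.strip()] = value.strip()
    return values


def apply_thresholds(sql_text: str, conf_text: str) -> str:
    """Fill only the threshold placeholders; other ${...} variables are kept as-is."""
    return Template(sql_text).safe_substitute(parse_conf(conf_text))


conf = """
# thresholds for a hypothetical call-cleaning task
min_duration = 3
max_retries = 5
"""
sql = "SELECT * FROM ${input_table} WHERE duration >= ${min_duration}"
print(apply_thresholds(sql, conf))
# SELECT * FROM ${input_table} WHERE duration >= 3
```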
Optionally, obtaining the database script file and replacing the variables in it with the first parameter includes:
- obtaining the database script file;
- generating a data exchange file from the task description file according to a preset data exchange language; and
- extracting the first parameter from the data exchange file to replace the variables in the database script file.
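The data exchange step can be illustrated with JSON as the preset data exchange language, matching the task .json file used in the embodiments. The XML element and attribute names (`task`, `sqlFile`, `param`) and the helper `description_to_json` are hypothetical. The patent does not publish its schema.

```python
import json
import xml.etree.ElementTree as ET

TASK_XML = """
<task name="clean_calls" type="sql">
  <sqlFile>/opt/tasks/clean_calls.sql</sqlFile>
  <param name="day">2017-12-01</param>
  <param name="input_table">raw_calls</param>
  <param name="output_table">clean_calls</param>
</task>
"""


def description_to_json(xml_text: str) -> str:
    """Convert an (assumed-format) task description into a JSON data exchange document."""
    root = ET.fromstring(xml_text.strip())
    doc = {
        "name": root.get("name"),
        "type": root.get("type"),
        "sqlFile": root.findtext("sqlFile"),
        # first parameters: every <param name="...">value</param> element
        "params": {p.get("name"): (p.text or "").strip()
                   for p in root.findall("param")},
    }
    return json.dumps(doc, indent=2, sort_keys=True)


task_json = description_to_json(TASK_XML)
params = json.loads(task_json)["params"]  # what the jar package would extract
print(params["input_table"])
```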
Optionally, before the task description file of the task is read, the method further includes:
- adding the task, according to its priority and once the data required by the task has been acquired, to a task list that records one or more tasks in order; and
- reading the task from the task list according to the order of the tasks in the task list.
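A minimal sketch of the task list follows. It assumes that a lower number means higher priority, and that "data acquired" is checked against a set of input names that have already arrived. `TaskList` and its methods are hypothetical names.

```python
import heapq
import itertools
from typing import Iterable, List, Set, Tuple


class TaskList:
    """Tasks enter only when their input data has arrived and leave in priority
    order (lower number first; equal priorities keep their insertion order)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()  # tie-breaker preserving insertion order

    def add_if_ready(self, name: str, priority: int,
                     required: Iterable[str], arrived: Set[str]) -> bool:
        if not set(required) <= arrived:
            return False  # some input data is still missing; try again later
        heapq.heappush(self._heap, (priority, next(self._seq), name))
        return True

    def next_task(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


tasks = TaskList()
arrived = {"raw_calls", "raw_sms"}
tasks.add_if_ready("clean_sms", 2, ["raw_sms"], arrived)
tasks.add_if_ready("clean_calls", 1, ["raw_calls"], arrived)
joined = tasks.add_if_ready("join_web", 1, ["raw_calls", "raw_web"], arrived)
order = [tasks.next_task() for _ in range(len(tasks))]
print(joined, order)  # False ['clean_calls', 'clean_sms']
```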
Optionally, the task description file includes information about the input table and the output table corresponding to the task. Executing the database script file then includes: obtaining the input data of the database script from the input table, executing the database script file, and adding the obtained result to the output table.
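The input-table/output-table case can be sketched with SQLite standing in for the real database; the embodiments mention Spark, GBase, and MySQL. When both tables live in the same database, a single `INSERT ... SELECT` reads the input table and appends to the output table. This corresponds to the insert-statement case of step 750. `run_task_script` and the table names are hypothetical.

```python
import sqlite3


def run_task_script(conn: sqlite3.Connection, script: str) -> None:
    """Execute a fully substituted task script, which may hold several statements."""
    conn.executescript(script)  # runs every ';'-separated statement in order
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_calls (caller TEXT, duration INTEGER);    -- input table
    CREATE TABLE clean_calls (caller TEXT, duration INTEGER);  -- output table
    INSERT INTO raw_calls VALUES ('a', 1), ('b', 5), ('c', 9);
""")
run_task_script(
    conn,
    "INSERT INTO clean_calls SELECT caller, duration FROM raw_calls WHERE duration >= 3;",
)
rows = conn.execute("SELECT caller FROM clean_calls ORDER BY caller").fetchall()
print(rows)  # [('b',), ('c',)]
```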
Embodiment 2
As shown in FIG. 2, an embodiment of the present application provides a task execution method, including:
Step 210: Read the task description file corresponding to the task. The task description file records the following:
- the path of the database common package used to execute the task;
- the path of the database script file that represents the entity of the task;
- the first parameter corresponding to the task, which is used to replace variables in the database script file;
- the path of the task configuration file, which records a second parameter used to replace the threshold value in the database script file; and
- information about the input table and the output table corresponding to the task.
In this embodiment, an original task can consist of three parts:
- a task .xml file (a file in XML format);
- a task .sql file (a file in sql format); and
- a task .conf file (a file in conf format, which can be omitted if the task has no thresholds).

The task .xml file stores:
- the task type (sql task or RDD task);
- the input table (the database containing the table, table name, type, partition, and other information/files);
- the output table (the database containing the table, table name, type, partition, and other information/files);
- the execution time (for timed tasks); and
- the path of the jar package that executes the task, the task .conf file path, the task .sql file path, and so on.

The task .sql file stores the task entity, that is, sql statements containing variables. The task .conf file stores constant information such as task thresholds.
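This sketch shows how the main program might load a task .xml file into a metadata record. The XML layout is an assumption built from the fields listed above: type, input and output tables, execution time, jar path, sql path, and an optional conf path. `TaskMeta`, `TableRef`, and `load_task` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

TASK_XML = """
<task name="clean_calls" type="sql" schedule="02:00">
  <input db="spark" table="raw_calls" partition="day"/>
  <output db="mysql" table="clean_calls"/>
  <jar>/opt/tasks/sql-common.jar</jar>
  <sql>/opt/tasks/clean_calls.sql</sql>
</task>
"""


@dataclass
class TableRef:
    db: str                   # database type, e.g. "spark" or "mysql"
    table: str
    partition: Optional[str]  # partition column, if any


@dataclass
class TaskMeta:
    name: str
    task_type: str            # "sql" or "rdd"
    schedule: Optional[str]   # execution time of a timed task, if any
    input_table: TableRef
    output_table: TableRef
    jar_path: str
    sql_path: str
    conf_path: Optional[str]  # None when the task has no thresholds


def _table(el: ET.Element) -> TableRef:
    return TableRef(el.get("db", ""), el.get("table", ""), el.get("partition"))


def load_task(xml_text: str) -> TaskMeta:
    root = ET.fromstring(xml_text.strip())
    inp, out = root.find("input"), root.find("output")
    if inp is None or out is None:
        raise ValueError("task description needs <input> and <output> elements")
    return TaskMeta(
        name=root.get("name", ""),
        task_type=root.get("type", "sql"),
        schedule=root.get("schedule"),
        input_table=_table(inp),
        output_table=_table(out),
        jar_path=root.findtext("jar", ""),
        sql_path=root.findtext("sql", ""),
        conf_path=root.findtext("conf"),  # findtext returns None if <conf> is absent
    )


meta = load_task(TASK_XML)
print(meta.input_table, meta.conf_path)
```

A list of such records would play the role of the in-memory task metadata list described in step 220.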
Step 220: Push the database script file and the task configuration file to the computing node, which has the database common package. Based on the technical solution of this embodiment, a main program can be implemented. After it is started, the main program reads the information in the task .xml file into the task metadata list in memory, and pushes the task .sql file and the task .conf file to at least one Spark computing node.
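The patent does not say how files reach the Spark nodes. One plausible mechanism, offered here only as an assumption, is spark-submit's `--files` option. It places the listed files in the working directory of each executor. The entry class, paths, and `build_submit_command` are hypothetical. The command is only built, not run.

```python
import shlex
from typing import List, Sequence


def build_submit_command(main_class: str, jar_path: str, task_json: str,
                         side_files: Sequence[str], master: str = "yarn") -> List[str]:
    """Build a spark-submit argument list for one task.

    --files ships the task .sql/.conf files to each executor's working directory;
    the task .json path is passed to the jar as an application argument.
    """
    cmd = ["spark-submit", "--master", master, "--class", main_class]
    if side_files:
        cmd += ["--files", ",".join(side_files)]
    cmd += [jar_path, task_json]
    return cmd


cmd = build_submit_command(
    "com.example.SqlTaskRunner",  # hypothetical entry class of the common jar
    "/opt/tasks/sql-common.jar",
    "/opt/tasks/clean_calls.json",
    ["/opt/tasks/clean_calls.sql", "/opt/tasks/clean_calls.conf"],
)
print(shlex.join(cmd))
```

On a machine with Spark installed, passing `cmd` to `subprocess.run(cmd, check=True)` would submit the task.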
Step 230: According to the priority of the task and whether the data required by the task has been acquired, add the task to a task list that records at least one task in order. After the task is read from the task list according to its order in the list:
- generate a corresponding data exchange file from the task description file according to the preset data exchange language;
- extract the first parameter from the data exchange file to replace the variables in the database script file; and
- invoke the database common package according to its path to execute the database script file and obtain the execution result of the task.

In this embodiment, the main program can add tasks to the task scheduling list (that is, the task list) according to the task metadata description (data-driven or time-driven), data arrival status, task priority, and so on.
In this embodiment, the main program can use spark-submit to submit tasks to the Spark cluster, where they are executed on the cluster's computing nodes. The main program monitors Spark resource usage. When resources allow:
- The main program generates the task .json file from the parameters required in the task metadata.
- The sql public jar package reads the parameters in the task .json file and the thresholds in the task .conf file, then replaces the variables in the task .sql file to produce the complete sql statements of the task (the task entity).
- According to the input table type, the corresponding database connection information (spark/gbase/mysql/other) is obtained from the json file. The package then connects to that database and executes the sql statements in the task entity.
- According to the output type, the task execution result is saved to the corresponding database or file.
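The output-routing decision of steps 750 to 780 (saving the result to the corresponding database or file) can be reduced to a small function. This sketch treats "same database" as "same database type", which is a simplification. `Target` and `plan_output` are hypothetical, and the returned strings only name the actions that would be taken.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Target:
    kind: str  # database type such as "spark", "gbase", "mysql", or "file"
    name: str  # table name, or file path when kind == "file"


def plan_output(inp: Target, out: Target) -> List[str]:
    """List the actions needed to persist one task's result."""
    if out.kind == inp.kind:
        # step 750: same database, so the INSERT ... SELECT already wrote the output
        return [f"run insert on {inp.kind}"]
    actions = [f"run query on {inp.kind}", "save result to file"]  # step 760
    if out.kind != "file":                                         # step 770
        actions.append(f"load result file into {out.kind}")        # step 780
    return actions


print(plan_output(Target("mysql", "raw"), Target("mysql", "clean")))
print(plan_output(Target("spark", "raw"), Target("gbase", "clean")))
```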
步骤240,通过数据库公共包按获取任务配置文件中的第二参数来替换数据库脚本文件中的门限值,以及从输入表获取数据库脚本的输入数据并执行数据库脚本文件,将得到的结果加入输出表中。在本实施例中,公共jar包,即数据库公共包可以将执行任务的结果返回给主程序,主程序判断任务最终执行结果。Step 240: Replace the threshold value in the database script file by obtaining the second parameter in the task configuration file by using the database common package, and obtain the input data of the database script from the input table and execute the database script file, and add the obtained result to the output. In the table. In this embodiment, the public jar package, that is, the database public package, can return the result of executing the task to the main program, and the main program judges the final execution result of the task.
根据本实施例的技术方案,与相关技术相比,将多个任务独立化可以有效提升代码灵活性,使代码维护变得简洁方便。将主程序与算法分离解决了主程序庞大不易维护的弊端,另外通过公共jar包的提取,降低了代码重复的同时,让数据清洗任务变得更加简单,节约人力。According to the technical solution of the embodiment, comparing the multiple tasks can effectively improve the code flexibility and make the code maintenance simple and convenient. Separating the main program from the algorithm solves the drawback that the main program is huge and difficult to maintain. In addition, the extraction of the common jar package reduces the duplication of code, making the data cleaning task easier and saving manpower.
实施例三Embodiment 3
如图3所示,本申请的一个实施例中提供了一种任务执行装置,包括:读取模块310、推送模块320和执行模块330;其中,As shown in FIG. 3, an embodiment of the present application provides a task execution apparatus, including: a reading module 310, a pushing module 320, and an execution module 330;
读取模块310,设置为读取任务对应的任务描述文件,任务描述文件中记录有用于执行任务的数据库公共包的路径、用于表示任务的实体的数据库脚本文件的路径、以及任务对应的第一参数,第一参数用于替换数据库脚本文件中的变量。在本实施例中,任务描述文件可以采用xml格式文件,数据库脚本文件可以是sql格式文件,所进行的任务包括但不限于数据清洗任务,该数据库公共 包可以是公共jar包。The reading module 310 is configured to read a task description file corresponding to the task, where the task description file records the path of the database common package for executing the task, the path of the database script file for the entity representing the task, and the corresponding task A parameter, the first parameter is used to replace variables in the database script file. In this embodiment, the task description file may adopt an xml format file, and the database script file may be a sql format file, and the tasks performed include, but are not limited to, a data cleaning task, and the database public package may be a public jar package.
推送模块320,设置为将数据库脚本文件推送到计算节点,计算节点处具有数据库公共包。The push module 320 is configured to push the database script file to the computing node, where the computing node has a database common package.
执行模块330,设置为在所述计算节点上根据数据库公共包的路径调用数据库公共包,通过数据库公共包按数据库脚本文件的路径,获取数据库脚本文件并使用第一参数替换数据库脚本文件中的变量,以及执行数据库脚本文件,以得到任务的执行结果。The executing module 330 is configured to invoke the database common package according to the path of the database common package on the computing node, obtain the database script file by using the database common package according to the path of the database script file, and replace the variable in the database script file with the first parameter. And execute the database script file to get the execution result of the task.
根据本实施例的技术方案,可以将每个数据清洗任务写成单独算法,使用spark-submit提交任务,将以数据源为数据库表结构的数据清洗任务提取出来,做成通用的公共jar包文件,开发人员可以通过配置与任务对应的XML描述文件来描述任务的输入、输出。本实施例提供的任务执行装置,可以解决清洗任务代码重复、工程过大、任务管理不便等难题,同时也降低了以数据源为数据库表结构的数据清洗任务的开发成本和维护成本。According to the technical solution of the embodiment, each data cleaning task can be written into a separate algorithm, and the task is submitted by using the spark-submit, and the data cleaning task with the data source as the database table structure is extracted to be a common public jar package file. The developer can describe the input and output of the task by configuring an XML description file corresponding to the task. The task execution device provided by the embodiment can solve the problems of duplication of the cleaning task code, excessive engineering, inconvenient task management, and the development cost and maintenance cost of the data cleaning task with the data source as the database table structure.
Optionally, the task description file also records the path of a task configuration file. The task configuration file records a second parameter used to replace a threshold value in the database script file. The push module 320 is further configured to push the task configuration file to the computing node. The execution module 330 is further configured to use the database common package to obtain the second parameter from the task configuration file, according to the path of the task configuration file, and to replace the threshold value in the database script file with it.
Optionally, the execution module 330 is configured to perform the following on the computing node. It invokes the database common package according to the path of the database common package. Through the database common package, it obtains the database script file according to the path of the database script file. It then generates a corresponding data exchange file from the task description file according to a preset data exchange language. It extracts the first parameter from the data exchange file to replace the variables in the database script file. Finally, it executes the database script file to obtain the execution result of the task.
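The data-exchange step can be made concrete with a minimal Python sketch. It assumes JSON is the preset data exchange language, since the application example later generates a task.json file. It also assumes the first parameters are stored under a hypothetical "variables" key. Neither the key nor the function names come from the text.

```python
from __future__ import annotations

import json
from pathlib import Path


def write_exchange_file(metadata: dict, path: Path) -> None:
    """Serialize task metadata (read from the task description file) as JSON."""
    path.write_text(json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8")


def read_first_parameters(path: Path) -> dict[str, str]:
    """Extract the first parameters (variable name -> value) from the exchange file.

    The "variables" key is a hypothetical layout chosen for this sketch.
    """
    data = json.loads(path.read_text(encoding="utf-8"))
    return {k: str(v) for k, v in data.get("variables", {}).items()}


if __name__ == "__main__":
    import tempfile

    meta = {
        "task": "lte_subject_poorquality_cell_day",
        "variables": {"out_table": "poor_cell_day", "day": "2017-01-04"},  # hypothetical
    }
    with tempfile.TemporaryDirectory() as tmp:
        exchange = Path(tmp) / "task.json"
        write_exchange_file(meta, exchange)
        print(read_first_parameters(exchange))
```

Keeping the exchange format as plain JSON lets the common jar package read parameters without parsing the original XML again.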
Optionally, referring to FIG. 4, the apparatus further includes a task list module 430. Before the task description file of a task is read, this module adds the task to a task list once the data required by the task has been acquired. The task is placed according to its priority, in a list that records one or more tasks arranged in order. The module then reads tasks from the task list in the order in which they appear in the list.
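As a minimal sketch of such an ordered task list, a priority heap can be used. The convention that a smaller number means a higher priority, the FIFO tie-break, and all class and method names are assumptions for illustration.

```python
from __future__ import annotations

import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class _Entry:
    priority: int                      # assumption: smaller number = higher priority
    seq: int                           # insertion counter: FIFO among equal priorities
    name: str = field(compare=False)   # task name, not used for ordering


class TaskList:
    """Ordered task list; a task is added only once its required data is available."""

    def __init__(self) -> None:
        self._heap: list[_Entry] = []
        self._counter = itertools.count()

    def add_if_ready(self, name: str, priority: int, data_ready: bool) -> bool:
        if not data_ready:
            return False               # keep the task out until its input data arrives
        heapq.heappush(self._heap, _Entry(priority, next(self._counter), name))
        return True

    def next_task(self) -> str | None:
        """Read the next task in list order, or None if the list is empty."""
        return heapq.heappop(self._heap).name if self._heap else None

    def __len__(self) -> int:
        return len(self._heap)
```

For example, adding task "b" with priority 2 and then task "a" with priority 1 makes next_task() return "a" first. A task whose data has not arrived is simply not added yet.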
Optionally, the task description file includes information about the input table and the output table corresponding to the task.
In this case, the execution module 330 is configured to perform the following on the computing node. It invokes the database common package according to the path of the database common package. Through the database common package, it obtains the database script file according to the path of the database script file and replaces the variables in the database script file with the first parameter. It obtains the input data of the database script from the input table, executes the database script file, and adds the result to the output table.
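The input-table and output-table behavior described above can be illustrated with a small runnable sketch. Python's built-in sqlite3 stands in for the actual target databases (Spark, GBase, MySQL). The table names, columns, and the already-substituted script are hypothetical.

```python
from __future__ import annotations

import sqlite3

# Hypothetical, already-substituted database script: read the input table,
# keep rows above a threshold, and append them to the output table.
SCRIPT = """
INSERT INTO poor_cell_day (cell_id, drop_rate)
SELECT cell_id, drop_rate FROM cell_kpi WHERE drop_rate > 0.05;
"""


def run_script(conn: sqlite3.Connection, script: str, output_table: str) -> list[tuple]:
    """Execute the script, then return the rows now in the output table."""
    conn.executescript(script)
    # output_table comes from trusted task metadata, not user input
    return conn.execute(f"SELECT * FROM {output_table} ORDER BY 1").fetchall()


def demo() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cell_kpi (cell_id TEXT, drop_rate REAL)")        # input table
    conn.execute("CREATE TABLE poor_cell_day (cell_id TEXT, drop_rate REAL)")   # output table
    conn.executemany(
        "INSERT INTO cell_kpi VALUES (?, ?)",
        [("c1", 0.01), ("c2", 0.08), ("c3", 0.06)],
    )
    try:
        return run_script(conn, SCRIPT, "poor_cell_day")
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo())
```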
Embodiment 4
As shown in FIG. 4, an embodiment of the present application provides another task execution apparatus, including:
A reading module 410, configured to read the task description file corresponding to a task. The task description file records the path of the database common package used to execute the task, the path of the database script file that represents the entity of the task, and a first parameter corresponding to the task. The first parameter is used to replace variables in the database script file. The task description file also records the path of a task configuration file, which records a second parameter used to replace threshold values in the database script file. The task description file further includes information about the input table and the output table corresponding to the task.
In this embodiment, an original task can consist of three parts: a task.xml file, a task.sql file, and a task.conf file. The task.conf file can be omitted if there are no threshold values. The task.xml file stores the following:
- the task type (SQL task or RDD task);
- the input tables (database, table name, type, partition, and other information or files);
- the output tables (database, table name, type, partition, and other information or files);
- the execution time (for timed tasks);
- the path of the jar package that executes the task;
- the paths of the task.conf and task.sql files.

The task.sql file stores the task entity, that is, SQL statements containing variables. The task.conf file stores constant information such as task threshold values.
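To show how a task.xml file might be loaded into an in-memory metadata record, here is a sketch using the standard-library ElementTree parser. The text lists what task.xml stores but not its element or attribute names. The layout, names, and paths below are therefore invented for illustration.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical task.xml layout: the text lists what the file stores,
# but not the element or attribute names, so these are invented.
TASK_XML = """
<task name="lte_subject_poorquality_cell_day" type="sql">
  <schedule granularity="day"/>
  <jar path="lib/sql-common.jar"/>
  <sql path="tasks/lte_subject_poorquality_cell_day.sql"/>
  <conf path="tasks/lte_subject_poorquality_cell_day.conf"/>
  <input db="spark" table="cell_kpi" partition="day"/>
  <output db="gbase" table="poor_cell_day" alias="out_table"/>
</task>
"""


def load_task_metadata(xml_text: str) -> dict:
    """Turn one task.xml document into a record for the in-memory metadata list."""
    root = ET.fromstring(xml_text.strip())
    conf = root.find("conf")  # task.conf is optional when there are no thresholds
    return {
        "name": root.get("name"),
        "type": root.get("type"),  # "sql" or "rdd"
        "granularity": root.find("schedule").get("granularity"),
        "jar_path": root.find("jar").get("path"),
        "sql_path": root.find("sql").get("path"),
        "conf_path": conf.get("path") if conf is not None else None,
        "inputs": [dict(e.attrib) for e in root.findall("input")],
        "outputs": [dict(e.attrib) for e in root.findall("output")],
    }


if __name__ == "__main__":
    print(load_task_metadata(TASK_XML))
```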
A push module 420, configured to push the database script file and the task configuration file to the computing node, on which the database common package is present. The technical solution of this embodiment can be implemented as a main program. After the main program starts, it reads the information in the task.xml file into an in-memory task metadata list. It also pushes the task.sql and task.conf files to at least one Spark computing node.
A task list module 430, configured to add the task to a task list once the data required by the task has been acquired. The task is placed according to its priority, in a list that records at least one task arranged in order. After the task is read from the task list, in the order in which tasks appear in the list, the following steps are performed:
- a corresponding data exchange file is generated from the task description file according to a preset data exchange language;
- the first parameter is extracted from the data exchange file to replace the variables in the database script file;
- the database common package is invoked according to its path to execute the database script file and obtain the execution result of the task.

In this embodiment, the main program can add tasks to the task scheduling list, that is, the task list. It does so according to the task metadata description (data-driven or time-driven), the data arrival status, the task priority, and the like.
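A minimal sketch of the admission check described above might distinguish data-driven and time-driven tasks before ordering the ready ones by priority. The metadata keys ("trigger", "inputs", "run_at", "priority") are hypothetical, not taken from the text.

```python
from __future__ import annotations

from datetime import datetime


def is_ready(task: dict, arrived: set[str], now: datetime) -> bool:
    """Check the trigger condition recorded in the task metadata.

    Hypothetical keys: "trigger" is "data" or "time"; data-driven tasks list
    their input tables in "inputs", time-driven tasks carry "run_at".
    """
    if task["trigger"] == "data":
        return all(table in arrived for table in task["inputs"])
    if task["trigger"] == "time":
        return now >= task["run_at"]
    raise ValueError(f"unknown trigger: {task['trigger']!r}")


def schedule(tasks: list[dict], arrived: set[str], now: datetime) -> list[str]:
    """Names of the tasks that may join the task list, highest priority first."""
    ready = [t for t in tasks if is_ready(t, arrived, now)]
    # assumption: a smaller "priority" value means a higher priority
    return [t["name"] for t in sorted(ready, key=lambda t: t["priority"])]
```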
In this embodiment, the main program can use spark-submit to submit tasks to the Spark cluster, where they are executed on the cluster's computing nodes. The main program monitors Spark resource usage. When resources allow, it generates a task.json file from the parameters required in the task metadata. The SQL common jar package then reads the parameters in the task.json file and the threshold values in the task.conf file. It replaces the variables in the task.sql file to generate the complete SQL statements of the task (the task entity). According to the input table type, it obtains the corresponding database connection information (Spark, GBase, MySQL, or other) from the .json file. It connects to that database and executes the SQL statements in the task entity. According to the output type, it saves the task execution result to the corresponding database or file.
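As a sketch of how the main program might assemble a spark-submit invocation for one task, the following builds an argument list. --master, --class, --executor-cores and --executor-memory are standard spark-submit options. The metadata keys, the "yarn" default master, and passing the task.json path as the only application argument are assumptions. The default entry class is a hypothetical stand-in for the main program's built-in default, which the application example says may be used when the entry class is omitted.

```python
from __future__ import annotations

import shlex

# Hypothetical default entry class, standing in for the main program's default.
DEFAULT_ENTRY_CLASS = "com.example.tasks.SqlCommonMain"


def build_submit_command(meta: dict, task_json: str) -> list[str]:
    """Assemble a spark-submit argument list for one task (keys are assumptions)."""
    return [
        "spark-submit",
        "--master", meta.get("master", "yarn"),
        "--class", meta.get("entry_class", DEFAULT_ENTRY_CLASS),
        "--executor-cores", str(meta["cores"]),
        "--executor-memory", meta["memory"],
        meta["jar_path"],   # the SQL (or RDD) common jar package
        task_json,          # read by the common jar when the task starts
    ]


if __name__ == "__main__":
    meta = {"cores": 2, "memory": "4g", "jar_path": "lib/sql-common.jar"}
    cmd = build_submit_command(meta, "tasks/lte_subject_poorquality_cell_day.json")
    print(shlex.join(cmd))  # could be run with subprocess.run(cmd, check=True)
```

Returning an argument list rather than a single shell string avoids quoting problems when the command is eventually run.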
An execution module 440, configured to use the database common package to obtain the second parameter from the task configuration file and replace the threshold values in the database script file. It also obtains the input data of the database script from the input table, executes the database script file, and adds the result to the output table. In this embodiment, the common jar package returns the task execution result to the main program, which determines the final execution result of the task.
Compared with the related art, the technical solution of this embodiment makes tasks independent of one another. This improves code flexibility and makes code maintenance simpler. Separating the main program from the algorithms removes the problem of a large main program that is hard to maintain. In addition, extracting the common jar package reduces code duplication, simplifies data cleaning tasks, and saves manpower.
Embodiment 5
An embodiment of the present application provides a server that includes any one of the task execution apparatuses described in Embodiment 3 or Embodiment 4. Each functional module of the task execution apparatus may be implemented by software and/or hardware on the server. The server of this embodiment can therefore achieve the technical effects of the task execution apparatus of the above embodiments.
An application example of an embodiment of the present application is as follows:
1) Write the original task description files, that is, the three files shown in FIG. 5.
For example, for a task named lte_subject_poorquality_cell_day, first edit the lte_subject_poorquality_cell_day.xml file according to the template. The XML file records the following:
- the task name;
- the execution granularity (daily);
- an indicator that the task contains SQL statements;
- the entry class and entry function of the SQL common jar package (optional, because the main program has default values);
- the path of the task entity SQL file on the Spark computing node;
- the path of the task configuration file on the Spark computing node;
- a description of the tables the task depends on;
- a description of the input tables;
- a description of the output tables;
- the number of cores and the amount of memory required for the computation;
- the mode for deleting stale information.
Edit the lte_subject_poorquality_cell_day.sql file. The variable replacement rule can be a name enclosed by two '$' symbols. The threshold replacement rule can be a name enclosed by two '#' symbols. The replaceable items can include the output table alias, the output table partition value, and threshold values:
- The output table alias corresponds to an alias in the task.xml file. It is replaced with the actual output table name when the algorithm common jar package executes.
- The output table partition value corresponds to an alias in the task.xml file. It is replaced with the actual value when the algorithm common jar package executes.
- The threshold value corresponds to a value in the task.conf file. It is replaced with the actual value when the algorithm common jar package executes.
Whether to write the lte_subject_poorquality_cell_day.conf file depends on whether the SQL file contains threshold values. This example assumes it does. The threshold values are therefore extracted into a separate configuration file, so they can be modified later without editing the SQL.
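The replacement rules in step 1) can be sketched in Python with regular expressions. The sketch assumes that variable and threshold names consist of word characters. It also assumes that task.conf holds one "name=value" pair per line. The text does not specify the .conf format, and Python's configparser is not used because it requires section headers. All names and sample values are hypothetical.

```python
from __future__ import annotations

import re

_VARIABLE = re.compile(r"\$(\w+)\$")   # $name$ -> value from task.xml / task.json
_THRESHOLD = re.compile(r"#(\w+)#")    # #name# -> value from task.conf


def parse_conf(text: str) -> dict[str, str]:
    """Read threshold constants, assuming one "name=value" pair per line."""
    thresholds: dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue                      # skip blank lines
        name, sep, value = line.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"line {lineno}: expected name=value, got {raw!r}")
        thresholds[name.strip()] = value.strip()
    return thresholds


def _replacer(values: dict[str, str], kind: str):
    def repl(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for {kind} {key!r}")
        return values[key]
    return repl


def substitute(sql: str, variables: dict[str, str], thresholds: dict[str, str]) -> str:
    """Replace $name$ variables first, then #name# thresholds; unknown names fail."""
    sql = _VARIABLE.sub(_replacer(variables, "variable"), sql)
    return _THRESHOLD.sub(_replacer(thresholds, "threshold"), sql)


if __name__ == "__main__":
    sql = ("INSERT INTO $out_table$ SELECT cell_id, drop_rate FROM cell_kpi "
           "WHERE day = '$day$' AND drop_rate > #drop_rate_max#")
    conf = parse_conf("drop_rate_max = 0.05\n")
    print(substitute(sql, {"out_table": "poor_cell_day", "day": "2017-01-04"}, conf))
```

Failing on an unknown name, rather than leaving the placeholder in place, keeps a half-substituted statement from ever reaching the database.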
2) Import the original task files into the main program. At startup, the main program scans all tasks in the agreed directories. Individually added or modified tasks can also be added separately, as patches, through a dedicated interface. FIG. 6 shows how the main program loads the original task files. The main program reads the task.xml file into memory and adds it to the task metadata list, for later use when generating the task.json file. It pushes the task.conf and task.sql files to the corresponding computing nodes (for example, the Spark task nodes in the figure). There, the SQL common jar package uses them when the task executes.
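A minimal sketch of the startup scan could group files by base name. It assumes all task files sit flat in one agreed directory and share the task name as their stem, for example lte_subject_poorquality_cell_day.xml/.sql/.conf. Only names with a .xml description are treated as tasks, and the .conf file is optional. The directory layout and function name are assumptions.

```python
from __future__ import annotations

from pathlib import Path


def scan_task_dir(root: Path) -> dict[str, dict[str, Path]]:
    """Group <name>.xml / <name>.sql / <name>.conf files by task name."""
    tasks: dict[str, dict[str, Path]] = {}
    for xml_path in sorted(root.glob("*.xml")):
        entry = {"xml": xml_path}
        for ext in ("sql", "conf"):
            candidate = xml_path.with_suffix("." + ext)
            if candidate.exists():          # .conf may be absent (no thresholds)
                entry[ext] = candidate
        tasks[xml_path.stem] = entry
    return tasks
```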
3) Wait for task execution. Only tasks whose data is available can be executed. A task can actually run only when Spark has computing resources available. Once all preconditions are met, the task is submitted.
4) Execution of the SQL common jar package, as shown in FIG. 7:
Step 710: parse the task.json file. When a task is submitted, the SQL common jar package reads the task.json file generated by the main program.
Step 720: generate the parameter replacement list.
Step 730: read the SQL file and replace the items in it that appear in the replacement list. The paths of the task.sql and task.conf files are obtained first, so that the SQL statements in the SQL file can be read. Each task can have multiple SQL statements. The time conditions, input tables, output tables, and threshold values in these statements are variables in the format required by the framework. The SQL common jar package must replace them. The replacement values are obtained by reading and organizing the relevant parameters in the task.json file.
Step 740: create the driver that matches the input type, and execute the complete SQL statements. After replacement, the SQL statements can be run directly. A connection is established with the database that matches the input table's database type, and the SQL statements are executed.
Step 750: determine whether the input table and the output table belong to the same database. If they do, the task ends, because the result is written by an INSERT statement. If the output is a table in another database, go to step 760.
Step 760: store the execution result of the SQL file by saving it to a corresponding file. The stored procedure of the corresponding database is then executed to load the result into that database.
Step 770: determine whether the output is in file format. If it is, the process ends; otherwise, go to step 780.
Step 780: load the execution result file into the corresponding database.
The overall flow of this example is shown in FIG. 8. The task list controls the tasks so that all of them are executed in order.
Step 810: load the task description files and generate task metadata.
Step 820: generate the task.json file and execute the task according to the task scheduling list.
Step 830: determine whether the task is an SQL task. If so, go to step 840; otherwise, go to step 850.
Step 840: the SQL common jar package runs the SQL task, and the number of tasks in the task list decreases by 1. Go to step 860.
Step 850: the corresponding RDD common jar package runs the task, and the number of tasks in the task list decreases by 1. Go to step 860.
Step 860: determine whether the task list is empty. If so, the flow ends; otherwise, go to step 820.
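The loop in steps 810 to 860 can be sketched as draining a queue and dispatching on task type. The two runner callbacks are hypothetical stand-ins for launching the SQL and RDD common jar packages. The "type" and "name" keys are assumptions.

```python
from __future__ import annotations

from collections import deque
from typing import Callable


def run_all(task_list: list[dict],
            run_sql: Callable[[dict], str],
            run_rdd: Callable[[dict], str]) -> dict[str, str]:
    """Drain the task list in order (steps 820-860), dispatching on task type."""
    pending = deque(task_list)            # task list built from metadata (step 810)
    results: dict[str, str] = {}
    while pending:                        # step 860: repeat until the list is empty
        task = pending.popleft()          # step 820: next task; list shrinks by one
        if task["type"] == "sql":         # step 830
            results[task["name"]] = run_sql(task)   # step 840: SQL common jar
        else:
            results[task["name"]] = run_rdd(task)   # step 850: RDD common jar
    return results
```

Passing the runners in as callbacks keeps the scheduling loop independent of how each jar package is actually launched.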
5) The SQL common jar package passes the final execution result to the main program.
This embodiment further provides a computer-readable storage medium storing computer-executable instructions for performing the above method.
FIG. 9 is a schematic diagram of the hardware structure of a server according to an embodiment. As shown in FIG. 9, the server includes one or more processors 910 and a memory 920; FIG. 9 shows one processor 910 as an example.
The server may further include an input device 930 and an output device 940.
The processor 910, memory 920, input device 930, and output device 940 in the server may be connected by a bus or by other means; FIG. 9 shows a bus connection as an example.
The input device 930 can receive input numeric or character information, and the output device 940 can include a display device such as a display screen.
As a computer-readable storage medium, the memory 920 can store software programs, computer-executable programs, and modules. The processor 910 runs the software programs, instructions, and modules stored in the memory 920 to perform various functional applications and data processing. In this way, it implements any of the methods in the above embodiments.
The memory 920 may include a program storage area and a data storage area. The program storage area may store an operating system and an application required by at least one function. The data storage area may store data created through use of the server, and the like. The memory may include volatile memory such as random access memory (RAM). It may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device.
The memory 920 may be a non-transitory or a transitory computer storage medium. Examples of non-transitory computer storage media include at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some embodiments, the memory 920 may optionally include memory located remotely from the processor 910 and connected to the server over a network. Examples of such networks include the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 930 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the server. The output device 940 may include a display device such as a display screen.
The server of this embodiment may further include a communication device 950 for transmitting and/or receiving information over a communication network.
Those of ordinary skill in the art will understand that all or part of the processes of the above method embodiments can be carried out by a computer program instructing the relevant hardware. The program can be stored in a non-transitory computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The non-transitory computer-readable storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
Industrial Applicability
The task execution method, apparatus, and server provided by the present disclosure address code duplication, oversized projects, and inconvenient task management in data cleaning tasks, and reduce development and maintenance costs.

Claims (12)

  1. A task execution method, comprising:
    reading a task description file of a task, wherein the task description file records a path of a database common package for executing the task, a path of a database script file for representing an entity of the task, and a first parameter corresponding to the task;
    pushing the database script file to a computing node, wherein the computing node has the database common package; and
    invoking, on the computing node, the database common package according to the path of the database common package; obtaining, through the database common package, the database script file according to the path of the database script file; replacing a variable in the database script file with the first parameter; and executing the database script file to obtain an execution result of the task.
  2. The method according to claim 1, wherein the task description file further records a path of a task configuration file, and the task configuration file records a second parameter for replacing a threshold value in the database script file; the method further comprising:
    pushing the task configuration file to the computing node; and
    obtaining, through the database common package and according to the path of the task configuration file, the second parameter in the task configuration file to replace the threshold value in the database script file.
  3. The method according to claim 1, wherein obtaining the database script file and replacing the variable in the database script file with the first parameter comprises:
    obtaining the database script file; generating a data exchange file from the task description file according to a preset data exchange language; and extracting the first parameter from the data exchange file to replace the variable in the database script file.
  4. The method according to claim 1, further comprising, before reading the task description file of the task:
    adding the task, according to a priority of the task and when data required by the task has been acquired, to a task list recording at least one task arranged in order; and
    reading the task from the task list according to the order of the tasks in the task list.
  5. The method according to claim 1, wherein the task description file includes information about an input table and an output table corresponding to the task; and
    executing the database script file comprises:
    obtaining input data of the database script from the input table, executing the database script file, and adding the obtained result to the output table.
  6. A task execution apparatus, comprising:
    a reading module, configured to read a task description file corresponding to a task, wherein the task description file records a path of a database common package for executing the task, a path of a database script file for representing an entity of the task, and a first parameter corresponding to the task, the first parameter being used to replace a variable in the database script file;
    a push module, configured to push the database script file to a computing node, wherein the computing node has the database common package; and
    an execution module, configured to: invoke, on the computing node, the database common package according to the path of the database common package; obtain, through the database common package, the database script file according to the path of the database script file; replace the variable in the database script file with the first parameter; and execute the database script file to obtain an execution result of the task.
  7. The apparatus according to claim 6, wherein the task description file further records a path of a task configuration file, and the task configuration file records a second parameter for replacing a threshold value in the database script file;
    the push module is further configured to push the task configuration file to the computing node; and
    the execution module is further configured to obtain, through the database common package and according to the path of the task configuration file, the second parameter in the task configuration file to replace the threshold value in the database script file.
  8. The apparatus according to claim 6, wherein the execution module is configured to: invoke, on the computing node, the database common package according to the path of the database common package; obtain, through the database common package, the database script file according to the path of the database script file; generate a corresponding data exchange file from the task description file according to a preset data exchange language; extract the first parameter from the data exchange file to replace the variable in the database script file; and execute the database script file to obtain an execution result of the task.
  9. The apparatus according to claim 6, further comprising:
    a task list module, configured to, before the task description file of the task is read, add the task, according to a priority of the task and when data required by the task has been acquired, to a task list recording at least one task arranged in order; and to read the task from the task list according to the order of the tasks in the task list.
  10. The apparatus according to claim 6, wherein the task description file includes information about an input table and an output table corresponding to the task; and
    the execution module is configured to: invoke, on the computing node, the database common package according to the path of the database common package; obtain, through the database common package, the database script file according to the path of the database script file; replace the variable in the database script file with the first parameter; obtain input data of the database script from the input table; execute the database script file; and add the obtained result to the output table.
  11. A server, comprising:
    the task execution apparatus according to any one of claims 6 to 10.
  12. A computer-readable storage medium storing computer-executable instructions for performing the method according to any one of claims 1 to 5.
PCT/CN2017/118957 2017-01-04 2017-12-27 Task execution method and apparatus and server WO2018126964A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710003855.3 2017-01-04
CN201710003855.3A CN108280023B (en) 2017-01-04 2017-01-04 Task execution method and device and server

Publications (1)

Publication Number Publication Date
WO2018126964A1 true WO2018126964A1 (en) 2018-07-12

Family

ID=62789189

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/118957 WO2018126964A1 (en) 2017-01-04 2017-12-27 Task execution method and apparatus and server

Country Status (2)

Country Link
CN (1) CN108280023B (en)
WO (1) WO2018126964A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102880546A (en) * 2012-09-03 2013-01-16 上海方正数字出版技术有限公司 Software integration testing method and system based on extensible markup language (XML) database
CN103678098A (en) * 2012-09-06 2014-03-26 百度在线网络技术(北京)有限公司 HADOOP program testing method and system
EP2977899A2 (en) * 2014-06-27 2016-01-27 General Electric Company Integrating execution of computing analytics within a mapreduce processing environment
CN105487943A (en) * 2015-12-09 2016-04-13 浪潮电子信息产业股份有限公司 Method for automatically copying files to each node of cluster server

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009295013A (en) * 2008-06-06 2009-12-17 Hitachi Ltd Method, apparatus and program for database management
CN105224348A (en) * 2014-06-11 2016-01-06 中兴通讯股份有限公司 A kind of installation method of MySQL database and device
CN104317928A (en) * 2014-10-31 2015-01-28 北京思特奇信息技术股份有限公司 Service ETL (extraction-transformation-loading) method and service ETL system both based on distributed database
CN105808776A (en) * 2016-03-29 2016-07-27 中国建设银行股份有限公司 Data management system and method of distributed database

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222122A (en) * 2019-07-26 2019-09-10 深圳市元征科技股份有限公司 A kind of method of data synchronization and relevant device of MongoDB
CN110704210A (en) * 2019-09-20 2020-01-17 天翼电子商务有限公司 Script task calling method, system, medium and device
CN110704210B (en) * 2019-09-20 2023-10-10 天翼电子商务有限公司 Script task calling method, system, medium and device
CN110990669A (en) * 2019-10-16 2020-04-10 广州丰石科技有限公司 DPI (deep packet inspection) analysis method and system based on rule generation
CN113760489A (en) * 2020-09-21 2021-12-07 北京沃东天骏信息技术有限公司 Resource allocation method and device
CN113760489B (en) * 2020-09-21 2024-05-17 北京沃东天骏信息技术有限公司 Resource allocation method and device
CN113239005A (en) * 2021-06-02 2021-08-10 上海许继电气有限公司 I, IV area data synchronization method and device for power monitoring system
CN113239005B (en) * 2021-06-02 2022-12-02 上海许继电气有限公司 I and IV area data synchronization method and device for power monitoring system
CN113721824B (en) * 2021-08-10 2024-05-03 深圳市一博科技股份有限公司 Method for setting library path by one key of CR5000 platform
CN113721824A (en) * 2021-08-10 2021-11-30 深圳市一博科技股份有限公司 Method for one-key setting of library path of CR5000 platform
CN115061785A (en) * 2022-04-15 2022-09-16 支付宝(杭州)信息技术有限公司 Information issuing method and device, storage medium and server
CN117609102A (en) * 2024-01-23 2024-02-27 云筑信息科技(成都)有限公司 Building industry Internet counting platform system testing method
CN117609102B (en) * 2024-01-23 2024-05-28 云筑信息科技(成都)有限公司 Building industry Internet counting platform system testing method

Also Published As

Publication number Publication date
CN108280023B (en) 2022-11-01
CN108280023A (en) 2018-07-13

Similar Documents

Publication Publication Date Title
WO2018126964A1 (en) Task execution method and apparatus and server
US10810110B1 (en) Methods, systems, and articles of manufacture for testing web services using a behavior-driven development domain specific language framework
CN110069572B (en) HIVE task scheduling method, device, equipment and storage medium based on big data platform
US10061858B2 (en) Method and apparatus for processing exploding data stream
US9672140B1 (en) Processing special requests at dedicated application containers
CN111124906A (en) Tracking method, compiling method and device based on dynamic embedded points and electronic equipment
CN104484216A (en) Method and device for generating service interface document and on-line test tool
CN108984155B (en) Data processing flow setting method and device
CN106557470B (en) Data extraction method and device
WO2020238597A1 (en) Hadoop-based data updating method, device, system and medium
AU2017254506B2 (en) Method, apparatus, computing device and storage medium for data analyzing and processing
CN110727429B (en) Front-end page generation method, device and equipment
CN113656503A (en) Data synchronization method, device and system and computer readable storage medium
CN110674083A (en) Workflow migration method, device, equipment and computer readable storage medium
CN110764894A (en) Timed task management method, device, equipment and storage medium
CN108399095B (en) Method, system, device and storage medium for supporting dynamic management of timed tasks
CN117032668A (en) Processing method, device, system and platform of visual rule engine
CN110188308B (en) Client automatic dotting reporting method, storage medium, equipment and system
CN113656001A (en) Platform component development method and device, computer equipment and storage medium
CN108804088B (en) Protocol processing method and device
CN112883088A (en) Data processing method, device, equipment and storage medium
CN111143310A (en) Log recording method and device and readable storage medium
WO2021036987A1 (en) Method and device for achieving operation and maintenance monitoring
CN115378996B (en) Method, device, equipment and storage medium for data transmission between systems
Weidner et al. Collabs: A Flexible and Performant CRDT Collaboration Framework

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 17890421

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 17890421

Country of ref document: EP

Kind code of ref document: A1