CN117407196A - Data importing method and device - Google Patents

Data importing method and device

Info

Publication number
CN117407196A
Authority
CN
China
Prior art keywords
task
data import
data
queue
import
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311464943.5A
Other languages
Chinese (zh)
Inventor
杨博文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202311464943.5A priority Critical patent/CN117407196A/en
Publication of CN117407196A publication Critical patent/CN117407196A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a data importing method and device, and relates to the technical field of computers. One embodiment of the method comprises the following steps: in response to receiving a data import request, judging whether a task queue corresponding to a data import task is designated; under the condition that a task queue is not specified, acquiring task information of a data importing task; acquiring task history priority corresponding to a target table according to the name of the target table to determine a task queue corresponding to a data importing task; under the condition that the task history priority cannot be acquired, determining a task queue corresponding to the data import task according to the import data quantity; and adding the data import task into the corresponding task queue, and processing the data import task by using the thread pool of the corresponding task queue to conduct data import. The implementation mode ensures that tasks with high real-time requirements can be completed preferentially, and simultaneously ensures that tasks with low real-time requirements can be executed opportunistically, so that the tasks are not always in a stagnation state.

Description

Data importing method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for data import.
Background
At present, when service data generated on line is processed and then imported into a database for storage, data import tasks are mostly generated according to data to be imported, and then the data import tasks are stored into a task queue according to the submitting time of the data import tasks. The task management platform sequentially takes out the data import tasks from the task queue and then carries out data import processing according to the commit time of the data import tasks.
In the process of implementing the present invention, the inventor finds that at least the following problems exist in the prior art:
The data import tasks of different services differ in their real-time requirements and data volumes. When tasks are processed strictly in order of submission time, a task with a high real-time requirement may sit near the back of the task queue while tasks with large data volumes occupy most of the system resources. The high-priority task is then not processed in time, which seriously degrades the real-time performance of data processing.
Disclosure of Invention
In view of this, embodiments of the invention provide a method and a device for importing data that distribute data import tasks into different task queues and process the tasks with the thread pool corresponding to each task queue. Tasks can thereby be processed by level, which avoids the situation in which important tasks cannot be processed in time: tasks with high real-time requirements are completed preferentially, while tasks with low real-time requirements still get an opportunity to execute and do not remain stalled indefinitely.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a data import method, including:
in response to receiving a data import request, judging, according to the data import request, whether a task queue corresponding to a data import task is specified;
under the condition that a task queue is not specified, acquiring task information of the data import task, wherein the task information comprises a target table name corresponding to the data import task and an import data volume;
acquiring task history priority corresponding to a target table according to the target table name, and determining a task queue corresponding to the data importing task according to the task history priority;
under the condition that the task history priority corresponding to the target table cannot be acquired, determining a task queue corresponding to the data import task according to the import data quantity;
and adding the data import task into the corresponding task queue, and processing the data import task by using a thread pool of the corresponding task queue to conduct data import.
Optionally, determining a task queue corresponding to the data import task according to the import data volume includes: determining the task queue corresponding to the data import task according to the import data volume and the import data volume threshold corresponding to each task queue.
Optionally, processing the data import task by using the thread pool of the corresponding task queue to conduct data import includes: acquiring the number of executing tasks in a thread pool of the corresponding task queue and a task number threshold of the corresponding task queue; taking out the data import task from the corresponding task queue and processing the data import task to conduct data import under the condition that the number of the executing tasks does not reach the task number threshold; and when the number of the executing tasks reaches the task number threshold, waiting until the number of the executing tasks in the thread pool of the corresponding task queue is reduced, and taking out the data import task from the corresponding task queue and processing the data import task to carry out data import.
Optionally, the task information further includes a sequential priority corresponding to the data import task; adding the data import task to the corresponding task queue, including: determining a target position of the data import task in the corresponding task queue according to the corresponding sequence priority of the data import task and the sequence priority of the existing task in the corresponding task queue; and adding the data import task to the target position in the corresponding task queue.
Optionally, the task information further includes a task state; after the data import task is processed by using the thread pool of the corresponding task queue, the method further comprises: acquiring a task state of the data import task, and judging whether the data import task fails to be executed according to the task state; under the condition that the data import task fails to be executed, acquiring the execution time length when the data import task fails to be executed last time; determining a first order priority of the data import task according to the execution time when the previous execution of the data import task fails; determining a first target position of the data import task in the corresponding task queue according to the first sequence priority and the sequence priority of the existing task in the corresponding task queue; and adding the data import task to the first target position in the corresponding task queue.
Optionally, the method further comprises: monitoring the executed time length of the data import task being executed in each task queue; and alarming to prompt a user to process the executing data import task under the condition that the executed time length exceeds the preset maximum execution time length of the task.
Optionally, the method further comprises: monitoring the waiting time of the data import task in each task queue; and alarming to prompt a user to adjust the task priority under the condition that the waiting time exceeds a preset waiting time threshold.
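The two optional monitoring steps above (executed duration of running tasks, waiting duration of queued tasks) can be illustrated with a minimal sketch. The record type `TaskTiming`, the function `find_alerts`, and the alert labels are hypothetical names chosen for illustration; the source does not specify how timing data is stored or how alarms are delivered.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class TaskTiming:
    task_id: str
    enqueued_at: float                  # seconds; when the task entered its queue
    started_at: Optional[float] = None  # None while the task is still waiting


def find_alerts(
    tasks: Iterable[TaskTiming],
    now: float,
    max_execution_seconds: float,
    max_wait_seconds: float,
) -> List[Tuple[str, str]]:
    """Return (task_id, reason) pairs for tasks that exceed either limit."""
    alerts = []
    for task in tasks:
        if task.started_at is not None:
            # Running task: compare executed duration with the preset maximum.
            if now - task.started_at > max_execution_seconds:
                alerts.append((task.task_id, "execution_timeout"))
        elif now - task.enqueued_at > max_wait_seconds:
            # Waiting task: the user may want to adjust its priority.
            alerts.append((task.task_id, "wait_timeout"))
    return alerts
```

In this sketch an `execution_timeout` alert corresponds to prompting the user to handle the running task, and a `wait_timeout` alert to prompting the user to adjust the task priority.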
According to another aspect of the embodiment of the present invention, there is provided an apparatus for data import, including:
a first queue determining module, configured to respond to a received data import request and judge, according to the data import request, whether a task queue corresponding to a data import task is specified;
the task information acquisition module is used for acquiring task information of the data import task under the condition that a task queue is not specified, wherein the task information comprises a target table name and an import data volume corresponding to the data import task;
the second queue determining module is used for acquiring task history priorities corresponding to the target table according to the target table names so as to determine task queues corresponding to the data importing tasks according to the task history priorities;
the third queue determining module is used for determining a task queue corresponding to the data import task according to the import data quantity under the condition that the task history priority corresponding to the target table cannot be acquired;
and an import task processing module, configured to add the data import task to the corresponding task queue and to process the data import task by using the thread pool of the corresponding task queue so as to import the data.
According to still another aspect of an embodiment of the present invention, there is provided an electronic apparatus including: one or more processors; and the storage device is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors are enabled to realize the data importing method provided by the embodiment of the invention.
According to still another aspect of the embodiments of the present invention, there is provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements a method for data import provided by the embodiments of the present invention.
One embodiment of the above invention has the following advantages or benefits. In response to a received data import request, it is judged according to the request whether a task queue corresponding to the data import task is specified. Where no task queue is specified, task information of the data import task is obtained, the task information comprising the target table name and the import data volume corresponding to the task. The task history priority corresponding to the target table is obtained according to the target table name, and the task queue corresponding to the data import task is determined according to that priority. Where the task history priority corresponding to the target table cannot be obtained, the task queue is determined according to the import data volume. The data import task is then added to the corresponding task queue and processed by the thread pool of that queue to import the data. With this technical scheme, data import tasks are distributed into different task queues and processed by the thread pool corresponding to each queue, so that tasks are processed by level. This avoids the situation in which important tasks cannot be processed in time: tasks with high real-time requirements are completed preferentially, while tasks with low real-time requirements still get an opportunity to execute and do not remain stalled indefinitely.
Further effects of the optional implementations described above are explained below in connection with the specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main steps of a method of data import according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a task queue allocation flow according to one embodiment of the invention;
FIG. 3 is a schematic diagram of a task processing flow according to one embodiment of the invention;
FIG. 4 is a schematic diagram of the main modules of an apparatus for data import according to an embodiment of the present invention;
FIG. 5 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
FIG. 6 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical scheme disclosed by the invention, the acquisition, collection, updating, analysis, processing, use, transmission, storage and other handling of the user personal information involved all comply with the relevant laws and regulations, serve lawful purposes, and do not violate public order and good morals. Necessary measures are taken to protect the personal information of users, to prevent illegal access to that information, and to safeguard the personal information security of users, network security and national security.
At present, when service data generated online is processed and then imported into a database for storage, data import tasks are mostly generated according to the data to be imported and are then stored in a task queue according to their submission time. The task management platform takes the data import tasks out of the task queue in order of submission time and performs the data import. However, the data import tasks of different services differ in their real-time requirements and data volumes. When tasks are processed strictly in order of submission time, a task with a high real-time requirement may sit near the back of the task queue while tasks with large data volumes occupy most of the system resources, so that it is not processed in time and the real-time performance of data processing is seriously affected. Take the import of online advertisement report data as an example. The online advertisement report is an important tool for tracking the effect of an advertisement after it has been placed, and shows advertisers the effect of their placements accurately, promptly and across multiple dimensions. The advertising effect metrics generally include impressions, clicks, spend, orders, purchases, page views and the like. The primary responsibility of the online advertisement report is to feed the various metrics of an advertisement back to the advertiser in time. The current metric data is processed and computed from an upstream data warehouse, stored in a database for querying by a downstream report engine, and finally displayed to the advertiser.
In the online advertisement report data import scenario, the data volume is enormous: a single table can often reach the order of a trillion rows or more, and a general-purpose relational database (such as MySQL) can hardly sustain the writing, storage and real-time querying of data on this scale. The distributed database Doris (a high-performance, real-time analytical database based on the MPP architecture) offers horizontal scaling, write optimization and distributed querying, and meets the requirements for writing, storing and querying large amounts of data in real time in the online advertisement report scenario well, so it is widely used.
When online advertisement metric data is imported, some services with high real-time requirements (such as the advertisement order service and the service of adding advertised goods to the shopping cart) import data in near real time using minute-level batch import tasks, while services without real-time requirements import data offline using hour-level or even day-level import tasks. However, because Doris is a general-purpose OLAP (online analytical processing) database, it handles every submitted data import task directly and at the same time, so a unified task management platform is needed to manage or buffer the tasks. The platform receives all submitted data import tasks for advertiser reports, stores them uniformly in a queue, traverses the tasks in the queue in real time, and submits them to Doris. The management platform also monitors the state of the tasks in Doris in real time and feeds the task state back to the user or raises an alarm.
The existing Doris task management platform is a general-purpose task management platform and cannot be fully adapted to the data import scenario of online advertisement reports. Because tasks are processed in order of submission time, tasks with low business priority may be executed before tasks with high real-time requirements. This is especially so when a large number of tasks refresh historical data: each such task handles a large amount of data, runs for a long time, and therefore occupies the Doris task quota (the maximum number of tasks allowed to run at the same time) for a long time. The Doris task quota is limited, and once it is full, subsequent data import tasks must wait. As a result, important real-time streaming data cannot be submitted because no task quota is available, which seriously affects the real-time performance of data import and adversely affects advertisers.
To solve the technical problems in the prior art, the invention provides a data import method that allocates data import tasks to different queues (for example a real-time queue, an offline queue and a data refresh task queue) according to import data volume and real-time priority, and then performs the data import with the threads corresponding to each queue. This provides tiered task import in the online advertisement report storage scenario and solves the problem that important tasks (tasks with high real-time requirements, such as user order data import tasks and user add-to-shopping-cart tasks) cannot be processed in time. A multi-level priority queue is introduced to manage the Doris task slots by priority, so that tasks with high real-time requirements are completed preferentially, while tasks with low real-time requirements, such as offline tasks and data refresh tasks, still get an opportunity to execute and do not remain stalled indefinitely.
The invention places a task management service between the Doris database and the upstream data warehouse to manage the data import tasks in a unified manner and to distribute them to task queues of different priorities.
Fig. 1 is a schematic diagram of main steps of a method for importing data according to an embodiment of the present invention. As shown in fig. 1, the method for importing data according to the embodiment of the present invention mainly includes the following steps S101 to S105.
Step S101: in response to a received data import request, judging, according to the data import request, whether a task queue corresponding to the data import task is specified. After receiving the data import request, the task management service of the embodiment of the invention creates a data import task and persists the task information in a database. The task management service can use three levels of priority queue to control the data import tasks, namely a real-time queue, an offline queue and a data refresh task queue. The data import tasks in the real-time queue have the highest real-time requirements, and those in the data refresh task queue have the lowest. The task information may include, for example: the unique task identifier, the identification code of the unique task identifier, the task state, the target cluster corresponding to the task, the target database name, the target table name, the task creation time, the next retry time of the task, the number of retries of the task, the order priority of the task, and the like. In general, the target database name includes the target cluster information, and the target table name includes information such as the target cluster information and the target database name.
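The task information listed above could be held in a record such as the following minimal sketch. The class names, field names and the set of task states are hypothetical; the source lists the kinds of information stored but not a schema. The default order priority of 10 is the example default value mentioned later in the description.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class TaskState(Enum):
    # Hypothetical states; the source only says that a task state is recorded.
    PENDING = "pending"
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"


@dataclass
class ImportTaskInfo:
    task_id: str                 # unique task identifier
    task_id_code: str            # identification code of the unique identifier
    state: TaskState
    target_cluster: str
    target_database: str
    target_table: str
    created_at: datetime
    next_retry_at: Optional[datetime] = None
    retry_count: int = 0
    order_priority: int = 10     # assumed default, taken from the later example
```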
For a received data import request, after the data import task has been created, whether a task queue corresponding to the task is specified is judged according to the request. If a task queue is to be specified, the data import request includes a queue parameter. For example, one of the three task queues is specified by an incoming parameter queue_id: if queue_id=1, the data import task enters the real-time queue; if queue_id=2, it enters the offline queue; and a further value of queue_id sends it to the data refresh task queue. In general, when a data import task refreshes historical data, the task queue needs to be specified, because the data volume is large and the real-time requirement is low.
Step S102: where no task queue is specified, acquiring task information of the data import task, the task information comprising the target table name and the import data volume corresponding to the data import task. If no task queue is specified, the task queue to which the task belongs is determined from the task information of the data import task.
Step S103: acquiring the task history priority corresponding to the target table according to the target table name, so as to determine the task queue corresponding to the data import task according to the task history priority. Where no task queue is specified, the task history priority of the target table corresponding to the data import task is acquired first, and the task queue is determined from it. If the task history priority of the target table is the highest, the data imported into the target table has the highest real-time requirement, and the data import task can be assigned to the real-time queue.
Step S104: where the task history priority corresponding to the target table cannot be acquired, determining the task queue corresponding to the data import task according to the import data volume. If no data has previously been imported into the target table, the task history priority corresponding to the target table cannot be acquired, and the task queue can instead be determined according to the import data volume of the data import task. The import data volume can be obtained by querying the data path of the data import task through the interface of the data file management system.
According to one embodiment of the present invention, determining a task queue corresponding to the data import task according to the import data volume may specifically include: determining the task queue corresponding to the data import task according to the import data volume and the import data volume threshold corresponding to each task queue. The import data volume threshold corresponding to each task queue may be preset, and the task queue corresponding to the data import task is then determined by comparing the import data volume with these thresholds. Specifically, assume that the import data volume threshold of the real-time queue is $\lambda_{\text{real}}$ and that of the offline queue is $\lambda_{\text{offline}}$. When the import data volume of the data import task lies in $(0, \lambda_{\text{real}}]$, the task is assigned to the real-time queue; when it lies in $(\lambda_{\text{real}}, \lambda_{\text{offline}}]$, the task is assigned to the offline queue; and when it is greater than $\lambda_{\text{offline}}$, the task is assigned to the data refresh task queue.
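The threshold rule can be sketched as a small function. The function name, the queue labels and the choice to reject non-positive volumes are assumptions made for illustration; the source defines only the three intervals.

```python
def queue_by_volume(volume: float, real_threshold: float, offline_threshold: float) -> str:
    """Pick a queue from the import data volume.

    (0, real_threshold]                 -> real-time queue
    (real_threshold, offline_threshold] -> offline queue
    > offline_threshold                 -> data refresh task queue
    """
    if volume <= 0:
        # The source intervals start above 0; rejecting other values is an assumption.
        raise ValueError("import data volume must be positive")
    if volume <= real_threshold:
        return "real_time"
    if volume <= offline_threshold:
        return "offline"
    return "data_refresh"
```

Both thresholds are inclusive upper bounds, matching the half-open intervals in the text.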
FIG. 2 is a flow diagram of task queue allocation according to one embodiment of the invention. As shown in FIG. 2, after a data import request is received and a data import task is generated, it is first judged whether a task queue is specified. If so, the data import task is added to the specified task queue. Otherwise, it is judged whether the task history priority corresponding to the target table can be obtained. If so, the data import task is added to the task queue corresponding to that task history priority; otherwise, the task queue corresponding to the data import task is determined according to the import data volume.
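The three-level decision of FIG. 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the function and parameter names are hypothetical, the task history priority is modelled as a mapping from target table name directly to a queue label, and the volume-based fallback is passed in as a callable so that it is only evaluated when needed.

```python
from typing import Callable, Mapping, Optional


def assign_queue(
    requested_queue: Optional[str],
    target_table: str,
    history_priority: Mapping[str, str],
    by_volume: Callable[[], str],
) -> str:
    """Choose the task queue for a data import task."""
    # 1. A queue specified in the request always wins.
    if requested_queue is not None:
        return requested_queue
    # 2. Otherwise use the task history priority of the target table, if any.
    historical = history_priority.get(target_table)
    if historical is not None:
        return historical
    # 3. Otherwise fall back to the import data volume.
    return by_volume()
```

Passing the fallback as a callable reflects that the data volume lookup (a query against the data file management system in the source) need not run when an earlier rule already decides the queue.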
Step S105: adding the data import task to the corresponding task queue, and processing the data import task by using the thread pool of the corresponding task queue to import the data. The task management service adds the data import task to the task queue determined in the previous steps and then processes it with the thread pool of that queue. In the embodiment of the invention, the number of tasks that Doris can process (the task quota) can be divided among the task queues in advance, so that the task queues can process tasks simultaneously. This ensures that tasks with high real-time requirements are processed preferentially and promptly, and that tasks with low real-time requirements do not remain stalled indefinitely. For example, assume that the task quotas of the thread pools of the real-time queue, the offline queue and the data refresh task queue are real_queue_quota, offline_queue_quota and temp_queue_quota, respectively. When the task management service starts, the upper limits of the task quotas of the respective queues may be configured as max_real_queue_quota, max_offset_queue_quota and max_temp_queue_quota. A default value is also set for the task quota of each task queue, so that every queue can process tasks normally.
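A quota setting with a default value and an upper limit might look like the sketch below. The class `QuotaSetting`, its `resolve` method and every number are illustrative assumptions; the source names the quota variables but gives no concrete values or configuration format.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class QuotaSetting:
    default: int      # used when no explicit value is configured
    upper_limit: int  # plays the role of e.g. max_real_queue_quota

    def resolve(self, configured: Optional[int] = None) -> int:
        """Return the quota to use, never exceeding the upper limit."""
        value = self.default if configured is None else configured
        return max(0, min(value, self.upper_limit))


# Illustrative numbers only; the source gives no concrete values.
SETTINGS: Dict[str, QuotaSetting] = {
    "real_queue_quota": QuotaSetting(default=6, upper_limit=10),
    "offline_queue_quota": QuotaSetting(default=3, upper_limit=5),
    "temp_queue_quota": QuotaSetting(default=1, upper_limit=2),
}
```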
According to one embodiment of the present invention, the task information further includes an order priority corresponding to the data import task. Adding the data import task to the corresponding task queue may specifically include: determining a target position of the data import task in the corresponding task queue according to the order priority of the data import task and the order priorities of the existing tasks in the corresponding task queue; and adding the data import task to the target position in the corresponding task queue. The order priority indicates the ordering position of a data import task in its task queue. According to an embodiment of the present invention, the order priority is calculated, for example, as follows: first, a default priority value is set, for example, 10; second, a task priority value is set according to the business corresponding to the data import task, for example, the priority value of a display point elimination task is 10, that of an order task is 8, that of a purchase task is 6, and so on; third, an accumulated-execution-time priority value is set according to the accumulated execution time of the task, for example, 10 when the accumulated execution time < 10 min, 20 when 10 min <= accumulated execution time < 20 min, 30 when the accumulated execution time >= 20 min, 0 when there is no accumulated execution time, and so on; fourth, a data volume priority value is set according to the data volume corresponding to the data import task, for example, 10 when the data volume < 1G, 5 when 1G <= data volume < 2G, 1 when the data volume > 2G, and so on; finally, the sum of the priority values obtained in the first to fourth steps is used as the order priority of the data import task, and the position of the data import task in the task queue can be determined according to this order priority. In this way, a different order priority can be set for each data import task according to its real-time requirement, thereby ensuring that tasks with high real-time requirements are executed preferentially.
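The calculation rule above can be illustrated with a minimal Python sketch. The example values (10, 8, 6, the time buckets and the data volume buckets) come from the text; everything else is an assumption made for illustration: the names `ImportTask`, `order_priority` and `insert_by_priority`, the dictionary keys, reading "1G" as 2**30 bytes, treating a volume of exactly 2G as belonging to the largest bucket, giving unknown businesses a value of 0, and placing tasks with a larger sum closer to the head of the queue.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_PRIORITY = 10
# Example values from the text; the dictionary keys are hypothetical labels.
BUSINESS_PRIORITY = {"display_point_elimination": 10, "order": 8, "purchase": 6}
GIB = 1024 ** 3  # assumption: "1G" is read as 2**30 bytes


def execution_time_priority(accumulated_minutes: Optional[float]) -> int:
    """Priority value derived from the accumulated execution time."""
    if accumulated_minutes is None:  # no accumulated execution time
        return 0
    if accumulated_minutes < 10:
        return 10
    if accumulated_minutes < 20:
        return 20
    return 30


def data_volume_priority(size_bytes: int) -> int:
    """Priority value derived from the import data volume."""
    if size_bytes < GIB:
        return 10
    if size_bytes < 2 * GIB:
        return 5
    return 1  # assumption: exactly 2G is treated like volumes above 2G


def order_priority(business: str, accumulated_minutes: Optional[float], size_bytes: int) -> int:
    """Sum of the four priority values described in the text."""
    return (
        DEFAULT_PRIORITY
        + BUSINESS_PRIORITY.get(business, 0)  # assumption: unknown business adds 0
        + execution_time_priority(accumulated_minutes)
        + data_volume_priority(size_bytes)
    )


@dataclass
class ImportTask:
    name: str
    priority: int


def insert_by_priority(queue: List[ImportTask], task: ImportTask) -> int:
    """Insert the task so the queue stays sorted by descending priority.

    Tasks with equal priority keep their arrival order. Returns the target position.
    """
    keys = [-queued.priority for queued in queue]  # ascending keys for bisect
    position = bisect.bisect_right(keys, -task.priority)
    queue.insert(position, task)
    return position
```

Under these assumptions, an order task with no accumulated execution time and less than 1G of data receives 10 + 8 + 0 + 10 = 28, and is placed behind any queued task whose sum is 28 or higher.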
According to yet another embodiment of the present invention, processing the data import task by using the thread pool of the corresponding task queue to perform data import may specifically include: acquiring the number of executing tasks in the thread pool of the corresponding task queue and the task number threshold of the corresponding task queue; when the number of executing tasks has not reached the task number threshold, taking the data import task out of the corresponding task queue and processing it to perform data import; and when the number of executing tasks has reached the task number threshold, waiting until the number of executing tasks in the thread pool of the corresponding task queue decreases, and then taking the data import task out of the corresponding task queue and processing it to perform data import. Specifically, each task queue has a thread pool for processing the tasks in that queue, and one thread in the thread pool may be used for task scheduling.
FIG. 3 is a schematic diagram of a task processing flow according to one embodiment of the invention. Taking the real-time queue as an example, as shown in FIG. 3, a real-time task scheduler in the corresponding thread pool manages the data import tasks in the real-time queue in a unified manner. The real-time task scheduler acquires the number of tasks in the real-time queue, and if the number of tasks is 0, the process ends; otherwise, when there is a task quota (real_queue_quota > 0), a task is fetched from the real-time queue and executed, and the real-time task quota is decremented (real_queue_quota = real_queue_quota - 1). When the task quota of the real-time queue (real_queue_quota) decreases to 0, that is, when no task quota remains, the number of running real-time tasks has reached the upper limit, and the real-time task scheduler waits for the thread pool allocated by Doris for the real-time queue to finish executing a task. The task management service removes a task whose execution has completed, or retries a task whose execution has failed, releases the real-time task quota (real_queue_quota = real_queue_quota + 1), and then notifies the real-time task scheduler to submit a new task, until the number of tasks in the real-time queue is 0.
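The quota bookkeeping described for the real-time queue can be sketched as follows. This is a simplified, single-threaded illustration rather than the scheduler of the embodiment: the class `QuotaScheduler`, its method names and the `start_task` hook that stands in for handing a task to the thread pool are all hypothetical, and only the counter corresponds to `real_queue_quota` in the text.

```python
from collections import deque
from typing import Callable, Deque, List


class QuotaScheduler:
    """Submits queued tasks only while a quota is available."""

    def __init__(self, quota: int, start_task: Callable[[str], None]) -> None:
        self.quota = quota                  # plays the role of real_queue_quota
        self.pending: Deque[str] = deque()  # tasks waiting in the queue
        self.start_task = start_task        # hypothetical hook: hand a task to the thread pool

    def enqueue(self, task: str) -> None:
        self.pending.append(task)

    def schedule(self) -> List[str]:
        """Fetch and start tasks until the queue is empty or the quota reaches 0."""
        started = []
        while self.pending and self.quota > 0:
            task = self.pending.popleft()
            self.quota -= 1                 # real_queue_quota = real_queue_quota - 1
            self.start_task(task)
            started.append(task)
        return started

    def on_task_finished(self) -> List[str]:
        """Called once a task has completed or failed: release quota, then resubmit."""
        self.quota += 1                     # real_queue_quota = real_queue_quota + 1
        return self.schedule()
```

With a quota of 2 and three queued tasks, the first call to `schedule()` starts two tasks and leaves the third waiting; the third is started only when `on_task_finished()` releases a quota. A multi-threaded implementation would additionally need a lock or semaphore around the counter.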
For a data import task in the offline queue, the offline task scheduler works similarly to the real-time task scheduler: it traverses the tasks in the offline queue, and if an offline task quota is currently available (offline_queue_quota > 0), a task is fetched from the offline queue and executed, consuming one offline task quota (offline_queue_quota = offline_queue_quota - 1). If the offline queue currently has no quota (offline_queue_quota <= 0), the offline task scheduler bulk-scheduler cannot submit offline tasks and enters a waiting state. When an offline task finishes executing, the task management service recovers the offline task quota (offline_queue_quota = offline_queue_quota + 1) and notifies the offline task scheduler to continue submitting offline tasks.
A data import task in the data refresh task queue is processed by the data refresh task scheduler temp-scheduler. Every data import task for data refresh must carry a specific mark when submitted, and a data import task for data refresh submitted without the mark fails to be submitted. A task determined to be a data import task for data refresh has the highest priority in the data refresh task queue, ahead of manually submitted data import tasks for data refresh. Quota control for data import tasks for data refresh is similar to that of the real-time queue and the offline queue: tasks can be submitted when the quota is greater than 0; otherwise, the scheduler waits for the quota to be recovered. The task management service monitors the tasks in real time, performs the corresponding processing once a task completes or fails, recovers the quota of the queue, and notifies the data refresh task scheduler to continue submitting tasks.
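The submission check for the data refresh queue can be pictured with a short sketch. The text only states that a task without the specific mark fails to be submitted; the field `refresh_mark`, the exception `SubmissionRejected` and the function `submit_refresh_task` are hypothetical names chosen for illustration, and the form of the mark is not specified in the source.

```python
from dataclasses import dataclass
from typing import List


class SubmissionRejected(Exception):
    """Raised when a data-refresh task is submitted without the required mark."""


@dataclass
class RefreshTask:
    name: str
    refresh_mark: bool = False  # hypothetical stand-in for the "specific mark"


def submit_refresh_task(queue: List[RefreshTask], task: RefreshTask) -> None:
    """Accept a task into the data refresh queue only if it carries the mark."""
    if not task.refresh_mark:
        raise SubmissionRejected(f"task {task.name!r} has no data-refresh mark")
    queue.append(task)
```

Rejecting unmarked tasks at submission time keeps ordinary import tasks out of the data refresh queue, so its quota is spent only on genuine refresh work.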
According to a further embodiment of the invention, the task information further comprises a task state, where task states include, for example: unexecuted, completed, failed, and so on. After the data import task is processed by using the thread pool of the corresponding task queue, the method may further include: acquiring the task state of the data import task, and judging whether execution of the data import task has failed according to the task state; when execution of the data import task has failed, acquiring the execution duration of the previous failed execution of the data import task; determining a first order priority of the data import task according to that execution duration; determining a first target position of the data import task in the corresponding task queue according to the first order priority and the order priorities of the existing tasks in the corresponding task queue; and adding the data import task to the first target position in the corresponding task queue. Data import tasks submitted to Doris from the various queues may fail for a variety of reasons, related either to the data or to the Doris cluster. For a task in a queue whose execution has failed, the task management service initiates a retry and re-adds the task to its previous task queue. At this point, the order priority of the failed task is recalculated as the first order priority, and the failed task is added to the task queue again according to the first order priority. When calculating the first order priority, the execution duration of the previous failed execution may be acquired and used as the accumulated execution time of the task, and the first order priority is then calculated according to the aforementioned calculation rule for the order priority.
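The retry path can be sketched as follows, under stated assumptions. The time buckets are the example values given earlier for the accumulated execution time; the names `ImportTask`, `time_value` and `requeue_if_failed`, the `base_priority` field (standing for the sum of the default, business and data volume values) and the convention that a larger value sits closer to the head of the queue are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ImportTask:
    name: str
    state: str           # "unexecuted", "completed" or "failed"
    base_priority: int   # hypothetical: default + business + data volume values
    priority: int        # current order priority
    last_run_minutes: Optional[float] = None  # duration of the previous failed run


def time_value(minutes: Optional[float]) -> int:
    """Example accumulated-execution-time values from the text."""
    if minutes is None:
        return 0
    if minutes < 10:
        return 10
    if minutes < 20:
        return 20
    return 30


def requeue_if_failed(queue: List[ImportTask], task: ImportTask) -> Optional[int]:
    """Re-add a failed task at the position given by its recalculated priority.

    Returns the first target position, or None if the task did not fail.
    """
    if task.state != "failed":
        return None
    # The previous run's duration is used as the accumulated execution time.
    task.priority = task.base_priority + time_value(task.last_run_minutes)
    task.state = "unexecuted"
    # Assumption: the queue is ordered by descending priority.
    position = next(
        (i for i, queued in enumerate(queue) if queued.priority < task.priority),
        len(queue),
    )
    queue.insert(position, task)
    return position
```

For instance, a failed task with a base value of 28 whose previous run lasted 12 minutes is re-queued with a first order priority of 28 + 20 = 48, which under these assumptions moves it ahead of tasks that have not yet run.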
According to a further embodiment of the invention, the method further comprises: monitoring the elapsed execution time of each data import task being executed in each task queue; and raising an alarm to prompt a user to handle the executing data import task when the elapsed execution time exceeds a preset maximum task execution time. For a task in any queue whose elapsed execution time exceeds the set maximum task execution time, the user is notified by an alarm, and an interface is provided that allows the user to execute the task manually or to intervene in its processing.
According to a further embodiment of the invention, the method further comprises: monitoring the waiting time of the data import tasks in each task queue; and raising an alarm to prompt a user to adjust the task priority when the waiting time exceeds a preset waiting time threshold. For a data import task in any queue whose waiting time exceeds the preset waiting time threshold, the user can be alerted, and an interface is provided that allows the user to adjust the priority of the task manually. In this way, the execution order of tasks with different real-time requirements can be adjusted more flexibly to meet the requirements of various business scenarios.
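Both monitoring checks, on execution time and on waiting time, can be expressed in one small sketch. The record `TaskTiming`, the function `check_alarms`, the use of seconds as the time unit and the alarm message wording are assumptions; the source does not specify how timestamps are stored or how the alarm is delivered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskTiming:
    name: str
    enqueued_at: float                  # time the task entered its queue (seconds)
    started_at: Optional[float] = None  # None while the task is still waiting


def check_alarms(
    tasks: List[TaskTiming],
    now: float,
    max_execution_seconds: float,
    max_wait_seconds: float,
) -> List[str]:
    """Return one alarm message per task that exceeds its applicable limit."""
    alarms = []
    for task in tasks:
        if task.started_at is not None:
            # Executing task: compare elapsed execution time with the maximum.
            if now - task.started_at > max_execution_seconds:
                alarms.append(f"{task.name}: execution time exceeded, manual handling required")
        elif now - task.enqueued_at > max_wait_seconds:
            # Waiting task: compare waiting time with the threshold.
            alarms.append(f"{task.name}: waiting time exceeded, consider adjusting priority")
    return alarms
```

A periodic job could call `check_alarms` for every queue and forward the returned messages to whatever notification channel is in use.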
Fig. 4 is a schematic diagram of main modules of an apparatus for data import according to an embodiment of the present invention. As shown in fig. 4, the data importing apparatus 400 according to the embodiment of the present invention mainly includes: a first queue determining module 401, configured to determine, in response to receiving a data import request, whether a task queue corresponding to a data import task has been specified according to the data import request;
a task information obtaining module 402, configured to obtain task information of the data import task in the case where a task queue is not specified, where the task information includes a target table name and an import data volume corresponding to the data import task;
a second queue determining module 403, configured to obtain a task history priority corresponding to a target table according to the target table name, so as to determine a task queue corresponding to the data import task according to the task history priority;
a third queue determining module 404, configured to determine, according to the amount of imported data, a task queue corresponding to the data import task if the task history priority corresponding to the target table cannot be obtained;
and the import task processing module 405 is configured to add the data import task to the corresponding task queue, and process the data import task by using the thread pool of the corresponding task queue to perform data import.
According to one embodiment of the invention, the third queue determination module 404 may also be configured to: and determining a task queue corresponding to the data import task according to the import data quantity and the import data quantity threshold corresponding to each task queue.
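The division of work among the three queue determining modules can be illustrated with a routing sketch. It is deliberately simplified: the function `choose_queue`, the queue names `real_time` and `offline`, the 1 GiB threshold and the representation of the task history priority as a direct table-to-queue mapping are all assumptions made for illustration, not details given in this passage.

```python
from typing import Dict, Optional

GIB = 1024 ** 3  # hypothetical unit for the data volume threshold


def choose_queue(
    specified_queue: Optional[str],
    target_table: str,
    import_bytes: int,
    history_queue_by_table: Dict[str, str],
    realtime_max_bytes: int = GIB,
) -> str:
    """Pick a task queue in the order: explicit queue, table history, data volume."""
    # First queue determining module: use the queue named in the request, if any.
    if specified_queue is not None:
        return specified_queue
    # Second queue determining module: look up the history for the target table.
    # Simplification: the history is modelled as mapping directly to a queue name.
    historical = history_queue_by_table.get(target_table)
    if historical is not None:
        return historical
    # Third queue determining module: fall back to the import data volume threshold.
    return "real_time" if import_bytes < realtime_max_bytes else "offline"
```

In this sketch a request for a table with no history and a small import volume lands in the real-time queue, while a large import for the same table is routed to the offline queue.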
According to another embodiment of the present invention, import task processing module 405 can also be used to: acquiring the number of executing tasks in a thread pool of the corresponding task queue and a task number threshold of the corresponding task queue; taking out the data import task from the corresponding task queue and processing the data import task to conduct data import under the condition that the number of the executing tasks does not reach the task number threshold; and when the number of the executing tasks reaches the task number threshold, waiting until the number of the executing tasks in the thread pool of the corresponding task queue is reduced, and taking out the data import task from the corresponding task queue and processing the data import task to carry out data import.
According to another embodiment of the present invention, the task information further includes a sequential priority corresponding to the data import task; the import task processing module 405 may also be used to: determining a target position of the data import task in the corresponding task queue according to the corresponding sequence priority of the data import task and the sequence priority of the existing task in the corresponding task queue; and adding the data import task to the target position in the corresponding task queue.
According to a further embodiment of the invention, the task information further comprises a task state; the apparatus 400 for data import may further include a fail-over module (not shown in the figure) for: after the thread pool of the corresponding task queue is used for processing the data import task, acquiring the task state of the data import task, and judging whether the data import task fails to be executed according to the task state; under the condition that the data import task fails to be executed, acquiring the execution time length when the data import task fails to be executed last time; determining a first order priority of the data import task according to the execution time when the previous execution of the data import task fails; determining a first target position of the data import task in the corresponding task queue according to the first sequence priority and the sequence priority of the existing task in the corresponding task queue; and adding the data import task to the first target position in the corresponding task queue.
According to yet another embodiment of the present invention, the data importing apparatus 400 may further include a monitoring alarm module (not shown in the figure) for: monitoring the executed time length of the data import task being executed in each task queue; and alarming to prompt a user to process the executing data import task under the condition that the executed time length exceeds the preset maximum execution time length of the task.
According to yet another embodiment of the present invention, a monitoring alarm module (not shown) may also be used to: monitoring the waiting time of the data import task in each task queue; and alarming to prompt a user to adjust the task priority under the condition that the waiting time exceeds a preset waiting time threshold.
According to the technical scheme of the embodiment of the invention, in response to a received data import request, whether a task queue corresponding to a data import task has been specified is judged according to the data import request; when no task queue is specified, task information of the data import task is acquired, where the task information includes a target table name and an import data volume corresponding to the data import task; a task history priority corresponding to the target table is acquired according to the target table name, so that the task queue corresponding to the data import task is determined according to the task history priority; when the task history priority corresponding to the target table cannot be acquired, the task queue corresponding to the data import task is determined according to the import data volume; and the data import task is added to the corresponding task queue and processed by the thread pool of the corresponding task queue to perform data import. With this scheme, data import tasks can be distributed to different task queues and processed by the thread pool corresponding to each task queue, so that tasks are processed by grade: important tasks are not left unprocessed, tasks with high real-time requirements are completed preferentially, and tasks with low real-time requirements still get a chance to be executed rather than remaining stalled indefinitely.
Fig. 5 illustrates an exemplary system architecture 500 of a data import method or apparatus to which embodiments of the present invention may be applied.
As shown in fig. 5, the system architecture 500 may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 is used as a medium to provide communication links between the terminal devices 501, 502, 503 and the server 505. The network 504 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 505 via the network 504 using the terminal devices 501, 502, 503 to receive or send messages or the like. Various communication client applications may be installed on the terminal devices 501, 502, 503, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 501, 502, 503 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 505 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by users using the terminal devices 501, 502, 503. The background management server can, in response to received data such as a data import request, judge whether a task queue corresponding to a data import task has been specified according to the data import request; when no task queue is specified, acquire task information of the data import task, where the task information includes a target table name and an import data volume corresponding to the data import task; acquire a task history priority corresponding to the target table according to the target table name, and determine the task queue corresponding to the data import task according to the task history priority; when the task history priority corresponding to the target table cannot be acquired, determine the task queue corresponding to the data import task according to the import data volume; add the data import task to the corresponding task queue, and process the data import task by using the thread pool of the corresponding task queue to perform data import; and feed back a processing result (such as a data import result, by way of example only) to the terminal device.
It should be noted that, the method for importing data provided in the embodiment of the present invention is generally executed by the server 505, and accordingly, the device for importing data is generally disposed in the server 505.
It should be understood that the number of terminal devices, networks and servers in fig. 5 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 6, there is illustrated a schematic diagram of a computer system 600 suitable for use in implementing a terminal device or server in accordance with an embodiment of the present invention. The terminal device or server shown in fig. 6 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, mouse, etc.; an output portion 607 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The drive 610 is also connected to the I/O interface 605 as needed. Removable media 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on drive 610 so that a computer program read therefrom is installed as needed into storage section 608.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 601.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described units or modules may also be provided in a processor, for example, as: a processor includes a first queue determination module, a task information acquisition module, a second queue determination module, a third queue determination module, and an import task processing module. The names of these units or modules do not limit the units or modules themselves in some cases, for example, the import task processing module may also be described as a "module for adding the data import task to the corresponding task queue and processing the data import task for data import using the thread pool of the corresponding task queue".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist alone without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform the following: in response to a received data import request, judging whether a task queue corresponding to a data import task has been specified according to the data import request; when no task queue is specified, acquiring task information of the data import task, where the task information includes a target table name and an import data volume corresponding to the data import task; acquiring a task history priority corresponding to the target table according to the target table name, and determining the task queue corresponding to the data import task according to the task history priority; when the task history priority corresponding to the target table cannot be acquired, determining the task queue corresponding to the data import task according to the import data volume; and adding the data import task to the corresponding task queue, and processing the data import task by using the thread pool of the corresponding task queue to perform data import.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method of data import, comprising:
responding to a received data import request, and judging whether a task queue corresponding to a data import task is appointed according to the data import request;
under the condition that a task queue is not specified, acquiring task information of the data import task, wherein the task information comprises a target table name corresponding to the data import task and an import data volume;
acquiring task history priority corresponding to a target table according to the target table name, and determining a task queue corresponding to the data importing task according to the task history priority;
under the condition that the task history priority corresponding to the target table cannot be acquired, determining a task queue corresponding to the data import task according to the import data quantity;
and adding the data import task into the corresponding task queue, and processing the data import task by using a thread pool of the corresponding task queue to conduct data import.
2. The method of claim 1, wherein determining a task queue corresponding to the data import task based on the amount of import data comprises:
and determining a task queue corresponding to the data import task according to the import data quantity and the import data quantity threshold corresponding to each task queue.
3. The method of claim 1, wherein processing the data import task for data import using the thread pool of the corresponding task queue comprises:
acquiring the number of executing tasks in a thread pool of the corresponding task queue and a task number threshold of the corresponding task queue;
taking out the data import task from the corresponding task queue and processing the data import task to conduct data import under the condition that the number of the executing tasks does not reach the task number threshold;
and when the number of the executing tasks reaches the task number threshold, waiting until the number of the executing tasks in the thread pool of the corresponding task queue is reduced, and taking out the data import task from the corresponding task queue and processing the data import task to carry out data import.
4. The method of claim 1, wherein the task information further comprises a sequential priority corresponding to the data import task;
adding the data import task to the corresponding task queue, including:
determining a target position of the data import task in the corresponding task queue according to the corresponding sequence priority of the data import task and the sequence priority of the existing task in the corresponding task queue;
and adding the data import task to the target position in the corresponding task queue.
5. The method of claim 1 or 4, wherein the task information further comprises a task state;
after the data import task is processed by using the thread pool of the corresponding task queue, the method further comprises:
acquiring a task state of the data import task, and judging whether the data import task fails to be executed according to the task state;
under the condition that the data import task fails to be executed, acquiring the execution time length when the data import task fails to be executed last time;
determining a first order priority of the data import task according to the execution time when the previous execution of the data import task fails;
determining a first target position of the data import task in the corresponding task queue according to the first sequence priority and the sequence priority of the existing task in the corresponding task queue;
and adding the data import task to the first target position in the corresponding task queue.
6. The method according to claim 1, wherein the method further comprises:
monitoring the executed time length of the data import task being executed in each task queue;
and alarming to prompt a user to process the executing data import task under the condition that the executed time length exceeds the preset maximum execution time length of the task.
7. The method according to claim 1, wherein the method further comprises:
monitoring the waiting time of the data import task in each task queue;
and alarming to prompt a user to adjust the task priority under the condition that the waiting time exceeds a preset waiting time threshold.
8. An apparatus for importing data, comprising:
the first queue determining module is used for responding to the received data import request and judging whether a task queue corresponding to a data import task is appointed or not according to the data import request;
the task information acquisition module is used for acquiring task information of the data import task under the condition that a task queue is not specified, wherein the task information comprises a target table name and an import data volume corresponding to the data import task;
the second queue determining module is used for acquiring task history priorities corresponding to the target table according to the target table names so as to determine task queues corresponding to the data importing tasks according to the task history priorities;
the third queue determining module is used for determining a task queue corresponding to the data import task according to the import data quantity under the condition that the task history priority corresponding to the target table cannot be acquired;
and the import task processing module is used for adding the data import task into the corresponding task queue and processing the data import task by using the thread pool of the corresponding task queue so as to conduct data import.
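The queue selection performed by these modules can be sketched as a three-step fallback: an explicitly specified queue, then the history priority of the target table, then the import data volume. The sketch below is illustrative only. The queue names, pool sizes, volume thresholds, the `HISTORY_PRIORITY` lookup, and the request field names are all assumptions, as is the choice to give small imports the high-priority queue. Each `ThreadPoolExecutor` uses its internal work queue as a stand-in for the task queue.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical queues, one thread pool per queue.
POOLS = {
    "high": ThreadPoolExecutor(max_workers=4),
    "medium": ThreadPoolExecutor(max_workers=2),
    "low": ThreadPoolExecutor(max_workers=1),
}

# Hypothetical history store: target table name -> queue used for past imports.
HISTORY_PRIORITY = {"orders": "high"}


def choose_queue(request):
    """Pick a queue name for a data import request."""
    # Step 1: honor a queue specified in the request.
    if request.get("queue"):
        return request["queue"]
    # Step 2: fall back to the task history priority of the target table.
    history = HISTORY_PRIORITY.get(request["target_table"])
    if history is not None:
        return history
    # Step 3: no history available, so decide by import data volume (assumed thresholds).
    rows = request["row_count"]
    if rows <= 10_000:
        return "high"
    if rows <= 1_000_000:
        return "medium"
    return "low"


def submit_import(request, do_import):
    """Add the task to its queue and let that queue's thread pool run it."""
    queue_name = choose_queue(request)
    future = POOLS[queue_name].submit(do_import, request)
    return queue_name, future


name, future = submit_import(
    {"target_table": "new_table", "row_count": 100},
    lambda req: req["row_count"],  # placeholder for the real import routine
)
print(name, future.result(timeout=5))
```

Keeping a separate pool per queue means a backlog of large imports in one queue does not consume the threads that serve the others.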
9. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-7.
CN202311464943.5A 2023-11-06 2023-11-06 Data importing method and device Pending CN117407196A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311464943.5A CN117407196A (en) 2023-11-06 2023-11-06 Data importing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311464943.5A CN117407196A (en) 2023-11-06 2023-11-06 Data importing method and device

Publications (1)

Publication Number Publication Date
CN117407196A true CN117407196A (en) 2024-01-16

Family

ID=89497851

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311464943.5A Pending CN117407196A (en) 2023-11-06 2023-11-06 Data importing method and device

Country Status (1)

Country Link
CN (1) CN117407196A (en)

Similar Documents

Publication Publication Date Title
US9268605B2 (en) Mechanism for facilitating sliding window resource tracking in message queues for fair management of resources for application servers in an on-demand services environment
CN109299348B (en) Data query method and device, electronic equipment and storage medium
US10776373B2 (en) Facilitating elastic allocation of organization-specific queue resources in an on-demand services environment
US20190213552A1 (en) Smart streaming of data between external systems and service providers in an on-demand environment
CN110738436A (en) method and device for determining available stock
CN112597126A (en) Data migration method and device
CN113127057A (en) Method and device for parallel execution of multiple tasks
CN112783887A (en) Data processing method and device based on data warehouse
CN108985805B (en) Method and device for selectively executing push task
CN112884181A (en) Quota information processing method and device
CN113742057A (en) Task execution method and device
CN113220705A (en) Slow query identification method and device
CN113760638A (en) Log service method and device based on kubernets cluster
CN112667368A (en) Task data processing method and device
CN117407196A (en) Data importing method and device
CN115438007A (en) File merging method and device, electronic equipment and medium
CN113626175B (en) Data processing method and device
CN113436003A (en) Duration determination method, duration determination device, electronic device, medium, and program product
CN113760176A (en) Data storage method and device
CN114612212A (en) Business processing method, device and system based on risk control
CN113762819A (en) Channel scheduling method and device
CN112862554A (en) Order data processing method and device
CN112015790A (en) Data processing method and device
CN113434754A (en) Method and device for determining recommended API (application program interface) service, electronic equipment and storage medium
CN111338916A (en) Method, device, electronic equipment and computer readable medium for processing service request

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination