WO2020228177A1 - Batch data processing method and apparatus, computer device and storage medium - Google Patents


Info

Publication number
WO2020228177A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
target
processing
time
task
Prior art date
Application number
PCT/CN2019/102672
Other languages
French (fr)
Chinese (zh)
Inventor
朱鹏程
王培�
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020228177A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • This application relates to the field of big data technology, and in particular to a batch data processing method, device, computer equipment and storage medium.
  • the embodiments of the present application provide a batch data processing method, device, computer equipment, and storage medium to solve the problem of unreasonable system resources allocated during data batch processing.
  • a batch data processing method including:
  • determining target idle information from the target idle time queue, where the target idle information includes a start time, a target idle time length, and an estimated number of threads;
  • the data to be processed corresponding to the target number is determined as segmentation processing data;
  • if the system is determined to be in an idle state, the target processing threads corresponding to the estimated number of threads are obtained, and the target processing threads are used to process the segmentation processing data within the target idle time to obtain the data processing results;
  • the task status of each segmentation processing data in the data processing queue is updated.
  • a batch data processing device including:
  • the data processing queue creation module is used to select target batch tasks from the non-real-time task queue, and create a data processing queue based on the target batch tasks.
  • the data processing queue includes the data to be processed and the corresponding task status;
  • the target idle information determination module is used to determine the target idle information from the target idle time queue, and the target idle information includes the start time, the target idle time and the estimated number of threads;
  • the segmentation processing data determination module is used to obtain the number of data to be processed, obtain the target number based on the number of data to be processed, the target idle time and the estimated number of threads, select the target number of data to be processed from the data processing queue, and determine the selected data to be processed as segmentation processing data;
  • the system original load acquisition module is used to obtain the system original load when the current time of the system is the start time;
  • the data processing result acquisition module is used to determine that the system is in an idle state if the original load of the system is less than the busy load threshold, obtain the target processing threads corresponding to the estimated number of threads, and use the target processing threads to process the segmentation processing data within the target idle time to obtain the data processing results;
  • the task status update module is used to update the task status of each processing data in the data processing queue based on the data processing result.
  • a computer device includes a memory, a processor, and computer-readable instructions stored in the memory and capable of running on the processor, and the processor implements the following steps when the processor executes the computer-readable instructions:
  • the target similarity is less than the preset similarity, determining that the reference tracking target corresponding to the target similarity is a lost tracking target in the current image;
  • the reference tracking target is a missing tracking target in N consecutive images after the current image, the reference tracking target is released.
  • One or more readable storage media storing computer readable instructions
  • the computer readable storage medium storing computer readable instructions
  • the one or more processors perform the following steps:
  • the target similarity is less than the preset similarity, determining that the reference tracking target corresponding to the target similarity is a lost tracking target in the current image;
  • the reference tracking target is a missing tracking target in N consecutive images after the current image, the reference tracking target is released.
  • FIG. 1 is a schematic diagram of an application environment of a batch data processing method in an embodiment of the present application
  • FIG. 3 is another flowchart of a batch data processing method in an embodiment of the present application.
  • FIG. 5 is another flowchart of a batch data processing method in an embodiment of the present application.
  • FIG. 6 is another flowchart of a batch data processing method in an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a batch data processing device in an embodiment of the present application.
  • Fig. 8 is a schematic diagram of a computer device in an embodiment of the present application.
  • the batch data processing method provided in the embodiment of the present application can be applied to the application environment shown in FIG. 1.
  • the batch data processing method is applied to a batch data processing system.
  • the batch data processing system includes a client and a server as shown in FIG. 1.
  • The client and the server communicate through a network in order to segment the batch data accurately and use system idle time to complete batch data processing, which ensures the progress and efficiency of batch processing without affecting the response speed of real-time tasks.
  • The client, also called the user terminal, refers to the program that corresponds to the server and provides local services to the user.
  • the client can be installed on, but not limited to, various personal computers, laptops, smart phones, tablet computers, and portable wearable devices.
  • the server can be implemented as an independent server or a server cluster composed of multiple servers.
  • a method for processing batch data is provided.
  • the method is applied to the server in FIG. 1 as an example for description, and includes the following steps:
  • S201 Select a target batch task from the non-real-time task queue, and create a data processing queue based on the target batch task.
  • the data processing queue includes the data to be processed and the corresponding task status.
  • the non-real-time task queue is a queue for storing non-real-time tasks, that is, at least one non-real-time task is stored in the non-real-time task queue.
  • Non-real-time tasks are a concept opposite to real-time tasks. Among them, real-time tasks are strong real-time tasks that require immediate response, such as user login, user query, and other operations performed by the user.
  • Non-real-time tasks are tasks that allow delay and can be processed asynchronously, including tasks that need to process batch data such as system-level log upload server tasks, statistical report generation calculations, and data analysis.
  • the target batch task refers to the non-real-time task that needs to be processed currently selected from at least one non-real-time task in the non-real-time task queue according to a preset sequence.
  • The target batch task can be a non-real-time task that requires batch data processing.
  • the order can be determined by the first-in-first-out principle of the queue, or by the priority order of different tasks and the first-in-first-out principle.
  • the data processing queue is a list created based on any target batch task and used to record the task status of each pending data in the target batch task.
  • the data to be processed refers to the smallest unit of data that needs to be processed in the target batch task, which can be understood as a log or a report and other data.
  • the task status corresponding to each data to be processed is used to reflect the status of each data to be processed by the system.
  • the task status specifically includes the unprocessed status, the processing status, the processing success status, and the processing failure status.
  • the server can select the non-real-time task that needs to be processed first from the non-real-time task queue according to the order of the non-real-time task queue, and determine it as the target batch task.
  • the system sorts non-real-time tasks according to preset sorting rules, and prioritizes the data that needs to be processed first to determine the order of data in the non-real-time tasks.
  • The data processing queue shows each data to be processed and its corresponding task status. Understandably, when the data processing queue is created, the task status of every data to be processed in it is the unprocessed status. When the system processes the data to be processed, the task status is updated as processing proceeds.
  • S202 Determine target idle information from the target idle time queue, where the target idle information includes a start time, a target idle duration, and an estimated number of threads.
  • the target idle time queue refers to a queue that predicts the daily idle time of the system according to the time the system processes historical processing data.
  • Historical processing data refers to the information in the historical record that calls system resources to process historical real-time tasks.
  • the historical real-time task refers to the real-time task before the current time of the system.
  • Historical real-time tasks are strong real-time tasks that require immediate response, such as user login, user query, and other operations performed by the user.
  • If the server has remaining threads in addition to the threads processing real-time tasks, the system is considered to be in idle time.
  • the target idle time queue contains at least one piece of original idle information, and the original idle information refers to information corresponding to an idle time whose idle duration is greater than a preset duration threshold in each day.
  • Each original idle information corresponds to a start time, original idle duration and estimated number of threads.
  • The original idle duration refers to the difference between the start time and the end time in the corresponding original idle information. For example, if the server is idle from 7:00 to 7:15 every morning, the start time of the corresponding original idle information is 7:00 and the original idle duration is 15 minutes.
  • the estimated number of threads is the predicted number of threads that can handle non-real-time tasks during idle time.
  • the target idle information refers to the original idle information selected from the target idle time queue whose starting time is closest to the current time of the system.
  • The server analyzes the processing time of the system's historical real-time tasks with a big data modeling method and predicts the server's original idle time queue for each day. According to the current time of the system, the original idle information closest to the current time is selected from the pre-predicted target idle time queue as the target idle information. This quickly determines the target idle information in which the data to be processed can be handled and improves the efficiency of that determination, so that batch processing is performed in the idle time corresponding to the target idle information. Doing so speeds up the system's progress on batch data and ensures that the data to be processed is handled in the upcoming idle time. For example, suppose the predicted idle time queue contains three pieces of original idle information whose start times are 6:00, 7:00, and 8:00. If the current time of the system is 7:20, the target idle information is the original idle information whose start time is 8:00.
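  • The 7:20 example above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `IdleInfo` type, the function name, and the durations and thread counts are hypothetical, and "closest to the current time" is read as "earliest start time at or after the current time", which is the reading the example implies.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional, Sequence


@dataclass(frozen=True)
class IdleInfo:
    """One piece of idle information (hypothetical field names)."""
    start: time             # start time of the idle window
    idle_minutes: int       # idle duration in minutes
    estimated_threads: int  # threads predicted to be free for non-real-time work


def select_target_idle_info(queue: Sequence[IdleInfo], now: time) -> Optional[IdleInfo]:
    """Return the window with the earliest start at or after `now`, or None."""
    upcoming = [info for info in queue if info.start >= now]
    return min(upcoming, key=lambda info: info.start) if upcoming else None


# Durations and thread counts below are made-up illustration values.
idle_queue = [
    IdleInfo(time(6, 0), 15, 4),
    IdleInfo(time(7, 0), 15, 4),
    IdleInfo(time(8, 0), 30, 6),
]
target = select_target_idle_info(idle_queue, now=time(7, 20))
print(target.start)
```

  • Returning `None` when no later window exists leaves the caller free to wait for the next day's queue; the source does not say what happens in that case.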
  • S203 Obtain the number of data to be processed, obtain the target number based on the number of data to be processed, the target idle time and the estimated number of threads, select the target number of data to be processed from the data processing queue, and determine the selected data as segmentation processing data.
  • the target number refers to the number of data to be processed that can be processed by the thread corresponding to the estimated number of threads that the system calls within the target idle time period corresponding to the target idle information.
  • The segmentation processing data refers to the data to be processed that the system needs to process within the target idle time period corresponding to the target idle information.
  • the segmentation processing data is determined according to the actual data processing situation of the system, and the target number of segmentation processing data processed by the system each time is determined by the target idle time and the estimated number of threads.
  • The server segments the data to be processed in the data processing queue by time or by type according to a preset data segmentation rule, to obtain the number of batch data that can be processed in the target idle time corresponding to the target idle information (i.e. the target number). It then selects the target number of data to be processed from the data processing queue and determines it as segmentation processing data. This objectively allocates the amount of batch data that can be processed in the idle time corresponding to the target idle information, so that real-time tasks are processed normally and the target number of segmentation processing data can still be processed in the idle time.
  • Segmenting by time means selecting the target number of data to be processed according to the order in which the data needs to be processed; segmenting by type means selecting the target number of data to be processed according to its type (i.e. analysis type, log type, report type, etc.).
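  • The source does not give a formula for the target number. One plausible sketch, under the loudly stated assumption that an average per-item processing time is known, is shown below; `seconds_per_item` and both function names are hypothetical and are not taken from the application.

```python
from typing import List, Sequence


def target_number(pending_count: int, idle_seconds: float,
                  estimated_threads: int, seconds_per_item: float) -> int:
    """Estimate how many items fit in one idle window.

    Assumption (not from the source): every item takes about
    `seconds_per_item` seconds and threads work independently.
    """
    if seconds_per_item <= 0:
        raise ValueError("seconds_per_item must be positive")
    per_thread = int(idle_seconds // seconds_per_item)
    return min(pending_count, per_thread * estimated_threads)


def split_by_order(pending: Sequence[str], count: int) -> List[str]:
    """Time-based segmentation: take the first `count` items in queue order."""
    return list(pending[:count])


# A 15-minute window, 4 threads, 2 s per item: 450 items per thread.
n = target_number(pending_count=10_000, idle_seconds=15 * 60,
                  estimated_threads=4, seconds_per_item=2.0)
print(n)
```

  • Capping the result at `pending_count` keeps the segment from exceeding what is actually in the data processing queue.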
  • the original system load refers to the amount of system resources (real-time threads) occupied by the system when processing real-time tasks at the current time of the system. For example, at the start time of 8:00, if the system has the number of real-time threads N1 for processing real-time tasks and the number of reserved emergency threads N2, the system's original load is N1+N2. When the current time of the system is the starting time, the original load of the system is obtained to determine whether the system can process the segmentation processing data corresponding to non-real-time tasks in addition to real-time tasks.
  • The busy load threshold is a preset threshold used to evaluate whether the system is in a busy state.
  • The system is either in an idle state or in a busy state.
  • A busy load threshold is set on the system. Assuming that the maximum server load is M and the busy load threshold is M*50%, the system is considered idle if the load occupied by processing real-time tasks is below M*50%. Conversely, if the load occupied by processing real-time tasks is greater than or equal to M*50%, the system is deemed to be in a busy state.
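  • The load check described above reduces to a small comparison. The sketch below uses the N1+N2 definition of original load and the M*50% threshold from the text; the function names and the sample numbers are hypothetical.

```python
def original_load(realtime_threads: int, reserved_emergency_threads: int) -> int:
    """Original system load as described in the text: N1 + N2."""
    return realtime_threads + reserved_emergency_threads


def is_idle(load: float, max_load: float, busy_ratio: float = 0.5) -> bool:
    """Idle when the load is strictly below the busy load threshold M * ratio."""
    return load < max_load * busy_ratio


# Illustration values only: N1 = 30, N2 = 10, M = 100.
load = original_load(realtime_threads=30, reserved_emergency_threads=10)
print(is_idle(load, max_load=100))
```

  • A load exactly equal to the threshold counts as busy, matching the "greater than or equal to" wording.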
  • the target processing thread refers to a thread that processes the segmentation processing data, and the target processing thread is a thread dedicated to processing non-real-time tasks.
  • the data processing result refers to the result of the system processing the segmentation processing data.
  • When the current time of the system is the start time of the target idle information, the current load of the system is detected. If the current load is less than the busy load threshold, the system is considered to be in an idle state, and the target processing threads corresponding to the estimated number of threads pre-allocated by the system are called to process the segmentation processing data within the target idle time corresponding to the target idle information. Processing the segmentation processing data while the system is idle allocates system resources reasonably. It prevents batch data of non-real-time tasks from occupying too many system resources, which would lengthen the processing time and slow the processing of real-time tasks, and it prevents system resources from being wasted because the data to be processed for non-real-time tasks is not handled in time when the system is idle.
  • Communication loss between threads, and with it system performance loss, can be reduced. By assigning target processing threads dedicated to non-real-time tasks and using them to process the segmentation processing data, the target processing threads do not need to switch between real-time tasks and the data of non-real-time tasks. This reduces communication loss between threads, lowers the loss of system performance, and achieves rational use of system resources.
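  • A dedicated pool for non-real-time work can be sketched with Python's standard `concurrent.futures.ThreadPoolExecutor`. This is an illustration under assumptions: the `handler` callable, the `(data_id, payload)` item shape, and the function name are hypothetical, and the sketch does not enforce the end of the idle window.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple


def process_segment(segment: Iterable[Tuple[str, int]],
                    handler: Callable[[int], None],
                    estimated_threads: int) -> Dict[str, bool]:
    """Run `handler` on each item in a pool reserved for batch work.

    Returns a mapping from data id to True (success) or False (failure).
    """
    results: Dict[str, bool] = {}
    with ThreadPoolExecutor(max_workers=estimated_threads,
                            thread_name_prefix="batch") as pool:
        futures = {data_id: pool.submit(handler, payload)
                   for data_id, payload in segment}
        for data_id, future in futures.items():
            try:
                future.result()          # re-raises any handler exception
                results[data_id] = True
            except Exception:
                results[data_id] = False
    return results


def demo_handler(value: int) -> None:
    """Hypothetical handler: negative payloads simulate a processing failure."""
    if value < 0:
        raise ValueError("cannot process negative payload")


outcome = process_segment([("log-1", 5), ("log-2", -1)], demo_handler, estimated_threads=2)
print(outcome)
```

  • Because the pool is created only for batch work, its threads never pick up real-time requests, which is the separation the paragraph above describes.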
  • the processed data needs to be processed.
  • the task status is updated.
  • If processing fails, the task status is updated to the processing failure status, and the system separates out the corresponding segmentation processing data according to that status, so that data to be processed whose task status is the processing failure status can later be reviewed or checked to determine the reason for the failure. If processing succeeds, the task status is updated to the processing success status, the system removes the corresponding segmentation processing data from the data processing queue, and no further processing is performed.
  • The data processing result of each piece of data to be processed is synchronized to the system, and the system records it and formulates the next execution rule, ensuring that each piece of data to be processed is processed successfully.
  • The system also presets a failure count threshold. If the processing failure count of any data to be processed reaches the failure count threshold, the corresponding reminder mechanism is triggered, and the data to be processed is sent to the audit terminal together with the log information recorded during processing, so that the audit staff of the audit terminal can deal with it accordingly.
  • the threshold of the number of failures is preset to limit the number of times that each data to be processed can be processed repeatedly. For example, the threshold of the number of failures can be set to three.
  • When the failure count threshold is reached, the system automatically sends a reminder email to the audit terminal so that the auditor can view the data to be processed, thereby monitoring the processing of the data to be processed.
  • the task state of all the data to be processed in the data processing queue is initialized to the unprocessed state.
  • the task status of the target amount of split processing data in the data processing queue is updated to the processing state.
  • The data processing result is either processing success or processing failure.
  • Based on the data processing result, the task status corresponding to the segmentation processing data can be updated to the processing success status or the processing failure status, so that the task status of the data to be processed in the data processing queue is updated in real time.
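  • The four task statuses and the failure count threshold can be modelled as a small state update. The sketch below is hypothetical in its names (`TaskStatus`, `QueueEntry`, `apply_result`) and uses the threshold of three from the example in the text; sending the reminder email is left to the caller.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class TaskStatus(Enum):
    UNPROCESSED = "unprocessed"
    PROCESSING = "processing"
    SUCCESS = "success"
    FAILED = "failed"


FAILURE_THRESHOLD = 3  # example value given in the text


@dataclass
class QueueEntry:
    data_id: str
    status: TaskStatus = TaskStatus.UNPROCESSED
    failures: int = 0


def apply_result(queue: Dict[str, QueueEntry], data_id: str, succeeded: bool) -> bool:
    """Update one entry from a processing result.

    Returns True when the failure count reaches the threshold, meaning the
    reminder mechanism should be triggered for this entry.
    """
    entry = queue[data_id]
    if succeeded:
        entry.status = TaskStatus.SUCCESS
        del queue[data_id]  # successful data leaves the data processing queue
        return False
    entry.status = TaskStatus.FAILED
    entry.failures += 1
    return entry.failures >= FAILURE_THRESHOLD


queue = {"log-1": QueueEntry("log-1"), "log-2": QueueEntry("log-2")}
apply_result(queue, "log-2", succeeded=True)
alerts = [apply_result(queue, "log-1", succeeded=False) for _ in range(3)]
print(alerts, list(queue))
```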
  • When the current time of the system is the start time of the target idle information, the current load of the system is detected. If the current load is not less than the busy load threshold, the system is determined to be in a busy state. To avoid affecting the speed at which the system processes real-time tasks, the target idle information must be re-determined: the step of determining target idle information from the target idle time queue is repeated, and the next original idle information closest to the current time of the system becomes the target idle information. This allocates system resources reasonably and protects the processing speed of real-time tasks.
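  • The fallback to the next idle window can be sketched as a loop over start times. In a running system the load would be measured when each start time arrives; the sketch below substitutes a lookup function `load_at` for that measurement, and all names and numbers are hypothetical.

```python
from typing import Callable, Optional, Sequence


def first_usable_window(start_times: Sequence[str],
                        load_at: Callable[[str], float],
                        busy_threshold: float) -> Optional[str]:
    """Walk the windows in time order and return the first one whose
    measured load is below the busy load threshold, or None.

    Start times are zero-padded "HH:MM" strings so that plain string
    sorting matches chronological order.
    """
    for start in sorted(start_times):
        if load_at(start) < busy_threshold:
            return start
    return None


# Simulated measurements: busy at 08:00, idle at 09:30.
measured = {"08:00": 70.0, "09:30": 20.0}
chosen = first_usable_window(["09:30", "08:00"], measured.__getitem__, busy_threshold=50.0)
print(chosen)
```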
  • a target batch task is selected from a non-real-time task queue, and a data processing queue is created based on the target batch task to reasonably allocate the processing sequence of data in the non-real-time task.
  • the target idle information is determined from the target idle time queue, which improves the efficiency of determining target idle information and speeds up the system to process batch data.
  • According to the target idle time and the estimated number of threads, the segmentation processing data corresponding to the target number is determined, which objectively allocates the amount of data that can be processed while the system is idle and reduces the amount of data handled in a single pass. On the premise that real-time tasks are processed normally, the segmentation processing data can be processed in the idle time, so that the segmentation processing data for each idle period is allocated reasonably according to system resources.
  • When the current time of the system is the start time and the original load of the system is less than the busy load threshold, the target processing threads corresponding to the estimated number of threads are obtained and used to process the segmentation processing data within the target idle time. This allocates system resources reasonably, reduces communication loss between threads so that the loss of system performance is small, and yields the data processing results. Based on the data processing result, the task status of each segmentation processing data in the data processing queue is updated to ensure that all data is processed successfully. If the original load of the system is not less than the busy load threshold, the system is determined to be in a busy state, and the target idle information is determined from the target idle time queue again to ensure that the data to be processed is eventually processed.
  • Before step S201, that is, before selecting a target batch task from the non-real-time task queue and creating a data processing queue based on the target batch task, the batch data processing method further includes:
  • S301 Obtain a task processing request, where the task processing request includes a task to be processed and a task identifier corresponding to the task to be processed.
  • the task processing request refers to a request for processing all unprocessed tasks in the system.
  • the tasks to be processed include real-time tasks and non-real-time tasks.
  • Task identification refers to the identification of each task to be processed in advance, including real-time identification and non-real-time identification.
  • Real-time identification and non-real-time identification are designated by the system administrator. Taking the simplest numeric identification as an example, 1 represents the real-time identification and 2-9 represent non-real-time identifications.
  • the real-time identification refers to the identification that indicates that the task to be processed is a real-time task, that is, if the task identification in a task processing request is a real-time identification, the corresponding task to be processed is a real-time task.
  • The non-real-time identification refers to an identification indicating that the task to be processed is a non-real-time task; that is, if the task identification in a task processing request is a non-real-time identification, the corresponding task to be processed is a non-real-time task. Understandably, real-time identification and non-real-time identification are relative. Tasks to be processed with a real-time identification are real-time tasks and require immediate processing and response; tasks to be processed with a non-real-time identification are non-real-time tasks and can be processed when the system is idle. This arranges the processing time of pending tasks reasonably and protects the processing speed of real-time tasks.
  • the system when the system obtains the task processing request, it queries the task to be processed in the database, and each task to be processed corresponds to a task ID, and the task to be processed is processed correspondingly according to the carried task ID.
  • the task to be processed may be login, password retrieval, log processing, and report processing.
  • The system marks tasks that require real-time processing, such as login and password retrieval, with the real-time identification, and marks tasks that can be processed with a delay, such as log processing and report processing, with non-real-time identifications.
  • If the task identifier obtained by the server is a real-time identifier, the task to be processed is a real-time task and needs to be processed immediately. System resources are then called to process it, so that tasks carrying the real-time identifier receive an immediate response and real-time tasks are processed efficiently.
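  • Using the numeric scheme from the example (1 for real-time, 2-9 for non-real-time), classification by task identifier can be sketched as below. The constant and function names are hypothetical, and rejecting identifiers outside 1-9 is an assumption; the source does not say how unknown identifiers are handled.

```python
REAL_TIME_ID = 1                  # from the example in the text
NON_REAL_TIME_IDS = range(2, 10)  # 2 through 9 inclusive


def is_real_time(task_id: int) -> bool:
    """Classify a task by its identifier; raise on identifiers outside 1-9."""
    if task_id == REAL_TIME_ID:
        return True
    if task_id in NON_REAL_TIME_IDS:
        return False
    raise ValueError(f"unknown task identifier: {task_id}")


# Hypothetical tasks: login is real-time, report generation is not.
for name, task_id in [("login", 1), ("report", 4)]:
    route = "execute immediately" if is_real_time(task_id) else "enqueue for idle time"
    print(name, "->", route)
```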
  • the task type refers to the type of data processing corresponding to the task to be processed.
  • the types of tasks to be processed may be log processing types, report processing types, and analysis processing types.
  • Task priority refers to a parameter that determines the order of processing for each task to be processed when the system processes multiple tasks to be processed. For example, 2-9 represents non-real-time identification, and the smaller the number in 2-9, the higher the priority, and the more priority processing is required.
  • the task priority of the task to be processed is determined by the processing type, and the task to be processed is stored in the non-real-time task queue according to the order of priority from low to high.
  • The server determines the task priority from the task type of the task to be processed and stores tasks with higher priority nearer the front of the non-real-time task queue. Ordering the queue by the task priority of each task type ensures that higher-priority tasks are processed first and that the processing order of non-real-time tasks is arranged reasonably.
  • The processing order between different task types is determined by the task priority corresponding to the task type, and tasks to be processed of the same task type are ordered by the first-in-first-out principle of the queue.
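  • Priority ordering with first-in-first-out among equal priorities can be sketched with the standard `heapq` module and an insertion counter as tie-breaker. The class name and the sample priorities are hypothetical; smaller numbers mean higher priority, as in the 2-9 example.

```python
import heapq
import itertools
from typing import List, Tuple


class NonRealTimeQueue:
    """Priority queue: lower number is served first, FIFO within a priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = itertools.count()  # insertion order breaks ties

    def put(self, priority: int, task: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), task))

    def get(self) -> str:
        _, _, task = heapq.heappop(self._heap)
        return task

    def __len__(self) -> int:
        return len(self._heap)


q = NonRealTimeQueue()
q.put(3, "report-A")
q.put(2, "log-A")
q.put(3, "report-B")
q.put(2, "log-B")
order = [q.get() for _ in range(len(q))]
print(order)
```

  • The counter guarantees that two tasks of the same type leave the queue in arrival order, which a bare heap of `(priority, task)` pairs would not.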
  • the server obtains the task to be processed and the task identifier corresponding to the task to be processed to reasonably arrange the processing time of the task to be processed and ensure the processing speed of the real-time task.
  • If the task identifier is a real-time identifier, the task to be processed is executed immediately to guarantee the processing speed and response time of tasks carrying the real-time identifier.
  • If the task identifier is a non-real-time identifier, the task type in the task processing request is obtained to determine the task priority of the task to be processed. The task to be processed is stored in the non-real-time task queue in order of task priority, which arranges the processing order of non-real-time tasks reasonably and ensures that tasks with the highest priority are processed first.
  • Before step S202, that is, before determining target idle information from the target idle time queue, the batch data processing method further includes:
  • historical processing data refers to the information in the historical record that calls system resources to process historical real-time tasks.
  • the historical real-time task refers to the real-time task before the current time of the system.
  • the historical processing time refers to the time interval formed between the start time and the end time of the server processing historical real-time tasks in the historical record.
  • the number of historical threads refers to the number of threads called when processing historical real-time tasks in historical records.
  • The server obtains a large amount of historical processing data in order to analyze it, find the objective patterns it contains, and determine the daily state of system resources, so that the time and resources the system spends on real-time and non-real-time tasks can subsequently be allocated reasonably.
  • S402 Perform big data modeling on historical processing time, historical processing number, and historical thread number based on machine learning algorithms, and obtain an original idle time queue.
  • The original idle time queue includes at least one piece of original idle information, and each piece of original idle information includes a start time, an original idle duration and an estimated number of threads.
  • the original idle time queue refers to a queue that predicts the daily idle time of the system according to the time the system processes historical processing data.
  • the original idle information refers to the information corresponding to the idle time in which the idle time is greater than the preset time threshold in each day.
  • historical processing time, historical processing quantity, and historical thread number are information about the system periodically processing data.
  • Machine learning algorithms are used to perform big data modeling on historical processing time, historical processing quantity, and historical thread number to determine the system's historical processing threads and idle state at any time of the day. From this, the start time, the idle time, and the number of threads needed for real-time processing in a given idle interval are obtained, and the estimated number of threads available for non-real-time tasks during the idle time is calculated. Using machine learning algorithms makes the obtained original idle time queue objective and accurate.
  • the system divides historical processing data into weekly cycles and uses machine learning algorithms to analyze the historical idle information for each day of the week (Monday through Sunday), so that a regular original idle time queue is obtained quickly.
  • the machine learning algorithm includes but is not limited to logistic regression algorithm and LSTM neural network algorithm.
  • the first duration threshold is a preset threshold used to evaluate whether the idle duration of the system reaches the duration identified as idle time.
  • setting the first duration threshold excludes original idle information whose original idle time is short, ensuring that each original idle time stored in the target idle time queue is long enough to process a larger amount of the to-be-processed data corresponding to non-real-time tasks. This helps to reduce the communication overhead between subsequent threads and achieves a reasonable allocation of system resources.
  • the corresponding original idle information is stored in the target idle time queue, that is, the pieces of original idle information whose original idle time is greater than the first duration threshold are assembled to construct the target idle time queue.
  • big data modeling is performed on the historical processing time, the historical processing number, and the historical thread number based on a machine learning algorithm, which makes the obtained original idle time queue objective.
  • if the original idle time is greater than the first duration threshold, the original idle information is determined as the target idle information, ensuring that each original idle time stored in the target idle time queue is long enough to process more of the to-be-processed data corresponding to non-real-time tasks, which helps to reduce the communication overhead between subsequent threads and achieves a rational allocation of system resources.
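The threshold filter that turns the original idle time queue into the target idle time queue can be sketched as follows. The `IdleInfo` type, its field names, and the sample values are assumptions made for illustration; the prediction step that produces the original queue is not modeled here.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class IdleInfo:
    start_time: time          # when the predicted idle period begins
    idle_seconds: int         # predicted length of the idle period
    estimated_threads: int    # threads expected to be free for batch work


def build_target_idle_queue(original_queue, first_duration_threshold):
    """Keep only entries whose idle time is greater than the threshold."""
    return [info for info in original_queue
            if info.idle_seconds > first_duration_threshold]


# Hypothetical predictions for one day.
original_idle_queue = [
    IdleInfo(time(2, 0), 3600, 8),
    IdleInfo(time(8, 0), 20, 2),       # too short to be worth scheduling
    IdleInfo(time(12, 30), 1800, 4),
]
target_idle_queue = build_target_idle_queue(original_idle_queue,
                                            first_duration_threshold=30)
print([info.start_time.isoformat() for info in target_idle_queue])
```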
  • in step S203, acquiring the target number based on the number of data to be processed, the target idle time and the estimated number of threads, selecting the data to be processed corresponding to the target number in the data processing queue, and determining it as segmentation processing data includes:
  • S501 Use an estimated time calculation formula to calculate the number of data to be processed and the estimated number of threads, and obtain an estimated processing time corresponding to the data processing queue.
  • the estimated time calculation formula refers to a formula used to calculate the time required for the system to process the data to be processed in the data processing queue.
  • Estimated processing time refers to the time required for the system to process all the data to be processed in the data processing queue.
  • in the estimated time calculation formula, T1 is the estimated processing time based on the data processing queue, S is the number of data to be processed, N_p is the estimated number of threads that can be used for processing in the idle time, and x is the data processing volume of each thread per unit time. With the estimated time calculation formula, the estimated time for the system to process all data to be processed in the data processing queue can be calculated quickly.
  • S502 Use the target quantity calculation formula to calculate the estimated processing time and the target idle time to obtain the target quantity.
  • the target quantity calculation formula is a formula used to calculate the quantity of data to be processed by the system within the target idle time.
  • in the target quantity calculation formula, X is the target number, S is the number of data to be processed, T1 is the estimated processing time based on the data processing queue, and T2 is the target idle time. With the target quantity calculation formula, the number of data the system can process in the target idle time can be obtained quickly. For example, suppose a data processing queue holds 1000 pieces of data to be processed, and the estimated processing time for these 1000 pieces is the estimated processing time T1.
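A form of the two formulas that is consistent with the variable definitions above (an assumption, since the formulas themselves are not shown here) is:

```latex
T_1 = \frac{S}{N_p \cdot x}, \qquad X = \frac{S \cdot T_2}{T_1} = N_p \cdot x \cdot T_2
```

Under this reading, and with purely hypothetical values N_p = 4 threads and x = 5 pieces of data per thread per second, a queue of S = 1000 gives T1 = 1000 / (4 × 5) = 50 seconds; a target idle time of T2 = 30 seconds then gives X = 1000 × 30 / 50 = 600 pieces of data.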
  • S503 According to a preset filtering rule, select the data to be processed corresponding to the target quantity in the data processing queue, and determine it as segmentation processing data.
  • the preset filtering rules refer to the preset rules for selecting the data to be processed.
  • the preset filtering rules usually select the data to be processed in order of task priority from high to low, so that data is processed in a defined order and the data with the highest task priority is processed first. Understandably, when task priorities are equal, the queue's first-in-first-out principle determines the selection order, and the corresponding segmentation processing data is obtained accordingly.
  • the server selects the target number of pieces of data to be processed from the data processing queue according to the preset filtering rules (that is, in order of task priority from high to low), so that the data with the highest task priority is selected first; the selected data is determined as segmentation processing data and fed back to the system for processing.
  • the estimated time calculation formula is applied to the number of data to be processed and the estimated number of threads to obtain the estimated processing time, and the target quantity calculation formula is then applied to the estimated processing time and the target idle time to obtain the target quantity.
  • with the estimated time calculation formula and the target quantity calculation formula, the target quantity can be obtained quickly and objectively, which ensures that the data is subsequently segmented and processed accurately.
  • the data to be processed corresponding to the target quantity is selected according to the preset filtering rules, and determined as the segmentation processing data, so as to realize the reasonable distribution of the data to be processed.
  • preset filtering rules are set in advance to determine the task priority of the batch data.
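Steps S501 to S503 can be sketched as three small functions. The formula forms used here (T1 = S / (N_p · x) and X = S · T2 / T1) are assumptions inferred from the variable definitions, the cap of X at S is an added safeguard, and the function names, the dictionary layout, and the convention that a lower number means a higher priority are all hypothetical.

```python
import math


def estimated_processing_time(pending_count, estimated_threads, per_thread_rate):
    """T1: time to process the whole queue (assumed form S / (N_p * x))."""
    return pending_count / (estimated_threads * per_thread_rate)


def target_quantity(pending_count, estimated_time, target_idle_time):
    """X: items that fit in the idle window (assumed form S * T2 / T1), capped at S."""
    fits = math.floor(pending_count * target_idle_time / estimated_time)
    return min(pending_count, fits)


def select_segment(data_queue, quantity):
    """Pick `quantity` items, highest priority first, FIFO within a priority."""
    ranked = sorted(enumerate(data_queue),
                    key=lambda pair: (pair[1]["priority"], pair[0]))
    return [item for _, item in ranked[:quantity]]


# Hypothetical queue: 1000 items, priorities 0 (highest) to 2.
data_queue = [{"id": i, "priority": i % 3} for i in range(1000)]

t1 = estimated_processing_time(len(data_queue), estimated_threads=4, per_thread_rate=5)
x = target_quantity(len(data_queue), t1, target_idle_time=30)
segment = select_segment(data_queue, x)
print(t1, x, len(segment))
```

Sorting on the pair (priority, arrival index) keeps the first-in-first-out order among items that share a priority.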
  • the batch data processing method further includes:
  • the server monitors the task status of data segmentation processing in the data processing queue in real time.
  • the current time of the system is obtained. If the current time of the system falls within the idle period that begins at the start time, the remaining time is determined based on the current system time and the idle time; that is, the remaining time is the difference between the end of the idle period and the current time of the system. When the remaining time is long enough, batch data processing continues so that the remaining idle time is fully used. For example, if the start time is 8:00, the idle time is 30 minutes, and the current system time is 8:20, the remaining time is 10 minutes.
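The remaining-time calculation in the example above can be checked with a short sketch. The function name and the calendar date are arbitrary choices for illustration; returning zero outside the idle period is an assumption.

```python
from datetime import datetime, timedelta


def remaining_duration(start_time, idle_duration, current_time):
    """Time left in the idle period, or zero if the current time is outside it."""
    idle_end = start_time + idle_duration
    if not (start_time <= current_time <= idle_end):
        return timedelta(0)
    return idle_end - current_time


start = datetime(2019, 5, 16, 8, 0)            # start time 8:00
now = datetime(2019, 5, 16, 8, 20)             # current system time 8:20
remaining = remaining_duration(start, timedelta(minutes=30), now)
print(remaining)                               # 0:10:00
```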
  • S602 If the remaining duration is greater than the second duration threshold, update the remaining duration to the target idle duration, and determine the corresponding processable data amount based on the updated target idle duration and the estimated number of threads.
  • the second duration threshold is a preset threshold for judging whether the remaining duration is long enough.
  • the second duration threshold may be the same as or different from the first duration threshold, and may be set to 30s or other values.
  • the server can determine the corresponding processable data volume by using the processable data volume calculation formula, in which K is the processable data volume corresponding to the remaining time, N_p is the estimated number of threads that can be used for processing in the idle time, x is the data processing volume of each thread per unit time, and T3 is the idle time.
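A form of the processable data volume formula that is consistent with these definitions (an assumption, since the formula itself is not shown here) is:

```latex
K = N_p \cdot x \cdot T_3
```

For instance, with hypothetical values N_p = 4 threads, x = 5 pieces of data per thread per second, and T3 = 600 seconds, this gives K = 4 × 5 × 600 = 12000 pieces of data.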
  • the server updates the remaining time to the target idle time and then continues to process the pending data according to the updated target idle time, so that real-time tasks are not affected while further pending data is processed in batches, which speeds up the processing of the data to be processed.
  • the server needs to first determine whether the remaining time is greater than the second time threshold.
  • if the remaining time is greater than the second duration threshold, the remaining time corresponding to the target idle information is still long, and the system can be fully used to continue processing batch tasks without becoming too busy. Therefore, the corresponding processable data volume can be determined based on the remaining time, so that the to-be-processed data corresponding to the processable data volume can be processed within the remaining time, improving the processing efficiency of batch data.
  • S603 Select the to-be-processed data corresponding to the processable data amount in the data processing queue, update it to segmentation processing data, obtain the target processing thread corresponding to the estimated number of threads, and repeat the step of using the target processing thread to perform data processing on the segmentation processing data within the target idle time to obtain the data processing result.
  • the server selects the to-be-processed data corresponding to the processable data amount determined from the updated target idle time and the estimated number of threads, and updates it to segmentation processing data, which improves the processing efficiency of batch data and speeds up the processing of the data to be processed in the target batch task. The target processing thread is then used to perform data processing on the segmentation processing data to obtain the data processing result.
  • the server obtains the remaining time; when the remaining time is greater than the second duration threshold, the remaining time is updated to the target idle time, and the corresponding processable data volume is determined based on the updated target idle time and the estimated number of threads, which improves the processing efficiency of batch data. The to-be-processed data corresponding to the processable data amount is selected in the data processing queue and updated to segmentation processing data, and the target processing thread corresponding to the estimated number of threads is obtained, so that the system speeds up batch data processing without affecting the processing of real-time tasks.
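Steps S602 and S603 can be sketched as follows. The product form K = N_p · x · T3 is an assumption inferred from the variable definitions, and the function names, parameter names, and sample values are hypothetical.

```python
def processable_amount(estimated_threads, per_thread_rate, idle_seconds):
    """K: how many items fit in the remaining time (assumed form N_p * x * T3)."""
    return int(estimated_threads * per_thread_rate * idle_seconds)


def next_segment(pending, remaining_seconds, second_duration_threshold,
                 estimated_threads, per_thread_rate):
    """Return (segment, still_pending) for the remaining idle time.

    If the remaining time is not greater than the threshold, nothing is
    selected and the pending list is returned unchanged.
    """
    if remaining_seconds <= second_duration_threshold:
        return [], pending
    amount = processable_amount(estimated_threads, per_thread_rate, remaining_seconds)
    return pending[:amount], pending[amount:]


pending = list(range(15000))        # hypothetical backlog of item ids

# 10 minutes left, 30-second threshold: enough time to continue.
segment, rest = next_segment(pending, 600, 30, estimated_threads=4, per_thread_rate=5)
print(len(segment), len(rest))

# 20 seconds left: below the threshold, so no new segment is selected.
short_segment, short_rest = next_segment(pending, 20, 30, 4, 5)
print(len(short_segment), len(short_rest))
```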
  • the batch data processing method further includes: monitoring the current load of the system in real time during data processing; if the current load of the system is greater than the burst load threshold, releasing the target processing thread, stopping the data processing of the segmentation processing data, and updating the task status of the segmentation processing data to the stopped state.
  • the burst load threshold is a preset threshold used to evaluate whether the system receives a large burst of load. Generally speaking, the burst load threshold is greater than the average busy load.
  • if the server's current load is greater than the burst load threshold, the system is currently receiving a large number of task processing requests carrying real-time identifiers, which makes the current load of the system too heavy.
  • the processing of the target batch task needs to be suspended to release the target processing thread occupied by the target batch task processing, that is, the system will actively reduce the number of target processing threads in the batch processing to give priority to the data processing corresponding to the real-time task.
  • when the current load of the system is greater than the burst load threshold, the target processing thread is released and the data processing of the segmentation processing data is stopped, with its task status updated to the stopped state; understandably, in the next idle time, the server preferentially processes the segmentation processing data in the data processing queue whose task status is the stopped state, to ensure the efficiency of batch data processing.
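The burst-load check can be illustrated with a deliberately simplified, single-threaded sketch: the load is read before each item, and once it exceeds the threshold the unfinished items are marked as stopped. The `Item` type, the status strings, the `read_load` callback, and the simulated load values are assumptions; a real system would release worker threads at the point where this sketch returns.

```python
from dataclasses import dataclass


@dataclass
class Item:
    payload: int
    status: str = "pending"


def process_segment(segment, read_load, burst_load_threshold, handle):
    """Process items in order; stop and mark the rest when the load spikes.

    Returns True if the whole segment was processed, False if it was stopped.
    """
    for index, item in enumerate(segment):
        if read_load() > burst_load_threshold:
            for unfinished in segment[index:]:
                unfinished.status = "stopped"
            return False
        handle(item.payload)
        item.status = "completed"
    return True


# Simulated load readings: the third reading exceeds the threshold.
loads = iter([0.2, 0.3, 0.95, 0.1])
segment = [Item(n) for n in range(4)]
handled = []

finished = process_segment(segment, lambda: next(loads), 0.9, handled.append)
statuses = [item.status for item in segment]
print(finished, statuses)

# In the next idle window, stopped items would be picked up first.
resume_first = [item.payload for item in segment if item.status == "stopped"]
print(resume_first)
```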
  • a batch data processing device is provided, and the batch data processing device corresponds to the batch data processing method in the foregoing embodiment one-to-one.
  • the batch data processing device includes a data processing queue creation module 701, a target idle information determination module 702, a segmentation processing data determination module 703, a system original load acquisition module 704, a data processing result acquisition module 705, and a task status update module 706.
  • the detailed description of each functional module is as follows:
  • the data processing queue creation module 701 is used to select a target batch task from a non-real-time task queue, and create a data processing queue based on the target batch task.
  • the data processing queue includes the data to be processed and the corresponding task status.
  • the target idle information determining module 702 is configured to determine target idle information from the target idle time queue, and the target idle information includes a start time, a target idle time length, and an estimated number of threads.
  • the segmentation processing data determination module 703 is used to obtain the number of data to be processed corresponding to the data to be processed, obtain the target number based on the number of data to be processed, the target idle time and the estimated number of threads, select the to-be-processed data corresponding to the target number in the data processing queue, and determine it as segmentation processing data.
  • the system original load obtaining module 704 is configured to obtain the original system load when the current time of the system is the start time.
  • the data processing result acquisition module 705 is used to determine that the system is in an idle state if the original system load is less than the busy load threshold, acquire the target processing thread corresponding to the estimated number of threads, and use the target processing thread to split within the target idle time Process data for data processing, and obtain data processing results.
  • the task status update module 706 is used to update the task status of each piece of segmentation processing data in the data processing queue based on the data processing result.
  • the batch data processing device further includes: a system busy module.
  • the system busy module is used to determine that the system is in a busy state if the original load of the system is not less than the busy load threshold, and to repeat the step of determining the target idle information from the target idle time queue.
  • the batch data processing device further includes: a task processing request acquisition module, a real-time identification module, and a non-real-time identification module.
  • the task processing request acquiring module is used to acquire the task processing request, and the task processing request includes the task to be processed and the task identifier corresponding to the task to be processed.
  • the real-time identification module is used to execute the task to be processed if the task identification is a real-time identification.
  • the non-real-time identification module is used to obtain the task type in the task processing request if the task identification is a non-real-time identification, determine the task priority of the task to be processed based on the task type, and store the task to be processed in the non-real-time task queue in order of task priority.
  • the batch data processing device further includes: a historical processing data acquisition module, an original idle time queue acquisition module, and an original idle information storage module.
  • the historical processing data acquisition module is used to acquire historical processing data.
  • the historical processing data includes historical processing time, historical processing quantity, and historical thread number.
  • the original idle time queue acquisition module is used to perform big data modeling of historical processing time, historical processing quantity, and historical thread number based on machine learning algorithms to obtain the original idle time queue.
  • the original idle time queue includes at least one piece of original idle information.
  • the original idle information includes the start time, the original idle time, and the estimated number of threads.
  • the original idle information storage module is configured to store the original idle information in the target idle time queue if the original idle time is greater than the first duration threshold.
  • the segmentation processing data determination module 703 includes: an estimated processing time acquisition unit, a target quantity acquisition unit, and a preset screening rule unit.
  • the estimated processing time obtaining unit is used to calculate the number of data to be processed and the estimated number of threads using an estimated time calculation formula to obtain the estimated processing time corresponding to the data processing queue.
  • the target quantity obtaining unit is used to calculate the estimated processing time and the target idle time using the target quantity calculation formula to obtain the target quantity.
  • the preset screening rule unit is used to select the to-be-processed data corresponding to the target quantity in the data processing queue according to the preset screening rule, and determine it as segmentation processing data.
  • the batch data processing device further includes: a remaining time acquisition module, a processable data amount determination module, and a target processing thread acquisition module.
  • the remaining time obtaining module is used to obtain the remaining time based on the current time of the system, the start time and the target idle time if the task status of each piece of segmentation processing data in the data processing queue is updated to the completed state.
  • the processable data amount determination module is configured to update the remaining time to the target idle time if the remaining time is greater than the second time threshold, and determine the corresponding processable data amount based on the updated target idle time and the estimated number of threads.
  • the target processing thread acquisition module is used to select the to-be-processed data corresponding to the processable data amount in the data processing queue, update it to segmentation processing data, obtain the target processing thread corresponding to the estimated number of threads, and repeat the step of using the target processing thread to perform data processing on the segmentation processing data within the target idle time to obtain the data processing result.
  • the batch data processing device further includes: a real-time monitoring module.
  • the real-time monitoring module is used to monitor the current load of the system in real time during data processing; if the current load of the system is greater than the burst load threshold, the target processing thread is released, the data processing of the segmentation processing data is stopped, and the task status of the segmentation processing data is updated to the stopped state.
  • each module in the above-mentioned batch data processing device can be implemented in whole or in part by software, hardware, or a combination thereof.
  • the foregoing modules may be embedded in, or independent of, the processor in the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the foregoing modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 8.
  • the computer equipment includes a processor, a memory, a network interface and a database connected through a system bus. Among them, the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a readable storage medium and an internal memory.
  • the readable storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer readable instructions in the readable storage medium.
  • the database of the computer device is used to store the data used or generated in executing the above batch data processing method, such as target batch tasks.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions are executed by the processor to realize a batch data processing method.
  • the readable storage medium provided in this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium
  • a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and capable of running on the processor.
  • when the processor executes the computer-readable instructions, it implements the batch data processing method in the foregoing embodiments, such as S201-S207 shown in FIG. 2, or the steps shown in FIG. 3 to FIG. 6, which are not repeated here.
  • alternatively, when the processor executes the computer-readable instructions, it implements the functions of the modules/units in the embodiment of the batch data processing device, for example, the functions of the data processing queue creation module 701, the target idle information determination module 702, the segmentation processing data determination module 703, the system original load acquisition module 704, the data processing result acquisition module 705, and the task status update module 706 shown in the corresponding figure, which are not repeated here.
  • one or more readable storage media storing computer readable instructions are provided.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors execute the batch data processing method in the foregoing embodiments, for example, S201-S207 shown in FIG. 2, or the steps shown in FIG. 3 to FIG. 6, which are not repeated here.
  • the one or more processors realize the functions of each module/unit in the embodiment of the batch data processing apparatus when executed, for example, FIG.
  • the readable storage medium provided in this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Abstract

Disclosed by the present application are a batch data processing method and apparatus, a computer device and a storage medium, the method comprising: selecting a target batch task from a non-real-time task queue, and creating a data processing queue on the basis of the target batch task; determining target idle information from within a target idle time queue, and acquiring the quantity of data to be processed corresponding to the data to be processed; acquiring a target quantity on the basis of the quantity of data to be processed, the target idle duration and an estimated number of threads; selecting, from within the data processing queue, data to be processed that corresponds to the target quantity, and determining said data to be segmentation processing data; when the original load of a system is less than a busy load threshold, performing data processing on the segmentation processing data by using target processing threads during the target idle duration, and acquiring a data processing result; and updating the task status of each piece of segmentation processing data in the data processing queue on the basis of the data processing result. The described method can rationally allocate system resources and ensure the efficiency of batch data processing.

Description

Batch data processing method, device, computer equipment and storage medium
This application is based on, and claims priority to, the Chinese invention application No. 201910405149.0, filed on May 16, 2019 and titled "Batch data processing method, device, computer equipment and storage medium".
Technical field
This application relates to the field of big data technology, and in particular to a batch data processing method, device, computer equipment and storage medium.
Background
With the development of big data technology, many fields use it to process their data. However, as business grows and data accumulates over time, the amount of data in a database can reach hundreds of millions of records. Processing that data directly in batches occupies too many system resources and reduces data processing efficiency. For example, batch processing of database data requires considerable system resources and a long processing time because of the amount of data involved. If the batch processing is allocated too many system resources, it takes resources away from real-time tasks and slows the response of tasks that must respond in real time, so users wait longer. If it is allocated too few system resources, the data processing of non-real-time tasks falls behind and the corresponding data backs up.
Summary of the invention
The embodiments of this application provide a batch data processing method, device, computer equipment and storage medium, to solve the problems that arise when system resources are allocated unreasonably during batch data processing.
A batch data processing method, including:
selecting a target batch task from a non-real-time task queue, and creating a data processing queue based on the target batch task, the data processing queue including data to be processed and the corresponding task status;
determining target idle information from a target idle time queue, the target idle information including a start time, a target idle time, and an estimated number of threads;
obtaining the number of data to be processed corresponding to the data to be processed, obtaining a target number based on the number of data to be processed, the target idle time and the estimated number of threads, selecting the data to be processed corresponding to the target number from the data processing queue, and determining it as segmentation processing data;
obtaining the original system load when the current time of the system is the start time;
if the original system load is less than a busy load threshold, determining that the system is in an idle state, obtaining target processing threads corresponding to the estimated number of threads, and using the target processing threads to perform data processing on the segmentation processing data within the target idle time to obtain a data processing result;
updating the task status of each piece of segmentation processing data in the data processing queue based on the data processing result.
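The steps above can be sketched end to end for a single idle window. This is a minimal illustration under stated assumptions, not the claimed implementation: the window's capacity is taken as threads × per-thread rate × idle seconds, the dictionary keys, status strings, and `handle` callback are hypothetical, and a standard-library thread pool stands in for the target processing threads.

```python
from concurrent.futures import ThreadPoolExecutor


def run_idle_window(data_queue, idle_info, system_load, busy_load_threshold,
                    per_thread_rate, handle):
    """One pass of the method for a single idle window.

    data_queue: list of dicts with "payload" and "status" keys.
    idle_info:  dict with "idle_seconds" and "estimated_threads".
    Returns the number of items processed, or 0 if the system is busy.
    """
    if system_load >= busy_load_threshold:
        return 0  # busy: the caller would move on to the next idle window

    # Target number: how many pending items fit in this window (assumed form).
    pending = [item for item in data_queue if item["status"] == "pending"]
    capacity = int(idle_info["estimated_threads"] * per_thread_rate
                   * idle_info["idle_seconds"])
    segment = pending[:min(len(pending), capacity)]

    # Process the segment with as many threads as the estimate allows.
    with ThreadPoolExecutor(max_workers=idle_info["estimated_threads"]) as pool:
        results = list(pool.map(lambda item: handle(item["payload"]), segment))

    # Update the task status of each processed item.
    for item, result in zip(segment, results):
        item["result"] = result
        item["status"] = "completed"
    return len(segment)


data_queue = [{"payload": n, "status": "pending"} for n in range(10)]
idle_info = {"idle_seconds": 2, "estimated_threads": 2}

processed = run_idle_window(data_queue, idle_info, system_load=0.3,
                            busy_load_threshold=0.7, per_thread_rate=2,
                            handle=lambda n: n * n)
print(processed, [item["status"] for item in data_queue])
```

Items that do not fit in the window keep the pending status and are picked up in a later window.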
A batch data processing device, including:
a data processing queue creation module, used to select a target batch task from a non-real-time task queue and create a data processing queue based on the target batch task, the data processing queue including data to be processed and the corresponding task status;
a target idle information determination module, used to determine target idle information from a target idle time queue, the target idle information including a start time, a target idle time, and an estimated number of threads;
a segmentation processing data determination module, used to obtain the number of data to be processed corresponding to the data to be processed, obtain a target number based on the number of data to be processed, the target idle time and the estimated number of threads, select the data to be processed corresponding to the target number from the data processing queue, and determine it as segmentation processing data;
a system original load acquisition module, used to obtain the original system load when the current time of the system is the start time;
a data processing result acquisition module, used to determine that the system is in an idle state if the original system load is less than a busy load threshold, obtain target processing threads corresponding to the estimated number of threads, and use the target processing threads to perform data processing on the segmentation processing data within the target idle time to obtain a data processing result;
a task status update module, used to update the task status of each piece of segmentation processing data in the data processing queue based on the data processing result.
A computer device, including a memory, a processor, and computer-readable instructions stored in the memory and capable of running on the processor, the processor implementing the following steps when executing the computer-readable instructions:
acquiring original video data, the original video data including at least two frames of images;
selecting a reference image from the original video data, performing target detection on the reference image, and acquiring at least one reference tracking target and a corresponding reference target feature vector;
performing target detection on the current image in the original video data, and acquiring at least one current tracking target and a corresponding current target feature vector;
calculating the feature similarity between any one of the reference target feature vectors and all current target feature vectors, to determine the target similarity corresponding to the reference target feature vector;
if the target similarity is less than a preset similarity, determining that the reference tracking target corresponding to the target similarity is a lost tracking target in the current image;
if the reference tracking target is a lost tracking target in each of the N consecutive frames of images after the current image, releasing the reference tracking target.
One or more readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
Acquiring original video data, the original video data including at least two frames of images;
Selecting a reference image from the original video data, performing target detection on the reference image, and acquiring at least one reference tracking target and a corresponding reference target feature vector;
Performing target detection on the current image in the original video data, and acquiring at least one current tracking target and a corresponding current target feature vector;
Calculating the feature similarity between any one of the reference target feature vectors and all of the current target feature vectors to determine the target similarity corresponding to that reference target feature vector;
If the target similarity is less than a preset similarity, determining that the reference tracking target corresponding to the target similarity is a lost tracking target in the current image;
If the reference tracking target is a lost tracking target in each of N consecutive frames after the current image, releasing the reference tracking target.
Details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Description of the drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application environment of a batch data processing method in an embodiment of the present application;
FIG. 2 is a flowchart of a batch data processing method in an embodiment of the present application;
FIG. 3 is another flowchart of a batch data processing method in an embodiment of the present application;
FIG. 4 is another flowchart of a batch data processing method in an embodiment of the present application;
FIG. 5 is another flowchart of a batch data processing method in an embodiment of the present application;
FIG. 6 is another flowchart of a batch data processing method in an embodiment of the present application;
FIG. 7 is a schematic diagram of a batch data processing apparatus in an embodiment of the present application;
FIG. 8 is a schematic diagram of a computer device in an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present application.
The batch data processing method provided in the embodiments of the present application can be applied in the application environment shown in FIG. 1. Specifically, the method is applied in a batch data processing system that includes a client and a server as shown in FIG. 1. The client and the server communicate over a network. The system segments batch data precisely and uses system idle time to complete batch data processing, which maintains the progress and efficiency of batch processing without affecting the response speed of real-time tasks. The client, also called the user terminal, is a program that corresponds to the server and provides local services to the user. The client can be installed on, but is not limited to, personal computers, laptops, smartphones, tablets, and portable wearable devices. The server can be implemented as a standalone server or as a server cluster composed of multiple servers.
In an embodiment, as shown in FIG. 2, a batch data processing method is provided. The method is described using its application to the server in FIG. 1 as an example, and includes the following steps:
S201: Select a target batch task from a non-real-time task queue, and create a data processing queue based on the target batch task, the data processing queue including to-be-processed data and the corresponding task status.
The non-real-time task queue is a queue for storing non-real-time tasks; that is, it stores at least one non-real-time task. A non-real-time task is defined in contrast to a real-time task. A real-time task is strongly real-time and requires an immediate response, such as a task corresponding to user login, a user query, or another user operation. A non-real-time task tolerates delay and can be processed asynchronously; examples include tasks that process batch data, such as uploading system-level logs to a server, generating and computing statistical reports, and data analysis.
The target batch task is the non-real-time task currently to be processed, selected in a preset order from the at least one non-real-time task in the non-real-time task queue. Specifically, the target batch task may be a non-real-time task that requires batch data processing. The order may be determined by the queue's first-in-first-out principle, or by the priority order of the tasks combined with the first-in-first-out principle.
The data processing queue is a list created from any target batch task to record the task status of each item of to-be-processed data in that task. To-be-processed data is the smallest unit of data that needs to be processed in the target batch task, for example a single log or a single report. The task status of each item reflects how far the system has processed it, and is one of: unprocessed, processing, processing succeeded, and processing failed.
Specifically, the server selects from the non-real-time task queue, in queue order, the non-real-time task that needs to be processed first, and determines it as the target batch task. Because a system usually has many non-real-time tasks, the system sorts them according to a preset sorting rule and gives priority to the data that must be processed first, which fixes the processing order of the data in the non-real-time tasks. The server then creates the data processing queue from the data in the target batch task; the queue shows each item of to-be-processed data and its task status. When the data processing queue is created, the task status of every item is unprocessed, and the status is updated as the system processes the item.
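Step S201 can be sketched as follows. This is a minimal illustration rather than the implementation described in the application: the class and function names, the numeric `priority` field (smaller means earlier), and the dictionary layout of a queue entry are all assumptions. The ordering combines task priority with first-in-first-out, which is one of the two options the text mentions.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from enum import Enum


class TaskStatus(Enum):
    UNPROCESSED = "unprocessed"
    PROCESSING = "processing"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class BatchTask:
    name: str
    priority: int  # hypothetical field: a smaller number is processed earlier
    items: list = field(default_factory=list)


class NonRealTimeTaskQueue:
    """Orders tasks by priority, then by arrival order (first in, first out)."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()

    def put(self, task):
        # The arrival counter breaks ties, so tasks themselves are never compared.
        heapq.heappush(self._heap, (task.priority, next(self._arrival), task))

    def pop_target_batch_task(self):
        return heapq.heappop(self._heap)[2]


def create_data_processing_queue(task):
    """Every item starts in the unprocessed state, as described for S201."""
    return [{"data": item, "status": TaskStatus.UNPROCESSED} for item in task.items]
```

With equal priorities the task that arrived first is selected first, so plain first-in-first-out is the special case in which all priorities are the same.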
S202: Determine target idle information from a target idle time queue, the target idle information including a start time, an idle duration, and an estimated number of threads.
The target idle time queue is a queue that predicts the system's idle periods for each day from the times at which the system processed historical processing data. Historical processing data is the information in the history records about system resources being called to process historical real-time tasks. A historical real-time task is a real-time task that occurred before the current system time; it is strongly real-time and requires an immediate response, such as user login, a user query, or another user operation. When the server has threads left over beyond those processing real-time tasks, the system is considered to be in an idle period. The target idle time queue contains at least one item of original idle information, which is the information for an idle period within a day whose idle duration exceeds a preset duration threshold. Each item of original idle information corresponds to a start time, an original idle duration, and an estimated number of threads. The original idle duration is the difference between the start time and the end time in the corresponding original idle information. For example, if 7:00-7:15 every morning is one item of original idle information for the server, its start time is 7:00 and its idle duration is 15 minutes. The estimated number of threads is the predicted number of threads that can process non-real-time tasks during the idle duration. In general, the estimated number of threads N_P is obtained by subtracting, from the total number of CPU threads N, the number of real-time threads N1 predicted to be needed for real-time tasks during the idle duration and the number of reserved emergency threads N2: N_P = N - N1 - N2. The target idle information is the original idle information selected from the target idle time queue whose start time is closest to the current system time.
Specifically, the server uses big-data modeling to analyze the processing times of the system's historical real-time tasks and predicts a queue of the server's daily original idle periods. Based on the current system time, it selects from this pre-predicted target idle time queue the original idle information that is after, and closest to, the current system time as the target idle information. This quickly identifies the idle period in which the to-be-processed data can be handled, so that batch processing can run in that period, which speeds up batch processing and achieves the goal of using idle time for batch work. For example, suppose the predicted idle time queue contains three items of original idle information with start times 6:00, 7:00, and 8:00. If the current system time is 7:20, the target idle information is the item whose start time is 8:00.
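The selection rule and the thread estimate N_P = N - N1 - N2 can be sketched as below. The `IdleInfo` type and both function names are hypothetical, and two details are assumptions not stated in the text: the estimate is clamped at zero, and `None` is returned when no predicted idle period remains after the current time.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class IdleInfo:
    start: time
    idle_minutes: int
    estimated_threads: int


def estimate_threads(total_threads, realtime_threads, reserved_threads):
    """N_P = N - N1 - N2, clamped at zero (the clamp is an assumption)."""
    return max(total_threads - realtime_threads - reserved_threads, 0)


def pick_target_idle_info(idle_queue, now):
    """Return the idle period that starts after `now` and is closest to it."""
    upcoming = [info for info in idle_queue if info.start > now]
    if not upcoming:
        return None  # assumption: nothing left in the predicted queue
    return min(upcoming, key=lambda info: info.start)
```

Using the example from the text, a queue with start times 6:00, 7:00, and 8:00 and a current time of 7:20 yields the 8:00 entry.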
S203: Obtain the quantity of to-be-processed data, obtain a target quantity based on the quantity of to-be-processed data, the target idle duration, and the estimated number of threads, select the to-be-processed data corresponding to the target quantity from the data processing queue, and determine it as the segmented processing data.
The target quantity is the quantity of to-be-processed data that the system can process, using the threads corresponding to the estimated number of threads, within the target idle duration of the target idle information. The segmented processing data is the to-be-processed data that the system is to process within that target idle duration. The segmented processing data is determined by how the system actually processes data; the target quantity handled in each round is determined by the target idle duration and the estimated number of threads.
Specifically, the server segments the to-be-processed data in the data processing queue by time or by type according to a preset data segmentation rule, to obtain the quantity of batch data to be processed within the target idle duration of the target idle information (that is, the target quantity). It then selects the to-be-processed data corresponding to the target quantity from the data processing queue and determines it as the segmented processing data. This objectively allocates the quantity of batch data that can be handled in the idle period, so that real-time tasks are processed normally and the target quantity of segmented processing data can be finished within the idle period. Segmenting by time means selecting the target quantity of to-be-processed data in the order in which the data needs to be processed; segmenting by type means selecting the target quantity according to the type of the data (analysis, log, report, and so on).
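The text does not give a formula for the target quantity, so the sketch below is only one plausible reading. It assumes an average processing time per item (`seconds_per_item`, a hypothetical parameter) and takes the target quantity to be the number of items the estimated threads can finish within the idle duration, capped by the quantity of pending data. Selection is by queue order, which corresponds to the time-based segmentation described above; marking the selected entries as processing follows the status flow the text describes.

```python
def target_quantity(pending_count, idle_seconds, threads, seconds_per_item):
    """Items that `threads` workers can finish in `idle_seconds` (assumed model)."""
    if seconds_per_item <= 0:
        raise ValueError("seconds_per_item must be positive")
    per_thread = int(idle_seconds // seconds_per_item)
    return min(pending_count, threads * per_thread)


def select_segmented_data(data_queue, quantity):
    """Take the first `quantity` unprocessed entries in queue order."""
    selected = [e for e in data_queue if e["status"] == "unprocessed"][:quantity]
    for entry in selected:
        entry["status"] = "processing"
    return selected
```

For a 15-minute idle period (900 seconds), 8 threads, and 2 seconds per item, this model gives 8 * 450 = 3600 items per round.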
S204: When the current system time is the start time, obtain the original system load.
The original system load is the amount of system resources (real-time threads) occupied by real-time tasks at the current system time. For example, at a start time of 8:00, if N1 real-time threads are processing real-time tasks and N2 emergency threads are reserved, the original system load is N1 + N2. Obtaining the original system load when the current system time is the start time makes it possible to judge whether, besides handling real-time tasks, the system can also process the segmented processing data of non-real-time tasks.
S205: If the original system load is less than the busy load threshold, determine that the system is idle, obtain target processing threads corresponding to the estimated number of threads, and use the target processing threads to process the segmented processing data within the target idle duration to obtain a data processing result.
The busy load threshold is a preset load threshold used to judge whether the system is busy. Idle and busy are relative states, and a busy load threshold is set on the system. Assuming the maximum server load is M and the busy load threshold is M*50%, the system is determined to be idle when the load occupied by real-time task processing is below M*50%. Conversely, if that load is greater than or equal to M*50%, the system is determined to be busy. A target processing thread is a thread that processes the segmented processing data and is dedicated to non-real-time tasks. The data processing result is the result obtained when the system processes the segmented processing data.
Specifically, when the current system time is the start time of the target idle information, the server checks the current system load. If the load is less than the busy load threshold, the system is determined to be idle, and the server calls the target processing threads that were pre-allocated to match the estimated number of threads to process the acquired segmented processing data within the target idle duration of the target idle information. Processing the segmented processing data while the system is idle allocates system resources sensibly: it avoids having batch processing of non-real-time tasks take up so many resources that real-time tasks take longer and run slowly, and it avoids wasting system resources by leaving the to-be-processed data of non-real-time tasks unhandled while the system is idle. Pre-allocating target processing threads that match the estimated number of threads to process the target quantity of segmented processing data reduces inter-thread communication overhead and therefore performance loss. That is, because the target processing threads are dedicated to non-real-time tasks, they do not need to switch between data for real-time tasks and data for non-real-time tasks, which reduces communication overhead between threads, keeps performance loss small, and makes good use of system resources.
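The idle check and the dedicated worker threads of S205 can be illustrated with Python's standard `concurrent.futures.ThreadPoolExecutor`. This is a sketch under assumptions: the function names and the `handler` callback are hypothetical, the 50% ratio is the example value from the text, and the sketch does not enforce the target idle duration as a deadline.

```python
from concurrent.futures import ThreadPoolExecutor


def is_idle(original_load, max_load, busy_ratio=0.5):
    """Idle when the load is strictly below the busy load threshold M * 50%."""
    return original_load < max_load * busy_ratio


def process_segmented_data(items, handler, thread_count):
    """Run `handler` on each item using a pool reserved for non-real-time work.

    Returns a list of (item, succeeded) pairs in the original item order.
    """
    def run(item):
        try:
            handler(item)
            return True
        except Exception:
            return False  # recorded as a processing failure

    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        outcomes = list(pool.map(run, items))
    return list(zip(items, outcomes))
```

Keeping a separate pool for batch work mirrors the point made above: the threads that serve real-time requests are never borrowed for batch items.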
S206: Based on the data processing result, update the task status of each item of segmented processing data in the data processing queue.
Specifically, after the target processing threads process the segmented processing data, the result for each item may be success or failure, so the task status must be updated once processing finishes. If processing failed, the task status is updated to processing failed, and the system uses that status to identify the corresponding segmented processing data so that items whose status is processing failed can later be reviewed or verified to determine the cause of the failure. If processing succeeded, the task status is updated to processing succeeded, and the system removes the corresponding segmented processing data from the data processing queue so that it is not processed again.
Further, the data processing result of each item of to-be-processed data is synchronized to the system, which records it and sets the rule for the next execution, so that every item is eventually processed successfully. In this embodiment, the system also presets a failure count threshold. If the number of processing failures for any item of to-be-processed data reaches this threshold, a reminder mechanism is triggered that sends the item, together with the log information recorded during processing, to a review terminal so that the reviewer there can handle it. The failure count threshold is preset to limit how many times each item can be reprocessed; for example, it can be set to three. If an item fails three times in a row, the system automatically sends a reminder email to the review terminal so that the reviewer can inspect the item, which allows the processing of to-be-processed data to be monitored.
When the data processing queue is created from the target batch task, the task status of all to-be-processed data in the queue is initialized to unprocessed. When the target quantity of to-be-processed data is selected from the queue and determined as the segmented processing data, the task status of those items is updated to processing. The segmented processing data is then processed within the idle duration to obtain a data processing result, which is either success or failure. Based on that result, the task status of each item of segmented processing data is updated to processing succeeded or processing failed, so that the task status of the data in the data processing queue is kept up to date in real time.
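The status updates and the failure count threshold can be sketched as follows. The entry layout, the `failure_counts` dictionary, and the `notify` callback (standing in for the reminder sent to the review terminal) are assumptions; the threshold of three is the example value given in the text.

```python
FAILURE_THRESHOLD = 3  # example value from the text


def apply_results(data_queue, results, failure_counts, notify):
    """Update task statuses from `results`, a mapping of item id to True/False.

    Successful entries are removed from the queue; failed entries stay in the
    queue with status "failed" so they can be retried or reviewed later.
    """
    for entry in list(data_queue):  # iterate over a copy while removing
        key = entry["id"]
        if key not in results:
            continue
        if results[key]:
            entry["status"] = "succeeded"
            data_queue.remove(entry)
        else:
            entry["status"] = "failed"
            failure_counts[key] = failure_counts.get(key, 0) + 1
            if failure_counts[key] >= FAILURE_THRESHOLD:
                notify(entry)  # hypothetical hook for the reviewer alert
```

A caller would pass something like `notify=send_review_email`, where `send_review_email` is whatever alerting function the deployment provides.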
S207: If the original system load is not less than the busy load threshold, determine that the system is busy, and repeat the step of determining target idle information from the target idle time queue.
Specifically, when the current system time is the start time of the target idle information, the server checks the current system load. If the load is not less than the busy load threshold, the system is determined to be busy. To keep non-real-time tasks from occupying system resources and slowing down real-time tasks, new target idle information must be determined; that is, the step of determining target idle information from the target idle time queue is repeated, and the next original idle information closest to the current system time becomes the target idle information. This allocates system resources sensibly and preserves the processing speed of real-time tasks.
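The control flow of S204, S205, and S207 amounts to trying predicted idle periods in start order and skipping any period in which the system turns out to be busy. The sketch below simulates that flow; `load_at` is a hypothetical callable that returns the load sampled at a given start time, and start times are expressed as minutes since midnight for simplicity. A real scheduler would wait for each start time to arrive before sampling the load.

```python
def run_in_first_idle_window(windows, load_at, busy_threshold, run_batch):
    """Run the batch in the earliest predicted window whose load is below threshold."""
    for window in sorted(windows, key=lambda w: w["start_minute"]):
        # S204: sample the original system load at the window's start time.
        if load_at(window["start_minute"]) < busy_threshold:
            run_batch(window)  # S205: system is idle, process the segmented data
            return window
        # S207: system is busy, move on to the next predicted idle window.
    return None
```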
In the batch data processing method provided in this embodiment, a target batch task is selected from the non-real-time task queue and a data processing queue is created from it, which establishes a sensible processing order for the data in non-real-time tasks. Target idle information is determined from the target idle time queue, which makes it efficient to find an idle period and speeds up batch processing. The segmented processing data corresponding to the target quantity is determined from the target idle duration and the estimated number of threads, which objectively allocates the quantity of data that can be handled in the idle period and reduces the amount processed in a single round; real-time tasks continue to be processed normally while the segmented processing data is finished within the idle period, so that the data handled in each idle period matches the available system resources. When the current system time is the start time and the original system load is less than the busy load threshold, target processing threads corresponding to the estimated number of threads are obtained and used to process the segmented processing data within the target idle duration to obtain a data processing result; this allocates system resources sensibly, reduces inter-thread communication overhead, and keeps performance loss small. Based on the data processing result, the task status of each item of segmented processing data in the data processing queue is updated, which helps ensure that all data is eventually processed successfully. If the original system load is not less than the busy load threshold, the system is determined to be busy, and the step of determining target idle information from the target idle time queue is repeated so that the to-be-processed data is still handled.
In an embodiment, as shown in FIG. 3, before step S201, that is, before selecting a target batch task from the non-real-time task queue and creating a data processing queue based on the target batch task, the batch data processing method further includes:
S301: Obtain a task processing request, the task processing request including a to-be-processed task and a task identifier corresponding to the to-be-processed task.
The task processing request is a request to process all unprocessed tasks in the system. In this embodiment, the to-be-processed tasks include real-time tasks and non-real-time tasks. The task identifier is a label assigned in advance to each to-be-processed task and is either a real-time identifier or a non-real-time identifier; both are specified by the system administrator. Taking the simplest numeric identifiers as an example, 1 represents the real-time identifier and 2-9 represent non-real-time identifiers. A real-time identifier indicates that the to-be-processed task is a real-time task; that is, if the task identifier in a task processing request is a real-time identifier, the corresponding to-be-processed task is a real-time task. A non-real-time identifier indicates that the to-be-processed task is a non-real-time task; that is, if the task identifier in a task processing request is a non-real-time identifier, the corresponding to-be-processed task is a non-real-time task. The two kinds of identifier are complementary: a task carrying a real-time identifier must be processed and responded to immediately, while a task carrying a non-real-time identifier can be processed when the system is idle, which schedules the processing time of to-be-processed tasks sensibly and preserves the processing speed of real-time tasks.
Specifically, when the system obtains a task processing request, it queries the tasks to be processed in the database. Each task to be processed corresponds to a task identifier, and the task is processed according to the identifier it carries. For example, the tasks to be processed may be login, password retrieval, log processing, and report processing tasks. The system marks tasks that require real-time processing, such as login and password retrieval, with the real-time identifier, and marks tasks that can be deferred, such as log processing and report processing, with a non-real-time identifier.
S302: If the task identifier is a real-time identifier, execute the task to be processed.
Specifically, when the server finds that the task identifier is a real-time identifier, the task to be processed is a real-time task and must be processed immediately. The server therefore calls system resources to process it, which ensures that tasks carrying the real-time identifier are responded to immediately and improves the processing efficiency of real-time tasks.
S303: If the task identifier is a non-real-time identifier, obtain the task type in the task processing request, determine the task priority of the task to be processed based on the task type, and store the task to be processed in the non-real-time task queue in order of task priority.
Here, the task type is the type of data processing that the task to be processed performs. For example, the task type may be a log processing type, a report processing type, or an analysis processing type. The task priority is a parameter that determines the order in which tasks to be processed are handled when the system has several of them. For example, 2-9 represent non-real-time identifiers, and the smaller the number within 2-9, the higher the priority and the sooner the task should be processed. Generally, the task priority of a task to be processed is determined by its processing type, and tasks to be processed are stored in the non-real-time task queue in order of that priority.
Specifically, when the task identifier in the task processing request is a non-real-time identifier, the server determines the task priority corresponding to the task type of the task to be processed and stores tasks with a higher task priority nearer the front of the non-real-time task queue. This fixes the order of the task priorities of the different task types, lets tasks with a higher task priority be processed first, and arranges the processing order of non-real-time tasks reasonably. Further, within the non-real-time task queue, the processing order between different task types is determined first by the task priority of each task type, and tasks to be processed of the same task type are ordered by the queue's first-in-first-out principle.
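The dispatch rule of S302 and S303 can be illustrated with a minimal Python sketch. It assumes the numeric identifiers from the example above (1 for real-time, 2-9 for non-real-time); the `TYPE_PRIORITY` mapping, the `NonRealTimeQueue` class, and the `dispatch` function are hypothetical names introduced only for illustration and are not taken from the application.

```python
import heapq
import itertools
from dataclasses import dataclass, field

REAL_TIME_ID = 1  # example identifier from the text: 1 = real-time, 2-9 = non-real-time

# Hypothetical mapping from task type to priority (smaller number = higher priority).
TYPE_PRIORITY = {"log": 2, "report": 3, "analysis": 4}


@dataclass
class NonRealTimeQueue:
    """Priority queue: ordered by task priority, FIFO within the same priority."""
    _heap: list = field(default_factory=list)
    _counter: itertools.count = field(default_factory=itertools.count)

    def put(self, task_type: str, payload: str) -> None:
        priority = TYPE_PRIORITY[task_type]
        # The arrival counter breaks ties, so equal-priority tasks leave in FIFO order.
        heapq.heappush(self._heap, (priority, next(self._counter), payload))

    def get(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def dispatch(task_id: int, task_type: str, payload: str,
             queue: NonRealTimeQueue, executed: list) -> None:
    """Run real-time tasks at once; defer non-real-time tasks to the queue."""
    if task_id == REAL_TIME_ID:
        executed.append(payload)  # stand-in for immediate execution
    else:
        queue.put(task_type, payload)
```

A heap keyed on `(priority, arrival counter)` is one simple way to obtain both orderings described above; a sorted list or one FIFO queue per task type would serve equally well.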
In the batch data processing method provided in this embodiment, the server obtains the task to be processed and the corresponding task identifier, so that the processing time of the tasks to be processed is arranged reasonably and the processing speed of real-time tasks is guaranteed. When the task identifier is a real-time identifier, the task to be processed is executed immediately, which guarantees the processing speed and response time of tasks carrying the real-time identifier. When the task identifier is a non-real-time identifier, the task type in the task processing request is obtained to determine the task priority of the task to be processed, and the task is stored in the non-real-time task queue in order of task priority. This arranges the processing order of non-real-time tasks reasonably and ensures that tasks with a higher task priority are processed first.
In one embodiment, as shown in FIG. 4, before step S202, that is, before determining the target idle information from the target idle time queue, the batch data processing method further includes:
S401: Obtain historical processing data, where the historical processing data includes historical processing time, historical processing quantity, and historical thread count.
Here, historical processing data is the recorded information about how system resources were called to process historical real-time tasks. A historical real-time task is a real-time task that occurred before the current system time. The historical processing time is the time interval, in the historical record, between the start time and the end time of the server's processing of a historical real-time task. The historical thread count is the number of threads called, in the historical record, to process historical real-time tasks. Specifically, the server obtains a large amount of historical processing data and analyzes it to find the regular patterns it contains and to determine the daily usage of system resources, so that time and resources can subsequently be allocated reasonably between real-time and non-real-time tasks.
S402: Perform big data modeling on the historical processing time, historical processing quantity, and historical thread count based on a machine learning algorithm to obtain an original idle time queue, where the original idle time queue includes at least one piece of original idle information, and each piece of original idle information includes a start time, an original idle duration, and an estimated thread count.
Here, the original idle time queue is a queue of the system's daily idle times, predicted from the times at which the system processed the historical processing data. The original idle information is the information corresponding to each idle period in a day whose idle duration exceeds a preset duration threshold.
In this embodiment, the historical processing time, historical processing quantity, and historical thread count describe how the system processes data periodically. A machine learning algorithm is used to perform big data modeling on them to determine the system's historical processing threads and idle state at any moment of the day. From this, the start time and idle duration of a given idle interval and the number of real-time threads still needed for real-time processing are predicted, and the estimated thread count available for non-real-time tasks within that idle duration is calculated. Using a machine learning algorithm makes the resulting original idle time queue objective and accurate. For example, the system divides the historical processing data into weekly cycles and uses a machine learning algorithm to analyze the historical idle information of each day from Monday through Sunday, which quickly yields an original idle time queue with regular patterns. The machine learning algorithm includes, but is not limited to, a logistic regression algorithm and an LSTM neural network algorithm.
S403: If the original idle duration is greater than a first duration threshold, store the original idle information in the target idle time queue.
Here, the first duration threshold is a preset threshold used to decide whether an idle duration of the system is long enough to count as idle time. Setting the first duration threshold excludes idle periods that are too short and ensures that, within the original idle duration of each entry stored in the target idle time queue, a fairly large amount of data to be processed for non-real-time tasks can be handled. This helps reduce the overhead of subsequent inter-thread communication and achieves a reasonable allocation of system resources. In this embodiment, when the original idle duration is greater than the first duration threshold, the corresponding original idle information is stored in the target idle time queue; that is, all pieces of original idle information whose original idle duration exceeds the first duration threshold are collected to build the target idle time queue.
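The filtering in S403 can be sketched in Python as follows. The `IdleInfo` record and the `build_target_idle_queue` function are hypothetical names chosen for illustration; the three fields mirror the start time, original idle duration, and estimated thread count described above, and the use of seconds as the unit is an assumption.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class IdleInfo:
    start: time             # start time of the idle period
    idle_seconds: int       # original idle duration (seconds is an assumed unit)
    estimated_threads: int  # estimated thread count for non-real-time work


def build_target_idle_queue(original_queue, first_threshold_seconds):
    """Keep only the entries whose idle duration is strictly greater than the threshold."""
    return [info for info in original_queue
            if info.idle_seconds > first_threshold_seconds]
```

The comparison is strict, matching the wording "greater than the first duration threshold"; an entry exactly equal to the threshold is dropped.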
In the batch data processing method provided in this embodiment, big data modeling is performed on the historical processing time, historical processing quantity, and historical thread count based on a machine learning algorithm, so that the resulting original idle time queue is objective. When the original idle duration is greater than the first duration threshold, the original idle information is determined to be target idle information. This ensures that, within the original idle duration of each entry stored in the target idle time queue, a fairly large amount of data to be processed for non-real-time tasks can be handled, which helps reduce the overhead of subsequent inter-thread communication and achieves a reasonable allocation of system resources.
In one embodiment, as shown in FIG. 5, in step S203, obtaining the target quantity based on the number of data to be processed, the target idle duration, and the estimated thread count, and selecting from the data processing queue the data to be processed corresponding to the target quantity as the segmentation processing data, includes:
S501: Use an estimated time calculation formula on the number of data to be processed and the estimated thread count to obtain the estimated processing time of the data processing queue.
Here, the estimated time calculation formula is the formula used to calculate the time the system needs to process the data to be processed in the data processing queue. The estimated processing time is the time the system needs to process all of the data to be processed in the data processing queue. Specifically, the estimated time calculation formula is
$$T_1 = \frac{S}{N_p \cdot x}$$
where $T_1$ is the estimated processing time of the data processing queue, $S$ is the number of data to be processed, $N_p$ is the estimated thread count available during the idle time, and $x$ is the amount of data each thread processes per unit time. With this formula, the estimated processing time for all of the data to be processed in the data processing queue can be calculated quickly.
S502: Use a target quantity calculation formula on the estimated processing time and the target idle duration to obtain the target quantity.
Here, the target quantity calculation formula is the formula used to calculate how much of the data to be processed the system can handle within the target idle duration. The target quantity calculation formula is
$$X = \frac{S \cdot T_2}{T_1}$$
where $X$ is the target quantity, $S$ is the number of data to be processed, $T_1$ is the estimated processing time of the data processing queue, and $T_2$ is the target idle duration. With this formula, the amount of data the system can process within the target idle duration is obtained quickly. For example, suppose a data processing queue holds 1000 pieces of data, and the time needed to process those 1000 pieces is the estimated processing time $T_1$. Based on the server's historical operation, a machine learning algorithm estimates that the server will be idle from some future time point $T$ (the start time), so that batch data can be processed without affecting real-time task processing, and that the idle duration is $T_2$. The target quantity calculation formula then gives $X = 1000 \cdot T_2 / T_1$ pieces of data, and processing is scheduled to start at time $T$.
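The two formulas of S501 and S502 can be worked through with a short Python sketch. The function names are hypothetical, the numeric values are invented for illustration, and two details are assumptions not stated in the text: the result is rounded down to a whole number of records, and it is capped at S so that no more than the queue contents is selected.

```python
import math


def estimated_processing_time(s: int, n_p: int, x: float) -> float:
    """T1 = S / (N_p * x): time to process S records with N_p threads,
    each handling x records per unit time."""
    if n_p <= 0 or x <= 0:
        raise ValueError("thread count and per-thread rate must be positive")
    return s / (n_p * x)


def target_quantity(s: int, t1: float, t2: float) -> int:
    """X = S * T2 / T1, rounded down and capped at S (both are assumptions)."""
    return min(s, math.floor(s * t2 / t1))
```

With 1000 records, 4 threads, and 5 records per second per thread, `estimated_processing_time(1000, 4, 5.0)` gives 50 seconds; an idle duration of 30 seconds then yields a target quantity of 600 records.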
S503: According to a preset filtering rule, select from the data processing queue the data to be processed corresponding to the target quantity and determine it to be the segmentation processing data.
Here, the preset filtering rule is a rule set in advance for selecting the data to be processed. It usually selects data in descending order of task priority, so that data is processed in a definite order and data with a higher task priority is processed first. Understandably, when task priorities are equal, the selection order follows the queue's first-in-first-out principle, and the corresponding segmentation processing data is obtained accordingly.
Specifically, the server selects from the data processing queue, according to the preset filtering rule (that is, in descending order of task priority), an amount of data to be processed equal to the target quantity. The selected data, which has the highest task priority and must be processed first, is determined to be the segmentation processing data and is returned to the system for processing.
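The selection in S503 can be sketched in Python as follows. The `Record` type and the `select_segmentation_data` function are hypothetical names; the sketch assumes, as in the numeric-identifier example, that a smaller priority number means a higher task priority, and that each record carries an arrival sequence number for the first-in-first-out tie-break.

```python
from typing import NamedTuple


class Record(NamedTuple):
    priority: int  # smaller number = higher task priority (assumed convention)
    seq: int       # arrival order, used for the FIFO tie-break
    data: str


def select_segmentation_data(queue, target_quantity):
    """Return the first target_quantity records by (priority, arrival order)."""
    ordered = sorted(queue, key=lambda r: (r.priority, r.seq))
    return ordered[:target_quantity]
```

Sorting the whole queue is the simplest form; for a large queue with a small target quantity, `heapq.nsmallest` with the same key avoids a full sort.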
In the batch data processing method provided in this embodiment, the estimated time calculation formula is applied to the number of data to be processed and the estimated thread count, and the target quantity calculation formula is then applied to the estimated processing time and the target idle duration. The two formulas yield the target quantity quickly, keep it objective, and allow the data to be segmented accurately afterwards. The data to be processed corresponding to the target quantity is selected from the data processing queue according to the preset filtering rule and determined to be the segmentation processing data, which achieves a reasonable distribution of the data to be processed.
Further, the preset filtering rule is set before the system processes data, which determines the task priority of the batch data. When batch data is processed, the number of target processing threads can be controlled: the more target processing threads are used, the more system resources are occupied, the greater the processing capacity, and the faster the data to be processed for non-real-time tasks is handled. In special circumstances, for example when real-time task traffic is heavy, batch processing can be suspended, or the resources allocated to batch processing of data with a lower task priority can be reduced, so that system resources and the segmentation processing data to be processed are allocated reasonably.
In one embodiment, as shown in FIG. 5, after step S206, that is, after updating the task status of each piece of segmentation processing data in the data processing queue based on the data processing result, the batch data processing method further includes:
S601: If the task status of every piece of segmentation processing data in the data processing queue has been updated to the completed status, obtain the remaining duration based on the current system time, the start time, and the target idle duration.
Specifically, the server monitors the task status of the segmentation processing data in the data processing queue in real time. When all of the segmentation processing data has been processed, the server obtains the current system time. If the current system time falls within the idle duration counted from the start time, the remaining duration is determined from the current system time and the idle duration; that is, the remaining duration is the difference between the end of the idle duration and the current system time. When the remaining duration is long enough, batch data processing continues, making full use of the remaining idle time. For example, if the start time is 8:00, the idle duration is 30 minutes, and the current system time is 8:20, the remaining duration is 10 minutes.
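The remaining-duration calculation in S601 can be reproduced with Python's standard `datetime` module. The function name `remaining_duration` is hypothetical, and returning a zero duration when the current time lies outside the idle window is an assumption made for the sketch.

```python
from datetime import datetime, timedelta


def remaining_duration(now: datetime, start: datetime, idle: timedelta) -> timedelta:
    """Remaining duration = (start + idle duration) - current time.
    Returns zero outside the idle window (an assumption of this sketch)."""
    end = start + idle
    if not (start <= now <= end):
        return timedelta(0)
    return end - now
```

For the example in the text, a start time of 8:00, an idle duration of 30 minutes, and a current time of 8:20 give a remaining duration of 10 minutes.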
S602: If the remaining duration is greater than a second duration threshold, update the target idle duration to the remaining duration, and determine the corresponding processable data volume based on the updated target idle duration and the estimated thread count.
Here, the second duration threshold is a preset threshold used to judge whether the remaining duration is long enough. It may be the same as or different from the first duration threshold, and may be set to 30 s or another value. In step S602, the server can determine the processable data volume with the formula $K = N_p \cdot x \cdot T_3$, where $K$ is the processable data volume corresponding to the remaining duration, $N_p$ is the estimated thread count available during the idle time, $x$ is the amount of data each thread processes per unit time, and $T_3$ is the idle time, that is, the updated target idle duration. When the remaining duration is greater than the second duration threshold, the server updates the target idle duration to the remaining duration and continues to process the data to be processed accordingly, so that further data is batch processed without affecting real-time tasks and the data to be processed is handled faster.
Specifically, the server first judges whether the remaining duration is greater than the second duration threshold. If it is, the remaining duration of the target idle information is fairly long, and the system can continue to process batch tasks without becoming too busy. The processable data volume is therefore determined from the remaining duration, and the corresponding amount of data to be processed is handled within that remaining duration, which improves the processing efficiency of batch data.
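The check in S602 and the formula K = N_p * x * T3 can be sketched in Python. The function name is hypothetical; returning 0 when the remaining duration does not exceed the second duration threshold, and rounding K down to a whole number of records, are assumptions of this sketch.

```python
def processable_volume(remaining_seconds: float, second_threshold_seconds: float,
                       n_p: int, x: float) -> int:
    """K = N_p * x * T3, computed only when the remaining duration is
    strictly greater than the second duration threshold."""
    if remaining_seconds <= second_threshold_seconds:
        return 0  # too little time left: no further batch is scheduled (assumption)
    return int(n_p * x * remaining_seconds)
```

With 4 threads, 5 records per second per thread, and 600 seconds remaining against a 30-second threshold, the sketch returns 12000 records.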
S603: Select from the data processing queue the data to be processed corresponding to the processable data volume and update it to be the segmentation processing data, obtain the target processing threads corresponding to the estimated thread count, and repeat the step of using the target processing threads to process the segmentation processing data within the target idle duration to obtain the data processing result.
Specifically, the server selects the data to be processed corresponding to the processable data volume, which was determined from the target idle duration and the estimated thread count, and updates it to be the segmentation processing data. This improves the processing efficiency of batch data and speeds up the migration of the data to be processed in the target batch task. The processing threads then process the segmentation processing data to obtain the data processing result.
In the batch data processing method provided in this embodiment, after the segmentation processing data has been processed, the server obtains the remaining duration. When the remaining duration is greater than the second duration threshold, the target idle duration is updated to the remaining duration, and the corresponding processable data volume is determined from the updated target idle duration and the estimated thread count, which improves the processing efficiency of batch data. The data to be processed corresponding to the processable data volume is selected from the data processing queue and updated to be the segmentation processing data, and the target processing threads corresponding to the estimated thread count are obtained, so that batch data is processed faster without affecting the system's handling of real-time tasks.
In one embodiment, after step S205, that is, after the target processing threads are used to process the segmentation processing data within the target idle duration, the batch data processing method further includes: monitoring the current system load in real time during data processing, and if the current system load is greater than a burst load threshold, releasing the target processing threads, stopping the processing of the segmentation processing data, and updating the task status of the segmentation processing data to the stopped status.
Here, the burst load threshold is a preset threshold used to judge whether the system has received a sudden large load. Generally, the burst load threshold is greater than the busy load average.
Specifically, when the current system load is greater than the burst load threshold, the system has received a large number of task processing requests carrying the real-time identifier, and its load is too high. To ensure that these requests are processed in time, the processing of the target batch task must be suspended so that the target processing threads it occupies are released; that is, the system actively reduces the number of target processing threads used for batch processing and gives priority to the data processing of real-time tasks. In this embodiment, when the current system load is greater than the burst load threshold, the target processing threads are released, the processing of the segmentation processing data is stopped, and its task status is updated to the stopped status. Understandably, in the next idle period, the server gives priority to the segmentation processing data in the data processing queue whose task status is the stopped status, which preserves the efficiency of batch data processing.
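The burst-load behaviour above can be illustrated with a simplified, single-threaded Python simulation. It does not manage real threads: load readings are passed in as a list, one per item, and all names (`Status`, `process_with_load_guard`, `next_round_order`) are hypothetical. It only shows the status transitions and the rule that stopped data is handled first in the next idle period.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    STOPPED = "stopped"
    DONE = "done"


def process_with_load_guard(items, load_samples, burst_threshold):
    """Process items in order, checking one load reading before each item.
    On a reading above the threshold, mark all unfinished items STOPPED."""
    statuses = {item: Status.PENDING for item in items}
    for item, load in zip(items, load_samples):
        if load > burst_threshold:
            for key in list(statuses):
                if statuses[key] is Status.PENDING:
                    statuses[key] = Status.STOPPED
            break
        statuses[item] = Status.DONE  # stand-in for the actual data processing
    return statuses


def next_round_order(statuses):
    """In the next idle period, stopped items come before other pending items."""
    stopped = [k for k, s in statuses.items() if s is Status.STOPPED]
    pending = [k for k, s in statuses.items() if s is Status.PENDING]
    return stopped + pending
```

In a real system the load check would run concurrently with the worker threads; the loop here only stands in for that monitoring.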
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution. The execution order of each process is determined by its function and internal logic, and the sequence numbers do not limit the implementation of the embodiments of the present application in any way.
In one embodiment, a batch data processing device is provided, and the batch data processing device corresponds one-to-one to the batch data processing method in the foregoing embodiments. As shown in FIG. 7, the batch data processing device includes a data processing queue creation module 701, a target idle information determination module 702, a segmentation processing data determination module 703, a system original load acquisition module 704, a data processing result acquisition module 705, and a task status update module 706. The functional modules are described in detail as follows:
The data processing queue creation module 701 is configured to select a target batch task from the non-real-time task queue and create a data processing queue based on the target batch task, where the data processing queue includes the data to be processed and the corresponding task status.
The target idle information determination module 702 is configured to determine target idle information from the target idle time queue, where the target idle information includes a start time, a target idle duration, and an estimated thread count.
The segmentation processing data determination module 703 is configured to obtain the number of data to be processed, obtain the target quantity based on the number of data to be processed, the target idle duration, and the estimated thread count, and select from the data processing queue the data to be processed corresponding to the target quantity as the segmentation processing data.
The system original load acquisition module 704 is configured to obtain the system original load when the current system time is the start time.
The data processing result acquisition module 705 is configured to, if the system original load is less than a busy load threshold, determine that the system is in an idle state, obtain the target processing threads corresponding to the estimated thread count, and use the target processing threads to process the segmentation processing data within the target idle duration to obtain a data processing result.
The task status update module 706 is configured to update the task status of each piece of segmentation processing data in the data processing queue based on the data processing result.
Preferably, after the system original load acquisition module 704, the batch data processing device further includes a system busy module.
The system busy module is configured to, if the system original load is not less than the busy load threshold, determine that the system is in a busy state and repeat the step of determining target idle information from the target idle time queue.
Preferably, preceding the data processing queue creation module 701, the batch data processing apparatus further includes a task processing request acquisition module, a real-time identifier module, and a non-real-time identifier module.
The task processing request acquisition module is configured to obtain a task processing request, the task processing request including a to-be-processed task and a task identifier corresponding to the to-be-processed task.
The real-time identifier module is configured to execute the to-be-processed task if the task identifier is a real-time identifier.
The non-real-time identifier module is configured to, if the task identifier is a non-real-time identifier, obtain the task type in the task processing request, determine the task priority of the to-be-processed task based on the task type, and store the to-be-processed task in the non-real-time task queue in order of task priority.
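By way of example only, the routing of requests by task identifier might look like the following Python sketch. The request fields (`flag`, `type`, `task`), the type-to-priority table, and the `execute` callable are hypothetical names chosen for this illustration.

```python
import heapq
import itertools

# Hypothetical mapping from task type to priority (lower runs first).
PRIORITY_BY_TYPE = {"report": 0, "reconcile": 1, "cleanup": 2}

# Tie-breaker so tasks with equal priority keep their arrival order.
_sequence = itertools.count()


def dispatch(request, non_realtime_queue, execute):
    """Run real-time tasks at once; queue all others by priority."""
    if request["flag"] == "realtime":
        return execute(request["task"])
    # Unknown task types are given the lowest priority.
    priority = PRIORITY_BY_TYPE.get(request["type"], len(PRIORITY_BY_TYPE))
    heapq.heappush(non_realtime_queue, (priority, next(_sequence), request["task"]))
    return None
```

A binary heap is used here so that the highest-priority batch task can be taken from the front of the non-real-time task queue cheaply; any ordered container would serve the same purpose.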
Preferably, preceding the target idle information determination module 702, the batch data processing apparatus further includes a historical processing data acquisition module, an original idle time queue acquisition module, and an original idle information storage module.
The historical processing data acquisition module is configured to obtain historical processing data, the historical processing data including a historical processing time, a historical processing quantity, and a historical number of threads.
The original idle time queue acquisition module is configured to perform big data modeling on the historical processing time, the historical processing quantity, and the historical number of threads based on a machine learning algorithm to obtain an original idle time queue. The original idle time queue includes at least one item of original idle information, and each item of original idle information includes a start time, an original idle duration, and an estimated number of threads.
The original idle information storage module is configured to store the original idle information in the target idle time queue if the original idle duration is greater than a first duration threshold.
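The filtering performed by the original idle information storage module can be sketched in Python as below. The modeling step that produces the original idle time queue is not shown. The `IdleInfo` record and the ordering of the result by start time are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class IdleInfo:
    start: float     # start time of the predicted idle window
    duration: float  # original idle duration
    threads: int     # estimated number of threads


def build_target_idle_queue(original_queue, first_duration_threshold):
    """Keep only windows longer than the first duration threshold."""
    kept = [
        info for info in original_queue
        if info.duration > first_duration_threshold
    ]
    # Ordering by start time is an assumption of this sketch.
    return sorted(kept, key=lambda info: info.start)
```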
Preferably, the segmented processing data determination module 703 includes an estimated processing time acquisition unit, a target quantity acquisition unit, and a preset screening rule unit.
The estimated processing time acquisition unit is configured to apply an estimated time calculation formula to the quantity of to-be-processed data and the estimated number of threads to obtain the estimated processing time corresponding to the data processing queue.
The target quantity acquisition unit is configured to apply a target quantity calculation formula to the estimated processing time and the target idle duration to obtain the target quantity.
The preset screening rule unit is configured to select, according to a preset screening rule, the to-be-processed data corresponding to the target quantity from the data processing queue, and determine the selected data as the segmented processing data.
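The two formulas are not reproduced in this passage. Purely as an assumed example, the Python sketch below takes the estimated processing time to be proportional to the data quantity and inversely proportional to the thread count, and scales the target quantity by the ratio of the target idle duration to the estimated time. The per-item cost `seconds_per_item` and the first-in-first-out screening rule are likewise hypothetical.

```python
import math


def estimated_processing_time(pending_count, thread_count, seconds_per_item):
    """Assumed estimate: total work divided evenly across the threads."""
    return pending_count * seconds_per_item / thread_count


def target_quantity(pending_count, estimated_time, idle_duration):
    """Assumed rule: process everything if it fits in the idle window,
    otherwise only the proportion that fits."""
    if estimated_time <= idle_duration:
        return pending_count
    return math.floor(pending_count * idle_duration / estimated_time)


def select_slice(queue, quantity):
    """Assumed screening rule: take items from the head of the queue."""
    return queue[:quantity]
```

For instance, 1000 items at 2 seconds each on 4 threads give an estimate of 500 seconds; with a 120-second idle window, this rule selects 240 items.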
Preferably, following the task status update module 706, the batch data processing apparatus further includes a remaining duration acquisition module, a processable data amount determination module, and a target processing thread acquisition module.
The remaining duration acquisition module is configured to, if the task status of every item of segmented processing data in the data processing queue has been updated to the completed status, obtain a remaining duration based on the current system time, the start time, and the target idle duration.
The processable data amount determination module is configured to, if the remaining duration is greater than a second duration threshold, update the target idle duration to the remaining duration, and determine the corresponding processable data amount based on the updated target idle duration and the estimated number of threads.
The target processing thread acquisition module is configured to select from the data processing queue the to-be-processed data corresponding to the processable data amount, update the selected data as the segmented processing data, obtain target processing threads corresponding to the estimated number of threads, and repeat the step of using the target processing threads to process the segmented processing data within the target idle duration to obtain a data processing result.
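As an illustrative sketch under stated assumptions, the remaining duration can be computed as the end of the idle window minus the current time, and the processable data amount derived from it. The per-item cost `seconds_per_item` and the capacity rule are hypothetical; the application does not specify them in this passage.

```python
def remaining_duration(now, start, idle_duration):
    """Time left in the idle window, never negative."""
    return max(0.0, start + idle_duration - now)


def next_slice_size(now, start, idle_duration, thread_count,
                    seconds_per_item, second_threshold, pending_count):
    """Number of further items to process in the current window,
    or 0 if too little time is left to be worth another pass."""
    remaining = remaining_duration(now, start, idle_duration)
    if remaining <= second_threshold:
        return 0
    # Assumed capacity: each thread handles one item per `seconds_per_item`.
    capacity = int(remaining * thread_count / seconds_per_item)
    return min(capacity, pending_count)
```

The second duration threshold keeps the apparatus from starting a new pass when only a sliver of the idle window remains.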
Preferably, following the data processing result acquisition module 705, the batch data processing apparatus further includes a real-time monitoring module.
The real-time monitoring module is configured to monitor the current system load in real time during data processing and, if the current system load is greater than a burst load threshold, release the target processing threads, stop processing the segmented processing data, and update the task status of the segmented processing data to the stopped status.
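One possible way to realize this behavior is sketched below in Python. For simplicity, the sketch polls the load before each item instead of running a dedicated monitoring thread, which is a simplification rather than the arrangement described above. The `current_load` and `handler` callables and the status strings are assumptions, and items are assumed to be hashable.

```python
import threading


def run_with_load_guard(items, handler, current_load, burst_threshold, thread_count):
    """Process items on worker threads; stop everything once the
    sampled load exceeds the burst load threshold."""
    stop = threading.Event()
    lock = threading.Lock()
    pending = list(items)
    status = {}

    def worker():
        while not stop.is_set():
            with lock:
                if not pending:
                    return
                item = pending.pop(0)
            if current_load() > burst_threshold:
                stop.set()  # tells the other workers to exit as well
                with lock:
                    status[item] = "stopped"
                return
            handler(item)
            with lock:
                status[item] = "completed"

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # joining the workers releases the processing threads
    for item in pending:  # items never started are marked as stopped
        status[item] = "stopped"
    return status
```

Items marked as stopped remain available for a later idle window.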
For the specific limitations of the batch data processing apparatus, refer to the limitations of the batch data processing method above; they are not repeated here. Each module in the batch data processing apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or be independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 8. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a readable storage medium and an internal memory. The readable storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the readable storage medium. The database stores data used or generated while executing the batch data processing method described above, such as target batch tasks. The network interface is used to communicate with external terminals over a network connection. When executed by the processor, the computer-readable instructions implement a batch data processing method. The readable storage medium provided in this embodiment includes non-volatile and volatile readable storage media.
In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When executing the computer-readable instructions, the processor implements the batch data processing method of the foregoing embodiments, for example steps S201 to S207 shown in FIG. 2, or the steps shown in FIG. 3 to FIG. 6; to avoid repetition, details are not repeated here. Alternatively, when executing the computer-readable instructions, the processor implements the functions of the modules/units in the embodiment of the batch data processing apparatus, for example the functions of the data processing queue creation module 701, the target idle information determination module 702, the segmented processing data determination module 703, the original system load acquisition module 704, the data processing result acquisition module 705, and the task status update module 706 shown in FIG. 7; to avoid repetition, details are not repeated here.
In one embodiment, one or more readable storage media storing computer-readable instructions are provided. When executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the batch data processing method of the foregoing embodiments, for example steps S201 to S207 shown in FIG. 2, or the steps shown in FIG. 3 to FIG. 6; to avoid repetition, details are not repeated here. Alternatively, when executed by one or more processors, the computer-readable instructions cause the one or more processors to implement the functions of the modules/units in the embodiment of the batch data processing apparatus, for example the functions of the data processing queue creation module 701, the target idle information determination module 702, the segmented processing data determination module 703, the original system load acquisition module 704, the data processing result acquisition module 705, and the task status update module 706 shown in FIG. 7; to avoid repetition, details are not repeated here. The readable storage media provided in this embodiment include non-volatile and volatile readable storage media.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the foregoing embodiments may be implemented by computer-readable instructions that instruct the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional units and modules described above is given only as an example. In practical applications, the functions may be assigned to different functional units or modules as required; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above.
The embodiments described above are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the scope of protection of this application.

Claims (20)

  1. A batch data processing method, characterized by comprising:
    selecting a target batch task from a non-real-time task queue, and creating a data processing queue based on the target batch task, the data processing queue including to-be-processed data and corresponding task statuses;
    determining target idle information from a target idle time queue, the target idle information including a start time, a target idle duration, and an estimated number of threads;
    obtaining the quantity of the to-be-processed data, obtaining a target quantity based on the quantity of the to-be-processed data, the target idle duration, and the estimated number of threads, selecting from the data processing queue the to-be-processed data corresponding to the target quantity, and determining the selected data as segmented processing data;
    obtaining an original system load when the current system time is the start time;
    if the original system load is less than a busy load threshold, determining that the system is idle, obtaining target processing threads corresponding to the estimated number of threads, using the target processing threads to process the segmented processing data within the target idle duration, and obtaining a data processing result;
    updating, based on the data processing result, the task status of each item of the segmented processing data in the data processing queue.
  2. The batch data processing method according to claim 1, characterized in that, after the obtaining of the original system load when the current system time is the start time, the batch data processing method further comprises:
    if the original system load is not less than the busy load threshold, determining that the system is busy, and repeating the determining of target idle information from the target idle time queue.
  3. The batch data processing method according to claim 1, characterized in that, before the selecting of a target batch task from a non-real-time task queue and the creating of a data processing queue based on the target batch task, the batch data processing method further comprises:
    obtaining a task processing request, the task processing request including a to-be-processed task and a task identifier corresponding to the to-be-processed task;
    if the task identifier is a real-time identifier, executing the to-be-processed task;
    if the task identifier is a non-real-time identifier, obtaining the task type in the task processing request, determining the task priority of the to-be-processed task based on the task type, and storing the to-be-processed task in the non-real-time task queue in order of task priority.
  4. The batch data processing method according to claim 1, characterized in that, before the determining of target idle information from the target idle time queue, the batch data processing method further comprises:
    obtaining historical processing data, the historical processing data including a historical processing time, a historical processing quantity, and a historical number of threads;
    performing big data modeling on the historical processing time, the historical processing quantity, and the historical number of threads based on a machine learning algorithm to obtain an original idle time queue, the original idle time queue including at least one item of original idle information, each item of original idle information including a start time, an original idle duration, and an estimated number of threads;
    if the original idle duration is greater than a first duration threshold, storing the original idle information in the target idle time queue.
  5. The batch data processing method according to claim 1, characterized in that the obtaining of a target quantity based on the quantity of the to-be-processed data, the target idle duration, and the estimated number of threads, the selecting from the data processing queue of the to-be-processed data corresponding to the target quantity, and the determining of the selected data as segmented processing data comprise:
    applying an estimated time calculation formula to the quantity of the to-be-processed data and the estimated number of threads to obtain an estimated processing time corresponding to the data processing queue;
    applying a target quantity calculation formula to the estimated processing time and the target idle duration to obtain the target quantity;
    selecting, according to a preset screening rule, the to-be-processed data corresponding to the target quantity from the data processing queue, and determining the selected data as the segmented processing data.
  6. The batch data processing method according to claim 1, characterized in that, after the updating, based on the data processing result, of the task status of each item of the segmented processing data in the data processing queue, the batch data processing method further comprises:
    if the task status of every item of the segmented processing data in the data processing queue has been updated to a completed status, obtaining a remaining duration based on the current system time, the start time, and the target idle duration;
    if the remaining duration is greater than a second duration threshold, updating the target idle duration to the remaining duration, and determining a corresponding processable data amount based on the updated target idle duration and the estimated number of threads;
    selecting from the data processing queue the to-be-processed data corresponding to the processable data amount, updating the selected data as the segmented processing data, obtaining target processing threads corresponding to the estimated number of threads, and repeating the using of the target processing threads to process the segmented processing data within the target idle duration to obtain a data processing result.
  7. The batch data processing method according to claim 1, characterized in that, after the using of the target processing threads to process the segmented processing data within the target idle duration, the batch data processing method further comprises:
    monitoring the current system load in real time during data processing and, if the current system load is greater than a burst load threshold, releasing the target processing threads, stopping the processing of the segmented processing data, and updating the task status of the segmented processing data to a stopped status.
  8. A batch data processing apparatus, characterized by comprising:
    a data processing queue creation module, configured to select a target batch task from a non-real-time task queue and create a data processing queue based on the target batch task, the data processing queue including to-be-processed data and corresponding task statuses;
    a target idle information determination module, configured to determine target idle information from a target idle time queue, the target idle information including a start time, a target idle duration, and an estimated number of threads;
    a segmented processing data determination module, configured to obtain the quantity of the to-be-processed data, obtain a target quantity based on the quantity of the to-be-processed data, the target idle duration, and the estimated number of threads, select from the data processing queue the to-be-processed data corresponding to the target quantity, and determine the selected data as segmented processing data;
    an original system load acquisition module, configured to obtain an original system load when the current system time is the start time;
    a data processing result acquisition module, configured to, if the original system load is less than a busy load threshold, determine that the system is idle, obtain target processing threads corresponding to the estimated number of threads, use the target processing threads to process the segmented processing data within the target idle duration, and obtain a data processing result;
    a task status update module, configured to update, based on the data processing result, the task status of each item of the segmented processing data in the data processing queue.
  9. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, characterized in that the processor implements the following steps when executing the computer-readable instructions:
    selecting a target batch task from a non-real-time task queue, and creating a data processing queue based on the target batch task, the data processing queue including to-be-processed data and corresponding task statuses;
    determining target idle information from a target idle time queue, the target idle information including a start time, a target idle duration, and an estimated number of threads;
    obtaining the quantity of the to-be-processed data, obtaining a target quantity based on the quantity of the to-be-processed data, the target idle duration, and the estimated number of threads, selecting from the data processing queue the to-be-processed data corresponding to the target quantity, and determining the selected data as segmented processing data;
    obtaining an original system load when the current system time is the start time;
    if the original system load is less than a busy load threshold, determining that the system is idle, obtaining target processing threads corresponding to the estimated number of threads, using the target processing threads to process the segmented processing data within the target idle duration, and obtaining a data processing result;
    updating, based on the data processing result, the task status of each item of the segmented processing data in the data processing queue.
  10. The computer device according to claim 9, characterized in that, after the obtaining of the original system load when the current system time is the start time, the processor further implements the following step when executing the computer-readable instructions:
    if the original system load is not less than the busy load threshold, determining that the system is busy, and repeating the determining of target idle information from the target idle time queue.
  11. The computer device according to claim 9, characterized in that, before the selecting of a target batch task from a non-real-time task queue and the creating of a data processing queue based on the target batch task, the processor further implements the following steps when executing the computer-readable instructions:
    obtaining a task processing request, the task processing request including a to-be-processed task and a task identifier corresponding to the to-be-processed task;
    if the task identifier is a real-time identifier, executing the to-be-processed task;
    if the task identifier is a non-real-time identifier, obtaining the task type in the task processing request, determining the task priority of the to-be-processed task based on the task type, and storing the to-be-processed task in the non-real-time task queue in order of task priority.
  12. The computer device according to claim 9, characterized in that, before the determining of target idle information from the target idle time queue, the processor further implements the following steps when executing the computer-readable instructions:
    obtaining historical processing data, the historical processing data including a historical processing time, a historical processing quantity, and a historical number of threads;
    performing big data modeling on the historical processing time, the historical processing quantity, and the historical number of threads based on a machine learning algorithm to obtain an original idle time queue, the original idle time queue including at least one item of original idle information, each item of original idle information including a start time, an original idle duration, and an estimated number of threads;
    if the original idle duration is greater than a first duration threshold, storing the original idle information in the target idle time queue.
  13. The computer device according to claim 9, characterized in that the obtaining of a target quantity based on the quantity of the to-be-processed data, the target idle duration, and the estimated number of threads, the selecting from the data processing queue of the to-be-processed data corresponding to the target quantity, and the determining of the selected data as segmented processing data comprise:
    applying an estimated time calculation formula to the quantity of the to-be-processed data and the estimated number of threads to obtain an estimated processing time corresponding to the data processing queue;
    applying a target quantity calculation formula to the estimated processing time and the target idle duration to obtain the target quantity;
    selecting, according to a preset screening rule, the to-be-processed data corresponding to the target quantity from the data processing queue, and determining the selected data as the segmented processing data.
  14. The computer device according to claim 9, characterized in that, after the updating, based on the data processing result, of the task status of each item of the segmented processing data in the data processing queue, the processor further implements the following steps when executing the computer-readable instructions:
    if the task status of every item of the segmented processing data in the data processing queue has been updated to a completed status, obtaining a remaining duration based on the current system time, the start time, and the target idle duration;
    if the remaining duration is greater than a second duration threshold, updating the target idle duration to the remaining duration, and determining a corresponding processable data amount based on the updated target idle duration and the estimated number of threads;
    selecting from the data processing queue the to-be-processed data corresponding to the processable data amount, updating the selected data as the segmented processing data, obtaining target processing threads corresponding to the estimated number of threads, and repeating the using of the target processing threads to process the segmented processing data within the target idle duration to obtain a data processing result.
  15. One or more readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    selecting a target batch task from a non-real-time task queue, and creating a data processing queue based on the target batch task, the data processing queue including to-be-processed data and a corresponding task status;
    determining target idle information from a target idle time queue, the target idle information including a start time, a target idle duration, and an estimated number of threads;
    obtaining the quantity of the to-be-processed data, obtaining a target quantity based on the quantity of the to-be-processed data, the target idle duration, and the estimated number of threads, selecting to-be-processed data corresponding to the target quantity from the data processing queue, and determining it as segmentation processing data;
    obtaining an original system load when the current system time is the start time;
    if the original system load is less than a busy load threshold, determining that the system is in an idle state, obtaining target processing threads corresponding to the estimated number of threads, and performing data processing on the segmentation processing data using the target processing threads within the target idle duration to obtain a data processing result;
    updating, based on the data processing result, the task status of each piece of segmentation processing data in the data processing queue.
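The steps of claim 15 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the applicant's implementation: the function name `run_idle_batch`, the status strings, the dictionary used as the data processing queue, and the decision to pass the measured load in as a parameter are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical task-status labels; the claims only require that a status exists.
PENDING, DONE, FAILED = "pending", "done", "failed"


def run_idle_batch(queue, split_keys, threads, idle_seconds,
                   current_load, busy_threshold, process):
    """Process the segmentation data inside one idle window.

    queue maps an item key to its task status. Returns False when the
    system is considered busy, in which case nothing is processed.
    """
    if current_load >= busy_threshold:
        return False  # busy: the caller should pick another idle window

    deadline = time.monotonic() + idle_seconds

    def work(key):
        if time.monotonic() >= deadline:
            return key, PENDING  # window closed before this item started
        try:
            process(key)
            return key, DONE
        except Exception:
            return key, FAILED

    # The pool size plays the role of the estimated number of threads.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for key, status in pool.map(work, split_keys):
            queue[key] = status  # update the task status from the result
    return True
```

Items that are not part of `split_keys` keep their previous status, so a later idle window can pick them up.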
  16. The readable storage medium according to claim 15, wherein after obtaining the original system load when the current system time is the start time, the computer-readable instructions, when executed by one or more processors, cause the one or more processors to further perform the following step:
    if the original system load is not less than the busy load threshold, determining that the system is in a busy state, and repeating the step of determining target idle information from the target idle time queue.
  17. The readable storage medium according to claim 15, wherein before selecting the target batch task from the non-real-time task queue and creating the data processing queue based on the target batch task, the computer-readable instructions, when executed by one or more processors, cause the one or more processors to further perform the following steps:
    obtaining a task processing request, the task processing request including a to-be-processed task and a task identifier corresponding to the to-be-processed task;
    if the task identifier is a real-time identifier, executing the to-be-processed task;
    if the task identifier is a non-real-time identifier, obtaining a task type from the task processing request, determining a task priority of the to-be-processed task based on the task type, and storing the to-be-processed task in the non-real-time task queue in order of task priority.
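The routing described in claim 17 could be sketched as follows. This is a hedged illustration: the request layout (`task_id`, `task_type`, `task`), the `TASK_TYPE_PRIORITY` table, and the use of a binary heap as the non-real-time task queue are assumptions introduced here, not details taken from the application.

```python
import heapq
import itertools

# Hypothetical mapping from task type to priority (smaller = more urgent).
TASK_TYPE_PRIORITY = {"settlement": 0, "report": 1, "archive": 2}

# Tie-breaker so tasks with equal priority leave the queue in arrival order.
_counter = itertools.count()


def route_task(request, non_realtime_queue, execute):
    """Run real-time tasks immediately; queue the rest by priority."""
    if request["task_id"] == "realtime":
        return execute(request["task"])
    # Unknown task types get the lowest priority.
    priority = TASK_TYPE_PRIORITY.get(request["task_type"],
                                      len(TASK_TYPE_PRIORITY))
    heapq.heappush(non_realtime_queue,
                   (priority, next(_counter), request["task"]))
    return None
```

Popping from the heap then yields the highest-priority batch task first, which matches the later step of selecting a target batch task from the non-real-time task queue.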
  18. The readable storage medium according to claim 15, wherein before determining the target idle information from the target idle time queue, the computer-readable instructions, when executed by one or more processors, cause the one or more processors to further perform the following steps:
    obtaining historical processing data, the historical processing data including a historical processing time, a historical processing quantity, and a historical number of threads;
    performing big data modeling on the historical processing time, the historical processing quantity, and the historical number of threads based on a machine learning algorithm to obtain an original idle time queue, the original idle time queue including at least one piece of original idle information, each piece of original idle information including a start time, an original idle duration, and an estimated number of threads;
    if the original idle duration is greater than a first duration threshold, storing the original idle information in the target idle time queue.
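The last step of claim 18, filtering the original idle time queue by the first duration threshold, can be sketched as below. The machine learning step is not modeled; the sketch assumes its output is already available as a list. The `IdleInfo` class, the function name, and the ordering by start time are hypothetical choices.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class IdleInfo:
    """One predicted idle window (field names are illustrative)."""
    start_time: datetime
    idle_duration: timedelta
    estimated_threads: int


def build_target_idle_queue(original_queue, first_threshold):
    """Keep windows strictly longer than the threshold.

    Sorting by start time is an assumption made here so the earliest
    usable window is tried first; the claim does not specify an order.
    """
    kept = [info for info in original_queue
            if info.idle_duration > first_threshold]
    return sorted(kept, key=lambda info: info.start_time)
```

Dropping short windows avoids scheduling batch work into gaps that are too brief to amortize thread start-up and queue bookkeeping.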
  19. The readable storage medium according to claim 15, wherein obtaining the target quantity based on the quantity of the to-be-processed data, the target idle duration, and the estimated number of threads, selecting to-be-processed data corresponding to the target quantity from the data processing queue, and determining it as segmentation processing data comprises:
    calculating on the quantity of the to-be-processed data and the estimated number of threads using an estimated time calculation formula to obtain an estimated processing time corresponding to the data processing queue;
    calculating on the estimated processing time and the target idle duration using a target quantity calculation formula to obtain the target quantity;
    selecting, according to a preset filtering rule, the to-be-processed data corresponding to the target quantity from the data processing queue, and determining it as the segmentation processing data.
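Claim 19 names an estimated time calculation formula and a target quantity calculation formula without stating them in this section. The sketch below fills them with simple assumed formulas purely for illustration: estimated time T = N * t / K (N items, t seconds per item on one thread, K threads) and target quantity floor(N * idle / T), capped at N. The per-item time `seconds_per_item` and the first-in-first-out filtering rule are also assumptions.

```python
import math


def estimate_processing_time(pending_count, threads, seconds_per_item):
    """Assumed formula: T = N * t / K, in seconds."""
    return pending_count * seconds_per_item / threads


def target_quantity(pending_count, estimated_time, idle_seconds):
    """Assumed formula: the share of the queue that fits in the idle window."""
    if estimated_time <= idle_seconds:
        return pending_count  # the whole queue fits
    return math.floor(pending_count * idle_seconds / estimated_time)


def select_split_data(queue, quantity):
    """Assumed filtering rule: take items in queue order (FIFO)."""
    return queue[:quantity]
```

For example, with 1000 pending items, 4 threads, and 2 seconds per item, the assumed estimate is 500 seconds; a 120-second idle window then admits 240 items.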
  20. The readable storage medium according to claim 15, wherein after updating the task status of each piece of segmentation processing data in the data processing queue based on the data processing result, the computer-readable instructions, when executed by one or more processors, cause the one or more processors to further perform the following steps:
    if the task status of every piece of segmentation processing data in the data processing queue has been updated to a processing-completed status, obtaining a remaining duration based on the current system time, the start time, and the target idle duration;
    if the remaining duration is greater than a second duration threshold, updating the target idle duration to the remaining duration, and determining a corresponding processable data quantity based on the updated target idle duration and the estimated number of threads;
    selecting, from the data processing queue, to-be-processed data corresponding to the processable data quantity and updating it as the segmentation processing data, obtaining target processing threads corresponding to the estimated number of threads, and repeating the step of performing data processing on the segmentation processing data using the target processing threads within the target idle duration to obtain a data processing result.
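The remaining-duration check in claim 20 can be sketched as follows. The claim says only that the remaining duration is obtained from the current system time, the start time, and the target idle duration; the formula used here, (start time + target idle duration) - current time, is the natural reading but is still an assumption, as are the function names.

```python
from datetime import datetime, timedelta


def remaining_idle(now, start_time, idle_duration):
    """Assumed formula: (start + idle duration) - now, floored at zero."""
    return max(start_time + idle_duration - now, timedelta(0))


def next_round(now, start_time, idle_duration, second_threshold):
    """Return the updated target idle duration, or None to stop.

    A further round is scheduled only when the time left in the window
    is strictly greater than the second duration threshold.
    """
    remaining = remaining_idle(now, start_time, idle_duration)
    return remaining if remaining > second_threshold else None
```

When `next_round` returns a duration, that value would replace the target idle duration and a new processable data quantity would be computed from it; when it returns None, the window is treated as exhausted.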
PCT/CN2019/102672 2019-05-16 2019-08-27 Batch data processing method and apparatus, computer device and storage medium WO2020228177A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910405149.0 2019-05-16
CN201910405149.0A CN110297711B (en) 2019-05-16 2019-05-16 Batch data processing method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2020228177A1 (en) 2020-11-19

Family

ID=68026784

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/102672 WO2020228177A1 (en) 2019-05-16 2019-08-27 Batch data processing method and apparatus, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN110297711B (en)
WO (1) WO2020228177A1 (en)

Families Citing this family (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991127B (en) * 2019-10-17 2021-01-19 广东高云半导体科技股份有限公司 Task execution method and device, computer equipment and storage medium
CN110750693A (en) * 2019-10-21 2020-02-04 北京百度网讯科技有限公司 Data processing method, device, equipment and medium
CN110764916B (en) * 2019-10-30 2022-06-03 北京声智科技有限公司 Information processing method, device, storage medium and equipment
CN110765297A (en) * 2019-11-01 2020-02-07 广东三维家信息科技有限公司 Picture data management method and device and electronic equipment
CN110928711A (en) * 2019-11-26 2020-03-27 多点(深圳)数字科技有限公司 Task processing method, device, system, server and storage medium
CN111078733B (en) * 2019-11-26 2024-02-09 金蝶软件(中国)有限公司 Batch task processing method, device, computer equipment and storage medium
CN113127185B (en) * 2019-12-31 2023-11-10 北京懿医云科技有限公司 Task execution queue processing method and device, storage medium and electronic equipment
CN111338787B (en) * 2020-02-04 2023-09-01 浙江大华技术股份有限公司 Data processing method and device, storage medium and electronic device
CN111343652B (en) * 2020-02-18 2021-08-13 腾讯科技(深圳)有限公司 Data processing method and related equipment
CN111352988B (en) * 2020-02-29 2023-05-23 重庆百事得大牛机器人有限公司 Big data warehouse storage, analysis and extraction system aiming at legal information
CN111506410B (en) * 2020-04-21 2023-05-12 北京思特奇信息技术股份有限公司 Background batch processing business optimization method, system and storage medium
CN111897631B (en) * 2020-07-15 2022-08-30 上海携旅信息技术有限公司 Batch process based model inference system, method, electronic device and medium
CN111737010B (en) * 2020-07-30 2024-02-02 腾讯科技(深圳)有限公司 Task processing method and device, graphic task processing system and storage medium
CN112035481B (en) * 2020-08-31 2023-10-27 中国平安财产保险股份有限公司 Data processing method, device, computer equipment and storage medium
CN111949424A (en) * 2020-09-18 2020-11-17 成都精灵云科技有限公司 Method for realizing queue for processing declarative events
CN112199441B (en) * 2020-09-28 2023-11-24 中国平安人寿保险股份有限公司 Data synchronous processing method, device, equipment and medium based on big data platform
CN112099958B (en) * 2020-11-17 2021-03-02 深圳壹账通智能科技有限公司 Distributed multi-task management method and device, computer equipment and storage medium
CN112804295A (en) * 2020-12-24 2021-05-14 宝能(广州)汽车研究院有限公司 Vehicle OTA server, load balancing method thereof and computer-readable storage medium
CN112835867A (en) * 2021-01-11 2021-05-25 中国农业银行股份有限公司 Data preprocessing method and device
CN112799945B (en) * 2021-01-29 2024-03-15 中国工商银行股份有限公司 Batch file verification method and device
CN113076181B (en) * 2021-03-04 2023-09-26 山东英信计算机技术有限公司 Data processing flow optimization method, system and storage medium
CN113111240A (en) * 2021-04-20 2021-07-13 康键信息技术(深圳)有限公司 Log monitoring method and device, electronic equipment and readable storage medium
CN113194039B (en) * 2021-04-23 2023-01-31 京东科技信息技术有限公司 Method and device for segmenting system data flow, electronic equipment and storage medium
CN113205130B (en) * 2021-04-28 2023-05-02 五八有限公司 Data auditing method and device, electronic equipment and storage medium
CN113268328A (en) * 2021-05-26 2021-08-17 平安国际融资租赁有限公司 Batch processing method and device, computer equipment and storage medium
CN113254176B (en) * 2021-05-28 2023-02-07 平安普惠企业管理有限公司 Project management method and device, computer equipment and storage medium
CN113377501A (en) * 2021-06-08 2021-09-10 中国农业银行股份有限公司 Data processing method, apparatus, device, medium, and program product
CN113391896B (en) * 2021-06-15 2023-09-22 北京京东振世信息技术有限公司 Task processing method and device, storage medium and electronic equipment
CN113391857A (en) * 2021-07-12 2021-09-14 上海哔哩哔哩科技有限公司 Instruction processing method and device
CN113971552B (en) * 2021-10-26 2022-10-14 中电金信软件有限公司 Batch data processing method, device, equipment and storage medium
CN113992684B (en) * 2021-10-26 2022-10-28 中电金信软件有限公司 Method, device, processing node, storage medium and system for sending data
CN115118768A (en) * 2022-06-27 2022-09-27 平安壹钱包电子商务有限公司 Task distribution method and device, storage medium and electronic equipment
CN115277595B (en) * 2022-07-26 2023-04-25 深圳证券通信有限公司 Data transmission method and related device
CN116483544B (en) * 2023-06-15 2023-09-19 阿里健康科技(杭州)有限公司 Task processing method, device, computer equipment and storage medium
CN116775255B (en) * 2023-08-15 2023-11-21 长沙伊士格信息科技有限责任公司 Global integration system supporting wide integration scene
CN116795453B (en) * 2023-08-28 2023-11-03 成都中科合迅科技有限公司 Multi-CPU architecture call control method and system for application program
CN117056085B (en) * 2023-10-11 2023-12-22 深圳安天网络安全技术有限公司 Load balancing method, device and safety protection system
CN117573907B (en) * 2024-01-16 2024-04-26 北京航空航天大学杭州创新研究院 Mobile robot data storage method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101882161A (en) * 2010-06-23 2010-11-10 中国工商银行股份有限公司 Application level asynchronous task scheduling system and method
CN102393822A (en) * 2011-11-30 2012-03-28 中国工商银行股份有限公司 Batch scheduling system and method
US20170083380A1 (en) * 2015-09-18 2017-03-23 Salesforce.Com, Inc. Managing resource allocation in a stream processing framework

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8694994B1 (en) * 2011-09-07 2014-04-08 Amazon Technologies, Inc. Optimization of packet processing by delaying a processor from entering an idle state
CN102591721A (en) * 2011-12-30 2012-07-18 北京新媒传信科技有限公司 Method and system for distributing thread execution task
CN105389209B (en) * 2015-12-25 2019-04-26 中国建设银行股份有限公司 A kind of asynchronous batch tasks processing method and system
CN106547612B (en) * 2016-10-18 2020-10-20 深圳怡化电脑股份有限公司 Multitasking method and device
CN109726006B (en) * 2017-10-27 2023-06-06 伊姆西Ip控股有限责任公司 Method, apparatus and computer storage medium for controlling a storage system
CN107918864B (en) * 2017-11-23 2021-06-04 平安科技(深圳)有限公司 Electronic insurance policy generation method and device, computer equipment and storage medium
CN108536532B (en) * 2018-04-23 2021-06-22 中国农业银行股份有限公司 Batch task processing method and system
CN109492024A (en) * 2018-10-26 2019-03-19 平安科技(深圳)有限公司 Data processing method, device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIANG, LIFANG: "On-line and Batches", FINANCIAL TECHNOLOGY TIME, 31 May 2012 (2012-05-31), pages 39 - 42, XP009524236, ISSN: 2095-0799 *

Also Published As

Publication number Publication date
CN110297711B (en) 2024-01-19
CN110297711A (en) 2019-10-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19928683

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19928683

Country of ref document: EP

Kind code of ref document: A1