CN116303533A - Task processing method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN116303533A
Authority
CN
China
Prior art keywords
processing task
processing
task
data
service data
Prior art date
Legal status
Pending
Application number
CN202310185342.4A
Other languages
Chinese (zh)
Inventor
赵艳杰
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202310185342.4A
Publication of CN116303533A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the invention provide a task processing method and device, an electronic device, and a storage medium, relating to the field of computer technology. The method includes: acquiring a first processing task for first service data, where the first service data includes service data of a plurality of objects; determining, from the objects and based on the data volume of each object's service data, a target object whose service data volume is greater than a preset threshold; extracting second service data of the target object from the first service data, and determining third service data, namely the first service data other than the second service data; generating a second processing task for the second service data and a third processing task for the third service data; executing the second processing task and the third processing task separately to obtain their respective processing results; and combining the processing results of the second and third processing tasks to obtain the processing result of the first processing task. This reduces the probability that the first processing task fails to execute.

Description

Task processing method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a task processing method, a task processing device, an electronic device, and a storage medium.
Background
Spark is an engine for data processing. In the related art, when analyzing the service data generated by a service, a technician develops a Spark task for that service data and executes it to obtain a processing result, for example by filtering and aggregating the service data. The technician can then upgrade and optimize the service based on the processing result.
However, the service data includes data generated by a plurality of objects that use the service, and each object generates a different volume of data. If, during execution of the Spark task, the service data of a single object is so large that it far exceeds that of the other objects, data skew may occur and cause the Spark task to fail.
Disclosure of Invention
An objective of the embodiments of the invention is to provide a task processing method and device, an electronic device, and a storage medium that avoid failures of a first processing task caused by data skew, thereby reducing the probability that the first processing task fails. The specific technical solution is as follows:
In a first aspect of the present invention, there is provided a task processing method, including:
acquiring a processing task for first service data as a first processing task, where the first service data includes service data of a plurality of objects;
determining, from the objects and based on the data volume of each object's service data, an object whose service data volume is greater than a preset threshold as a target object, where the preset threshold is determined based on the data volumes of the objects' service data;
extracting the service data of the target object from the first service data as second service data, and determining the remaining service data in the first service data, other than the second service data, as third service data;
generating a processing task for the second service data as a second processing task, and generating a processing task for the third service data as a third processing task;
executing the second processing task and the third processing task separately to obtain a processing result of the second processing task and a processing result of the third processing task; and
combining the processing result of the second processing task and the processing result of the third processing task to obtain the processing result of the first processing task.
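The steps of the first aspect can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: records are hypothetical `(object_id, value)` pairs, an object's data volume is approximated by its record count, and the processing task is a per-object sum standing in for whatever filtering or aggregation the real Spark task performs.

```python
from collections import Counter, defaultdict

# Hypothetical record shape: (object_id, value). The description fixes no schema.
Record = tuple[str, int]


def process(records: list[Record]) -> dict[str, int]:
    """Stand-in processing task: sum the values of each object."""
    totals: dict[str, int] = defaultdict(int)
    for obj, value in records:
        totals[obj] += value
    return dict(totals)


def combine(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Merge two partial results. The key sets are disjoint here because
    each object's records land entirely in one of the two splits."""
    merged = dict(a)
    for obj, value in b.items():
        merged[obj] = merged.get(obj, 0) + value
    return merged


def run_first_task(first_data: list[Record], threshold: int) -> dict[str, int]:
    # Data volume per object, measured as a record count (an assumption).
    volume = Counter(obj for obj, _ in first_data)
    targets = {obj for obj, n in volume.items() if n > threshold}
    second = [r for r in first_data if r[0] in targets]      # target objects
    third = [r for r in first_data if r[0] not in targets]   # all other objects
    # Execute the second and third processing tasks separately, then combine.
    return combine(process(second), process(third))
```

Because every record of an object goes to exactly one of the two tasks, a per-object aggregation can be combined by merging the two result maps, and the combined result matches what the unsplit task would produce.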
Optionally, before determining the target object based on the data volume of each object's service data, the method further includes:
obtaining the parameter values of the key field from the Structured Query Language (SQL) execution plan of the first processing task, to obtain the object identifier of each object; and
for each object, obtaining, from the SQL execution plan of the first processing task, the parameter value of the Number Rows field and the parameter value of the Data size field that correspond to the object's identifier, to obtain the data volume of the object's service data.
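The description names the key, Number Rows and Data size fields but does not show the plan text itself. The sketch below assumes a hypothetical one-line-per-object plan excerpt (`PLAN`) and reads the three fields with a regular expression; real execution plans differ by engine and version, so the pattern is illustrative only and the byte unit for Data size is an assumption.

```python
import re

# Hypothetical plan excerpt. Only the field names come from the description;
# the layout is invented for illustration.
PLAN = """
key: user_001 Number Rows: 120 Data size: 4096
key: user_002 Number Rows: 95000 Data size: 3145728
key: user_003 Number Rows: 80 Data size: 2560
"""

LINE = re.compile(r"key:\s*(\S+)\s+Number Rows:\s*(\d+)\s+Data size:\s*(\d+)")


def object_volumes(plan_text: str) -> dict[str, tuple[int, int]]:
    """Map each object identifier to (row count, data size in bytes)."""
    volumes = {}
    for match in LINE.finditer(plan_text):
        obj = match.group(1)
        volumes[obj] = (int(match.group(2)), int(match.group(3)))
    return volumes
```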
Optionally, before determining the target object based on the data volume of each object's service data, the method further includes:
sorting the objects' service data in ascending order of data volume (from small to large) to obtain a target sorting result; and
determining the data volume of the service data at a designated position in the target sorting result as the preset threshold.
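A minimal sketch of the threshold rule, assuming the designated position is expressed as a fraction of the sorted list. The description leaves the position open, so `position`, its default, and both function names are hypothetical.

```python
def preset_threshold(volumes: dict[str, int], position: float = 0.9) -> int:
    """Sort per-object data volumes from small to large and return the value
    at the designated position, given here as a fraction of the list length."""
    if not volumes:
        raise ValueError("no objects to derive a threshold from")
    ordered = sorted(volumes.values())
    index = min(int(position * len(ordered)), len(ordered) - 1)
    return ordered[index]


def target_objects(volumes: dict[str, int], threshold: int) -> list[str]:
    """Objects whose data volume is strictly greater than the threshold."""
    return [obj for obj, size in volumes.items() if size > threshold]
```

For example, with volumes `{"a": 10, "b": 12, "c": 15, "d": 900}` and `position=0.5`, the threshold is 15 and only `"d"` is a target object. Deriving the threshold from the sorted volumes lets it adapt to each task's data; the chosen position controls how many objects are split out.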
Optionally, after acquiring the processing task for the first service data as the first processing task, the method further includes:
acquiring, from the configuration information of the first processing task, the memory configured for the Executor process that executes the first processing task and the memory configured for the Driver process that executes the first processing task;
calculating the sum of the Executor-process memory and the Driver-process memory to obtain the memory used when executing the first processing task; and
if the data volume of the first service data is greater than the memory used when executing the first processing task, determining that a memory overflow occurs when executing the first processing task.
The method further includes:
if none of the objects is a target object whose service data volume is greater than the preset threshold, dividing the first processing task into a plurality of sub-processing tasks, where the data volume of the service data processed by each sub-processing task is not greater than the memory used when executing the first processing task; and
executing each sub-processing task of the first processing task to obtain its processing result, and combining the processing results of all the sub-processing tasks to obtain the processing result of the first processing task.
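The overflow check and the division into sub-processing tasks can be sketched as below. Sizes are assumed to be in bytes, the function names are hypothetical, and the equal-record-size assumption used to size the chunks is ours rather than the description's; real service data would need a per-record size estimate.

```python
def task_memory(executor_memory: int, driver_memory: int) -> int:
    """Memory used when executing a task: Executor memory + Driver memory."""
    return executor_memory + driver_memory


def overflows(data_size: int, memory: int) -> bool:
    """The overflow rule described above: data volume greater than task memory."""
    return data_size > memory


def split_into_subtasks(records: list, data_size: int, memory: int) -> list[list]:
    """Divide records into contiguous sub-tasks whose estimated size does not
    exceed `memory`, assuming every record has size data_size / len(records).
    A single record larger than `memory` still forms its own sub-task."""
    if not records:
        return []
    if not overflows(data_size, memory):
        return [records]
    # Largest whole number of records that fits in `memory`, but at least one.
    per_chunk = max(1, memory * len(records) // data_size)
    return [records[i:i + per_chunk] for i in range(0, len(records), per_chunk)]
```

For instance, 10 records totalling 100 bytes against 34 bytes of task memory yield sub-tasks of 3, 3, 3 and 1 records, each at most 30 bytes.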
Optionally, before executing the second processing task and the third processing task separately to obtain their processing results, the method further includes:
acquiring, from the configuration information of the second processing task, the memory configured for the Executor process and the memory configured for the Driver process that execute the second processing task, and acquiring, from the configuration information of the third processing task, the memory configured for the Executor process and the memory configured for the Driver process that execute the third processing task;
calculating the sum of the Executor-process memory and the Driver-process memory of the second processing task to obtain the memory used when executing the second processing task, and calculating the corresponding sum for the third processing task to obtain the memory used when executing the third processing task;
if the data volume of the second service data is greater than the memory used when executing the second processing task, determining that a memory overflow occurs when executing the second processing task, and dividing the second processing task into a plurality of sub-processing tasks, where the data volume of the service data processed by each sub-processing task is not greater than the memory used when executing the second processing task; and
if the data volume of the third service data is greater than the memory used when executing the third processing task, determining that a memory overflow occurs when executing the third processing task, and dividing the third processing task into a plurality of sub-processing tasks, where the data volume of the service data processed by each sub-processing task is not greater than the memory used when executing the third processing task.
Executing the second processing task and the third processing task separately to obtain their processing results then includes:
executing each sub-processing task of the second processing task to obtain its processing result, and combining the processing results of all the sub-processing tasks of the second processing task to obtain the processing result of the second processing task; and
executing each sub-processing task of the third processing task to obtain its processing result, and combining the processing results of all the sub-processing tasks of the third processing task to obtain the processing result of the third processing task.
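Combining sub-task results is straightforward when the processing task is a decomposable aggregation. The sketch below assumes the task counts records per object identifier (a hypothetical stand-in for the real Spark logic) and uses `collections.Counter` addition as the combining step; the function names are illustrative.

```python
from collections import Counter


def run_subtasks(chunks: list[list[str]]) -> Counter:
    """Execute each sub-processing task (here: count records per object id)
    and combine the partial results by adding the counters together."""
    combined: Counter = Counter()
    for chunk in chunks:
        combined += Counter(chunk)  # partial result of one sub-processing task
    return combined


def combine_first_result(second_chunks: list[list[str]],
                         third_chunks: list[list[str]]) -> Counter:
    """The second and third processing tasks are each split and combined on
    their own; their results are then combined into the first task's result."""
    return run_subtasks(second_chunks) + run_subtasks(third_chunks)
```

Addition is a valid combining step for counts and sums. Aggregations such as averages need their own merge logic, for example carrying sums and counts separately and dividing only after combining.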
Optionally, acquiring the processing task for the first service data as the first processing task includes:
acquiring a processing task for the service data of a target service as a fourth processing task;
acquiring, from the configuration information of the fourth processing task, the memory configured for the Executor process that executes the fourth processing task and the memory configured for the Driver process that executes the fourth processing task;
calculating the sum of these two memories to obtain the memory used when executing the fourth processing task;
if the data volume of the service data of the target service is greater than the memory used when executing the fourth processing task, determining that a memory overflow occurs when executing the fourth processing task; and
dividing the fourth processing task into a plurality of first processing tasks, where the data volume of the first service data processed by each first processing task is not greater than the memory used when executing the fourth processing task.
Optionally, after combining the processing result of the second processing task and the processing result of the third processing task to obtain the processing result of the first processing task, the method further includes:
combining the processing results of the plurality of first processing tasks to obtain the processing result of the fourth processing task for the service data of the target service.
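The fourth-task flow applies the same memory rule one level higher. The sketch below assumes the Executor and Driver memories come from the Spark properties `spark.executor.memory` and `spark.driver.memory` (the description only says "configuration information"), parses JVM-style strings such as `4g`, and greedily groups hypothetical partition sizes into first processing tasks. The grouping strategy and all function names are illustrative; the description does not say how the fourth task is divided.

```python
import re

_UNIT = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_memory(value: str) -> int:
    """Convert a memory string such as '512m' or '4g' to bytes.
    This sketch requires an explicit k/m/g/t suffix."""
    match = re.fullmatch(r"(\d+)([kmgt])b?", value.strip().lower())
    if match is None:
        raise ValueError(f"unsupported memory string: {value!r}")
    return int(match.group(1)) * _UNIT[match.group(2)]


def memory_used(conf: dict[str, str]) -> int:
    """Executor memory + Driver memory for one processing task."""
    return (parse_memory(conf["spark.executor.memory"])
            + parse_memory(conf["spark.driver.memory"]))


def divide_fourth_task(partition_sizes: list[int], memory: int) -> list[list[int]]:
    """Greedily group consecutive partition sizes into first processing tasks
    whose total does not exceed `memory`; an oversized partition stands alone."""
    tasks: list[list[int]] = []
    current: list[int] = []
    current_size = 0
    for size in partition_sizes:
        if current and current_size + size > memory:
            tasks.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        tasks.append(current)
    return tasks
```

Each group then becomes a first processing task that goes through the skew check, and the results of all first processing tasks are combined into the result of the fourth processing task.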
In a second aspect of the present invention, a task processing device is further provided, the device including:
a task acquisition module, configured to acquire a processing task for first service data as a first processing task, where the first service data includes service data of a plurality of objects;
an object determining module, configured to determine, from the objects and based on the data volume of each object's service data, an object whose service data volume is greater than a preset threshold as a target object, where the preset threshold is determined based on the data volumes of the objects' service data;
a data determining module, configured to extract the service data of the target object from the first service data as second service data, and to determine the remaining service data in the first service data, other than the second service data, as third service data;
a task generating module, configured to generate a processing task for the second service data as a second processing task, and a processing task for the third service data as a third processing task;
a first task execution module, configured to execute the second processing task and the third processing task separately to obtain a processing result of the second processing task and a processing result of the third processing task; and
a first processing result acquisition module, configured to combine the processing result of the second processing task and the processing result of the third processing task to obtain the processing result of the first processing task.
Optionally, the device further includes:
an object identifier acquisition module, configured to, before the object determining module determines the target object, obtain the parameter values of the key field from the Structured Query Language (SQL) execution plan of the first processing task, to obtain the object identifier of each object; and
a data volume acquisition module, configured to, for each object, obtain from the SQL execution plan of the first processing task the parameter value of the Number Rows field and the parameter value of the Data size field that correspond to the object's identifier, to obtain the data volume of the object's service data.
Optionally, the device further includes:
a sorting module, configured to, before the object determining module determines the target object, sort the objects' service data in ascending order of data volume to obtain a target sorting result; and
a preset threshold determining module, configured to determine the data volume of the service data at a designated position in the target sorting result as the preset threshold.
Optionally, the device further includes:
a first memory acquisition module, configured to, after the task acquisition module acquires the processing task for the first service data as the first processing task, acquire from the configuration information of the first processing task the memory configured for the Executor process that executes the first processing task and the memory configured for the Driver process that executes the first processing task;
a first memory determining module, configured to calculate the sum of the Executor-process memory and the Driver-process memory to obtain the memory used when executing the first processing task; and
a first memory overflow determining module, configured to determine that a memory overflow occurs when executing the first processing task if the data volume of the first service data is greater than the memory used when executing the first processing task.
The device further includes:
a task dividing module, configured to, if none of the objects is a target object whose service data volume is greater than the preset threshold, divide the first processing task into a plurality of sub-processing tasks, where the data volume of the service data processed by each sub-processing task is not greater than the memory used when executing the first processing task; and
a second task execution module, configured to execute each sub-processing task of the first processing task to obtain its processing result, and to combine the processing results of all the sub-processing tasks to obtain the processing result of the first processing task.
Optionally, the device further includes:
a second memory acquisition module, configured to, before the first task execution module executes the second processing task and the third processing task, acquire from the configuration information of the second processing task the memory configured for its Executor process and the memory configured for its Driver process, and acquire from the configuration information of the third processing task the memory configured for its Executor process and the memory configured for its Driver process;
a second memory determining module, configured to calculate the sum of the Executor-process memory and the Driver-process memory of the second processing task to obtain the memory used when executing the second processing task, and the corresponding sum for the third processing task to obtain the memory used when executing the third processing task;
a second memory overflow determining module, configured to, if the data volume of the second service data is greater than the memory used when executing the second processing task, determine that a memory overflow occurs when executing the second processing task and divide the second processing task into a plurality of sub-processing tasks, where the data volume of the service data processed by each sub-processing task is not greater than the memory used when executing the second processing task; and
a third memory overflow determining module, configured to, if the data volume of the third service data is greater than the memory used when executing the third processing task, determine that a memory overflow occurs when executing the third processing task and divide the third processing task into a plurality of sub-processing tasks, where the data volume of the service data processed by each sub-processing task is not greater than the memory used when executing the third processing task.
The first task execution module is specifically configured to execute each sub-processing task of the second processing task and combine their processing results to obtain the processing result of the second processing task,
and to execute each sub-processing task of the third processing task and combine their processing results to obtain the processing result of the third processing task.
Optionally, the task acquisition module is specifically configured to acquire a processing task for the service data of a target service as a fourth processing task;
acquire, from the configuration information of the fourth processing task, the memory configured for the Executor process that executes the fourth processing task and the memory configured for the Driver process that executes the fourth processing task;
calculate the sum of these two memories to obtain the memory used when executing the fourth processing task;
if the data volume of the service data of the target service is greater than the memory used when executing the fourth processing task, determine that a memory overflow occurs when executing the fourth processing task; and
divide the fourth processing task into a plurality of first processing tasks, where the data volume of the first service data processed by each first processing task is not greater than the memory used when executing the fourth processing task.
Optionally, the device further includes:
a second processing result acquisition module, configured to, after the first processing result acquisition module combines the processing results of the second and third processing tasks to obtain the processing result of the first processing task, combine the processing results of the first processing tasks to obtain the processing result of the fourth processing task for the service data of the target service.
In yet another aspect of the present invention, an electronic device is further provided, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program; and
the processor is configured to implement the steps of any of the task processing methods described above when executing the program stored in the memory.
In yet another aspect of the present invention, there is also provided a computer readable storage medium having stored therein a computer program which, when executed by a processor, implements any of the task processing methods described above.
In yet another aspect of the invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the task processing methods described above.
The embodiments of the invention provide a task processing method and device, an electronic device, and a storage medium. A processing task for first service data, which includes the service data of a plurality of objects, is acquired as a first processing task. Based on the data volume of each object's service data, an object whose service data volume is greater than a preset threshold is determined as a target object, where the preset threshold is determined based on the data volumes of the objects' service data. The service data of the target object is extracted from the first service data as second service data, and the remaining service data in the first service data is determined as third service data. A second processing task is generated for the second service data and a third processing task for the third service data; the two tasks are executed separately, and their processing results are combined to obtain the processing result of the first processing task.
With this processing, before the first processing task is executed, a target object, that is, an object with data skew, whose service data volume is greater than the preset threshold can be determined from the objects based on the data volume of each object's service data. The second service data of the target object is then extracted from the first service data and a second processing task is generated for it, while the remaining third service data receives its own third processing task. In other words, an object with a large volume of service data is processed on its own. This prevents the data skew that arises when a single object's service data far exceeds that of the other objects, avoids execution failures of the first processing task caused by such skew, and reduces the probability that the first processing task fails.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flowchart of a first task processing method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a second task processing method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a third task processing method according to an embodiment of the present invention;
FIG. 4 is a flowchart of a fourth task processing method according to an embodiment of the present invention;
FIG. 5 is a flowchart of a fifth task processing method according to an embodiment of the present invention;
FIG. 6 is a block diagram of a task processing device according to an embodiment of the present invention;
FIG. 7 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention.
In the related art, the service data includes data generated by a plurality of objects that use the service, and each object generates a different volume of data. If, during execution of a Spark task, the service data of a single object is so large that it far exceeds that of the other objects, data skew may occur and the Spark task may fail.
To solve the above problem, an embodiment of the present invention provides a task processing method applied to an electronic device, which may be a server. The electronic device acquires a first processing task for first service data that includes the service data of a plurality of objects, and determines, based on the data volume of each object's service data, a target object whose service data volume is greater than a preset threshold. The electronic device then extracts the second service data of the target object from the first service data, determines the third service data (the first service data other than the second service data), generates a second processing task for the second service data, and generates a third processing task for the third service data. Finally, the electronic device executes the second and third processing tasks separately to obtain their processing results, and combines these results to obtain the processing result of the first processing task.
With this processing, before the first processing task is executed, a target object whose service data volume is greater than the preset threshold can be determined from the objects. Because the preset threshold is derived from the data volumes of all the objects' service data, exceeding it indicates that the target object's service data volume is larger than that of the other objects; that is, the target object is an object with data skew. Extracting the target object's second service data into a separate second processing task, and placing the remaining third service data in a third processing task, lets the object with a large data volume be processed on its own. This avoids the data skew caused by a single object's service data far exceeding that of the other objects, avoids the resulting execution failure of the first processing task, and reduces the probability that the first processing task fails.
Referring to fig. 1, fig. 1 is a flowchart of a task processing method according to an embodiment of the present invention, where the method may include the following steps:
S101: Acquire a processing task for the first service data as a first processing task.
The first service data comprises service data of a plurality of objects.
S102: Based on the data amount of the service data of each object, determine, from the objects, an object whose data amount of service data is greater than a preset threshold as a target object.
The preset threshold is determined based on the data amounts of the service data of the objects.
S103: Extract the service data of the target object from the first service data as second service data, and determine the service data other than the second service data in the first service data as third service data.
S104: Generate a processing task for the second service data as a second processing task, and generate a processing task for the third service data as a third processing task.
S105: Execute the second processing task and the third processing task separately to obtain a processing result of the second processing task and a processing result of the third processing task.
S106: Combine the processing result of the second processing task and the processing result of the third processing task to obtain the processing result of the first processing task.
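Steps S101 to S106 can be illustrated with a minimal single-process sketch. It assumes that an object's data amount is its record count, that the preset threshold is the mean of those counts (one of the options described later), and that the "processing task" is a hypothetical per-object count; none of these choices is prescribed by the method, and a real deployment would run Spark jobs instead of Python functions.

```python
from collections import defaultdict
from statistics import mean


def count_per_object(records, key):
    """Hypothetical stand-in for a processing task: count records per object."""
    counts = defaultdict(int)
    for rec in records:
        counts[rec[key]] += 1
    return dict(counts)


def process_first_task(records, key="user"):
    """Sketch of S101-S106: split out skewed objects, run two tasks, merge."""
    # S101/S102: data amount per object, approximated here by record count.
    sizes = count_per_object(records, key)
    threshold = mean(list(sizes.values()))  # assumed preset threshold: the mean
    targets = {obj for obj, n in sizes.items() if n > threshold}

    # S103: second service data (target objects) and third service data (the rest).
    second = [r for r in records if r[key] in targets]
    third = [r for r in records if r[key] not in targets]

    # S104/S105: each subset gets its own processing task.
    result_second = count_per_object(second, key)
    result_third = count_per_object(third, key)

    # S106: the two subsets share no objects, so merging is a plain union.
    merged = {**result_third, **result_second}
    return merged, targets


data = [{"user": "a"}] * 3 + [{"user": "b"}] * 4 + [{"user": "c"}] * 20
result, skewed = process_first_task(data)
print(skewed)  # {'c'}
```

Because the target objects and the other objects are disjoint, the merge in S106 needs no conflict resolution in this sketch.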
In the task processing method provided by this embodiment, before the first processing task is executed, the target object whose data amount of service data is greater than the preset threshold is determined from the objects corresponding to the first processing task. Because the preset threshold is determined from the data amounts of the objects' service data, the target object holds more service data than the other objects and is therefore the object with data skew. Extracting its second service data into a separate second processing task, and handling the remaining third service data with a third processing task, lets the object with a large data amount be processed on its own. This avoids data skew caused by one object's service data far exceeding that of the others, prevents the first processing task from failing because of that skew, and reduces the probability that the first processing task fails.
For step S101, the first service data is data generated while a target service runs. For example, if the target service is a video service and users upload videos to the electronic device while using it, the first service data may include the running log of the target service, service information of the target service, and so on. The service information includes user information of the users who upload videos, the video upload records, the uploaded videos themselves, and the like.
To upgrade and optimize the target service, data analysis may be performed on the first service data it generates. For example, the users' upload behavior may be analyzed from the video upload records, and a technician may then upgrade and optimize the service based on the result of that behavior analysis.
The first processing task may be a Spark task that analyzes the first service data. The first service data includes service data of a plurality of objects related to the target service. For example, if the first processing task analyzes, for each user, that user's video upload behavior, the objects are users; if it analyzes, for each video, the upload behavior of the users for that video, the objects are videos; and if it analyzes, for each user grade, the video upload behavior of the users of that grade, the objects are user grades.
For step S102, the service data of an object is the portion of the first service data related to that object, and the data amount of an object's service data represents the size of that service data.
For example, when the target service is a video service and the object is a user, the service data of a user may include the user's information, the records of videos uploaded by the user, and the like. When the object is a video, the service data of a video may include the information of the user who uploaded it, and the like. When the object is a user grade, the service data of a user grade may include the information of each user of that grade, the records of videos uploaded by each user of that grade, and the like.
In some embodiments, based on fig. 1, referring to fig. 2, before step S102, the method may further include the steps of:
S107: Acquire the parameter values of the key field from the SQL execution plan of the first processing task to obtain the object identifier of each object.
S108: For each object, acquire, from the SQL execution plan of the first processing task, the parameter value of the Number Rows field and the parameter value of the Data size field that correspond to the object's identifier, to obtain the data amount of the object's service data.
The SQL (Structured Query Language) execution plan of the first processing task describes the execution rules of the first processing task's SQL statements, for example, the order in which the statements are executed, which statements use an index, and which statements perform a full table scan. The SQL statements of the first processing task are the statements, written in SQL, that process the first service data.
The electronic device can acquire the SQL statements of the first processing task and process them with the EXPLAIN function to obtain the SQL execution plan of the first processing task. It then acquires the parameter values of the key field from that plan; these values are the object identifiers of the objects that the first processing task needs to process.
For example, if the first processing task analyzes each user's video upload behavior, the objects are users, and a user's identifier may be the user's number, name, mobile phone number, or the like. If the first processing task analyzes, for each video, the users' upload behavior for that video, the objects are videos, and a video's identifier may be its number, name, or the like.
Next, for each object, the electronic device acquires, from the Output part of the SQL execution plan of the first processing task, the parameter value of the Number Rows field corresponding to the object's identifier; this value represents the number of rows in the data table that stores the object's service data. The electronic device also acquires, from the SQL execution plan, the parameter value of the Data Size field corresponding to the object's identifier; this value represents the number of columns in each row of that data table. The data amount of the object's service data is then obtained from the values of the Number Rows and Data Size fields.
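The field extraction in S107 and S108 might look like the sketch below. The plan-line format (`key=... Number Rows: ... Data size: ...`) is hypothetical and only illustrates the idea, since real EXPLAIN output differs between engines and versions. The rule "data amount = Number Rows × Data size" is likewise an assumption; the text above only says the amount is obtained from both values.

```python
import re

# Hypothetical plan-line format for illustration only; real EXPLAIN output
# varies by engine and version, so the pattern must be adapted.
STATS_LINE = re.compile(
    r"key=(?P<key>\S+).*?Number Rows:\s*(?P<rows>\d+).*?Data size:\s*(?P<size>\d+)"
)


def data_amount_per_object(plan_text):
    """Map each object identifier (key field value) to its data amount."""
    amounts = {}
    for line in plan_text.splitlines():
        match = STATS_LINE.search(line)
        if match is None:
            continue  # line carries no per-object statistics
        rows = int(match.group("rows"))
        size = int(match.group("size"))
        # Assumption: data amount = Number Rows x Data size.
        amounts[match.group("key")] = rows * size
    return amounts


plan = """Output: key=u1 Number Rows: 9 Data size: 4
Output: key=u3 Number Rows: 500 Data size: 4
Filter Operator"""
print(data_amount_per_object(plan))  # {'u1': 36, 'u3': 2000}
```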
The preset threshold is determined from the data amounts of the objects' service data; for example, it may be the average of those data amounts.
In some embodiments, prior to step S102, the method may further comprise the steps of:
Sort the objects' service data in ascending order of data amount to obtain a target sorting result, and determine the data amount of the service data at a specified position of the target sorting result as the preset threshold.
The specified position may be set empirically by a technician, for example, the 3/4 position or the 1/2 position, without being limited thereto. When the specified position is the 3/4 position, the data amount at the upper quartile (the 3/4 point) of the ascending order is used as the preset threshold; when it is the 1/2 position, the data amount at the median is used as the preset threshold.
The electronic device can then determine, from the objects, each object whose data amount of service data is greater than the preset threshold as a target object, that is, an object with data skew.
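Both threshold rules (the average, and the value at a specified position of the ascending order) can be sketched as below. The index convention `floor(fraction * (n - 1))` for turning "3/4 position" into a list index is an assumption, since the text names the positions but not the exact rounding rule.

```python
import math
from statistics import mean


def threshold_at_position(amounts, fraction):
    """Sort ascending and return the data amount at the given fractional position.

    The index convention floor(fraction * (n - 1)) is an assumption; the text
    only names positions such as 3/4 and 1/2.
    """
    ordered = sorted(amounts)
    index = math.floor(fraction * (len(ordered) - 1))
    return ordered[index]


def threshold_mean(amounts):
    """Alternative rule: the average data amount."""
    return mean(list(amounts))


def target_objects(sizes, threshold):
    """Objects whose data amount is strictly greater than the threshold."""
    return sorted(obj for obj, amount in sizes.items() if amount > threshold)


sizes = {"a": 10, "b": 20, "c": 30, "d": 40, "e": 1000}
upper_quartile = threshold_at_position(sizes.values(), 0.75)  # 40
median = threshold_at_position(sizes.values(), 0.5)           # 30
print(target_objects(sizes, upper_quartile))                  # ['e']
print(target_objects(sizes, median))                          # ['d', 'e']
print(target_objects(sizes, threshold_mean(sizes.values())))  # ['e']
```

A lower specified position marks more objects as skewed, so the choice trades the number of separately processed objects against how aggressively skew is isolated.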
For steps S103 and S104, to prevent the first processing task from failing because one object's service data is so large that it far exceeds that of the other objects, after determining the target object with data skew the electronic device extracts the service data of the target object from the first service data as the second service data. It determines the remaining service data in the first service data, i.e., the service data of the objects other than the target object, as the third service data. The electronic device then generates a processing task for the second service data as the second processing task and a processing task for the third service data as the third processing task.
For example, suppose the target service is a video service, the first service data includes each user's video upload records, and the first processing task analyzes, for each user, that user's video upload behavior. If the objects are 3 users, where user 1 has 9 upload records, user 2 has 15, and user 3 has 500, then the data amount of user 3's service data far exceeds that of the other users, and user 3 is the target object.
The electronic device then extracts user 3's upload records from the first service data as the second service data and generates a second processing task for them. It determines the upload records of user 1 and user 2 in the first service data as the third service data and generates a third processing task for them.
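As a worked check of this example, assume the preset threshold is the average data amount described earlier (the example itself only says user 3's amount is far greater) and measure data amount in upload records $d_i$:

```latex
\bar{d} = \frac{d_1 + d_2 + d_3}{3} = \frac{9 + 15 + 500}{3} = \frac{524}{3} \approx 174.67,
\qquad d_3 = 500 > \bar{d}, \quad d_1 = 9 < \bar{d}, \quad d_2 = 15 < \bar{d}
```

Only user 3 exceeds the threshold. Under the 1/2-position rule the threshold would be the median, 15, and again only user 3 (500 > 15) would be selected.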
For step S105 and step S106, after obtaining the second processing task and the third processing task, the electronic device may execute the second processing task and the third processing task respectively in a parallel processing manner, to obtain a processing result of the second processing task and a processing result of the third processing task. Furthermore, the electronic device may combine the processing result of the second processing task with the processing result of the third processing task to obtain the processing result of the first processing task.
Continuing the example, the electronic device executes the second processing task on user 3's upload records and obtains the upload behavior analysis result for user 3, denoted R1. It executes the third processing task on the upload records of user 1 and user 2 and obtains the upload behavior analysis result for users 1 and 2, denoted R2. Merging R1 and R2 then yields the upload behavior analysis result of every user of the target service.
In some embodiments, an OOM (Out Of Memory) condition may also occur while the first processing task is executed, which can likewise make the first processing task fail. The electronic device may therefore also determine whether memory overflow will occur when the first processing task is executed and respond according to the result.
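The parallel execution of the second and third processing tasks and the merge of R1 and R2 can be sketched locally with a thread pool. This is only a stand-in: in the described system the tasks are Spark jobs on a cluster, and the "behavior analysis" here is a hypothetical upload count per user.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def analyze_uploads(records):
    """Hypothetical behavior analysis: number of uploads per user."""
    return Counter(user for user, _video in records)


def run_second_and_third(second_data, third_data):
    """Run the second and third processing tasks in parallel and merge R1 and R2."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_r1 = pool.submit(analyze_uploads, second_data)  # second processing task
        future_r2 = pool.submit(analyze_uploads, third_data)   # third processing task
        r1, r2 = future_r1.result(), future_r2.result()
    # Counter addition sums per-user counts; here the user sets do not overlap.
    return r1 + r2


second = [("user3", f"v{i}") for i in range(500)]
third = [("user1", f"v{i}") for i in range(9)] + [("user2", f"v{i}") for i in range(15)]
result = run_second_and_third(second, third)
print(result["user3"], result["user1"], result["user2"])  # 500 9 15
```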
Accordingly, on the basis of fig. 1, referring to fig. 3, after step S101, the method may further include the steps of:
S109: Acquire, from the configuration information of the first processing task, the memory configured for the Executor process that executes the first processing task and the memory configured for the Driver process that executes the first processing task.
S110: Calculate the sum of the memory configured for the Executor process and the memory configured for the Driver process that execute the first processing task, to obtain the memory used when executing the first processing task.
S111: If the data amount of the first service data is greater than the memory used when executing the first processing task, determine that memory overflow will occur when the first processing task is executed.
Accordingly, the method may further comprise the steps of:
S112: If none of the objects is a target object whose data amount of service data is greater than the preset threshold, divide the first processing task into a plurality of sub-processing tasks of the first processing task.
The data amount of the service data processed by each sub-processing task is not greater than the memory used when executing the first processing task.
S113: Execute each sub-processing task of the first processing task to obtain its processing result, and combine the processing results of all sub-processing tasks of the first processing task to obtain the processing result of the first processing task.
The configuration information of the first processing task is configured by a technician when the first processing task is developed, and the configuration information of the first processing task includes operation parameters of the first processing task, for example, a memory configured for an Executor process for executing the first processing task and a memory configured for a Driver process for executing the first processing task.
Therefore, the electronic device may obtain the configuration information of the first processing task, and obtain, from the configuration information of the first processing task, a memory configured for an Executor process for executing the first processing task, and a memory configured for a Driver process for executing the first processing task.
A Spark task is executed by a Spark cluster that includes a Master node and Worker nodes. The Driver process runs on the Master node or a Worker node; it requests Executor processes for executing the Spark task from the cluster's resource manager (for example, YARN) and distributes the Spark task to those Executor processes. An Executor process runs on a Worker node and contains a thread pool, each thread of which executes Spark tasks.
Further, the electronic device may calculate a sum of the memory configured for the Executor process for executing the first processing task and the memory configured for the Driver process for executing the first processing task, to obtain the memory used when executing the first processing task.
The electronic device then compares the data amount of the first service data with the memory used when executing the first processing task. If the data amount is not greater than that memory, it determines that no memory overflow will occur when executing the first processing task and need not take any further action.
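The check in S109 to S111 can be sketched as follows. `spark.executor.memory` and `spark.driver.memory` are real Spark configuration property names, but treating them as the "configuration information" meant here is an assumption, as is the simplified parser, which requires a unit suffix (k, m, g or t) and reports everything in bytes.

```python
import re

# Multipliers for JVM-style memory strings such as "512m" or "4g".
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_memory(value):
    """Convert a memory string with a unit suffix (k/m/g/t, optional 'b') to bytes."""
    match = re.fullmatch(r"(\d+)([kmgt])b?", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized memory value: {value!r}")
    number, unit = match.groups()
    return int(number) * UNITS[unit]


def task_memory(conf):
    """S110: Executor memory + Driver memory taken from the task configuration."""
    return parse_memory(conf["spark.executor.memory"]) + parse_memory(conf["spark.driver.memory"])


def will_overflow(data_bytes, conf):
    """S111: overflow is predicted when the data amount exceeds the task memory."""
    return data_bytes > task_memory(conf)


conf = {"spark.executor.memory": "4g", "spark.driver.memory": "2g"}
print(task_memory(conf) == 6 * 1024 ** 3)  # True
print(will_overflow(7 * 1024 ** 3, conf))  # True
print(will_overflow(6 * 1024 ** 3, conf))  # False
```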
If the data size of the first service data is greater than the memory used in performing the first processing task, the electronic device may determine that a memory overflow may occur in performing the first processing task. The electronic device may then determine whether a target object for which data tilting has occurred has been determined.
The electronic device performs the memory-overflow check for the first processing task and the determination of the target object with data skew in parallel. Therefore, if the electronic device determines that memory overflow may occur when executing the first processing task and has already determined a target object with data skew, it generates a second processing task for the second service data of the target object and a third processing task for the third service data of the other objects, that is, it performs steps S103 to S106 of the foregoing embodiments.
Because the data volume of the second service data processed by the second processing task and the data volume of the third service data processed by the third processing task are smaller than the first service data, the situation that the memory overflows in the process of executing the first processing task can be avoided to a certain extent, and further the execution failure of the first processing task is avoided.
If the electronic device determines that memory overflow will occur when executing the first processing task and none of the objects is a target object whose data amount of service data exceeds the preset threshold, i.e., there is no data skew, the electronic device can slice the first service data into a plurality of data slices, each with a data amount not greater than the memory used when executing the first processing task.
The electronic device may then generate a processing task for each data slice, resulting in a plurality of sub-processing tasks of the first processing task. Furthermore, the electronic device may execute each sub-processing task of the first processing task in a parallel processing manner, obtain a processing result of each sub-processing task of the first processing task, and combine the processing results of each sub-processing task of the first processing task, to obtain a processing result of the first processing task.
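The slicing and parallel execution of sub-processing tasks can be sketched with a greedy packer that keeps each slice's total size within the memory limit. Greedy in-order packing is an assumed slicing strategy (the text only requires that no slice exceed the memory), and `size_of`, `task` and `merge` are hypothetical callables supplied by the caller.

```python
from concurrent.futures import ThreadPoolExecutor


def slice_by_memory(records, size_of, limit):
    """Greedily pack records, in order, into slices whose total size is <= limit."""
    slices, current, current_size = [], [], 0
    for rec in records:
        size = size_of(rec)
        if size > limit:
            raise ValueError("a single record is larger than the memory limit")
        if current and current_size + size > limit:
            slices.append(current)  # close the full slice
            current, current_size = [], 0
        current.append(rec)
        current_size += size
    if current:
        slices.append(current)
    return slices


def run_sliced(records, size_of, limit, task, merge):
    """Run one sub-processing task per slice in parallel, then merge the results."""
    slices = slice_by_memory(records, size_of, limit)
    with ThreadPoolExecutor() as pool:
        partial_results = list(pool.map(task, slices))
    return merge(partial_results)


slices = slice_by_memory(list(range(10)), lambda r: 3, limit=10)
print(slices)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
print(run_sliced(list(range(10)), lambda r: 3, 10, task=sum, merge=sum))  # 45
```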
With this processing, when the data amount of the first service data is larger than the memory used when executing the first processing task, memory overflow is predicted and the first processing task is split into a plurality of sub-processing tasks. This avoids, to a certain extent, memory overflow while the first processing task executes, and thus avoids its failure.
In addition, the electronic device executes the second and third processing tasks in parallel, and executes the sub-processing tasks of the first processing task in parallel, which shortens the execution time of the first processing task and improves task processing efficiency.
In some embodiments, after the second and third processing tasks are generated, to prevent either of them from failing and thereby causing the first processing task to fail, the electronic device may also determine whether memory overflow will occur while executing the second processing task and while executing the third processing task.
Accordingly, on the basis of fig. 1, referring to fig. 4, before step S105, the method may further include the steps of:
S114: Acquire, from the configuration information of the second processing task, the memory configured for the Executor process and the memory configured for the Driver process that execute the second processing task; and acquire, from the configuration information of the third processing task, the memory configured for the Executor process and the memory configured for the Driver process that execute the third processing task.
S115: Calculate the sum of the Executor-process memory and the Driver-process memory configured for the second processing task to obtain the memory used when executing the second processing task, and calculate the corresponding sum for the third processing task to obtain the memory used when executing the third processing task.
S116: if the data volume of the second service data is larger than the memory used when the second processing task is executed, determining that memory overflow occurs when the second processing task is executed, and dividing the second processing task to obtain a plurality of sub-processing tasks of the second processing task.
Wherein the data size of the service data processed by each sub-processing task is not greater than the memory used when executing the second processing task.
S117: if the data volume of the third service data is larger than the memory used when the third processing task is executed, determining that memory overflow occurs when the third processing task is executed, and dividing the third processing task to obtain a plurality of sub-processing tasks of the third processing task.
Wherein the data size of the service data processed by each sub-processing task is not greater than the memory used when executing the third processing task.
Accordingly, step S105 may include the steps of:
S1051: Execute each sub-processing task of the second processing task to obtain its processing result, and combine the processing results of all sub-processing tasks of the second processing task to obtain the processing result of the second processing task.
S1052: Execute each sub-processing task of the third processing task to obtain its processing result, and combine the processing results of all sub-processing tasks of the third processing task to obtain the processing result of the third processing task.
The electronic device determines whether memory overflow will occur during the second processing task and during the third processing task in the same way as it does for the first processing task; refer to the description of the foregoing embodiments.
If it is determined that no memory overflow will occur during the second processing task, the electronic device need not take further action. If memory overflow will occur during the second processing task, the electronic device may slice the second service data into a plurality of data slices, each with a data amount not greater than the memory used when executing the second processing task.
The electronic device may generate a processing task for each data slice, resulting in a plurality of sub-processing tasks of the second processing task. Furthermore, the electronic device may execute each sub-processing task of the second processing task in a parallel processing manner, obtain a processing result of each sub-processing task of the second processing task, and combine the processing results of each sub-processing task of the second processing task, to obtain a processing result of the second processing task.
If it is determined that no memory overflow will occur during the third processing task, the electronic device need not take further action. If memory overflow will occur during the third processing task, the electronic device may slice the third service data into a plurality of data slices, each with a data amount not greater than the memory used when executing the third processing task.
The electronic device generates a processing task for each data slice to obtain a plurality of sub-processing tasks of the third processing task. It then executes these sub-processing tasks in parallel to obtain their processing results and merges them to obtain the processing result of the third processing task.
With this processing, it can be determined whether memory overflow will occur while the second processing task executes and while the third processing task executes. When memory overflow is expected for the second processing task, it is split into a plurality of sub-processing tasks, which avoids, to a certain extent, memory overflow and hence failure during its execution; the third processing task is handled in the same way. As a result, failure of the first processing task can also be avoided.
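The per-task decision for the second and third processing tasks can be sketched as computing how many sub-processing tasks each one needs. Splitting into `ceil(data amount / memory)` equal parts, and slicing by record count, are assumptions that only hold when records are roughly uniform in size; the amounts below are hypothetical numbers in arbitrary units.

```python
import math


def plan_subtasks(data_amount, executor_mem, driver_mem):
    """Return how many sub-processing tasks a task needs (1 means no split)."""
    memory = executor_mem + driver_mem       # memory used when executing the task
    if data_amount <= memory:
        return 1                             # no overflow expected
    return math.ceil(data_amount / memory)   # smallest count with slice <= memory


def even_slices(items, parts):
    """Split items into `parts` contiguous slices whose lengths differ by at most 1."""
    q, r = divmod(len(items), parts)
    slices, start = [], 0
    for i in range(parts):
        end = start + q + (1 if i < r else 0)
        slices.append(items[start:end])
        start = end
    return slices


# Hypothetical (data amount, executor memory, driver memory) per task.
tasks = {"second": (101, 30, 20), "third": (40, 30, 20)}
plan = {name: plan_subtasks(*args) for name, args in tasks.items()}
print(plan)                            # {'second': 3, 'third': 1}
print(even_slices(list(range(7)), 3))  # [[0, 1, 2], [3, 4], [5, 6]]
```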
In addition, the electronic device executes each sub-processing task of the second processing task in a parallel processing mode and executes each sub-processing task of the third processing task in a parallel processing mode, so that the execution time consumption of the first processing task can be shortened, and the task processing efficiency can be improved.
In some embodiments, the first service data may be part of data generated during the operation of the target service, and correspondingly, based on fig. 1, referring to fig. 5, step S101 may include the following steps:
S1011: Acquire a processing task for the service data of the target service as a fourth processing task.
S1012: Acquire, from the configuration information of the fourth processing task, the memory configured for the Executor process that executes the fourth processing task and the memory configured for the Driver process that executes the fourth processing task.
S1013: Calculate the sum of the memory configured for the Executor process and the memory configured for the Driver process that execute the fourth processing task, to obtain the memory used when executing the fourth processing task.
S1014: If the data amount of the service data of the target service is greater than the memory used when executing the fourth processing task, determine that memory overflow will occur when the fourth processing task is executed.
S1015: dividing the fourth processing task to obtain a plurality of first processing tasks.
The data size of the first service data processed by each first processing task is not larger than the memory used when executing the fourth processing task.
The electronic device acquires a processing task for the service data of the target service (the fourth processing task), i.e., a Spark task that analyzes the service data of the target service. It determines whether memory overflow will occur during the fourth processing task in the same way as for the first processing task; refer to the description of the foregoing embodiments.
If the electronic device determines that no memory overflow will occur during the fourth processing task, it need not take further action. If it determines that memory overflow will occur, it may slice the service data of the target service into a plurality of data slices, each of which is first service data in the sense of the foregoing embodiments and has a data amount not greater than the memory used when executing the fourth processing task.
The electronic device then generates a processing task for each data slice; each such task is a first processing task in the sense of the foregoing embodiments. The electronic device processes each first processing task according to the method provided above to obtain its processing result.
In some embodiments, after step S106, the method may further comprise the steps of:
Combine the processing results of the first processing tasks to obtain the processing result of the fourth processing task for the service data of the target service.
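When the target service's data is sliced into several first processing tasks, the same object may appear in more than one slice, so for an additive analysis such as counting, the per-object results must be added together rather than overwritten. The sketch below assumes a hypothetical fixed-size slicing and an upload-count analysis; non-additive analyses would need their own merge rule.

```python
from collections import Counter
from functools import reduce


def merge_first_results(partials):
    """Combine per-user upload counts from several first processing tasks.

    An object can appear in more than one slice, so per-object results are
    added together rather than overwritten.
    """
    return reduce(lambda acc, part: acc + part, partials, Counter())


def process_fourth_task(records, chunk_len):
    """Hypothetical: slice by a fixed record count, run each first task, merge."""
    chunks = [records[i:i + chunk_len] for i in range(0, len(records), chunk_len)]
    partials = [Counter(user for user, _video in chunk) for chunk in chunks]
    return merge_first_results(partials)


uploads = [("u1", "a"), ("u2", "b"), ("u1", "c"), ("u1", "d"), ("u2", "e")]
print(process_fourth_task(uploads, chunk_len=2))  # Counter({'u1': 3, 'u2': 2})
```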
Corresponding to the method embodiment of fig. 1, referring to fig. 6, fig. 6 is a block diagram of a task processing device according to an embodiment of the present invention, where the device includes:
A task obtaining module 601, configured to obtain a processing task for the first service data as a first processing task; wherein the first service data comprises service data of a plurality of objects;
an object determining module 602, configured to determine, from among the objects, an object whose data amount of the service data is greater than a preset threshold, as a target object, based on the data amount of the service data of the objects; wherein the preset threshold is determined based on the data amount of the business data of each object;
a data determining module 603, configured to extract service data of the target object from the first service data, as second service data, and determine other service data in the first service data except the second service data, as third service data;
a task generating module 604, configured to generate a processing task for the second service data as a second processing task, and generate a processing task for the third service data as a third processing task;
a first task execution module 605, configured to execute the second processing task and the third processing task respectively, to obtain a processing result of the second processing task and a processing result of the third processing task;
and a first processing result obtaining module 606, configured to combine the processing result of the second processing task with the processing result of the third processing task to obtain the processing result of the first processing task.
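The cooperation of modules 601 to 606 can be illustrated with a small in-memory sketch. It assumes rows of the hypothetical form `(object_id, value)`, a processing task that sums values per object, and precomputed per-object data amounts; none of these names come from the patent, and a real deployment would submit the second and third tasks as separate Spark jobs rather than call a Python function.

```python
from typing import Dict, Iterable, List, Set, Tuple

Row = Tuple[str, int]  # hypothetical row shape: (object id, value)


def sum_per_object(rows: Iterable[Row]) -> Dict[str, int]:
    """Stand-in processing task: total value per object."""
    out: Dict[str, int] = {}
    for obj, value in rows:
        out[obj] = out.get(obj, 0) + value
    return out


def run_first_task(rows: List[Row], data_amount: Dict[str, int], threshold: int) -> Dict[str, int]:
    # Object determining module: objects whose data amount exceeds the threshold
    targets: Set[str] = {obj for obj, amount in data_amount.items() if amount > threshold}
    # Data determining module: second data = target rows, third data = the rest
    second = [row for row in rows if row[0] in targets]
    third = [row for row in rows if row[0] not in targets]
    # Task generation and execution: one task per data set
    result_2 = sum_per_object(second)
    result_3 = sum_per_object(third)
    # Merging: each object lands in exactly one task, so the key sets are disjoint
    return {**result_2, **result_3}
```

The plain dictionary union is only valid because the grouping key is the object itself; a result keyed on something else would need an additive merge.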
Optionally, the apparatus further includes:
an object identifier obtaining module, configured to, before the object determining module 602 determines, based on the data amount of the service data of each object, an object whose data amount of service data is greater than the preset threshold as the target object, obtain the parameter value of the key field from the Structured Query Language (SQL) execution plan of the first processing task, so as to obtain the object identifier of each object;
a data volume obtaining module, configured to obtain, for each object, the parameter value of the Number Rows field and the parameter value of the Data size field corresponding to the object identifier of the object from the SQL execution plan of the first processing task, so as to obtain the data volume of the service data of the object.
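Reading the key, Number Rows and Data size fields could look like the following sketch. The one-line-per-object layout matched by the regular expression is a hypothetical format invented for illustration; the actual text of an SQL execution plan depends on the engine and version, so the pattern would need adapting.

```python
import re

# Hypothetical plan line: "key: <object id> Number Rows: <n> Data size: <bytes>"
_LINE = re.compile(
    r"key:\s*(?P<key>\S+).*?Number Rows:\s*(?P<rows>\d+).*?Data size:\s*(?P<size>\d+)"
)


def object_stats(plan_text: str) -> dict:
    """Map each object identifier to (row count, data size in bytes)."""
    stats = {}
    for line in plan_text.splitlines():
        match = _LINE.search(line)
        if match:
            stats[match.group("key")] = (int(match.group("rows")), int(match.group("size")))
    return stats
```

Lines that do not match the pattern (operators, headers) are simply skipped.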
Optionally, the apparatus further includes:
a sorting module, configured to, before the object determining module 602 determines the target object based on the data amount of the service data of each object, sort the service data of the objects in ascending order of data amount to obtain a target sorting result;
a preset threshold determining module, configured to determine the data amount of the service data at a designated position in the target sorting result as the preset threshold.
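The sorting and threshold modules can be sketched as below. The designated position is not specified in the text, so the hypothetical `position` parameter expresses it as a fraction of the ascending list (for example 0.9 for roughly the 90th-percentile object). Because targets must be strictly greater than the threshold, the object at the designated position is never itself a target.

```python
from typing import Dict, List


def preset_threshold(data_amount: Dict[str, int], position: float) -> int:
    """Data amount at a designated position of the ascending sort."""
    if not data_amount:
        raise ValueError("no objects to rank")
    ordered = sorted(data_amount.values())  # small to large
    index = min(int(position * len(ordered)), len(ordered) - 1)
    return ordered[index]


def target_objects(data_amount: Dict[str, int], position: float) -> List[str]:
    """Objects strictly above the threshold, i.e. candidates for data skew."""
    threshold = preset_threshold(data_amount, position)
    return [obj for obj, amount in data_amount.items() if amount > threshold]
```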
Optionally, the apparatus further includes:
a first memory obtaining module, configured to, after the task obtaining module 601 obtains the processing task for the first service data as the first processing task, obtain from the configuration information of the first processing task the memory configured for the Executor process that executes the first processing task and the memory configured for the Driver process that executes the first processing task;
a first memory determining module, configured to calculate the sum of the memory configured for the Executor process and the memory configured for the Driver process of the first processing task, to obtain the memory used when executing the first processing task;
a first memory overflow determining module, configured to determine that memory overflow occurs when the first processing task is executed if the data size of the first service data is greater than the memory used when executing the first processing task;
the apparatus further comprises:
a task dividing module, configured to, if none of the objects is a target object whose data volume of service data is greater than the preset threshold, divide the first processing task to obtain a plurality of sub-processing tasks of the first processing task, where the data volume of the service data processed by each sub-processing task is not greater than the memory used when executing the first processing task;
a second task execution module, configured to execute each sub-processing task of the first processing task to obtain its processing result, and combine the processing results of all the sub-processing tasks of the first processing task to obtain the processing result of the first processing task.
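When no object is skewed, dividing the first processing task only requires grouping the data so that each sub-processing task stays within the task memory. The sketch below keeps every object inside a single sub-task and packs objects with first-fit decreasing; both are illustrative choices rather than requirements stated in the text, and `divide_by_object` is a hypothetical name.

```python
from typing import Dict, List


def divide_by_object(data_amount: Dict[str, int], memory: int) -> List[List[str]]:
    """Group whole objects into sub-tasks whose total data amount <= memory."""
    bins = []  # each bin: [remaining capacity, [object ids]]
    for obj, amount in sorted(data_amount.items(), key=lambda kv: kv[1], reverse=True):
        if amount > memory:
            raise ValueError(f"object {obj} alone exceeds the task memory")
        for b in bins:
            if b[0] >= amount:  # first sub-task with enough room
                b[0] -= amount
                b[1].append(obj)
                break
        else:
            bins.append([memory - amount, [obj]])  # open a new sub-task
    return [objs for _, objs in bins]
```

Keeping each object in one sub-task means per-object results from different sub-tasks never overlap and can be combined by a simple union.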
Optionally, the apparatus further includes:
a second memory obtaining module, configured to, before the first task execution module 605 executes the second processing task and the third processing task to obtain their processing results, obtain from the configuration information of the second processing task the memory configured for the Executor process and the memory configured for the Driver process that execute the second processing task, and obtain from the configuration information of the third processing task the memory configured for the Executor process and the memory configured for the Driver process that execute the third processing task;
a second memory determining module, configured to calculate the sum of the Executor memory and the Driver memory configured for the second processing task to obtain the memory used when executing the second processing task, and to calculate the sum of the Executor memory and the Driver memory configured for the third processing task to obtain the memory used when executing the third processing task;
a second memory overflow determining module, configured to, if the data size of the second service data is greater than the memory used when executing the second processing task, determine that memory overflow occurs when the second processing task is executed, and divide the second processing task to obtain a plurality of sub-processing tasks of the second processing task, where the data volume of the service data processed by each sub-processing task is not greater than the memory used when executing the second processing task;
a third memory overflow determining module, configured to, if the data size of the third service data is greater than the memory used when executing the third processing task, determine that memory overflow occurs when the third processing task is executed, and divide the third processing task to obtain a plurality of sub-processing tasks of the third processing task, where the data volume of the service data processed by each sub-processing task is not greater than the memory used when executing the third processing task;
the first task execution module 605 is specifically configured to execute each sub-processing task of the second processing task to obtain its processing result, and combine the processing results of all the sub-processing tasks of the second processing task to obtain the processing result of the second processing task;
and to execute each sub-processing task of the third processing task to obtain its processing result, and combine the processing results of all the sub-processing tasks of the third processing task to obtain the processing result of the third processing task.
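The guard applied to the second and third processing tasks can be sketched as a helper that runs a task directly when its data fits, and otherwise divides it into sub-processing tasks and combines their results. This assumes a fixed per-row size and an additive per-object result, both simplifications introduced for illustration; `run_with_memory_guard` and `sum_per_object` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, int]  # hypothetical row shape: (object id, value)
Task = Callable[[List[Row]], Dict[str, int]]


def sum_per_object(rows: List[Row]) -> Dict[str, int]:
    """Stand-in processing task: total value per object."""
    out: Dict[str, int] = {}
    for obj, value in rows:
        out[obj] = out.get(obj, 0) + value
    return out


def run_with_memory_guard(rows: List[Row], row_size: int, memory: int, task: Task) -> Dict[str, int]:
    """Execute `task`, dividing it into sub-tasks if its data exceeds `memory`."""
    if len(rows) * row_size <= memory:
        return task(rows)  # no memory overflow: run as a single task
    # Rows per sub-processing task; a single oversized row cannot be split further
    per_task = max(1, memory // row_size)
    merged: Dict[str, int] = {}
    for start in range(0, len(rows), per_task):
        partial = task(rows[start:start + per_task])
        for key, value in partial.items():
            merged[key] = merged.get(key, 0) + value  # additive combination
    return merged
```

The helper would be called once for the second processing task and once for the third; since the two tasks cover disjoint objects, their results can then be combined into the first task's result by a dictionary union.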
Optionally, the task obtaining module 601 is specifically configured to: obtain a processing task for the service data of a target service as a fourth processing task;
obtain, from the configuration information of the fourth processing task, the memory configured for the Executor process and the memory configured for the Driver process that execute the fourth processing task;
calculate the sum of the memory configured for the Executor process and the memory configured for the Driver process of the fourth processing task to obtain the memory used when executing the fourth processing task;
if the data volume of the service data of the target service is greater than the memory used when executing the fourth processing task, determine that memory overflow occurs when the fourth processing task is executed;
and divide the fourth processing task to obtain a plurality of first processing tasks, where the data size of the first service data processed by each first processing task is not greater than the memory used when executing the fourth processing task.
Optionally, the apparatus further includes:
a second processing result obtaining module, configured to, after the first processing result obtaining module 606 combines the processing result of the second processing task with the processing result of the third processing task to obtain the processing result of the first processing task, combine the processing results of the first processing tasks to obtain the processing result of the fourth processing task for the service data of the target service.
With the task processing device provided by the embodiment of the present invention, before the first processing task is executed, a target object whose data volume of service data is greater than the preset threshold is determined from the objects corresponding to the first processing task, based on the data volume of each object's service data. Because the preset threshold is itself determined from the data volumes of the objects' service data, the target object has a larger data volume than the other objects, i.e., it is an object exhibiting data skew. The second service data of the target object is then extracted from the first service data and a second processing task is generated for it, while the remaining service data is taken as the third service data and a third processing task is generated for it. In other words, objects with large data volumes are processed separately. This prevents the data skew that arises when a single object's service data far exceeds that of the other objects, thereby avoiding execution failures of the first processing task caused by data skew and reducing the probability that the first processing task fails.
An embodiment of the present invention further provides an electronic device, as shown in fig. 7, including a processor 701, a communication interface 702, a memory 703 and a communication bus 704, where the processor 701, the communication interface 702 and the memory 703 communicate with each other through the communication bus 704;
a memory 703 for storing a computer program;
the processor 701 is configured to implement the steps of the task processing method according to any one of the above embodiments when executing the program stored in the memory 703.
The communication bus of the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include random access memory (RAM) or non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, a computer readable storage medium is provided, in which a computer program is stored, which when executed by a processor, implements the task processing method according to any one of the above embodiments.
In a further embodiment of the present invention, a computer program product comprising instructions which, when run on a computer, cause the computer to perform the task processing method as described in any of the above embodiments is also provided.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), etc.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, electronic device, computer-readable storage medium and computer program product embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant points, refer to the corresponding parts of the description of the method embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (10)

1. A method of task processing, the method comprising:
acquiring a processing task for first service data as a first processing task; wherein the first service data comprises service data of a plurality of objects;
determining an object with the data volume of the service data larger than a preset threshold value from the objects based on the data volume of the service data of the objects as a target object; wherein the preset threshold is determined based on the data amount of the business data of each object;
extracting service data of the target object from the first service data to serve as second service data, and determining other service data except the second service data in the first service data to serve as third service data;
generating a processing task for the second service data as a second processing task, and generating a processing task for the third service data as a third processing task;
Respectively executing the second processing task and the third processing task to obtain a processing result of the second processing task and a processing result of the third processing task;
and combining the processing result of the second processing task and the processing result of the third processing task to obtain the processing result of the first processing task.
2. The method according to claim 1, wherein before determining, from the respective objects, an object whose data amount of the service data is greater than a preset threshold value, as the target object, based on the data amount of the service data of the respective objects, the method further comprises:
obtaining a parameter value of a key field from a Structured Query Language (SQL) execution plan of the first processing task to obtain respective object identifiers of the objects;
and for each object, acquiring the parameter value of the Number Rows field corresponding to the object identification of the object and the parameter value of the Data size field from the SQL execution plan of the first processing task to obtain the Data size of the business Data of the object.
3. The method according to claim 1, wherein before determining, from the respective objects, an object whose data amount of the service data is greater than a preset threshold value, as the target object, based on the data amount of the service data of the respective objects, the method further comprises:
sorting the service data of the objects in ascending order of the data amount of the service data of each object to obtain a target sorting result;
and determining the data amount of the service data at a designated position of the target sorting result as the preset threshold.
4. The method of claim 1, wherein after the acquiring a processing task for first service data as a first processing task, the method further comprises:
acquiring a memory configured for an Executor process for executing the first processing task and a memory configured for a Driver process for executing the first processing task from the configuration information of the first processing task;
calculating the sum of the memory configured for the Executor process for executing the first processing task and the memory configured for the Driver process for executing the first processing task to obtain a memory used when executing the first processing task;
if the data size of the first service data is larger than the memory used when the first processing task is executed, determining that memory overflow occurs when the first processing task is executed;
The method further comprises the steps of:
if no target object with the data volume of the service data larger than a preset threshold exists in the objects, dividing the first processing task to obtain a plurality of sub-processing tasks of the first processing task; the data volume of the business data processed by each sub-processing task is not more than the memory used when the first processing task is executed;
and respectively executing all the sub-processing tasks of the first processing task to obtain the processing results of all the sub-processing tasks of the first processing task, and combining the processing results of all the sub-processing tasks of the first processing task to obtain the processing results of the first processing task.
5. The method of claim 1, wherein prior to said executing the second processing task and the third processing task, respectively, obtaining a processing result of the second processing task and a processing result of the third processing task, the method further comprises:
acquiring a memory configured for an Executor process for executing the second processing task from the configuration information of the second processing task, and a memory configured for a Driver process for executing the second processing task, and acquiring a memory configured for an Executor process for executing the third processing task and a memory configured for a Driver process for executing the third processing task from the configuration information of the third processing task;
calculating the sum of the memory configured for the Executor process for executing the second processing task and the memory configured for the Driver process for executing the second processing task to obtain a memory used when executing the second processing task, and calculating the sum of the memory configured for the Executor process for executing the third processing task and the memory configured for the Driver process for executing the third processing task to obtain a memory used when executing the third processing task;
if the data size of the second service data is larger than the memory used when the second processing task is executed, determining that memory overflow occurs when the second processing task is executed, and dividing the second processing task to obtain a plurality of sub-processing tasks of the second processing task; the data volume of the business data processed by each sub-processing task is not more than the memory used when the second processing task is executed;
if the data size of the third service data is larger than the memory used when the third processing task is executed, determining that memory overflow occurs when the third processing task is executed, and dividing the third processing task to obtain a plurality of sub-processing tasks of the third processing task; the data volume of the business data processed by each sub-processing task is not more than the memory used when executing the third processing task;
The step of executing the second processing task and the third processing task respectively to obtain a processing result of the second processing task and a processing result of the third processing task includes:
respectively executing all sub-processing tasks of the second processing task to obtain processing results of all the sub-processing tasks of the second processing task, and combining the processing results of all the sub-processing tasks of the second processing task to obtain processing results of the second processing task;
and respectively executing all the sub-processing tasks of the third processing task to obtain the processing results of all the sub-processing tasks of the third processing task, and combining the processing results of all the sub-processing tasks of the third processing task to obtain the processing results of the third processing task.
6. The method according to claim 1, wherein the acquiring a processing task for the first service data as the first processing task comprises:
acquiring a processing task for the service data of a target service as a fourth processing task;
acquiring a memory configured for an Executor process for executing the fourth processing task and a memory configured for a Driver process for executing the fourth processing task from the configuration information of the fourth processing task;
Calculating the sum of the memory configured for the Executor process for executing the fourth processing task and the memory configured for the Driver process for executing the fourth processing task to obtain a memory used when executing the fourth processing task;
if the data volume of the service data of the target service is larger than the memory used when the fourth processing task is executed, determining that memory overflow occurs when the fourth processing task is executed;
dividing the fourth processing task to obtain a plurality of first processing tasks; the data size of the first service data processed by each first processing task is not larger than the memory used when executing the fourth processing task.
7. The method according to claim 6, wherein after the combining processing is performed on the processing result of the second processing task and the processing result of the third processing task to obtain the processing result of the first processing task, the method further comprises:
combining the processing results of the first processing tasks to obtain the processing result of the fourth processing task for the service data of the target service.
8. A task processing device, the device comprising:
a task acquisition module, used for acquiring a processing task for first service data as a first processing task; wherein the first service data comprises service data of a plurality of objects;
the object determining module is used for determining an object with the data volume of the service data larger than a preset threshold value from the objects based on the data volume of the service data of the objects as a target object; wherein the preset threshold is determined based on the data amount of the business data of each object;
the data determining module is used for extracting the service data of the target object from the first service data to serve as second service data, and determining other service data except the second service data in the first service data to serve as third service data;
the task generating module is used for generating a processing task for the second service data as a second processing task and generating a processing task for the third service data as a third processing task;
the first task execution module is used for respectively executing the second processing task and the third processing task to obtain a processing result of the second processing task and a processing result of the third processing task;
and the first processing result acquisition module is used for combining the processing result of the second processing task and the processing result of the third processing task to obtain the processing result of the first processing task.
9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for carrying out the method steps of any one of claims 1-7 when executing the program stored in the memory.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-7.
CN202310185342.4A 2023-03-01 2023-03-01 Task processing method and device, electronic equipment and storage medium Pending CN116303533A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310185342.4A CN116303533A (en) 2023-03-01 2023-03-01 Task processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310185342.4A CN116303533A (en) 2023-03-01 2023-03-01 Task processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116303533A (en) 2023-06-23

Family

ID=86835414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310185342.4A Pending CN116303533A (en) 2023-03-01 2023-03-01 Task processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116303533A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination