CN110908986A - Layering method and device for computing tasks, distributed scheduling method and device and electronic equipment


Info

Publication number
CN110908986A
CN110908986A (application number CN201911087206.1A)
Authority
CN
China
Prior art keywords
task
data
computing
layering
tasks
Prior art date
Legal status
Granted
Application number
CN201911087206.1A
Other languages
Chinese (zh)
Other versions
CN110908986B (en)
Inventor
冯若寅
万仕龙
邹晓峰
仲跻炜
朱彭生
Current Assignee
Ouye Yunshang Co ltd
Original Assignee
Ouye Yunshang Co ltd
Priority date
Filing date
Publication date
Application filed by Ouye Yunshang Co ltd filed Critical Ouye Yunshang Co ltd
Priority to CN201911087206.1A
Publication of CN110908986A
Application granted
Publication of CN110908986B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a layering method for computing tasks, a distributed scheduling method, corresponding devices, an electronic device and a computer-readable storage medium. The layering method is used for layering the computing tasks in a big data platform, where the computing tasks in the big data platform comprise: a data acquisition task, which acquires data from a business system to obtain a data set; a data cleaning calculation task, which calculates the data to be cleaned in the data set; a detail data calculation task, which calculates the detail data in the data set; and an application data calculation task, which calculates the application data in the data set. The layering method comprises the steps of: extracting the computing tasks in the big data platform task set; analyzing the ER association relation of the computing tasks; and layering based on the ER association relation. According to this layering mode of the computing tasks, the analysis and layered arrangement of the computing task aggregate are realized.

Description

Layering method and device for computing tasks, distributed scheduling method and device and electronic equipment
Technical Field
The invention relates to big data platform technology, and in particular to a layering method and device for computing tasks, a distributed scheduling method and device, an electronic device, and a computer-readable storage medium.
Background
As the scope of business development expands, a company's big data platform typically supports an increasing number of data computing tasks and gradually becomes an important support platform for data services. Taking the big data platform of an e-commerce enterprise as an example, the platform may start with about 30 data acquisition tasks and 20-30 calculation tasks serving the reporting needs of an e-commerce analysis center, and gradually grow to cover more than 400 data acquisition tasks and more than 700 data analysis calculation tasks for services such as consignment reports, financial reports, supply chain services, risk early warning and daily GMV operation reports; meanwhile, the consumers of the data services also shift from the business middle platform to business decision-making. Therefore, a big data base platform optimization project is established, the targeted optimization needs are listed as project completion targets, and the corresponding application problems are solved.
Against the background of wide-ranging business development, algorithms must be updated at any time as business logic is adjusted. Managing high-frequency version updates for thousands of computing tasks on a platform requires substantial human resources, and the accuracy of configuration and operation is hard to guarantee. Before the invention, under the original graphical manual management mode, releasing and bringing online about 10 new computing tasks required at least one week of manual configuration and checking, and going online was only achieved by compressing the frequency of version changes. These functional design deficiencies greatly limit the large-scale application of big data platforms.
The existing layering of computing tasks on big data platforms divides the tasks roughly into whole sets according to the naming rules of the data objects: for example, all tasks of the ods layer form a first batch set, all tasks of the odsp layer form a second batch set, all tasks of the dw layer form a third batch set, and all tasks of the dm layer form a fourth batch set. The tasks in each set are then manually divided into several groups (for example, if a layer contains 100 data structure objects, the computing tasks corresponding to those 100 objects are manually split into 10 groups of 10) and placed into a legacy scheduler; the next task set is entered only after one batch set has been completed in sequence. However, this method cannot meet the requirement for accurate data calculation; in addition, upstream-downstream relationships exist within the same batch of calculation tasks, so data-update problems frequently occur within a batch, which greatly affects the usability of the platform.
The existing layering mode can still meet functional requirements as long as the service scope of the computing tasks does not grow. However, as the business develops, the platform contains a large number of computing tasks that are associated by business logic, are no longer data islands, and involve deep fusion and mining; the data structures within the same layer become increasingly diverse and the ER (entity-relationship) association relationships increasingly complex, so the existing mode increasingly fails to meet the requirements.
Disclosure of Invention
In view of the above, the present invention provides a layering method for computing tasks, a distributed scheduling method based on the layering method, corresponding devices, an electronic device, and a computer-readable storage medium, which can clarify the upstream and downstream layering information of computing tasks and realize an intelligent management system for the layered scheduling configuration of computing tasks that batches automatically, analyzes before summarizing, is easy to query and maintain, and can be defined logically.
To solve the above technical problem, in one aspect, the present invention provides a method for layering computing tasks in a big data platform,
the computing tasks in the big data platform include:
a data acquisition task, which acquires data from a business system to obtain a data set;
a data cleaning calculation task for calculating the data to be cleaned in the data set;
a detail data calculation task to calculate the detail data in the dataset;
an application data calculation task to calculate application data in the dataset;
the layering method comprises the following steps:
extracting a computing task in a big data platform task set;
analyzing the ER association relation of the computing task;
and layering based on the ER association relation.
According to some embodiments of the invention, resolving the ER associations of the computing tasks includes resolving the ER associations of individual jobs of each computing task.
According to some embodiments of the invention, said resolving ER associations of individual jobs per task comprises:
and extracting keywords in the text information of the calculation task, and acquiring the ER association relation of the calculation task according to the keywords.
Further, the analyzing the ER association relationship of the single job of each task further includes:
performing correlation analysis on a reference field of a data table upstream of the computing task, and determining a source of a specific field associated with the computing task;
and when the data structure field of the source business system is changed, updating the corresponding field of the calculation task and the associated information of the downstream layer in time.
Further, the keywords in the text information of the computing task are identified through a text recognition technique.
Further, the keyword includes one or more of a call field, a connection field, and a naming-rule-based field in the text information.
Further, after the computing task is extracted, the method further comprises the following steps:
dividing the computing task into a plurality of batches according to a preset rule,
in the layering based on the ER association relationship, in each batch, the computing tasks are layered based on the ER association relationship.
Further, the predetermined rule includes partitioning according to a time echelon and/or partitioning according to business logic.
According to some embodiments of the invention, resolving the ER associations of the computing task further comprises:
and importing the ER association relation of the single job of each computing task into the database through a structured query language.
Further, said layering based on said ER association comprises:
sorting all data in the database, and classifying the data based on a preset rule;
calculating the hierarchical value of each calculation task based on the ER association relation of all the calculation tasks;
the computing task is layered based on the layering values.
Further, the hierarchical value of each of the computing tasks is computed based on online analytical processing (OLAP).
In a second aspect, an embodiment of the present invention provides a distributed scheduling method, configured to schedule a task in a big data platform, including the following steps:
according to the layering method of the computing task of any embodiment, the computing task is layered;
based on the layering result, distributing the computing tasks to a plurality of computing nodes of the big data platform so that the plurality of computing nodes execute their respective tasks.
In a third aspect, an embodiment of the present invention provides a device for layering computing tasks, which is used for layering computing tasks in a big data platform,
the computing tasks in the big data platform include:
a data acquisition task, which acquires data from a business system to obtain a data set;
a data cleaning calculation task for calculating the data to be cleaned in the data set;
a detail data calculation task to calculate the detail data in the dataset;
an application data computation task to compute application data in the dataset,
the layering device includes:
the extraction module is used for extracting the computing tasks in the big data platform task set;
the analysis module is used for analyzing the ER association relation of the computing task;
and the layering module is used for layering based on the ER association relation.
In a fourth aspect, an embodiment of the present invention provides an electronic device for layering computing tasks, including:
one or more processors;
one or more memories having computer readable code stored therein, which when executed by the one or more processors, causes the processors to perform the steps of:
extracting a computing task in a big data platform task set;
analyzing the ER association relation of the computing task;
and layering based on the ER association relation.
In a fifth aspect, embodiments of the invention provide a non-transitory computer storage medium having computer readable code stored therein, which when executed by one or more processors, causes the processors to perform the steps of:
extracting a computing task in a big data platform task set;
analyzing the ER association relation of the computing task;
and layering based on the ER association relation.
The technical scheme of the invention at least has one of the following beneficial effects:
according to the layering mode of the computing tasks, the analysis and layering arrangement of the computing task aggregate are realized, and scientific data definition is provided for the task execution aggregate and the distribution module of the subsequent distributed scheduling program;
the original platform task hierarchy was divided roughly into whole sets: for example, all tasks of the ods layer form a first batch set, all tasks of the odsp layer form a second batch set, all tasks of the dw layer form a third batch set, and all tasks of the dm layer form a fourth batch set; however, the relationships among the odsp, dw and dm layers cannot be arranged accurately in this way. For accurate data calculation, a downstream calculation task must be executed only after its upstream data tables have been effectively updated, so that real-time updates of the data stay synchronized with updates of the business data; the odsp, dw and dm layers are therefore arranged into sub-task layers to meet the requirement of providing accurate calculation results after layered scheduling of the task aggregate. Accordingly, the layering method arranges the ER association relations hierarchically and combines the analysis of the ER association relations with the calculation of upstream-downstream relations; it has the characteristics of accurate judgment, efficient extraction, and rapid analysis and definition, and plays a decisive role in the layered arrangement of large-scale jobs;
most functional products of data management platforms do not provide an ER association analysis function, and the few products that offer ER association queries for computing tasks provide only a simple query interface: upstream analysis can be performed only on a single computing task, information cannot be displayed over a large scope, operating the query interface with a mouse is extremely inefficient, and the query results can only be viewed and cannot be used for secondary development or function definition.
Drawings
FIG. 1 is a flow diagram of a hierarchical method of computing tasks according to one embodiment of the invention;
FIG. 2 is a flow diagram of a hierarchical method of computing tasks according to another embodiment of the invention;
fig. 3 is a schematic view of a specific flow of layering based on the ER association relationship in a layering method of a computing task according to an embodiment of the present invention;
FIG. 4 is another schematic flow diagram of a process according to the method of FIG. 3;
FIG. 5 is a flow chart of a distributed scheduling method according to an embodiment of the present invention;
FIG. 6 is a block diagram of a hierarchical apparatus of computing tasks in accordance with an embodiment of the present invention;
fig. 7 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention will be made with reference to the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
The layering method of the computing task is used for layering the task in the big data platform.
For example, hierarchical tasks in a big data platform may include:
0_ods layer data acquisition tasks -> 1_odsp layer data cleaning calculation tasks -> 2_dw0 and 2_dw1 layer detail data calculation tasks -> 3_dm0, 3_dm1, 3_dm2, 3_dm3 and 3_dm4 layer application data calculation tasks -> 4_h2m layer data push tasks.
The tasks of each layer have a clear sequential calculation relationship, as indicated by the arrows.
In order to layer tasks in the big data platform, as shown in fig. 1, a method for layering computing tasks according to an embodiment of the present invention includes:
and step S1, extracting the computing tasks in the big data platform task set.
That is, the computing tasks that need to be layered are first retrieved.
And step S2, analyzing the ER association relation of the calculation task.
According to some embodiments of the invention, ER associations for individual jobs of each computing task are resolved.
Further, as shown in fig. 2, resolving ER associations for individual jobs for each computing task may be performed by:
and step S21, extracting keywords in the text information of the calculation task, and acquiring the ER association relation of the calculation task according to the keywords.
The keywords may be recognized, for example, by conventional text recognition techniques such as OCR. In addition, the keywords may include one or more of a call field (FROM), a connection field (JOIN), and a naming-rule-based field in the text information.
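As a minimal sketch of this kind of keyword extraction, assuming the task definitions are stored as SQL text and that table names carry the layer prefix of the naming rule described above (the regular expression, table names and helper functions below are illustrative assumptions, not part of the patent):

```python
import re

# Upstream table references of a single job, recovered from FROM / JOIN keywords
# in the task's SQL text (subqueries and aliases are ignored in this sketch).
TABLE_REF = re.compile(r'\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)', re.IGNORECASE)

def upstream_tables(task_sql: str) -> set:
    """Return the set of tables that a computing task reads from."""
    return set(TABLE_REF.findall(task_sql))

def layer_prefix(table_name: str) -> str:
    """Naming-rule-based field: 'odsp_order_clean' -> 'odsp'."""
    return table_name.split('_', 1)[0]

task_sql = """
INSERT OVERWRITE TABLE dw_order_detail
SELECT o.order_id, c.customer_name
FROM odsp_order_clean o
JOIN odsp_customer_clean c ON o.customer_id = c.customer_id
"""

print(upstream_tables(task_sql))          # e.g. {'odsp_order_clean', 'odsp_customer_clean'}
print({layer_prefix(t) for t in upstream_tables(task_sql)})   # {'odsp'}
```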
Still further, resolving ER associations for individual jobs for each computing task may further include:
step S22, performing correlation analysis on the reference field of the data table at the upstream of the computing task, and determining the source of the specific field associated with the computing task;
and step S23, when the data structure field of the source service system is changed, updating the corresponding field of the calculation task and the related information of the downstream layer in time.
Therefore, through correlation analysis of the reference fields of the upstream data tables, the source of each specific field associated with a calculation task becomes clear. When the field name, data type, meaning description or the like of a data structure field of the source business system changes, the data acquisition (ods) layer and the data cleaning (odsp) layer can effectively manage the change, guide the update of the associated data dictionaries and index information of the downstream dw and dm layers as well as the correction of the calculation tasks, and thereby achieve fine-grained management of the big data platform.
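The field-level propagation described here could look like the following sketch, which is only an illustration under the assumption that field references have already been extracted per task; all task, table and field names are hypothetical:

```python
from collections import defaultdict

# Field lineage: (upstream_table, field) -> names of downstream computing tasks
# that reference it. When a source-system field changes, the affected tasks and
# their downstream data dictionary / index information can be located at once.
field_usage = defaultdict(set)

def register_reference(task: str, upstream_table: str, field: str) -> None:
    field_usage[(upstream_table, field)].add(task)

def impacted_tasks(changed_table: str, changed_field: str) -> set:
    """Tasks whose definitions need to be corrected after a source field change."""
    return field_usage.get((changed_table, changed_field), set())

register_reference("job_dw_order_detail", "odsp_order_clean", "customer_id")
register_reference("job_dm_gmv_daily", "odsp_order_clean", "customer_id")

print(impacted_tasks("odsp_order_clean", "customer_id"))
# {'job_dw_order_detail', 'job_dm_gmv_daily'}
```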
Still further, the method can further comprise the following steps:
Step S24, importing the ER association relationship of the single job of each computing task into a database through Structured Query Language (SQL). This provides a query channel for the data results based on the system command line or SQL.
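For illustration, the import and the SQL query channel could be realized as follows; the SQLite database and the `task_er` schema are assumptions made only for this sketch, not the patent's actual storage:

```python
import sqlite3

# One row per (task, upstream table) edge parsed from the task definitions.
conn = sqlite3.connect("task_er.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS task_er (
        task_name      TEXT NOT NULL,
        target_table   TEXT NOT NULL,
        upstream_table TEXT NOT NULL
    )
""")

edges = [
    ("job_dw_order_detail", "dw_order_detail", "odsp_order_clean"),
    ("job_dw_order_detail", "dw_order_detail", "odsp_customer_clean"),
    ("job_dm_gmv_daily",    "dm_gmv_daily",    "dw_order_detail"),
]
conn.executemany(
    "INSERT INTO task_er (task_name, target_table, upstream_table) VALUES (?, ?, ?)",
    edges,
)
conn.commit()

# Query channel: the ER edges can now be inspected via SQL or a command line.
for (upstream,) in conn.execute(
    "SELECT upstream_table FROM task_er WHERE task_name = ?", ("job_dm_gmv_daily",)
):
    print(upstream)   # dw_order_detail
```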
And step S3, layering is performed based on the ER association relation. According to some embodiments of the present invention, after the ER association relationship of the single job of each computing task has been parsed, for example by extracting keywords such as FROM, JOIN and the naming rules of structure objects from the text of the computing task definition, the upstream task set of the single job can be determined based on the ER association relationship. On this basis, after the ER association relationship of each computing task has been imported into the database, the layer in which the computing task is located can be obtained through calculation and confirmed, and the tasks are layered accordingly. Once the layer of every computing task has been determined, the computing tasks in the big data platform can be layered in this way.
Specifically, as shown in fig. 3, the layering based on the ER association relationship may specifically include:
step S31, sorting all data in the database, and classifying based on predetermined rules.
After the ER relation of each computing task is obtained through analysis, the ER relation is imported into a database through SQL, and then the original data is sorted and classified based on preset rules.
Step S32, calculating the hierarchical value of each calculation task based on the ER incidence relation of all calculation tasks.
After the preliminary sorting and classification, ER association relation analysis and calculation are carried out to obtain each layering value.
Specifically, the hierarchical value of each calculation task may be calculated by an analysis algorithm based on OLAP (online analytical processing) calculation.
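The patent performs this calculation with an OLAP-style analysis over the imported ER data; the sketch below only illustrates the underlying recursion (a task's hierarchical value is one more than the largest value among its upstream tasks) on a toy, acyclic task graph with invented task names:

```python
from functools import lru_cache

# Task-level upstream edges derived from the ER association relations.
upstream = {
    "job_ods_order":       set(),
    "job_odsp_order":      {"job_ods_order"},
    "job_dw_order_detail": {"job_odsp_order"},
    "job_dm_gmv_daily":    {"job_dw_order_detail", "job_odsp_order"},
}

@lru_cache(maxsize=None)
def layer_value(task: str) -> int:
    """0 for tasks with no upstream (data acquisition), else 1 + max upstream value."""
    ups = upstream.get(task, set())
    return 0 if not ups else 1 + max(layer_value(u) for u in ups)

layers = {task: layer_value(task) for task in upstream}
print(layers)
# {'job_ods_order': 0, 'job_odsp_order': 1, 'job_dw_order_detail': 2, 'job_dm_gmv_daily': 3}
```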
And step S33, layering the calculation task based on the layering value.
Specifically, for example, the calculation tasks having the same hierarchical value may be output after duplicate values are removed, for example to a configuration file of the scheduling task set of the big data platform, thereby completing the layering of the computing tasks.
FIG. 4 illustrates an example of layering based on the ER associations.
As shown in fig. 4, after each hierarchical value has been calculated, the calculation tasks are filtered one layer at a time according to a search filtering rule (for example, selecting the tasks that share the same hierarchical value), so that tasks with the same hierarchical value are placed into the same layer.
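A sketch of this same-value filtering and of writing the result to a scheduler configuration file follows; the `layers` mapping mirrors the previous example and the configuration format is invented purely for illustration:

```python
from collections import defaultdict

layers = {
    "job_ods_order": 0,
    "job_odsp_order": 1,
    "job_dw_order_detail": 2,
    "job_dm_gmv_daily": 3,
}

def batches_by_layer(layers: dict) -> list:
    """Group tasks sharing the same hierarchical value, with duplicates removed."""
    grouped = defaultdict(set)
    for task, value in layers.items():
        grouped[value].add(task)
    return [sorted(grouped[value]) for value in sorted(grouped)]

def write_schedule_config(layers: dict, path: str) -> None:
    # One line per layer; layer_0 is executed first, then layer_1, and so on.
    with open(path, "w", encoding="utf-8") as f:
        for depth, tasks in enumerate(batches_by_layer(layers)):
            f.write(f"layer_{depth}={','.join(tasks)}\n")

write_schedule_config(layers, "schedule_layers.conf")
```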
In summary, after the computing tasks in the big data platform task set have been extracted, the ER association information of the individual jobs of each computing task is first parsed in step S2; thereafter, in step S3, the associations between the computing tasks are calculated to obtain a hierarchical value for each task, and the layering is performed based on these hierarchical values.
According to this layering algorithm for calculation tasks, the upstream and downstream layering relationships of the calculation tasks are determined, automatic batching is achieved, analysis is performed before summarizing, queries are easy to maintain, and the layered scheduling of the calculation tasks can be defined logically. The algorithm has the characteristics of accurate judgment, efficient extraction, and rapid analysis and definition, and plays a decisive role in the layered arrangement of large-scale jobs.
Preferably, as shown in fig. 2, after the computing tasks are extracted, the computing tasks may be first divided into a plurality of batches according to a certain rule (step S12 shown in fig. 2), and then within each batch, the computing tasks are layered based on the ER association relationship.
For example, the batches may be divided according to a time echelon or according to business logic, and the ER association relationship is then analyzed and layered within each batch. In this way, customized scheduling of small-scope business data can be realized effectively and the refresh frequency of customized data can be increased; splitting the peak total resource overhead of the task aggregate across several batches also improves the running health of the system and the system availability while data calculation tasks are running, and effectively increases the availability and stability of the platform.
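A possible pre-batching step is sketched below; the task metadata fields (`domain`, `slot`) are hypothetical and only stand in for whatever time-echelon or business-logic attribute the platform actually records:

```python
from collections import defaultdict

tasks = [
    {"name": "job_dw_order_detail", "domain": "gmv",     "slot": "02:00"},
    {"name": "job_dm_gmv_daily",    "domain": "gmv",     "slot": "02:00"},
    {"name": "job_dm_risk_alert",   "domain": "risk",    "slot": "hourly"},
    {"name": "job_dm_finance_rpt",  "domain": "finance", "slot": "04:00"},
]

def split_batches(tasks, keys=("domain", "slot")):
    """Divide the extracted tasks into batches by business domain and time slot."""
    batches = defaultdict(list)
    for task in tasks:
        batches[tuple(task[k] for k in keys)].append(task["name"])
    return dict(batches)

# Each batch is subsequently layered on its own ER association relations.
for batch_key, names in split_batches(tasks).items():
    print(batch_key, names)
```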
To summarize the workflow: the ER association relationships of the computing tasks are parsed and the structured raw data of all computing tasks are imported into a database through SQL syntax; the upstream-downstream relationships of the calculation tasks are then analyzed and classified by an analytical algorithm based on OLAP calculation; a query channel for the data results is provided based on the system command line or SQL; finally, duplicate values are removed from the calculation tasks sharing the same layer value, which are then output to the configuration file of the scheduling task set as the layering result.
In the following, a distributed scheduling method according to an embodiment of the present invention is described with reference to fig. 5.
As shown in fig. 5, the distributed scheduling method according to the embodiment of the present invention includes:
step 101, layering computing tasks.
The specific hierarchical method may be implemented by the hierarchical method of the computing task according to any of the embodiments.
Step 102, distributing the computing tasks to a plurality of computing nodes of the big data platform based on the layering result, so that the plurality of computing nodes execute their respective tasks.
Therefore, according to the distributed scheduling method of the embodiment of the invention, because scheduling is performed after layering based on the ER association relationship, real-time updates of the data are guaranteed to stay synchronized with updates of the business data, and the requirement of providing accurate calculation results after layered scheduling of the task aggregate can be met.
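As an illustration of this layer-by-layer dispatch (not the patent's actual scheduler), the following sketch spreads each layer's tasks over the compute nodes and waits for the whole layer to finish before starting the next one; `run_on_node` is a placeholder for the platform's real task-submission call, and the task and node names are invented:

```python
from concurrent.futures import ThreadPoolExecutor

def run_on_node(node: str, task: str) -> None:
    # Placeholder: submit the task to the given compute node and wait for it.
    print(f"{task} -> {node}")

def schedule(layered_tasks: list, nodes: list) -> None:
    for layer in layered_tasks:                      # layer 0 first, then 1, ...
        with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
            for i, task in enumerate(layer):
                pool.submit(run_on_node, nodes[i % len(nodes)], task)
        # Leaving the 'with' block joins the pool, i.e. the whole layer finishes
        # before any downstream layer starts, keeping data updates synchronized.

schedule(
    [
        ["job_ods_order"],
        ["job_odsp_order"],
        ["job_dw_order_detail", "job_dw_customer_detail"],
        ["job_dm_gmv_daily"],
    ],
    nodes=["worker-1", "worker-2"],
)
```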
FIG. 6 illustrates a hierarchical apparatus 1000 of computing tasks, according to an embodiment of the invention, including: an extraction module 1001, an analysis module 1002, and a layering module 1003.
The extraction module 1001 is configured to extract a computing task from a big data platform task set.
The parsing module 1002 is configured to parse the ER association relationship of the computing task.
The layering module 1003 is configured to perform layering based on the ER association relationship.
In addition, the specific parsing method of the parsing module 1002, the specific layering method of the layering module 1003, and the like, may refer to the description of the layering method of the calculation task, and are not described herein again.
Further, as shown in fig. 7, an embodiment of the present invention provides an electronic device, including: a processor 1401 and a memory 1402, in which memory 1402 computer program instructions are stored, wherein the computer program instructions, when executed by the processor, cause the processor 1401 to perform the steps of:
extracting a computing task in a big data platform task set;
analyzing the ER association relation of the computing task;
and layering based on the ER association relation.
Further, as shown in fig. 7, the electronic apparatus further includes a network interface 1403, an input device 1404, a hard disk 1405, and a display device 1406.
The various interfaces and devices described above may be interconnected by a bus architecture, which may include any number of interconnected buses and bridges. Various circuits of one or more central processing units (CPUs), represented by the processor 1401, and of one or more memories, represented by the memory 1402, are coupled together. The bus architecture may also connect various other circuits, such as peripherals, voltage regulators and power management circuits. It will be appreciated that the bus architecture enables communication among these components. In addition to a data bus, the bus architecture includes a power bus, a control bus and a status signal bus, all of which are well known in the art and therefore are not described in detail herein.
The network interface 1403 may be connected to a network (e.g., the internet, a local area network, etc.), obtain relevant data from the network, and store the relevant data in the hard disk 1405.
The input device 1404 may receive various instructions from an operator and send them to the processor 1401 for execution. The input device 1404 may include a keyboard or a pointing device (e.g., a mouse, trackball, touch pad, or touch screen, among others).
The display device 1406 may display a result obtained by the processor 1401 executing the instruction.
The memory 1402 is used for storing programs and data necessary for operating the operating system, and data such as intermediate results in the calculation process of the processor 1401.
It will be appreciated that the memory 1402 in embodiments of the invention may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), or a flash memory. Volatile memory can be Random Access Memory (RAM), which acts as external cache memory. The memory 1402 of the apparatus and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
In some embodiments, memory 1402 stores elements, executable modules or data structures, or a subset thereof, or an expanded set thereof as follows: an operating system 14021 and application programs 14014.
The operating system 14021 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks. The application 14014 includes various applications, such as a Browser (Browser), and the like, for implementing various application services. A program implementing a method according to an embodiment of the invention may be included in the application 14014.
The processor 1401 extracts a computing task from the large data platform task set when calling and executing an application program and data stored in the memory 1402, specifically, a program or an instruction stored in the application 14014; analyzing the ER association relation of the computing task; and layering based on the ER association relation.
The methods disclosed by the above-described embodiments of the present invention may be applied to the processor 1401, or may be implemented by the processor 1401. The processor 1401 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by instructions in the form of hardware integrated logic circuits or software in the processor 1401. The processor 1401 may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in RAM, flash memory, ROM, PROM, or EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory 1402, and the processor 1401 reads information from the memory 1402 and performs the steps of the above method in combination with its hardware.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units designed to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the processor is caused to execute the following steps:
extracting a computing task in a big data platform task set;
analyzing the ER association relation of the computing task;
and layering based on the ER association relation.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be physically included alone, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) to execute some of the steps of the methods according to the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (15)

1. A method for layering computing tasks in a big data platform,
the computing tasks in the big data platform include:
a data acquisition task, which acquires data from a business system to obtain a data set;
a data cleaning calculation task for calculating the data to be cleaned in the data set;
a detail data calculation task to calculate the detail data in the dataset;
an application data computation task to compute application data in the dataset,
the layering method comprises the following steps:
extracting a computing task in a big data platform task set;
analyzing the ER association relation of the computing task;
and layering based on the ER association relation.
2. The hierarchical method according to claim 1, wherein the resolving the ER associations of the computing tasks comprises:
and analyzing the ER association relation of the single job of each computing task.
3. The hierarchical method of computing tasks according to claim 2, wherein the resolving ER associations of individual jobs for each task comprises:
and extracting keywords in the text information of the calculation task, and acquiring the ER association relation of the calculation task according to the keywords.
4. The hierarchical method of computing tasks according to claim 3, wherein the resolving ER associations of individual jobs for each task further comprises:
performing correlation analysis on a reference field of a data table upstream of the computing task, and determining a source of a specific field associated with the computing task;
and when the data structure field of the source business system is changed, updating the corresponding field of the calculation task and the associated information of the downstream layer in time.
5. The hierarchical method of computing tasks according to claim 3, characterized in that the keywords in the text information of the computing tasks are identified by text recognition techniques.
6. The hierarchical method of computing tasks of claim 5, wherein the keywords comprise one or more of call fields, connection fields, and naming-rule based fields in the textual information.
7. The method for layering computing tasks according to claim 2, further comprising the following steps after extracting the computing tasks:
dividing the computing task into a plurality of batches according to a preset rule,
in the layering based on the ER association relationship, in each batch, the computing tasks are layered based on the ER association relationship.
8. The method of claim 7, wherein the predetermined rules include partitioning by time echelon and/or partitioning by business logic.
9. The hierarchical method of computing tasks according to claim 2, wherein the resolving ER associations of the computing tasks further comprises:
and importing the ER association relation of the single job of each computing task into the database through a structured query language.
10. The method of claim 9, wherein said layering based on said ER associations comprises:
sorting all data in the database, and classifying the data based on a preset rule;
calculating the hierarchical value of each calculation task based on the ER association relation of all the calculation tasks;
the computing task is layered based on the layering values.
11. The method of claim 10, wherein the hierarchy value for each of the computing tasks is calculated based on an online analysis.
12. A distributed scheduling method is used for scheduling tasks in a big data platform, and is characterized by comprising the following steps:
layering the computing tasks according to the method for layering computing tasks of any one of claims 1 to 11;
based on the layering result, distributing the computing tasks to a plurality of computing nodes of the big data platform so that the plurality of computing nodes execute their respective tasks.
13. A device for layering computing tasks in a big data platform,
the computing tasks in the big data platform include:
a data acquisition task, which acquires data from a business system to obtain a data set;
a data cleaning calculation task for calculating the data to be cleaned in the data set;
a detail data calculation task to calculate the detail data in the dataset;
an application data computation task to compute application data in the dataset,
the layering device includes:
the extraction module is used for extracting the computing tasks in the big data platform task set;
the analysis module is used for analyzing the ER association relation of the computing task;
and the layering module is used for layering based on the ER association relation.
14. An electronic device for layering computing tasks, comprising:
one or more processors;
one or more memories having computer readable code stored therein, which when executed by the one or more processors, causes the processors to perform the steps of:
extracting a computing task in a big data platform task set;
analyzing the ER association relation of the computing task;
and layering based on the ER association relation.
15. A non-transitory computer storage medium having computer readable code stored therein, which when executed by one or more processors, causes the processors to perform the steps of:
extracting a computing task in a big data platform task set;
analyzing the ER association relation of the computing task;
and layering based on the ER association relation.
CN201911087206.1A 2019-11-08 2019-11-08 Layering method and device for computing tasks, distributed scheduling method and device and electronic equipment Active CN110908986B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911087206.1A CN110908986B (en) 2019-11-08 2019-11-08 Layering method and device for computing tasks, distributed scheduling method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911087206.1A CN110908986B (en) 2019-11-08 2019-11-08 Layering method and device for computing tasks, distributed scheduling method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN110908986A (en) 2020-03-24
CN110908986B (en) 2020-10-30

Family

ID=69816840

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911087206.1A Active CN110908986B (en) 2019-11-08 2019-11-08 Layering method and device for computing tasks, distributed scheduling method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN110908986B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160012344A1 (en) * 2011-03-29 2016-01-14 Manyworlds, Inc. Expertise Discovery in Social Networks
CN103164261A (en) * 2011-12-15 2013-06-19 中国移动通信集团公司 Multicenter data task processing method, multicenter data task processing device and multicenter data task processing system
WO2014070106A1 (en) * 2012-10-31 2014-05-08 Nanyang Technological University Multi-screen media delivery systems and methods
US20190303475A1 (en) * 2018-03-30 2019-10-03 Microsoft Technology Licensing, Llc Learning Optimizer for Shared Cloud
CN109947954A (en) * 2018-07-09 2019-06-28 北京邮电大学 Multitask coordinated recognition methods and system
CN110245023A (en) * 2019-06-05 2019-09-17 欧冶云商股份有限公司 Distributed scheduling method and device, electronic equipment and computer storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JAVA成功之路: "How does Alibaba achieve a million TPS at second-level latency? An interpretation of the search offline big data platform architecture", HTTPS://WWW.JIANSHU.COM/P/932172EDFDD8 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113886085A (en) * 2021-09-30 2022-01-04 支付宝(杭州)信息技术有限公司 Task execution method and device in multi-party security computing

Also Published As

Publication number Publication date
CN110908986B (en) 2020-10-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant