CN114880103B

CN114880103B - System and method for adapting Flink tasks to the Hadoop ecosystem

Info

Publication number: CN114880103B
Application number: CN202210807461.4A
Authority: CN
Inventors: 汪昱帅; 冯治; 赵晨曦; 刘峰; 余任杰
Original assignee: China Electronic System Technology Co ltd; CLP Cloud Digital Intelligence Technology Co Ltd
Current assignee: China Electronic System Technology Co ltd; Zhongdian Cloud Computing Technology Co ltd
Priority date: 2022-07-11
Filing date: 2022-07-11
Publication date: 2022-09-09
Anticipated expiration: 2042-07-11
Also published as: CN114880103A

Abstract

The invention provides a system and a method for adapting Flink tasks to the Hadoop ecosystem. The system comprises: an application layer, used for constructing a Flink task and submitting it to the proxy service layer, and for sending task stop, task resume or task trial-run information to the proxy service layer; a proxy service layer, used for submitting and issuing tasks, stopping tasks, checking task state, resuming tasks and trial-running tasks; a Hadoop layer, used for executing the Flink task, generating the task execution log, and generating monitoring metrics and pushing them to the monitoring layer; a monitoring service layer, used for sending the running state and performance metrics of the Flink task to the task state checking module of the proxy service layer; and a storage layer, comprising Redis, Kafka, Elasticsearch and a temporary data storage service module. The system and method for adapting Flink tasks to the Hadoop ecosystem provided by the exemplary embodiments of the invention improve the operability and controllability of Flink tasks, save system resources, and are suitable for a variety of data governance production scenarios.

Description

System and method for adapting Flink tasks to the Hadoop ecosystem

Technical Field

The invention relates to the field of big data services, and in particular to a system and a method for adapting Flink tasks to the Hadoop ecosystem.

Background

Flink is a real-time computing technology in the big data ecosystem; its core is a distributed streaming dataflow engine written in Java and Scala. Flink executes arbitrary streaming data programs in a data-parallel and pipelined manner, and its pipelined runtime can execute both batch and stream processing programs. In addition, the Flink runtime natively supports the execution of iterative algorithms. Once a Flink task is submitted to a big data computing engine such as YARN, it resides in the computing engine and performs the required computation in real time.

Existing implementations, which submit a Flink job to a YARN cluster from an application or simply submit a Flink task from an application, only solve the deployment of Hadoop configuration files and third-party dependency jar packages, and have the following defects:

1. only the deployment of Hadoop configuration files and third-party dependency jar packages is adapted; the Flink task is submitted through a service layer to the YARN computing engine of Hadoop and leaves the control of the service layer once submitted, which appears to avoid interference between different applications but does not actually achieve atomicity of a single independent Flink task;

2. the Flink task is simply submitted to the YARN computing engine through the service layer, without performance adaptation to the workload of the task or to the health and load of the computing engine, so the performance parameters required by a task cannot be flexibly configured through the application system, the performance parameters of a single task cannot be adjusted to the actual needs of the Flink task, and tasks cannot be kept under control in complex real production scenarios;

3. because the Flink task is managed and executed by the YARN computing engine, its execution log and performance metrics can only be viewed inside the YARN computing engine, and the logs and monitoring metrics of the Flink task cannot be obtained in real time;

4. when a Flink task hosted on the YARN computing engine fails, or an environment or data problem occurs, the task produces dirty data;

5. because the ck (checkpoint) mechanism is not adapted, when a user needs to adjust a running Flink task, the modified task cannot automatically resume from the point at which the task stopped before modification;

6. because Prometheus monitoring is not used, the Flink task states in the application system and in the YARN computing engine cannot be synchronized accurately in real time;

7. because a Flink task is a real-time streaming computation task and no temporary data storage service is used, run results cannot be obtained online, or a trial run affects the result set of normal task operation.

Therefore, how to provide a more comprehensive and controllable method for adapting Flink tasks to the Hadoop ecosystem has become a technical problem to be solved urgently.

Disclosure of Invention

In view of this, the present invention aims to solve the problems in the prior art that, when a Flink job is submitted to a YARN cluster from an application, the submitted Flink task cannot be fully controlled and its logs, state and performance monitoring data cannot be accurately obtained.

In one aspect, the invention provides a system for adapting Flink tasks to the Hadoop ecosystem, comprising:

the application layer comprises a task issuing module and a task management module, and is used for constructing a Flink task and submitting it to the proxy service layer, and for sending task stop, task resume or task trial-run information to the proxy service layer;

the proxy service layer comprises a task adaptation analysis module, a task submission module, a task stopping module, a task state checking module, a task resume module and a task trial-run module, and is used for adapting and parsing tasks, submitting the resource dependencies required by a task to the HDFS of the Hadoop layer, and handling requests to submit and issue tasks, stop tasks, check task state, resume tasks and trial-run tasks;

the Hadoop layer comprises an HDFS module and a YARN computing engine module, wherein the HDFS module comprises a dependency library, a ck file library and a savepoint file library; the YARN computing engine module comprises a task execution unit, a task execution log unit and a task recording unit; the HDFS module is used for storing the resource dependencies required by a task, as submitted by the task analysis unit in the proxy service layer, and for storing the ck files and savepoint files generated by the YARN computing engine module; the YARN computing engine module is used for executing the Flink task, generating the task execution log, and, through a Prometheus module, generating monitoring metrics and pushing them to the monitoring layer;

the monitoring service layer comprises a Prometheus module and is used for acquiring monitoring metrics from the task execution log unit of the Hadoop layer in real time and sending the running state and performance metrics of the Flink task to the task state checking module of the proxy service layer;

the storage layer comprises Redis, Kafka, Elasticsearch and a temporary data storage service module, wherein Redis is used for storing the HDFS address of a ck or savepoint snapshot file, Kafka is used for recording the state of each Flink task in real time, Elasticsearch is used for storing the execution log of each Flink task in real time, and the temporary data storage service module is used for storing task trial-run data.

Furthermore, in the application layer of the system for adapting Flink tasks to the Hadoop ecosystem, the task issuing module comprises a task construction unit and a task requirement configuration unit, wherein the task construction unit is used for constructing the Flink task, and the task requirement configuration unit is used for configuring the task requirements according to the task scenario and submitting the Flink task to the proxy service layer; the task management module comprises a task stop request unit, a task resume request unit and a task trial-run request unit, which are respectively used for sending task stop, task resume and task trial-run information.

Further, the proxy service layer of the system for adapting Flink tasks to the Hadoop ecosystem provided by the invention comprises:

the task adaptation analysis module comprises a task adaptation unit and a task analysis unit, wherein the task adaptation unit is used for adapting the log4j dependency and the Prometheus parameters into the Flink task; the task analysis unit is used for parsing the Flink task submitted by the application layer, adding checkpoint configuration parameters to the parsed Flink task, and submitting the resource dependencies required by the task to the HDFS module of the Hadoop layer;

the task submission module is used for customizing task performance parameters, adapting the customized task performance parameters to the parsed Flink task, and submitting the adapted Flink task to the YARN computing engine module of the Hadoop layer in application mode;

the task stopping module is used for submitting a task stop request to the task execution unit of the YARN computing engine module of the Hadoop layer according to the task stop request information, recording the execution parameters, task attributes, task name and jobid of the Flink task, and storing the savepoint file address corresponding to the stopped task in the Redis of the storage layer;

the task state checking module is used for acquiring the running state and performance metrics of the Flink task from the monitoring service layer, deciding from the performance metrics whether to check the running state of the Flink task in the YARN computing engine module, and synchronizing the acquired running state of the Flink task to the Kafka of the storage layer for recording;

the task resume module comprises an address acquisition unit and a task resume submission unit, wherein the address acquisition unit is used for acquiring the address of the ck or savepoint snapshot file from the HDFS module and storing the address in the Redis of the storage layer; the task resume submission unit is used for acquiring, according to the task resume request information, the ck or savepoint snapshot of each Flink task to be resumed from the Redis of the storage layer and resubmitting the task resume request to the YARN computing engine module of the Hadoop layer;

and the task trial-run module is used for parsing the trial-run task using the Calcite model and writing the result set of the trial run into the temporary data storage service module.
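As an illustration of how the task analysis unit and the task submission module might cooperate, the following minimal sketch assembles an application-mode submission command in which checkpoint settings and caller-supplied performance parameters are passed as Flink configuration options. It is a sketch under stated assumptions, not the patented implementation: the function name `build_submit_command`, the per-task HDFS directory layout (`lib`, `ck`, `savepoint`) and the default checkpoint interval are hypothetical, and the use of `yarn.provided.lib.dirs` for the dependency library is an assumption. The `flink run-application -t yarn-application` form and the option keys shown exist in Flink releases that support application mode, but should be checked against the Flink version in use.

```python
from typing import Dict, List


def build_submit_command(
    jar_path: str,
    task_id: str,
    performance: Dict[str, str],
    hdfs_root: str = "hdfs:///flink-tasks",  # hypothetical per-task root directory
) -> List[str]:
    """Assemble an application-mode submission command for one Flink task."""
    task_dir = f"{hdfs_root}/{task_id}"
    # Defaults added by the (hypothetical) analysis step: one isolated
    # directory tree per task, so tasks do not share dependencies or snapshots.
    options = {
        "yarn.application.name": task_id,
        "yarn.provided.lib.dirs": f"{task_dir}/lib",
        "execution.checkpointing.interval": "60s",
        "state.checkpoints.dir": f"{task_dir}/ck",
        "state.savepoints.dir": f"{task_dir}/savepoint",
    }
    # Customized performance parameters override the defaults.
    options.update(performance)

    command = ["flink", "run-application", "-t", "yarn-application"]
    for key in sorted(options):
        command.append(f"-D{key}={options[key]}")
    command.append(jar_path)
    return command


if __name__ == "__main__":
    cmd = build_submit_command(
        "hdfs:///flink-tasks/task-42/app.jar",
        "task-42",
        {
            "parallelism.default": "4",
            "taskmanager.memory.process.size": "2g",
            "taskmanager.numberOfTaskSlots": "2",
        },
    )
    print(" ".join(cmd))
```

Keeping every path under a single per-task directory is one way to approximate the atomicity of a single task described above: removing or resubmitting a task touches only its own directory.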

Furthermore, in the HDFS module of the Hadoop layer of the system for adapting Flink tasks to the Hadoop ecosystem, the dependency library is used for storing the resource dependencies required by a task, as submitted by the task analysis unit in the proxy service layer; the ck file library is used for storing the ck files generated when a task fails or when the scheduled creation time is reached; and the savepoint file library is used for storing the savepoint files generated when a task stops.

Further, the YARN computing engine module of the Hadoop layer of the system for adapting Flink tasks to the Hadoop ecosystem of the present invention includes:

the task execution unit is used for allocating computing space for the Flink task submitted by the task submission module in the proxy service layer, loading the dependency library of the Flink task from the HDFS module, and executing the Flink task;

the task execution log unit is used for parsing the log4j dependency adapted into the Flink task by the task adaptation unit in the proxy service layer, generating the task execution log, and writing the task execution log into the Elasticsearch of the storage layer; the Prometheus module is used for reading the Prometheus parameters adapted into the Flink task by the task adaptation unit in the proxy service layer, generating monitoring metrics and pushing them to the monitoring layer;

and the task recording unit is used for generating, according to the checkpoint parameters added to the Flink task by the task analysis unit in the proxy service layer, a ck file when the task fails or the scheduled creation time is reached and a savepoint file when the task stops, thereby providing snapshot files for task failure recovery and task resumption.
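The Prometheus parameters adapted into a Flink task could, for example, take the form of metrics reporter options. The sketch below generates such options for Flink's Prometheus PushGateway reporter, which matches the push-style delivery described above. This is an assumption: the text does not say which reporter is used, the helper names are hypothetical, and the option keys follow recent Flink documentation (older releases use different keys, for example separate host and port entries), so they should be verified against the deployed Flink version.

```python
from typing import Dict, List


def prometheus_reporter_options(
    task_id: str,
    gateway_url: str,
    interval: str = "30 SECONDS",  # hypothetical default push interval
) -> Dict[str, str]:
    """Build metrics reporter options that push one task's metrics to a PushGateway."""
    prefix = "metrics.reporter.promgateway."
    return {
        prefix + "factory.class": (
            "org.apache.flink.metrics.prometheus."
            "PrometheusPushGatewayReporterFactory"
        ),
        prefix + "hostUrl": gateway_url,
        # Using the task id as the job name keeps each task's metrics separate.
        prefix + "jobName": task_id,
        prefix + "randomJobNameSuffix": "false",
        prefix + "deleteOnShutdown": "true",
        prefix + "interval": interval,
    }


def to_conf_lines(options: Dict[str, str]) -> List[str]:
    """Render options as 'key: value' lines for a Flink configuration file."""
    return [f"{key}: {options[key]}" for key in sorted(options)]


if __name__ == "__main__":
    opts = prometheus_reporter_options("task-42", "http://pushgateway:9091")
    print("\n".join(to_conf_lines(opts)))
```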

In another aspect, the invention provides a method for adapting Flink tasks to the Hadoop ecosystem, which comprises the following steps:

constructing a Flink task using the task construction unit of the task issuing module in the application layer, and configuring the task requirements according to the task scenario and submitting the Flink task to the proxy service layer using the task requirement configuration unit;

using the task adaptation unit of the task adaptation analysis module in the proxy service layer to adapt the log4j dependency and the Prometheus parameters into the Flink task, and using the task analysis unit to parse the Flink task submitted by the application layer, add checkpoint configuration parameters to the parsed Flink task, and submit the resource dependencies required by the task to the dependency library of the HDFS module of the Hadoop layer;

using the task submission module in the proxy service layer to customize task performance parameters, adapt the customized task performance parameters to the parsed Flink task, and submit the adapted Flink task to the YARN computing engine module of the Hadoop layer in application mode;

using the task execution unit of the YARN computing engine module in the Hadoop layer to allocate computing space for the Flink task submitted by the task submission module in the proxy service layer, load the dependency library of the Flink task from the HDFS module, and execute the Flink task;

using the task execution log unit of the YARN computing engine module in the Hadoop layer to parse the log4j dependency adapted into the Flink task by the task adaptation unit in the proxy service layer, generate the task execution log, and write the task execution log into the Elasticsearch of the storage layer; and using the task execution log unit to read the Prometheus parameters adapted into the Flink task by the task adaptation unit, generate monitoring metrics and push them to the Prometheus module of the monitoring layer;

using the task recording unit of the YARN computing engine module in the Hadoop layer to generate, according to the checkpoint parameters added to the Flink task by the task analysis unit in the proxy service layer, a ck file when the task fails or the scheduled creation time is reached and a savepoint file when the task stops, thereby providing snapshot files for task failure recovery and task resumption;

and using the task state checking module in the proxy service layer to acquire the running state and performance metrics of the Flink task from the Prometheus module of the monitoring service layer, decide from the performance metrics whether to check the running state of the Flink task in the YARN computing engine module, and synchronize the acquired running state of the Flink task to the Kafka of the storage layer for recording.
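The state synchronization in the last step can be sketched as follows. This is a minimal illustration under stated assumptions: the task state names, the record format and the `publish` callback (standing in for a Kafka producer) are hypothetical, while the keys of `YARN_TO_TASK_STATE` are the application states reported by YARN. A record is emitted only when the observed state differs from the last one recorded for the task.

```python
import json
from typing import Callable, Dict, Optional

# Left side: YARN application states. Right side: hypothetical task states
# as they might be shown by the application layer.
YARN_TO_TASK_STATE = {
    "NEW": "SUBMITTING",
    "NEW_SAVING": "SUBMITTING",
    "SUBMITTED": "SUBMITTING",
    "ACCEPTED": "SUBMITTING",
    "RUNNING": "RUNNING",
    "FINISHED": "FINISHED",
    "FAILED": "FAILED",
    "KILLED": "STOPPED",
}


class TaskStateSynchronizer:
    """Publishes a task's state only when it changes."""

    def __init__(self, publish: Callable[[str, str], None]) -> None:
        # publish(key, value) stands in for a Kafka producer's send call.
        self._publish = publish
        self._last: Dict[str, str] = {}

    def sync(self, task_id: str, yarn_state: str) -> Optional[str]:
        """Return the new task state if a record was published, else None."""
        task_state = YARN_TO_TASK_STATE.get(yarn_state, "UNKNOWN")
        if self._last.get(task_id) == task_state:
            return None  # no change, nothing to record
        self._last[task_id] = task_state
        record = json.dumps({"taskId": task_id, "state": task_state}, sort_keys=True)
        self._publish(task_id, record)
        return task_state


if __name__ == "__main__":
    synchronizer = TaskStateSynchronizer(lambda key, value: print(key, value))
    for observed in ["ACCEPTED", "ACCEPTED", "RUNNING", "KILLED"]:
        synchronizer.sync("task-42", observed)
```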

Further, the method for adapting Flink tasks to the Hadoop ecosystem provided by the invention comprises the following steps:

using the task recording unit in the YARN computing engine module of the Hadoop layer to generate a savepoint file when a task stops, and writing the generated savepoint file into the savepoint file library of the HDFS module of the Hadoop layer;

using the task stop request unit of the task management module in the application layer to send task stop information to the task stopping module of the proxy service layer, wherein the task stopping module submits a task stop request to the task execution unit of the YARN computing engine module of the Hadoop layer according to the task stop request information, records the execution parameters, task attributes, task name and jobid of the Flink task, and stores the savepoint file address corresponding to the stopped task in the Redis of the storage layer;

using the task recording unit in the YARN computing engine module of the Hadoop layer to generate a ck file when a task fails or the scheduled creation time is reached, and writing the generated ck file into the ck file library of the HDFS module of the Hadoop layer;

using the task resume request unit of the task management module in the application layer to send task resume information to the task resume module of the proxy service layer, wherein the address acquisition unit in the task resume module acquires the address of the ck or savepoint snapshot file from the HDFS module and stores the address in the Redis of the storage layer; and using the task resume submission unit in the task resume module to acquire, according to the task resume request information, the ck or savepoint snapshot of each Flink task to be resumed from the Redis of the storage layer and resubmit the task resume request to the YARN computing engine module of the Hadoop layer.
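The stop and resume steps above rely on looking up the most recent snapshot address for a task. The following minimal sketch shows one possible shape of that lookup. It is illustrative only: `SnapshotRegistry` uses an in-memory dictionary as a stand-in for Redis, the key format and function names are hypothetical, and passing the address through the `execution.savepoint.path` configuration option is an assumption about how the resubmission is wired (the option itself exists in Flink).

```python
from typing import Dict, List, Optional


class SnapshotRegistry:
    """Keeps the latest ck or savepoint address per task (stand-in for Redis)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    @staticmethod
    def _key(task_id: str) -> str:
        return f"flink:snapshot:{task_id}"  # hypothetical key format

    def record(self, task_id: str, hdfs_path: str) -> None:
        # The most recently recorded snapshot replaces any earlier one.
        self._store[self._key(task_id)] = hdfs_path

    def latest(self, task_id: str) -> Optional[str]:
        return self._store.get(self._key(task_id))


def resume_options(registry: SnapshotRegistry, task_id: str) -> List[str]:
    """Extra submission options that restore the task from its latest snapshot."""
    path = registry.latest(task_id)
    if path is None:
        return []  # nothing recorded: the task starts from scratch
    return [f"-Dexecution.savepoint.path={path}"]


if __name__ == "__main__":
    registry = SnapshotRegistry()
    registry.record("task-42", "hdfs:///flink-tasks/task-42/ck/chk-17")
    registry.record("task-42", "hdfs:///flink-tasks/task-42/savepoint/savepoint-ab12")
    print(resume_options(registry, "task-42"))
```

Because only the latest address is kept per task, a modified task that is resubmitted continues from the point at which the previous version stopped.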

Further, the method for adapting Flink tasks to the Hadoop ecosystem provided by the invention comprises the following steps: using the task trial-run request unit of the task management module in the application layer to send task trial-run information to the task trial-run module of the proxy service layer, and using the task trial-run module to parse the trial-run task using the Calcite model and write the result set of the trial run into the temporary data storage service module.
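The idea behind the trial run, redirecting a task's output to temporary storage so that the production sink is untouched, can be sketched as below. Note that this sketch does not use Calcite: it swaps in a naive regular expression to rewrite the target of a single `INSERT INTO` statement, which a real SQL parser would handle far more robustly. The function name and the temporary table name are hypothetical.

```python
import re

# Matches the first "INSERT INTO <table>" of a statement (case-insensitive).
_INSERT_TARGET = re.compile(r"\bINSERT\s+INTO\s+[`\w.]+", re.IGNORECASE)


def redirect_sink(sql: str, trial_table: str) -> str:
    """Rewrite the statement so its results go to a temporary trial-run table."""
    rewritten, count = _INSERT_TARGET.subn(
        lambda match: f"INSERT INTO {trial_table}", sql, count=1
    )
    if count == 0:
        raise ValueError("statement has no INSERT INTO sink to redirect")
    return rewritten


if __name__ == "__main__":
    statement = "INSERT INTO orders_out SELECT id, amount FROM orders WHERE amount > 100"
    print(redirect_sink(statement, "trial_tmp_42"))
```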

Finally, the invention also provides a terminal device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method when executing the program.

The system and the method for adapting Flink tasks to the Hadoop ecosystem have the following advantages:

1. atomicity of a single independent Flink task running on the YARN computing engine of the Hadoop ecosystem is achieved;

2. the performance parameters required by tasks are flexibly configured through the application layer, so that Flink tasks can be regulated flexibly and on a sound basis according to the specific production scenario;

3. real-time acquisition of Flink task logs and performance monitoring metrics is achieved;

4. the problem of a task producing dirty data when a Flink task on the computing engine fails is avoided;

5. the problem that, when a task must be modified for practical reasons, the modified task cannot automatically resume from the point at which the Flink task stopped before modification is solved;

6. real-time, accurate synchronization of Flink task states between the application system and the YARN computing engine is achieved;

7. online acquisition of task trial-run results is achieved.

The method and the system for adapting Flink tasks to the Hadoop ecosystem improve the operability and controllability of Flink tasks, reduce the workload of users, save system resources, offer good extensibility and flexibility, and are suitable for a variety of data governance production scenarios.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is an architecture diagram of a system for adapting Flink tasks to the Hadoop ecosystem according to a first exemplary embodiment of the present invention.

Fig. 2 is a flowchart of a method for adapting Flink tasks to the Hadoop ecosystem according to the fourth, fifth, sixth and seventh exemplary embodiments of the present invention.

Fig. 3 is a timing diagram of Flink task issuing in the method for adapting Flink tasks to the Hadoop ecosystem according to the fourth exemplary embodiment of the present invention.

Fig. 4 is a timing diagram of log submission and acquisition in the method for adapting Flink tasks to the Hadoop ecosystem according to the fourth exemplary embodiment of the present invention.

Fig. 5 is a timing diagram of Flink task stopping in the method for adapting Flink tasks to the Hadoop ecosystem according to the fifth exemplary embodiment of the present invention.

Fig. 6 is a timing diagram of Flink task resumption in the method for adapting Flink tasks to the Hadoop ecosystem according to the sixth exemplary embodiment of the present invention.

Fig. 7 is a timing diagram of Flink task trial running in the method for adapting Flink tasks to the Hadoop ecosystem according to the seventh exemplary embodiment of the present invention.

Detailed Description

Embodiments of the present invention are described in detail below with reference to the accompanying drawings.

It should be noted that, in the case of no conflict, the features in the following embodiments and examples may be combined with each other; moreover, based on the embodiments in the present disclosure, all other embodiments obtained by a person of ordinary skill in the art without making creative efforts shall fall within the protection scope of the present disclosure.

It is noted that various aspects of the embodiments are described below within the scope of the appended claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the disclosure, one skilled in the art should appreciate that one aspect described herein may be implemented independently of any other aspects and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. Additionally, such an apparatus may be implemented and/or such a method may be practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.

The terms referred to in the following examples are to be construed as follows:

ck: checkpoint. In Flink, a checkpoint is a snapshot of a task's state that is generated automatically and used to recover the task after a failure. The term comes from databases, where a checkpoint event is issued by the checkpoint process (LGWR/CKPT process); when the checkpoint event occurs, DBWR writes dirty blocks to disk, and the file headers of the data files and control files are updated to record the checkpoint information.

Savepoint: a logical point in the processing of a Flink computing task, generated when the task is actively stopped, from which the task can later be resumed. The term comes from database transactions, where a savepoint is used to cancel part of a transaction: all savepoints defined in a transaction are deleted automatically when the transaction ends, and a rollback can return to a specified point by naming a savepoint.

Prometheus: the second project to graduate from the Cloud Native Computing Foundation (CNCF), after Kubernetes. Prometheus is an open-source system modeled on Google's BorgMon monitoring system. The whole system consists of a monitoring service, an alerting service, a time-series database and other parts, together with a surrounding ecosystem of metric collectors (exporters), and is currently a mainstream cloud-native monitoring and alerting system.

Hadoop: a distributed system infrastructure developed by the Apache Foundation. Users can develop distributed programs without knowing the underlying distributed details, making full use of the cluster for high-speed computation and storage.

YARN: Apache Hadoop YARN (Yet Another Resource Negotiator) is the newer Hadoop resource manager. It is a general-purpose resource management system that provides unified resource management and scheduling for upper-layer applications, and brings great benefits to the cluster in terms of utilization, unified resource management, data sharing and the like.

HDFS (Hadoop Distributed File System): HDFS is one of the components of Hadoop, a distributed file system designed to run on commodity hardware. It has much in common with existing distributed file systems, but its differences from them are also clear. HDFS is a highly fault-tolerant system suitable for deployment on inexpensive machines. HDFS provides high-throughput data access and is well suited to applications with large data sets. HDFS relaxes some POSIX constraints to enable streaming access to file system data. HDFS was originally developed as infrastructure for the Apache Nutch search engine project, and is part of the Apache Hadoop Core project.

The technical principle of the invention is briefly described as follows:

The Flink tasks to be submitted are parsed through the application layer, the performance parameters required by the current task are assembled by the proxy service layer, and the resources required by the task are uploaded to the lib directory of the current task on HDFS, which achieves atomicity of a single Flink task and allows Flink tasks to be regulated flexibly according to the specific production scenario. The state of a Flink task on the YARN computing engine is acquired by the proxy service layer and synchronized in real time to the Kafka middleware for use by the application layer. The monitoring metrics of the Flink task are acquired from the Prometheus module by the proxy service layer. The log4j source code is modified and adapted into the Flink task so that the logs of the Flink task are obtained in real time. The savepoint of an actively stopped Flink task and the automatically generated checkpoints are configured and generated through the proxy service layer, and the corresponding information is stored in the Redis middleware, which solves problems such as automatically resuming from the point at which the Flink task stopped before modification and dirty data being produced when a task fails. The corresponding Flink SQL task is parsed using the Calcite model and the corresponding sink is adapted, so that a task can be trial-run and its results obtained in real time without affecting normal operation.

Fig. 1 is an architecture diagram of a system for adapting Flink tasks to the Hadoop ecosystem according to a first exemplary embodiment of the present invention. As shown in Fig. 1, the system of this embodiment includes:

the Hadoop layer comprises an HDFS module and a YARN computing engine module, wherein the HDFS module comprises a dependency library, a ck file library and a savepoint file library; the YARN computing engine module comprises a task execution unit, a task execution log unit and a task recording unit; the HDFS module is used for storing the resource dependencies required by a task, as submitted by the task analysis unit in the proxy service layer, and for storing the ck files and savepoint files generated by the YARN computing engine module; the YARN computing engine module is used for executing the Flink task, generating the task execution log, and, through a Prometheus module, generating monitoring metrics and pushing them to the monitoring layer;

the monitoring service layer comprises a Prometheus module, which is used for acquiring monitoring metrics from the task execution log unit of the Hadoop layer in real time and sending the running state and performance metrics of the Flink task to the task state checking module of the proxy service layer;

In the application layer of the system for adapting Flink tasks to the Hadoop ecosystem, the task issuing module comprises a task construction unit and a task requirement configuration unit, wherein the task construction unit is used for constructing a Flink task, and the task requirement configuration unit is used for configuring the task requirements according to the task scenario and submitting the Flink task to the proxy service layer; the task management module comprises a task stop request unit, a task resume request unit and a task trial-run request unit, which are respectively used for sending task stop, task resume and task trial-run information.

A second exemplary embodiment of the present invention provides a system for adapting Flink tasks to the Hadoop ecosystem. This embodiment is a preferred embodiment of the system shown in Fig. 1, and the proxy service layer of the system in this embodiment includes:

the task adaptation analysis module comprises a task adaptation unit and a task analysis unit, wherein the task adaptation unit is used for adapting the log4j dependency and the Prometheus parameters into the Flink task; the task analysis unit is used for parsing the Flink task submitted by the application layer, adding checkpoint configuration parameters to the parsed Flink task, and submitting the resource dependencies required by the task to the HDFS module of the Hadoop layer;

and the task trial-run module is used for parsing the trial-run task using the Calcite model and writing the result set of the trial run into the temporary data storage service module.

A third exemplary embodiment of the present invention provides a system for adapting Flink tasks to the Hadoop ecosystem. This embodiment is a preferred embodiment of the system shown in Fig. 1.

In the HDFS module of the Hadoop layer of the system, the dependency library is used for storing the resource dependencies required by a task, as submitted by the task analysis unit in the proxy service layer; the ck file library is used for storing the ck files generated when a task fails or when the scheduled creation time is reached; and the savepoint file library is used for storing the savepoint files generated when a task stops.

The YARN computing engine module of the Hadoop layer of the system of this embodiment includes:

the task execution unit is used for allocating computing space for the Flink task submitted by the task submission module in the proxy service layer, loading the dependency library of the Flink task from the HDFS module, and executing the Flink task;

the task execution log unit is used for parsing the log4j dependency adapted into the Flink task by the task adaptation unit in the proxy service layer, generating the task execution log, and writing the task execution log into the Elasticsearch of the storage layer; the Prometheus module is used for reading the Prometheus parameters adapted into the Flink task by the task adaptation unit in the proxy service layer, generating monitoring metrics and pushing them to the monitoring layer;

A fourth embodiment of the present invention provides a method for adapting a flash task to a hadoop ecology, where the method of this embodiment adopts the system for adapting a flash task to a hadoop ecology shown in fig. 1 to perform a flash task to a hadoop ecology, and a flow of the method of this embodiment is shown in fig. 2, and specifically, the method for issuing and submitting a flash task by adopting the system for adapting a flash task to a hadoop ecology shown in fig. 1 according to the method shown in fig. 2 includes:

adopting a task execution log unit of a YARN calculation engine module in the proxy service layer, parsing the log4j dependency of the flink task, as adapted by a task adaptation unit in the application layer, generating a task execution log, and writing the task execution log into Elasticsearch of the storage layer; reading, by the task execution log unit, the Prometheus parameters of the flink task, as adapted by the task adaptation unit in the application layer, generating a monitoring index, and pushing the monitoring index to a Prometheus module of the monitoring layer;

adopting a task recording unit of the YARN computing engine module in the proxy service layer, generating, according to a checkpoint parameter added to the flink task by a task analysis unit in the application layer, a ck file when the task fails or the creation time is reached and a savepoint file when the task stops, thereby providing snapshot files for task failure and continued task running;

and adopting a task state checking module in the proxy service layer, acquiring the running state and the performance index of the flink task from a Prometheus module of the monitoring service layer, judging, according to the performance index, whether to check the running state of the flink task in the YARN computing engine module, and synchronizing the acquired running state of the flink task to Kafka of the storage layer for recording.
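The state-checking step above can be sketched as follows. The embodiment does not say which performance index triggers a check against YARN, so the restart count and its threshold are hypothetical; a plain list stands in for the Kafka topic and a callable stands in for the YARN query.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MonitorReport:
    state: str     # running state reported by the monitoring service layer
    restarts: int  # hypothetical performance index: number of restarts


def needs_yarn_check(report: MonitorReport, max_restarts: int = 3) -> bool:
    """Decide from the performance index whether YARN must be queried."""
    return report.state != "RUNNING" or report.restarts > max_restarts


def sync_state(task_id: str, report: MonitorReport,
               yarn_state: Callable[[str], str],
               kafka: List[Dict[str, str]]) -> str:
    """Resolve the task state and record it in the Kafka stand-in."""
    if needs_yarn_check(report):
        state = yarn_state(task_id)  # confirm against the YARN engine
    else:
        state = report.state         # trust the monitoring layer
    kafka.append({"task_id": task_id, "state": state})
    return state
```

Querying YARN only when the index looks suspicious keeps routine checks off the cluster, which is one way to read the resource saving that the description attributes to the system.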

Fig. 3 is a timing diagram of issuing a flink task according to the method of this embodiment. A caller sends a command request to the proxy service layer; the proxy service layer returns a to-be-submitted status to the user and requests the HDFS module to download the corresponding jar to be run. For flink sql and flink dag task requests, the obtained jar is used to submit the flink task to the YARN calculation engine module, which returns a result to the proxy service layer; the proxy service layer returns the result to the caller and writes the corresponding task state into Kafka. The caller then sends a task state consumption request to Kafka, and Kafka modifies the corresponding task state and sends the modified task state to the caller.
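The ordering of the calls in fig. 3 can be illustrated with in-memory stand-ins. This is only a sketch of the sequence: the dictionary `hdfs`, the callable `yarn_submit`, the list `kafka` and the status strings are all hypothetical and do not reproduce the actual interfaces of the proxy service layer.

```python
from typing import Callable, Dict, List


def submit_task(task_id: str, jar_name: str,
                hdfs: Dict[str, bytes],
                yarn_submit: Callable[[bytes], str],
                kafka: List[Dict[str, str]]) -> List[str]:
    """Replay the submission sequence and return the statuses seen
    by the caller, in order."""
    statuses = ["to_be_submitted"]   # returned before any work is done
    jar = hdfs[jar_name]             # download the jar to be run from HDFS
    result = yarn_submit(jar)        # submit the flink task to YARN
    kafka.append({"task_id": task_id, "state": result})  # record the state
    statuses.append(result)          # result returned to the caller
    return statuses
```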

Fig. 4 is a sequence diagram of log submission and acquisition in the method according to this embodiment. YARN writes the corresponding task log into Elasticsearch; the caller sends a command request to the proxy service layer; and the proxy service layer queries Elasticsearch for the corresponding log information, obtains the corresponding log, and returns the result to the caller.
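A log query of the kind issued by the proxy service layer might be built as below. The structure follows the Elasticsearch Query DSL (`term`, `sort`, `size`); the field names `task_id` and `@timestamp` are assumptions about how the log documents are indexed, and the helper `build_log_query` is hypothetical.

```python
def build_log_query(task_id: str, size: int = 100) -> dict:
    """Build a search body that fetches one task's log lines in time order."""
    return {
        # exact match on the (assumed) task identifier field
        "query": {"term": {"task_id": task_id}},
        # oldest entries first, using the (assumed) timestamp field
        "sort": [{"@timestamp": {"order": "asc"}}],
        # cap the number of log lines returned per request
        "size": size,
    }
```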

An exemplary fifth embodiment of the present invention provides a method for adapting a flink task to a hadoop ecology. The method of this embodiment uses the system for adapting a flink task to a hadoop ecology shown in fig. 1, and its flow is shown in fig. 2. Specifically, stopping a flink task with the system shown in fig. 1, according to the method shown in fig. 2, includes:

Fig. 5 is a timing diagram of stopping a flink task according to the method of this embodiment. A caller sends a command request to the proxy service layer; the proxy service layer cancels the running of the flink task in the YARN computing engine module, writes the savepoint information generated when the task stops into Redis, and writes the task state (stopped) into Kafka; the proxy service layer returns the result of the task stop to the caller; and Kafka delivers the task state (stopped) to the caller.
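The stop sequence can be sketched as follows, assuming the proxy service layer drives the standard Flink command line. `flink stop --savepointPath` and the YARN option `-yid` exist in the Flink CLI, although available options vary by version; the Redis key format, the state record and the helper `stop_task` are hypothetical, with a dictionary and a list standing in for Redis and Kafka.

```python
from typing import Dict, List


def stop_task(task_id: str, job_id: str, app_id: str, savepoint_dir: str,
              redis: Dict[str, str], kafka: List[Dict[str, str]]) -> List[str]:
    """Build the stop command and record savepoint location and state."""
    cmd = ["flink", "stop", "--savepointPath", savepoint_dir,
           "-yid", app_id, job_id]
    # A real service would run cmd and read back the exact savepoint path;
    # this sketch simply records the target directory (assumption).
    redis[f"savepoint:{task_id}"] = savepoint_dir
    kafka.append({"task_id": task_id, "state": "stopped"})
    return cmd
```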

An exemplary sixth embodiment of the present invention provides a method for adapting a flink task to a hadoop ecology. The method of this embodiment uses the system for adapting a flink task to a hadoop ecology shown in fig. 1, and its flow is shown in fig. 2. Specifically, continuing the running of a flink task with the system shown in fig. 1, according to the method shown in fig. 2, includes:

Fig. 6 is a timing diagram of the continued running of a flink task according to the method of this embodiment. A caller sends a command request to the proxy service layer; the proxy service layer requests the savepoint information of the task to be continued from Redis and continues running the flink task in YARN; YARN returns the result of the continued-running request to the proxy service layer; and the proxy service layer returns the result to the caller.
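Continuing a task from its savepoint can be sketched as below, again assuming the standard Flink command line. The options `-s` (restore from savepoint), `-d` (detached) and `-t yarn-per-job` exist in the Flink CLI, though the deployment target depends on the Flink version; the Redis key format and the helper `resume_command` are hypothetical, and a dictionary stands in for Redis.

```python
from typing import Dict, List


def resume_command(task_id: str, jar: str, redis: Dict[str, str]) -> List[str]:
    """Look up the stored savepoint address and build the run command."""
    path = redis.get(f"savepoint:{task_id}")
    if path is None:
        # Without a snapshot there is nothing to continue from.
        raise KeyError(f"no savepoint recorded for task {task_id}")
    return ["flink", "run", "-t", "yarn-per-job", "-d", "-s", path, jar]
```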

An exemplary seventh embodiment of the present invention provides a method for adapting a flink task to a hadoop ecology. The method of this embodiment uses the system for adapting a flink task to a hadoop ecology shown in fig. 1, and its flow is shown in fig. 2. Specifically, performing a trial run of a flink task with the system shown in fig. 1, according to the method shown in fig. 2, includes: sending task trial running information to a task trial running request unit of the proxy service layer by adopting a task trial running request unit of the task management module in the application layer, analyzing the trial running task through the calcite model by adopting a task trial running module, and writing the result set of the trial run into the temporary data storage service module.

Fig. 7 is a timing diagram of the trial run of a flink task according to the method of this embodiment. The caller sends a command request to the proxy service layer; the proxy service layer submits a trial run task to YARN; and YARN temporarily stores the trial run result in the temporary data storage service module.
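The trial run can be illustrated by evaluating a task's SQL on a few sample rows and keeping the result set in temporary storage. In this sketch Python's built-in `sqlite3` stands in for the calcite-based analysis purely for illustration; the table `src`, the row limit and the dictionary `TEMP_STORE` are assumptions and are not described in the embodiment.

```python
import sqlite3
from typing import Dict, List, Tuple

# Stand-in for the temporary data storage service module.
TEMP_STORE: Dict[str, List[tuple]] = {}


def trial_run(task_id: str, sql: str,
              sample_rows: List[Tuple[int, str]],
              limit: int = 10) -> List[tuple]:
    """Run sql against sample rows and store the capped result set."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE src (id INTEGER, name TEXT)")
        conn.executemany("INSERT INTO src VALUES (?, ?)", sample_rows)
        result = conn.execute(sql).fetchmany(limit)  # hypothetical row cap
    finally:
        conn.close()
    TEMP_STORE[task_id] = result
    return result
```

Because only a capped result set is written, a trial run of this kind leaves no state in the production sinks of the task.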

The invention also provides a terminal device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method when executing the program.

The terminal device has the corresponding technical effects of the system and the method for adapting a flink task to a hadoop ecology.

The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A system for adapting a flink task to a hadoop ecology, comprising:

the storage layer comprises Redis, Kafka, Elasticsearch and a temporary data storage service module, wherein the Redis is used for storing the HDFS addresses of ck or savepoint snapshot files, the Kafka is used for recording the task states of all flink tasks in real time, the Elasticsearch is used for storing the execution logs of all flink tasks in real time, and the temporary data storage service module is used for storing task trial operation data.

2. The system for adapting a flink task to a hadoop ecology according to claim 1, wherein in the application layer, the task issuing module comprises a task constructing unit and a task requirement configuring unit, the task constructing unit is used for constructing a flink task, and the task requirement configuring unit is used for configuring a task requirement according to a task scene and submitting the flink task to the proxy service layer; the task management module comprises a task stop request unit, a task continuous running request unit and a task trial run request unit, which are respectively used for sending information of task stop, task continuous running or task trial run.

3. The system for adapting a flink task to a hadoop ecology of claim 1, wherein the proxy service layer comprises:

the task state checking module is used for acquiring the running state and the performance index of the flink task from the monitoring service layer, judging whether to check the running state of the flink task in the YARN computing engine module according to the performance index, and synchronizing the acquired running state of the flink task to be recorded in kafka of the storage layer;

the task continuous running module comprises an address acquisition unit and a task continuous running submission unit, wherein the address acquisition unit is used for acquiring the address of the ck or savepoint snapshot file from the HDFS module and storing the address in Redis of the storage layer; the task continuous running submission unit is used for acquiring, according to the task continuous running request information, the ck or savepoint snapshot of each flink task that needs to continue running from the Redis of the storage layer, and resubmitting the task continuous running request to the YARN computing engine module of the Hadoop layer;

4. The system for adapting a flink task to a hadoop ecology according to claim 1, wherein in an HDFS module of the Hadoop layer, a dependency lib library is used for writing resource dependencies required by tasks submitted by a task parsing unit in the application layer; the ck file library is used for writing a ck file generated when the task fails or the creation time is reached; the savepoint file library is used for writing savepoint files generated when the task stops.

5. The system for adapting a flink task to a hadoop ecology of claim 1, wherein the YARN compute engine module of the Hadoop layer comprises:

the task execution log unit is used for parsing the log4j dependency of the flink task in the application layer, generating a task execution log, and writing the task execution log into Elasticsearch of the storage layer; the Prometheus module is used for reading the Prometheus parameters of the flink task, as adapted by the task adaptation unit in the application layer, generating monitoring indexes, and pushing the monitoring indexes to the monitoring layer;

6. A method for adapting a flink task to a hadoop ecology, which is implemented by the system for adapting a flink task to a hadoop ecology according to any one of claims 1 to 5, and which comprises:

7. A method for adapting a flink task to a hadoop ecology, which is implemented by the system for adapting a flink task to a hadoop ecology according to any one of claims 1 to 5, and which comprises:

8. A method for adapting a flink task to a hadoop ecology, which is implemented by the system for adapting a flink task to a hadoop ecology according to any one of claims 1 to 5, and which comprises:

9. A method for adapting a flink task to a hadoop ecology, which is performed by using the system for adapting a flink task to a hadoop ecology of any one of claims 1 to 5, and which comprises: sending task trial running information to a task trial running request unit of the proxy service layer by adopting a task trial running request unit of the task management module in the application layer, analyzing the trial running task through the calcite model by adopting a task trial running module, and writing the result set of the trial run into the temporary data storage service module.

10. A terminal device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method according to any one of claims 6 to 9 when executing the program.