CN115114333A - Multi-engine visual data stream implementation method, device, equipment and storage medium - Google Patents

Publication number
CN115114333A
Authority
CN
China
Prior art keywords
data
engine
source computing
open source
computing engines
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210724590.7A
Other languages
Chinese (zh)
Inventor
王雪原
李彤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yuannian Technology Co ltd
Original Assignee
Beijing Yuannian Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yuannian Technology Co ltd filed Critical Beijing Yuannian Technology Co ltd
Priority to CN202210724590.7A priority Critical patent/CN115114333A/en
Publication of CN115114333A publication Critical patent/CN115114333A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Stored Programmes (AREA)

Abstract

The application provides a method, a device and equipment for realizing multi-engine visual data stream and a computer readable storage medium. The multi-engine visual data flow implementation method comprises the following steps: acquiring a configuration file and target data; analyzing the configuration file to obtain corresponding configuration attribute information; the configuration attribute information at least comprises the identifications of at least two open source computing engines; accessing at least two corresponding open source computing engines according to the identifiers of the at least two open source computing engines; and carrying out data processing on the target data by utilizing at least two open source computing engines. According to the embodiment of the application, the flexibility of data processing can be improved.

Description

Multi-engine visual data stream implementation method, device, equipment and storage medium
Technical Field
The application belongs to the technical field of multi-engine visual data stream, and particularly relates to a method, a device, equipment and a computer readable storage medium for realizing multi-engine visual data stream.
Background
A visual data flow engine generally relies on an open source big data computing engine to complete computing tasks. Its operation flow is shown in fig. 1: the front end generates the configuration, and the back end then passes the configuration to the visual data flow engine. The visual data flow engine then performs the following steps in sequence: (1) obtaining and parsing the configuration; (2) generating an open source computing engine job according to the configuration; (3) submitting and managing the open source computing engine job. Finally, the job is run by the open source computing engine.
However, the current visualization data flow scheme has at least the following disadvantages: only one open source computing engine can be accessed, and a plurality of open source computing engines cannot be accessed, so that the flexibility of data processing is limited. Therefore, how to improve the flexibility of data processing is a technical problem that needs to be solved urgently by those skilled in the art.
Disclosure of Invention
The embodiment of the application provides a method, a device and equipment for realizing multi-engine visual data stream and a computer readable storage medium, which can improve the flexibility of data processing.
In a first aspect, an embodiment of the present application provides a method for implementing a multi-engine visualization data stream, including:
acquiring a configuration file and target data;
analyzing the configuration file to obtain corresponding configuration attribute information; the configuration attribute information at least comprises the identifications of at least two open source computing engines;
accessing at least two corresponding open source computing engines according to the identifiers of the at least two open source computing engines;
and carrying out data processing on the target data by utilizing at least two open source computing engines.
Further, the configuration attribute information further includes identifications of at least two data structures; after the configuration file is analyzed to obtain the corresponding configuration attribute information, the method further comprises the following steps:
determining at least two corresponding data structures based on the identifications of the at least two data structures;
data analysis is performed on the target data using at least two data structures.
Further, the configuration attribute information also comprises a data stream identification, a data stream name and a data stream step;
each data stream step comprises an input identifier of the step, an output identifier of the step, an identifier of an open source computing engine used in the step and an identifier of a data structure used in the step;
the identification of the open source computing engine used in the step is one of the identifications of at least two open source computing engines; the identification of the data structure used in the step is one of the identifications of at least two data structures.
Further, the data processing of the target data by using at least two open source computing engines and the data analysis of the target data by using at least two data structures comprises:
in the first data flow step, a first open source computing engine is used for carrying out data processing on first target data, and a first data structure is used for carrying out data analysis on the first target data after the data processing;
in the second data flow step, a second open source computing engine is used for carrying out data processing on second target data, and a second data structure is used for carrying out data analysis on the second target data after the data processing;
the first open-source computing engine and the second open-source computing engine are different open-source computing engines; the first data structure and the second data structure are different data structures.
Further, the first open-source computing engine and the second open-source computing engine are different open-source computing engines selected from a Spark engine and a Flink engine.
Further, the first data structure and the second data structure are different data structures of a DataFrame data structure and a DataSet data structure.
Further, obtaining the configuration file comprises:
after the front-end equipment generates the configuration file, the configuration file is transmitted through the back-end equipment to obtain the configuration file.
In a second aspect, an embodiment of the present application provides an apparatus for implementing a multi-engine visual data stream, including:
the acquisition module is used for acquiring the configuration file and the target data;
the analysis module is used for analyzing the configuration file to obtain corresponding configuration attribute information; the configuration attribute information at least comprises identifiers of at least two open source computing engines;
the access module is used for accessing the corresponding at least two open source computing engines according to the identifiers of the at least two open source computing engines;
and the data processing module is used for processing the target data by utilizing at least two open source computing engines.
Further, the configuration attribute information further includes identifications of at least two data structures; the device still includes:
the determining module is used for determining at least two corresponding data structures based on the identifications of the at least two data structures;
and the data analysis module is used for carrying out data analysis on the target data by utilizing at least two data structures.
Further, the configuration attribute information also comprises a data stream identification, a data stream name and a data stream step;
each data stream step comprises an input identifier of the step, an output identifier of the step, an identifier of an open source computing engine used in the step and an identifier of a data structure used in the step;
the identification of the open source computing engine used in the step is one of the identifications of at least two open source computing engines; the identification of the data structure used in the step is one of the identifications of at least two data structures.
Further, the data processing module and the data analysis module are used for:
in the first data flow step, a first open source computing engine is used for carrying out data processing on first target data, and a first data structure is used for carrying out data analysis on the first target data after the data processing;
in the second data flow step, a second open source computing engine is used for carrying out data processing on second target data, and a second data structure is used for carrying out data analysis on the second target data after the data processing;
the first open-source computing engine and the second open-source computing engine are different open-source computing engines; the first data structure and the second data structure are different data structures.
Further, the first open-source computing engine and the second open-source computing engine are different open-source computing engines selected from a Spark engine and a Flink engine.
Further, the first data structure and the second data structure are different data structures of a DataFrame data structure and a DataSet data structure.
Further, the obtaining module is configured to:
after the front-end equipment generates the configuration file, the configuration file is transmitted through the back-end equipment to obtain the configuration file.
In a third aspect, an embodiment of the present application provides an electronic device, where the electronic device includes: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements a multi-engine visual data stream implementation method as illustrated in the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, on which computer program instructions are stored, and the computer program instructions, when executed by a processor, implement the multi-engine visualization data flow implementation method as shown in the first aspect.
The method, the device and the equipment for realizing the multi-engine visual data stream and the computer readable storage medium can improve the flexibility of data processing.
The multi-engine visual data flow implementation method comprises the following steps: acquiring a configuration file and target data; analyzing the configuration file to obtain corresponding configuration attribute information; the configuration attribute information at least comprises the identifications of at least two open source computing engines; accessing at least two corresponding open source computing engines according to the identifiers of the at least two open source computing engines; and carrying out data processing on the target data by utilizing at least two open source computing engines.
Therefore, the method accesses the corresponding at least two open source computing engines according to the identifiers of the at least two open source computing engines, and performs data processing on the target data by using the at least two open source computing engines. That is, a plurality of open source computing engines are integrated, one data stream can simultaneously use the plurality of open source computing engines to process data, and the flexibility of data processing is greatly improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments or the technical solutions in the prior art are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings can be obtained according to these drawings by those skilled in the art without creative efforts.
FIG. 1 is a schematic flow diagram of a prior art visualization dataflow engine;
FIG. 2 is a flowchart illustrating a method for implementing a multi-engine visualization data stream according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a multi-engine visualization data flow implementation apparatus according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
Features and exemplary embodiments of various aspects of the present application will be described in detail below, and in order to make objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are intended to be illustrative only and are not intended to be limiting. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by illustrating examples thereof.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
A visual data flow engine generally relies on an open source big data computing engine to complete computing tasks. Its operation flow is shown in fig. 1: the front end generates the configuration, and the back end then passes the configuration to the visual data flow engine. The visual data flow engine then performs the following steps in sequence: (1) obtaining and parsing the configuration; (2) generating an open source computing engine job according to the configuration; (3) submitting and managing the open source computing engine job. Finally, the job is run by the open source computing engine.
However, the current visualization data flow scheme has at least the following disadvantages: 1) only one open source computing engine can be accessed, and a plurality of open source computing engines cannot be accessed, so that the flexibility of data processing is limited. 2) Data analysis can be carried out only by relying on one data structure, and data analysis can not be carried out by any data structure, so that the flexibility of data analysis is limited.
In order to solve the prior art problems, embodiments of the present application provide a method, an apparatus, a device, and a computer-readable storage medium for implementing a multi-engine visual data stream. First, a method for implementing a multi-engine visualized data stream provided by the embodiment of the present application is described below.
Fig. 2 is a flowchart illustrating a method for implementing a multi-engine visualization data flow according to an embodiment of the present application. As shown in fig. 2, the multi-engine visualization data flow implementation method includes:
S201, acquiring a configuration file and target data;
wherein, in one embodiment, obtaining the configuration file comprises:
after the front-end equipment generates the configuration file, the configuration file is transmitted through the back-end equipment to obtain the configuration file.
The user can generate the configuration file on the front-end equipment according to the user requirement, and then the configuration file is transmitted through the back-end equipment.
S202, analyzing the configuration file to obtain corresponding configuration attribute information; the configuration attribute information at least comprises the identifications of at least two open source computing engines;
S203, accessing at least two corresponding open source computing engines according to the identifiers of the at least two open source computing engines;
S204, performing data processing on the target data by utilizing at least two open source computing engines.
Therefore, the method accesses the corresponding at least two open source computing engines according to the identifiers of the at least two open source computing engines, and performs data processing on the target data by using the at least two open source computing engines. That is, a plurality of open source computing engines are integrated, one data stream can simultaneously use the plurality of open source computing engines to process data, and the flexibility of data processing is greatly improved.
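Steps S201 to S204 can be illustrated with a minimal sketch. It is not the implementation of the application: the registry `ENGINE_REGISTRY`, the configuration key `engines`, and the toy transformations are all assumptions made for illustration, and plain Python functions stand in for real open source computing engines.

```python
import json

# Hypothetical registry mapping engine identifiers to processing callables.
# Plain functions stand in for real engines such as Spark or Flink.
ENGINE_REGISTRY = {
    "spark": lambda rows: [r * 2 for r in rows],
    "flink": lambda rows: [r + 1 for r in rows],
}


def parse_config(config_text):
    """S202: parse the configuration file and check the engine identifiers."""
    attrs = json.loads(config_text)
    if len(set(attrs["engines"])) < 2:
        raise ValueError("configuration must identify at least two engines")
    return attrs


def access_engines(engine_ids):
    """S203: access the engine that corresponds to each identifier."""
    return {engine_id: ENGINE_REGISTRY[engine_id] for engine_id in engine_ids}


def process(target_data, engines, order):
    """S204: process the target data with each accessed engine in turn."""
    data = target_data
    for engine_id in order:
        data = engines[engine_id](data)
    return data


config_text = '{"engines": ["spark", "flink"]}'  # S201: configuration file (assumed layout)
target_data = [1, 2, 3]                          # S201: target data
attrs = parse_config(config_text)
engines = access_engines(attrs["engines"])
result = process(target_data, engines, attrs["engines"])
```

Looking engines up by identifier keeps the flow definition independent of any one engine, which is the flexibility the method aims at.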
From the above, the current visualization data flow scheme has at least the following disadvantages: data analysis can be carried out only by relying on one data structure, and data analysis can not be carried out by any data structure, so that the flexibility of data analysis is limited.
To increase flexibility of data analysis, in one embodiment, the configuration attribute information further includes an identification of at least two data structures; after the configuration file is analyzed to obtain the corresponding configuration attribute information, the method further comprises the following steps:
determining at least two corresponding data structures based on the identifications of the at least two data structures;
data analysis is performed on the target data using at least two data structures.
The embodiment utilizes at least two data structures to perform data analysis on the target data, and can improve the flexibility of data analysis.
Further, in one embodiment, the configuration attribute information further includes a data stream identification, a data stream name, and a data stream step;
each data stream step comprises an input identifier of the step, an output identifier of the step, an identifier of an open source computing engine used in the step and an identifier of a data structure used in the step;
the identification of the open source computing engine used in the step is one of the identifications of at least two open source computing engines; the identification of the data structure used in the step is one of the identifications of at least two data structures.
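One possible in-memory model of this configuration attribute information is sketched below. The class and attribute names (`FlowStep`, `FlowConfig`, `validate`, and the snake_case fields) are hypothetical; the sketch only illustrates the constraint that each step must reference one of the declared engine identifiers and one of the declared data structure identifiers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FlowStep:
    # Hypothetical attribute names for the four per-step identifiers.
    input_id: str
    output_id: str
    engine: str
    data_structure: str


@dataclass
class FlowConfig:
    flow_id: str
    name: str
    engines: List[str]
    data_structures: List[str]
    steps: List[FlowStep] = field(default_factory=list)

    def validate(self):
        """Raise ValueError if the configuration breaks the stated constraints."""
        if len(set(self.engines)) < 2:
            raise ValueError("at least two engine identifiers are required")
        if len(set(self.data_structures)) < 2:
            raise ValueError("at least two data structure identifiers are required")
        for step in self.steps:
            if step.engine not in self.engines:
                raise ValueError(f"unknown engine: {step.engine}")
            if step.data_structure not in self.data_structures:
                raise ValueError(f"unknown data structure: {step.data_structure}")
        return True


config = FlowConfig(
    flow_id="flow-1",
    name="example flow",
    engines=["spark", "flink"],
    data_structures=["DataFrame", "DataSet"],
    steps=[
        FlowStep("source", "lane1", "spark", "DataFrame"),
        FlowStep("lane1", "lane2", "flink", "DataSet"),
    ],
)
is_valid = config.validate()
```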
To increase flexibility of data processing and data analysis simultaneously, in one embodiment, data processing of target data using at least two open source computing engines and data analysis of target data using at least two data structures includes:
in the first data flow step, a first open source computing engine is used for carrying out data processing on first target data, and a first data structure is used for carrying out data analysis on the first target data after the data processing;
in the second data flow step, a second open-source computing engine is used for carrying out data processing on second target data, and a second data structure is used for carrying out data analysis on the second target data after the data processing;
the first open-source computing engine and the second open-source computing engine are different open-source computing engines; the first data structure and the second data structure are different data structures.
Each data flow step in this embodiment employs a corresponding open source computing engine and data structure. In addition, different open source computing engines and data structures are adopted among different data flow steps, so that the flexibility of data processing and data analysis can be improved simultaneously.
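The per-step selection of engine and data structure can be sketched as a pair of lookup tables. Everything below is an assumption for illustration: the functions are toy stand-ins rather than Spark or Flink calls, and feeding the output of the first step into the second step is only one possible way to relate the first and second target data.

```python
# Toy stand-ins for data processing by two different engines.
def spark_process(rows):
    return [r for r in rows if r >= 0]   # e.g. filter out negative values


def flink_process(rows):
    return [r * 10 for r in rows]        # e.g. scale every value


# Toy stand-ins for data analysis with two different data structures.
def as_dataframe(rows):
    return {"columns": ["value"], "rows": [[r] for r in rows]}


def as_dataset(rows):
    return tuple(rows)


ENGINES = {"spark": spark_process, "flink": flink_process}
STRUCTURES = {"DataFrame": as_dataframe, "DataSet": as_dataset}

steps = [
    {"input": "source", "output": "lane1", "engine": "spark", "structure": "DataFrame"},
    {"input": "lane1", "output": "lane2", "engine": "flink", "structure": "DataSet"},
]


def run_flow(steps, source_rows):
    """Run each step with the engine and data structure it names."""
    lanes = {"source": source_rows}
    analyses = {}
    for step in steps:
        processed = ENGINES[step["engine"]](lanes[step["input"]])
        lanes[step["output"]] = processed
        analyses[step["output"]] = STRUCTURES[step["structure"]](processed)
    return analyses


analyses = run_flow(steps, [-1, 2, 3])
```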
To further increase the flexibility of data processing, in one embodiment, the first open-source computing engine and the second open-source computing engine are different open-source computing engines of a Spark engine and a Flink engine.
The Spark engine is an open source cluster computing environment similar to Hadoop, but there are some differences between the two that make the Spark engine more advantageous for certain workloads: it uses in-memory distributed datasets and, in addition to providing interactive queries, it can optimize iterative workloads. The Spark engine is implemented in the Scala language and uses Scala as its application framework. Unlike Hadoop, the Spark engine is tightly integrated with Scala, so that Scala can manipulate distributed datasets as easily as native collection objects. Although the Spark engine was created to support iterative jobs on distributed datasets, it is actually a complement to Hadoop and can run in parallel on the Hadoop file system.
This behavior can be supported by a third-party cluster framework named Mesos. The Spark engine can be used to build large-scale, low-latency data analysis applications.
The Spark engine has three main features. First, its high-level API removes the need to focus on the cluster itself, so that Spark engine application developers can concentrate on the computation the application has to perform.
Second, the Spark engine is fast and supports interactive computing and complex algorithms.
Finally, the Spark engine is a general-purpose engine that can be used for a variety of operations, including SQL queries, text processing and machine learning. Before the Spark engine appeared, developers generally had to learn several different engines to handle these needs separately.
Performance characteristics of the Spark engine:
(1) Speed:
the Spark engine can be up to 100 times faster than Hadoop for in-memory computation.
(2) Ease of use:
the Spark engine provides more than 80 high-level operators.
(3) Universality:
the Spark engine provides a large number of libraries, including SQL, DataFrames, MLlib, GraphX and Spark Streaming. Developers can seamlessly combine these libraries within the same application.
(4) A variety of resource managers are supported:
the Spark engine supports Hadoop YARN, Apache Mesos, and its own standalone cluster manager.
Characteristics of the Flink engine:
(1) High throughput, low latency and high performance are supported at the same time;
(2) The concept of event time (Event Time) is supported:
most window calculations use the system time (Process Time), which is the current time of the system host when the event is transmitted to the computing framework. The Flink engine can support window calculations based on event time (Event Time) semantics, i.e., the time at which the event was generated. This event-time-driven mechanism enables the streaming system to compute accurate results even if events arrive out of order, and preserves the original time sequence of the events. The influence of network transmission or the hardware system is avoided as much as possible.
(3) Support state computation:
the Flink engine implements state management. The so-called state means that, during streaming computation, the intermediate result data of an operator is stored in memory or in a file system. When the next event enters the operator, the intermediate result can be obtained from the previous state and used to compute the current result, so the result does not have to be recomputed from all of the original data each time. This greatly improves system performance and reduces resource consumption during computation, and it plays a very important role in streaming scenarios with large data volumes and very complex computation logic.
(4) Supporting highly flexible Window (Window) operations:
in a streaming application, data is continuous, and aggregation over a certain range of the data has to be performed by means of windows. For example, to count how many users clicked on a certain web page in the past minute, a window must be defined that collects the data of the last minute, and the statistics are then computed over the data in the window. The Flink engine provides window operations of types such as Time, Count, Session and Data-driven. Windows can be customized with flexible trigger conditions to support complex streaming patterns, and users can define different window trigger mechanisms to meet different requirements.
(5) Fault tolerance implemented based on lightweight distributed snapshots (CheckPoint):
the Flink engine can run on thousands of nodes in a distributed mode. The flow of a large computing task is split into small computing processes, and the tasks are then distributed to parallel nodes for processing. During task execution, data inconsistency caused by errors in event processing can be detected automatically, for example when a node goes down, when network transmission fails, or when the computing service is restarted because a user upgrades it or fixes a problem. In these cases, the state information of the execution is stored persistently by means of the distributed snapshot technology CheckPoints. Once a task stops abnormally, the Flink engine can automatically recover the task from the CheckPoints, so as to ensure the consistency of the data during processing (exactly-once).
(6) Independent memory management is realized based on JVM:
memory management is something every computing framework has to consider carefully. Particularly in scenarios with a large amount of computation, how data is managed in memory is very important. For memory management, the Flink engine implements its own memory management mechanism, which reduces the influence of the JVM GC on the system as much as possible. In addition, the Flink engine converts all data objects into binary form through serialization/deserialization and stores them in memory, which reduces the risk of performance degradation or task failure caused by GC. The Flink engine is therefore more stable than other distributed processing frameworks, and problems such as JVM GC do not affect the operation of the whole application.
(7) Save Points:
for a streaming application that runs 7 × 24 hours, data arrives continuously. Terminating the application for a period of time, for example to upgrade the cluster version or to perform operation and maintenance during a shutdown, may cause data loss or inaccurate calculation results. The Flink engine stores a snapshot of the task execution on a storage medium through the Save Points technology. When the task is restarted, the original computing state can be restored directly from the stored Save Points, so that the task continues to run from the state it had before the shutdown. The Save Points technology thus makes it easier for users to manage, operate and maintain real-time streaming applications.
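The snapshot-and-restore idea shared by CheckPoints and Save Points, together with the operator state described in item (3), can be illustrated with a toy example. This is a sketch in plain Python, not Flink code; the class `CountingTask` and its methods are hypothetical.

```python
import copy


class CountingTask:
    """Toy stateful task that keeps a running count per key."""

    def __init__(self):
        self.counts = {}   # operator state: intermediate results so far
        self.offset = 0    # number of events already consumed

    def process(self, key):
        # Update the result from the previous state instead of recounting everything.
        self.counts[key] = self.counts.get(key, 0) + 1
        self.offset += 1

    def snapshot(self):
        """Return an independent copy of the state (the 'snapshot')."""
        return copy.deepcopy({"counts": self.counts, "offset": self.offset})

    @classmethod
    def restore(cls, snap):
        """Rebuild a task from a previously taken snapshot."""
        task = cls()
        task.counts = dict(snap["counts"])
        task.offset = snap["offset"]
        return task


events = ["a", "b", "a", "c", "a"]

task = CountingTask()
for key in events[:3]:
    task.process(key)
saved = task.snapshot()            # state after the first three events

# Simulated failure or planned shutdown: the running task is discarded,
# a new task is restored from the snapshot and resumes at the saved offset.
recovered = CountingTask.restore(saved)
for key in events[recovered.offset:]:
    recovered.process(key)
```

Because the snapshot records both the state and the position in the input, each event is counted once even though the task was restarted.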
Based on the characteristics and performances of the Spark engine and the Flink engine, the flexibility of data processing can be further improved.
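The difference that event-time semantics make for window calculations can be shown with a small sketch of the click-counting example. It is plain Python under simplifying assumptions, not Flink code: windows are fixed one-minute intervals, all events are available before counting, and late-data handling is omitted. The names `window_start` and `count_clicks` are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # one-minute windows, as in the click-counting example


def window_start(event_time):
    """Start of the fixed window that contains the given event time."""
    return event_time - (event_time % WINDOW_SECONDS)


def count_clicks(events):
    """Count clicks per (window, page) using the time each event was generated."""
    counts = defaultdict(int)
    for event_time, page in events:
        counts[(window_start(event_time), page)] += 1
    return dict(counts)


# (event_time_in_seconds, page); the arrival order differs from the event-time order.
events = [(5, "home"), (65, "home"), (30, "home"), (61, "cart"), (59, "home")]
clicks = count_clicks(events)
```

The events with times 30 and 59 arrive after an event from the next minute, yet they are still counted in the first window because the window is chosen from the event time rather than the arrival time.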
To further increase the flexibility of data analysis, in one embodiment, the first data structure and the second data structure are different data structures of a DataFrame data structure and a DataSet data structure.
To support the processing of structured data, Spark SQL provides a new data structure DataFrame. DataFrame is a data set consisting of named columns. It is conceptually equivalent to a table in a relational database or a data frame in the R/Python language. Since Spark SQL supports the development of multiple languages, each language defines an abstraction of DataFrame.
The specific schema of a DataFrame, namely the column names and the column field types, is known in advance. This makes it possible to read less data and to optimize the execution plan better, which ensures query efficiency.
The Dataset is also a distributed data set. It was introduced in Spark 1.6, combines the advantages of RDD and DataFrame, is strongly typed and supports Lambda functions, but can only be used in the Scala and Java languages. Since Spark 2.0, Spark has merged the DataFrame and Dataset APIs into a single structured API for the convenience of developers, i.e. the user can perform the operations of both through one standard set of APIs.
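Why a known schema helps can be seen in a toy model of a table with named, typed columns. The class `MiniFrame` below is hypothetical and is not the Spark DataFrame API; it only shows that, when column names and types are known, a request can be checked and narrowed to the needed columns before any row is read.

```python
class MiniFrame:
    """Toy table with a known schema: a list of (column name, column type)."""

    def __init__(self, schema, rows):
        self.schema = list(schema)
        self.rows = [tuple(row) for row in rows]
        for row in self.rows:
            if len(row) != len(self.schema):
                raise ValueError("row length does not match schema")
            for (name, col_type), value in zip(self.schema, row):
                if not isinstance(value, col_type):
                    raise TypeError(f"column {name!r} expects {col_type.__name__}")

    def select(self, *names):
        """Keep only the named columns; unknown names fail before rows are touched."""
        known = [name for name, _ in self.schema]
        missing = [name for name in names if name not in known]
        if missing:
            raise KeyError(f"unknown columns: {missing}")
        indexes = [known.index(name) for name in names]
        new_schema = [self.schema[i] for i in indexes]
        new_rows = [tuple(row[i] for i in indexes) for row in self.rows]
        return MiniFrame(new_schema, new_rows)


frame = MiniFrame([("user", str), ("clicks", int)], [("ann", 3), ("bob", 5)])
clicks_only = frame.select("clicks")
```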
Based on the characteristics and the performance of the DataFrame data structure and the DataSet data structure, the flexibility of data analysis can be further improved.
Specifically, in one embodiment, a visual data stream is described by a configuration file. The following JSON string describes a visual data stream:
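[Figures BDA0003710699740000121 and BDA0003710699740000131: JSON string of the visual data stream configuration, reproduced as images in the original publication]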
Description of the fields: flowId is the data stream identifier, name is the data stream name, and steps lists the data stream steps. Each step has five attributes: stepId is the step identifier, inputLanes is the input identifier of the step, outputLanes is the output identifier of the step, engine is the identifier of the open source engine used by the step, and datastructure is the identifier of the data structure used by the step.
The engine and datastructure attributes are specific to the present application. They specify the engine and the data structure used in a step, and the visual data stream engine determines from these two attributes which engine and which data structure run the logic contained in the step. A single data stream can therefore use two open source computing engines and two data structures at the same time: step1 uses the Spark engine for data processing and the DataFrame data structure for data analysis, while step2 uses the Flink engine for data processing and the DataSet data structure for data analysis.
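As an illustration only, a configuration of this shape might look like the sketch below. It is not the JSON from the original figures: the field names follow the description above, while every value (the identifiers `flow-001`, `lane1`, `spark`, `flink`, and so on) and the choice of arrays for inputLanes and outputLanes are assumptions.

```python
import json

# Hypothetical configuration; only the field names come from the description above.
CONFIG = """
{
  "flowId": "flow-001",
  "name": "example-flow",
  "steps": [
    {"stepId": "step1", "inputLanes": [], "outputLanes": ["lane1"],
     "engine": "spark", "datastructure": "dataframe"},
    {"stepId": "step2", "inputLanes": ["lane1"], "outputLanes": [],
     "engine": "flink", "datastructure": "dataset"}
  ]
}
"""

flow = json.loads(CONFIG)

# Collect the distinct engine and data structure identifiers used by the flow.
engines = sorted({step["engine"] for step in flow["steps"]})
structures = sorted({step["datastructure"] for step in flow["steps"]})
```

Parsing the string yields two engine identifiers and two data structure identifiers, one pair per step, which is the information the visual data stream engine needs to route each step.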
The embodiment integrates a plurality of open source computing engines, and one data stream can use several of them at the same time to process data, which greatly improves the flexibility of data processing. The embodiment also allows a developer to analyze data in any of the supported data structures, and one data stream can analyze data using a plurality of data structures, which greatly improves the flexibility of data analysis.
Fig. 3 is a schematic structural diagram of a multi-engine visualization data flow implementation apparatus provided in an embodiment of the present application, where the multi-engine visualization data flow implementation apparatus includes:
an obtaining module 301, configured to obtain a configuration file and target data;
the analysis module 302 is configured to analyze the configuration file to obtain corresponding configuration attribute information; the configuration attribute information at least comprises the identifications of at least two open source computing engines;
the access module 303 is configured to access at least two corresponding open-source computing engines according to the identifiers of the at least two open-source computing engines;
and the data processing module 304 is used for performing data processing on the target data by utilizing at least two open source computing engines.
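The division of work among modules 301 to 304 can be sketched as four small functions. This is a minimal sketch under stated assumptions: the function names, the `ENGINE_REGISTRY` mapping, and the arithmetic stand-ins for the engines are hypothetical, and a real system would connect to actual Spark or Flink runtimes in the access step.

```python
import json
from typing import Callable, Dict, List, Sequence, Tuple

Engine = Callable[[List[int]], List[int]]

# Hypothetical stand-ins for real engines, keyed by engine identifier.
ENGINE_REGISTRY: Dict[str, Engine] = {
    "spark": lambda rows: [r * 2 for r in rows],
    "flink": lambda rows: [r + 1 for r in rows],
}


def obtain(config_text: str, target_data: Sequence[int]) -> Tuple[str, List[int]]:
    """Obtaining module: returns the configuration file content and the target data."""
    return config_text, list(target_data)


def parse(config_text: str) -> List[str]:
    """Analysis module: extracts the engine identifiers in step order."""
    steps = json.loads(config_text)["steps"]
    engine_ids = [step["engine"] for step in steps]
    if len(set(engine_ids)) < 2:
        raise ValueError("configuration must identify at least two engines")
    return engine_ids


def access(engine_ids: List[str]) -> List[Engine]:
    """Access module: resolves each identifier to its engine."""
    unknown = [e for e in engine_ids if e not in ENGINE_REGISTRY]
    if unknown:
        raise KeyError(f"unknown engine identifiers: {unknown}")
    return [ENGINE_REGISTRY[e] for e in engine_ids]


def process(engines: List[Engine], data: List[int]) -> List[int]:
    """Data processing module: applies the engines to the target data in order."""
    for engine in engines:
        data = engine(data)
    return data


config = json.dumps({"steps": [{"engine": "spark"}, {"engine": "flink"}]})
config_text, data = obtain(config, [1, 2, 3])
result = process(access(parse(config_text)), data)
```

Keeping identifier resolution in its own function mirrors the separate access module: adding another engine only requires a new registry entry, not a change to the parsing or processing code.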
In one embodiment, the configuration attribute information further includes identifications of at least two data structures; the apparatus further includes:
the determining module is used for determining at least two corresponding data structures based on the identifications of the at least two data structures;
and the data analysis module is used for carrying out data analysis on the target data by utilizing at least two data structures.
In one embodiment, the configuration attribute information further includes a data flow identification, a data flow name, and a data flow step;
each data stream step comprises an input identifier of the step, an output identifier of the step, an identifier of an open source computing engine used in the step and an identifier of a data structure used in the step;
the identification of the open source computing engine used in the step is one of the identifications of at least two open source computing engines; the identification of the data structure used in the step is one of the identifications of at least two data structures.
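The constraints on the configuration attribute information can be checked mechanically. The validator below is a hedged sketch: `validate_flow`, the key names (taken from the field description of the JSON configuration), and the sets of known identifiers are illustrative assumptions, not part of the application.

```python
REQUIRED_STEP_KEYS = {"stepId", "inputLanes", "outputLanes", "engine", "datastructure"}


def validate_flow(flow, known_engines, known_structures):
    """Return a list of problems found in a parsed flow configuration."""
    errors = []
    for key in ("flowId", "name", "steps"):
        if key not in flow:
            errors.append(f"missing flow attribute: {key}")
    for step in flow.get("steps", []):
        step_id = step.get("stepId", "<unknown>")
        missing = REQUIRED_STEP_KEYS - step.keys()
        if missing:
            errors.append(f"{step_id}: missing {sorted(missing)}")
            continue
        # Each step must name one of the known engines and data structures.
        if step["engine"] not in known_engines:
            errors.append(f"{step_id}: unknown engine {step['engine']!r}")
        if step["datastructure"] not in known_structures:
            errors.append(f"{step_id}: unknown data structure {step['datastructure']!r}")
    return errors


good_flow = {
    "flowId": "flow-001",
    "name": "example-flow",
    "steps": [
        {"stepId": "step1", "inputLanes": [], "outputLanes": ["lane1"],
         "engine": "spark", "datastructure": "dataframe"},
        {"stepId": "step2", "inputLanes": ["lane1"], "outputLanes": [],
         "engine": "flink", "datastructure": "dataset"},
    ],
}
bad_flow = {
    "flowId": "flow-002",
    "name": "broken-flow",
    "steps": [
        {"stepId": "step1", "inputLanes": [], "outputLanes": [],
         "engine": "storm", "datastructure": "dataframe"},
    ],
}

good_errors = validate_flow(good_flow, {"spark", "flink"}, {"dataframe", "dataset"})
bad_errors = validate_flow(bad_flow, {"spark", "flink"}, {"dataframe", "dataset"})
```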
In one embodiment, the data processing module 304 and the data analysis module are configured to:
in the first data flow step, a first open source computing engine is used for carrying out data processing on first target data, and a first data structure is used for carrying out data analysis on the first target data after the data processing;
in the second data flow step, a second open source computing engine is used for carrying out data processing on second target data, and a second data structure is used for carrying out data analysis on the second target data after the data processing;
the first open-source computing engine and the second open-source computing engine are different open-source computing engines; the first data structure and the second data structure are different data structures.
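A two-step flow of this kind can be sketched as follows. The `Step` class, the lane dictionary, and the filtering, doubling, counting, and summing functions are hypothetical placeholders for real engine and data structure logic; the sketch also assumes that the second target data is the output of the first step, which the text does not state.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    step_id: str
    input_lanes: Tuple[str, ...]
    output_lanes: Tuple[str, ...]
    engine: str
    datastructure: str


# Hypothetical data processing logic, keyed by engine identifier.
PROCESS = {
    "spark": lambda rows: [r for r in rows if r["amount"] > 0],
    "flink": lambda rows: [dict(r, amount=r["amount"] * 2) for r in rows],
}

# Hypothetical data analysis logic, keyed by data structure identifier.
ANALYZE = {
    "dataframe": lambda rows: {"count": len(rows)},
    "dataset": lambda rows: {"total": sum(r["amount"] for r in rows)},
}


def run_flow(steps, source_rows):
    """Run each step with its own engine, then analyze with its own data structure."""
    lanes = {}
    reports = {}
    for step in steps:
        # A step without input lanes reads the source data; otherwise it reads its lanes.
        if step.input_lanes:
            rows = [r for lane in step.input_lanes for r in lanes[lane]]
        else:
            rows = source_rows
        processed = PROCESS[step.engine](rows)
        reports[step.step_id] = ANALYZE[step.datastructure](processed)
        for lane in step.output_lanes:
            lanes[lane] = processed
    return reports


steps = [
    Step("step1", (), ("lane1",), "spark", "dataframe"),
    Step("step2", ("lane1",), (), "flink", "dataset"),
]
reports = run_flow(steps, [{"amount": 5}, {"amount": -2}, {"amount": 3}])
```

Each step looks up its processing and analysis logic independently, so the two steps can use different engines and different data structures within the same data stream.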
In one embodiment, the first open source computing engine and the second open source computing engine are different open source computing engines selected from a Spark engine and a Flink engine.
In one embodiment, the first data structure and the second data structure are different data structures selected from a DataFrame data structure and a DataSet data structure.
In one embodiment, the obtaining module 301 is configured to:
obtain the configuration file that is generated by the front-end device and transmitted through the back-end device.
Each module in the apparatus shown in fig. 3 has a function of implementing each step in fig. 2, and can achieve the corresponding technical effect, and for brevity, is not described again here.
Fig. 4 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application.
The electronic device may include a processor 401 and a memory 402 storing computer program instructions.
Specifically, the processor 401 may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.
Memory 402 may include mass storage for data or instructions. By way of example, and not limitation, memory 402 may include a Hard Disk Drive (HDD), floppy Disk Drive, flash memory, optical Disk, magneto-optical Disk, tape, or Universal Serial Bus (USB) Drive or a combination of two or more of these. Memory 402 may include removable or non-removable (or fixed) media, where appropriate. The memory 402 may be internal or external to the electronic device, where appropriate. In particular embodiments, memory 402 may be non-volatile solid-state memory.
In one embodiment, the Memory 402 may be a Read Only Memory (ROM). In one embodiment, the ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory, or a combination of two or more of these.
The processor 401 reads and executes the computer program instructions stored in the memory 402 to implement any one of the multi-engine visual data stream implementation methods in the above embodiments.
In one example, the electronic device may also include a communication interface 403 and a bus 410. As shown in fig. 4, the processor 401, the memory 402, and the communication interface 403 are connected via a bus 410 to complete communication therebetween.
The communication interface 403 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiments of the present application.
Bus 410 includes hardware, software, or both to couple the components of the electronic device to each other. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or other suitable bus or a combination of two or more of these. Bus 410 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
In addition, in combination with the multi-engine visual data stream implementation method in the foregoing embodiments, the embodiments of the present application may provide a computer-readable storage medium to implement the method. The computer-readable storage medium has computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement any one of the multi-engine visual data stream implementation methods of the above embodiments.
It is to be understood that the present application is not limited to the particular arrangements and instrumentalities described above and shown in the attached drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions or change the order between the steps after comprehending the spirit of the present application.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed at the same time.
Aspects of the present application are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, implement the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As described above, only the specific embodiments of the present application are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered within the scope of the present application.

Claims (10)

1. A multi-engine visual data flow implementation method is characterized by comprising the following steps:
acquiring a configuration file and target data;
analyzing the configuration file to obtain corresponding configuration attribute information; the configuration attribute information at least comprises the identifications of at least two open source computing engines;
accessing at least two corresponding open source computing engines according to the identifiers of the at least two open source computing engines;
and performing data processing on the target data by utilizing the at least two open source computing engines.
2. The method of claim 1, wherein the configuration attribute information further comprises an identification of at least two data structures; after analyzing the configuration file to obtain corresponding configuration attribute information, the method further includes:
determining at least two corresponding data structures based on the identifications of the at least two data structures;
and performing data analysis on the target data by utilizing the at least two data structures.
3. The method of claim 2, wherein the configuration attribute information further comprises a data stream identification, a data stream name, and a data stream step;
each data stream step comprises an input identification of the step, an output identification of the step, an identification of an open source computing engine used in the step and an identification of a data structure used in the step;
wherein, the identifier of the open source computing engine used in the step is one of the identifiers of the at least two open source computing engines; the identification of the data structure used in said step is one of the identifications of said at least two data structures.
4. The method for implementing multi-engine visual data flow according to claim 3, wherein said performing data processing on said target data using said at least two open source computing engines and said performing data analysis on said target data using said at least two data structures comprises:
in the first data flow step, a first open source computing engine is used for carrying out data processing on first target data, and a first data structure is used for carrying out data analysis on the first target data after the data processing;
in the second data flow step, a second open-source computing engine is used for carrying out data processing on second target data, and a second data structure is used for carrying out data analysis on the second target data after the data processing;
the first open-source computing engine and the second open-source computing engine are different open-source computing engines; the first data structure and the second data structure are different data structures.
5. The method for implementing multi-engine visual data stream according to claim 4, wherein the first open source computing engine and the second open source computing engine are different open source computing engines selected from a Spark engine and a Flink engine.
6. The method of claim 4, wherein the first data structure and the second data structure are different data structures selected from a DataFrame data structure and a DataSet data structure.
7. The method for implementing multi-engine visual data stream according to any one of claims 1 to 6, wherein obtaining the configuration file comprises:
and after the front-end equipment generates the configuration file, transmitting the configuration file through the back-end equipment to obtain the configuration file.
8. A multi-engine visual data stream implementation device, comprising:
the acquisition module is used for acquiring the configuration file and the target data;
the analysis module is used for analyzing the configuration file to obtain corresponding configuration attribute information; the configuration attribute information at least comprises the identifications of at least two open source computing engines;
the access module is used for accessing the corresponding at least two open source computing engines according to the identifiers of the at least two open source computing engines;
and the data processing module is used for performing data processing on the target data by utilizing the at least two open source computing engines.
9. An electronic device, characterized in that the electronic device comprises: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements a multi-engine visual data stream implementation method as recited in any of claims 1-7.
10. A computer-readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the multi-engine visualization data flow implementation method of any of claims 1-7.
CN202210724590.7A 2022-06-23 2022-06-23 Multi-engine visual data stream implementation method, device, equipment and storage medium Pending CN115114333A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210724590.7A CN115114333A (en) 2022-06-23 2022-06-23 Multi-engine visual data stream implementation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210724590.7A CN115114333A (en) 2022-06-23 2022-06-23 Multi-engine visual data stream implementation method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115114333A true CN115114333A (en) 2022-09-27

Family

ID=83328823

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210724590.7A Pending CN115114333A (en) 2022-06-23 2022-06-23 Multi-engine visual data stream implementation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115114333A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117634866A (en) * 2024-01-25 2024-03-01 中国人民解放军国防科技大学 Method, device, equipment and medium for processing data among nodes of workflow scheduling engine
CN117634866B (en) * 2024-01-25 2024-04-19 中国人民解放军国防科技大学 Method, device, equipment and medium for processing data among nodes of workflow scheduling engine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination