CN115757603A - Visual data modeling system and method

Info

Publication number: CN115757603A
Application number: CN202211475857.XA
Authority: CN
Inventors: 刘帅; 任喆; 王楠; 王在清; 谷春喜
Original assignee: Chongqing Changan Automobile Co Ltd
Current assignee: Chongqing Changan Automobile Co Ltd
Priority date: 2022-11-23
Filing date: 2022-11-23
Publication date: 2023-03-07

Abstract

The invention provides a visual data modeling system and method. The method comprises the following steps: the front end displays the source fields of a data table and generates a canvas on which the user builds a data model; the front end submits the data model to the back-end scheduling system for debugging; the front end transmits a confirmation instruction and an operation strategy to the back end, which persists them; the back-end scheduling system acquires task scheduling information, parses the data model, and sends the corresponding analysis task to the ClickHouse cluster; the ClickHouse cluster automatically splits the task and stores the execution result; and the front end generates a page, displays the task query result, and completes data modeling. The invention allows a user to define a custom data model and shows the debugging state of the model in real time, so that the user can quickly verify the model's correctness. ClickHouse serves as the data carrier and the underlying massively parallel processing framework of the system, which reduces the performance impact on the source database and speeds up data analysis. A display page is generated automatically, so that users can present data flexibly according to their own business requirements.

Description

Visual data modeling system and method

Technical Field

The invention belongs to the technical field of big data analysis, and particularly relates to a visual data modeling technology.

Background

With the continuing development of big data technology, a growing share of enterprise data is available online, and enterprises often accumulate large amounts of data. Extracting valuable information from that data has become a management requirement for many enterprises. Visual data modeling systems have emerged to let users focus on the data itself. They are aimed at staff who work directly on the business and who are usually not specialists in data modeling technology. Such a system therefore needs a convenient, attractive front-end scheme and a highly extensible, highly concurrent back-end scheme.

Patent document CN 202010497130.6 discloses a data modeling method and device based on Spark SQL (Spark's module for processing structured data, which can serve as a distributed SQL query engine) and materialized views. It provides a variety of data source plug-ins and uses jsplus to give users a convenient drag-and-drop visual data modeling mode. It also provides materialized views, which improve the query performance of the data models that users create. That scheme addresses the problem that, for an offline data warehouse, executing SQL (a standard language for accessing and processing databases) tasks on a traditional JVM (Java virtual machine) or relational database (RDBMS) causes data model analysis to fail when resources are insufficient. It also addresses the problem that the materialized view produced by running a user's data model can be stored in only one way and cannot be adapted flexibly to the user's storage scenario. However, the scheme is based on the Spark framework (an open source cluster computing framework) and connects directly to multiple data sources for reading and writing at run time. When analyzing large data volumes, this can severely degrade the performance of the source database and prevent the data owner from using the data normally.

Disclosure of Invention

The invention provides a visual data modeling system and method that aim to solve two problems. First, a Spark-based scheme connects directly to multiple data sources for reading and writing at run time, which severely degrades the performance of the source database when analyzing large data volumes and prevents the data owner from using the data normally. Second, in practical data modeling scenarios, shortcomings in model pre-debugging and model result display waste resources and prevent data results from being displayed visually.

The technical scheme adopted by the invention is as follows:

In a first aspect, the present invention provides a visual data modeling method based on ClickHouse, a column-oriented database for online analytical processing that is typically used for massively parallel analytical computing. The method comprises the following steps:

In response to a user's instruction, the front end displays the source fields of the data table and generates a canvas on which the user builds the data model. By viewing the source fields, the user can build the data model as required.

In response to a user's debugging instruction, the front end submits the data model to the back-end scheduling system. The back-end scheduling system receives the model, debugs it, records a log for each step, and returns the log to the front end.

In response to a user's instruction, the front end transmits a confirmation instruction and an operation strategy to the back end, and the back end persists them. That is, the user checks the log to confirm that the data model is correct and then defines the operation strategy.

The back-end scheduling system scans the persistence layer, acquires task scheduling information, parses the data model, and sends the corresponding analysis task to the ClickHouse cluster. In this step, the back-end scheduling center receives the task scheduling request, parses the data model of the formal task into the complex SQL for that model, and delivers the SQL to the ClickHouse cluster for execution.

The ClickHouse cluster automatically splits the task and stores the execution result directly in a newly created result table.
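As a non-limiting illustration, writing an execution result straight into a new result table could be expressed with a `CREATE TABLE ... AS SELECT` statement. The sketch below only builds the statement text. The table naming scheme (`model_result_<model>_<run>`), the `MergeTree` engine choice, and the function name are assumptions made for illustration and are not specified by the invention.

```python
import re

# Only plain identifiers may be spliced into the table name.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def result_table_statement(model_id: str, run_id: int, select_sql: str) -> tuple[str, str]:
    """Return (table_name, sql) that materialises one model run into a new table."""
    if not _IDENT.match(model_id):
        raise ValueError(f"unsafe model id: {model_id!r}")
    # Hypothetical naming scheme: one table per model per run.
    table = f"model_result_{model_id}_{int(run_id)}"
    sql = (
        f"CREATE TABLE {table} "
        "ENGINE = MergeTree ORDER BY tuple() "  # assumed engine and sort key
        f"AS {select_sql}"
    )
    return table, sql
```

In this sketch the model's combined `SELECT` is passed in unchanged, so the structure of the result table follows whatever columns that query returns.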

Finally, in response to a user's instruction, the front end generates a page, displays the task query result, and completes data modeling.

Further, the user builds the data model as follows: using a front-end graph editing engine (e.g., ANTV-X6), the user models the data tables on the interface canvas by dragging and dropping operators and data sources and connecting them with lines.

Further, the operators include, but are not limited to, basic operators such as filtering, association, grouping, and conversion, as well as user-defined operators.

Further, the data model is debugged as follows: after preliminarily building the data model, the user can select an operator to debug. During debugging, the operators are executed one at a time until the operator selected by the user has been executed.
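As a non-limiting illustration of this debugging behaviour, the sketch below derives the list of steps to run for a selected operator (the operator itself plus everything upstream of it, inputs first) and executes them one at a time while recording a log. The function names, the edge representation as `(upstream, downstream)` pairs, and the choice to stop at the first failing step are assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def debug_plan(edges: list[tuple[str, str]], selected: str) -> list[str]:
    """List the selected operator and everything upstream of it, inputs first."""
    parents: dict[str, set[str]] = {}
    for upstream, downstream in edges:
        parents.setdefault(downstream, set()).add(upstream)
        parents.setdefault(upstream, set())
    parents.setdefault(selected, set())

    # Collect the selected node and all of its ancestors.
    needed, stack = set(), [selected]
    while stack:
        node = stack.pop()
        if node not in needed:
            needed.add(node)
            stack.extend(parents[node])

    subgraph = {node: parents[node] for node in needed}
    return list(TopologicalSorter(subgraph).static_order())


def run_debug(edges, selected, execute):
    """Execute the planned steps singly and return a per-step log."""
    log = []
    for step in debug_plan(edges, selected):
        try:
            execute(step)  # hypothetical callback that runs one step
            log.append((step, "ok"))
        except Exception as exc:
            log.append((step, f"failed: {exc}"))
            break  # assumed behaviour: stop at the first failure
    return log
```

Operators downstream of the selected one never appear in the plan, which matches the idea of running only as far as the operator the user chose.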

Further, the user-defined operation strategy is one of three types of execution strategy: timed execution, quantitative execution, and immediate execution.
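As a non-limiting illustration, the three strategy types could be represented as a small record that a scheduler checks before launching a run. The description does not define the exact semantics of each type, so the sketch makes explicit assumptions: a timed strategy repeats at a fixed interval, a quantitative strategy allows a fixed number of runs, and an immediate strategy runs exactly once. All names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class StrategyKind(Enum):
    TIMED = "timed"
    QUANTITATIVE = "quantitative"
    IMMEDIATE = "immediate"


@dataclass
class OperationStrategy:
    kind: StrategyKind
    interval: Optional[timedelta] = None  # TIMED: assumed fixed interval between runs
    max_runs: Optional[int] = None        # QUANTITATIVE: assumed cap on the number of runs


def is_due(strategy: OperationStrategy, now: datetime,
           last_run: Optional[datetime], runs_so_far: int) -> bool:
    """Decide whether a task with this strategy should be launched now."""
    if strategy.kind is StrategyKind.IMMEDIATE:
        return runs_so_far == 0
    if strategy.kind is StrategyKind.QUANTITATIVE:
        return runs_so_far < (strategy.max_runs or 0)
    # TIMED
    if strategy.interval is None:
        raise ValueError("a timed strategy needs an interval")
    return last_run is None or now - last_run >= strategy.interval
```

A scheduler loop could call `is_due` for each persisted strategy on every scan and submit only the tasks for which it returns true.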

Furthermore, when a formal task is scheduled and the data model is parsed, the SQL statements parsed from the individual operators are combined and submitted to the ClickHouse cluster as a single task.
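As a non-limiting illustration, one way to combine per-operator SQL into a single statement is to chain the operators as common table expressions, with the last operator supplying the final `SELECT`. The description does not say how the combination is performed, so the use of a `WITH` clause and the function name below are assumptions.

```python
def combine_operator_sql(steps: list[tuple[str, str]]) -> str:
    """Chain per-operator SELECTs into one statement.

    `steps` is an ordered list of (name, select_sql). Each SELECT may read from
    the names of earlier steps; the last step becomes the final SELECT.
    """
    if not steps:
        raise ValueError("at least one operator is required")
    *upstream, (_, final_sql) = steps
    if not upstream:
        return final_sql
    ctes = ",\n".join(f"{name} AS (\n  {sql}\n)" for name, sql in upstream)
    return f"WITH {ctes}\n{final_sql}"
```

For example, a filter operator followed by a grouping operator yields one statement in which the grouping query reads from the filter's named subquery, so the cluster receives a single task.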

Further, the front end generates the page as follows: it generates a result query page by parsing the data structure of the result table and the operation strategy of the data model.

With the visual data modeling method provided by this technical scheme, extracting data uniformly into ClickHouse reduces the impact on the source databases. By providing a variety of operators and using ANTV-X6, the method gives the user a canvas for constructing a data model, so that the user can carry out visual data modeling conveniently by dragging, dropping, and connecting lines. Model pre-debugging and model result page display are also provided, which makes it more convenient for the user to query data. The following three functions are provided:

Data modeling visualization: using the ANTV-X6 front-end open source component, the user models data on the interface canvas by dragging, dropping, and connecting data sources and operators. By combining different data sources and operators, the user can define a data model that meets current business requirements.

Scheduling center: the scheduling center parses the operation strategy defined for a debugging task or a formal task and applies different execution logic to each. A debugging task targets operators: the SQL of each operator is parsed and delivered to the ClickHouse cluster for execution. A formal task targets the model: the complex SQL of each model is parsed and delivered to the ClickHouse cluster for execution.

Model result display: for a computed model result, the front end parses the operation strategy and the result table fields to obtain the query parameters of the model result, and renders the list of each model result set together with its query parameters to produce the display page for that result. In addition, for different data results, the front end can define logic such as field aliases, field display forms, and condition aliases to customize the display page.
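As a non-limiting illustration of deriving a display page from the result table fields, the sketch below turns a list of `(field name, column type)` pairs into list columns and query inputs, applying user-defined field aliases. It covers only the result-table side and does not model the operation strategy. The function name, the output shape, and the mapping from column types to input widgets are assumptions made for illustration.

```python
def build_display_spec(result_fields, field_aliases=None, hidden=()):
    """Derive list columns and query inputs from the result table's fields.

    `result_fields` is a list of (name, column_type) pairs, for example
    ("total", "Float64"). Wrapped types such as Nullable(...) fall back to text.
    """
    field_aliases = field_aliases or {}
    columns, filters = [], []
    for name, column_type in result_fields:
        if name in hidden:
            continue
        label = field_aliases.get(name, name)  # alias if defined, else the raw name
        if column_type.startswith("Date"):
            widget = "date_range"
        elif column_type.startswith(("Int", "UInt", "Float", "Decimal")):
            widget = "number"
        else:
            widget = "text"
        columns.append({"field": name, "label": label})
        filters.append({"field": name, "label": label, "widget": widget})
    return {"columns": columns, "filters": filters}
```

A front end could render the `columns` entries as the result list header and the `filters` entries as the query form.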

Thus, in a second aspect, the invention also provides a visual data modeling system comprising a front end, a back end, and a database. Wherein:

the front end comprises a modeling canvas module and a result display module;

the back end comprises a model management module, a scheduling center module and a unified query module;

the database includes a metadata module and a ClickHouse cluster.

Specifically, the modeling canvas module, the model management module and the metadata module interact to form a first user modeling process; the model management module, the metadata module, the scheduling center module and the ClickHouse cluster interact to form a model debugging and running process; and the result display module, the unified query module and the ClickHouse cluster interact to form a model result query process.

In summary, due to the adoption of the scheme, the invention has the beneficial effects that:

1. The ClickHouse-based visual data modeling system and method provided by the invention use the ANTV-X6 front-end open source component, so that business personnel can define a data model conveniently, and the debugging state of the data model is displayed in real time, so that a user can quickly verify the correctness of the model.

2. The ClickHouse-based visual data modeling system and method provided by the invention use ClickHouse as the data carrier and the underlying massively parallel processing framework of the system, which reduces the performance impact on the source database and speeds up data analysis.

3. The ClickHouse-based visual data modeling system and method provided by the invention automatically generate a display page for the results produced by the data model and provide configuration functions, so that users can display data flexibly according to their own business requirements.

Description of the drawings:

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is an architecture diagram of a ClickHouse-based visual data modeling system according to the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

In order to explain the technical means of the present application, the following description will be given by way of specific examples.

Example 1

This embodiment provides a ClickHouse-based visual data modeling system. As shown in FIG. 1, the system includes a modeling canvas module 101 and a result display module 102 at the front end; a model management module 103, a scheduling center module 104, and a unified query module 105 at the back end; and a metadata module 106 and a ClickHouse cluster 107 in the database.

The modeling canvas module 101, the model management module 103, and the metadata module 106 interact to form a first user modeling process. The interaction content of the modeling canvas module 101 and the model management module 103 mainly includes model basic information, model debugging commands, and operation strategies. The interaction content of the model management module 103 and the metadata module 106 is mainly model basic information and operation strategies.

The model management module 103, the metadata module 106, the scheduling center module 104, and the ClickHouse cluster 107 interact to form a model debugging and running process. The interaction content of the model management module 103 and the scheduling center module 104 is mainly model debugging information; that of the metadata module 106 and the scheduling center module 104 is mainly the model operation strategy; and that of the scheduling center module 104 and the ClickHouse cluster 107 is mainly the task execution SQL.

The result display module 102, the unified query module 105, and the ClickHouse cluster 107 interact to form a model result query process. The interaction content of the result display module 102 and the unified query module 105 mainly comprises result table basic information and result list information, and the interaction content of the unified query module 105 and the ClickHouse cluster 107 mainly comprises query SQL statements.

The canvas provided by the modeling canvas module 101 is implemented with ANTV-X6. Operators and data sources can be dragged onto the canvas and connected with lines to form a directed acyclic graph, which constitutes the data model. The data model comprises operator information, data source information, and connection information. Based on the user's operations, the module can also issue debugging commands and set the operation strategy of the model.
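As a non-limiting illustration, a data model of this kind could be held as a set of nodes (data sources and operators) plus directed connections, with a check that the graph is acyclic before it is accepted. The class and field names below are hypothetical and do not reflect the ANTV-X6 data format; the ordering uses the Python standard library's `graphlib`.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass
class Node:
    node_id: str
    kind: str  # "source" or "operator"
    config: dict = field(default_factory=dict)


@dataclass
class DataModel:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[tuple[str, str]] = field(default_factory=list)  # (upstream, downstream)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def connect(self, upstream: str, downstream: str) -> None:
        if upstream not in self.nodes or downstream not in self.nodes:
            raise KeyError("both ends of a connection must be on the canvas")
        self.edges.append((upstream, downstream))

    def execution_order(self) -> list[str]:
        """Return node ids so that every node follows its inputs.

        Raises ValueError if the connections contain a cycle.
        """
        graph = {node_id: set() for node_id in self.nodes}
        for upstream, downstream in self.edges:
            graph[downstream].add(upstream)
        try:
            return list(TopologicalSorter(graph).static_order())
        except CycleError as exc:
            raise ValueError("data model is not acyclic") from exc
```

Keeping the connections as plain `(upstream, downstream)` pairs makes the model easy to serialise alongside the operator and data source information.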

The result display module 102 renders the page in real time using the result table information query interface and the result query interface provided by the unified query module 105. It can present the various query conditions for the model results and the list information of the model results.

The model management module 103 provides a model basic information query interface and functions for setting the model operation strategy, triggering model debugging, and parsing model operator information. It can trigger the scheduling center module 104 to perform debugging and collect the debugging information.

The scheduling center module 104 provides functions for scheduling and executing SQL and for parsing operation strategies. It can schedule the debugging tasks provided by the model management module 103, execute them in the ClickHouse cluster 107, and return the debugging information; or it can obtain an operation strategy from the metadata module 106, parse and schedule it, and execute the corresponding task in the ClickHouse cluster 107.

The unified query module 105 provides an interface for querying result table information and an interface for querying result list information. It generates query SQL statements by parsing the query parameters transmitted by the result display module 102 and executes them in the ClickHouse cluster 107.
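As a non-limiting illustration of turning query parameters into a query SQL statement, the sketch below validates table and field names against a strict identifier pattern and keeps the values separate as named parameters. The parameter format `(field, operator, value)`, the supported operators, and the pyformat-style `%(name)s` placeholders are assumptions made for illustration; the placeholder style would need to match whatever database client is actually used.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_OPERATORS = {"eq": "=", "gt": ">", "lt": "<", "like": "LIKE"}


def build_result_query(table, conditions, limit=100):
    """Turn front-end query parameters into (sql, params).

    `conditions` is a list of (field, op, value) tuples. Identifiers are
    validated; values are returned separately and never spliced into the SQL.
    """
    if not _IDENT.match(table):
        raise ValueError(f"unsafe table name: {table!r}")
    clauses, params = [], {}
    for i, (field_name, op, value) in enumerate(conditions):
        if not _IDENT.match(field_name) or op not in _OPERATORS:
            raise ValueError(f"unsupported condition: {field_name!r} {op!r}")
        key = f"p{i}"
        clauses.append(f"{field_name} {_OPERATORS[op]} %({key})s")
        params[key] = value
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    return f"SELECT * FROM {table}{where} LIMIT {int(limit)}", params
```

Rejecting anything that is not a plain identifier or a known operator keeps user-supplied query conditions from altering the structure of the generated statement.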

The metadata module 106 mainly stores operator basic information, model operation strategies, and model debugging information.

The ClickHouse cluster 107 has two main functions: data storage and data query analysis. Data storage mainly holds the data source tables, the data result tables, and the debugging data result tables. The data query analysis function mainly serves the analysis tasks triggered by the scheduling center module 104 and the query tasks triggered by the unified query module 105.

Example 2

This embodiment provides a ClickHouse-based visual data modeling method, which mainly comprises the following steps:

1. In response to a user's instruction, the front end displays the source fields of the data table and generates a canvas on which the user builds the data model.

2. In response to a user's debugging instruction, the front end submits the data model to the back-end scheduling system, and the back-end scheduling system debugs the model, records a log for each step, and returns the log to the front end.

3. In response to a user's instruction, the front end transmits a confirmation instruction and an operation strategy to the back end, and the back end persists them.

4. The back-end scheduling system scans the persistence layer, acquires task scheduling information, parses the data model, and sends the corresponding analysis task to the ClickHouse cluster.

5. The ClickHouse cluster automatically splits the task and stores the execution result.

6. In response to a user's instruction, the front end generates a page, displays the task query result, and completes data modeling.

The operation based on the method is as follows:

Through the modeling canvas module 101 implemented with ANTV-X6, a user models data on the interface canvas by dragging, dropping, and connecting data sources and operators, and defines a data model that meets current business requirements by combining different data sources and operators. The user can then select an operator on the canvas for debugging: the model management module 103 parses the SQL corresponding to the operator and triggers the scheduling center module 104 to call the ClickHouse cluster 107 for execution, and the model management module 103 records the debugging information in the metadata module 106. After debugging passes, the user can set an operation strategy for the model, chosen from three types of execution strategy: timed execution, quantitative execution, and immediate execution. The model management module 103 records the operation strategy in the metadata module 106. The scheduling center module 104 then obtains the operation strategy of the model and the SQL statements corresponding to the model from the metadata module 106 and, according to the operation strategy, submits the SQL statements to the ClickHouse cluster 107 as a single task.

The user can query the model results through the result display module 102. The result display module 102 parses the operation strategy and the result table fields to obtain the query parameters of the model results, and renders the list of each model result set together with its query parameters to produce the display page for that result. In addition, for different data results, the front end can define logic such as field aliases, field display forms, and condition aliases to customize the display page. At this point, the user has completed data modeling, from business modeling through to model query.

The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims

1. A visual data modeling method, characterized by comprising the following steps:

in response to a user's instruction, the front end displays the source fields of a data table and generates a canvas on which the user builds a data model;

in response to a user's debugging instruction, the front end submits the data model to the back-end scheduling system, and the back-end scheduling system debugs the data model, records a log for each step, and returns the log to the front end;

in response to a user's instruction, the front end transmits a confirmation instruction and an operation strategy to the back end, and the back end persists them;

the back-end scheduling system scans the persistence layer, acquires task scheduling information, parses the data model, and sends the corresponding analysis task to the ClickHouse cluster;

the ClickHouse cluster automatically splits the task and stores the execution result;

and in response to a user's instruction, the front end generates a page, displays the task query result, and completes data modeling.

2. The visual data modeling method according to claim 1, characterized in that the data model is built as follows: in response to the user's use of a front-end graph editing engine, the data tables are modeled on the interface canvas by dragging and dropping operators and data sources and connecting them with lines; the operators include, but are not limited to, basic operators such as filtering, association, grouping, and conversion, as well as user-defined operators.

3. The visual data modeling method according to claim 1, wherein the data model is debugged as follows: debugging is performed according to the operator selected by the user, and during debugging the operators are executed one at a time until the operator selected by the user has been executed.

4. The visual data modeling method according to claim 1, wherein the operation strategy comprises three types of execution strategy: timed execution, quantitative execution, and immediate execution.

5. The visual data modeling method of claim 1, wherein, when a formal task is scheduled and the data model is parsed, the SQL statements parsed from the individual operators are combined and submitted to the ClickHouse cluster as a single task.

6. The visual data modeling method of claim 1, wherein the front end generates the page as follows: the front end generates a result query page by parsing the data structure of the result table and the operation strategy of the data model.

7. A visual data modeling system, the system comprising a front end, a back end, and a database;

the front end comprises a modeling canvas module (101) and a result display module (102);

the back end comprises a model management module (103), a scheduling center module (104) and a unified query module (105);

the database comprises a metadata module (106) and a ClickHouse cluster (107);

the modeling canvas module (101), the model management module (103) and the metadata module (106) interact to form a first user modeling process; the model management module (103), the metadata module (106), the scheduling center module (104) and the ClickHouse cluster (107) interact to form a model debugging and running process; and the result display module (102), the unified query module (105) and the ClickHouse cluster (107) interact to form a model result query process.

8. The visual data modeling system according to claim 7, characterized in that the interaction content of the modeling canvas module (101) and the model management module (103) comprises model basic information, model debugging commands, and operation strategies; the interaction content of the model management module (103) and the metadata module (106) comprises model basic information and operation strategies.

9. The visual data modeling system of claim 7, wherein the interaction content of the model management module (103) and the scheduling center module (104) is model debugging information; the interaction content of the metadata module (106) and the scheduling center module (104) is the model operation strategy; and the interaction content of the scheduling center module (104) and the ClickHouse cluster (107) is the task execution SQL.

10. The visual data modeling system of claim 7, wherein the interaction content of the result display module (102) and the unified query module (105) is result table basic information and result list information; and the interaction content of the unified query module (105) and the ClickHouse cluster (107) is a query SQL statement.