CN116226788A - Modeling method integrating multiple data types and related equipment - Google Patents

Modeling method integrating multiple data types and related equipment

Info

Publication number
CN116226788A
CN116226788A (application CN202310500354.1A)
Authority
CN
China
Prior art keywords
data
model
file
page
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310500354.1A
Other languages
Chinese (zh)
Other versions
CN116226788B (en)
Inventor
山其本
李潘
王耀威
李志播
黄伟阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peng Cheng Laboratory
Original Assignee
Peng Cheng Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peng Cheng Laboratory filed Critical Peng Cheng Laboratory
Priority to CN202310500354.1A priority Critical patent/CN116226788B/en
Publication of CN116226788A publication Critical patent/CN116226788A/en
Application granted granted Critical
Publication of CN116226788B publication Critical patent/CN116226788B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/25: Integrating or interfacing systems involving database management systems
    • G06F 16/256: Integrating or interfacing systems involving database management systems in federated or virtual databases
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/25: Integrating or interfacing systems involving database management systems
    • G06F 16/258: Data format conversion from or to a database
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a modeling method fusing multiple data types and related equipment. The method comprises the following steps: adding a database object, a file object and a graph database object; creating a collision model on the collision model page, and dragging the database object, the file object and the graph database object onto a model canvas; when components are connected and saved, checking the connection specification, and reporting an error prompt if the check fails; filling in a task run name, and parsing and running the whole collision model through a run controller, an SQL script converter and a data buffer; and, after the collision model has run, viewing the result data and log information of each run of the collision model on a result viewing page. The invention uniformly converts data that would otherwise have to be processed in several different environments into data processed in the FLINK engine, enabling analysis and comparison across data types, and achieves zero-code model construction through drag-and-drop processing components.

Description

Modeling method integrating multiple data types and related equipment
Technical Field
The present invention relates to the field of data analysis modeling technology, and in particular, to a modeling method, system, terminal and computer readable storage medium for integrating multiple data types.
Background
As technology has developed, structured data, text data, big data, graph data, time-series data and the like have emerged one after another, greatly advancing society and technology, and big data platforms and data processing tools have grown correspondingly rich. In the current big data era, extracting and analyzing multiple types of data usually requires data integration first: different database types and different table structures and fields are consolidated into one large database system, and SQL statements are then written in analysis tools for querying.
However, this conventional approach requires building a set of big data intermediate tables, accessing the different types of data, and performing data governance and fusion, and it also requires the user to have professional database skills and the ability to write SQL. Moreover, the SQL statements or functions written by the user can only be used in that particular analysis and must be rewritten the next time the same type of business scenario is encountered.
Secondly, in much data analysis, CSV and EXCEL file data must be imported into a database, or loaded by a data analysis tool, before they can be compared with the database tables of interest; and for the currently popular graph databases there are few tools that can analyze and compare them against database tables, so graph data is usually only queried and displayed after being imported into the graph database.
Accordingly, the prior art is still in need of improvement and development.
Disclosure of Invention
The main purpose of the invention is to provide a modeling method, system, terminal and computer-readable storage medium that fuse multiple data types, aiming to solve the problem that data processing in the prior art is imperfect and difficult to operate.
To achieve the above object, the invention provides a modeling method fusing multiple data types, which comprises the following steps:
adding a database object, a file object and a graph database object;
creating a collision model on the collision model page, and dragging the database object, the file object and the graph database object onto a model canvas;
when components are connected and saved, checking the connection specification, and reporting an error prompt if the check fails;
filling in a task run name, and parsing and running the whole collision model through a run controller, an SQL script converter and a data buffer;
and, after the collision model has run, viewing the result data and log information of each run of the collision model on a result viewing page.
In the modeling method fusing multiple data types, adding a database object, a file object and a graph database object specifically comprises:
configuring the JDBC connection parameters of a database on a page and saving the data source information; after the data source is configured, selecting a table under the data source and setting the required fields as a data object;
uploading file resources to the system by file upload, storing them in a MINIO object storage database, setting the file name on the page, and previewing a preset number of rows of the file's data;
and configuring and saving the connection parameters of a graph database on a page, then selecting an entity or relationship under the specified graph data source to generate a graph database object.
In the modeling method fusing multiple data types, the step of creating a collision model on the collision model page and dragging the database object, the file object and the graph database object onto the model canvas further comprises:
performing data object connection construction.
In the modeling method fusing multiple data types, the run controller is responsible for starting and stopping the model run, monitoring the running state of the model and recording log information;
the SQL script converter is used to convert the connection relationships of the model components into executable SQL statements;
and the data buffer is responsible for storing and recording the temporary data and result data produced when each component node of the model runs.
In the modeling method fusing multiple data types, the model canvas comprises an object list column, a component list column, a toolbar and a canvas.
In the modeling method fusing multiple data types, the components include: intersection, union, difference, aggregation, filter row, compute column, field settings and SQL operator.
In addition, to achieve the above object, the invention further provides a modeling system fusing multiple data types, which comprises:
an object configuration module for adding a database object, a file object and a graph database object;
a model assembly module for creating a collision model on the collision model page and dragging the database object, the file object and the graph database object onto a model canvas;
a verification and error-reporting module for checking the connection specification when components are connected and saved, and reporting an error prompt if the check fails;
a model run module for filling in a task run name and parsing and running the whole collision model through the run controller, the SQL script converter and the data buffer;
and a result viewing module for viewing, after the collision model has run, the result data and log information of each run of the collision model on the result viewing page.
In addition, to achieve the above object, the invention also provides a terminal, which comprises: a memory, a processor, and a modeling program fusing multiple data types that is stored in the memory and executable on the processor, wherein the modeling program, when executed by the processor, implements the steps of the modeling method fusing multiple data types described above.
In addition, to achieve the above object, the invention also provides a computer-readable storage medium storing a modeling program fusing multiple data types, which, when executed by a processor, implements the steps of the modeling method fusing multiple data types described above.
In the invention, a database object, a file object and a graph database object are added; a collision model is created on the collision model page, and the database object, the file object and the graph database object are dragged onto a model canvas; when components are connected and saved, the connection specification is checked, and an error prompt is reported if the check fails; a task run name is filled in, and the whole collision model is parsed and run through a run controller, an SQL script converter and a data buffer; and, after the collision model has run, the result data and log information of each run of the collision model are viewed on a result viewing page. The invention uniformly converts data that would otherwise have to be processed in several different environments into data processed in the FLINK engine, enabling analysis and comparison across data types, and achieves zero-code model construction through drag-and-drop processing components.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of a modeling method of the present invention incorporating multiple data types;
FIG. 2 is a flow chart of a modeling process for implementing multiple data types in a preferred embodiment of the modeling method of the present invention incorporating multiple data types;
FIG. 3 is a flow chart of a model operation process in a preferred embodiment of the modeling method of the present invention incorporating multiple data types;
FIG. 4 is a schematic view of an interface of a collision model in a preferred embodiment of the modeling method of the present invention incorporating multiple data types;
FIG. 5 is a schematic diagram of a preferred embodiment of a modeling system incorporating multiple data types of the present invention;
FIG. 6 is a schematic diagram of the operating environment of a preferred embodiment of the terminal of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer and more explicit, the invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In the invention, data sources such as databases, CSV, EXCEL and graph databases are brought together so that data analysis is performed within a single model, and a rich set of components (intersection, union, difference, aggregation, filter row, compute column, field settings, SQL operators and functions) rounds out the modeling and analysis workflow, achieving zero-code, zero-threshold data analysis modeling. The model is executed in a FLINK environment, and the resulting data can be previewed and its logs viewed. Apache Flink is an open-source stream processing framework for distributed, high-performance, highly available data stream applications. Techniques used in the invention include JDBC data source connection, MINIO object data storage, FLINK, graph databases and SQL statement conversion.
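For orientation, a minimal sketch of the kind of Flink entry point such a back end relies on is shown below; it uses the public Flink Table API in batch mode, which is an assumption here, and is not taken from the patent's implementation.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class FlinkBootstrapSketch {
    public static void main(String[] args) {
        // Create a TableEnvironment; the model's converted FLINKSQL statements are executed against it.
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // Any statement produced by the SQL script converter would be submitted with tEnv.executeSql(...).
        tEnv.executeSql("SELECT 'Hello World' AS greeting").print();  // trivial smoke test
    }
}
```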
The modeling method for fusing multiple data types according to the preferred embodiment of the present invention, as shown in fig. 1, includes the following steps:
step S10, adding a database object, a file object and a graph database object.
Specifically, as shown in fig. 2, a database object is added: the data source information is saved by configuring the database's JDBC connection parameters on a page. After the data source is configured, a table under the data source is selected and the required fields are set, forming a data object.
When adding a database object, the user first goes to the data source access configuration page for the supported databases. Clicking the new-data-source button pops up the data source configuration page, where the data source type is selected and connection parameter information such as the data source name, IP address, port number, user name and password is filled in; a test button checks the data source connection, and a save button stores the data source. After the data source is saved successfully, the user enters the data source object menu, clicks the new-object button, enters an object name and description on the new-object page and selects a data source; the data table drop-down list then shows all tables under that data source. After the required data table is selected, all field information of the table is read and displayed, and each field can be set to show or hide in the field list; hidden fields are not shown in the model. The back end connects to the database through JDBC to read the database and obtain its table and field information. In other words, drag-and-drop modeling is performed by configuring the database's JDBC connection parameters and treating a database table as a data object.
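A rough sketch of reading a table's fields over JDBC, as described above, is given below; the connection details and table name are placeholders, and this is an illustration rather than the patent's code.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class JdbcObjectSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical JDBC connection parameters entered on the data source configuration page.
        String url = "jdbc:mysql://127.0.0.1:3306/school";
        try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
            DatabaseMetaData meta = conn.getMetaData();

            // List the tables under the data source for the data-table drop-down list.
            try (ResultSet tables = meta.getTables(null, null, "%", new String[]{"TABLE"})) {
                while (tables.next()) {
                    System.out.println("table: " + tables.getString("TABLE_NAME"));
                }
            }

            // Read the field (column) information of the selected table for display and show/hide settings.
            try (ResultSet cols = meta.getColumns(null, null, "student", "%")) {
                while (cols.next()) {
                    System.out.println(cols.getString("COLUMN_NAME") + " : " + cols.getString("TYPE_NAME"));
                }
            }
        }
    }
}
```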
As shown in fig. 2, a file object is added: CSV and EXCEL file resources are uploaded to the system by file upload and stored in a MINIO object storage database; the file name is set on the page, and the first 10 rows of the file's data can be previewed. The CSV or EXCEL file must be a structured data file whose first row contains non-empty header names, with no more than 30 columns and a file size of no more than 100 MB.
When adding a file object, the user enters the file object configuration page, selects a local CSV or EXCEL file via the upload button, fills in information such as the file name and description, and uploads the resource. The uploaded CSV or EXCEL file must be a structured data file whose first row contains non-empty header names, with no more than 30 columns and a file size of no more than 100 MB. A successfully uploaded file is saved to the MINIO database and listed on the file object page; selecting a file and clicking the [preview] button shows its first 10 rows of data.
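The upload-and-preview flow can be sketched with the MinIO Java client roughly as below; the endpoint, credentials, bucket and file names are invented for illustration, and the preview simply reads the first 10 data rows of a CSV file.

```java
import io.minio.MinioClient;
import io.minio.PutObjectArgs;
import java.io.BufferedReader;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileObjectSketch {
    public static void main(String[] args) throws Exception {
        Path csv = Path.of("student_scores.csv");  // hypothetical uploaded file

        // Store the uploaded file in the MINIO object storage database.
        MinioClient minio = MinioClient.builder()
                .endpoint("http://minio.example.local:9000")   // placeholder endpoint
                .credentials("accessKey", "secretKey")          // placeholder credentials
                .build();
        try (InputStream in = Files.newInputStream(csv)) {
            minio.putObject(PutObjectArgs.builder()
                    .bucket("model-files")
                    .object(csv.getFileName().toString())
                    .stream(in, Files.size(csv), -1)
                    .build());
        }

        // Preview the header plus the first 10 data rows for the file object page.
        try (BufferedReader reader = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            reader.lines().limit(11).forEach(System.out::println);
        }
    }
}
```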
As shown in fig. 2, a graph database object is added: the connection parameters of the graph database are configured on a page and saved, and then an entity or relationship under the specified graph data source is selected to generate a graph database object.
When adding a graph database object, the user enters the graph database object configuration page and clicks the [new] button to pop up the graph database connection configuration page, where the NEBULA or NEO4J connection type can be selected and connection parameter information such as the name, IP address, port, user name, password and graph space is filled in. After the connection is created successfully, the user clicks the new graph object button on the graph object menu page, enters the object name and description on the new-object page, selects the entities or relationships of the graph space to be used, and confirms and saves. The back end obtains the graph spaces, entity lists, relationship lists and attribute information in the graph database through its configured NEBULA and NEO4J CLIENTs.
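For the NEO4J case, listing the entity types (labels) and relationship types available in a graph can be sketched with the official Java driver as below; the URI and credentials are placeholders, and a NEBULA source would use its own client instead.

```java
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Result;
import org.neo4j.driver.Session;

public class GraphObjectSketch {
    public static void main(String[] args) {
        try (Driver driver = GraphDatabase.driver("bolt://127.0.0.1:7687",
                AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {

            // Entity types (labels) shown in the entity drop-down box.
            Result labels = session.run("CALL db.labels()");
            while (labels.hasNext()) {
                System.out.println("entity: " + labels.next().get(0).asString());
            }

            // Relationship types shown in the relationship drop-down box.
            Result rels = session.run("CALL db.relationshipTypes()");
            while (rels.hasNext()) {
                System.out.println("relationship: " + rels.next().get(0).asString());
            }
        }
    }
}
```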
Step S20, a collision model is created on the collision model page, and the database object, the file object and the graph database object are dragged onto a model canvas.
Specifically, as shown in fig. 2, a collision model is created: a collision model is created on the collision model page and its name and description are filled in. The user enters the collision model page, clicks the new-model button, and fills in a name and description to create the model. Clicking a model name in the collision model list opens that model's editing page. The collision model page consists of an object list column, a component list column, a toolbar and a canvas; the model assembly module interface is detailed in fig. 4.
On the model editing page, objects and components are dragged from the object list column and the component list column into the canvas, and moving the mouse to the small dot on an object or component lets the user pull out a connecting line to link it to other components. The starting node must be a data object; the object is connected to a data processing component, and that component can in turn be connected to further components, forming a complete collision model.
Dragging a data source object (i.e., a database object) onto the canvas: enter the collision model's configuration page and select an object from the data source object (i.e., database object) list on the left to drag it into the model canvas.
Dragging a file object onto the canvas: enter the collision model's configuration page and select an object from the file object list on the left to drag it into the model canvas.
Dragging a graph object (i.e., a graph database object) onto the canvas: enter the collision model's configuration page and select an object from the graph object (i.e., graph database object) list on the left to drag it into the model canvas.
Connection configuration of data components such as intersection, union and difference: enter the collision model's configuration page, select a component from the data processing component list on the left, drag it into the canvas, and connect it to a data source object, file object or graph object. The connection relationships of the data processing components are configured according to the business requirements, the component configuration information is saved, and finally the output component is configured; once all configuration is complete, a data collision model has been created.
The invention provides components such as intersection, union, difference, aggregation, filter row, compute column, field settings and SQL operator, and embeds a FLINK function set.
Intersection: matches and filters identical data across data objects; by setting the associated fields, rows whose field values match are output. Two modes, equi-join and left join, are provided, and the fields to be output can be selected.
Union: merges the data of two data objects; the key fields of the two data sets are merged and output via the corresponding fields of the master and slave tables, with options to deduplicate or keep duplicates.
Difference: removes data from a data object; by setting the associated fields of the master and slave tables, rows of the master table whose associated field values exist in the slave table are deleted, and the remaining data is output.
Aggregation: groups by one or more fields, commonly called dimension columns, and then aggregates certain fields, for example summing, averaging, taking the maximum or minimum, or applying another aggregation function. The aggregation component has the common FLINKSQL aggregation functions built in; when it is used, the configuration consists of selecting the fields and the functions.
Filter row: applies condition filtering to the existing data; the condition setting is similar to the WHERE condition of an SQL statement. The field to be filtered is selected, and its matching condition (equal to, greater than, less than, greater than or equal to, less than or equal to, contains, does not contain, is null, is not null, and so on) and the value to match are set.
Compute column: filters the output columns and adds new columns; some columns can be unchecked so they are not output to the next stage, and some columns need to be added manually. For example, if the first column a1 and the second column a2 of a data set are both numeric, a third column a3 = a1 + a2 can be added via the [new] button to sum the first two columns. On the new-column window page the field name, data type, field alias and field comment are filled in, and the column's content can be written in an edit box; arithmetic expressions and functions are supported, with five general families of built-in functions: time and date functions, string functions, arithmetic functions, hash functions and conditional functions.
Field settings: configures, for all fields of the current component, whether each field is output to the next node, along with field alias changes, field comment changes, and so on.
SQL operator: lets the user write SQL statements for data processing; the accessed objects are mapped using {$table_no} placeholders, so multiple input objects can be configured and correspond, in order, to the component's connections. For example, {$table_1} corresponds to the first connection point of the component box and {$table_2} to the second. Statements written inside the SQL operator must conform to the FLINKSQL syntax specification.
Step S30, when components are connected and saved, the connection specification is checked, and an error prompt is reported if the check fails.
Specifically, when an object and component are saved after the connection configuration is completed, the component's connection rules are verified: components may not exist in isolation, may not lack an object, may not have mismatched inputs and outputs, and so on. Each component has a connection specification determined by its own connection parameters and output; connection parameters that do not follow the specification cause an error to be reported and a suggestion to be output.
Intersection component: there must be two upper-level connections, at least one pair of field association conditions must be set, the connection type (left join or inner join) must be chosen, the fields to be output must be set, and so on; if the configuration is incorrect when saving, an error prompt is reported.
Union component: there must be two upper-level connections, and the master-table fields to be output and the corresponding slave-table fields must be configured; however many output fields the master table selects, the slave table must configure the same number of corresponding fields. If the configuration is incorrect when saving, an error prompt is reported.
Difference component: there must be two upper-level connections, the matching associated fields of the master and slave tables must be configured, and the master table must have output fields configured; if the configuration is incorrect when saving, an error prompt is reported.
Aggregation component: the grouping fields, aggregation function and aggregated fields must be configured, the aggregation function must be a FLINKSQL built-in function, and non-existent functions may not be entered; if the configuration is incorrect when saving, an error prompt is reported.
Filter row component: at least one field filter condition must be set, including equal to, greater than, less than, greater than or equal to, less than or equal to, contains, does not contain, is null, is not null, and so on; different judgments are made for different field types, and an error prompt is reported when the types are inconsistent.
Compute column component: by default the compute column outputs all fields from the upper-level connection; fields can be added or removed, but there must be at least one output field, the alias of an output column may not be null, and any added compute functions used for conversion must be FLINKSQL built-in functions. If the configuration is incorrect when saving, an error prompt is reported.
Field settings component: by default the field settings component outputs all fields and information from the upper-level connection; fields can be added or removed and comments added, but there must be at least one output field. If the configuration is incorrect when saving, an error prompt is reported.
SQL operator component: the {$table_no} format must be used for input tables, statements must be written according to the FLINKSQL syntax specification and must begin with SELECT, and INSERT or UPDATE statements may not be used directly in the SQL operator. If the configuration is incorrect when saving, an error prompt is reported.
When the component parameter information is configured and saved, errors are reported according to each component's own rules and a suggestion is output. For example, if the intersection component associates the Id field (integer) of table1 with the setId field (string) of table2, but table2 also has an Id field of integer type, the suggestion is: the Id of table1 matches better when connected to the Id of table2. If the aggregation component is set to group statistics on table1, grouping by the type field and summing with SUM(number), and the component finds that the number field is a string type that does not meet the specification, the suggestion is: please select a numeric field such as Id, math or englist for the function statistics.
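A hypothetical sketch of the intersection-component rules above, written as a plain validation routine, is given below; the field and configuration classes are invented for illustration and are not the patent's data model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class IntersectValidationSketch {

    record FieldRef(String table, String name, String type) {}
    record JoinPair(FieldRef left, FieldRef right) {}
    record IntersectConfig(int upstreamCount, String joinType,
                           List<JoinPair> joinPairs, List<FieldRef> outputFields) {}

    static List<String> validate(IntersectConfig c) {
        List<String> errors = new ArrayList<>();
        if (c.upstreamCount() != 2)
            errors.add("An intersection component must have exactly two upper-level connections.");
        if (!Set.of("INNER", "LEFT").contains(c.joinType()))
            errors.add("The connection type must be an inner join or a left join.");
        if (c.joinPairs().isEmpty())
            errors.add("At least one pair of associated fields must be configured.");
        for (JoinPair p : c.joinPairs()) {
            if (!p.left().type().equalsIgnoreCase(p.right().type()))
                errors.add("Associated field types differ: " + p.left().name() + " (" + p.left().type()
                        + ") vs " + p.right().name() + " (" + p.right().type()
                        + "); consider associating fields of the same type.");
        }
        if (c.outputFields().isEmpty())
            errors.add("At least one output field must be selected.");
        return errors;
    }

    public static void main(String[] args) {
        IntersectConfig cfg = new IntersectConfig(2, "LEFT",
                List.of(new JoinPair(new FieldRef("table1", "Id", "INT"),
                                     new FieldRef("table2", "setId", "STRING"))),
                List.of(new FieldRef("table1", "Id", "INT")));
        validate(cfg).forEach(System.out::println);
    }
}
```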
Step S40, a task run name is filled in, and the whole collision model is parsed and run through the run controller, the SQL script converter and the data buffer.
Specifically, as shown in fig. 2, after the model is created, the [run] button can be clicked and a task run name filled in; the front end submits the model run parameters to the back end, and the back end parses and runs the whole data model through the run controller, the SQL script converter and the data buffer. The run controller is responsible for starting and stopping the model run, monitoring the running state of the model and recording log information; the SQL script converter is used to convert the connection relationships of the model components into executable SQL statements; and the data buffer is responsible for storing and recording the temporary data and result data produced when each component node of the model runs.
After the collision model is built, clicking the [run] button on the toolbar runs the model; the model run process flow is detailed in fig. 3. When the run starts, the model connection configuration information is read from the database, and the object data source configuration and the object connection relationship configuration are extracted. According to the object data type, the table of a data source is read through a JDBC connection, CSV and EXCEL data are read from their file storage addresses, and entity or relationship data are read through the graph database client. The object connection relationship configuration is obtained, the object information, the component parameter configuration information and the connections between objects and components are parsed, all of this information is packaged into a class and submitted to the SQL script converter, and the SQL script converter converts it into SQL statements.
The SQL script converter first parses the object connection configuration and creates a temporary table name for each object, for example: the data TABLE1 of a data source gets a temporary table tmp_jdbc_table1, a CSV file gets a temporary table tmp_csv_table2, and the entity object of the graph data "person" gets tmp_gp_table3. The SQL script converter then replaces the objects in the component connections with the temporary tables and converts each connection into a SELECT statement. After the SQL statements are assembled, SQL syntax verification is performed. The purpose of the SQL syntax check is to find syntax errors before the statements are submitted to FLINK, stop the task and report the error, reducing wasted server resources. The SQL syntax check examines the structure, functions and keywords of CREATE TABLE, SELECT, INSERT INTO, WHERE, WITH, JOIN, GROUP BY and ORDER BY, and when the statement of some component link is wrong, the error information is passed to the error analysis module for processing.
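A minimal sketch of the temporary-table idea using the public Flink Table API is shown below: each data object is registered as a temporary table backed by a connector, and a component connection becomes a SELECT over those tables. Schemas, connector options and table names are illustrative, the flink-connector-jdbc and CSV format dependencies are assumed to be on the classpath, and the syntax pre-check is approximated here with explainSql rather than the patent's own checker.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class SqlConverterSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // Temporary table for a database object, read through the JDBC connector.
        tEnv.executeSql(
            "CREATE TEMPORARY TABLE tmp_jdbc_table1 (student_no STRING, math INT) WITH ("
            + " 'connector' = 'jdbc',"
            + " 'url' = 'jdbc:mysql://127.0.0.1:3306/school',"
            + " 'table-name' = 'student',"
            + " 'username' = 'user', 'password' = 'password')");

        // Temporary table for a CSV file object (path is a placeholder for the stored file address).
        tEnv.executeSql(
            "CREATE TEMPORARY TABLE tmp_csv_table2 (student_no2 STRING, english INT) WITH ("
            + " 'connector' = 'filesystem',"
            + " 'path' = '/data/model-files/student_scores.csv',"
            + " 'format' = 'csv')");

        // The connection of an intersection component, rewritten over the temporary tables.
        String converted =
            "SELECT t1.student_no, t1.math, t2.english "
            + "FROM tmp_jdbc_table1 t1 JOIN tmp_csv_table2 t2 ON t1.student_no = t2.student_no2";

        // Rough stand-in for the syntax pre-check: planning the statement fails fast on syntax errors.
        System.out.println(tEnv.explainSql(converted));

        tEnv.executeSql(converted).print();
    }
}
```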
After the statements pass verification, they are submitted to FLINK for execution; FLINK decomposes all the submitted SQL statements, treats the statement of each component as a run node, and runs the converted statement of each node in sequence. During the run, the run controller monitors the running state of each component, stops the task when a runtime error occurs, records and saves the error information, and passes it to the error analysis module for processing.
The error analysis module is responsible for analyzing error information and classifying it into three categories: data processing exceptions, system resource exceptions and script execution exceptions. Data processing exceptions cover errors such as data source connection and data extraction; system resource exceptions cover errors in the software and hardware environment in which FLINK runs; and script execution exceptions cover execution errors of the submitted SQL statements, for which the error location is extracted and classified according to common SQL error conditions, such as: time field format mismatch, string type conversion errors, mismatched comparison field formats, function configuration errors, null values, missing tables or fields, and so on. Once a script execution exception is located to its error point, it is associated with the component in which the error occurred, and an error reminder and suggestion feedback are produced in combination with the characteristics of that component. The suggestion feedback targets the wrong connection configuration parameters and gives rule suggestions based on the component's characteristics.
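The three-way classification can be approximated by inspecting the exception message; the sketch below is a hypothetical illustration of that idea, and the matched keywords are examples rather than an exhaustive rule set.

```java
public class ErrorAnalysisSketch {

    enum ErrorKind { DATA_PROCESSING, SYSTEM_RESOURCE, SCRIPT_EXECUTION }

    static ErrorKind classify(Throwable t) {
        String msg = String.valueOf(t.getMessage()).toLowerCase();
        // Data acquisition / extraction problems: source connection or read failures.
        if (msg.contains("connection refused") || msg.contains("timeout")
                || msg.contains("access denied") || msg.contains("could not read"))
            return ErrorKind.DATA_PROCESSING;
        // Problems in the software / hardware environment in which FLINK runs.
        if (msg.contains("out of memory") || msg.contains("taskmanager")
                || msg.contains("no available slot"))
            return ErrorKind.SYSTEM_RESOURCE;
        // Everything else is treated as an execution error of the submitted SQL statement,
        // e.g. cast errors, mismatched field formats, missing tables or fields, bad functions.
        return ErrorKind.SCRIPT_EXECUTION;
    }

    public static void main(String[] args) {
        System.out.println(classify(new RuntimeException("Connection refused: connect")));
        System.out.println(classify(new RuntimeException("Cannot cast 'abc' to INT")));
    }
}
```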
When a component runs, the data to be processed is pulled into memory for processing; when memory is insufficient for a large volume of data, hard disk resources are used for storage until the component node finishes running, and the memory and hard disk resources are released before the next component node starts running. For example: an intersection component is connected to two data objects 1 and 2; during the run, the data of objects 1 and 2 is pulled into memory as temporary table1 and temporary table2, the intersection produces temporary table3, and before table3 is processed with the next component, the memory occupied by table1 and table2 is released, reducing unnecessary memory consumption.
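Expressed in FLINKSQL terms, releasing the intermediate tables once a node has finished can be sketched as dropping the upstream temporary tables; this is an assumption about how such a release step could look, reusing a TableEnvironment like the one in the earlier sketch.

```java
import org.apache.flink.table.api.TableEnvironment;
import java.util.List;

public class BufferReleaseSketch {
    // After a component node finishes, drop the temporary tables of its inputs so that the
    // memory and disk they occupied are released before the next node runs.
    static void releaseUpstream(TableEnvironment tEnv, List<String> upstreamTempTables) {
        for (String table : upstreamTempTables) {
            tEnv.executeSql("DROP TEMPORARY TABLE IF EXISTS " + table);
        }
    }
}
```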
After each component node finishes running, the component's temporary-table output data and run log are recorded, and the final result data is stored in the database.
Step S50, after the collision model has run, the result data and log information of each run of the collision model are viewed on the result viewing page.
Specifically, after the model finishes, all executed model tasks can be viewed on the task result menu page. Result data viewing, log viewing and task deletion can be performed on the task list.
Further, as shown in fig. 2 and fig. 4, the FLINK-based method of the present invention for modeling with multiple fused data types comprises the following steps:
s1, adding a database object: selecting a data source in a database resource menu page, selecting a required data source type, such as selecting MYSQL, filling parameter information such as a data source name of MYSQL student data, an IP address, a port number, a user name, a password and the like, and clicking (determining) to save the data source configuration. After the data source configuration is finished, a database object button is selected (newly added) in an object management page, a mysql student data source is selected in the page, then a student table is selected in a data table drop-down frame, all fields of the table can appear below the data table, and alias setting and checking filtering can be carried out on the fields.
S2, adding a file object: and selecting (uploading) a file resource menu page, selecting a local CSV (client service provider) file and an EXCEL file on the uploading file page, setting a file name and describing click submission. File resources are submitted to the background and stored in the MINIO object storage database, and the file object can preview the first 10 lines of data and headers. Such as: and submitting a student score list xlsx file, storing the name of a file object and the link address of the file in MINIO in a service database after successful uploading, and when data preview is carried out, removing the link address to acquire the file, reading the header of the file and the front ten data, and returning the header and the front ten data to the front end.
S3, adding a graph database object: selecting a map data source in a map database resource menu page, selecting a required map data source type, such as selecting a NEBULA map database, filling parameter information such as a map data source name of 'student household', an IP address, a port number, a user name, a password, a map space and the like, and clicking (determining) to save configuration. And after the map data source is configured, selecting a map database object button in an object management page, selecting a 'student household' map in the page, and then selecting a 'student household' entity in an entity drop-down box.
S4, create a collision model: click the [new] button on the collision model menu page and fill in the name "bad student information model" and the model description to create the model.
S5, drag the data source object onto the canvas: on the collision model's configuration page, select the student table object from the data source object list on the left and drag it into the model canvas.
S6, drag the file object onto the canvas: on the collision model's configuration page, select the "student score table.xlsx" object from the file object list on the left and drag it into the model canvas.
S7, drag the graph object onto the canvas: on the collision model's configuration page, select the "student household" object from the graph object list on the left and drag it into the model canvas.
S8, connection configuration of data components such as intersection and difference: on the collision model's configuration page, select an intersection component from the component list on the left and connect it to the student table and student score table objects respectively; click the component and, in the component configuration, associate the student_no field of the student table with the student_no2 field of the student score table, keep all fields checked by default, and save. Select a compute column component from the component list on the left and connect it to the intersection; in the compute column configuration, click the [new] button, enter the field name "average score" with alias "allavg" in the edit box, select the average function avg() from the arithmetic function list, and select the "math", "englist" and "mattics" fields output by the intersection inside avg() to obtain the average of the three scores. Select another component from the component list on the left and connect it to the "student household" object; in the component configuration, associate the student_no field from the compute column with the student_no3 field of "student household", set the connection mode to left join, add the filter condition allavg < 60, and finally check the address, contact, guardian and other fields of the related student household. Through the connection and integration of the data objects from three sources and the components, the comprehensive information of students whose average score is below 60 can be queried. Finally, select a data source component from the output component list on the left and connect it to the intersection, and on that component's configuration page set the data source name, database name, table name and data update mode for the output; once everything is configured, the collision model is complete.
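For reference, the model assembled in S8 corresponds roughly to the following FLINKSQL query, assuming the three objects were registered as temporary tables tmp_jdbc_student, tmp_xlsx_score and tmp_gp_household; the table names are invented, the field names follow the example above, and the row-wise average that avg() expresses in the compute column is written out explicitly here.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class BadStudentModelSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());
        // The temporary tables tmp_jdbc_student, tmp_xlsx_score and tmp_gp_household are assumed
        // to have been registered already, as in the conversion sketch earlier.
        String sql =
            "SELECT a.student_no, a.allavg, h.address, h.contact, h.guardian " +
            "FROM ( " +
            "  SELECT t1.student_no, (t2.math + t2.englist + t2.mattics) / 3.0 AS allavg " +
            "  FROM tmp_jdbc_student t1 " +
            "  JOIN tmp_xlsx_score t2 ON t1.student_no = t2.student_no2 " +
            ") a " +
            "LEFT JOIN tmp_gp_household h ON a.student_no = h.student_no3 " +
            "WHERE a.allavg < 60";
        tEnv.executeSql(sql).print();
    }
}
```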
S9, collision comparison run: after the model is built, click the [run] button, fill in the task run name, and wait for the model to finish running.
S10, view results and logs: on the task viewing page, task names can be searched by keyword and time. Clicking the result data and log information buttons in the task list shows the result data and logs, and the data can be downloaded in CSV format from the result data display page.
The FLINK-based method of modeling with multiple fused data types realizes modeling analysis across data types through the cooperation of drag-and-drop connection of components, SQL statement conversion at the back end, and the FLINK distributed stream processing engine. Conventional data modeling analysis tools are often limited to specific data types or require writing database scripts, and they cannot compare a graph database against other data types in a modeling analysis. In the invention, different types of data sources are turned into data objects through the access configuration of multiple data source types, the data objects and the data processing components are dragged, configured and connected to form a model, the connection relationships of the model are translated into FLINKSQL statements through script conversion and run in the FLINK environment, and the output result is finally stored and displayed. Data that would otherwise have to be processed in several different environments is uniformly converted into data processed in the FLINK engine, enabling analysis and comparison across data types, and zero-code model construction is achieved through drag-and-drop processing components.
By configuring CSV, EXCEL, graph database, mainstream database and other data types, data comparison is performed within the same data analysis model, with no code and a drag-and-drop assembled workflow, and the resulting data is previewed and displayed. Data analysis across mainstream databases, CSV, EXCEL and graph data is handled by fusion in the same data model, and comparative data analysis and result output through the rich set of data processing components achieves zero-code modeling analysis.
Further, as shown in fig. 5, based on the modeling method fusing multiple data types, the invention correspondingly provides a modeling system fusing multiple data types, which comprises:
an object configuration module for adding a database object, a file object and a graph database object; the object configuration module manages the data source configuration information and object information for databases, CSV/EXCEL files and graph databases, and manages the CSV/EXCEL files stored in the MINIO object database.
A model assembly module for creating a collision model on the collision model page and dragging the database object, the file object and the graph database object onto a model canvas; the model assembly module is used for creating models and includes the model canvas, the object connection configuration and the component parameter configuration. Model assembly is implemented mainly by the front-end page; the model canvas contains an object list column, a component list column, a toolbar and a canvas, and the model assembly module interface is shown in fig. 4. The object connection configuration is responsible for saving the connection relationships between the objects and components in the canvas; the component parameter configuration is responsible for saving the parameter settings of the components.
A verification and error-reporting module for checking the connection specification when components are connected and saved, and reporting an error prompt if the check fails.
A model run module for filling in a task run name and parsing and running the whole collision model through the run controller, the SQL script converter and the data buffer; the model run module handles the model run process and includes the run controller, the SQL script converter and the data buffer. The run controller is responsible for managing the model's run life cycle, controlling its start and stop, monitoring its running state and recording the log information of each node run of the model; the SQL script converter is responsible for converting the connection relationships of the model components into executable SQL statements; and the data buffer is responsible for storing and recording the temporary data and result data produced when each component node of the model runs.
A result viewing module for viewing, after the collision model has run, the result data and log information of each run of the collision model on the result viewing page; the result viewing module provides result data preview and log information viewing. The result data preview shows, for each task run of the model, the run data of each component and the final output result data, which can also be downloaded; log information viewing shows the log information of the components in each task of the model.
Further, as shown in fig. 6, based on the modeling method and system for fusing multiple data types, the invention further provides a terminal correspondingly, which includes a processor 10, a memory 20 and a display 30. Fig. 6 shows only some of the components of the terminal, but it should be understood that not all of the illustrated components are required to be implemented and that more or fewer components may alternatively be implemented.
The memory 20 may in some embodiments be an internal storage unit of the terminal, such as a hard disk or memory of the terminal. In other embodiments the memory 20 may also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card provided on the terminal. Further, the memory 20 may also include both an internal storage unit and an external storage device of the terminal. The memory 20 is used to store the application software installed on the terminal and various kinds of data, such as the program code installed on the terminal, and may also be used to temporarily store data that has been output or is to be output. In one embodiment, the memory 20 stores a modeling program 40 fusing multiple data types, and the modeling program 40 fusing multiple data types can be executed by the processor 10 to implement the modeling method fusing multiple data types in the present application.
The processor 10 may in some embodiments be a central processing unit (Central Processing Unit, CPU), microprocessor or other data processing chip for executing program code or processing data stored in the memory 20, for example, for performing the modeling method or the like that fuses the various data types.
The display 30 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch, or the like in some embodiments. The display 30 is used for displaying information at the terminal and for displaying a visual user interface. The components 10-30 of the terminal communicate with each other via a system bus.
In an embodiment, the steps of the modeling method fusing multiple data types as described above are implemented when the processor 10 executes the modeling program 40 fusing multiple data types in the memory 20.
The present invention also provides a computer-readable storage medium storing a modeling program fusing a plurality of data types, which when executed by a processor, implements the steps of the modeling method fusing a plurality of data types as described above.
In summary, the invention provides a modeling method fusing multiple data types and related equipment, wherein the method comprises: adding a database object, a file object and a graph database object; creating a collision model on the collision model page, and dragging the database object, the file object and the graph database object onto a model canvas; when components are connected and saved, checking the connection specification, and reporting an error prompt if the check fails; filling in a task run name, and parsing and running the whole collision model through a run controller, an SQL script converter and a data buffer; and, after the collision model has run, viewing the result data and log information of each run of the collision model on a result viewing page. The invention uniformly converts data that would otherwise have to be processed in several different environments into data processed in the FLINK engine, enabling analysis and comparison across data types, and achieves zero-code model construction through drag-and-drop processing components.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal comprising the element.
Of course, those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by a computer program for instructing relevant hardware (e.g., processor, controller, etc.), the program may be stored on a computer readable storage medium, and the program may include the above described methods when executed. The computer readable storage medium may be a memory, a magnetic disk, an optical disk, etc.
It is to be understood that the invention is not limited in its application to the examples described above, but is capable of modification and variation in light of the above teachings by those skilled in the art, and that all such modifications and variations are intended to be included within the scope of the appended claims.

Claims (9)

1. A modeling method fusing multiple data types, characterized in that the modeling method fusing multiple data types comprises the following steps:
adding a database object, a file object and a graph database object;
creating a collision model on the collision model page, and dragging the database object, the file object and the graph database object onto a model canvas;
when components are connected and saved, checking the connection specification, and reporting an error prompt if the check fails;
filling in a task run name, and parsing and running the whole collision model through a run controller, an SQL script converter and a data buffer;
and, after the collision model has run, viewing the result data and log information of each run of the collision model on a result viewing page.
2. The modeling method fusing multiple data types according to claim 1, wherein adding a database object, a file object and a graph database object specifically comprises:
configuring the JDBC connection parameters of a database on a page and saving the data source information; after the data source is configured, selecting a table under the data source and setting the required fields as a data object;
uploading file resources to the system by file upload, storing them in a MINIO object storage database, setting the file name on the page, and previewing a preset number of rows of the file's data;
and configuring and saving the connection parameters of a graph database on a page, then selecting an entity or relationship under the specified graph data source to generate a graph database object.
3. The modeling method fusing multiple data types according to claim 2, wherein creating a collision model on the collision model page and dragging the database object, the file object and the graph database object onto the model canvas further comprises:
performing data object connection construction.
4. The modeling method fusing multiple data types according to claim 1, wherein the run controller is responsible for starting and stopping the model run, monitoring the running state of the model and recording log information;
the SQL script converter is used to convert the connection relationships of the model components into executable SQL statements;
and the data buffer is responsible for storing and recording the temporary data and result data produced when each component node of the model runs.
5. The modeling method fusing multiple data types according to claim 1, wherein the model canvas comprises an object list column, a component list column, a toolbar and a canvas.
6. The modeling method fusing multiple data types according to claim 1, wherein the components include: intersection, union, difference, aggregation, filter row, compute column, field settings and SQL operator.
7. A modeling system fusing multiple data types, characterized in that the modeling system fusing multiple data types comprises:
an object configuration module for adding a database object, a file object and a graph database object;
a model assembly module for creating a collision model on the collision model page and dragging the database object, the file object and the graph database object onto a model canvas;
a verification and error-reporting module for checking the connection specification when components are connected and saved, and reporting an error prompt if the check fails;
a model run module for filling in a task run name and parsing and running the whole collision model through the run controller, the SQL script converter and the data buffer;
and a result viewing module for viewing, after the collision model has run, the result data and log information of each run of the collision model on the result viewing page.
8. A terminal, characterized in that the terminal comprises: a memory, a processor, and a modeling program fusing multiple data types that is stored in the memory and executable on the processor, wherein the modeling program, when executed by the processor, implements the steps of the modeling method fusing multiple data types according to any one of claims 1-6.
9. A computer-readable storage medium storing a modeling program fusing multiple data types, wherein the modeling program, when executed by a processor, implements the steps of the modeling method fusing multiple data types according to any one of claims 1-6.
CN202310500354.1A 2023-05-06 2023-05-06 Modeling method integrating multiple data types and related equipment Active CN116226788B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310500354.1A CN116226788B (en) 2023-05-06 2023-05-06 Modeling method integrating multiple data types and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310500354.1A CN116226788B (en) 2023-05-06 2023-05-06 Modeling method integrating multiple data types and related equipment

Publications (2)

Publication Number Publication Date
CN116226788A true CN116226788A (en) 2023-06-06
CN116226788B CN116226788B (en) 2023-07-25

Family

ID=86585838

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310500354.1A Active CN116226788B (en) 2023-05-06 2023-05-06 Modeling method integrating multiple data types and related equipment

Country Status (1)

Country Link
CN (1) CN116226788B (en)

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050288920A1 (en) * 2000-06-26 2005-12-29 Green Edward A Multi-user functionality for converting data from a first form to a second form
WO2003081389A2 (en) * 2002-03-20 2003-10-02 Gameworld.Com Visual application development system and method
US7472422B1 (en) * 2003-09-10 2008-12-30 Symantec Corporation Security management system including feedback and control
US8595103B1 (en) * 2008-09-30 2013-11-26 Accenture Global Services Limited Deployment and release component system
US20140289663A1 (en) * 2013-03-21 2014-09-25 Microsoft Corporation Producing Artwork Based on an Imported Image
CN107690623A (en) * 2015-05-28 2018-02-13 甲骨文国际公司 Automatic abnormality detection and solution system
CN110301143A (en) * 2016-12-30 2019-10-01 英特尔公司 Method and apparatus for radio communication
US20200205062A1 (en) * 2016-12-30 2020-06-25 Intel Corporation Methods and devices for radio communications
US20180336355A1 (en) * 2017-05-17 2018-11-22 Anurag Agarwal Threat modeling systems and related methods including compensating controls
CN108376176A (en) * 2018-03-14 2018-08-07 深圳日彤大数据有限公司 It can towed big data visualization analysis tools system
US20210319372A1 (en) * 2018-08-10 2021-10-14 Meaningful Technology Limited Ontologically-driven business model system and method
CN113508066A (en) * 2019-03-29 2021-10-15 英特尔公司 Autonomous vehicle system
CN111080263A (en) * 2019-12-20 2020-04-28 南京烽火星空通信发展有限公司 Visual collaborative analysis system based on thought-guide graph
CN112749194A (en) * 2020-06-03 2021-05-04 腾讯科技(深圳)有限公司 Visualized data processing method and device, electronic equipment and readable storage medium
US20230123322A1 (en) * 2021-04-16 2023-04-20 Strong Force Vcn Portfolio 2019, Llc Predictive Model Data Stream Prioritization
CN116070992A (en) * 2022-10-25 2023-05-05 浙江工业大学 Automatic stereoscopic warehouse's remote monitoring system based on webGL
CN115757603A (en) * 2022-11-23 2023-03-07 重庆长安汽车股份有限公司 Visual data modeling system and method
CN116028653A (en) * 2023-03-29 2023-04-28 鹏城实验室 Method and system for constructing map by visually configuring multi-source heterogeneous data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Liu Yi: "Research on 5G Network Technology for Improving 4G Network Performance", Digital World (数码世界), no. 04 *

Also Published As

Publication number Publication date
CN116226788B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
US20210174006A1 (en) System and method for facilitating complex document drafting and management
US20220342875A1 (en) Data preparation context navigation
CN104866426B (en) Software test integrated control method and system
US10691584B2 (en) Behavior driven development integration with test tool
US9165049B2 (en) Translating business scenario definitions into corresponding database artifacts
US20040031015A1 (en) System and method for manipulation of software
US20140344778A1 (en) System and method for code generation from a directed acyclic graph using knowledge modules
CN104133772A (en) Automatic test data generation method
US10296505B2 (en) Framework for joining datasets
CN115617327A (en) Low code page building system, method and computer readable storage medium
US10445675B2 (en) Confirming enforcement of business rules specified in a data access tier of a multi-tier application
CN109614315A (en) A kind of automatic generation method and system of data synchronism detection use-case
CN113986241A (en) Configuration method and device of business rules based on knowledge graph
Al Mahruqi et al. A semi-automated framework for migrating web applications from SQL to document oriented NoSQL database.
US9990343B2 (en) System and method for in-browser editing
CN116226788B (en) Modeling method integrating multiple data types and related equipment
CN111581212A (en) Data storage method, system, server and storage medium of relational database
CN111008011A (en) System builder for power platform application development
GB2528697A (en) Generating a database structure from a scanned drawing
CN115033436A (en) Page testing method and device, electronic equipment and storage medium
US10614421B2 (en) Method and system for in-memory policy analytics
CN114281797A (en) Method for quickly creating basic level data aggregation warehouse based on agile low-code platform
Dhakal et al. Library Tweets Conversion
WO2021032310A1 (en) Method and system for generating a digital representation of asset information in a cloud computing environment
CN111221846B (en) Automatic translation method and device for SQL sentences

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant