CN109669976B - ETL-based data service method and device - Google Patents


Info

Publication number
CN109669976B
Authority
CN
China
Prior art keywords
data
etl
node
conversion
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811397715.XA
Other languages
Chinese (zh)
Other versions
CN109669976A (en)
Inventor
付铨
梅纲
张勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Dameng Database Co Ltd
Original Assignee
Wuhan Dameng Database Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Dameng Database Co Ltd
Priority to CN201811397715.XA
Publication of CN109669976A
Application granted
Publication of CN109669976B
Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide an ETL-based data service method and device. The method comprises: acquiring data from various data sources, including databases, files, and WebService data services, and sending the acquired data to an ETL data exchange platform; cleaning, converting, and integrating the acquired data with graphical data cleaning and integration components to obtain a processed data result; and providing a data service in ETL WebService form for the data result on the ETL data exchange platform and publishing the WebService. The whole process is integrated and completed in a one-stop manner. The method and device combine data acquisition, cleaning and conversion, and data publishing services in one place, so that the whole acquisition and publishing process is convenient to operate and simple to deploy, and information resources are used to the fullest.

Description

ETL-based data service method and device
Technical Field
The embodiment of the invention relates to the technical field of data processing, in particular to a data service method and equipment based on ETL.
Background
Information is an important resource of a modern enterprise and the basis of its scientific management and decision analysis. Using existing data resources simply, efficiently, and to the fullest through technical means, reducing wasted time and money, and turning data into information and knowledge has become an important way for an enterprise to improve its core competitiveness. ETL (Extract-Transform-Load) and distributed application platforms (e.g., the WebService platform, to which embodiments of the present invention are primarily directed) are the main technical approaches.
ETL is used to extract, transform, and load data from a source to a destination. ETL is an important link in building a data warehouse: the user extracts the required data from a data source and, after data cleaning, loads the data into the data warehouse according to a predefined data warehouse model.
WebService is a platform-independent, low-coupling, self-contained, programmable Web-based application that can be described, published, discovered, coordinated, and configured using the open XML standard for developing distributed, interoperable applications. WebService technology enables different applications running on different machines to exchange data or integrate with each other without the aid of additional, specialized third-party software or hardware. Applications implemented according to the WebService specification may exchange data with each other regardless of the language, platform, or internal protocol used by them.
The traditional approach to information system construction builds ETL and WebService as two independent subsystems. This separates data processing from data publishing, so information resources obtained after processing cannot be published in time, which wastes time and money. Finding a method that implements data processing and publishing in a one-stop manner is therefore an urgent technical problem in the industry.
Disclosure of Invention
In view of the above problems in the prior art, embodiments of the present invention provide an ETL-based data service method and device.
In a first aspect, an embodiment of the present invention provides an ETL-based data service method, including: sending acquired data to an ETL data exchange platform to obtain a processed data result; and providing a data service in ETL WebService form for the data result on the ETL data exchange platform and publishing the WebService; wherein sending the acquired data to the ETL data exchange platform, providing the data service in ETL WebService form for the data result, and publishing the WebService are integrated and completed in a one-stop manner.
Further, the acquired data is obtained by acquiring data from a WebService distributed application platform, a database, JMS, and/or general files.
Further, the general file comprises: text files, Excel files, XML files, and/or dataset files.
Further, the providing, on the ETL data exchange platform, the data service in the form of ETL WebService for the data result includes: providing data source management, data node conversion, data node operation, function and variable calling, scheduling, monitoring and warning, authority management and/or version management service for received data.
Further, providing the data service in ETL WebService form for the data result on the ETL data exchange platform further includes: using a visual ETL data exchange platform to provide the data service in ETL WebService form for the received data, which specifically comprises: service configuration, service deployment, flow design, publishing design, user creation, user authorization, and service verification.
Further, after providing the data service in ETL WebService form for the data result on the ETL data exchange platform, the method further includes: normalizing the data result of the ETL WebService data service.
Further, the normalizing the data result after the ETL WebService form data service includes: an array specification, a JSON specification, and/or an XML specification.
In a second aspect, an embodiment of the present invention provides an ETL-based data service apparatus, including:
the data acquisition module is used for sending the acquired data to the ETL data exchange platform;
the data service module is used for providing ETL WebService-form data service for the data result on the ETL data exchange platform and issuing WebService;
wherein sending the acquired data to the ETL data exchange platform, providing the data service in ETL WebService form for the data result on the ETL data exchange platform, and publishing the WebService are integrated and completed in a one-stop manner.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the ETL-based data service method provided by any of the various possible implementations of the first aspect.
In a fourth aspect, embodiments of the present invention provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the ETL-based data service method provided in any of the various possible implementations of the first aspect.
The ETL-based data service method and the ETL-based data service equipment provided by the embodiment of the invention can organically combine data acquisition, cleaning conversion and data release service in a one-stop manner, so that the whole data acquisition and release process is convenient to operate and simple to deploy, and the effect of maximally utilizing information resources is achieved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, a brief description will be given below to the drawings required for the description of the embodiments or the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart of an ETL-based data service method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the working principle of independent ETL and WebService provided in the prior art;
FIG. 3 is a schematic view of a flow design in a visualization operation according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a publishing design in a visualization operation provided by an embodiment of the present invention;
FIG. 5 is a schematic flow chart of creating an ETL WebService in a visualization operation according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of user creation in a visualization operation provided by an embodiment of the present invention;
FIG. 7 is a schematic diagram of user authorization in a visualization operation provided by an embodiment of the present invention;
FIG. 8 is a schematic diagram of an ETL WebService customized service provided in an embodiment of the present invention;
FIG. 9 is a schematic diagram of ETL WebService customized service parameters provided in the embodiment of the present invention;
FIG. 10 is a schematic structural diagram of an ETL-based data service device according to an embodiment of the present invention;
FIG. 11 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. In addition, technical features of various embodiments or individual embodiments provided by the invention can be arbitrarily combined with each other to form a feasible technical solution, but must be realized by a person skilled in the art, and when the technical solution combination is contradictory or cannot be realized, the technical solution combination is not considered to exist and is not within the protection scope of the present invention.
The traditional approach to information system construction builds ETL and a distributed application platform (specifically, WebService) as two independent subsystems. This separates data processing from data publishing, so information resources obtained after processing cannot be published in time, which wastes time and money. The topology of this system mode is shown in FIG. 2 to illustrate the difference between the two modes. As can be seen from FIG. 2, in the prior art the functions of ETL and WebService are separate: ETL processes source data into target data and then loads the target data, while WebService publishes the requested services over the network (web) to meet the corresponding needs of clients.
Based on the above situation, the present patent aims to publish the data processing flow configured in the ETL as a service on a distributed application platform (specifically, WebService). External applications can subscribe to and access the platform service, invoke the data processing flow, and acquire and display data. The function can also be integrated into third-party access services for joint invocation. To achieve this object, an embodiment of the present invention provides an ETL-based data service method; referring to FIG. 1, the method includes:
101. sending the acquired data to an ETL data exchange platform to obtain a processed data result;
102. providing a data service in ETL WebService form for the data result on the ETL data exchange platform, and publishing the WebService.
Sending the acquired data to the ETL data exchange platform, providing the data service in ETL WebService form for the data result, and publishing the WebService are integrated and completed in a one-stop manner. Obtaining the processed data result includes applying one or more of the following operations to the data: extraction, cleaning, conversion, filtering, joining, lookup, replacement, sorting, aggregation, desensitization, and/or merging.
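As a rough illustration of how steps 101 and 102 can live in one component, the following Java sketch chains processing steps and exposes the result through a single entry point. The class name OneStopFlowSketch and its methods are hypothetical and are not taken from the platform described here; the sketch only assumes that rows are represented as string arrays.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch: one object that both processes rows and serves the result. */
final class OneStopFlowSketch {
    // Each step stands in for one operation (filter, convert, sort, ...).
    private final List<Function<List<String[]>, List<String[]>>> steps = new ArrayList<>();

    OneStopFlowSketch addStep(Function<List<String[]>, List<String[]>> step) {
        steps.add(step);
        return this;
    }

    /** Step 101: run the acquired rows through the configured steps in order. */
    List<String[]> process(List<String[]> acquired) {
        List<String[]> rows = acquired;
        for (Function<List<String[]>, List<String[]>> step : steps) {
            rows = step.apply(rows);
        }
        return rows;
    }

    /** Step 102 stand-in: the entry point a published service would call. */
    String[][] serve(List<String[]> acquired) {
        return process(acquired).toArray(new String[0][]);
    }
}
```

A caller would register steps with addStep and then call serve; in a real deployment the serve method would sit behind the published WebService rather than be called directly.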
On the basis of the above embodiment, in the ETL-based data service method provided in the embodiment of the present invention, the acquired data is obtained by acquiring data from a WebService distributed application platform, a database, JMS, and/or general files.
On the basis of the above embodiment, in the ETL-based data service method provided in the embodiment of the present invention, the general files include: text files, Excel files, XML files, and/or dataset files.
Specifically, database data currently supports 24 database types: Access (.mdb, .accdb), DB2V5, DB2V9, DB2V9.7, DM5, DM6, DM7, FoxPro (.dbf), Greenplum, Informix10, Informix7.3, MySQL3, MySQL4, MySQL5, Oracle10, Oracle11, Oracle8, Oracle9, SQLServer2000, SQLServer2005, SQLServer2008, Sybase11, Sybase12, and Sybase15. For Access (.mdb, .accdb) and FoxPro (.dbf), the ODBC data source name must be provided. For Oracle10, Oracle11, Oracle8, and Oracle9, the database name is actually the service name. MySQL3, MySQL4, and MySQL5 databases have no notion of a schema, so no schema name appears in the tree display of a newly created data source.
Text file data provides access to text files: a text file with a fixed format can be parsed into tabular form and then passed to the ETL engine for processing. Setting options such as character set, line separator, column separator, and text delimiter make it easy to split the text file, and detection of the file's character encoding and line separator is also provided.
CSV file data provides access to CSV files: a CSV file with a fixed format can be parsed into tabular form and then passed to the ETL engine for processing. Setting options such as character set are provided, along with detection of the file's character encoding.
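The column separator and text delimiter options mentioned for text files can be illustrated with a small Java sketch that splits one line into fields. The class and method names are hypothetical, and the convention that a doubled text delimiter inside a delimited field stands for one literal delimiter is an assumption borrowed from common CSV practice, not something stated in this document.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of splitting one line with a column separator and text delimiter. */
final class DelimitedLineSplitter {
    static List<String> splitLine(String line, char columnSeparator, char textDelimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideDelimiters = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == textDelimiter) {
                boolean doubled = insideDelimiters
                        && i + 1 < line.length()
                        && line.charAt(i + 1) == textDelimiter;
                if (doubled) {
                    current.append(textDelimiter); // assumed escape: "" means one literal "
                    i++;
                } else {
                    insideDelimiters = !insideDelimiters;
                }
            } else if (c == columnSeparator && !insideDelimiters) {
                fields.add(current.toString()); // separator outside delimiters ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field has no trailing separator
        return fields;
    }
}
```

With a comma as the column separator and a double quote as the text delimiter, a separator that appears between delimiters stays part of the field instead of starting a new column.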
Excel file data provides access to Excel files and can parse Excel files in either a fixed format or an arbitrary format. Fixed format means the Excel file is a simple table, similar to a table in a relational database, so column information can be obtained from the file itself. When an Excel file in arbitrary format is parsed, the column information is specified by the user; the system reads each row of the file and fills its values into the user-defined columns. If a row has more values than the user-defined columns, the excess is discarded; if it has fewer, the remaining columns are filled with null values.
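The rule for arbitrary-format Excel rows (discard values beyond the user-defined columns, fill missing ones with null) can be sketched in a few lines of Java. The class and method names are hypothetical, and the cells are assumed to have already been read into a string array by whatever Excel reader is in use.

```java
/** Hypothetical sketch: fit one row of cell values to a user-defined column count. */
final class ExcelRowFitter {
    static String[] fitRow(String[] cells, int columnCount) {
        String[] fitted = new String[columnCount];      // every slot starts as null
        int copied = Math.min(cells.length, columnCount);
        System.arraycopy(cells, 0, fitted, 0, copied);  // values beyond columnCount are dropped
        return fitted;
    }
}
```

For example, fitting a three-cell row to two columns keeps the first two cells, and fitting a one-cell row to three columns yields that cell followed by two nulls.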
The XML document data may use a specified XML document as a data source in the ETL process, and the XML data source may be used as a data set of an XML data reading component.
Dataset file data is also known as a DDS file, where DDS is an abbreviation of Dameng Data Set. DDS is a file format specific to the ETL and supports data compression. A DDS file stores the complete column information and message record information obtained during conversion.
The DBF file data provides an access function of the DBF file data and can analyze the DBF file with a fixed format. Column information and data in the DBF file are read.
The WebService data uses a WebService site as a data source of the ETL.
The JMS data defines information for connecting the JMS server.
The Mail data defines information for connecting to the mail server.
The LDAP data defines information for connecting to the LDAP server.
The HBase data defines information for connecting to an HBase database.
JSON data provides access to JSON files: a JSON file can be parsed and its column information and data read.
The MongoDB data defines information that connects to a MongoDB data server.
The Elasticsearch data defines information for connecting to an Elasticsearch data server.
On the basis of the above embodiment, the ETL-based data service method provided in the embodiment of the present invention, where the data service in the form of ETL WebService is provided to the data result on an ETL data exchange platform, includes: providing data source management, data node conversion, data node operation, function and variable calling, scheduling, monitoring and warning, authority management and/or version management service for received data.
Specifically, data source management stores the information about external data that the ETL needs to connect to when reading or writing data. The ETL supports management of database data sources, JMS data sources, file datasets (text files, Excel files, XML files, dataset files, etc.), and WebService data sources. Creating, modifying, and deleting data sources and datasets are supported, as are import and export of the metadata of all data sources and datasets as a whole and import and export of the metadata of a single data source.
Data node conversion represents a flow associated with data processing and is composed of data reading nodes, data loading nodes, data conversion nodes, correct lines, and error lines. An executable conversion must contain more than one node. The start and end points of a conversion may be any nodes.
Connecting lines in a conversion connect different nodes, and the direction of a line represents the direction of data flow. Connecting lines are divided into correct lines and error lines. A correct line carries data that the node processed correctly. An error line carries data that the node could not process correctly. The data on an error line should be the raw input data: its column information includes all input columns, and columns indicating the error type and error message may be added.
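The split between correct lines and error lines can be pictured with a minimal Java sketch of one node that validates a column and routes each row accordingly. Everything here is hypothetical: the class name, the choice of integer parsing as the node's work, and the use of the exception class name as the error type are illustrative assumptions only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical node that checks one column is an integer and routes rows. */
final class ConversionNodeSketch {
    final List<String[]> correctLine = new ArrayList<>();
    final List<String[]> errorLine = new ArrayList<>();

    void process(String[] row, int columnIndex) {
        try {
            Integer.parseInt(row[columnIndex]);
            correctLine.add(row); // processed correctly: row flows on unchanged
        } catch (NumberFormatException | ArrayIndexOutOfBoundsException e) {
            // Error line: keep all raw input columns, then append error type and message.
            String[] withError = Arrays.copyOf(row, row.length + 2);
            withError[row.length] = e.getClass().getSimpleName();
            withError[row.length + 1] = String.valueOf(e.getMessage());
            errorLine.add(withError);
        }
    }
}
```

Downstream nodes would read from correctLine, while errorLine could feed a node that logs or repairs rejected rows.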
The nodes in the conversion are functional entities for data processing, and a user can open a node attribute configuration dialog box at any time to modify and store the attributes, namely, the configuration information reading and displaying of one node is independent of other nodes (namely, the node configuration dialog box can be opened without connecting an input node). The configuration information may be saved at any time, and if the configuration is incorrect or incomplete, the user may be prompted, but not prevented from saving. When the nodes are configured, the information related to the database is obtained from the ETL metadata base, and a data source does not need to be connected. Conversion is also referred to as data flow, because once it is started, the nodes are simultaneously executing, data flows continuously from one node to another, and the conversion stops after all data has been processed.
A data node job is a flow that controls the execution order and process of conversions and other job nodes. A job consists of nodes and connecting lines; through a job, the user can control the order of and dependencies between conversions and other job nodes, so a job is also referred to as a control flow.
A job is composed of job nodes and job lines. A job may start or end with any job node. A job must contain at least one job node; if it contains several, they may or may not be connected to one another, that is, connections are not required. A job node may have any number of input and output connections. Jobs may be nested, i.e., one job may be executed as a node in another job.
Connecting lines in a job represent the execution order of the job nodes and are divided into success lines, failure lines, completion lines, and condition lines. A success line means the subsequent node is executed if the job node succeeds; a failure line means the subsequent node is executed if the job node fails; a completion line means the subsequent node is executed whether the job node succeeds or fails; and a condition line means the subsequent node is executed when a given condition is met.
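The four kinds of job lines amount to a small decision rule, sketched below in Java. The enum and method names are hypothetical, and the condition of a condition line is reduced to a boolean that the caller is assumed to have evaluated already.

```java
/** Hypothetical sketch of deciding whether a job line leads to its next node. */
final class JobLineSketch {
    enum LineType { SUCCESS, FAILURE, COMPLETION, CONDITION }

    static boolean shouldFollow(LineType type, boolean predecessorSucceeded, boolean conditionMet) {
        switch (type) {
            case SUCCESS:
                return predecessorSucceeded;   // follow only after success
            case FAILURE:
                return !predecessorSucceeded;  // follow only after failure
            case COMPLETION:
                return true;                   // follow regardless of outcome
            case CONDITION:
                return conditionMet;           // follow when the condition holds
            default:
                throw new IllegalArgumentException("unknown line type: " + type);
        }
    }
}
```

A job engine would call a rule like this for each outgoing line of a finished job node to pick the nodes to run next.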
Function and variable calling uses functions to process data and to extend system functionality. In addition to system functions, the ETL also supports user-defined functions.
Scheduling is divided into two categories, execution once and repeated execution. The created schedule may be set on the job or the conversion node.
Monitoring and alerting addresses the fact that not every flow's run can be watched in the foreground, for example a flow executed by a schedule. The run of such a background flow can be viewed through the monitored historical run instances.
ETL monitoring is a module for viewing the run logs of conversions or jobs created by the currently logged-in user. Current run instances and historical run instances can be viewed separately; each conversion or job has both listed beneath it. A current run instance is an instance that is running and has not yet finished; a historical run instance is an instance that has finished running. Up to 100 historical run instances are displayed.
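One way to picture the two instance lists is the Java sketch below, which moves an instance from the current list to a history capped at 100 entries. The class and method names are hypothetical, and dropping the oldest entry when the cap is reached is an assumption; the text only states that up to 100 historical instances are displayed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of tracking current and historical run instances. */
final class RunMonitorSketch {
    static final int MAX_HISTORY = 100;
    private final Set<String> current = new LinkedHashSet<>();
    private final Deque<String> history = new ArrayDeque<>();

    void started(String instanceId) {
        current.add(instanceId);
    }

    void finished(String instanceId) {
        if (!current.remove(instanceId)) {
            return; // not a known running instance; ignore
        }
        if (history.size() == MAX_HISTORY) {
            history.removeLast(); // assumed policy: drop the oldest entry
        }
        history.addFirst(instanceId);
    }

    List<String> currentInstances() {
        return new ArrayList<>(current);
    }

    /** Newest finished instance first. */
    List<String> historicalInstances() {
        return new ArrayList<>(history);
    }
}
```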
The run monitoring tree interface shows the running process. When a conversion or job is newly started, it is monitored in real time, and the run monitoring tree interface displays the running conversion or job synchronously.
Rights management manages the ETL by creating users and roles and assigning different permissions to them. A permission is a predefined ability to perform certain operations in the system. A role is a means of managing permissions and is a collection of permissions. A user is a member that can access the ETL. Permissions fall into two categories: function permissions and object permissions.
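The relationship between users, roles, and permissions can be sketched as two lookup tables in Java. The class name, method names, and the permission strings used with it are hypothetical; the sketch does not distinguish function permissions from object permissions and treats every permission as an opaque name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: a role is a set of permissions, a user holds roles. */
final class RightsSketch {
    private final Map<String, Set<String>> rolePermissions = new HashMap<>();
    private final Map<String, Set<String>> userRoles = new HashMap<>();

    void grantToRole(String role, String permission) {
        rolePermissions.computeIfAbsent(role, r -> new HashSet<>()).add(permission);
    }

    void assignRole(String user, String role) {
        userRoles.computeIfAbsent(user, u -> new HashSet<>()).add(role);
    }

    /** A user is allowed if any of the user's roles contains the permission. */
    boolean isAllowed(String user, String permission) {
        Set<String> roles = userRoles.getOrDefault(user, Collections.<String>emptySet());
        for (String role : roles) {
            Set<String> granted = rolePermissions.getOrDefault(role, Collections.<String>emptySet());
            if (granted.contains(permission)) {
                return true;
            }
        }
        return false;
    }
}
```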
The version management operation objects mainly include entire metadata, a single project, a single conversion, a single job, a single function, a single variable, a single global user function, and a single global user variable. The main functions of version management include backing up the current version, restoring the historical version, deleting the historical version and restoring the deleted object.
On the basis of the above embodiment, in the ETL-based data service method provided in the embodiment of the present invention, providing the data service in ETL WebService form for the data result on the ETL data exchange platform further includes: using a visual ETL data exchange platform to provide the data service in ETL WebService form for the received data, which specifically comprises: service configuration, service deployment, flow design, publishing design, user creation, user authorization, and service verification.
Specifically, taking the case where the ETL works with WebService as an example, the content of the service configuration file is as follows:
HOST: the ETL WebService service name or IP address;
PORT: the ETL WebService service port number;
USERNAME: the ETL user account;
PASSWORD: the ETL user password.
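As an illustration of reading such a configuration, the Java sketch below parses the four keys with java.util.Properties, which accepts both "KEY: value" and "KEY=value" lines. The actual file name and syntax used by the platform are not given in this document, so the format is an assumption; the class name is hypothetical, and the password key is written here as PASSWORD.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical sketch: parse and validate the four service configuration keys. */
final class ServiceConfigSketch {
    private static final String[] REQUIRED = { "HOST", "PORT", "USERNAME", "PASSWORD" };

    static Properties parse(String text) {
        Properties config = new Properties();
        try {
            config.load(new StringReader(text)); // assumed format: java.util.Properties syntax
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String key : REQUIRED) {
            if (config.getProperty(key) == null) {
                throw new IllegalArgumentException("missing configuration key: " + key);
            }
        }
        return config;
    }
}
```

Validating the keys up front means a missing entry is reported when the configuration is read rather than when the first service call fails.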
Service deployment deploys ETL WebService.war to the application directory of an application server, for example the /tomcat/webapps directory, and executes web_monitor_start.
Flow design uses the ETL visual data processing flow designer to design the data processing flow online. Referring to FIG. 3, by clicking the table/view, a text file can be output by default.
Publishing design runs the ETL WebService publishing wizard to create a data publishing plan online in a visual manner. Following the wizard, enter the relevant configuration, select the conversion to publish, then select the node and its output. If a flow end node that has no output is selected as the node, the configuration can be saved without configuring an output. Referring to FIG. 4, the flow end node is selected there, so there is no output and none is configured. As can be seen from FIG. 4, the conversion name, conversion node, and node output are blank; after the selection is complete, click the save and publish button 401 to save and publish. On this basis, right-click the flow WebService publishing item and click New ETL WebService to create a new flow. Referring to FIG. 5, in that interface the flow WebService settings item is clicked in the selection page on the left (where the attribute category to view is chosen from a list), and the flow WebService settings box on the right shows the following: the flow WebService publish name (S), here webTest; the published conversion flow (T), shown as "webTest"."conversion".test; Browse (B), for selecting a path on the computer; the conversion flow node (N), where table/view is selected; and the node output (O), where the default output is selected. The option for counting call flow execution information (A) controls whether execution information of called flows is recorded; select it if the execution information of WebService flow calls is of interest. The option for publishing as an independent WebService method (I) sets whether the flow is published as an independent WebService method.
If call flow (E) is selected, the ETL WebService does not return the node's data result and the whole flow is executed. If it is not selected, the flow runs up to the configured flow node, the data of that node is returned, and the flow stops at that node without executing completely.
User creation creates a management user for identity authentication when the service is called. Referring to FIG. 6, clicking the create user button 601 creates a user in the lower part of the interface; here the user name is u1.
User authorization authorizes a user. Referring to FIG. 7, enter the user name u1 in user name (N) and a password in password (P), check the configured flow webTest, confirm and save, and then restart the ETL and ETL WebService services.
Service verification verifies the service results.
On the basis of the foregoing embodiment, in the ETL-based data service method provided in the embodiment of the present invention, after providing the data service in ETL WebService form for the data result on the ETL data exchange platform, the method further includes: normalizing the data result of the ETL WebService data service.
On the basis of the above embodiment, in the ETL-based data service method provided in the embodiment of the present invention, normalizing the data result of the ETL WebService data service includes: an array specification, a JSON specification, and/or an XML specification.
The result of the array specification is a two-dimensional string array, as shown below (taking WebService as an example):
Interface:
public String[][] getFlowArrayResult(String webServiceFlowName, String password, String[] paramNames, String[] paramValues, int maxResultCount);
The parameters have the following meanings:
webServiceFlowName: the name of the flow WebService configured above.
username: the management account created above.
password: the password of the management account created above.
paramNames: an array of the names of the parameters to pass in. If there are no parameters, nothing is passed.
paramValues: an array of the values of the parameters to pass in. If there are no parameters, nothing is passed.
pageStart: the starting position of the node data to fetch.
pageSize: the total number of node data records to fetch.
When pageStart and pageSize are both 0 or -1, all data of the flow node is returned, and the flow terminates at that node.
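The paging rule can be sketched as follows in Java. This is not the platform's implementation: the class name is hypothetical, pageStart is assumed to be a zero-based offset, and "both 0 or -1" is read as each of the two values being either 0 or -1.

```java
import java.util.Arrays;

/** Hypothetical sketch of the pageStart/pageSize rule applied to node data rows. */
final class PagingSketch {
    static String[][] page(String[][] rows, int pageStart, int pageSize) {
        boolean startUnset = pageStart == 0 || pageStart == -1;
        boolean sizeUnset = pageSize == 0 || pageSize == -1;
        if (startUnset && sizeUnset) {
            return rows.clone(); // no paging requested: return all node data
        }
        if (pageStart < 0 || pageSize < 0) {
            throw new IllegalArgumentException("pageStart and pageSize must not be negative when paging");
        }
        int from = Math.min(pageStart, rows.length); // assumed: zero-based offset
        int to = (int) Math.min((long) from + pageSize, (long) rows.length);
        return Arrays.copyOfRange(rows, from, to);
    }
}
```

Requests that start beyond the last row return an empty array rather than failing, which keeps client-side paging loops simple.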
The result returned by the JSON specification is a string in JSON format, as shown below (taking WebService as an example):
Interface:
public String getFlowJsonResult(String webServiceFlowName, String password, String jsonParams, int maxResultCount);
The parameters have the following meanings:
webServiceFlowName: the name of the flow WebService configured above.
username: the management account created above.
password: the password of the management account created above.
jsonParams: the input parameters, passed in JSON format; if there are no parameters, pass {}. If variables are configured, the JSON input takes the form
{"V_BEGIN":"\"begin\"","V_END":"\"end\""}
pageStart: the starting position of the node data to fetch.
pageSize: the total number of node data records to fetch.
When pageStart and pageSize are both 0 or -1, all data of the flow node is returned, and the flow terminates at that node.
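A caller has to produce the jsonParams string, including the escaped quotes seen in the V_BEGIN and V_END example. The Java sketch below builds such a string from a map of variable names to values without any JSON library. The class name is hypothetical, and the escaping covers only backslashes and double quotes, so it is a minimal illustration rather than a complete JSON encoder.

```java
import java.util.Map;

/** Hypothetical sketch: build the jsonParams string from variable names and values. */
final class JsonParamsSketch {
    static String toJsonParams(Map<String, String> variables) {
        StringBuilder json = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> entry : variables.entrySet()) {
            if (!first) {
                json.append(',');
            }
            first = false;
            json.append('"').append(escape(entry.getKey())).append("\":\"")
                .append(escape(entry.getValue())).append('"');
        }
        return json.append('}').toString();
    }

    // Minimal escaping only: backslash first, then double quote.
    private static String escape(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

An empty map yields {}, matching the value to pass when no parameters are configured; passing an insertion-ordered map keeps the variables in a predictable order.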
The parameters of the XML specification are similar to those of the JSON specification, and the returned result is XML, as shown below (taking WebService as an example):
Interface:
public String getFlowXMLResult(String webServiceFlowName, String username, String password, String jsonParams, int pageStart, int pageSize);
In addition to the services provided by the above embodiments, the embodiment of the invention also provides a customization service. The user can publish a custom method by entering a custom ETL WebService method name, for example 'testMethod', selecting the return type (json or xml), and selecting the input parameters of the method through the three check boxes below. As before, the ETL and ETL WebService services are restarted and the service method is then verified. Referring to fig. 8, a testMethod method is now added under independent WebService method publishing (I) (the rest of fig. 8 is the same as fig. 5 and is not described again here). The method is called using SoapUI, as shown in fig. 9. As can be seen in fig. 9, args0 and args1 correspond to username and password; args2 corresponds to jsonParams; and args3 and args4 correspond to pageStart and pageSize.
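The positional mapping of args0 to args4 described for fig. 9 can be sketched as follows. The class `CustomMethodRequest` is a hypothetical holder introduced only for illustration; the text states only which positional argument corresponds to which named parameter.

```java
final class CustomMethodRequest {
    final String username;   // args0
    final String password;   // args1
    final String jsonParams; // args2
    final int pageStart;     // args3
    final int pageSize;      // args4

    private CustomMethodRequest(String username, String password,
                                String jsonParams, int pageStart, int pageSize) {
        this.username = username;
        this.password = password;
        this.jsonParams = jsonParams;
        this.pageStart = pageStart;
        this.pageSize = pageSize;
    }

    // Maps the five positional SOAP arguments onto named fields.
    static CustomMethodRequest fromPositional(String[] args) {
        if (args == null || args.length != 5) {
            throw new IllegalArgumentException("expected args0..args4");
        }
        return new CustomMethodRequest(
            args[0], args[1], args[2],
            Integer.parseInt(args[3]), Integer.parseInt(args[4]));
    }

    public static void main(String[] args) {
        // All values below are placeholders, not real credentials.
        CustomMethodRequest r = fromPositional(
            new String[] {"admin", "secret", "{}", "-1", "-1"});
        System.out.println(r.username + " " + r.jsonParams + " "
            + r.pageStart + " " + r.pageSize);
    }
}
```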
The ETL-based data service method provided by the embodiment of the invention can organically combine data acquisition, cleaning conversion and data release service in a one-stop manner, so that the whole data acquisition and release process is convenient to operate and simple to deploy, and the effect of maximally utilizing information resources is achieved.
The implementation basis of the various embodiments of the present invention is realized by programmed processing performed by a device having a processor function. Therefore, in engineering practice, the technical solutions and functions thereof of the embodiments of the present invention can be packaged into various modules. Based on this reality, on the basis of the above embodiments, embodiments of the present invention provide an ETL-based data service apparatus for executing the ETL-based data service method in the above method embodiments. Referring to fig. 10, the apparatus includes:
a data acquisition module 1001, configured to send acquired data to an ETL data exchange platform to obtain a processed data result;
the data service module 1002 is configured to provide an ETL WebService-like data service for the data result on an ETL data exchange platform, and perform WebService publishing;
wherein sending the acquired data to the ETL data exchange platform, providing the data service in ETL WebService form for the data result on the ETL data exchange platform, and publishing the WebService are integrated together and completed in a one-stop manner.
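The division of the apparatus into the two modules can be sketched in Java as follows. This is a toy model under stated assumptions: all class and method names are hypothetical, the "processing" is reduced to trimming records and dropping empty ones, and "publishing" is reduced to registering the result under a service name in memory.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EtlDeviceSketch {
    private EtlDeviceSketch() {}

    // Stand-in for data acquisition module 1001.
    static final class DataAcquisitionModule {
        // Hypothetical processing: trim each record and drop empty ones.
        List<String> sendToPlatform(List<String> acquired) {
            List<String> result = new ArrayList<>();
            for (String record : acquired) {
                String cleaned = record == null ? "" : record.trim();
                if (!cleaned.isEmpty()) {
                    result.add(cleaned);
                }
            }
            return result;
        }
    }

    // Stand-in for data service module 1002.
    static final class DataServiceModule {
        private final Map<String, List<String>> published = new LinkedHashMap<>();

        // "Publishes" the data result under a service name.
        void publish(String serviceName, List<String> dataResult) {
            published.put(serviceName, new ArrayList<>(dataResult));
        }

        // Returns a copy of the data behind a published service.
        List<String> call(String serviceName) {
            List<String> result = published.get(serviceName);
            if (result == null) {
                throw new IllegalArgumentException("unknown service: " + serviceName);
            }
            return new ArrayList<>(result);
        }
    }

    // One-stop flow: acquire, process and publish in a single call.
    static DataServiceModule runOneStop(List<String> acquired, String serviceName) {
        DataAcquisitionModule acquisition = new DataAcquisitionModule();
        DataServiceModule service = new DataServiceModule();
        service.publish(serviceName, acquisition.sendToPlatform(acquired));
        return service;
    }

    public static void main(String[] args) {
        List<String> raw = new ArrayList<>();
        raw.add("  alpha ");
        raw.add("");
        raw.add("beta");
        System.out.println(runOneStop(raw, "demoFlow").call("demoFlow"));
    }
}
```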
The ETL-based data service device provided by the embodiment of the invention adopts the data acquisition module and the data service module, and can organically combine data acquisition, cleaning conversion and data release service in a one-stop manner, so that the whole data acquisition and release process is convenient to operate and simple to deploy, and the effect of maximally utilizing information resources is achieved.
The method of the embodiment of the invention relies on electronic equipment for its implementation, so the related electronic equipment needs to be introduced. To this end, an embodiment of the present invention provides an electronic apparatus, as shown in fig. 11, including: at least one processor 1101, a communication interface 1104, at least one memory 1102 and a communication bus 1103, wherein the at least one processor 1101, the communication interface 1104 and the at least one memory 1102 communicate with each other through the communication bus 1103. The at least one processor 1101 may invoke logic instructions in the at least one memory 1102 to perform the following method: sending the acquired data to an ETL data exchange platform to obtain a processed data result; providing data service in an ETL WebService form for the data result on the ETL data exchange platform, and publishing the WebService; wherein sending the acquired data to the ETL data exchange platform, providing the data service in ETL WebService form for the data result, and publishing the WebService are integrated together and completed in a one-stop manner.
Furthermore, the logic instructions in the at least one memory 1102 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. Examples include: sending the acquired data to an ETL data exchange platform to obtain a processed data result; providing data service in an ETL WebService form for the data result on an ETL data exchange platform, and issuing the WebService; the acquired data are sent to an ETL data exchange platform, data services in an ETL WebService form are provided for the data results on the ETL data exchange platform, WebService issuing is carried out, and the steps are integrated together to complete the data exchange in a one-stop mode. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. An ETL-based data service method, comprising:
sending the acquired data to an ETL data exchange platform to obtain a processed data result, wherein ETL refers to a process of extracting, interactively converting and loading the data from a source end to a destination end;
providing data service in an ETL WebService form for the data result on an ETL data exchange platform, and issuing the WebService;
the acquired data are sent to an ETL data exchange platform, data services in an ETL WebService form are provided for the data results on the ETL data exchange platform, WebService issuing is carried out, and the steps are integrated together and finished in a one-stop mode;
the providing of the data service in the form of ETL WebService to the received data on the ETL data exchange platform includes:
providing data source management, data node conversion, data node operation, function and variable calling, scheduling, monitoring and warning, authority management and/or version management service for received data;
the data source management stores the external data sources that the ETL (extract-transform-load) needs to connect to when reading or writing data; the ETL supports the management of database data sources, JMS (Java Message Service) data sources, file data sets and WebServices data sources, supports the creation, modification and deletion of data sources and data sets, and supports the import and export of the metadata of all data sources and data sets as a whole as well as the import and export of the metadata of an individual data source;
the data node conversion represents a flow process related to data processing, and consists of a data reading node, a data loading node, a data conversion node, a correct line and an error line, wherein one executed conversion comprises more than one node, and the starting point and the end point of the conversion are any nodes;
connecting lines in conversion are used for connecting different nodes, the direction of the connecting lines represents the flow direction of data, the connecting lines are divided into correct lines and error lines, the correct lines represent the flow direction of the data which can be correctly processed by the nodes, the error lines represent the flow direction of the data which cannot be correctly processed by the components, the data on the error lines are unprocessed original input data, column information of the data comprises all input columns, and columns for describing error types and error messages are added;
the nodes in the conversion are functional entities for data processing, a user opens a node attribute configuration dialog box and modifies and stores the attributes, namely, the configuration information of one node is read and displayed independently of other nodes, the configuration information is stored at any time, if the configuration is wrong or incomplete, the user is prompted but the user is not prevented from storing, when the nodes are configured, the information related to the database is obtained from an ETL metadata base without connecting a data source, the conversion is executed once, the nodes are executed simultaneously, data continuously flow from one node to the other node, and the conversion is stopped after all data are processed, so the conversion is also called as data flow;
the data node operation is a flow for controlling the execution sequence and process of conversion and other operation nodes, one operation comprises nodes and connecting lines, and a user controls the conversion and the execution sequence and dependency relationship among other operation nodes through the operation, so the operation is also called as control flow;
the operation is composed of operation nodes and operation lines, the operation is started by any operation node and is ended by any operation node, one operation at least comprises one operation node, if the operation comprises a plurality of operation nodes, the plurality of operation nodes can be connected or not connected, namely, the connection is not necessary, one operation node has any plurality of input and output connections, the operation is executed in a nested mode, namely, one operation can be executed as a node in another operation;
connecting lines in operation represent the execution sequence of operation nodes, the connecting lines are divided into success lines, failure lines, completion lines and condition lines, the success lines represent that subsequent nodes are continuously executed if the operation nodes are successfully executed, the failure lines represent that the subsequent nodes are continuously executed after the operation nodes are failed to be executed, the completion lines represent that the subsequent nodes are continuously executed no matter whether the operation is successfully executed or failed to be executed, and the condition lines represent that the subsequent nodes are executed when certain conditions are met;
calling functions and variables are to process data by using functions, expand system functions, and ETL supports user-defined functions besides using system functions;
scheduling is divided into two types of 'once execution' and 'repeated execution', and the created scheduling is set on a job or a conversion node;
monitoring and warning are based on the consideration that not all processes can see the running process in the foreground, and then the running process of the processes, namely the background process, is checked through the monitored historical running instance;
the ETL monitoring is a module used for viewing a conversion or job running log established by a current login user, a current running instance and a historical running instance can be respectively viewed, the current running instance and the historical running instance are arranged under the conversion or job, the current running instance refers to an instance which is running but is not finished, and the historical running instance refers to an instance which is already running and is finished;
the running process monitoring tree interface displays the running process, if a new running conversion or operation exists, the running process monitoring tree interface can be monitored in real time, and the running process monitoring tree interface can synchronously display the running conversion or operation;
the authority management realizes the management of ETL by creating users and roles and distributing different authorities to the users and the roles, wherein the authority is the capability of executing certain operation which is defined in advance by a system, the role is a solution of the authority management and is a set of authorities, the users are members capable of accessing the ETL, and the authorities are divided into two types: function rights and object rights;
the operation objects of version management comprise whole metadata, single project, single conversion, single job, single function, single variable, single global user function and single global user variable, and the functions of version management comprise backing up the current version, restoring the historical version, deleting the historical version and restoring the deleted object.
2. The ETL-based data service method of claim 1, wherein the manner of obtaining the obtained data comprises:
and acquiring data from the WebService distributed application platform, the database, the JMS and/or the general file.
3. The ETL-based data service method of claim 2, wherein said general file comprises:
text files, Excel files, XML files, and/or dataset files.
4. The ETL-based data service method of claim 1, wherein said providing ETL WebService-form data services to said data results on an ETL data exchange platform further comprises:
the method for providing the data service in the form of ETL WebService for the received data by adopting the visualized ETL data exchange platform specifically comprises the following steps:
service configuration, service deployment, process design, release design, creation of users, user authorization, and service verification.
5. The ETL-based data service method of claim 1, wherein after said providing the data service in ETL WebService form to the data result on the ETL data exchange platform, further comprising:
and normalizing a data result after the ETL WebService form data service.
6. The ETL-based data service method of claim 1, wherein the normalizing the data result after the ETL WebService-form data service comprises:
an array specification, a JSON specification, and/or an XML specification.
7. An ETL-based data service apparatus, comprising:
the data acquisition module is used for sending the acquired data to the ETL data exchange platform to obtain a processed data result, wherein the ETL refers to a process of extracting, interactively converting and loading the data from a source end to a destination end;
the data service module is used for providing ETL WebService-form data service for the data result on the ETL data exchange platform and issuing WebService;
the acquired data are sent to an ETL data exchange platform, data services in an ETL WebService form are provided for the data results on the ETL data exchange platform, WebService issuing is carried out, and the steps are integrated together and finished in a one-stop mode;
the providing of the data service in the form of ETL WebService to the received data on the ETL data exchange platform includes:
providing data source management, data node conversion, data node operation, function and variable calling, scheduling, monitoring and warning, authority management and/or version management service for received data;
the data source management stores the external data sources that the ETL (extract-transform-load) needs to connect to when reading or writing data; the ETL supports the management of database data sources, JMS (Java Message Service) data sources, file data sets and WebServices data sources, supports the creation, modification and deletion of data sources and data sets, and supports the import and export of the metadata of all data sources and data sets as a whole as well as the import and export of the metadata of an individual data source;
the data node conversion represents a flow process related to data processing, and consists of a data reading node, a data loading node, a data conversion node, a correct line and an error line, wherein one executed conversion comprises more than one node, and the starting point and the end point of the conversion are any nodes;
connecting lines in conversion are used for connecting different nodes, the direction of the connecting lines represents the flow direction of data, the connecting lines are divided into correct lines and error lines, the correct lines represent the flow direction of the data which can be correctly processed by the nodes, the error lines represent the flow direction of the data which cannot be correctly processed by the components, the data on the error lines are unprocessed original input data, column information of the data comprises all input columns, and columns for describing error types and error messages are added;
the nodes in the conversion are functional entities for data processing, a user opens a node attribute configuration dialog box and modifies and stores the attributes, namely, the configuration information of one node is read and displayed independently of other nodes, the configuration information is stored at any time, if the configuration is wrong or incomplete, the user is prompted but the user is not prevented from storing, when the nodes are configured, the information related to the database is obtained from an ETL metadata base without connecting a data source, the conversion is executed once, the nodes are executed simultaneously, data continuously flow from one node to the other node, and the conversion is stopped after all data are processed, so the conversion is also called as data flow;
the data node operation is a flow for controlling the execution sequence and process of conversion and other operation nodes, one operation comprises nodes and connecting lines, and a user controls the conversion and the execution sequence and dependency relationship among other operation nodes through the operation, so the operation is also called as control flow;
the operation is composed of operation nodes and operation lines, the operation is started by any operation node and is ended by any operation node, one operation at least comprises one operation node, if the operation comprises a plurality of operation nodes, the plurality of operation nodes can be connected or not connected, namely, the connection is not necessary, one operation node has any plurality of input and output connections, the operation is executed in a nested mode, namely, one operation can be executed as a node in another operation;
connecting lines in operation represent the execution sequence of operation nodes, the connecting lines are divided into success lines, failure lines, completion lines and condition lines, the success lines represent that subsequent nodes are continuously executed if the operation nodes are successfully executed, the failure lines represent that the subsequent nodes are continuously executed after the operation nodes are failed to be executed, the completion lines represent that the subsequent nodes are continuously executed no matter whether the operation is successfully executed or failed to be executed, and the condition lines represent that the subsequent nodes are executed when certain conditions are met;
calling functions and variables are to process data by using functions, expand system functions, and ETL supports user-defined functions besides using system functions;
scheduling is divided into two types of 'once execution' and 'repeated execution', and the created scheduling is set on a job or a conversion node;
monitoring and warning are based on the consideration that not all processes can see the running process in the foreground, and then the running process of the processes, namely the background process, is checked through the monitored historical running instance;
the ETL monitoring is a module used for viewing a conversion or job running log established by a current login user, a current running instance and a historical running instance can be respectively viewed, the current running instance and the historical running instance are arranged under the conversion or job, the current running instance refers to an instance which is running but is not finished, and the historical running instance refers to an instance which is already running and is finished;
the running process monitoring tree interface displays the running process, if a new running conversion or operation exists, the running process monitoring tree interface can be monitored in real time, and the running process monitoring tree interface can synchronously display the running conversion or operation;
the authority management realizes the management of ETL by creating users and roles and distributing different authorities to the users and the roles, wherein the authority is the capability of executing certain operation which is defined in advance by a system, the role is a solution of the authority management and is a set of authorities, the users are members capable of accessing the ETL, and the authorities are divided into two types: function rights and object rights;
the operation objects of version management comprise whole metadata, single project, single conversion, single job, single function, single variable, single global user function and single global user variable, and the functions of version management comprise backing up the current version, restoring the historical version, deleting the historical version and restoring the deleted object.
8. An electronic device, comprising:
at least one processor, at least one memory, a communication interface, and a bus; wherein
the processor, the memory and the communication interface complete mutual communication through the bus;
the memory stores program instructions executable by the processor, the processor calling the program instructions to perform the method of any of claims 1 to 6.
9. A non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method of any one of claims 1-6.
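The correct-line and error-line behaviour recited in claims 1 and 7 can be illustrated with the following Java sketch. It is an assumption-laden toy, not the platform's node implementation: the conversion (parsing a hypothetical "amount" column as an integer) and the column names ERROR_TYPE and ERROR_MESSAGE are invented for the example. What it takes from the claims is that rows which cannot be processed flow to the error output as unprocessed original input with added columns describing the error type and error message.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ErrorLineSketch {
    private ErrorLineSketch() {}

    // The two outputs of a conversion node.
    static final class Outputs {
        final List<Map<String, Object>> correct = new ArrayList<>();
        final List<Map<String, Object>> error = new ArrayList<>();
    }

    // Builds a row from alternating column names and values.
    static Map<String, Object> row(Object... keyValues) {
        Map<String, Object> r = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            r.put(String.valueOf(keyValues[i]), keyValues[i + 1]);
        }
        return r;
    }

    // Hypothetical conversion node: parses the "amount" column as an integer.
    static Outputs run(List<Map<String, Object>> input) {
        Outputs out = new Outputs();
        for (Map<String, Object> in : input) {
            try {
                Map<String, Object> converted = new LinkedHashMap<>(in);
                converted.put("amount",
                    Integer.parseInt(String.valueOf(in.get("amount"))));
                out.correct.add(converted); // flows along the correct line
            } catch (NumberFormatException e) {
                // Error line: original input columns plus error description.
                Map<String, Object> failed = new LinkedHashMap<>(in);
                failed.put("ERROR_TYPE", e.getClass().getSimpleName());
                failed.put("ERROR_MESSAGE", e.getMessage());
                out.error.add(failed);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> input = new ArrayList<>();
        input.add(row("id", "1", "amount", "10"));
        input.add(row("id", "2", "amount", "abc"));
        Outputs out = run(input);
        System.out.println("correct: " + out.correct);
        System.out.println("error:   " + out.error);
    }
}
```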
CN201811397715.XA 2018-11-22 2018-11-22 ETL-based data service method and device Active CN109669976B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811397715.XA CN109669976B (en) 2018-11-22 2018-11-22 ETL-based data service method and device

Publications (2)

Publication Number Publication Date
CN109669976A CN109669976A (en) 2019-04-23
CN109669976B true CN109669976B (en) 2020-12-08

Family

ID=66142126

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811397715.XA Active CN109669976B (en) 2018-11-22 2018-11-22 ETL-based data service method and device

Country Status (1)

Country Link
CN (1) CN109669976B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110347992B (en) * 2019-07-10 2024-05-14 成都函夏科技有限公司 Data analysis method and system based on electronic report
CN110471968A (en) * 2019-07-11 2019-11-19 新华三大数据技术有限公司 Dissemination method, device, equipment and the storage medium of ETL task
CN111159265B (en) * 2019-12-03 2023-04-14 武汉达梦数据库股份有限公司 ETL data migration method and system
CN113360554B (en) * 2020-03-06 2023-06-23 深圳法大大网络科技有限公司 Method and equipment for extracting, converting and loading ETL (extract transform load) data
CN111399826B (en) * 2020-03-19 2020-12-01 北京三维天地科技股份有限公司 Visual dragging flow diagram ETL online data exchange method and system
CN113111104A (en) * 2021-04-06 2021-07-13 创意信息技术股份有限公司 Web-ETL big data fusion method based on integration
CN113111106A (en) * 2021-04-06 2021-07-13 创意信息技术股份有限公司 ETL design data access method and data access module based on Web

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250691A (en) * 2016-07-29 2016-12-21 广州天健软件有限公司 A kind of medicinal data processing method
CN107992552A (en) * 2017-11-28 2018-05-04 南京莱斯信息技术股份有限公司 A kind of data interchange platform and method for interchanging data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
达梦数据交换平台产品白皮书_2015版;达梦数据库有限公司;《https://www.docin.com/p-1843796872.html》;20170203;全文 *

Also Published As

Publication number Publication date
CN109669976A (en) 2019-04-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 430000 16-19 / F, building C3, future technology building, 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan, Hubei Province

Patentee after: Wuhan dream database Co.,Ltd.

Address before: 430000 16-19 / F, building C3, future technology building, 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan, Hubei Province

Patentee before: WUHAN DAMENG DATABASE Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20220909

Address after: 430073 16-19 / F, building C3, future science and technology building, 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan City, Hubei Province

Patentee after: Wuhan dream database Co.,Ltd.

Patentee after: HUAZHONG University OF SCIENCE AND TECHNOLOGY

Address before: 430000 16-19 / F, building C3, future technology building, 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan, Hubei Province

Patentee before: Wuhan dream database Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20230720

Address after: 16-19/F, Building C3, Future Science and Technology Building, No. 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan City, Hubei Province, 430206

Patentee after: Wuhan dream database Co.,Ltd.

Address before: 430073 16-19 / F, building C3, future science and technology building, 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan City, Hubei Province

Patentee before: Wuhan dream database Co.,Ltd.

Patentee before: HUAZHONG University OF SCIENCE AND TECHNOLOGY
