CN117874120A - Intelligent data integration method and system based on data virtualization - Google Patents

Publication number
CN117874120A
Authority
CN
China
Prior art keywords
data
service
virtual
user
virtual table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311735697.2A
Other languages
Chinese (zh)
Inventor
郑聪
李君威
龚小龙
麻志毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Original Assignee
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced Institute of Information Technology AIIT of Peking University and Hangzhou Weiming Information Technology Co Ltd
Priority to CN202311735697.2A
Publication of CN117874120A

Abstract

The invention relates to an intelligent data integration method and system based on data virtualization. A target service wide table is first analyzed, a multi-level virtual table corresponding to the target service wide table is determined, and the multi-level virtual table is displayed to a user. This helps a technician quickly derive data requirements from service requirements without performing complex intermediate processing on the data tables of the data source. Because both the virtual tables and their business topic fields are editable, a technician can conveniently relocate the data corresponding to the target service wide table through flexible adjustment even if a data source, data table, or field later changes. In addition, by displaying the virtual table and the header fields of the data tables in the data source associated with the virtual table, the user can locate the corresponding data through simple operations, which makes determining a data development scheme simpler, more intelligent, and more flexible.

Description

Intelligent data integration method and system based on data virtualization
Technical Field
The present invention relates to the field of computer technologies, and in particular, to an intelligent data integration method and system based on data virtualization.
Background
ETL (Extract-Transform-Load) is a very important link in the process of turning data into value and is a necessary preliminary step. ETL is responsible for extracting data from various heterogeneous data sources, such as relational data and flat data files, into a temporary middle layer, then cleaning, converting, and integrating the data, and finally loading it into a data warehouse or data mart, where it forms the basis of online analytical processing and data mining.
For the daily data development requirements of enterprises, the traditional data development flow is as follows:
1. Requirement communication: leadership passes the requirement to business personnel, who repeatedly discuss the business objectives with data engineers (such as data analysts, development engineers, and DBAs) and determine the data calculation rules and data flow.
2. Extraction: data is extracted from multiple heterogeneous data sources, including relational databases, files, and web APIs.
3. Conversion: following the development flow designed in advance, the extracted data is converted and cleaned (for example merged, split, filtered, or type-converted), a series of intermediate table structures and the business target wide table structure are designed, and the data is saved.
4. Loading: the final target wide table data produced by conversion is loaded into a target system, such as a data warehouse, data lake, or business intelligence system.
5. Visualization: the data is displayed on a front-end interface as charts, reports, and the like for querying and viewing.
The problems in the prior art are as follows. First, before data development a technician must define the business objective and sort out the data and the data development process, which requires repeated communication with business personnel, so it is difficult to quickly understand a business requirement and convert it into a data requirement. Second, the required data in different data sources must first be copied to a single data source, which takes a long time, and the intermediate table data produced during data development must be stored, so the data development link as a whole cannot be queried quickly. Third, following from the second problem, when the data link, data source, target table fields, or the like change, the data development scheme must be redesigned and all operations performed again, including data replication, intermediate table structure design, and SQL. Therefore, how to help technicians determine data requirements simply and quickly, and how to make determining a data development scheme simpler, more intelligent, and more flexible, is a problem to be solved.
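The extract, convert, and load steps of the traditional flow can be sketched roughly as follows. This is only an illustration under simple assumptions: two in-memory lists stand in for heterogeneous data sources, and the names `extract`, `transform`, `load`, `HR_SOURCE`, `FINANCE_SOURCE`, and the sample fields are hypothetical rather than taken from the patent.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical stand-ins for two heterogeneous data sources.
HR_SOURCE: List[Row] = [{"name": "Ann", "age": 30}, {"name": "Bob", "age": 41}]
FINANCE_SOURCE: List[Row] = [{"name": "Ann", "payroll": 5000}, {"name": "Bob", "payroll": 6200}]


def extract(source: List[Row]) -> List[Row]:
    """Copy rows out of a source (the extraction step)."""
    return [dict(row) for row in source]


def transform(people: List[Row], pay: List[Row]) -> List[Row]:
    """Join the copied rows on 'name' into a materialized wide table."""
    pay_by_name = {row["name"]: row["payroll"] for row in pay}
    return [
        {**person, "payroll": pay_by_name[person["name"]]}
        for person in people
        if person["name"] in pay_by_name
    ]


def load(rows: List[Row], warehouse: Dict[str, List[Row]], table: str) -> None:
    """Store the final wide table in the target system (the loading step)."""
    warehouse[table] = rows


warehouse: Dict[str, List[Row]] = {}
load(transform(extract(HR_SOURCE), extract(FINANCE_SOURCE)), warehouse, "staff_wide")
```

Note that every step here copies or stores data, which is the cost the background section attributes to the traditional flow.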
Disclosure of Invention
The invention provides an intelligent data integration method and system based on data virtualization, which are used for solving the problem that in the prior art, technicians are difficult to simply and quickly determine data requirements, and when data is modified, a data development scheme is required to be redesigned, so that the technicians are helped to simply and quickly determine the data requirements, and the determination process of the data development scheme is simpler, more intelligent and more flexible.
An intelligent data integration method based on data virtualization, the method comprising: carrying out service analysis on the target service wide table, determining and displaying a multi-level virtual table corresponding to the target service wide table to a user; wherein, each virtual table in each hierarchy is editable, each virtual table in each hierarchy comprises a business topic field of a data table of a data source, and the business topic field is editable; for each virtual table, determining and displaying header fields of the data table in the data source associated with the virtual table to the user based on the predetermined view in response to a binding operation of the user to the virtual table; the predetermined view comprises a logic table corresponding to a virtual table, wherein the logic table corresponding to the virtual table is used for storing metadata corresponding to the virtual table, and comprises a table header field of a data table in a data source associated with the virtual table; for each virtual table, determining the header field of the data table in the data source associated with the virtual table in response to a user selection operation of the header field of the data table in the data source associated with the virtual table.
In one embodiment, before performing service analysis on the target service wide table and determining and displaying the multi-level virtual table corresponding to the target service wide table to the user, the method further includes: in response to a configuration operation of the user, updating the views corresponding to the data tables in a database through a data source connector; the view corresponding to each data table comprises a virtual table and a logic table corresponding to that data table, and a mapping relation between the virtual table and the logic table; the virtual table corresponding to a data table represents the service topic label map corresponding to the data table, and the logic table represents the metadata information, in the data table, of each service topic label in the service topic label map; the mapping relation between the virtual table and the logic table is the mapping relation between each service topic label in the service topic label map and the metadata information in the data table.
In one embodiment, the performing service analysis on the target service wide table, determining and displaying the multi-level virtual table corresponding to the target service wide table to the user, includes: under the condition that the view contains the service topic label map corresponding to the target service field, determining the service topic label map corresponding to the target service field based on the service topic label map corresponding to the target service field in the view; determining and displaying to the user, based on the service topic label map corresponding to each target service field, the multi-level virtual table corresponding to the target service wide table and the association relationship between the target service wide table and the multi-level virtual table; the target service field is any service field of the target service wide table; or, under the condition that the view does not contain the service topic label map corresponding to the target service field, determining and displaying the multi-level virtual table corresponding to the target service wide table to the user.
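One way to picture the view described in this embodiment is as three small records: a virtual table, a logic table, and the mapping between them. The sketch below is a hypothetical data model, not the patent's implementation; the class names `VirtualTable`, `LogicalTable`, and `View`, the `resolve` helper, and the sample table and column names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VirtualTable:
    """Business-facing side of a view: editable business topic labels."""
    name: str
    topic_fields: List[str] = field(default_factory=list)


@dataclass
class LogicalTable:
    """Metadata side of a view: where the data physically lives."""
    data_source: str
    table_name: str
    header_fields: List[str] = field(default_factory=list)


@dataclass
class View:
    """Pairs a virtual table with its logic table and the label-to-column mapping."""
    virtual: VirtualTable
    logical: LogicalTable
    mapping: Dict[str, str] = field(default_factory=dict)  # topic label -> header field

    def resolve(self, topic_field: str) -> str:
        """Return the fully qualified column behind a business topic label."""
        column = self.mapping[topic_field]
        return f"{self.logical.data_source}.{self.logical.table_name}.{column}"


view = View(
    virtual=VirtualTable("personnel", ["name", "age"]),
    logical=LogicalTable("hr_db", "t_employee", ["emp_name", "emp_age"]),
    mapping={"name": "emp_name", "age": "emp_age"},
)
```

Keeping the mapping separate from both tables means a renamed column only requires changing one mapping entry, which is in the spirit of the editability the method emphasizes.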
In one embodiment, after the multi-level virtual table corresponding to the target service wide table is displayed to the user, the method further includes: and responding to the editing operation of the user on the target service wide table and the virtual table, and updating the association relation between the target service wide table and the multi-level virtual table.
In one embodiment, the updating the association relationship between the target service wide table and the multi-level virtual table in response to the editing operation of the target service wide table and the virtual table by the user includes: responding to a first association operation of a user on the target service wide table, and determining a first direct association relationship between the target service wide table and the multi-level virtual table; or, determining a second direct association relationship between the virtual tables in the multi-level virtual table in response to a second association operation of the user on each virtual table in the multi-level virtual table; or, in response to the new or deleted operation of the user on the target level virtual table, updating the target level virtual table; the target level virtual table is at least one layer of a multi-level virtual table; or, in response to the editing operation of the user on the business topic field of the target virtual table, updating the business topic field of the target virtual table; the target virtual table is at least one of a multi-level virtual table.
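The four editing operations listed in this embodiment (associating the wide table with virtual tables, associating virtual tables with each other, adding or deleting a virtual table at a level, and editing a business topic field) can be sketched with a toy in-memory model. The class `VirtualTableHierarchy` and its method names are hypothetical and only illustrate the kind of state such edits would update.

```python
from typing import Dict, List, Set, Tuple


class VirtualTableHierarchy:
    """Toy model of an editable multi-level virtual table (all names hypothetical)."""

    def __init__(self) -> None:
        # level -> table name -> business topic fields
        self.levels: Dict[int, Dict[str, List[str]]] = {}
        # (parent, child) association pairs
        self.associations: Set[Tuple[str, str]] = set()

    def add_table(self, level: int, name: str, fields: List[str]) -> None:
        self.levels.setdefault(level, {})[name] = list(fields)

    def delete_table(self, level: int, name: str) -> None:
        del self.levels[level][name]
        # Drop associations that referenced the deleted table.
        self.associations = {pair for pair in self.associations if name not in pair}

    def associate(self, parent: str, child: str) -> None:
        self.associations.add((parent, child))

    def rename_field(self, level: int, table: str, old: str, new: str) -> None:
        fields = self.levels[level][table]
        fields[fields.index(old)] = new


h = VirtualTableHierarchy()
h.add_table(1, "personnel", ["name", "age"])
h.add_table(1, "finance", ["name", "salary"])
h.associate("wide_table", "personnel")
h.associate("wide_table", "finance")
h.rename_field(1, "finance", "salary", "payroll")
h.delete_table(1, "personnel")
```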
In one embodiment, after the determining the header field of the data table in the data source associated with the virtual table, the method further comprises: and responding to the determining operation of the user, and generating an instance corresponding to the target business wide table based on the multi-layer virtual table, the first direct association relationship and the second direct association relationship and the table header field of the data table in the data source associated with the virtual table.
In one embodiment, the updating, in response to the configuration operation of the user, the view corresponding to each data table in the database through the data source connector includes: responding to configuration operation of a user, and acquiring a data table and metadata information under a database and a header field and metadata information in the data table through a data source connector; determining a first business topic label corresponding to each data table based on the data table and a corresponding first preset business topic label determining algorithm; determining a second service theme label corresponding to the header field in the data table based on the header field in the data table and a corresponding second preset service theme label determining algorithm; determining a service theme label map of each data table based on the first service theme label and the second service theme label; and determining the view corresponding to each data table in the database based on the business topic label map of each data table.
In one embodiment, the first preset service theme label determining algorithm includes a preset rule class algorithm, an intelligent algorithm and an interaction algorithm, and the determining the first service theme label corresponding to the data table based on the data table and the first preset service theme label determining algorithm corresponding to the data table includes: under the condition that the first preset service theme label determining algorithm corresponding to the data table contains a rule class algorithm for the data table, splitting the data table according to the corresponding rule class algorithm to obtain an initial service theme label of the data table, and, in response to a modification or selection operation of the user on the initial service theme label of the data table, obtaining the first service theme label corresponding to the data table; or, under the condition that the first preset service theme label determining algorithm corresponding to the data table does not contain a rule class algorithm for the data table but does contain an intelligent algorithm for the data table, splitting the data table according to the intelligent algorithm of the data table to obtain an initial service theme label of the data table, and, in response to a modification operation of the user on the initial service theme label of the data table, obtaining the first service theme label corresponding to the data table; or, under the condition that the rule class algorithm of the data table is not contained in the first preset service theme label determining algorithm corresponding to the data table, or the intelligent algorithm of the data table is not contained in the first preset service theme label determining algorithm, in response to a splitting operation of the user on the service themes in the data table, obtaining the first service theme label of the data table.
The invention also provides an intelligent data integration system based on data virtualization, which comprises: a connector layer, a virtualization service layer, and an output layer; the connector layer comprises at least one data source connector for connecting with corresponding data sources; the virtualization service layer comprises a view management module, a label management module, an algorithm module and a flow construction module, wherein the view management module determines the views corresponding to the data tables in a database through the data source connector, the label management module and the algorithm module; the label management module is used for performing service analysis on the data tables based on the algorithm module, determining the service theme labels of the data tables and of the header fields in the data tables, and determining the views corresponding to the data tables based on those service theme labels; the flow construction module is used for parsing the target service wide table into an editable multi-level virtual table based on the view management module, the label management module and the algorithm module, determining the data of the data tables in the data source corresponding to the target service wide table based on the mapping relation between the multi-level virtual table and the logic table determined in the view, and determining an instance or view corresponding to the target service wide table based on interactions with the user; and the output layer is used for outputting the instance or view corresponding to the target service wide table constructed by the virtualization service layer.
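The three branches above amount to a fallback chain: use a rule class algorithm if one exists for the table, otherwise an intelligent algorithm, otherwise let the user split the topics by hand. The sketch below reads the third branch as the case where neither algorithm is available, which is an interpretation of the text. The function `determine_topic_labels`, its parameters, and the underscore-splitting rule are hypothetical; the patent does not specify the algorithms themselves.

```python
from typing import Callable, Dict, List, Optional

Labeler = Callable[[str], List[str]]


def determine_topic_labels(
    table_name: str,
    rule_algorithms: Dict[str, Labeler],
    intelligent_algorithms: Dict[str, Labeler],
    ask_user: Labeler,
    review: Optional[Callable[[List[str]], List[str]]] = None,
) -> List[str]:
    """Pick labels via a rule -> intelligent -> interactive fallback."""
    if table_name in rule_algorithms:
        initial = rule_algorithms[table_name](table_name)
    elif table_name in intelligent_algorithms:
        initial = intelligent_algorithms[table_name](table_name)
    else:
        # No algorithm is registered: the user splits the topics by hand.
        return ask_user(table_name)
    # Algorithmic results are only initial labels; the user may modify them.
    return review(initial) if review else initial


def split_on_underscore(name: str) -> List[str]:
    """Hypothetical rule class algorithm: split a table name on underscores."""
    return [part for part in name.split("_") if part]
```

For example, registering `split_on_underscore` for a table named `hr_staff` yields the initial labels `hr` and `staff`, which a `review` callback standing in for the user could then adjust.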
The invention also provides computer equipment, which comprises a memory and a processor, wherein the memory stores computer readable instructions, and the computer readable instructions, when executed by the processor, cause the processor to execute the steps of the intelligent data integration method based on data virtualization.
The present invention also provides a storage medium storing computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the intelligent data integration method based on data virtualization described above.
According to the intelligent data integration method and system based on data virtualization, the multi-level virtual table corresponding to the target service wide table is determined by analyzing the target service wide table, and the multi-level virtual table is displayed to the user, so that a technician is assisted to quickly determine data requirements according to service requirements, and complex intermediate processing procedures on the data table of the data source are not needed. And because the virtual table is editable, the business theme field is also editable, so even if the subsequent data source, the data table or the field changes, the subsequent technician can conveniently relocate the position of the data corresponding to the target business wide table through flexible adjustment. In addition, in order to find the data of the data table in the corresponding data source based on the virtual table more simply, the user can locate and connect to the corresponding data through simple operation by displaying the virtual table and the table header fields of the data table in the data source associated with the virtual table to the user, so that the determination process of the data development scheme is simpler, more intelligent and more flexible.
Drawings
FIG. 1 is a schematic diagram of a framework of a data virtualization-based intelligent data integration system provided by the present invention;
FIG. 2 is a schematic flow chart of an intelligent data integration method based on data virtualization according to the present invention;
FIG. 3 is a schematic page diagram of a target service wide table and a multi-level virtual table according to the present invention;
FIG. 4 is a schematic diagram of a page for selecting an associated virtual table or inputting a corresponding instruction according to the present invention;
FIG. 5 is a schematic page diagram of the target service wide table and the multi-level virtual table after determining the association relation provided by the invention;
FIG. 6 is a schematic diagram of a page corresponding to a header field of a data table in a data source associated with a virtual table according to the present invention;
FIG. 7 is a schematic diagram of a business topic label map provided by the present invention;
FIG. 8 is a second schematic flow chart of an intelligent data integration method based on data virtualization according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
It should be noted that unless otherwise defined, technical or scientific terms used in the embodiments of the present disclosure should be given the ordinary meaning as understood by one of ordinary skill in the art to which the present disclosure pertains. The terms "first," "second," and the like, as used in embodiments of the present disclosure, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item preceding the word encompasses the elements or items listed after the word and their equivalents, without excluding other elements or items. The terms "connected" or "coupled," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which may change when the absolute position of the described object changes.
For ease of understanding, technical terms referred to in the present application are explained below.
Data virtualization is a technology that provides a unified access interface by abstracting and integrating data distributed across different data sources. Its core idea is to abstract scattered, heterogeneous data sources into a unified virtual data layer, through which a user can access and query data without concern for its actual storage location, format, or technical details. The data view is one of the core concepts of data virtualization; it contains a logical mapping to the underlying data and supports operations such as filtering and aggregation. For example, products such as Denodo and Cisco Data Virtualization provide strong data integration and conversion capabilities and make it possible to avoid storing intermediate data during a long-chain data integration process, which greatly shortens data development and data query time.
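The idea of a data view as a logical mapping, rather than a stored copy, can be illustrated with a small sketch. It assumes a single lazily read source and uses plain Python generators; the names `csv_like_source`, `virtual_view`, and `total_qty` are hypothetical, and real products such as those named above work very differently in detail.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, object]


def csv_like_source() -> Iterator[Row]:
    """Hypothetical file-style source, read lazily row by row."""
    yield {"material": "M1", "qty": 10}
    yield {"material": "M2", "qty": 3}
    yield {"material": "M1", "qty": 5}


def virtual_view(
    source: Callable[[], Iterable[Row]],
    predicate: Callable[[Row], bool],
) -> Callable[[], Iterator[Row]]:
    """Return a view: a recipe for reading filtered rows, not a stored copy."""
    def run() -> Iterator[Row]:
        return (row for row in source() if predicate(row))
    return run


def total_qty(view: Callable[[], Iterable[Row]]) -> int:
    """Aggregate over a view at query time."""
    return sum(int(row["qty"]) for row in view())


m1_only = virtual_view(csv_like_source, lambda row: row["material"] == "M1")
```

Each call to `m1_only()` re-reads the source, so the filtered rows are never persisted between queries.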
The following describes the intelligent data integration method and system based on data virtualization with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an intelligent data integration system based on data virtualization according to the present invention. As shown in fig. 1, the intelligent data integration system based on data virtualization provided by the present invention includes: a connector layer 110, a virtualization service layer 120, and an output layer 130.
Wherein, the connector layer 110 has at least one data source connector built therein, and the at least one data source connector is configured to complete connection to a data source, and may include: a relational database connector, a non-relational database connector, a file system connector. When a data source is newly added, the user can complete the connection of the corresponding data source by only filling in the configuration information (IP, port number, account name, password and the like) of the data source. Meanwhile, the connector layer encrypts configuration information of the data source and stores the configuration information in a database table.
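A relational connector of the kind described here could look roughly like the following. This is a sketch under stated assumptions: it uses Python's built-in `sqlite3` module with an in-memory database instead of a networked source, so the IP, port, account, and password fields are replaced by a single `location`; the names `SourceConfig` and `SqliteConnector` are hypothetical; and the encryption of configuration information mentioned above is deliberately omitted.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SourceConfig:
    """Connection settings a user fills in (fields are illustrative)."""
    kind: str
    location: str  # for SQLite this is a file path or ":memory:"


class SqliteConnector:
    """Hypothetical relational connector backed by Python's sqlite3 module."""

    def __init__(self, config: SourceConfig) -> None:
        self.conn = sqlite3.connect(config.location)

    def list_tables(self) -> List[str]:
        rows = self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]

    def header_fields(self, table: str) -> List[str]:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        # The table name comes from list_tables(), not from untrusted input.
        rows = self.conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        return [row[1] for row in rows]

    def metadata(self) -> Dict[str, List[str]]:
        """Collect every table and its header fields from the source."""
        return {table: self.header_fields(table) for table in self.list_tables()}


connector = SqliteConnector(SourceConfig(kind="sqlite", location=":memory:"))
connector.conn.execute("CREATE TABLE t_stock_in (id INTEGER, material_id TEXT, qty INTEGER)")
```

Calling `connector.metadata()` then returns the table names with their header fields, which is the raw material the label management module would work from.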
The virtualization service layer 120 includes a view management module, a label management module, an algorithm module, and a flow construction module. The view management module determines the views corresponding to the data tables in the database through the data source connector, the label management module and the algorithm module; the label management module is used for performing service analysis on the data tables based on the algorithm module, determining the service theme labels of the data tables and of the header fields in the data tables, and determining the views corresponding to the data tables based on those service theme labels; the flow construction module is used for parsing the target service wide table into an editable multi-level virtual table based on the view management module, the label management module and the algorithm module, determining the data of the data tables in the data source corresponding to the target service wide table based on the mapping relation between the multi-level virtual table and the logic table determined in the view, and determining an instance or view corresponding to the target service wide table based on interactions with the user.
The output layer 130 is configured to output an instance or view corresponding to the target service wide table constructed by the virtualization service layer 120. The instance corresponding to the target service wide table can be queried directly for results but does not store the result data; to persist or serve the results, the instance needs to be connected to an output such as a data API, a data table, or even an ETL task.
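An ordinary SQL view behaves in a comparable way to the instance described here: it can be queried directly but holds only a definition, not result rows. The sketch below uses `sqlite3` with made-up tables (`stock_in`, `sales_plan`) and a made-up view name (`wide_table`). It is an analogy within a single database, not the cross-source mechanism of the invention.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stock_in (material_id TEXT, qty INTEGER);
    CREATE TABLE sales_plan (material_id TEXT, planned INTEGER);
    INSERT INTO stock_in VALUES ('M1', 10), ('M1', 5), ('M2', 3);
    INSERT INTO sales_plan VALUES ('M1', 30), ('M2', 6);
    -- The view stores only its definition; rows are computed on each query.
    CREATE VIEW wide_table AS
    SELECT p.material_id,
           SUM(s.qty) AS total_in,
           p.planned,
           1.0 * SUM(s.qty) / p.planned AS in_to_plan_ratio
    FROM sales_plan AS p
    JOIN stock_in AS s ON s.material_id = p.material_id
    GROUP BY p.material_id, p.planned;
    """
)


def query_instance(connection: sqlite3.Connection) -> list:
    """Run the view and return its rows, ordered by material."""
    return connection.execute("SELECT * FROM wide_table ORDER BY material_id").fetchall()
```

Because nothing is materialized, a row inserted into `stock_in` afterwards is reflected the next time `query_instance` runs.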
Fig. 2 is a schematic flow chart of an intelligent data integration method based on data virtualization. It will be appreciated that the data virtualization-based intelligent data integration method may be performed by a data virtualization-based intelligent data integration system. Wherein the intelligent data integration system based on data virtualization can be a computer device.
As shown in fig. 2, in one embodiment, an intelligent data integration method based on data virtualization is provided, which specifically may include the following steps:
step 210, carrying out service analysis on the target service wide table, determining and displaying a multi-level virtual table corresponding to the target service wide table to a user; wherein each virtual table in each hierarchy is editable, each virtual table in each hierarchy contains a business topic field of a data table of a data source, and the business topic field is editable.
The target service wide table is a virtual table containing only target service fields, and can be understood as the set of business topic fields corresponding to the target service. The target service may be a service of any industry. Illustratively, the target service wide table is a virtual table containing only the following target service fields: name, age, address, performance, attendance, and payroll. The corresponding multi-level virtual tables are virtual table 1, virtual table 2, virtual table 3, and virtual table 4, where virtual table 1 is a layer-one virtual table and virtual tables 2, 3, and 4 are layer-two virtual tables. Virtual table 1 includes: personnel, department, and finance. Virtual table 2 corresponds to personnel in virtual table 1 and includes: name, age, and address. Virtual table 3 corresponds to department in virtual table 1 and includes: name, performance, and attendance. Virtual table 4 corresponds to finance in virtual table 1 and includes: name and payroll.
The multi-level virtual table corresponding to the target service wide table can be understood as the business topic fields, in the databases of the actual data sources, of the data corresponding to each target service field in the target service wide table.
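The example can be written down as plain data to make the two layers concrete. The dictionary layout and the helper `uncovered_fields` below are hypothetical; the field and topic names come from the example, with the finance topic assigned to virtual table 4, the remaining layer-two table.

```python
from typing import Dict, List

target_wide_table: List[str] = [
    "name", "age", "address", "performance", "attendance", "payroll",
]

# Layer one groups the business topics.
layer_one: Dict[str, List[str]] = {
    "virtual_table_1": ["personnel", "department", "finance"],
}

# Layer two lists the fields under each topic, plus the topic each table refines.
layer_two_fields: Dict[str, List[str]] = {
    "virtual_table_2": ["name", "age", "address"],
    "virtual_table_3": ["name", "performance", "attendance"],
    "virtual_table_4": ["name", "payroll"],
}
layer_two_parent: Dict[str, str] = {
    "virtual_table_2": "personnel",
    "virtual_table_3": "department",
    "virtual_table_4": "finance",
}


def uncovered_fields(wide: List[str], tables: Dict[str, List[str]]) -> List[str]:
    """Return wide-table fields that no layer-two virtual table provides."""
    provided = {name for fields in tables.values() for name in fields}
    return [name for name in wide if name not in provided]
```

A check such as `uncovered_fields(target_wide_table, layer_two_fields)` returning an empty list indicates that every target service field is provided by some layer-two virtual table.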
It can be understood that by performing service analysis on the target service wide table, a multi-level virtual table corresponding to the target service wide table is determined and displayed to the user, so that a technician is assisted to quickly determine data requirements according to service requirements, and complex intermediate processing procedures on the data table of the data source are not required. And because the virtual table is editable, the business theme field is also editable, so even if the subsequent data source, the data table or the field changes, the subsequent technician can conveniently relocate the position of the data corresponding to the target business wide table through flexible adjustment.
Step 220, for each virtual table, determining and exposing to the user, based on the predetermined view, header fields of the data table in the data source associated with the virtual table in response to a binding operation of the user to the virtual table.
The predetermined view includes a logical table corresponding to the virtual table, where the logical table corresponding to the virtual table is used to store metadata corresponding to the virtual table, and includes a header field of a data table in a data source associated with the virtual table.
The binding operation of the user on the virtual table can be clicking operation of the binding function button of each virtual table by the user. Wherein the "bind" function button may be as shown in fig. 5. In response to a binding operation of the user to the virtual table, the user may be presented with header fields of the data table in the data source associated with the virtual table for further other operations by the user.
Step 230, for each virtual table, determining the header field of the data table in the data source associated with the virtual table in response to the user selecting the header field of the data table in the data source associated with the virtual table.
And the user selects the header fields of the data table in the data source associated with the virtual table, so that the user can select the header fields of the data table in the data source associated with each service subject field in the virtual table from the header fields of the data table in the data source associated with the virtual table. Specific examples may refer to the content associated with fig. 6.
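Steps 220 and 230 can be sketched as two small functions: one that looks up the candidate header fields when the user binds a virtual table, and one that records the user's selection for each business topic field. The names `LOGICAL_TABLES`, `bind`, and `select`, and the sample column names, are hypothetical.

```python
from typing import Dict, List

# Hypothetical predetermined view: virtual table name -> header fields
# kept in the logic table that corresponds to it.
LOGICAL_TABLES: Dict[str, List[str]] = {
    "stock_in": ["id", "material_id", "material_no", "material_name", "qty_in"],
}


def bind(virtual_table: str) -> List[str]:
    """Step 220: on 'bind', look up the candidate header fields to show the user."""
    return list(LOGICAL_TABLES.get(virtual_table, []))


def select(candidates: List[str], choices: Dict[str, str]) -> Dict[str, str]:
    """Step 230: keep the user's topic-field -> header-field picks.

    Raises ValueError if a pick is not among the displayed candidates.
    """
    unknown = [col for col in choices.values() if col not in candidates]
    if unknown:
        raise ValueError(f"not offered for this virtual table: {unknown}")
    return dict(choices)


candidates = bind("stock_in")
binding = select(
    candidates,
    {"material ID": "material_id", "warehouse-entry quantity": "qty_in"},
)
```

Validating the selection against the displayed candidates keeps the binding consistent with what the logic table actually records.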
It will be appreciated that after this step, the method further comprises: determining an instance corresponding to the target service wide table in response to a determination operation of the user, where the determination operation may be, for example, a click on the "determination" function button in the page shown in FIG. 6.
According to the intelligent data integration method based on data virtualization, the target service wide table is analyzed, the multi-level virtual table corresponding to the target service wide table is determined, and the multi-level virtual table is displayed to a user, so that a technician is assisted to quickly determine data requirements according to service requirements, and a complex intermediate processing process is not required to be carried out on the data table of a data source. And because the virtual table is editable, the business theme field is also editable, so even if the subsequent data source, the data table or the field changes, the subsequent technician can conveniently relocate the position of the data corresponding to the target business wide table through flexible adjustment. In addition, in order to find the data of the data table in the corresponding data source based on the virtual table more simply, the user can locate and connect to the corresponding data through simple operation by displaying the virtual table and the table header fields of the data table in the data source associated with the virtual table to the user, so that the determination process of the data development scheme is simpler, more intelligent and more flexible.
FIG. 3 is a schematic diagram of pages of a target traffic wide table and a multi-level virtual table. As shown in fig. 3, the target traffic wide table may be understood as a virtual table of a third hierarchy, including: the material D1, the material number 1, the material name 1, the total warehouse-in quantity 1, the planned sales quantity 1 and the warehouse-in occupation plan ratio 1, and the corresponding target business wide table also comprises "+", "associated" function buttons and "newly added virtual table" function buttons. Corresponding to the virtual tables of the other two levels, a first level virtual table and a second level virtual table, wherein the first level virtual table comprises: virtual table 1, virtual table 2, and virtual table 3, and a "new virtual table" function button. The second-level virtual table includes a virtual table 4 and an "add virtual table" function button. Wherein, virtual table 1 includes: ID. Entry quantity 1, material D1, material number 1, and material name 1, and corresponding "associated" function buttons. The virtual table 2 includes: ID. The list header fields of the planned sales number 1, the material D1, the material number 1 and the material name 1, and the corresponding "associated" function buttons. The virtual table 3 includes: material D1, material name 1, and Red blue mark these header fields, as well as the corresponding "associated" function buttons. The virtual table 4 includes: the header fields of the material D1, the material name 1, the red and blue marks, the planned sales quantity 1 and the total warehouse entry quantity 1, and the corresponding 'relevant' function buttons. And it can be seen from fig. 3 that each virtual table corresponds to an "edit" function button and a "delete" function button.
FIG. 4 is a schematic diagram of a page for selecting an associated virtual table or inputting a corresponding instruction. As can be seen from FIG. 3, the user may click the "associated" function button of any virtual table, after which the page shown in FIG. 4 is displayed. The page includes a selection box for the associated virtual table, in which the user can select the virtual table to be associated with the current virtual table or input a corresponding SQL instruction, so that the computer device can determine the virtual table associated with the current virtual table based on the user's selection or the input instruction.
FIG. 5 is a schematic diagram of a page showing the target service wide table and the multi-level virtual table after the association relationship has been determined. Unlike FIG. 3, in FIG. 5 the association relationship between the target service wide table and the multi-level virtual table has been determined, the business topic fields in the multi-level virtual table are those obtained after the user's edits, and each virtual table has a "bind" function button and a "preview" function button. After the user clicks the "bind" function button, the page shown in FIG. 6 is displayed.
FIG. 6 is a schematic diagram of a page for determining the header fields of the data table in the data source associated with a virtual table. The left side of the page shows the business topic fields contained in the virtual table, such as warehouse-in quantity, material ID, material number, material name, and document ID. The right side of the page shows the header fields of the data table in the recommended data source associated with the virtual table, such as production synthesis, work-in-process inventory, factory shipment, tire bar codes, tread labels, others, dumping, lot tracking, technical information, and engineering quality. The right side also includes an "OK" function button: after the user selects header fields of the data table in the data source associated with the virtual table, clicking this button confirms that the data corresponding to the selected header fields is the data associated with the virtual table.
FIG. 7 is a schematic diagram of a business topic label map provided by the invention. As shown in FIG. 7, the business topic label "warehouse" is associated with warehouse ID, warehouse name, material number, warehouse location, and warehouse number.
In one embodiment, before performing service analysis on the target service wide table and determining and displaying the multi-level virtual table corresponding to the target service wide table to the user, the method further includes: updating, in response to a configuration operation of the user, the view corresponding to each data table in the database through the data source connector.
The view corresponding to each data table comprises a virtual table and a logic table that correspond to the data table, and a mapping relation between the virtual table and the logic table. The virtual table corresponding to a data table represents the service topic label map corresponding to that data table, and the logic table represents the metadata information, in the data table, of the service topic labels in the service topic label map. The mapping relation between the virtual table and the logic table is the mapping relation between each service topic label in the service topic label map and the metadata information in the data table.
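The three parts of a view can be pictured as a small data structure. The following is a minimal sketch, not the patented implementation: the class names, attribute names, and the sample source `erp_db` and table `t_stock_in` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualTable:
    """Business-facing side of a view: a name plus business topic fields."""
    name: str
    topic_fields: list[str]


@dataclass
class LogicTable:
    """Metadata side of a view: where the data physically lives."""
    data_source: str
    table_name: str
    header_fields: dict[str, str]  # header field -> field type


@dataclass
class View:
    virtual: VirtualTable
    logic: LogicTable
    # business topic field -> header field in the physical data table
    mapping: dict[str, str] = field(default_factory=dict)

    def resolve(self, topic_field: str) -> tuple[str, str, str]:
        """Return (data source, table, header field) for a business topic field."""
        header = self.mapping[topic_field]
        if header not in self.logic.header_fields:
            raise KeyError(f"{header!r} is not a header field of {self.logic.table_name}")
        return (self.logic.data_source, self.logic.table_name, header)


# Hypothetical sample data, loosely modelled on virtual table 1 of FIG. 3.
stock_view = View(
    virtual=VirtualTable("virtual table 1", ["material ID", "warehouse-in quantity"]),
    logic=LogicTable("erp_db", "t_stock_in", {"mat_id": "VARCHAR", "in_qty": "DECIMAL"}),
    mapping={"material ID": "mat_id", "warehouse-in quantity": "in_qty"},
)
print(stock_view.resolve("warehouse-in quantity"))
```

Keeping the mapping separate from both tables means a renamed physical column only requires one mapping entry to change, which matches the stated goal of relocating data through simple adjustments.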
The configuration operation of the user may be a configuration operation of configuration information of the data source. The configuration information of the data source comprises IP, port number, account name and password.
In one embodiment, as shown in fig. 8, in response to a configuration operation of a user, updating, through a data source connector, views corresponding to data tables in a database includes the following steps:
Step 810, in response to the configuration operation of the user, obtaining, through the data source connector, the data tables in the database and their metadata information, and the header fields in each data table and their metadata information.
Wherein the metadata information of the data table includes: table name, table length and data source to which they belong. Metadata of header fields in the data table includes: the field type, the field length and the data table to which they belong.
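As a rough illustration of step 810, the sketch below collects table-level and field-level metadata from a database. It uses Python's built-in `sqlite3` module as a stand-in for the data source connector, which is an assumption made only so the example runs without a server; the function name `collect_metadata` and the dictionary keys are hypothetical. SQLite reports the declared type (which may embed a length, such as `VARCHAR(32)`) rather than a separate field length, and the table length is omitted here.

```python
import sqlite3


def collect_metadata(conn: sqlite3.Connection, source_name: str) -> list[dict]:
    """List every table in the database with its header fields and their types."""
    tables = []
    names = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in names:
        fields = []
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        for _cid, col_name, col_type, *_rest in conn.execute(
            f'PRAGMA table_info("{table_name}")'
        ):
            fields.append({"field": col_name, "type": col_type, "table": table_name})
        tables.append({"table": table_name, "source": source_name, "fields": fields})
    return tables


# Hypothetical demo database with a single table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_in (id INTEGER, material_name VARCHAR(32))")
metadata = collect_metadata(conn, "demo_source")
print(metadata)
```

A connector for another database would replace the two catalog queries with that system's own catalog or information-schema queries while returning the same shape of result.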
Step 820, for each data table, determining a first service theme label corresponding to the data table based on the data table and a corresponding first preset service theme label determining algorithm; and determining a second business theme label corresponding to the header field in the data table based on the header field in the data table and a corresponding second preset business theme label determining algorithm.
It can be understood that, in practice, for data tables with complex business content, an intelligent algorithm, for example a semantic understanding algorithm trained on the data tables of a given industry, can be used to interpret the data in the data table and its business meaning, and thereby determine the first business topic label corresponding to the data table. To improve the accuracy of the determined label, the first business topic label can additionally be adjusted manually. For data tables with a simpler business structure, the corresponding business topic labels can be determined directly from preset rules. Where no pre-trained semantic understanding model exists for an industry and the content is difficult to label with preset rules, the business topic labels can be determined by manual division.
Therefore, in one embodiment, the first preset business topic label determining algorithm includes a preset rule class algorithm, an intelligent algorithm and an interactive algorithm, and the determining the first business topic label corresponding to the data table based on the data table and the first preset business topic label determining algorithm corresponding to the data table includes:
in the case that the first preset business topic label determining algorithm corresponding to the data table contains a rule class algorithm for the data table, splitting the data table according to the corresponding rule class algorithm to obtain an initial business topic label of the data table, and obtaining the first business topic label corresponding to the data table in response to a modification or selection operation of the user on the initial business topic label; or,
in the case that the first preset business topic label determining algorithm corresponding to the data table does not contain a rule class algorithm for the data table but contains an intelligent algorithm for the data table, splitting the data table according to the intelligent algorithm to obtain an initial business topic label of the data table, and obtaining the first business topic label corresponding to the data table in response to a modification operation of the user on the initial business topic label; or,
in the case that the first preset business topic label determining algorithm corresponding to the data table contains neither a rule class algorithm nor an intelligent algorithm for the data table, obtaining the first business topic label of the data table in response to a splitting operation of the user on the business topics in the data table.
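The three cases above amount to a fallback chain: rule class algorithm first, intelligent algorithm second, user splitting last. A minimal sketch of that control flow follows. All names (`first_topic_labels`, `prefix_rule`, `manual_split`, the `wh_` prefix) are hypothetical, and the `review` callback is an assumed stand-in for the user's modification or selection of the initial labels.

```python
from typing import Callable, Optional

# A labeler takes (table name, header fields) and returns business topic labels.
Labeler = Callable[[str, list[str]], list[str]]


def first_topic_labels(
    table: str,
    headers: list[str],
    split_by_user: Labeler,
    rule_algo: Optional[Labeler] = None,
    smart_algo: Optional[Labeler] = None,
    review: Callable[[list[str]], list[str]] = lambda labels: labels,
) -> list[str]:
    """Pick the labeling path in the order: rule, intelligent, interactive."""
    if rule_algo is not None:
        # Case 1: a rule class algorithm exists; the user reviews its output.
        return review(rule_algo(table, headers))
    if smart_algo is not None:
        # Case 2: no rule, but an intelligent algorithm exists; user reviews it.
        return review(smart_algo(table, headers))
    # Case 3: neither exists; the user splits the business topics directly.
    return split_by_user(table, headers)


def prefix_rule(table: str, headers: list[str]) -> list[str]:
    """Hypothetical preset rule: tables prefixed 'wh_' belong to 'warehouse'."""
    return ["warehouse"] if table.startswith("wh_") else ["unclassified"]


def manual_split(table: str, headers: list[str]) -> list[str]:
    """Stand-in for an interactive page; returns a fixed label here."""
    return ["material"]


print(first_topic_labels("wh_stock", ["wh_id"], manual_split, rule_algo=prefix_rule))
print(first_topic_labels("t_misc", ["mat_no"], manual_split))
```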
The interactive algorithm splits the business topics of the data table based on user input and can provide corresponding recommendations to the user, for example according to the frequency of use or call time of SQL statements or business topic labels, or the habits of users in the same role.
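One possible way to rank such recommendations is sketched below. The ranking rule (most frequently used first, ties broken by most recent use) is an assumption for illustration; the text does not specify how frequency and call time are combined, and the function name `recommend_labels` and the log format are hypothetical.

```python
from collections import Counter


def recommend_labels(usage_log: list[tuple[str, int]], top_n: int = 3) -> list[str]:
    """Rank labels by use count, then by most recent use, then alphabetically.

    usage_log holds (label, timestamp) pairs; larger timestamps are more recent.
    """
    counts = Counter(label for label, _ts in usage_log)
    last_used: dict[str, int] = {}
    for label, ts in usage_log:
        last_used[label] = max(ts, last_used.get(label, ts))
    ranked = sorted(counts, key=lambda label: (-counts[label], -last_used[label], label))
    return ranked[:top_n]


log = [("warehouse", 1), ("material", 2), ("warehouse", 3), ("sales", 4)]
print(recommend_labels(log))
```

Restricting the log to entries from users in the same role before ranking would be one way to reflect role-based habits.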
It will be appreciated that the process of determining the first service topic label of the data table based on the intelligent algorithm may follow the prior art and is not described here for brevity. The process of determining the second service theme label corresponding to a header field in the data table is similar to that of determining the first service topic label corresponding to the data table and is likewise not repeated here.
Step 830, determining, for each data table, a service topic label map of the data table based on the first service topic label and the second service topic label.
Step 840, based on the business topic label map of each data table, determining the view corresponding to each data table in the database.
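Steps 830 and 840 can be sketched as combining the table-level label with the field-level labels and then inverting the field labels into a label-to-header mapping. The example below reuses the "warehouse" label of FIG. 7; the function name `build_label_map`, the dictionary layout, and the physical column names such as `wh_id` are hypothetical.

```python
def build_label_map(table_label: str, field_labels: dict[str, str]) -> dict:
    """Combine the table-level label with the field-level labels.

    field_labels maps each header field to its (second) business topic label.
    """
    return {
        "topic": table_label,
        "associated": list(field_labels.values()),
        # label -> header field, i.e. the virtual-to-logic mapping of the view
        "mapping": {label: header for header, label in field_labels.items()},
    }


label_map = build_label_map(
    "warehouse",
    {
        "wh_id": "warehouse ID",
        "wh_name": "warehouse name",
        "mat_no": "material number",
        "wh_loc": "warehouse location",
        "wh_no": "warehouse number",
    },
)
print(label_map["associated"])
```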
In one embodiment, performing service analysis on the target service wide table, and determining and displaying the multi-level virtual table corresponding to the target service wide table to the user, includes: in the case that the view contains the service topic label map corresponding to a target service field, determining the service topic label map corresponding to the target service field from the view, and, based on the service topic label maps corresponding to the target service fields, determining and displaying to the user the multi-level virtual table corresponding to the target service wide table and the association relationship between the target service wide table and the multi-level virtual table, where the target service field is any service field of the target service wide table; or, in the case that the view does not contain the service topic label map corresponding to the target service field, determining and displaying to the user the multi-level virtual table corresponding to the target service wide table.
It can be understood that, when the view contains the service topic label map corresponding to a target service field, the association relationship between that target service field in the target service wide table and the corresponding virtual table has already been determined. Part or all of the association relationship between the target service wide table and the multi-level virtual table can therefore be displayed directly; that is, the user is shown the multi-level virtual table corresponding to the target service wide table together with the association relationship, similar to FIG. 5. When the view does not contain the service topic label map corresponding to the target service field, the association relationship cannot be displayed, and the user is shown only the multi-level virtual table corresponding to the target service wide table, similar to FIG. 3. In either case, the user can subsequently adjust the association relationship between the target service wide table and the multi-level virtual table, as well as the content of the virtual tables, thereby updating both.
Thus, in one embodiment, after the multi-level virtual table corresponding to the target service wide table is presented to the user, the method further includes: updating the association relationship between the target service wide table and the multi-level virtual table in response to an editing operation of the user on the target service wide table and the virtual tables.
Specifically, this step may include: determining a first direct association relationship between the target service wide table and the multi-level virtual table in response to a first association operation of the user on the target service wide table; or determining a second direct association relationship between virtual tables in the multi-level virtual table in response to a second association operation of the user on each virtual table in the multi-level virtual table; or updating the target level virtual table in response to an addition or deletion operation of the user on the target level virtual table, where the target level virtual table is at least one level of the multi-level virtual table; or updating the business topic field of the target virtual table in response to an editing operation of the user on the business topic field of the target virtual table, where the target virtual table is at least one virtual table of the multi-level virtual table.
The first association operation of the user on the target service wide table may be a click on the corresponding "associated" function button, or a connection operation linking the plus sign "+" of the target service wide table to the plus sign "+" of a virtual table in the multi-level virtual table. In response to the first association operation, the first direct association relationship between the target service wide table and the multi-level virtual table is determined, and the target service wide table is associated with the multi-level virtual table. The target service wide table may be displayed on a page such as that shown in FIG. 3. Similarly, the second association operation of the user on each virtual table in the multi-level virtual table may be a click on the "associated" function button of a virtual table followed by a selection of the associated virtual table or input of a corresponding instruction, or a connection operation linking the plus signs "+" of two virtual tables. The selection of the associated virtual table or the input of the corresponding instruction may be carried out on a page such as that shown in FIG. 4. In response to the second association operation, the second direct association relationship between virtual tables in the multi-level virtual table is determined, and the two virtual tables concerned are associated with each other.
It will be appreciated that, as shown in FIG. 3, each virtual table has an "edit" function button and a "delete" function button, and each level of virtual tables has a "new virtual table" function button. A user can therefore add or delete virtual tables, and can click the "edit" function button of a virtual table to edit the business topic fields of that virtual table.
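The editing operations described above (associating tables, adding or deleting a virtual table, editing business topic fields) can be modelled as operations on a small graph. The sketch below is illustrative only: the class `AssociationModel` and its method names are hypothetical, and dropping the links of a deleted table is an assumed behavior that the text does not spell out.

```python
class AssociationModel:
    """Holds the levels of virtual tables and the direct associations between tables."""

    def __init__(self, wide_table: str):
        self.wide_table = wide_table
        self.levels: dict[int, dict[str, list[str]]] = {}  # level -> {table: topic fields}
        self.links: set[tuple[str, str]] = set()           # (upper table, lower table)

    def add_virtual_table(self, level: int, name: str, topic_fields: list[str]) -> None:
        self.levels.setdefault(level, {})[name] = list(topic_fields)

    def delete_virtual_table(self, level: int, name: str) -> None:
        del self.levels[level][name]
        # Assumed behavior: associations that touch the deleted table are dropped.
        self.links = {link for link in self.links if name not in link}

    def edit_topic_fields(self, level: int, name: str, topic_fields: list[str]) -> None:
        self.levels[level][name] = list(topic_fields)

    def associate(self, upper: str, lower: str) -> None:
        self.links.add((upper, lower))


# Layout loosely following FIG. 3: three first-level tables, one second-level table.
model = AssociationModel("wide table")
for n in (1, 2, 3):
    model.add_virtual_table(1, f"virtual table {n}", ["material ID"])
model.add_virtual_table(2, "virtual table 4", ["material ID", "planned sales quantity"])
model.associate("wide table", "virtual table 4")          # first direct association
for n in (1, 2, 3):
    model.associate("virtual table 4", f"virtual table {n}")  # second direct associations
model.delete_virtual_table(1, "virtual table 3")
print(sorted(model.links))
```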
In one embodiment, in response to a determining operation of a user, determining an instance corresponding to the target service width table includes:
generating, in response to the determining operation of the user, an instance corresponding to the target service wide table based on the multi-level virtual table, the first direct association relationship, the second direct association relationship, and the header fields of the data tables in the data sources associated with the virtual tables; the instance is for subsequent invocation.
The user's determination operation may be, for example, a click operation on a "ok" function button in the page shown in fig. 6.
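To make the idea of a callable instance concrete, the sketch below materializes a wide table as an SQL view over two bound source tables, echoing the warehouse-in and planned-sales fields of FIG. 3. This is only one conceivable form of an instance: the text does not state that the instance is an SQL view, SQLite is used purely as a runnable stand-in, and every table name, column name, and data value is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical source tables bound to virtual tables 1 and 2.
    CREATE TABLE stock_in (id INTEGER, material_id TEXT, qty REAL);
    CREATE TABLE sales_plan (id INTEGER, material_id TEXT, planned_qty REAL);
    INSERT INTO stock_in VALUES (1, 'M1', 30), (2, 'M1', 20), (3, 'M2', 10);
    INSERT INTO sales_plan VALUES (1, 'M1', 100), (2, 'M2', 40);

    -- The wide table as a reusable view: total warehouse-in quantity,
    -- planned sales quantity, and the ratio between the two.
    CREATE VIEW wide_table AS
    SELECT s.material_id,
           s.total_in AS total_in_qty,
           p.planned_qty,
           s.total_in / p.planned_qty AS in_to_plan_ratio
    FROM (SELECT material_id, SUM(qty) AS total_in
          FROM stock_in GROUP BY material_id) AS s
    JOIN sales_plan AS p ON p.material_id = s.material_id;
    """
)
rows = conn.execute("SELECT * FROM wide_table ORDER BY material_id").fetchall()
for row in rows:
    print(row)
```

Because the view is defined over the bound header fields rather than copied data, later queries against it always read the current contents of the source tables.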
In one embodiment, a computer device is provided, where the computer device includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the following steps corresponding to an intelligent data integration method based on data virtualization: carrying out service analysis on the target service wide table, determining and displaying a multi-level virtual table corresponding to the target service wide table to a user; wherein, each virtual table in each hierarchy is editable, each virtual table in each hierarchy comprises a business topic field of a data table of a data source, and the business topic field is editable; for each virtual table, determining and displaying header fields of the data table in the data source associated with the virtual table to the user based on the predetermined view in response to a binding operation of the user to the virtual table; the predetermined view comprises a logic table corresponding to a virtual table, wherein the logic table corresponding to the virtual table is used for storing metadata corresponding to the virtual table, and comprises a table header field of a data table in a data source associated with the virtual table; for each virtual table, determining the header field of the data table in the data source associated with the virtual table in response to a user selection operation of the header field of the data table in the data source associated with the virtual table.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the data virtualization-based intelligent data integration method provided by the present invention, the data virtualization-based intelligent data integration method comprising: carrying out service analysis on the target service wide table, determining and displaying a multi-level virtual table corresponding to the target service wide table to a user; wherein, each virtual table in each hierarchy is editable, each virtual table in each hierarchy comprises a business topic field of a data table of a data source, and the business topic field is editable; for each virtual table, determining and displaying header fields of the data table in the data source associated with the virtual table to the user based on the predetermined view in response to a binding operation of the user to the virtual table; the predetermined view comprises a logic table corresponding to a virtual table, wherein the logic table corresponding to the virtual table is used for storing metadata corresponding to the virtual table, and comprises a table header field of a data table in a data source associated with the virtual table; for each virtual table, determining the header field of the data table in the data source associated with the virtual table in response to a user selection operation of the header field of the data table in the data source associated with the virtual table.
In still another aspect, the present invention further provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the data virtualization-based intelligent data integration method provided by the present invention, the method comprising: carrying out service analysis on the target service wide table, determining and displaying a multi-level virtual table corresponding to the target service wide table to a user; wherein, each virtual table in each hierarchy is editable, each virtual table in each hierarchy comprises a business topic field of a data table of a data source, and the business topic field is editable; for each virtual table, determining and displaying header fields of the data table in the data source associated with the virtual table to the user based on the predetermined view in response to a binding operation of the user to the virtual table; the predetermined view comprises a logic table corresponding to a virtual table, wherein the logic table corresponding to the virtual table is used for storing metadata corresponding to the virtual table, and comprises a table header field of a data table in a data source associated with the virtual table; for each virtual table, determining the header field of the data table in the data source associated with the virtual table in response to a user selection operation of the header field of the data table in the data source associated with the virtual table.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
It will be appreciated that the above embodiments are only illustrative of the technical solution of the invention and are not limiting thereof; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An intelligent data integration method based on data virtualization, which is characterized by comprising the following steps:
carrying out service analysis on the target service wide table, determining and displaying a multi-level virtual table corresponding to the target service wide table to a user; wherein, each virtual table in each hierarchy is editable, each virtual table in each hierarchy comprises a business topic field of a data table of a data source, and the business topic field is editable;
for each virtual table, determining and displaying header fields of the data table in the data source associated with the virtual table to the user based on the predetermined view in response to a binding operation of the user to the virtual table; the predetermined view comprises a logic table corresponding to a virtual table, wherein the logic table corresponding to the virtual table is used for storing metadata corresponding to the virtual table, and comprises a table header field of a data table in a data source associated with the virtual table;
For each virtual table, determining the header field of the data table in the data source associated with the virtual table in response to a user selection operation of the header field of the data table in the data source associated with the virtual table.
2. The intelligent data integration method based on data virtualization according to claim 1, wherein before performing service analysis on the target service wide table to determine and present the multi-level virtual table corresponding to the target service wide table to the user, the method further comprises:
responding to configuration operation of a user, and updating views corresponding to all data tables in a database through a data source connector;
the view corresponding to each data table comprises a virtual table and a logic table that correspond to the data table, and a mapping relation between the virtual table and the logic table; the virtual table corresponding to each data table represents the service topic label map corresponding to the data table, and the logic table represents the metadata information, in the data table, of the service topic labels in the service topic label map; the mapping relation between the virtual table and the logic table is the mapping relation between each service topic label in the service topic label map and the metadata information in the data table.
3. The intelligent data integration method based on data virtualization according to claim 2, wherein the performing service analysis on the target service wide table, determining and displaying the multi-level virtual table corresponding to the target service wide table to the user, includes:
under the condition that the view contains the service topic label map corresponding to the target service field, determining the service topic label map corresponding to the target service field based on the service topic label map corresponding to the target service field in the view; determining and displaying a multi-level virtual table corresponding to the target service wide table and an association relationship between the target service wide table and the multi-level virtual table to a user based on the service topic label map corresponding to each target service field; the target service field is any service field of the target service wide table; or,
and under the condition that the view does not contain the service theme label map corresponding to the target service field, determining and displaying the multi-level virtual table corresponding to the target service wide table to the user.
4. A data virtualization-based intelligent data integration method according to claim 1 or 3, wherein after presenting the multi-level virtual table corresponding to the target service wide table to the user, the method further comprises:
responding to the editing operation of the user on the target service wide table and the virtual table, and updating the association relation between the target service wide table and the multi-level virtual table.
5. The intelligent data integration method based on data virtualization according to claim 4, wherein the updating the association relationship between the target service wide table and the multi-level virtual table in response to the user editing operation of the target service wide table and the virtual table comprises:
responding to a first association operation of a user on the target service wide table, and determining a first direct association relationship between the target service wide table and the multi-level virtual table; or,
responding to a second association operation of a user on each virtual table in the multi-level virtual table, and determining a second direct association relationship between each virtual table in the multi-level virtual table; or,
updating the target level virtual table in response to an addition or deletion operation of the target level virtual table by a user; the target level virtual table is at least one layer of a multi-level virtual table; or,
responding to the editing operation of the user on the business theme field of the target virtual table, and updating the business theme field of the target virtual table; the target virtual table is at least one of a multi-level virtual table.
6. The intelligent data integration method based on data virtualization according to claim 2, wherein updating views corresponding to data tables in the database through the data source connector in response to the configuration operation of the user comprises:
responding to configuration operation of a user, and acquiring a data table and metadata information under a database and a header field and metadata information in the data table through a data source connector;
determining a first business topic label corresponding to each data table based on the data table and a corresponding first preset business topic label determining algorithm; determining a second service theme label corresponding to the header field in the data table based on the header field in the data table and a corresponding second preset service theme label determining algorithm;
determining a service theme label map of each data table based on the first service theme label and the second service theme label;
and determining the view corresponding to each data table in the database based on the business topic label map of each data table.
7. The method for intelligent data integration based on data virtualization according to claim 6, wherein the first preset service topic label determining algorithm includes a preset rule class algorithm, an intelligent algorithm and an interactive algorithm, the determining a first service topic label corresponding to a data table based on the data table and the corresponding first preset service topic label determining algorithm includes:
in the case that the first preset service topic label determining algorithm corresponding to the data table contains a rule class algorithm for the data table, splitting the data table according to the corresponding rule class algorithm to obtain an initial service topic label of the data table, and obtaining the first service topic label corresponding to the data table in response to a modification or selection operation of the user on the initial service topic label; or,
in the case that the first preset service topic label determining algorithm corresponding to the data table does not contain a rule class algorithm for the data table but contains an intelligent algorithm for the data table, splitting the data table according to the intelligent algorithm to obtain an initial service topic label of the data table, and obtaining the first service topic label corresponding to the data table in response to a modification operation of the user on the initial service topic label; or,
in the case that the first preset service topic label determining algorithm corresponding to the data table contains neither a rule class algorithm nor an intelligent algorithm for the data table, obtaining the first service topic label of the data table in response to a splitting operation of the user on the service topics in the data table.
8. An intelligent data integration system based on data virtualization, the system comprising:
a connector layer, a virtualization service layer, and an output layer;
the connector layer comprises at least one data source connector for connecting with corresponding data sources;
the virtualization service layer comprises a view management module, a label management module, an algorithm module and a flow construction module, wherein the view management module determines views corresponding to all data tables in a database through a data source connector, the label management module and the algorithm module; the label management module is used for realizing service analysis on the data table based on the algorithm module, determining the data table and the service theme labels of the header fields in the data table, and determining the views corresponding to the data tables based on the data table and the service theme labels of the header fields in the data table; the flow construction module is used for analyzing the target service wide table into an editable multi-level virtual table based on the view management module, the label management module and the algorithm module, and determining data of a data table in a data source corresponding to the target service wide table based on the mapping relation between the multi-level virtual table and the logic table which are determined in the view; determining an instance or view corresponding to the target service wide table based on interactions with the user;
And the output layer is used for outputting an instance or view corresponding to the target business wide table constructed by the virtualization service layer.
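The three-layer structure of claim 8 might be sketched roughly as below. This is a heavily simplified illustration under stated assumptions: all class and function names (`DataSourceConnector`, `InMemoryConnector`, `VirtualizationService`, `output_layer`) are hypothetical, the label management, algorithm and flow construction modules are collapsed into a fixed field mapping, and an in-memory dictionary stands in for a real data source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, Tuple


class DataSourceConnector(Protocol):
    """Connector layer: one connector per data source (hypothetical interface)."""

    def read_column(self, table: str, column: str) -> List[object]: ...


@dataclass
class InMemoryConnector:
    """Stand-in data source holding {table: {column: values}}."""
    tables: Dict[str, Dict[str, List[object]]]

    def read_column(self, table: str, column: str) -> List[object]:
        return list(self.tables[table][column])


@dataclass
class VirtualizationService:
    """Virtualization service layer, reduced to a field mapping.

    `mapping` ties each field of the target business wide table to a
    (source name, table, column) triple, standing in for the mapping
    between the multi-level virtual table and the logical tables.
    """
    connectors: Dict[str, DataSourceConnector]
    mapping: Dict[str, Tuple[str, str, str]] = field(default_factory=dict)

    def build_view(self, wide_fields: List[str]) -> Dict[str, List[object]]:
        view: Dict[str, List[object]] = {}
        for name in wide_fields:
            source, table, column = self.mapping[name]
            view[name] = self.connectors[source].read_column(table, column)
        return view


def output_layer(view: Dict[str, List[object]]) -> List[Dict[str, object]]:
    """Output layer: materialize the column-wise view as rows."""
    names = list(view)
    return [dict(zip(names, row)) for row in zip(*view.values())]
```

Because the mapping is plain data, repointing a wide-table field at a different source table or column only requires editing one entry, which mirrors the editability the claims emphasize.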
9. A computer device comprising a memory and a processor, wherein the memory has stored therein computer readable instructions which, when executed by the processor, cause the processor to perform the steps of the intelligent data integration method based on data virtualization according to any one of claims 1 to 7.
10. A storage medium storing computer readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the intelligent data integration method based on data virtualization as claimed in any one of claims 1 to 7.
CN202311735697.2A 2023-12-15 2023-12-15 Intelligent data integration method and system based on data virtualization Pending CN117874120A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311735697.2A CN117874120A (en) 2023-12-15 2023-12-15 Intelligent data integration method and system based on data virtualization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311735697.2A CN117874120A (en) 2023-12-15 2023-12-15 Intelligent data integration method and system based on data virtualization

Publications (1)

Publication Number Publication Date
CN117874120A (en) 2024-04-12

Family

ID=90576357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311735697.2A Pending CN117874120A (en) 2023-12-15 2023-12-15 Intelligent data integration method and system based on data virtualization

Country Status (1)

Country Link
CN (1) CN117874120A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination