WO2024092926A1 - Method and device for generating a data table - Google Patents

Method and device for generating a data table

Info

Publication number
WO2024092926A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
query
logical
materialized
generating
Application number
PCT/CN2022/135215
Other languages
English (en)
French (fr)
Inventor
翟艳堂
杨仁慧
孙善禄
Original Assignee
蚂蚁区块链科技(上海)有限公司
Application filed by 蚂蚁区块链科技(上海)有限公司
Publication of WO2024092926A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/177 Editing, e.g. inserting or deleting of tables; using ruled lines
    • G06F40/18 Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • One or more embodiments of the present specification relate to the field of terminal technology, and in particular, to a method and device for generating a data table.
  • The institution-domain data warehouse system and the cross-institution-domain data fusion system are independent of each other, and the data in the two systems are isolated from each other. Upper-level data applications have neither a unified view of the data nor a unified service interface.
  • One or more embodiments of the present specification provide a method and device for generating a data table, which use the logical table as a data virtualization object to make data objects on both the organization-domain data warehouse and the cross-organization-domain data fusion system compatible and unified.
  • A method for generating a data table includes: determining a data source table, where the data source table is at least one of a data warehouse table of the current organization domain, a data federation table across organization domains, and an original data table across organization domains; and generating, based on the fields included in the data source table, a logical table for performing data virtualization, where the logical table is used to provide data query results for data applications.
  • A device for generating a data table includes: a processing module, configured to determine a data source table, where the data source table is at least one of a data warehouse table of the current organization domain, a data federation table across organization domains, and an original data table across organization domains; and a generation module, configured to generate, based on the fields included in the data source table, a logical table for performing data virtualization, where the logical table is used to provide data query results for data applications.
  • A data virtualization system includes: a logical table for performing data virtualization, where the logical table is generated based on the fields included in a data source table and is used to provide data query results for data applications, and the data source table is at least one of a data warehouse table of the current organization domain, a data federation table across organization domains, and an original data table across organization domains; and a query engine, configured to provide the data application with the data query results of querying the logical table.
  • An electronic device includes: a processor; and a memory for storing processor-executable instructions, where the processor implements the method for generating a data table according to any one of the first aspect by running the executable instructions.
  • A computer-readable storage medium stores computer instructions.
  • When the instructions are executed by a processor, the steps of the method for generating a data table according to any one of the first aspect are implemented.
  • A logical table for performing data virtualization may be generated, and the logical table may be used to provide data query results for data applications.
  • Data objects on the local-domain data warehouse and the cross-institution-domain data fusion system are made compatible and unified.
  • Unified queries across the two systems, cross-institution-domain data fusion and the local-domain data warehouse, may be achieved.
  • Automatic materialization of the logical table is achieved, giving high availability.
  • FIG. 1 is a schematic diagram of a scenario, provided by an exemplary embodiment, in which two data systems are isolated from each other.
  • FIG. 2 is a flow chart of a method for generating a data table provided by an exemplary embodiment.
  • FIG. 3 is a schematic diagram of a scenario of generating a new logic table based on a logic table provided by an exemplary embodiment.
  • FIG. 4 is a flow chart of another method for generating a data table provided by an exemplary embodiment.
  • FIG. 5 is a schematic diagram of the structure of a query engine provided by an exemplary embodiment.
  • FIGS. 6A and 6B are schematic diagrams of query plans provided by an exemplary embodiment.
  • FIG. 7 is a flow chart of another method for generating a data table provided by an exemplary embodiment.
  • FIG. 8 is a schematic diagram of a scenario for storing a materialized table provided by an exemplary embodiment.
  • FIG. 9 is a schematic diagram of the structure of a data virtualization system provided by an exemplary embodiment.
  • FIG. 10 is a schematic structural diagram of an electronic device provided by an exemplary embodiment.
  • FIG. 11 is a block diagram of an apparatus for generating a data table provided by an exemplary embodiment.
  • It should be noted that, in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than described in this specification. In addition, a single step described in this specification may be decomposed into multiple steps in other embodiments, and multiple steps described in this specification may be combined into a single step in other embodiments.
  • Data warehouse: a central repository of information used to store and process data for analysis. Data is generally collected from online transaction systems, relational databases, messaging systems, and other systems and flows into the data warehouse periodically or in real time.
  • The data warehouse mentioned in this proposal specifically refers to a big data warehouse, that is, a data warehouse built on a big data storage and computing system.
  • Local-domain data warehouse: a data warehouse belonging to this organization's domain. The data stored in it belongs to this organization's legal entity or has been authorized to that legal entity in compliance with regulations.
  • Cross-institution-domain data fusion: the flow, sharing, analysis, and computation of data between different institutions, built between institutions under compliance requirements in order to break down data silos and jointly realize greater value from the data.
  • Privacy computing: from the computing perspective, a collective term for a group of technologies that address data security and privacy protection during data computation, represented by Secure Multi-Party Computation (MPC), Federated Learning (FL), and Trusted Execution Environment (TEE).
  • MPC: Secure Multi-Party Computation
  • FL: Federated Learning
  • TEE: Trusted Execution Environment
  • Federated table: a data virtualization object on top of a cross-institution-domain data fusion system that hides the dispersed form of multiple data sets from the upper layer.
  • Apache Calcite: an open source framework for building databases or data management systems. It includes a Structured Query Language (SQL) parser, an Application Program Interface (API) for building expressions in relational algebra, and a query planning engine.
  • SQL: Structured Query Language
  • API: Application Program Interface
  • ANTLR (open source grammar analyzer): ANTLR's full name is ANother Tool for Language Recognition. It is a parser generator based on the LL algorithm and is widely used to build languages, tools, and frameworks.
  • Metadata: information used to describe the properties of data.
  • The metadata of a data table can be understood as the field names of the data table.
  • For example, the metadata of a data table includes: user ID, gender, age, etc.
  • Physical data: information describing the concrete content of data.
  • The physical data of a data table can be understood as the field values of the data table.
  • For example, if the metadata of a data table includes user ID, gender, and age, the physical data includes id#1, female, 28 years old, etc.
  • Data virtualization: a term describing data management methods that allow applications, such as data applications, to retrieve and manage data without requiring technical details about the data, such as how it is formatted or where it is physically located. In this disclosure, the physical location is understood as the geographic location corresponding to an organization domain.
  • The data sources of data applications come from both the local data warehouse system and the cross-institution data fusion system.
  • However, the local data warehouse system and the cross-institution data fusion system are independent of each other, and the data in the two systems are isolated from each other.
  • For example, a risk control data application uses data from the in-domain data warehouse system and, through a data fusion system, uses data from other institutions for joint risk control.
  • The data sets it uses include both data warehouse tables and the federated tables or original data tables of the data fusion system, so the data application must connect to the two systems separately.
  • To address this, the present disclosure provides the following method and device for generating a data table.
  • Fig. 2 is a flow chart of a method for generating a data table provided by an exemplary embodiment.
  • The method may be executed by a server, for example a server in the current organization domain that provides data services such as data query, data storage, and data update, and includes steps 201 and 202.
  • Step 201: determine a data source table.
  • The data source table may be at least one of a data warehouse table of the current organization domain, a data federation table across organization domains, and an original data table across organization domains.
  • Cross-institution domains can be understood as crossing domain names.
  • Different domain names correspond to different geographical areas.
  • For example, the data source table may include a data warehouse table of organization domain C.
  • For another example, the data source table may include a federated table across institution domains A and C.
  • For yet another example, the data source table may include a data warehouse table of institution domain B and an original data table across institution domains A and C.
  • Any case in which the data source table is at least one of a data warehouse table of the current organization domain, a data federation table across organization domains, and an original data table across organization domains falls within the protection scope of the present disclosure.
  • Step 202: generate, based on the fields included in the data source table, a logical table for performing data virtualization.
  • The logical table may be used to provide data query results for the data application.
  • The logical table is a data view defined from a business perspective.
  • Its data comes from data warehouse tables within the organization's domain and from federated tables of cross-organization-domain data fusion or original data tables across organization domains; this disclosure does not limit this.
  • A logical table may store only metadata and no physical data.
  • The purpose of a logical table is to unify and standardize data modeling: based on logical tables, data warehouse data and data fusion data can be combined to build a unified data model.
  • Other logical tables can also be created from logical tables. For example, as shown in FIG. 3, after a logical table is generated from the data warehouse table of the organization domain and the federated table across organization domains, further logical tables can be generated from that logical table; FIG. 3 shows two such logical tables.
  • A new logical table is generated from an existing logical table in a manner similar to generating a logical table from data source tables.
  • A logical table may be generated as follows. In one example, a predefined rule can be used to determine the name of the logical table.
  • Designated fields required for generating the logical table may be determined from the fields included in the data source table, where the designated fields are one or more fields of the data source table.
  • A calculation logic relationship between the output fields of the logical table and the designated fields may be determined.
  • An output field of the logical table may be identical to a designated field, or may need to be derived from the designated fields through the calculation logic relationship.
  • For example, the designated fields include a buyer identifier buyer_id and a seller identifier seller_id.
  • In this case, the output fields of the logical table may include buyer_id and seller_id.
  • For another example, the designated fields include buyer_id, seller_id, and the order transaction quantity quantity.
  • The output fields of the logical table include the daily maximum order transaction quantity, quantity_max_1d.
  • The calculation logic relationship between the output field quantity_max_1d and the designated fields is: group by buyer_id and seller_id and take the maximum of quantity (that is, computed within the same day).
  • The logical table may be generated based on at least one of the logical table name, the data source table, the designated fields, and the calculation logic relationship.
  • For example, the logical table name may be determined as trade_indicator based on predefined rules.
  • The data source tables include the data warehouse table MAXCOMPUTE.default.order of the current institution domain and the federated table FeDX.default.event of institution domains A and C.
  • The designated fields selected from the data source tables include: buyer ID event.buyer_id, seller ID event.seller_id, order transaction quantity order.quantity, transaction channel order.channel, and price event.price.
  • The output fields of the logical table include: buyer ID buyer_id, seller ID seller_id, daily maximum order transaction quantity quantity_max_1d, and total weekly order transaction amount on the cloud amount_cloud_7d.
  • In this way, a logical table for performing data virtualization can be generated and used to provide data query results for data applications.
  • Data objects on both the local-domain data warehouse and the cross-institution-domain data fusion system thus become compatible and unified, with high availability.
  • FIG. 4 is a flow chart of another method for generating a data table, based on the embodiment shown in FIG. 2.
  • The method may be executed by a server, for example a server in the current organization domain that provides data services such as data query, data storage, and data update.
  • The method further includes step 203.
  • Step 203: provide the data application with a query engine for querying the logical table.
  • That is, a query engine for querying the above logical table may be provided to data applications.
  • The structure of the query engine may be as shown in FIG. 5.
  • The query language supported by the query engine is SQL, which may be a subset of standard SQL with appropriate grammatical extensions; the present disclosure does not limit this.
  • The query engine also uses SQL statements as the interface through which data applications connect to it.
  • Parsing in the query engine of FIG. 5 can be performed by an SQL parser, which can be implemented with Apache Calcite or ANTLR; the present disclosure does not limit this.
  • The query engine stores the metadata of logical tables; that is, it does not directly store the metadata of the physical tables in each organization domain.
  • Metadata verification and/or authentication for logical tables is performed by the query engine.
  • Metadata verification and/or authentication for physical tables is performed by the query engine calling the computing engine of the organization domain where each physical table is located.
  • The query engine may generate a cross-institution-domain query plan based on a query statement provided by the data application.
  • The query engine splits the query plan into at least one query sub-plan, with the query sub-plans in one-to-one correspondence with the computing engines of the data source tables.
  • That is, the query engine splits the query plan according to the computing engine corresponding to each data source table.
  • After the computing engines return their query sub-results, the query engine may merge them and provide the resulting data query result to the data application.
  • For example, the query engine receives the following query statement from the data application:
  • The query engine translates it into a query plan, for example as shown in FIG. 6A.
  • The data source tables of the logical table include the data warehouse table of the current organization domain and the federated table of the data fusion system, whose computing engines are the data warehouse computing engine and the data fusion system computing engine, respectively.
  • As shown in FIG. 6B, the query engine splits the query plan into three parts: a first part executed by the query engine, a second part executed by the data warehouse computing engine, and a third part executed by the data fusion system computing engine.
  • The query sub-plans of the second and third parts are executed by the data warehouse computing engine and the data fusion system computing engine, respectively, to obtain the corresponding query sub-results.
  • The query engine then merges the query sub-results to obtain the final data query result.
  • In this way, the query engine enables unified queries across the two systems: cross-institution-domain data fusion and the local-institution-domain data warehouse.
  • FIG. 7 is a flow chart of another method for generating a data table, based on the embodiment shown in FIG. 4.
  • The method may be executed by a server, for example a server in the current organization domain that provides data services such as data query, data storage, and data update.
  • The method further includes step 204.
  • Step 204: when materialization is enabled for the logical table, generate a materialized table that carries the physical data of the logical table.
  • The materialized table may be generated by a materialization engine.
  • Materializing a logical table differs from a materialized view in a database mainly as follows.
  • First, a materialized view in a database always has data storage, whereas a logical table does not necessarily: it has data storage only when materialization is enabled, and none otherwise.
  • Second, a materialized view stores its data itself, whereas when a logical table is materialized, its materialized data is carried by separate physical tables.
  • The logical table and its corresponding physical tables together constitute the materialized logical table.
  • Third, a materialized view is a single data object, whereas a materialized logical table can correspond to multiple physical tables.
  • Fourth, a materialized view faces the user directly, whereas the physical tables generated when materialization is enabled for a logical table do not; the user still works with the logical table rather than the generated physical tables.
  • The table that carries the materialized data of a logical table is called a materialized table.
  • Materialized tables are stored in the respective underlying systems.
  • The correspondence between the logical table and the materialized tables is shown in FIG. 8, where materialized tables 1, 2, and 3 are stored in the data warehouses of organization domain A, the current organization domain (i.e., organization domain B), and organization domain C, respectively.
  • The designated fields required for generating the logical table can be determined from the fields included in the data source table.
  • The method for determining the designated fields has been introduced in the above embodiment and is not repeated here.
  • The materialized table may be generated based on at least one of a materialized table name, the output fields of the materialized table, and the physical data.
  • Generating a materialized table is similar to generating a logical table, except that the physical data corresponding to the output fields of the materialized table must also be determined.
  • For example, when materialization is enabled for the logical table trade_indicator, two materialized tables, trade_1d and trade_7d, are generated automatically.
  • The output fields of the materialized table trade_1d are buyer_id, seller_id, quantity_max_1d, and dt; the corresponding materialization logic is as follows:
  • The output fields of the materialized table trade_7d are buyer_id, seller_id, amount_cloud_7d, and dt; the corresponding materialization logic is as follows:
  • In this way, automatic materialization of the logical table is achieved, which improves data query efficiency and gives high usability.
  • The present disclosure further provides a data virtualization system, which can be deployed on top of the local-domain data warehouse system and the cross-domain data fusion system. As shown in FIG. 9, it includes a logical table and a query engine, and may further include a materialization engine.
  • The logical table is generated in a manner similar to the embodiment shown in FIG. 2, which is not described in detail here.
  • The structure of the query engine is similar to that of FIG. 5, and the operations it performs are similar to those shown in FIG. 4; they are not described in detail here.
  • The logical table, as a data virtualization object, makes data objects on the local-domain data warehouse and the cross-organization-domain data fusion system compatible and unified.
  • Unified queries across the cross-organization-domain data fusion system and the local-domain data warehouse system can be achieved, and automatic materialization of the logical table is realized, which improves data query efficiency and gives high availability.
  • FIG. 10 is a schematic structural diagram of an electronic device provided by an exemplary embodiment; the electronic device may be a data server, which is not limited in the present disclosure.
  • The device includes a processor 1002, an internal bus 1004, a network interface 1006, a memory 1008, and a non-volatile memory 1010, and may also include hardware required for other services.
  • One or more embodiments of this specification may be implemented based on software, such as the processor 1002 reading the corresponding computer program from the non-volatile memory 1010 into the memory 1008 and then running it.
  • One or more embodiments of this specification do not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to logic units and may also be hardware or logic devices.
  • The device for generating a data table can be applied to the device shown in FIG. 10 to implement the technical solution of this specification.
  • The device for generating a data table may include: a processing module 1101, configured to determine a data source table, where the data source table is at least one of a data warehouse table of the current organization domain, a data federation table across organization domains, and an original data table across organization domains; and a generating module 1102, configured to generate, based on the fields included in the data source table, a logical table for performing data virtualization, where the logical table is used to provide data query results for data applications.
  • A typical implementation device is a computer, which may be in the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email transceiver, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
  • A computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-permanent storage in a computer-readable medium, in the form of random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
  • RAM: random access memory
  • ROM: read-only memory
  • flash RAM: flash memory
  • Computer readable media include permanent and non-permanent, removable and non-removable media that can be implemented by any method or technology to store information.
  • Information can be computer readable instructions, data structures, program modules or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic cassettes, disk storage, quantum memory, graphene-based storage media or other magnetic storage devices or any other non-transmission media that can be used to store information that can be accessed by a computing device.
  • Computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
  • Although the terms first, second, third, etc. may be used in one or more embodiments of this specification to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other.
  • For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information.
  • The word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining".
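The logical-table generation and materialization steps described above (trade_indicator, trade_1d, trade_7d) can be sketched as a metadata-only object. This is a minimal illustration under stated assumptions, not the disclosure's implementation: the classes `OutputField` and `LogicalTable`, the `window` attribute, and the rule "one materialized table per time window, holding the key fields plus a `dt` column" are hypothetical, chosen so the sketch reproduces the trade_1d and trade_7d field lists given in the text. The calculation-logic strings are descriptive text only and are never executed.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class OutputField:
    name: str                     # output field exposed by the logical table
    logic: str                    # calculation logic over designated fields (descriptive text)
    window: Optional[str] = None  # hypothetical: time window such as "1d" or "7d"


@dataclass
class LogicalTable:
    """Holds metadata only (names, sources, fields); it never stores rows."""
    name: str
    source_tables: list[str]       # warehouse / federated / original data tables
    designated_fields: list[str]   # fields picked from the source tables
    key_fields: list[str]          # outputs passed through unchanged
    derived: list[OutputField] = field(default_factory=list)

    def output_fields(self) -> list[str]:
        return self.key_fields + [f.name for f in self.derived]

    def materialize(self, prefix: str) -> dict[str, list[str]]:
        """Hypothetical rule: one materialized table per window, plus a dt column."""
        by_window: dict[str, list[str]] = defaultdict(list)
        for f in self.derived:
            if f.window is not None:
                by_window[f.window].append(f.name)
        return {f"{prefix}_{w}": self.key_fields + cols + ["dt"]
                for w, cols in by_window.items()}


trade_indicator = LogicalTable(
    name="trade_indicator",
    source_tables=["MAXCOMPUTE.default.order", "FeDX.default.event"],
    designated_fields=["event.buyer_id", "event.seller_id", "order.quantity",
                       "order.channel", "event.price"],
    key_fields=["buyer_id", "seller_id"],
    derived=[
        OutputField("quantity_max_1d", "MAX(quantity) grouped by buyer_id, seller_id", "1d"),
        OutputField("amount_cloud_7d", "total price of cloud-channel orders", "7d"),
    ],
)

print(trade_indicator.output_fields())
# ['buyer_id', 'seller_id', 'quantity_max_1d', 'amount_cloud_7d']
print(trade_indicator.materialize("trade"))
```

Keeping the materialized-table layout as a pure function of the logical table's metadata mirrors the point that users keep querying trade_indicator while the generated physical tables stay behind it.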

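The query engine's split, push-down, and merge flow (FIGS. 6A and 6B) can be approximated with a toy sketch. Everything here is hypothetical: the `Scan` type, the engine callables, the sample rows, and the merge rule (an inner join of sub-results on buyer_id and seller_id, assuming at most one row per key from each engine). A real engine would split a relational plan, for example one produced by an SQL parser such as Apache Calcite, rather than a flat list of scans.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass(frozen=True)
class Scan:
    table: str   # data source table behind the logical table
    engine: str  # computing engine that owns the table (hypothetical label)


def split_plan(scans: list[Scan]) -> dict[str, list[Scan]]:
    """Group scans into one query sub-plan per computing engine."""
    sub_plans: dict[str, list[Scan]] = defaultdict(list)
    for scan in scans:
        sub_plans[scan.engine].append(scan)
    return dict(sub_plans)


def run_query(scans: list[Scan],
              engines: dict[str, Callable[[list[Scan]], list[Row]]],
              key: tuple[str, ...]) -> list[Row]:
    """Push each sub-plan to its engine, then merge the sub-results."""
    sub_results = [engines[name](plan) for name, plan in split_plan(scans).items()]
    if not sub_results:
        return []
    # Query-engine part (hypothetical rule): inner-join sub-results on the key columns.
    merged = {tuple(r[k] for k in key): dict(r) for r in sub_results[0]}
    for result in sub_results[1:]:
        index = {tuple(r[k] for k in key): r for r in result}
        merged = {k: {**row, **index[k]} for k, row in merged.items() if k in index}
    return list(merged.values())


# Stand-ins for the two computing engines; the rows are made-up sample data.
def warehouse_engine(plan: list[Scan]) -> list[Row]:
    return [{"buyer_id": "b1", "seller_id": "s1", "quantity_max_1d": 5},
            {"buyer_id": "b2", "seller_id": "s1", "quantity_max_1d": 2}]


def fusion_engine(plan: list[Scan]) -> list[Row]:
    return [{"buyer_id": "b1", "seller_id": "s1", "amount_cloud_7d": 90.0}]


scans = [Scan("MAXCOMPUTE.default.order", "warehouse"),
         Scan("FeDX.default.event", "fusion")]
result = run_query(scans, {"warehouse": warehouse_engine, "fusion": fusion_engine},
                   key=("buyer_id", "seller_id"))
print(result)
# [{'buyer_id': 'b1', 'seller_id': 's1', 'quantity_max_1d': 5, 'amount_cloud_7d': 90.0}]
```

Grouping by engine gives exactly one sub-plan per computing engine, which matches the one-to-one correspondence between query sub-plans and the data source tables' engines.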
Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

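One or more embodiments of this specification provide a method and device for generating a data table. The method includes: determining a data source table, where the data source table is at least one of a data warehouse table of the current institution domain, a data federation table across institution domains, and an original data table across institution domains; and generating, based on the fields included in the data source table, a logical table for performing data virtualization, where the logical table is used to provide data query results for data applications. In the present disclosure, a logical table for performing data virtualization can be generated, and the logical table, as a data virtualization object, makes data objects on the local-domain data warehouse and the cross-institution-domain data fusion system compatible and unified.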

Description

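Method and device for generating a data table
Technical Field
One or more embodiments of this specification relate to the field of terminal technology, and in particular to a method and device for generating a data table.
Background
As digitalization deepens, more and more institutions have built or adopted data warehouse systems to support upper-level data applications. As data becomes an ever stronger driver and data applications demand ever better results, data applications use not only an institution's own data but also seek data from other institutions. As data compliance requirements grow stricter, directly connecting to or directly collecting other institutions' data is becoming rarer; instead, institutions increasingly build or use compliant cross-institution-domain data fusion systems, such as systems built on privacy computing technology. As a result, the data sources of data applications include both the local-domain data warehouse system and the cross-institution-domain data fusion system.
However, the institution-domain data warehouse system and the cross-institution-domain data fusion system are independent of each other, and the data in the two systems are isolated from each other. Upper-level data applications have neither a unified view of the data nor a unified service interface.
Summary
In view of this, one or more embodiments of this specification provide a method and device for generating a data table, which use the logical table as a data virtualization object to make data objects on the institution-domain data warehouse and the cross-institution-domain data fusion system compatible and unified.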
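According to a first aspect of one or more embodiments of this specification, a method for generating a data table is provided, including: determining a data source table, where the data source table is at least one of a data warehouse table of the current institution domain, a data federation table across institution domains, and an original data table across institution domains; and generating, based on the fields included in the data source table, a logical table for performing data virtualization, where the logical table is used to provide data query results for data applications.
According to a second aspect of one or more embodiments of this specification, a device for generating a data table is provided, including: a processing module, configured to determine a data source table, where the data source table is at least one of a data warehouse table of the current institution domain, a data federation table across institution domains, and an original data table across institution domains; and a generation module, configured to generate, based on the fields included in the data source table, a logical table for performing data virtualization, where the logical table is used to provide data query results for data applications.
According to a third aspect of one or more embodiments of this specification, a data virtualization system is provided, including: a logical table for performing data virtualization, where the logical table is generated based on the fields included in a data source table and is used to provide data query results for data applications, and the data source table is at least one of a data warehouse table of the current institution domain, a data federation table across institution domains, and an original data table across institution domains; and a query engine, configured to provide data applications with the data query results of querying the logical table.
According to a fourth aspect of one or more embodiments of this specification, an electronic device is provided, including: a processor; and a memory for storing processor-executable instructions, where the processor implements the method for generating a data table according to any one of the first aspect by running the executable instructions.
According to a fifth aspect of one or more embodiments of this specification, a computer-readable storage medium is provided, storing computer instructions that, when executed by a processor, implement the steps of the method for generating a data table according to any one of the first aspect.
The technical solutions provided by the embodiments of this specification may have the following beneficial effects: in the present disclosure, a logical table for performing data virtualization can be generated and used to provide data query results for data applications, and the logical table, as a data virtualization object, makes data objects on the local-domain data warehouse and the cross-institution-domain data fusion system compatible and unified. In addition, unified queries across the two systems, cross-institution-domain data fusion and the local-domain data warehouse, can be achieved, and automatic materialization of the logical table is realized, giving high availability.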
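Brief Description of the Drawings
FIG. 1 is a schematic diagram of a scenario, according to an exemplary embodiment, in which two data systems are isolated from each other.
FIG. 2 is a flowchart of a method for generating a data table according to an exemplary embodiment.
FIG. 3 is a schematic diagram of a scenario of generating new logical tables from a logical table according to an exemplary embodiment.
FIG. 4 is a flowchart of another method for generating a data table according to an exemplary embodiment.
FIG. 5 is a schematic structural diagram of a query engine according to an exemplary embodiment.
FIGS. 6A and 6B are schematic diagrams of query plans according to an exemplary embodiment.
FIG. 7 is a flowchart of yet another method for generating a data table according to an exemplary embodiment.
FIG. 8 is a schematic diagram of a scenario of storing materialized tables according to an exemplary embodiment.
FIG. 9 is a schematic structural diagram of a data virtualization system according to an exemplary embodiment.
FIG. 10 is a schematic structural diagram of an electronic device according to an exemplary embodiment.
FIG. 11 is a block diagram of an apparatus for generating a data table according to an exemplary embodiment.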
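Detailed Description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of this specification. Rather, they are merely examples of apparatuses and methods consistent with some aspects of one or more embodiments of this specification, as detailed in the appended claims.
It should be noted that in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than described here. In addition, a single step described in this specification may be broken down into multiple steps in other embodiments, and multiple steps described in this specification may be combined into a single step in other embodiments.
Before the solution provided by the present disclosure is introduced, the terms it uses are defined.
Data warehouse: a central repository of information used to store and process data so that the data can be analyzed. Data typically flows into the data warehouse, periodically or in real time, from online transaction systems, relational databases, messaging systems, and other systems. The data warehouse mentioned in this proposal specifically refers to a big-data warehouse, that is, one built on big-data storage and computing systems.
Local-domain data warehouse: a data warehouse belonging to the current institutional domain; the stored data belongs to the institution's legal entity or has been compliantly authorized to that legal entity.
Cross-domain data fusion: the flow, sharing, analysis, computation, and so on of data between different institutions, built between institutions under compliance requirements in order to break down data silos between institutions and jointly unlock greater value from the data.
Privacy-preserving computation: a collective term for a set of technologies that address data security and privacy protection during data computation, represented by secure multi-party computation (Secure Multi-Party Computation, MPC), federated learning (Federated Learning, FL), trusted execution environments (Trusted Execution Environment, TEE), and others.
Federated table: a data virtualization object on top of a cross-domain data fusion system that hides from upper layers the fact that multiple datasets are distributed.
Apache Calcite (dynamic data management framework): an open-source framework for building databases or data management systems. It includes a Structured Query Language (SQL) parser, an application programming interface (API) for building expressions in relational algebra, and a query planning engine.
ANTLR (open-source parser generator): ANTLR stands for ANother Tool for Language Recognition. It is a parser generator based on LL parsing and is widely used to build languages, tools, and frameworks.
Metadata: information describing the properties of data. The metadata of a data table can be understood as its field names; for example, the metadata of a data table may include the user identifier id, gender, and age.
Physical data: the concrete values of the data. The physical data of a data table can be understood as its field values; for example, if the metadata of a data table includes the user identifier id, gender, and age, the physical data includes id#1, female, 28, and so on.
Data virtualization: a data management approach that allows applications, such as data applications, to retrieve and manage data without needing technical details about the data, such as how it is formatted or where it is physically located. In the present disclosure, the physical location can be understood as the geographic location corresponding to an institutional domain.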
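As digitalization deepens, more and more institutions have built or adopted data warehouse systems to provide supporting services for upper-layer data applications. As data becomes an ever stronger driver and data applications demand better results, data applications use not only their own institution's data but also seek to use data from other institutions. As data compliance requirements become stricter, direct connection or direct collection is increasingly rare as a way to use other institutions' data; instead, institutions increasingly build or use compliant cross-domain data fusion systems, such as systems built on privacy-preserving computation technology.
As a result, a data application draws on data both from the local-domain data warehouse system and from the cross-domain data fusion system. However, the two systems are independent of each other and their data is isolated, so upper-layer data applications have neither a unified data view nor a unified service interface, as shown in FIG. 1.
For example, a risk-control data application uses data from the local-domain data warehouse system and also, through the data fusion system, data from other institutions for joint risk control. The datasets it uses include warehouse tables from the data warehouse as well as federated tables or raw data tables from the data fusion system, and the application has to integrate with each of the two systems separately.
Because the local institutional-domain data warehouse system and the cross-domain data fusion system are independent of each other and their data is isolated, upper-layer data applications have neither a unified data view nor a unified service interface.
To solve this technical problem, the present disclosure provides the following method and apparatus for generating a data table.
FIG. 2 is a flowchart of a method for generating a data table according to an exemplary embodiment. Referring to FIG. 2, the method may be executed by a server, which may be a server in the current institutional domain that provides data services such as data querying, data storage, and data updating. The method includes steps 201 and 202.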
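In step 201, a data source table is determined.
In this embodiment of the present disclosure, the data source table may be at least one of a data warehouse table of the current institutional domain, a cross-domain federated data table, and a cross-domain raw data table.
Here, "cross-domain" can be understood as crossing domain names; in general, different domain names correspond to different geographic regions.
In one example, assuming the current institutional domain is domain B, the data source table may include a data warehouse table of domain C.
In another example, assuming the current institutional domain is domain B, the data source table may include a federated table spanning domains A and C.
In another example, assuming the current institutional domain is domain B, the data source table may include a data warehouse table of domain B and raw data tables spanning domains A and C.
The above are merely illustrative. Any case in which the data source table is at least one of a data warehouse table of the current institutional domain, a cross-domain federated data table, and a cross-domain raw data table falls within the protection scope of the present disclosure.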
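In step 202, a logical table for performing data virtualization is generated based on the fields included in the data source table.
In this embodiment of the present disclosure, the logical table may be used to provide data query results to data applications.
A logical table is a data view defined from a business perspective. Its data comes from data warehouse tables within the current institutional domain and from cross-domain federated tables or cross-domain raw data tables; the present disclosure does not limit this.
In one example, a logical table may store only metadata and no physical data. The purpose of the logical table is unified, standardized data modeling: with logical tables, data warehouse data and data fusion data can be brought together into one unified data model. Other logical tables can also be created from a logical table. For example, as shown in FIG. 3, after a logical table is generated from a data warehouse table of the current institutional domain (a warehouse table) and a cross-domain federated table, further logical tables can be generated from it; in FIG. 3, two more logical tables are generated. A new logical table is generated from an existing logical table in the same way as a logical table is generated in general.
A logical table is generated as follows. In one example, the logical table name may be determined according to a predefined rule.
Further, among the fields included in the data source table, the specified fields required to generate the logical table may be determined, where the specified fields are one or more of the fields included in the data source table.
Further, the computation logic relationship between the output fields of the logical table and the specified fields may be determined.
In this embodiment of the present disclosure, an output field of the logical table may be identical to a specified field, or may need to be derived from the specified fields according to the computation logic relationship.
For example, the specified fields include the buyer identifier buyer_id and the seller identifier seller_id, and the output fields of the logical table may include buyer_id and seller_id.
As another example, the specified fields include buyer_id, seller_id, and the order quantity quantity, and the output fields of the logical table include the daily maximum order quantity quantity_max_1d.
Here, the computation logic relationship between the output field quantity_max_1d and the specified fields is: group by buyer_id and seller_id (within a single day) and take the maximum of quantity.
Further, the logical table may be generated based on at least one of the logical table name, the data source table, the specified fields, and the computation logic relationship.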
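The generation steps above can be illustrated with a minimal Python sketch. It assumes a simple in-memory catalog; the class names `PhysicalTable` and `LogicalTable`, the `kind` labels, and the helper functions are hypothetical and do not come from the original. The logical table holds only metadata (name, sources, specified fields, and a mapping from each output field to its computation-logic expression), and, as in FIG. 3, a logical table may itself be the source of another logical table.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class PhysicalTable:
    """Metadata of a warehouse, federated, or cross-domain raw table (hypothetical model)."""
    name: str
    kind: str                  # illustrative labels: "warehouse", "federated", "raw"
    fields: tuple[str, ...]


@dataclass
class LogicalTable:
    """Metadata only: no physical data is stored in a logical table."""
    name: str
    sources: tuple[Union[PhysicalTable, LogicalTable], ...]
    specified_fields: tuple[str, ...]
    outputs: dict[str, str]    # output field -> computation-logic expression


def available_fields(source: Union[PhysicalTable, LogicalTable]) -> tuple[str, ...]:
    """A physical table exposes its columns; a logical table exposes its output fields."""
    if isinstance(source, PhysicalTable):
        return source.fields
    return tuple(source.outputs)


def generate_logical_table(name, sources, specified_fields, outputs) -> LogicalTable:
    """Check that every specified field ("table.field") exists in a source, then build the table."""
    by_name = {s.name: s for s in sources}
    for qualified in specified_fields:
        table, _, column = qualified.partition(".")
        if table not in by_name or column not in available_fields(by_name[table]):
            raise ValueError(f"unknown specified field: {qualified}")
    return LogicalTable(name, tuple(sources), tuple(specified_fields), dict(outputs))


def physical_sources(table: Union[PhysicalTable, LogicalTable]) -> list[PhysicalTable]:
    """Resolve a (possibly nested) logical table to the physical tables it ultimately reads."""
    if isinstance(table, PhysicalTable):
        return [table]
    resolved: list[PhysicalTable] = []
    for source in table.sources:
        for physical in physical_sources(source):
            if physical not in resolved:
                resolved.append(physical)
    return resolved
```

Resolving nested logical tables down to physical sources is one plausible way to keep derived logical tables metadata-only; the original does not say how such chains are resolved.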
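For example, based on a predefined rule, the logical table name is determined to be trade_indicator.
Further, the data source tables include the current institutional domain's data warehouse table MAXCOMPUTE.default.order and the federated table FeDX.default.event spanning domains A and C. When there are multiple data source tables, they are associated by a join; the join field and condition are order.id = event.id.
The selected specified fields of the data source tables include: the buyer identifier event.buyer_id, the seller identifier event.seller_id, the order quantity order.quantity, the transaction channel order.channel, and the price event.price.
The output fields of the logical table include: the buyer identifier buyer_id, the seller identifier seller_id, the daily maximum order quantity quantity_max_1d, and the 7-day total order revenue on the cloud channel amount_cloud_7d.
The computation logic relationships are as follows. For the output field quantity_max_1d and the specified fields buyer_id, seller_id, and quantity: group by buyer_id and seller_id and take the maximum of quantity. For the output field amount_cloud_7d and the specified fields buyer_id, seller_id, quantity, and price: over 7 days of data, group by buyer_id and seller_id (within the 7 days) and sum quantity × price, restricted to the transaction channel channel = 'cloud'.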
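As a rough illustration of the trade_indicator computation logic, the sketch below expresses it as one SQL query run with Python's built-in `sqlite3` module against in-memory stand-ins for `MAXCOMPUTE.default.order` and `FeDX.default.event`. The `dt` date column on `event`, the `:today` reference date, the inclusive 7-day window, and the toy rows are assumptions; the original does not specify how the query engine evaluates this logic.

```python
import sqlite3

# Hypothetical stand-in for the trade_indicator logical table. "order" is quoted
# because ORDER is an SQL keyword. The dt column and the window bounds are assumed.
TRADE_INDICATOR_SQL = """
SELECT e.buyer_id,
       e.seller_id,
       MAX(CASE WHEN e.dt = :today THEN o.quantity END) AS quantity_max_1d,
       SUM(CASE WHEN o.channel = 'cloud' THEN o.quantity * e.price ELSE 0 END)
           AS amount_cloud_7d
FROM "order" AS o
JOIN event AS e ON o.id = e.id
WHERE e.dt BETWEEN date(:today, '-6 days') AND :today
GROUP BY e.buyer_id, e.seller_id
ORDER BY e.buyer_id, e.seller_id
"""


def build_demo_db() -> sqlite3.Connection:
    """Create toy copies of the warehouse table and the federated table."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "order" (id INTEGER, quantity INTEGER, channel TEXT)')
    conn.execute("CREATE TABLE event (id INTEGER, buyer_id TEXT, seller_id TEXT, price REAL, dt TEXT)")
    conn.executemany('INSERT INTO "order" VALUES (?, ?, ?)', [
        (1, 5, "cloud"), (2, 3, "store"), (3, 7, "cloud"), (4, 100, "cloud"), (5, 4, "cloud"),
    ])
    conn.executemany("INSERT INTO event VALUES (?, ?, ?, ?, ?)", [
        (1, "b1", "s1", 2.0, "2022-10-31"),
        (2, "b1", "s1", 10.0, "2022-10-31"),
        (3, "b1", "s1", 1.0, "2022-10-27"),
        (4, "b1", "s1", 1.0, "2022-10-01"),   # outside the 7-day window
        (5, "b2", "s1", 2.5, "2022-10-30"),
    ])
    return conn


def query_trade_indicator(conn: sqlite3.Connection, today: str) -> list[tuple]:
    """Evaluate the logical table for one reference date."""
    return conn.execute(TRADE_INDICATOR_SQL, {"today": today}).fetchall()
```

With `today = "2022-10-31"`, buyer b1/seller s1 gets quantity_max_1d = 5 (the larger of that day's quantities 5 and 3) and amount_cloud_7d = 5 × 2.0 + 7 × 1.0 = 17.0, since the 'store' row contributes 0 and the row dated 2022-10-01 falls outside the window. Buyer b2 has no rows on the reference date, so its quantity_max_1d is NULL.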
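The above is merely illustrative. In practice, any solution that uses at least one of a data warehouse table of the current institutional domain, a cross-domain federated data table, and a cross-domain raw data table as the data source table to generate a logical table for performing data virtualization falls within the protection scope of the present disclosure.
In the above embodiment, a logical table for performing data virtualization can be generated and used to provide data query results to data applications. Through the logical table as a data virtualization object, the data objects on top of the local-domain data warehouse and cross-domain data fusion systems become compatible and unified, giving high usability.
In some optional embodiments, FIG. 4 is a flowchart of another method for generating a data table based on the embodiment shown in FIG. 2. Referring to FIG. 4, the method may be executed by a server, which may be a server in the current institutional domain that provides data services such as data querying, data storage, and data updating. The method further includes step 203.
In step 203, a query engine for querying the logical table is provided to data applications.
In this embodiment of the present disclosure, a query engine for querying the above logical table may be provided to data applications; an example structure of the query engine is shown in FIG. 5.
In one example, the query language supported by the query engine is SQL, which may be a subset of standard SQL with a modest set of syntax extensions; the present disclosure does not limit this. The query engine's interface for data applications also accepts SQL statements.
Parsing in the query engine of FIG. 5 may be performed by an SQL parser, which may be implemented with Apache Calcite or ANTLR; the present disclosure does not limit this.
In another example, the query engine is configured to store the metadata of the logical table; that is, the query engine does not directly store the metadata of the physical tables in each institutional domain.
In this embodiment of the present disclosure, metadata validation and/or authorization for logical tables is performed by the query engine, while metadata validation and/or authorization for physical tables is performed by the query engine invoking the compute engine of the institutional domain in which each physical table resides.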
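The division of responsibilities just described can be sketched as follows. The catalog layout, the `allowed_users` field, the domain labels, and the per-domain checker callables are hypothetical; the original does not specify how validation or authorization is implemented.

```python
from typing import Callable

# Hypothetical per-domain checker exposed by each domain's compute engine:
# (user, physical_table) -> allowed?
DomainChecker = Callable[[str, str], bool]


def check_query(user: str, logical_table: str, requested_fields: list[str],
                logical_catalog: dict, domain_checkers: dict[str, DomainChecker]) -> bool:
    """Validate and authorize a query against a logical table.

    Logical-table metadata is checked by the query engine itself; each physical
    source table is checked by the compute engine of the domain that owns it.
    """
    meta = logical_catalog.get(logical_table)
    if meta is None:                                          # unknown logical table
        return False
    if not set(requested_fields) <= set(meta["fields"]):      # metadata validation
        return False
    if user not in meta["allowed_users"]:                     # logical-table authorization
        return False
    # Delegate physical-table checks to the owning domain's compute engine.
    return all(domain_checkers[domain](user, table) for domain, table in meta["sources"])
```

Keeping only logical-table metadata in the query engine, and delegating physical-table checks, matches the statement above that the query engine does not store the physical tables' metadata.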
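In another example, the query engine may generate a cross-domain query plan based on the query statement provided by the data application.
Further, the query engine splits the query plan into at least one query sub-plan, where the query sub-plans correspond one-to-one to the compute engines of the data source tables. That is, the query engine splits the query plan according to the compute engine corresponding to each data source table.
After each compute engine executes its query sub-plan and obtains a query sub-result, the query engine may merge the query sub-results and provide the resulting data query result to the data application.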
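A minimal sketch of the split, execute, and merge flow, assuming the query plan has already been reduced to a list of leaf table scans, each tagged with the compute engine that owns its table. The engine names "warehouse" and "fusion", the row format, and the hash join on `id` used as the merge step are illustrative assumptions; the query engine's own part of the plan (FIG. 6B) may contain other operators.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Scan:
    table: str
    engine: str        # compute engine owning the table, e.g. "warehouse" or "fusion"


Row = dict
Engine = Callable[[list[str]], list[Row]]   # executes a sub-plan, returns a sub-result


def split_plan(scans: list[Scan]) -> dict[str, list[str]]:
    """One sub-plan per compute engine: group the scanned tables by owning engine."""
    sub_plans: dict[str, list[str]] = defaultdict(list)
    for scan in scans:
        sub_plans[scan.engine].append(scan.table)
    return dict(sub_plans)


def join_on_id(left: list[Row], right: list[Row]) -> list[Row]:
    """Merge step run by the query engine itself: a hash join on the id column."""
    index = {row["id"]: row for row in right}
    return [{**row, **index[row["id"]]} for row in left if row["id"] in index]


def run_query(scans: list[Scan], engines: dict[str, Engine]) -> list[Row]:
    """Dispatch each sub-plan to its engine, then merge the sub-results."""
    sub_results = {name: engines[name](tables) for name, tables in split_plan(scans).items()}
    return join_on_id(sub_results["warehouse"], sub_results["fusion"])
```

Grouping scans by engine gives the one-to-one correspondence between sub-plans and compute engines described above; a real engine would push filters and aggregations into each sub-plan rather than just table scans.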
For example, the query engine receives the following query statement from a data application:
Figure PCTCN2022135215-appb-000001
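The query engine then translates it into a query plan, as shown in FIG. 6A.
The data source tables of the logical table include the current institutional domain's data warehouse table and the data fusion system's federated table, whose compute engines are the data warehouse compute engine and the data fusion system compute engine, respectively. The query engine splits the query plan into three parts: a first part executed by the query engine, a second part executed by the data warehouse compute engine, and a third part executed by the data fusion system compute engine, as shown in FIG. 6B.
The query sub-plans of the second and third parts are executed by the data warehouse compute engine and the data fusion system compute engine, respectively, to obtain the corresponding query sub-results; the query engine merges the query sub-results to obtain the final data query result.
In the above embodiment, the query engine enables unified querying across the cross-domain data fusion system and the local institutional-domain data warehouse system.
In some optional embodiments, FIG. 7 is a flowchart of yet another method for generating a data table based on the embodiment shown in FIG. 4. Referring to FIG. 7, the method may be executed by a server, which may be a server in the current institutional domain that provides data services such as data querying, data storage, and data updating. The method further includes step 204.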
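In step 204, when the logical table is materialized, a materialized table for holding the physical data of the logical table is generated.
In one example, the materialized table may be generated by a materialization engine.
In one example, materializing a logical table differs from a materialized view in a database mainly in the following respects.
First, a materialized view in a database always has data storage, whereas a logical table does not necessarily: it has data storage only when materialization is enabled, and none otherwise.
Second, a materialized view stores its own data, whereas when a logical table is materialized, its materialized data is held by separate physical tables; the logical table and its corresponding physical tables together form the materialized logical table.
Third, a materialized view is a single data object, whereas a logical table with materialization enabled can correspond to multiple physical tables.
Fourth, a materialized view is exposed directly to users, whereas the physical tables produced by a materialized logical table are not; users continue to use the logical table directly rather than the physical tables it produces.
In addition, for data materialized into the data fusion system, some compliance requirements call for parts of the data to be materialized and stored separately in each institutional domain.
A table that holds the materialized data of a logical table is called a materialized table. Materialized tables are stored in the respective underlying systems. The correspondence between a logical table and its materialized tables is shown in FIG. 8, where materialized tables 1, 2, and 3 are stored in domain A, in the data warehouse of the current institutional domain (domain B), and in domain C, respectively.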
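The differences listed above between a materialized logical table and a database materialized view can be summarized in a small data model. This is a sketch under stated assumptions: the class and method names are hypothetical, and the three locations in the usage mirror the FIG. 8 example. Users always address the logical table; whether it is backed by separate materialized tables is an internal detail.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MaterializedTable:
    name: str
    location: str          # underlying system holding the data, e.g. "domain A"


@dataclass
class LogicalTable:
    name: str
    sources: list[str]
    materialized: list[MaterializedTable] = field(default_factory=list)

    def enable_materialization(self, tables: list[MaterializedTable]) -> None:
        """Attach one or more separate physical tables; the logical table itself stores nothing."""
        self.materialized = list(tables)

    def backing_tables(self) -> list[str]:
        """What a query against this logical table is served from (never exposed to users)."""
        if self.materialized:
            return [m.name for m in self.materialized]
        return list(self.sources)   # not materialized: no storage, read the sources virtually
```

For example, `trade_indicator` is served from its two source tables until materialization is enabled, after which it may be backed by three materialized tables spread across domain A, the domain B warehouse, and domain C.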
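First, among the fields included in the data source table, the specified fields required to generate the logical table may be determined. How the specified fields are determined has been described in the above embodiments and is not repeated here.
Second, the materialization logic relationship between the output fields of the materialized table and the specified fields needs to be determined.
Further, based on the specified fields and the materialization logic relationship, the physical data corresponding to the output fields of the materialized table is determined.
Further, the materialized table may be generated based on at least one of the materialized table name, the output fields of the materialized table, and the physical data.
The process of generating a materialized table is similar to that of generating a logical table, except that generating a materialized table additionally requires determining the physical data corresponding to its output fields.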
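The materialization steps can be sketched in plain Python, assuming the joined rows of the data source tables are already available in memory. The function name `materialize`, its parameters, and the row data are hypothetical. The usage groups by buyer_id, seller_id, and an assumed `dt` column to mimic a daily table; it is not a reproduction of the materialization logic shown in the original figures.

```python
from typing import Callable

Row = dict
Aggregate = Callable[[list[Row]], object]


def materialize(name: str, rows: list[Row], group_keys: list[str],
                logic: dict[str, Aggregate]) -> dict:
    """Build a materialized table: output fields = group keys + aggregated fields.

    group_keys: output fields copied unchanged from specified fields.
    logic:      materialization logic, output field -> aggregate over a group's rows.
    """
    groups: dict[tuple, list[Row]] = {}
    for row in rows:
        groups.setdefault(tuple(row[k] for k in group_keys), []).append(row)
    data = []
    for key, members in groups.items():
        out = dict(zip(group_keys, key))
        for output_field, aggregate in logic.items():
            out[output_field] = aggregate(members)   # physical data for this output field
        data.append(out)
    return {"name": name, "fields": [*group_keys, *logic], "rows": data}
```

Unlike the logical table, the result carries concrete rows: this is the "physical data corresponding to the output fields" that distinguishes a materialized table from the metadata-only logical table.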
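For example, for the logical table trade_indicator above, when the logical table is materialized (that is, when materialization is enabled), two materialized tables are generated automatically: trade_1d and trade_7d.
The output fields of the materialized table trade_1d include buyer_id, seller_id, quantity_max_1d, and dt; its materialization logic is as follows: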
Figure PCTCN2022135215-appb-000002
The output fields of the materialized table trade_7d are buyer_id, seller_id, amount_cloud_7d, and dt; its materialization logic is as follows:
Figure PCTCN2022135215-appb-000003
Figure PCTCN2022135215-appb-000004
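The above is merely illustrative; those skilled in the art will understand that any solution for generating materialized tables from logical tables falls within the protection scope of the present disclosure.
In the above embodiment, logical tables can be materialized automatically, which improves data query efficiency and gives high usability.
In some optional embodiments, the present disclosure provides a data virtualization system that can be deployed on top of the local institutional-domain data warehouse system and the cross-domain data fusion system. Its structure is shown in FIG. 9: it includes logical tables and a query engine, and may further include a materialization engine.
The logical tables are generated in a manner similar to the embodiment shown in FIG. 2, which is not repeated here.
The structure of the query engine is similar to that in FIG. 5, and the operations it performs are similar to those shown in FIG. 4; these are likewise not repeated here.
The manner in which the materialization engine generates materialized tables has been described in the above embodiments and is not repeated here.
In the above embodiments, the logical table as a data virtualization object makes the data objects on top of the local-domain data warehouse and cross-domain data fusion systems compatible and unified. In addition, unified querying across the two systems is achieved, and logical tables are materialized automatically, which improves data query efficiency and usability.
FIG. 10 is a schematic structural diagram of an electronic device according to an exemplary embodiment; the electronic device may be a data server, and the present disclosure does not limit this. Referring to FIG. 10, at the hardware level, the device includes a processor 1002, an internal bus 1004, a network interface 1006, a memory 1008, and a non-volatile storage 1010, and may of course include other hardware required by other services. One or more embodiments of this specification may be implemented in software, for example by the processor 1002 reading the corresponding computer program from the non-volatile storage 1010 into the memory 1008 and then running it. Besides software implementations, one or more embodiments of this specification do not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution body of the following processing flow is not limited to logical units and may also be hardware or logic devices.
Referring to FIG. 11, the apparatus for generating a data table may be applied to the device shown in FIG. 10 to implement the technical solution of this specification. The apparatus may include: a processing module 1101, configured to determine a data source table, where the data source table is at least one of a data warehouse table of the current institutional domain, a cross-domain federated data table, and a cross-domain raw data table; and a generation module 1102, configured to generate, based on the fields included in the data source table, a logical table for performing data virtualization, where the logical table is used to provide data query results to data applications.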
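The systems, apparatuses, modules, or units described in the above embodiments may be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or any combination of these devices.
In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-persistent memory among computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage media or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, product, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, product, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, product, or device that includes the element.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
The terms used in one or more embodiments of this specification are only for describing particular embodiments and are not intended to limit one or more embodiments of this specification. The singular forms "a", "the", and "said" used in one or more embodiments of this specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, and so on may be used in one or more embodiments of this specification to describe various kinds of information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
The above are merely preferred embodiments of one or more embodiments of this specification and are not intended to limit them. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of one or more embodiments of this specification shall fall within their scope of protection.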

Claims (14)

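  1. A method for generating a data table, comprising:
    determining a data source table, wherein the data source table is at least one of a data warehouse table of a current institutional domain, a cross-domain federated data table, and a cross-domain raw data table; and
    generating, based on fields included in the data source table, a logical table for performing data virtualization, wherein the logical table is used to provide data query results to data applications.
  2. The method according to claim 1, wherein generating, based on the fields included in the data source table, the logical table for performing data virtualization comprises:
    determining a logical table name;
    determining, among the fields included in the data source table, specified fields required to generate the logical table;
    determining a computation logic relationship between output fields of the logical table and the specified fields; and
    generating the logical table based on at least one of the logical table name, the data source table, the specified fields, and the computation logic relationship.
  3. The method according to claim 1, further comprising:
    providing data applications with a query engine for querying the logical table.
  4. The method according to claim 3, wherein the query engine is configured to store metadata of the logical table.
  5. The method according to claim 4, further comprising:
    validating and/or authorizing, by the query engine, the metadata of the logical table.
  6. The method according to claim 4, further comprising:
    generating, by the query engine, a cross-domain query plan;
    splitting, by the query engine, the query plan into at least one query sub-plan, wherein the query sub-plans correspond one-to-one to compute engines of the data source table;
    obtaining, by the query engine, the query sub-result produced by each compute engine executing its corresponding query sub-plan; and
    merging, by the query engine, the query sub-results, and providing the merged data query result to the data application.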
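  7. The method according to any one of claims 1 to 6, further comprising:
    when the logical table is materialized, generating a materialized table for holding physical data of the logical table.
  8. The method according to claim 7, wherein generating the materialized table for holding the physical data of the logical table comprises:
    determining, among the fields included in the data source table, specified fields required to generate the logical table;
    determining a materialization logic relationship between output fields of the materialized table and the specified fields;
    determining, based on the specified fields and the materialization logic relationship, physical data corresponding to the output fields of the materialized table; and
    generating the materialized table based on at least one of a materialized table name, the output fields of the materialized table, and the physical data.
  9. The method according to claim 7, wherein generating the materialized table for holding the physical data of the logical table comprises:
    generating the materialized table by a materialization engine.
  10. An apparatus for generating a data table, comprising:
    a processing module, configured to determine a data source table, wherein the data source table is at least one of a data warehouse table of a current institutional domain, a cross-domain federated data table, and a cross-domain raw data table; and
    a generation module, configured to generate, based on fields included in the data source table, a logical table for performing data virtualization, wherein the logical table is used to provide data query results to data applications.
  11. A data virtualization system, comprising:
    a logical table for performing data virtualization, wherein the logical table is generated based on fields included in a data source table and is used to provide data query results to data applications, and the data source table is at least one of a data warehouse table of a current institutional domain, a cross-domain federated data table, and a cross-domain raw data table; and
    a query engine, configured to provide data applications with the data query results of querying the logical table.
  12. The system according to claim 11, further comprising:
    a materialization engine, configured to generate, when the logical table is materialized, a materialized table holding physical data of the logical table.
  13. An electronic device, comprising:
    a processor; and
    a memory for storing processor-executable instructions;
    wherein the processor runs the executable instructions to implement the method for generating a data table according to any one of claims 1 to 9.
  14. A computer-readable storage medium storing computer instructions, wherein the instructions, when executed by a processor, implement the steps of the method for generating a data table according to any one of claims 1 to 9.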
PCT/CN2022/135215 2022-10-31 2022-11-30 Method and apparatus for generating a data table WO2024092926A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211351865.3 2022-10-31
CN202211351865.3A CN115935926A (zh) 2022-10-31 2022-10-31 Method and apparatus for generating a data table

Publications (1)

Publication Number Publication Date
WO2024092926A1 true WO2024092926A1 (zh) 2024-05-10

Family

ID=86647906

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/135215 WO2024092926A1 (zh) 2022-10-31 2022-11-30 Method and apparatus for generating a data table

Country Status (2)

Country Link
CN (1) CN115935926A (zh)
WO (1) WO2024092926A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
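CN1790322A (zh) * 2004-12-17 2006-06-21 International Business Machines Corporation Method and system for creating a logical table from multiple physical tables with different formats
US20120124081A1 (en) * 2010-11-17 2012-05-17 Verizon Patent And Licensing Inc. Method and system for providing data migration
CN112905595A (zh) * 2021-03-05 2021-06-04 Tencent Technology (Shenzhen) Co., Ltd. Data query method and apparatus, and computer-readable storage medium
CN112966004A (zh) * 2021-03-04 2021-06-15 Beijing Baidu Netcom Science and Technology Co., Ltd. Data query method and apparatus, electronic device, and computer-readable medium
CN114647716A (zh) * 2022-05-13 2022-06-21 Tianjin Nanda General Data Technology Co., Ltd. Generalized data warehouse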

Also Published As

Publication number Publication date
CN115935926A (zh) 2023-04-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22964222

Country of ref document: EP

Kind code of ref document: A1