CN115023921A - System and method for global data sharing

Info

Publication number: CN115023921A
Application number: CN202180011492.2A
Authority: CN
Inventors: 朱培基; 本诺特·戴奇维勒; 马修·格利克曼; 克里斯蒂安·克雷纳尔曼; 普拉桑纳·克里希南; 贾斯汀·朗塞斯
Original assignee: Snowflake Computing Inc
Current assignee: Snowflake Inc
Priority date: 2020-01-28
Filing date: 2021-01-20
Publication date: 2022-09-06
Anticipated expiration: 2041-01-20
Also published as: US20210250400A1; KR20220130728A; US20210344747A1; US11323506B2; US20230007074A1; EP4097955A1; US11030343B1; US11805167B2; US11743324B2; US11463508B1; US10999355B1; US11082483B1; CN115023921B; US20220239728A1; US20230362235A1; US11418577B1; US20210320968A1; US20240129360A1; WO2021154567A1; EP4097955A4

Abstract

Sharing data in a data exchange across multiple cloud computing platforms and/or regions of a cloud computing platform is described. An example computer-implemented method can include receiving data sharing information from a data provider for sharing a data set in a data exchange from a first cloud computing entity to a set of second cloud computing entities. In response to receiving the data sharing information, the method may also include creating an account with each of the set of second cloud computing entities. The method may further include sharing the data set from the first cloud computing entity with the set of second cloud computing entities using at least the corresponding account of each second cloud computing entity.

Description

System and method for global data sharing

Cross-Reference to Related Applications

This application claims the benefit under 35 U.S.C. §119(e) of US Patent Application Serial No. 16/814,875, filed March 10, 2020, which claims the benefit of US Provisional Application Serial No. 62/966,977, filed January 28, 2020, the disclosures of which are hereby incorporated by reference in their entirety.

Technical Field

The present disclosure relates to resource management systems and methods for managing data storage and computing resources.

Background

Databases are widely used for data storage and access in computing applications. A database may include one or more tables that contain or reference data that can be read, modified, or deleted using queries. Databases may be used to store and/or access personal information or other sensitive information. Secure storage of and access to database data may be provided by encrypting the data and/or storing it in encrypted form to prevent unauthorized access. In some cases, data sharing may be desirable to let other parties perform queries against a set of data.

Brief Description of Drawings

The described embodiments and their advantages may best be understood by reference to the following description taken in conjunction with the accompanying drawings. These drawings in no way limit any changes in form and detail that may be made to the described embodiments by one skilled in the art without departing from the spirit and scope of the described embodiments.

Figure 1A is a block diagram depicting an example computing environment in which the methods disclosed herein may be implemented.

Figure 1B is a block diagram illustrating an example virtual warehouse.

Figure 2 is a schematic block diagram of data that may be used to implement a public or private data exchange according to an embodiment of the present invention.

Figure 3 is a schematic block diagram of components for implementing a data exchange according to an embodiment of the present invention.

Figure 4A is a process flow diagram of a method for controlled sharing of data between entities in a data exchange according to an embodiment of the present invention.

Figure 4B is a diagram illustrating data used for implementing private sharing of data according to an embodiment of the present invention.

Figure 4C is a diagram illustrating a secure view for implementing private sharing of data according to an embodiment of the present invention.

Figure 5 is a process flow diagram of a method for publicly sharing data between entities in a data exchange according to an embodiment of the present invention.

Figure 6 is a process flow diagram of a method for performing bidirectional sharing in a data exchange according to an embodiment of the present invention.

Figure 7 is a process flow diagram of a method for providing enriched data in a data exchange according to an embodiment of the present invention.

Figure 8 is a block diagram illustrating a network environment in which a data provider may share data via a cloud computing service.

Figure 9 is an example private data exchange according to an embodiment of the present invention.

Figure 10 is a diagram illustrating an example secure view of shared data from a private data exchange.

Figure 11 is a diagram illustrating an example tunneling of a data listing between two private data exchanges.

Figure 12 is a diagram illustrating an example data query and delivery service according to some embodiments of the present invention.

Figure 13A is a block diagram of an example system of multiple cloud computing services sharing data with a data exchange.

Figure 13B is a block diagram of an example system of a cloud computing service sharing data with a data exchange across multiple regions of the cloud computing service.

Figure 14 is a process flow diagram of a method for sharing data with a cloud computing service across multiple cloud computing services and/or across multiple regions.

Figure 15 is a process flow diagram of a method for creating a listing in a data exchange, where the listing is available in different cloud computing services and/or in multiple regions of a cloud computing service.

Figure 16 is a process flow diagram of a method for creating a listing for personalized sharing in a data exchange, where the listing is available in different cloud computing services and/or in multiple regions of a cloud computing service.

Figure 17 is a process flow diagram of a method for sharing data with a virtual private cloud (VPC).

Figure 18 is a block diagram of an example computing device that may perform one or more of the operations described herein, according to some embodiments.

Detailed Description

Data providers often have data assets that are cumbersome to share. A data asset may be data that is of interest to another entity. For example, a large online retail company may have a data set that includes the purchasing habits of millions of customers over the last ten years. This data set may be large. If the online retailer wishes to share all or a portion of this data with another entity, the online retailer may need to use old and slow methods to transfer the data, such as File Transfer Protocol (FTP), or even copy the data onto physical media and then mail the physical media to the other entity. This has several disadvantages. First, it is slow. Copying terabytes or petabytes of data can take days. Second, once the data is delivered, the sharer cannot control what happens to it. The recipient can alter the data, make copies, or share it with other parties. Third, the only entities interested in accessing such a large data set in this manner are large corporations that can afford the complex logistics of transferring and processing the data, as well as the high price of such a cumbersome data transfer. Thus, smaller entities (e.g., "mom and pop" shops) or even smaller, more nimble, cloud-focused startups are often priced out of accessing this data, even though the data may be valuable to their businesses. This may be because raw data assets are generally too unpolished and too full of potentially sensitive data to be sold directly to other companies. The data owner must first perform data cleansing, de-identification, aggregation, joining, and other forms of data enrichment before the data can be shared with another party. This is time-consuming and expensive. Finally, for the reasons above, traditional data sharing methods do not allow scalable sharing, so it is difficult to share data assets with many entities. Traditional sharing methods also introduce latency and delays for all parties accessing the most recently updated data.

Private data exchanges may allow data providers to share their data assets with other entities more easily and securely. A private data exchange may carry the data provider's own brand, and the data provider may control who can gain access to it. The private data exchange may be for internal use only, or may also be opened to customers, partners, suppliers, or others. The data provider may control which data assets are listed and who has access to which data sets. This allows for a seamless way to discover and share data both within the data provider's organization and with its business partners.

A private data exchange may be facilitated by a cloud computing service such as SNOWFLAKE, and allows data providers to offer data assets directly from their own online domain (e.g., website) in a private online marketplace under their own branding. The private data exchange may provide a centralized, managed hub for an entity to list internally or externally shared data assets, inspire data collaboration, and maintain data governance and audit access. With the private data exchange, data providers may be able to share data without copying it between companies. Data providers may invite other entities to view their data listings, control which data listings appear in their private online marketplace, control who can access data listings, and control how others can interact with the data assets connected to the listings. This may be thought of as a "walled garden" marketplace, in which visitors to the garden must be approved and access to certain listings may be limited.

As an example, Company A may be a consumer data company that has collected and analyzed the consumption habits of millions of individuals in several different categories. Its data sets may include data in the following categories: online shopping, video streaming, electricity consumption, automobile usage, internet usage, clothing purchases, mobile application purchases, club memberships, and online subscription services. Company A may desire to offer these data sets (or subsets or derived products of these data sets) to other entities. For example, a new clothing brand may wish to access data sets related to consumer clothing purchases and online shopping habits. Company A may support a page on its website that is, or functions substantially like, a private data exchange, where data consumers (e.g., the new clothing brand) may browse, explore, discover, access, and potentially purchase data sets directly from Company A. Further, Company A may control who can enter the private data exchange, which entities may view a particular listing, the actions that an entity may take with respect to a listing (e.g., view only), and any other suitable action. In addition, a data provider may combine its own data with other data sets from, for example, a public data exchange, and create new listings using the combined data.

A private data exchange may be an appropriate place to discover, assemble, clean, and enrich data to make it more monetizable. A large company on a private data exchange may aggregate data from across its divisions and departments, which could be valuable to another company. In addition, participants in a private ecosystem data exchange may work together to join their data sets and jointly create useful data products that none of them would be able to produce alone. Once these joined data sets are created, they may be listed on a public or private data exchange.

The systems and methods described herein provide a flexible and scalable data warehouse using a new data processing platform. In some embodiments, the described systems and methods leverage a cloud infrastructure that supports cloud-based storage resources, computing resources, and the like. Example cloud-based storage resources offer significant storage capacity available on demand at a low cost. Further, these cloud-based storage resources may be fault-tolerant and highly scalable, which can be costly to achieve in private data storage systems. Example cloud-based computing resources are available on demand and may be priced based on actual usage levels of the resources. Typically, the cloud infrastructure is dynamically deployed, reconfigured, and decommissioned in a rapid manner.

In the described systems and methods, a data storage system utilizes an SQL (Structured Query Language)-based relational database. However, these systems and methods are applicable to any type of database, and to any type of data storage and retrieval platform, using any data storage architecture and any language to store and retrieve data within the platform. The systems and methods described herein further provide a multi-tenant system that supports isolation of computing resources and data between different customers/clients and between different users within the same customer/client.

Figure 1A is a block diagram of an example computing environment 100 in which the systems and methods disclosed herein may be implemented. In particular, a cloud computing platform 110 may be implemented, such as AMAZON WEB SERVICES™ (AWS), MICROSOFT AZURE™, GOOGLE CLOUD™, or the like. As known in the art, a cloud computing platform 110 provides computing resources and storage resources that may be acquired (purchased) or leased and configured to execute applications and store data.

The cloud computing platform 110 may host a cloud computing service 112 that facilitates storage of data on the cloud computing platform 110 (e.g., data management and access) and analysis functions (e.g., SQL queries, analysis), as well as other computation capabilities (e.g., secure data sharing between users of the cloud computing platform 110). The cloud computing platform 110 may include a three-tier architecture: data storage 140, query processing 130, and cloud services 120.

Data storage 140 may facilitate the storing of data on the cloud computing platform 110 in one or more cloud databases 141. Data storage 140 may use a storage service such as AMAZON S3 to store data and query results on the cloud computing platform 110. In particular embodiments, to load data into the cloud computing platform 110, data tables may be horizontally partitioned into large, immutable files, which may be analogous to blocks or pages in a traditional database system. Within each file, the values of each attribute or column are grouped together and compressed using a scheme sometimes referred to as hybrid columnar. Each table has a header which, among other metadata, contains the offsets of each column within the file.
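The layout described above can be illustrated with a minimal Python sketch. It is not the platform's actual file format: the function names, the JSON encoding, the use of `zlib`, and the 4-byte length prefix are all assumptions chosen for brevity. It shows a table split horizontally into row batches, each batch written as one blob in which column values are grouped and compressed, with a header recording per-column offsets.

```python
import json
import zlib


def write_columnar_file(rows, columns):
    """Pack a batch of rows into one column-grouped, compressed byte blob."""
    body = b""
    offsets = {}
    for name in columns:
        # Group all values of one column together, then compress them.
        values = [row[name] for row in rows]
        chunk = zlib.compress(json.dumps(values).encode("utf-8"))
        offsets[name] = (len(body), len(chunk))  # (offset, length) within body
        body += chunk
    header = json.dumps({"rows": len(rows), "offsets": offsets}).encode("utf-8")
    # Layout: 4-byte header length, header, then the compressed column chunks.
    return len(header).to_bytes(4, "big") + header + body


def read_column(blob, name):
    """Read a single column by seeking to its offset recorded in the header."""
    header_len = int.from_bytes(blob[:4], "big")
    header = json.loads(blob[4:4 + header_len])
    offset, length = header["offsets"][name]
    start = 4 + header_len + offset
    return json.loads(zlib.decompress(blob[start:start + length]))


def partition_table(rows, columns, rows_per_file):
    """Horizontally split a table into fixed-size row batches, one file each."""
    return [write_columnar_file(rows[i:i + rows_per_file], columns)
            for i in range(0, len(rows), rows_per_file)]
```

Because the header stores each column's offset, a reader in this sketch can fetch one column without decompressing the others, which is the usual motivation for a column-grouped layout.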

In addition to storing table data, data storage 140 facilitates the storage of temporary data generated by query operations (e.g., joins), as well as the data contained in large query results. This may allow the system to compute large queries without out-of-memory or out-of-disk errors. Storing query results this way may simplify query processing, as it removes the need for server-side cursors found in traditional database systems.

Query processing 130 may handle query execution within elastic clusters of virtual machines, referred to herein as virtual warehouses or data warehouses. Thus, query processing 130 may include one or more virtual warehouses 131, which may also be referred to herein as data warehouses. A virtual warehouse 131 may be one or more virtual machines operating on the cloud computing platform 110. Virtual warehouses 131 may be compute resources that can be created, destroyed, or resized at any point, on demand. This functionality may create an "elastic" virtual warehouse that expands, contracts, or shuts down according to the user's needs. Expanding a virtual warehouse involves adding one or more compute nodes 132 to the virtual warehouse 131. Contracting a virtual warehouse involves removing one or more compute nodes 132 from the virtual warehouse 131. More compute nodes 132 may lead to faster compute times. For example, a data load that takes fifteen hours on a system with four nodes might take only two hours on a system with thirty-two nodes.
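The expand, contract, and shut-down behavior can be sketched with a toy model. The class and method names below are hypothetical and do not correspond to any real service API; the runtime estimate assumes perfectly linear scaling, which real workloads only approximate.

```python
class VirtualWarehouse:
    """Toy model of an elastic cluster of compute nodes (hypothetical API)."""

    def __init__(self, name, node_count=1):
        self.name = name
        self._next_id = 0
        self.nodes = []
        self.scale_out(node_count)

    def scale_out(self, count):
        # Expanding adds compute nodes to the warehouse.
        for _ in range(count):
            self.nodes.append(f"{self.name}-node-{self._next_id}")
            self._next_id += 1

    def scale_in(self, count):
        # Contracting removes compute nodes, never dropping below zero.
        keep = max(len(self.nodes) - count, 0)
        del self.nodes[keep:]

    def shut_down(self):
        # Shutting down releases every node; the warehouse can be scaled out again.
        self.nodes.clear()


def ideal_runtime_hours(base_hours, base_nodes, new_nodes):
    """Estimated runtime under perfectly linear scaling (an idealization)."""
    return base_hours * base_nodes / new_nodes
```

Under the linear assumption, `ideal_runtime_hours(15, 4, 32)` gives 1.875 hours, which is consistent with the roughly two hours cited in the example above.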

Cloud services 120 may be a collection of services that coordinate activities across the cloud computing service 112. These services tie together all of the different components of the cloud computing service 112 in order to process user requests, from login to query dispatch. Cloud services 120 may operate on compute instances provisioned by the cloud computing service 112 from the cloud computing platform 110. Cloud services 120 may include a collection of services that manage virtual warehouses, queries, transactions, data exchanges, and the metadata associated with such services, such as database schemas, access control information, encryption keys, and usage statistics. Cloud services 120 may include, but are not limited to, an authentication engine 121, an infrastructure manager 122, an optimizer 123, an exchange manager 124, a security engine 125, and a metadata storage 126.

Figure 1B is a block diagram illustrating an example virtual warehouse 131. The exchange manager 124 may facilitate the sharing of data between data providers and data consumers using, for example, a private data exchange. For example, the cloud computing service 112 may manage the storage of and access to a database 108. The database 108 may include various instances of user data 150 for different users (e.g., different enterprises or individuals). The user data may include a user database 152 of data stored and accessed by that user. The user database 152 may be subject to access controls such that, upon authenticating with the cloud computing service 112, only the owner of the data is allowed to change and access the user database 152. For example, data may be encrypted such that it can only be decrypted using decryption information possessed by the owner of the data. Using the exchange manager 124, specific data from a user database 152 that is subject to these access controls may be shared with other users in a controlled manner according to the methods disclosed herein. In particular, a user may specify shares 154 that may be shared in a public or private data exchange in an uncontrolled manner, or shared with specific other users in a controlled manner, as described above. A "share" encapsulates all of the information required to share data in a database. A share may include at least three pieces of information: (1) privileges that grant access to the database and the schema containing the objects to share, (2) privileges that grant access to the specific objects (e.g., tables, secure views, and secure UDFs), and (3) the consumer accounts with which the database and its objects are shared. When data is shared, no data is copied or transferred between users. Sharing is accomplished through the cloud services 120 of the cloud computing service 112.
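The three pieces of information a share is described as carrying can be modeled with a small data structure. This is a sketch under assumptions: the `Share` class, its field names, and the `can_read` check are hypothetical and are not taken from the source or from any real API. Note that the structure holds only grants and account identifiers, never the data itself, mirroring the statement that sharing does not copy or transfer data.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Share:
    """Hypothetical model of a share: grants plus consumer accounts, no data."""
    name: str
    database: str
    schemas: Set[str] = field(default_factory=set)            # (1) database/schema access
    objects: Set[str] = field(default_factory=set)            # (2) specific objects
    consumer_accounts: Set[str] = field(default_factory=set)  # (3) who it is shared with

    def grant_object(self, schema, obj):
        # Granting an object implies granting usage on its containing schema.
        self.schemas.add(schema)
        self.objects.add(f"{schema}.{obj}")

    def add_consumer(self, account):
        self.consumer_accounts.add(account)

    def can_read(self, account, schema, obj):
        # All three pieces must line up for a read to be permitted.
        return (account in self.consumer_accounts
                and schema in self.schemas
                and f"{schema}.{obj}" in self.objects)
```

A provider-side flow in this sketch would create the share, call `grant_object` for each table or secure view to expose, and then call `add_consumer` for each recipient account.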

Sharing data may be performed when a data provider creates a share of a database in the data provider's account and grants access to particular objects (e.g., tables, secure views, and secure user-defined functions (UDFs)). A read-only database may then be created using the information provided in the share. Access to this database may be controlled by the data provider.

Shared data may then be used to process SQL queries, possibly including joins, aggregations, or other analysis. In some instances, a data provider may define a share such that "secure joins" are permitted to be performed with respect to the shared data. A secure join may be performed such that analysis can be carried out with respect to the shared data, but the actual shared data is not accessible to the data consumer (e.g., the recipient of the share). A secure join may be performed as described in US Application Serial No. 16/368,339, filed March 18, 2019.

User devices 101-104, such as laptop computers, desktop computers, mobile phones, tablet computers, cloud-hosted computers, cloud-hosted serverless processes, or other computing processes or devices, may be used to access the virtual warehouse 131 or cloud services 120 by way of a network 105, such as the Internet or a private network.

In the description below, actions are ascribed to users, particularly consumers and providers. Such actions shall be understood to be performed with respect to the devices 101-104 operated by such users. For example, a notification to a user may be understood to be a notification transmitted to a device 101-104, an input or instruction from a user may be understood to be received by way of the user's device 101-104, and interaction with an interface by a user shall be understood to be interaction with the interface on the user's device 101-104. In addition, database operations (joins, aggregations, analysis, etc.) ascribed to a user (consumer or provider) shall be understood to include the cloud computing service 112 performing such actions in response to an instruction from that user.

Figure 2 is a schematic block diagram of data that may be used to implement a public or private data exchange according to an embodiment of the present invention. The exchange manager 124 may operate with respect to some or all of the illustrated exchange data 200, which may be stored on the platform executing the exchange manager 124 (e.g., the cloud computing platform 110) or at some other location. The exchange data 200 may include a plurality of listings 202 describing data that is shared by a first user ("the provider"). The listings 202 may be listings in a private data exchange or in a public data exchange. The access controls, management, and governance of the listings may be similar for both public and private data exchanges. A listing 202 may include metadata 204 describing the shared data. The metadata 204 may include some or all of the following information: an identifier of the sharer of the shared data, a URL associated with the sharer, a name of the share, names of tables, a category to which the shared data belongs, an update frequency of the shared data, a catalog of the tables, the number of columns and the number of rows in each table, and the names and descriptions of the columns. The metadata 204 may also include examples to aid a user in using the data. Such examples may include sample tables or views that include a sample of rows and columns of an example table, example queries that may be run against the table (possibly with their results), example views of an example table, and example visualizations (e.g., graphs, dashboards) based on a table's data. Other information included in the metadata 204 may be metadata for use by business intelligence tools, a textual description of the data contained in the table, keywords associated with the table to facilitate searching, Bloom filters or other full-text indices of the data in certain columns, a link (e.g., URL) to documentation related to the shared data, a refresh interval indicating how frequently the shared data is updated (or an indication that the shared data is updated continuously), and the date the data was last updated.

The listing 202 may include access controls 206, which may be configured in any suitable access configuration. For example, the access controls 206 may indicate that the shared data is available to any member of the private exchange without restriction (referred to elsewhere herein as an "any share"). The access controls 206 may specify a class of users (members of a particular group or organization) that are allowed to access the data and/or see the listing. The access controls 206 may specify a "point-to-point" share (see discussion of Figure 4), in which users may request access but are only allowed access upon approval of the provider. The access controls 206 may specify a set of user identifiers of users that are excluded from being able to access the data referenced by the listing 202.

Note that some listings 202 may be discoverable by users without further authentication or access permissions, whereas actual access is only permitted after a subsequent authentication step (see discussion of Figures 4 and 6). The access controls 206 may specify that a listing 202 is only discoverable by specific users or classes of users.

Note also that a default behavior for listings 202 is that the data referenced by the share cannot be exported or copied by the consumer. Alternatively, the access controls 206 may specify that this is not permitted. For example, the access controls 206 may specify that secure operations (such as secure joins and secure functions, discussed below) may be performed with respect to the shared data such that viewing and exporting of the shared data is not permitted.

In some embodiments, once a user is authenticated with respect to a listing 202, a reference to that user (e.g., the user identifier of the user's account in the virtual warehouse 131) is added to the access controls 206 such that the user will subsequently be able to access the data referenced by the listing 202 without further authentication.
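The distinction between discovering a listing and accessing its data, together with the access configurations described above, can be sketched as follows. The `AccessControls` class, its fields, and the three functions are hypothetical names introduced for illustration. The rule that an exclusion overrides a prior approval is an assumption of this sketch, not something the source states.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class AccessControls:
    open_to_all: bool = False                                # unrestricted "any share"
    allowed_groups: Set[str] = field(default_factory=set)    # classes of users
    approved_users: Set[str] = field(default_factory=set)    # added after approval
    excluded_users: Set[str] = field(default_factory=set)    # explicitly excluded
    discoverable_by: Optional[Set[str]] = None               # None: anyone may discover


def can_discover(ac, user_id):
    """A listing may be visible even to users who cannot yet access its data."""
    return ac.discoverable_by is None or user_id in ac.discoverable_by


def can_access(ac, user_id, groups=()):
    # Assumption: an exclusion always wins over any other grant.
    if user_id in ac.excluded_users:
        return False
    if ac.open_to_all or ac.allowed_groups.intersection(groups):
        return True
    return user_id in ac.approved_users


def approve(ac, user_id):
    # Record the user once authenticated or approved by the provider, so
    # later access to this listing needs no further authentication.
    ac.approved_users.add(user_id)
```

In this sketch a point-to-point share is simply a listing with `open_to_all` left false and no allowed groups, so that access depends entirely on `approve` having been called for the requesting user.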

A listing 202 may define one or more filters 208. For example, the filters 208 may define specific user identifiers 214 of users that may view references to the listing 202 when browsing the catalog 220. The filters 208 may define a class of users (users of a certain profession, users associated with a particular company or organization, users within a particular geographical area or country) that may view references to the listing 202 when browsing the catalog 220. In this manner, a private exchange may be implemented by the exchange manager 124 using the same components. In some embodiments, an excluded user that is excluded from accessing a listing 202 (i.e., from adding the listing 202 to the consumed shares 156 of the excluded user) may still be permitted to view a representation of the listing when browsing the catalog 220, and may further be permitted to request access to the listing 202, as discussed below. Requests to access a listing by such excluded users and other users may be listed in an interface presented to the provider of the listing 202. The provider of the listing 202 may then view the demand for access to the listing and choose to expand the filters 208 to permit access to excluded users or classes of excluded users (e.g., users in excluded geographic regions or countries).

Filters 208 may further define what data a user may view. In particular, the filters 208 may indicate that a user who selects a listing 202 to add to the user's consumed shares 156 is permitted to access the data referenced by the listing, but only a filtered version that includes only data associated with that user's identifier 214, with that user's organization, or with some other classification of the user. In some embodiments, a private exchange is by invitation: a user invited by a provider to view the listings 202 of a private exchange is permitted to do so by the exchange manager 124 after communicating acceptance of the invitation received from the provider.
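The two roles of a filter described above (who may see a listing, and which rows a consumer may see) can be sketched as follows. This is only an illustrative sketch: the `Filter` class, the `filtered_rows` helper, the row layout, and the rule that a filter with no criteria means "public" are all assumptions made for the example, not structures taken from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class Filter:
    """Hypothetical stand-in for a filter 208 attached to a listing."""
    user_ids: set = field(default_factory=set)    # specific user identifiers
    categories: set = field(default_factory=set)  # e.g. industry, organization, region

    def can_view(self, user_id, user_categories):
        # Assumption: a filter with no criteria leaves the listing visible to all.
        if not self.user_ids and not self.categories:
            return True
        return user_id in self.user_ids or bool(self.categories & set(user_categories))

def filtered_rows(rows, user_id, org):
    """Keep only rows tied to the user or to the user's organization."""
    return [r for r in rows if r.get("user_id") == user_id or r.get("org") == org]

f = Filter(user_ids={"u1"}, categories={"region:US"})
rows = [
    {"user_id": "u1", "org": "acme", "value": 1},
    {"user_id": "u2", "org": "acme", "value": 2},
    {"user_id": "u3", "org": "other", "value": 3},
]
visible_values = [r["value"] for r in filtered_rows(rows, "u1", "acme")]
```

Here a user outside the named identifiers still sees the listing if one of the user's categories matches, and the consumer's view of the data keeps only the rows associated with that consumer or its organization.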

In some embodiments, a listing 202 may be addressed to a single user. Accordingly, a reference to the listing 202 may be added to a set of "pending shares" viewable by that user. The listing 202 may then be added to the user's set of shares once the user communicates approval to the exchange manager 124.

A listing 202 may further include usage data 210. For example, the cloud computing service 112 may implement a credit system in which credits are purchased by a user and consumed each time the user runs a query, stores data, or uses other services implemented by the cloud computing service 112. Accordingly, the usage data 210 may record the credits consumed by accessing the shared data. The usage data 210 may include other data, such as a number of queries, a number of aggregations of each of a plurality of types performed against the shared data, or other usage statistics. In some embodiments, the usage data for a listing 202 or for multiple listings 202 of a user is provided to the user in the form of a shared database, i.e., the exchange manager 124 adds a reference to a database containing the usage data to the user's consumed shares 156.

A listing 202 may also include a heat map 211, which may represent the geographic locations from which users have clicked on that particular listing. The cloud computing service 110 may use the heat map to make replication decisions or other decisions about the listing. For example, a private data exchange may display a listing that contains weather data for the state of Georgia, USA. The heat map 211 may indicate that many users in California are selecting the listing to learn about the weather in Georgia. In view of this information, the cloud computing service 110 may replicate the listing and make it available in a database whose servers are physically located in the western United States, so that consumers in California can access the data. In some embodiments, an entity may store its data on servers located in the western United States, and a particular listing may be very popular with consumers. The cloud computing service 110 may replicate that data and store it on servers located in the eastern United States, so that consumers in the Midwest and on the East Coast can also access the data.
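One simple way to turn such a heat map into a replication decision is to count clicks per region and replicate wherever demand passes a threshold. The sketch below assumes the heat map is available as a flat list of region names, one per click; the function name, the region labels, and the threshold rule are hypothetical and are not specified in the patent.

```python
from collections import Counter

def regions_to_replicate(click_regions, home_region, threshold):
    """Regions other than the home region with at least `threshold` clicks."""
    counts = Counter(click_regions)
    return sorted(
        region for region, n in counts.items()
        if region != home_region and n >= threshold
    )

# Assumed heat-map data: the listing is hosted in "us-east".
clicks = ["us-west"] * 5 + ["us-east"] * 7 + ["eu"] * 1
targets = regions_to_replicate(clicks, home_region="us-east", threshold=3)
```

With these inputs only the western region qualifies, which mirrors the Georgia weather example: heavy interest from California leads to a replica on servers in the western United States.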

A listing 202 may also include one or more tags 213. Tags 213 may make it simpler to share the data contained in one or more listings. For example, a large company may have a human resources (HR) listing on a private data exchange containing HR data for its internal employees. The HR data may contain ten types of HR data (e.g., employee number, selected health insurance, current retirement plan, job title, etc.). One hundred people in the company (e.g., everyone in the HR department) may have access to the HR listing. Management of the HR department may wish to add an eleventh type of HR data (e.g., an employee stock option plan). Instead of manually adding this to the HR listing and granting each of the one hundred people access to the new data, management may simply apply an HR tag to the new data set. That tag can then be used to categorize the data as HR data, list it along with the HR listing, and grant the one hundred people access to view the new data set.
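The HR example amounts to granting access per tag rather than per data set. A minimal sketch, assuming access grants are kept as a mapping from tag to user set (the names `tag_grants`, `datasets`, and `users_with_access` are invented for illustration):

```python
def users_with_access(dataset_tags, tag_grants):
    """Union of all users granted access through any of the data set's tags."""
    users = set()
    for tag in dataset_tags:
        users |= tag_grants.get(tag, set())
    return users

# One grant covers the whole HR department (100 hypothetical users).
tag_grants = {"HR": {f"hr_user_{i}" for i in range(100)}}
datasets = {"employee_number": {"HR"}, "retirement_plan": {"HR"}}

# Adding an eleventh data type only requires tagging it; no per-user grants.
datasets["stock_option_plan"] = {"HR"}
new_audience = users_with_access(datasets["stock_option_plan"], tag_grants)
```

The design choice is that access follows the tag, so a new data set inherits its audience the moment it is tagged.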

A listing 202 may also include version metadata 215. Version metadata 215 may provide a way to track how a data set has changed. This may help ensure that the data an entity is viewing does not change prematurely. For example, if a company has an original data set and then releases an updated version of that data set, the update could interfere with another user's processing of the data set, because the update may have a different format, new columns, and other changes that may be incompatible with the recipient user's current processing mechanism. To remedy this, the cloud computing service 112 may use the version metadata 215 to track version updates. The cloud computing service 112 may ensure that each data consumer keeps accessing the same version of the data until that consumer accepts an updated version that will not interfere with its current processing of the data set.
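Version pinning of this kind can be illustrated with a small sketch in which each consumer stays on the version it last accepted. The `VersionedListing` class and its methods are hypothetical, and a "version" is reduced to a list of column names for brevity.

```python
class VersionedListing:
    """Hypothetical listing that pins each consumer to an accepted version."""

    def __init__(self):
        self.versions = []   # published versions, oldest first
        self.accepted = {}   # consumer id -> index into self.versions

    def publish(self, schema):
        self.versions.append(schema)

    def subscribe(self, consumer):
        # A new consumer starts on the latest version available at that time.
        self.accepted[consumer] = len(self.versions) - 1

    def accept_latest(self, consumer):
        self.accepted[consumer] = len(self.versions) - 1

    def schema_for(self, consumer):
        return self.versions[self.accepted[consumer]]

listing = VersionedListing()
listing.publish(["city", "temp"])
listing.subscribe("c1")
listing.publish(["city", "temp", "humidity"])  # provider releases an update
before = listing.schema_for("c1")              # consumer still sees the old columns
listing.accept_latest("c1")
after = listing.schema_for("c1")
```

The consumer's pipeline keeps reading the two-column version until it explicitly accepts the update.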

The exchange data 200 may further include user records 212. A user record 212 may include data identifying the user associated with the user record 212, e.g., an identifier (e.g., a warehouse identifier) of a user who has user data 150 in the service database 128 and is managed by the virtual warehouse 131.

The user record 212 may list shares associated with the user, e.g., references to listings 202 created by the user. The user record 212 may list shares consumed by the user, e.g., references to listings 202 created by another user and associated with the user's account according to the methods described herein. For example, a listing 202 may have an identifier that is used to reference it in the shares or consumed shares of a user record 212.

The exchange data 200 may further include a catalog 220. The catalog 220 may include a list of all available listings 202 and may include an index of data from the metadata 204 to facilitate browsing and searching according to the methods described herein. In some embodiments, listings 202 are stored in the catalog in the form of JavaScript Object Notation (JSON) objects.
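A catalog that stores listings as JSON objects and indexes their metadata for search might look like the sketch below. The metadata fields (`title`, `keywords`), the in-memory dictionaries, and the simple keyword index are assumptions chosen for illustration. The patent does not prescribe a schema or an indexing scheme.

```python
import json
from collections import defaultdict

catalog = {}               # listing id -> listing stored as a JSON string
index = defaultdict(set)   # lower-cased keyword -> listing ids

def add_listing(listing_id, metadata):
    """Store the listing as JSON and index its title words and keywords."""
    catalog[listing_id] = json.dumps(metadata)
    words = metadata["title"].lower().split()
    words += [k.lower() for k in metadata["keywords"]]
    for word in words:
        index[word].add(listing_id)

def search(term):
    """Return the ids of listings whose indexed metadata contains `term`."""
    return sorted(index.get(term.lower(), set()))

add_listing("L1", {"title": "Georgia Weather", "keywords": ["climate"]})
add_listing("L2", {"title": "California Weather", "keywords": ["climate", "drought"]})
weather_hits = search("weather")
```

Searching the index avoids parsing every stored JSON object on each query. The full listing is decoded only when a hit is displayed.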

Note that where there are multiple instances of the virtual warehouse 131 on different cloud computing platforms, the catalog 220 of one instance of the virtual warehouse 131 may store listings, or references to listings, from other instances on one or more other cloud computing platforms 110. Accordingly, each listing 202 may be globally unique (e.g., be assigned a globally unique identifier across all instances of the virtual warehouse 131). For example, the instances of the virtual warehouse 131 may synchronize their copies of the catalog 220 such that each copy indicates the listings 202 available from all instances of the virtual warehouse 131. In some instances, the provider of a listing 202 may specify that it is to be available only on one or more specified computing platforms 110.

In some embodiments, the catalog 220 is made available on the Internet so that it is searchable by a search engine such as BING or GOOGLE. The catalog may be subjected to search engine optimization (SEO) algorithms to promote its visibility. Potential consumers may therefore browse the catalog 220 from any web browser. The exchange manager 124 may expose uniform resource locators (URLs) linked to each listing 202. The web page underlying each URL may be searchable and may be shared outside of any interface implemented by the exchange manager 124. For example, the provider of a listing 202 may publish the URLs for its listings 202 in order to promote usage of its listings 202 and its brand.

FIG. 3 illustrates various components 300-310 that may be included in the exchange manager 124. A creation module 300 may provide an interface for creating listings 202. For example, a web page interface may enable a user on one or more of the devices 101-104 to select data for sharing, e.g., a specific table in the user's user data 150, and to enter values defining some or all of the metadata 204, access controls 206, and filters 208. In some embodiments, creation may be performed by a user by way of SQL commands in an SQL interpreter executing on the cloud computing platform 110 and accessed by way of a web page interface on a user device 101-104.

A validation module 302 may validate the information provided by a provider when attempting to create a listing 202. Note that in some embodiments, the actions ascribed to the validation module 302 may be performed by a human reviewing the information provided by the provider. In other embodiments, these actions are performed automatically. The validation module 302 may perform, or facilitate performance by a human operator of, various functions. These functions may include verifying that the metadata 204 is consistent with the shared data it references, and verifying that the shared data referenced by the metadata 204 is not pirated data, personally identifiable information (PII), personal health information (PHI), or other data for which sharing is undesirable or illegal. The validation module 302 may also facilitate verifying that the data has been updated within a threshold period of time (e.g., within the last twenty-four hours). The validation module 302 may also facilitate verifying that the data is not static or not available from other static public sources. The validation module 302 may also facilitate verifying that the data is more than merely a sample (e.g., that the data is sufficiently complete to be useful). For example, geographically limited data may be undesirable, whereas an aggregation of data that is not otherwise limited may still be useful.
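Some of the automatic checks listed above can be sketched as a single function that returns a list of problems. Everything concrete here is an assumption for the sake of the example: the set of column names treated as sensitive, the twenty-four-hour default threshold, and the minimum row count used to decide that data is "merely a sample". A real validation step would likely also involve human review, as the text notes.

```python
from datetime import datetime, timedelta

# Hypothetical column names treated as PII/PHI for this sketch.
SENSITIVE_COLUMNS = {"ssn", "diagnosis"}

def validate_listing(columns, last_updated, row_count, now,
                     max_age=timedelta(hours=24), min_rows=1000):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    flagged = SENSITIVE_COLUMNS & {c.lower() for c in columns}
    if flagged:
        problems.append("sensitive columns: " + ", ".join(sorted(flagged)))
    if now - last_updated > max_age:
        problems.append("data not updated within threshold")
    if row_count < min_rows:
        problems.append("data looks like a sample")
    return problems

now = datetime(2021, 1, 20, 12, 0)
ok = validate_listing(["city", "temp"], now - timedelta(hours=2), 50_000, now)
bad = validate_listing(["name", "SSN"], now - timedelta(days=3), 10, now)
```

Returning the full list of problems, rather than failing on the first one, lets the provider be notified of everything that needs fixing in one pass.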

The exchange manager 124 may include a search module 304. The search module 304 may implement a web page interface, accessible by a user on one or more of the user devices 101-104, for invoking searches of search strings against the metadata in the catalog 220, receiving responses to the searches, and selecting references to listings 202 in the search results for addition to the consumed shares 156 of the user record 212 of the user performing the search. In some embodiments, searches may be performed by a user by way of SQL commands in an SQL interpreter executing on the cloud computing platform 102 and accessed by way of a web page interface on a user device 101-104. For example, searching for shares may be performed by way of SQL queries against the catalog 220 within the SQL engine 310 discussed below.

The search module 304 may further implement a recommendation algorithm. For example, the recommendation algorithm may recommend other listings 202 to a user based on other listings that are in, or were formerly in, the user's consumed shares 156. Recommendations may be based on logical similarity: one source of weather data leads to a recommendation for a second source of weather data. Recommendations may also be based on dissimilarity: a listing for data in one domain (geographic region, technical field, etc.) leads to a recommendation for a listing in a different domain (a different geographic region, a related technical field, etc.) so that the user's analysis has complete coverage.

The exchange manager 124 may include an access management module 306. As described above, a user may add a listing 202. This may require authentication with respect to the provider of the listing 202. Once a listing 202 is added to the consumed shares 156 of a user's user record 212, the user may either (a) be required to authenticate each time the data referenced by the listing 202 is accessed or (b) be automatically authenticated and allowed to access the data once the listing 202 is added. The access management module 306 may manage automatic authentication for subsequent accesses of data in a user's consumed shares 156 in order to provide seamless access to the shared data as if it were part of that user's user data 150. To that end, the access management module 306 may access the access controls 206 of the listing 202, certificates, tokens, or other authentication material in order to authenticate the user when accesses to the shared data are performed.

The exchange manager 124 may include a connection module 308. The connection module 308 manages the integration of the shared data referenced by a user's consumed shares 156 (i.e., shared data from different providers) with one another and with a user database 152 of data owned by the user. In particular, the connection module 308 may manage the execution of queries and other computation functions with respect to these various data sources such that access to them is transparent to the user. The connection module 308 may further manage access to data in order to enforce restrictions on the shared data, e.g., so that analysis may be performed and the results of the analysis displayed without exposing the underlying data to the data consumer, where this restriction is indicated by the access controls 206 of a listing 202.

The exchange manager 124 may further include a Structured Query Language (SQL) engine 310 that is programmed to receive queries from a user and to execute them against the data they reference, which may include the user's consumed shares 156 and the user data 112 owned by the user. The SQL engine 310 may perform any query processing functionality known in the art. The SQL engine 310 may additionally or alternatively include any other database management tool or data analysis tool known in the art. The SQL engine 310 may define a web page interface, executing on the cloud computing platform 102, through which SQL queries are entered and responses to SQL queries are presented.

Referring to FIG. 4A, the illustrated method 400 may be executed by the exchange manager 124 in order to implement peer-to-peer sharing between a first user ("provider 402") and a second user ("consumer 404").

The method 400 may include the provider entering 406 metadata. This may include a user on a device 101-104 of the provider entering the metadata into fields of a form in a web page provided by the exchange manager 124. In some embodiments, the metadata may be entered 406 by way of SQL commands using the SQL engine 310. The items of metadata may include some or all of those discussed above with respect to the metadata 204 of a listing 202. Step 406 may include receiving other data for the listing 202, such as access controls 206 and parameters defining a filter 208.

The provider 402 may then invoke, on the device 101-104, submission of the form and the entered data.

The exchange manager 124 may then verify 408 the metadata and validate 410 the data referenced by the metadata. This may include performing some or all of the actions ascribed to the validation module 302.

If the metadata and shared data are not successfully verified 408 and validated 410, the exchange manager 124 may notify the provider 402, such as by means of a notification through the web interface by which the metadata was submitted at step 406.

The exchange manager 124 may further create 412 a listing 202 that includes the data submitted at step 406 and may further create an entry in the catalog 220. For example, keywords, descriptive text, and other items of information in the metadata may be indexed to facilitate searching.

Note that steps 406-412 may be performed by means of an interface provided to the provider 402. Such an interface may include any suitable features, including elements for entering data (e.g., elements 204-210) and elements for generating a data listing. The interface may additionally include elements for publishing a data listing, or for unpublishing a data listing so that it is no longer viewable by at least some other users. The interface may also include elements for updating the version of a data listing or for rolling back to a previous version of the listing or of the metadata associated with it. The interface may also include a list of pending requests to add a data listing or to add members to the data exchange. The interface may also include an indication of the number of data consumers accessing a given listing and other non-identifying information about them, as well as a representation of the patterns in which data consumers of the listing use the data it references.

Another user, acting as a consumer 404, may then browse 414 the catalog. This may include accessing a web page that provides a search interface to the catalog. This web page may be external to the virtual warehouse 131, i.e., accessible by users who are not logged in to the virtual warehouse 131. In other embodiments, only users who are logged in to the virtual warehouse 131 are able to access the search interface. As noted above, browsing of the catalog 220 may be performed using queries to the SQL engine 310 that reference the catalog 220. For example, a user device 101-104 may have a web-based interface to the SQL engine 310 through which queries against the catalog 220 are entered by the consumer 404 and transmitted to the SQL engine 310.

In response to the consumer's browsing activity, the exchange manager 124 may display the catalog and perform 416 a search with respect to the catalog to identify listings 202 whose metadata corresponds to the query or search string submitted by the consumer 404. The search may be performed according to any search algorithm known in the art. In the case of an SQL query, the query may be processed according to any approach for processing SQL queries known in the art.

The exchange manager 124 may return the results of the search string or SQL query to the device 101-104 of the consumer 404, such as in the form of a list of references to the listings 202 identified according to the search algorithm or by processing the SQL query. The list may include items of metadata, or links that the consumer 404 may select to invoke display of the metadata. In particular, any item of the metadata 204 of a listing 202 may be displayed in the list or be linked to by the entry in the list that corresponds to that listing 202.

Note that the exchange referenced in FIG. 4A may be a private exchange or a public exchange. In particular, the listings 202 that are displayed during browsing 414 and searched 416, and that the consumer 404 can view, may be limited to those having filters 208 indicating that the listing 202 is viewable by the consumer 404, by the consumer's organization, or by some other classification to which the consumer 404 belongs. Where the exchange is public, in some embodiments the consumer 404 need not meet any filter criteria.

The method 400 may include the consumer 404 requesting 418 access to the data corresponding to a listing 202. For example, the consumer 404 may select an entry in the list on the consumer's device 101-104, which invokes transmission of a request to the exchange manager 124 to add the listing 202 corresponding to that entry to the consumed shares 156 in the user record 212 of the consumer 404.

In the illustrated example, the listing 202 of the selected entry has access controls 206. The exchange manager 124 may therefore forward 420 the request to the provider 402 along with an identifier of the consumer 404. The consumer 404 and the provider 402 may then interact to perform one or both of the following: (a) authenticating (logging in) 422 the consumer 404 with respect to the provider 402 and (b) processing 424 payment for access to the data referenced by the listing 202. This interaction may follow any login or authentication approach known in the art. Likewise, any approach for processing payment between parties may be implemented. In some embodiments, the data warehouse module may provide discounts to the provider 402 on account of credits consumed by the consumer 404 when accessing the provider's shared data. Credits may be units of usage purchased by a user that are then consumed in response to the services of the virtual warehouse 131 used by the consumer 404 (e.g., queries and other analysis performed on data hosted by the virtual warehouse 131). The interaction may take place directly between the devices 126 of the consumer 404 and the provider 402 or may be performed by way of the exchange manager 124. In some embodiments, the exchange manager 124 authenticates the consumer 404 using the access control information 206, so that no interaction with the provider 402 is required. Likewise, the listing 202 may define payment terms, so that the exchange manager 124 processes payment without requiring interaction with the provider 402. Once the provider 402 determines that the consumer 404 is authenticated and authorized to access the data referenced by the listing 202, the provider 402 may notify 426 the exchange manager 124 that the consumer 404 may access that data. In response, the exchange manager 124 adds 428 a reference to the listing 202 to the consumed shares 156 in the user record 212 of the consumer 404.
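The request, authentication, payment, and add steps can be summarized in a short sketch. It is a simplification under stated assumptions: the `request_access` function, the dictionary layout of a listing, and the `authenticate` and `pay` callbacks (standing in for whatever login and payment mechanisms are actually used) are all hypothetical, and the provider notification step is folded into the function's return value.

```python
def request_access(listing, consumer_id, consumed_shares, authenticate, pay):
    """Return True and record the share if the consumer is cleared for the listing."""
    if consumer_id not in listing["access_control"]:
        if not authenticate(consumer_id):              # authentication step
            return False
        price = listing.get("price", 0)
        if price > 0 and not pay(consumer_id, price):  # payment step
            return False
        # Remember the consumer so later accesses need no further authentication.
        listing["access_control"].add(consumer_id)
    # Add a reference to the listing to the consumer's shares.
    consumed_shares.setdefault(consumer_id, set()).add(listing["id"])
    return True

listing = {"id": "weather-ga", "access_control": set(), "price": 5}
shares = {}
granted = request_access(listing, "c1", shares, lambda c: True, lambda c, p: True)
denied = request_access(listing, "c2", shares, lambda c: False, lambda c, p: True)
```

Because the consumer is recorded in the listing's access controls on first success, a repeated request from the same consumer skips authentication and payment.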

Note that in some cases a listing 202 does not list specific data but rather references a specific cloud service 120, e.g., by the brand name or company name of the service. A request to access the listing 202 is therefore a request to access the user data 150 of the consumer making the request. Steps 422, 424, 426 accordingly include authenticating the consumer 404 with respect to the authentication engine 121, so that the cloud service 120 can verify the identity of the consumer 404, inform the exchange manager 124 which data is to be shared with the consumer 404, and indicate that the consumer 404 is authorized to access that data.

In some embodiments, this may be implemented using a "single sign-on" approach in which the consumer 404 authenticates (logs in) once with respect to the cloud service 120 and is then enabled to access the consumer's data in the service database 158. For example, the exchange manager 124 may present an interface to the cloud service 120 on the device 101-104 of the consumer 404. The consumer 404 enters authentication information (user name and password, certificate, token, etc.) into the interface, and this information is forwarded to the authentication engine 121 of the cloud service 120. The authentication engine 121 processes the authentication information and, if the information corresponds to a user account, notifies the exchange manager 124 that the consumer 404 is authenticated with respect to that user account. The exchange manager 124 may then identify the user data 150 for that user account and create a database referencing that data. A reference to that database is then added to the consumed shares 156 of the consumer 404.

In some embodiments, authentication of a user with respect to the virtual warehouse 131 is sufficient to authenticate the user with respect to the cloud service 120, so that steps 422, 424 are omitted in view of the prior authentication of the consumer 404. For example, the consumer 404 may indicate to the cloud service 120 that the virtual warehouse 131 is authorized to verify the identity of the consumer 404.

In some embodiments, the exchange manager 124 authenticates the consumer 404 using the access control information 206, so that no interaction with the provider 402 is required. Likewise, the listing 202 may define payment terms, so that the exchange manager 124 processes payment without requiring interaction with the provider 402. In such embodiments, step 422 is therefore performed by the exchange manager 124 and step 426 is omitted. The exchange manager 124 then performs step 428 once the consumer 404 is authenticated and/or has provided the required payment.

In some embodiments, adding a listing 202 to the consumed shares of a consumer 404 may further include receiving, from the consumer 404, acceptance of terms presented to the consumer 404. In some embodiments, where the terms of the agreement are changed by the provider 402 after the consumer 404 has added the listing 202 according to the method 400 or other methods described herein, the exchange manager 124 may require the consumer 404 to agree to the changed terms before allowing continued access to the data referenced by the listing 202.

Adding 428 the data referenced by the listing 202 may include creating a database that references the data. A reference to this database may then be added to the consumed shares 156, and the database may then be used to process queries that reference the data referenced by the share record. Adding 428 the data may include adding data filtered according to the filters 208, for example, the data referenced by the listing 202 that is associated with the consumer 404, with the organization of the consumer 404, or with some other classification of the consumer 404 (e.g., a filtered view of the data).

In some embodiments, adding the listing 202 to the user record 212 may include changing the access controls 206 of the listing 202 to reference the identity data 214 of the consumer 404, so that attempts to access the data referenced by the listing 202 will be permitted and carried out by the exchange manager 124.

The consumer 404 may then enter 432 queries into the SQL engine 310 by way of the consumer's device 101-104. A query may reference the data referenced in the listing 202 added at step 428 as well as other data referenced in the user database 152 and the consumed shares 156. The SQL engine 310 then processes 430 the query using the database created at step 428 and returns the result to the consumer 404, or creates a view, a materialized view, or other data that the user can access or analyze. As noted above, the consumed-share data on which the query operates may have been previously filtered to include only data pertaining to the consumer 404. Different consumers 404 that add the same listing 202 to their consumed shares 156 will therefore see different versions of the database referenced by the listing 202.

Referring to FIG. 4B, in some embodiments, private sharing of data and filtering of data according to the identity of the consumer 404 may be implemented using the illustrated data structures. For example, the service database 158 of the provider 402 may include a customer mapping 434 that includes entries for the customer identifiers 436 of users of a service provided by the provider 402 (e.g., a service implemented by the server's cloud service 120), where each customer identifier 436 is the identifier used for authenticating with the authentication interface 120. The customer mapping 434 may map each customer identifier 436 to a warehouse identifier 438, i.e., the user identifier that the same user uses to authenticate with the virtual warehouse 131, such that the same user corresponds to both identifiers 436, 438. The mapping between the identifiers 436 and 438 may be established by authentication as described above (e.g., the single sign-on approach described above).

The customer mapping 434 may further include a reference 440 to an entitlement table 442, which may be one of a plurality of entitlement tables 442. Each entitlement table 442 defines which of one or more tables 444 of the provider 402 can be accessed with the customer ID 436 mapped to it. The entitlement table 442 may further define the columns of a table 444 that can be accessed with the customer ID 436. The entitlement table 442 may further define the rows or types of rows of a table 444 that can be accessed with the customer ID 436, based on one or more filter criteria. The entitlement table 442 may further define the schema of a table 444 that can be accessed with the customer ID 436.

Accordingly, a list 202 for a table 444 may specify that access to the data table 444 is performed as defined by the customer mapping 434. For example, referring to FIG. 4C, when a consumer 404 requests to add a list 202 for a database for which access is defined according to the customer mapping, the exchange manager 124 may create a secure view 446 according to the entitlement table 442 and the customer identifier 436 that is mapped to the warehouse identifier 438 of the consumer 404. The secure view may be generated by performing an inner join of the data tables 444 of the database specified in the entitlement table 442 (or the portions of the data tables specified in the entitlement table 442), filtered according to the customer identifier 436, such that the join result includes only the data for that particular customer identifier 436 and only those portions of the database (tables 444 and/or portions of tables 444) specified in the entitlement table 442. The secure view may be generated in the manner described in U.S. Application Serial No. 16/055,824, filed August 6, 2018 and entitled "SECURE DATA SHARING IN A MULTI-TENANT DATABASE", and U.S. Application Serial No. 16/241,463, filed January 7, 2019 and entitled "SECURE DATA SHARING IN A MULTI-TENANT DATABASE".
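
The secure-view construction described above can be illustrated with a minimal sketch. It is not the implementation of the referenced applications: it uses Python's standard sqlite3 module in place of the virtual warehouse, and the table names (customer_map, entitlements, orders), the column names, and the helper functions are all hypothetical. The sketch resolves a warehouse identifier to a customer identifier, then creates a view that inner-joins the data table with the entitlement table so that only the entitled rows and columns of that one customer are exposed.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical mapping, entitlement and data tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer_map (warehouse_id TEXT PRIMARY KEY, customer_id INTEGER);
        CREATE TABLE entitlements (customer_id INTEGER, table_name TEXT);
        CREATE TABLE orders (order_id INTEGER, customer_id INTEGER,
                             amount REAL, internal_note TEXT);
        INSERT INTO customer_map VALUES ('wh_alice', 1), ('wh_bob', 2);
        INSERT INTO entitlements VALUES (1, 'orders'), (2, 'orders');
        INSERT INTO orders VALUES (10, 1, 25.0, 'x'), (11, 2, 40.0, 'y'), (12, 1, 5.0, 'z');
        """
    )
    return conn


def create_secure_view(conn, warehouse_id):
    """Create a per-customer view and return its name."""
    row = conn.execute(
        "SELECT customer_id FROM customer_map WHERE warehouse_id = ?",
        (warehouse_id,),
    ).fetchone()
    if row is None:
        raise LookupError(f"no customer mapped to {warehouse_id!r}")
    customer_id = int(row[0])  # int() keeps the inlined value safe
    view_name = f"secure_orders_{customer_id}"
    # Views cannot take bound parameters, so the validated integer is inlined.
    # The view exposes only entitled columns (internal_note is hidden) and
    # only rows belonging to this customer.
    conn.execute(
        f"CREATE VIEW {view_name} AS "
        "SELECT o.order_id, o.amount FROM orders AS o "
        "INNER JOIN entitlements AS e "
        "ON e.customer_id = o.customer_id AND e.table_name = 'orders' "
        f"WHERE e.customer_id = {customer_id}"
    )
    return view_name
```

Two consumers who add the same list would each get their own view under this sketch, which matches the observation above that different consumers see different versions of the same database.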

FIG. 5 illustrates an alternative method 500 for sharing data that may be performed when a consumer requests 418 to add a list 202 that is available to all users of a public or private exchange. In this case, the exchange manager 124 adds 428 a reference to the list 202 to the consumption share 156 of the consumer 404 and omits the authentication and payment steps. Step 428 may be performed as described above, except that no change to the access control 206 is made. Likewise, steps 430 and 432 may be performed with respect to the shared data as described above. The exchange of FIG. 5 may be a public exchange or a private exchange as described above with respect to FIG. 4. FIG. 5 illustrates the case in which, if the list 202 is viewable (i.e., the filter criteria permit viewing by the consumer 404 as described above), the consumer 404 may add the list 202 to the consumption share 156 of the consumer 404 without further authentication or payment.

Note that when a list 202 has been added to a user's consumption share 156 according to any of the methods disclosed herein, the exchange manager 124 may notify the consumers of the list 202 when the data referenced by the list 202 is updated.

Referring to FIG. 6, in some embodiments, the method 600 may include the consumer 404 browsing the catalog from the exchange manager 124, selecting a list 202 as described for the other methods herein (see, e.g., FIGS. 4A and 5), and requesting bidirectional sharing with respect to the data referenced by the list ("the shared data") and additional data in the user database 112 ("the user data"). Note that in some embodiments, the list 202 of the provider 402 does not reference any specific data (e.g., a specific table or database), but rather offers to perform a service with respect to data provided by the consumer 404. In such cases, "the shared data" in the description below may be understood to be replaced by "the provided service".

In response to the request, the exchange manager 124 implements 604 point-to-point sharing of the shared data with respect to the consumer 404 and the provider 402. This may be performed as described above with respect to FIG. 4A, e.g., including authentication of the consumer 404 and possibly filtering the shared data to include only data associated with the consumer 404, as described above. The exchange manager 124 may further implement point-to-point sharing of the user data with the provider 402 as described with respect to FIG. 4A, except that: (a) the consumer 404 acts as the provider and the provider 402 acts as the consumer of the user data, with the user data being added to the consumption share 156 of the provider 402, and (b) the consumer 404 need not create a list 202 for the user data, and the user data need not be listed in the catalog 220.

After step 606, either the consumer 404 or the provider 402 can access both the shared data and the user data. Either party may then run queries against both, join them, perform aggregations on the joined data, or perform any other action or enrichment known in the art with respect to multiple databases.

In some embodiments, the bidirectional sharing may include, or the consumer 404 may request that it include, the provider 402 also joining 608 the shared data and the user data to obtain joined data, and returning 610 a reference to the joined data to the exchange manager 124 together with a request to add 612 the reference to the joined data to the consumption share 156 of the consumer 404, which the exchange manager 124 does.

The consumer 404 will therefore now have access to the joined data. Step 608 may further include performing other actions (aggregation, analysis) on the user data and the shared data, either before or after the join. The virtual warehouse 131 may perform step 608 in response to a request from the consumer 404 to do so.

Note that the join result may be either (a) a new database that holds the result of the join, or (b) a joined database view that defines the join of the shared data and the user data.
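
The practical difference between options (a) and (b) can be seen in a small sketch, again using Python's standard sqlite3 module as a stand-in and hypothetical table names (shared_data, user_data). A materialized copy is a snapshot taken when the join runs, while a view re-evaluates the join each time it is queried and so reflects later changes to either side.

```python
import sqlite3


def build_join_results():
    """Create the join of shared data and user data both as a copy and as a view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE shared_data (customer_id INTEGER, segment TEXT);
        CREATE TABLE user_data (customer_id INTEGER, spend REAL);
        INSERT INTO shared_data VALUES (1, 'retail'), (2, 'wholesale');
        INSERT INTO user_data VALUES (1, 10.0);

        -- Option (a): a new table holding the join result as of now.
        CREATE TABLE joined_copy AS
            SELECT s.customer_id, s.segment, u.spend
            FROM shared_data AS s
            INNER JOIN user_data AS u ON s.customer_id = u.customer_id;

        -- Option (b): a view that only defines the join.
        CREATE VIEW joined_view AS
            SELECT s.customer_id, s.segment, u.spend
            FROM shared_data AS s
            INNER JOIN user_data AS u ON s.customer_id = u.customer_id;
        """
    )
    return conn


def row_count(conn, relation):
    """Count rows in one of the two known relations."""
    if relation not in ("joined_copy", "joined_view"):
        raise ValueError("unknown relation")
    return conn.execute(f"SELECT COUNT(*) FROM {relation}").fetchone()[0]
```

If a row for customer 2 is later inserted into user_data, joined_view returns two rows while joined_copy still returns one, which is why a view avoids duplicating data but a copy isolates the consumer from later changes.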

The results of step 608 (join, aggregation, analysis, etc.) may alternatively be added to the original share performed at steps 606, 608, e.g., as a view (materialized or non-materialized) that defines the operations performed at step 608.

The virtual warehouse 131 may also perform steps 608-612 independently of the request made at step 602, in response to a request from the consumer 404 or the provider 402 to do so.

Note that in many cases there are many consumers 404 attempting to perform bidirectional sharing with the provider 402, and these consumers 404 may seek bidirectional sharing with respect to user data that may be in many different formats (schemas), which may differ from the schema used for the shared data of the provider 402. Accordingly, step 608 may include a transformation step that maps the source schema of the user data to the target schema of the shared data. The transformation may be a static transformation provided by a human operator. The transformation may follow an algorithm that maps the column labels of the source schema to the corresponding column labels of the target schema. The algorithm may include a machine learning or artificial intelligence model trained to perform the transformation. For example, a plurality of training data entries may be specified by human annotators, each entry including a source schema as input and a mapping between the source schema and the target schema as output. These entries may then be used to train a machine learning or artificial intelligence algorithm to output a mapping to the target schema for a given input source schema.
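
The non-learned variants of the transformation step can be sketched briefly. The sketch below is only an illustration under stated assumptions: the function names, the label-normalization rule (lowercase, alphanumerics only), and the example column names are hypothetical, and a trained model could replace infer_column_map without changing transform_rows. Operator-supplied static overrides take precedence over the label-matching algorithm.

```python
import re


def normalize(label):
    """Reduce a column label to lowercase alphanumerics for comparison."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


def infer_column_map(source_columns, target_columns, static_overrides=None):
    """Map source column labels to target column labels.

    static_overrides plays the role of the operator-provided static
    transformation; remaining columns are matched by normalized label.
    Source columns with no match are left out of the mapping.
    """
    overrides = dict(static_overrides or {})
    by_norm = {normalize(t): t for t in target_columns}
    mapping = {}
    for src in source_columns:
        if src in overrides:
            mapping[src] = overrides[src]
        elif normalize(src) in by_norm:
            mapping[src] = by_norm[normalize(src)]
    return mapping


def transform_rows(rows, column_map):
    """Rename mapped columns in each row and drop unmapped ones."""
    return [
        {column_map[src]: value for src, value in row.items() if src in column_map}
        for row in rows
    ]
```

For example, with target columns cust_id, email and postal_code, the source labels "Cust-ID" and "E Mail" are matched by normalization, while "zip" needs the override {"zip": "postal_code"}.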

The data added to the shares consumed by the consumer 404 and the provider 402 may then be operated on by the consumer 404 and the provider 402, respectively, such as by executing queries against the data, aggregating the data, analyzing the data, or performing any other action described herein as being performed with respect to a share added to a user's consumption share 156.

In certain embodiments, as described above, a data provider can improve its relationships with business partners by enabling secure, bidirectional exchange of data. Traditional approaches to bidirectional data sharing are difficult to implement, and only very limited data sets can be shared via APIs, FTP, or file transfers between companies. Such approaches typically entail considerable cost and overhead, data latency, and even some security risk.

A data provider may instead host a private data exchange and invite its customers and partners to participate in the exchange. Customers and partners can access the data, e.g., in secure views, and they can also push data in the other direction. This may mean sharing data back to the host, or it may mean listing data so that other participants in the ecosystem can also securely share it. Data from public data exchanges, other private exchanges, or other external sources may also be included.

Every large company depends on other companies and on its customers. Sharing data bidirectionally, not only between a company and these parties but also among these external parties, allows a rich collaborative data ecosystem to develop, in which groups of companies can work together around data. They can securely discover, combine, and enrich data assets to help serve mutual customers or to form new partnerships with one another. Some of these relationships may even lead to opportunities to sell data, secure views across data, or functions to other participants in the walled-garden ecosystem.

Referring to FIG. 7, the methods of sharing and consuming data described herein enable data to be enriched and the enriched data to be returned to the exchange. For example, provider A may request 702 to share data (share 1) with the exchange in the same manner as in the other methods described herein. The exchange manager 124 verifies, validates, and adds 704 share 1 to the catalog 220.

A second provider B may then browse the catalog 220 and add 706 share 1 to its consumption share 156. Provider B may perform 708 operations on the shared data, such as joining it with other data, performing aggregations, and/or performing other analysis with respect to share 1, producing modified data (share 2). Provider B may then request 710 to share share 2 with the exchange as described herein. Note that the join of step 708 may include joining any number of databases, such as any number of shares based on any number of lists of any number of other users. Iterations of steps 702-710 by many users can therefore be viewed as a hierarchy in which a larger number of lists 202 of multiple users is narrowed to a smaller number of lists 202 that are based on data from the larger number of lists 202.

The exchange manager 124 verifies, validates, and adds 712 share 2 to the catalog 220. The process may be repeated 714 with respect to share 2, as provider A, provider B, or a different provider adds share 2, generates modified data based on it, and adds the result back to the catalog in the same manner. In this way, a rich ecosystem of data and analytics can be made available to users. The sharing according to the method 700 may be any sharing according to the methods disclosed herein: point-to-point sharing, private exchange sharing, or bidirectional exchange sharing.

Note that it is possible for a provider to perform steps 708 and 710 with respect to a list 202 that is itself based on a list 202, which can produce a cycle. For example, provider B uses list L1 of provider A to create list L2, provider C uses list L2 to create list L3, and provider A uses list L3 to define list L1. Such a flow may include any number of steps. In some cases this may be undesirable, so that modifying list L1 to reference L3 is not permitted given that L3 is derived from L1. In other cases, such cycles are permitted if there is a time delay in refreshing the data referenced by each list. For example, L1 may reference L3 provided that L3 is not refreshed until some time after L1 is refreshed, so that the circular reference does not cause L1 and L3 to update each other continuously and indefinitely. The present disclosure also contemplates acyclic flows, such that list L1 is not affected by other providers' use of list L1.
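
A check for the disallowed case can be sketched as a reachability test over the "derived from" relation between lists. This is an illustrative sketch only; the dictionary representation and the function name are hypothetical, and the delayed-refresh alternative described above is not modeled.

```python
def would_create_cycle(derived_from, target, new_source):
    """Return True if making `target` depend on `new_source` closes a cycle.

    derived_from maps each list name to the set of lists it is derived from.
    A cycle arises when `target` is already reachable from `new_source`
    by following derived_from edges.
    """
    stack, seen = [new_source], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(derived_from.get(node, ()))
    return False
```

With derived_from = {"L2": {"L1"}, "L3": {"L2"}}, a request to redefine L1 in terms of L3 is flagged, because L3 depends on L2, which depends on L1. A request to derive a new list from L3 is not.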

The list created at step 712 (share 2) may (a) include a copy of the data from share 1 that remains after step 708, as modified according to step 708, or (b) include a view that references share 1 (e.g., a database created based on the list 202 of share 1 according to the methods disclosed herein) and that defines the operations performed at step 708, without including actual data from, or derived from, share 1. The hierarchy described above may therefore be a hierarchy of views, each referencing one or both of lists 202 of views created according to the method 700 and lists 202 of data from one or more providers according to any of the methods disclosed herein.

The methods disclosed herein include methods for creating shares (lists 202) and for adding shares. In a similar manner, a consumer 404 may instruct the exchange manager 124 to remove an added share, and a provider 402 may instruct the exchange manager 124 to stop sharing certain lists 202. In some embodiments, this may be accompanied by actions that avoid disrupting the consumers 404 of those lists 202, such as notifying those consumers 404 and stopping the sharing of a list 202 only after a specified period following the notification, or only after all consumers 404 have removed references to the list 202 from their consumption shares 156.
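
The two conditions for withdrawing a list can be written down as a small predicate. This is a sketch under assumptions: the function name, its parameters, and the choice to treat "no notification sent yet" as blocking are hypothetical and not specified by the text.

```python
from datetime import datetime, timedelta


def may_stop_sharing(notified_at, now, grace_period, remaining_consumers):
    """Decide whether a provider's list can be withdrawn without disruption.

    Withdrawal is allowed once no consumer still references the list, or
    once the grace period following the notification has elapsed.
    """
    if not remaining_consumers:
        return True
    if notified_at is None:  # consumers were never notified
        return False
    return now - notified_at >= grace_period
```

For instance, with a 30-day grace period, a list that still has one consumer cannot be withdrawn 10 days after notification but can be withdrawn after 30 days.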

Use Cases

In a first use case, a company implements a private exchange according to the methods described above. In particular, the lists 202 of the company are viewable only by consumers 404 associated with the company (employees, managers, investors, etc.). Likewise, only persons associated with the company are permitted to add the lists 202. When a list 202 is added to a consumption share 156, it may be filtered based on the identity of the consumer adding it, i.e., to data relevant to that consumer's role in the company.

In a second use case, the provider 402 creates a reader account or a reader/writer account for a consumer 404 that is not yet a user of the virtual warehouse 131. The account may be associated with the account data of that consumer (see the customer mapping of FIG. 4B discussed above). The consumer 404 may then log in to the account and access the provider's lists in order to access the data of the consumer 404 that is managed by the provider 402 (see, e.g., the discussion of FIG. 4A).

In a fifth use case, the consumer 404 adds a private share (e.g., one that is accessible according to the methods described above by virtue of the identity of the consumer 404) as well as a public share. The consumer 404 may then join these shares and use them to process queries.

In a sixth use case, a list 202 may be shared on a subscription basis (e.g., monthly), or a list 202 may be accessed on the basis of a per-query price or a credit boost multiplier. The exchange manager 124 may accordingly manage the processing of payments and access, such that the consumer 404 is permitted to access the data subject to the pricing model (subscription, per query, etc.).

In a seventh use case, the exchange manager 124 implements secure functions and secure machine learning models (both training and scoring) that can be used to process private data, such that a consumer 404 is permitted to use the results of the function or machine learning model but has no access to the raw data processed by the function or machine learning model itself. Likewise, a consumer of shared data is not permitted to export the shared data, but is nonetheless permitted to perform analytical functions on it. For example, the following secure function may be implemented to allow customer shopping data to be viewed in a secure manner:

create or replace secure function
UDF_DEMO.PUBLIC.get_market_basket(input_item_sk number(38))
returns table(input_item NUMBER(38,0), basket_item_sk NUMBER(38,0),
num_baskets NUMBER(38,0))
as
'select input_item_sk, ss_item_sk basket_Item, count(distinct
ss_ticket_number) baskets
from udf_demo.public.sales
where ss_ticket_number in (select ss_ticket_number from udf_demo.public.sales where ss_item_sk = input_item_sk)
group by ss_item_sk
order by 3 desc, 2';
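
To make the logic of the secure function above easier to follow, the same computation is sketched here in plain Python. This is only an illustration of what the query computes, not part of the secure function mechanism; the input representation (an iterable of (ss_ticket_number, ss_item_sk) pairs) is an assumption. For a given item, the function finds every ticket that contains it, then counts, for each item appearing on those tickets, the number of distinct tickets on which it appears.

```python
from collections import defaultdict


def get_market_basket(sales, input_item_sk):
    """Return (input_item, basket_item_sk, num_baskets) rows.

    sales is an iterable of (ss_ticket_number, ss_item_sk) pairs.
    Rows are ordered by num_baskets descending, then basket_item_sk
    ascending, mirroring "order by 3 desc, 2".
    """
    sales = list(sales)
    # Subquery: tickets that contain the input item.
    tickets = {ticket for ticket, item in sales if item == input_item_sk}
    # Group by item, collecting distinct tickets per item.
    tickets_per_item = defaultdict(set)
    for ticket, item in sales:
        if ticket in tickets:
            tickets_per_item[item].add(ticket)
    rows = [
        (input_item_sk, item, len(item_tickets))
        for item, item_tickets in tickets_per_item.items()
    ]
    rows.sort(key=lambda row: (-row[2], row[1]))
    return rows
```

Note that, as in the SQL, the input item itself appears in the result, since it is present on every matching ticket. The consumer sees only these aggregate counts, never the underlying sales rows.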

In an eighth use case, the exchange manager 124 may provide the provider 402 of a list 202 with statistics on the use of the list by one or more consumers 404 (e.g., queries, credits used, tables scanned, tables hit, etc.).

In a ninth use case, the systems and methods disclosed herein are used for industry-specific applications. For example:

1. Cybersecurity

a. Allows sharing of risk vectors, bad actors, IP whitelists/blacklists, real-time attacks in progress, known good/bad email programs, etc.

2. Healthcare

a. Secure sharing of patient information, including cost information and outcome information, as well as other types of information.

b. Secure multi-hospital databases that allow patients to share their information with multiple providers (e.g., if patient A lives in California, vacations in Florida, is injured, and is treated in an emergency room, the Florida hospital may be able to access patient A's records from different hospitals and providers).

Other industries may also benefit from private or public sharing of data according to the systems and methods disclosed herein, such as financial services, telecommunications, media and advertising, government agencies, the military, and intelligence agencies.

In a tenth use case, a first user provides marketing services to a second user, and the second user therefore shares a customer list with the first user. The first user shares data about marketing campaigns with the second user, such as campaign metadata and current user events (session start/end for a particular user, purchases by a particular user, etc.). This may be implemented using the bidirectional sharing of FIG. 6. The data (the customer list plus the customer events from the first user) can be joined to better understand the events for a particular user or group of users. As described above, this exchange of data can be performed without creating copies or transferring data, since each user accesses the same copy of the shared data. Because no data is transferred, the data can be accessed in near real time as customer events occur.

FIG. 8 is a block diagram illustrating a network environment in which a data provider may share data via a cloud computing service. A data provider 810 may use the cloud computing service 112 to upload one or more data sets 820 to cloud storage. These data sets may then be viewed by one or more data consumers 101-104. The data provider 810 can use the cloud computing service 112, together with the methods and systems discussed herein, to control, monitor, and increase the security of its data. In particular embodiments, the data provider 810 may implement a private data exchange on its own online domain using the functions, methods, and systems provided by the cloud computing service 112. The data provider 810 may be any data provider, such as a retail company, a government agency, a polling organization, a non-profit organization, and so on. The data consumers 101-104 may be internal to the data provider 810 or external to the data provider 810. A data consumer internal to the data provider may be an employee of the data provider 810. As an example, the data provider may be a bike-sharing company that provides bicycles for a daily, monthly, annual, or per-trip fee. The bike-sharing company may collect data about its users, such as basic demographic information and ride information, including ride date, ride time, and ride duration. This information may be made available to the employees of the bike-sharing company via the cloud computing service 112.

The interaction between the data provider 810, the private data exchange 812 (as implemented by the cloud computing service 112), and the data consumers may proceed as follows. The data provider may use the data sets 820 to create one or more lists 811. A list may be for any suitable data. For example, a consumer data company may create a list called "video streams" that contains data related to the video streaming habits of a large number of users. The data provider may set a list policy 821 governing who can view the list 811 and who can access the data in the list 811, or any other suitable policy. Such list policies are discussed above with reference to FIG. 2. The data provider 810 may then submit the list to the private data exchange 812 at step 813. The private data exchange 812 may be embedded within the network domain of the data provider 810. For example, if the network domain of the consumer data company is www.entityA.com, the private data exchange may be found at www.entityA.com/privatedataexchange. The private data exchange 812 may receive the list and, if the list complies with one or more rules determined by the cloud computing service 112, approve it at step 814. The private data exchange 812 may then set access controls at 815 based at least in part on the list policy set at step 821. The private data exchange 812 may then invite members at step 816. The members may be data consumers 801. A data consumer 801 may accept the invitation at step 817 and may then begin consuming data at 818. The type of data consumption may depend on the access controls established at 815. For example, a data consumer may be able only to read the data, or may be able to share the data. As another example, a data consumer may be able to perform any combination of the read and share operations described above, subject to the access controls. In general, data sharing does not involve changing the shared data.
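
The sequence of steps 813-818 can be summarized as a small state model. The sketch below is a hedged illustration, not the exchange's implementation: the class names, the representation of the list policy as a set of permitted consumers, and the approval rules passed as callables are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Listing:
    name: str
    policy_viewers: frozenset                  # list policy: who may be granted access
    state: str = "draft"
    access: set = field(default_factory=set)   # access control set at 815
    invited: set = field(default_factory=set)
    members: set = field(default_factory=set)


class PrivateExchange:
    def __init__(self, rules):
        self.rules = list(rules)   # hypothetical approval rules: Listing -> bool
        self.listings = {}

    def submit(self, listing):
        """Steps 813-815: submit, approve or reject, then set access control."""
        self.listings[listing.name] = listing
        if all(rule(listing) for rule in self.rules):
            listing.state = "approved"
            listing.access = set(listing.policy_viewers)
        else:
            listing.state = "rejected"
        return listing.state

    def invite(self, name, consumer):
        """Step 816: only approved listings invite, and only permitted consumers."""
        listing = self.listings[name]
        if listing.state != "approved" or consumer not in listing.access:
            raise PermissionError(f"{consumer!r} cannot be invited to {name!r}")
        listing.invited.add(consumer)

    def accept(self, name, consumer):
        """Step 817: an invited consumer accepts and becomes a member."""
        listing = self.listings[name]
        if consumer not in listing.invited:
            raise PermissionError(f"{consumer!r} has no invitation to {name!r}")
        listing.members.add(consumer)

    def can_consume(self, name, consumer):
        """Step 818: consumption requires membership and an access-control entry."""
        listing = self.listings[name]
        return consumer in listing.members and consumer in listing.access
```

In this sketch a consumer cannot consume data merely by being invited; the invitation must be accepted first, and the access controls derived from the list policy still apply afterwards.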

In some embodiments, a data consumer 801 may access the private data exchange 812 independently, by navigating directly to the private data exchange 812 in a browser, by clicking on an advertisement for the private data exchange 812, or through any other suitable mechanism. The private data exchange may also be presented through custom or other code that accesses the lists and other information via an API. If the data consumer 801 wishes to access the data within a list and the list is not generally available or the data consumer 801 does not yet have access, the data consumer 801 may need to request access at step 820. The data provider may approve or deny the request at 822. If the request is approved, the private data exchange may grant access to the list at 823. The user may then begin consuming the data as described above.

In particular embodiments, one or more data exchange administrator accounts may be designated by the cloud computing service 112. A data exchange administrator may manage the members of the private data exchange by designating members as data providers 810 or data consumers 801. The data exchange administrator can control the visibility of lists by selecting which members can see a given list. The data exchange administrator may also have other functions, such as approving lists before they are published on the private data exchange, tracking the usage of each of the lists, or any other suitable administrative function. In some embodiments, the data provider and the data exchange administrator are part of the same entity; in other embodiments, they are separate entities. A provider can create lists, test sample queries against the data underlying a list, set access to a list, grant access in response to requests for a list, and track the usage of each of its lists and of the data underlying them. A data consumer 801 can access the private data exchange and browse the visible lists, which may be displayed as tiles. To consume the data underlying a list, the consumer may be able to access the data immediately or may need to request access to it.

FIG. 9 is an example private data exchange 900 according to an embodiment of the present invention. Private data exchange 900 may be what a data consumer sees when navigating to the private data exchange on the web. For example, a data consumer may enter www.entityA.com/privatedataexchange into a browser. As discussed herein, the "Entity A data exchange" may be a private data exchange that is facilitated by the cloud computing service 112 and embedded in Entity A's own web domain or application, or it may be accessed via an API. Private data exchange 900 may include several listings for different data sets, for example listings A-L. Listings A-L may also be referred to herein as a data catalog, which allows visitors to the private data exchange to view all available listings in the exchange. These listings may be placed by an administrator within Entity A. Providing a data catalog in this way can combine the advantages of crowdsourced content, data quality, and an appropriate level of centralized control and coordination, which can overcome the challenges that have slowed the adoption of other enterprise data cataloging approaches (for example, indexing and crawling systems). It allows users across the enterprise to contribute data, consume data from other groups, and join data together to create enriched data products for internal use and, potentially, for external monetization.

By way of example and not limitation, Entity A may be a consumer data company that has collected and analyzed the consumption habits of millions of people across several categories. Its data sets may include data in the following categories: online shopping, video streaming, electricity consumption, car usage, internet usage, clothing purchases, mobile app purchases, club memberships, and online subscription services. Each of these data sets may correspond to a different listing. For example, listing A may be for online shopping data, listing B for video streaming data, listing C for electricity consumption data, and so on. Note that the data may be anonymized so that personal identities are not revealed. The listings located below line 915 may correspond to third-party listings that Entity A allows on its private data exchange. Such listings may be generated by other data providers and may be subject to Entity A's approval before being added to private data exchange 900. A data consumer can click on and view any listing, subject to the various access controls and policies discussed above with reference to FIGS. 2, 4, and 8.

In certain embodiments, as discussed with reference to FIG. 8, a data provider may invite members to access its private data exchange. One class of members may be the data provider's physical and digital supply chain suppliers. For example, the data provider can share data with its suppliers about its inventory levels or its consumption of the items they supply, so that the suppliers can better meet the data provider's needs. In addition, digital data suppliers can provide their data directly in the private data exchange, making it immediately available to, and joinable with, internal enterprise data, which saves both parties the cost of transferring, storing, and loading the data.

Some companies, such as hedge funds and marketing agencies, bring in data from many external sources. Some hedge funds evaluate hundreds of potential data sets each year. A private data exchange can be used not only to connect to data that has already been purchased, but also to evaluate new data assets. For example, a hedge fund could have potential data vendors list their data on its private exchange, and the fund could explore and "buy" data in a private data store in which it is the only customer. As discussed with reference to FIG. 11, such an internal data store may also "tunnel" in data assets from a public data exchange (e.g., the SNOWFLAKE public data exchange).

As another example, an existing provider of a company's marketing data may list some additional data sets that its customers can use on a trial basis via their private exchange, and if a customer finds them useful, the vendor can immediately provide full access through the same exchange. These arrangements can lead to greater data depth, two-way and fresher data, and greater trust and transparency in the relationships between suppliers of data and physical goods and their customers.

FIG. 10 is a diagram illustrating an example secure view of shared data from a private data exchange. When a data consumer 1020 wishes to access the data in a listing (e.g., listing H), the cloud computing service 112 may facilitate access via a secure view 1010 of the shared data. The secure view 1010 of the shared data may include metadata 1015, which includes the metadata and access controls discussed herein with reference to FIG. 2. This enables a data provider to share data without exposing the underlying tables or internal details, which makes the data more private and secure. With the secure view 1010 of the shared data, the view definition and details are visible only to authorized users.

In a private data exchange, data can be shared within the same entity as well as between different entities. In addition, data sharing can be unidirectional, bidirectional, or multidirectional. In one embodiment, this can lead to as many as five primary use cases for sharing data: two-way inter-entity, two-way intra-entity, one-way inter-entity, one-way intra-entity, and multi-way multi-entity. An example of two-way inter-entity data sharing is data sharing from portfolio companies to a parent company and among the portfolio companies. An example of two-way intra-entity data sharing is data sharing from a large company's headquarters to different business units within the company, and from the business units back to headquarters. An example of one-way inter-entity data sharing is a large data provider (e.g., the National Weather Service) that shares data with many different entities but does not receive data from them. An example of one-way intra-entity data sharing is a large company that provides data to its business units but does not receive data from them. In certain embodiments, data may be shared as a "point-to-point share" of specific data or as an "any share". A point-to-point share of specific data may include a private data exchange share between a parent company and a specific portfolio company. An any share may include a private data exchange share from a parent company to numerous data consumers on a public exchange or within a private exchange.

In particular embodiments, the cloud computing service 112 may generate a private data exchange for an entity that owns the data to be shared on the private data exchange. The cloud computing service 112 may designate one or more administrators of the private data exchange. These administrators can control the access rights that other users have to the private data exchange. For example, an administrator can add another user account to the private data exchange and designate that account as a data provider, a data consumer, an exchange administrator, or a combination of these.

In certain embodiments, an exchange administrator can control the viewing and access rights for the private data exchange. Viewing rights may include a list of the entities that can view a listing in the private data exchange. Access rights may include a list of the entities that can access the data after selecting a particular listing. For example, a company may publish private data exchange 900, which may include several listings, listing A through listing L. Each of these listings may have its own separate viewing and access rights. For example, listing A may include a first list of entities that have the right to view the listing on private data exchange 900 and a second list of entities that have the right to access the listing. Viewing a listing may simply mean seeing that the listing exists on the private data exchange. Accessing a listing may mean selecting the listing and accessing its underlying data. Access may include viewing the underlying data, manipulating that data, or both. Controlling viewing rights can be useful for data providers who do not want some users even to know that a certain listing exists in the private data exchange. Thus, when a user who lacks viewing rights for a particular listing visits the private data exchange, that user will not see the listing on the exchange at all. In certain embodiments, the viewing and access rights discussed above may be provided through an application programming interface (API). The exchange catalog can be queried and updated via the API. This can allow a data provider to display its listings on its own application or website to anyone who visits. When a user wants to access the data or request access to it, the user can then create an account with the cloud computing service 112 and obtain access. In some embodiments, a URL may be invoked when a user requests access to the data in a listing. This can allow integration with an external request approval workflow. For example, if a user makes an access request, the data provider's external request approval workflow can be reached and activated, and it can then run as usual to carry out the external request approval process. In some embodiments, a listing may be unlisted, meaning that the listing exists but is not visible on the data exchange. To access an unlisted listing, a consumer can enter a global URL in a browser. This may require a unique URL for each listing.
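
The distinction between viewing rights, access rights, and unlisted listings can be illustrated with a minimal Python sketch. The class and field names (Listing, viewers, accessors, unlisted) and the account identifiers are hypothetical and are introduced only for illustration; they are not part of the described system or of any real API.

```python
from dataclasses import dataclass, field


@dataclass
class Listing:
    """Hypothetical listing record; field names are illustrative only."""
    name: str
    viewers: set = field(default_factory=set)    # accounts allowed to see the listing
    accessors: set = field(default_factory=set)  # accounts allowed to read the underlying data
    unlisted: bool = False                       # exists, but never shown when browsing


def visible_listings(listings, account):
    """Return the names of listings this account sees when browsing the exchange."""
    return [l.name for l in listings if not l.unlisted and account in l.viewers]


def can_access(listing, account):
    """Access to the underlying data is checked separately from visibility."""
    return account in listing.accessors


catalog = [
    Listing("A", viewers={"acct1", "acct2"}, accessors={"acct1"}),
    Listing("B", viewers={"acct1"}, accessors={"acct1"}),
    # An unlisted listing is reachable only by its URL, never by browsing.
    Listing("H", viewers={"acct1", "acct2"}, accessors={"acct2"}, unlisted=True),
]
```

In this sketch, an account that lacks viewing rights for listing B never learns that listing B exists, while an account that can see listing A may still have to request access to its data.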

In certain embodiments, when a new private data exchange is created for a data provider, the cloud computing service 112 may designate an exchange administrator (e.g., the data exchange administrator discussed above) and may also generate the following metadata about the private data exchange: the name of the private data exchange (which must be unique), a display name, a logo, a short description of the private data exchange, and an indication of whether the exchange administrator's approval is required for publishing (e.g., Admin_Approval_for_Publishing). This indication may be a true/false value. It can be set to true if the exchange administrator must approve listings submitted to the private data exchange before they are published. It can be set to false if the exchange administrator does not need to provide such approval, in which case providers can publish data directly. If Admin_Approval_for_Publishing is set to true, the exchange administrator can see a list of the listings and select listings to preview and approve or reject. The owner of the private data exchange may be the account that pays for the private data exchange. This metadata may be stored as part of the exchange object. Also stored in association with the private data exchange are the users and accounts that provide data to the exchange, the consumers of the exchange, and the exchange administrators.
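
The effect of the Admin_Approval_for_Publishing indication on the publishing path can be sketched as follows. This is a minimal illustration under stated assumptions: the PrivateExchange class, its submit and review methods, and the string status values are hypothetical and are not taken from the described system.

```python
from dataclasses import dataclass, field


@dataclass
class PrivateExchange:
    """Hypothetical exchange object holding the metadata described above."""
    name: str                            # must be unique
    display_name: str
    description: str
    admin_approval_for_publishing: bool  # mirrors Admin_Approval_for_Publishing
    published: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def submit(self, listing_name):
        """A provider submits a listing; it is queued only if approval is required."""
        if self.admin_approval_for_publishing:
            self.pending.append(listing_name)
            return "pending"
        self.published.append(listing_name)
        return "published"

    def review(self, listing_name, approve):
        """The exchange administrator approves or rejects a pending listing."""
        self.pending.remove(listing_name)
        if approve:
            self.published.append(listing_name)
            return "published"
        return "rejected"
```

With the flag set to false, submit publishes immediately and the review step is never needed.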

In certain embodiments, an exchange administrator may add members (e.g., data providers and data consumers) to the private data exchange by inviting them in any suitable manner, for example by inviting a user account on the cloud computing service 112 or by sending an email to the user's email address. When adding a member to the private data exchange, the exchange administrator can also specify one or more member types: exchange administrator, provider, or consumer. The exchange administrator can add members to and remove members from the private data exchange, and can edit the metadata associated with the private data exchange. For each user, the exchange administrator can specify whether the user is an exchange administrator, a data provider, a data consumer, or several of these roles. The following table summarizes the permissions associated with each type of account.

Table 1: Permissions associated with each type of private data exchange account

In some embodiments, if the exchange administrator removes a member, or simply changes the member's type from provider to consumer, the existing listings published by that member may become unpublished from the exchange. In addition, existing shares that the member added to the exchange are no longer considered part of the private data exchange. The listings published by that member may be archived and are then no longer visible in the UI to anyone, including the member. If the same removed member (the same account on the cloud computing service 112) becomes a provider again, the cloud computing service 112 may unarchive the listings.
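
The archive and unarchive behavior described above can be sketched as a small state update. The MemberListing class and the apply_role_change function are hypothetical names, and the choice to leave an unarchived listing unpublished until it is published again is an assumption of this sketch rather than something the description specifies.

```python
from dataclasses import dataclass


@dataclass
class MemberListing:
    """Hypothetical listing state; field names are illustrative, not from the source."""
    owner: str
    published: bool = True
    archived: bool = False


def apply_role_change(listings, member, new_roles):
    """Update a member's listings after an administrator edits the member's roles.

    Passing an empty set for new_roles models removing the member entirely.
    """
    is_provider = "provider" in new_roles
    for listing in listings:
        if listing.owner != member:
            continue
        if not is_provider:
            # Losing the provider role unpublishes and archives the listing.
            listing.published = False
            listing.archived = True
        elif listing.archived:
            # Regaining the provider role unarchives the listing; republishing
            # is left as a separate step (an assumption of this sketch).
            listing.archived = False
    return listings
```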

In some embodiments, the exchange administrator can specify a list of categories as well as edit the existing list. A category may have an icon associated with it, and the exchange administrator can specify the icon along with the category name.

When a member becomes a data provider, a provider profile may be generated that includes a logo, a description of the provider, and the URL of the provider's website. When submitting a listing, the provider may select the private data exchange or exchanges on which to publish the data (there may be many private exchanges, and the provider may need to select a subset of one or more of them) and set the metadata for the new listing. The metadata may include a listing title, a listing type (e.g., standard or personalized), a listing description, one or more usage examples (e.g., a title and a sample query), a listing category (which may be entered as free-form text), the update frequency of the listing, a support email/URL, and a documentation link. The provider can also set access rights for the listing. The provider can allow the exchange administrator to control the visibility of the listing, or the provider can retain that control. The provider can also associate shares with the listing. For a standard share, the listing can be associated with zero or more shares. The provider can associate a share with a listing through the UI or SQL. For a personalized share, the provider can associate the share with the listing when provisioning the share in response to a request. When the provider wishes to publish the listing, the listing may first require the exchange administrator's approval, depending on the publishing rules of the private data exchange.

FIG. 11 is a diagram illustrating an example tunneling of a data listing between a public data exchange and a private data exchange. Alternatively, data may be tunneled between two public data exchanges, between two private data exchanges, from one public exchange to multiple private exchanges, or in any other suitable combination. In some embodiments, an entity may wish to offer a publicly listed data listing on its private data exchange. For example, Entity B may wish to include listing F of public data exchange 1100 on its own private data exchange 1000. The data underlying listing F can be tunneled from public data exchange 1100 to private data exchange 1000. In certain embodiments, data may be tunneled between two private data exchanges. At times, a first data provider may wish to allow a second data provider to list data belonging to the first data provider on the second data provider's private data exchange. Tunneling of data listings allows the two data providers to offer the same listing. For example, Entity A and Entity B may have a business agreement to share listing F on each of their private data exchanges. Listing F may be the property of Entity A, but Entity B may have a license to offer it on its private data exchange as well. In that case, both listings titled "listing F" point to the same data set stored in the cloud computing platform 110. Tunnel 1015 is a representation showing that listing F can be shared securely and easily between two or more data exchanges 1100 and 1000. No data is copied or transferred in tunneling. Instead, each listing contains a pointer to the data referenced by listing F, as discussed herein.
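
The statement that tunneling copies no data, and that each listing instead holds a pointer to the same stored data set, can be sketched as follows. The Exchange class, the tunnel function, and the dataset identifier "ds-f" are hypothetical and serve only to illustrate the pointer-sharing idea; they do not describe the actual implementation.

```python
class Exchange:
    """Hypothetical data exchange that maps listing titles to dataset identifiers."""

    def __init__(self, name):
        self.name = name
        self.listings = {}  # listing title -> dataset id (a pointer, not the data)

    def list_dataset(self, title, dataset_id):
        self.listings[title] = dataset_id


def tunnel(source, target, title):
    """Expose a listing from the source exchange on the target exchange.

    Only the dataset identifier is shared; the stored data is not copied.
    """
    target.listings[title] = source.listings[title]


# A single stored copy of the data, standing in for the cloud computing platform.
storage = {"ds-f": [("row", 1), ("row", 2)]}

public_exchange = Exchange("public data exchange 1100")
private_exchange = Exchange("private data exchange 1000")
public_exchange.list_dataset("Listing F", "ds-f")
tunnel(public_exchange, private_exchange, "Listing F")
```

After the call to tunnel, both exchanges resolve "Listing F" to the same entry in storage, so an update to the stored data set is seen through either listing.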

In certain embodiments, tunneling may be done from a private data exchange to a public data exchange, and vice versa. For example, data exchange 1100 may be a public data exchange. Entity B can use a listing that is listed on public data exchange 1100 on its own private data exchange 1000 via tunnel 1015. In some embodiments, a data listing can be tunneled from one data exchange to another, the underlying data can then be joined with another data set, and a new listing can be generated from the combined data set. By way of example and not limitation, a first data set that includes NBA player shooting statistics for the last five years may be listed on a private data exchange. A second data set that includes weather data over the same time span may be listed on a different data exchange. The two data sets can be joined and listed as a new listing on a private or public data exchange. Data consumers can then access the combined data set, subject to the viewing and access controls discussed herein, to understand how the weather might affect a player's shooting percentage. In addition, if data is listed on a public data exchange (e.g., a data exchange hosted by the cloud computing service 112), that data may be tunneled to a private data exchange.

In some embodiments, tunneling of data sets can be used to create public or private "industry-wide" data exchanges. Many different entities may tunnel data sets into a "large ecosystem data exchange". If a private ecosystem data exchange truly takes off, it may become so large and influential that it becomes the standard place for data exchange, collaboration, and monetization across an entire industry. Each industry may have room for one or two such large ecosystem data exchanges. Once any one exchange gains significant traction, it may become the "go-to" place for that industry. If more than one viable exchange emerges in an industry, their respective owners may decide to cooperate and "cross-tunnel" some (but possibly not all) assets between their exchanges in order to reach critical mass.

While an industry consortium could host such an exchange through tunneling, it is quite likely that one or two large players in each industry will rapidly and broadly bootstrap an ecosystem private data exchange into the de facto data exchange for the industry. This gives companies that wish to become a major player in the data side of their industry a strong incentive to establish an internal data exchange as soon as possible and then quickly open it up to their suppliers, customers, and partners.

FIG. 12 is a diagram illustrating an example data query and delivery service 1200 according to some embodiments of the present invention. Data query and delivery service 1200 shows four ways in which a data provider can share data. The first way is through data exchange 900, which may be a public data exchange or a private data exchange. A data provider 1210 may list 1211 data on the data exchange according to the methods and systems described herein with reference to FIGS. 2, 4, and 8. A data consumer 1220 may access the data in a listing by accepting an invitation from the data provider, as discussed herein, or by requesting 1212 access to the listing, as discussed herein with reference to FIG. 8. The second way is to share data directly at 1213. This may be a point-to-point share as discussed with reference to FIG. 4, or any other suitable type of share implemented using the secure data sharing methods discussed herein. Note that both data provider 1210 and data consumer 1220 are users of the cloud computing service 112. If data provider 1210 wishes to share data with a non-user 1230, it can use the third way, which is to share data with a reader account 1215a or with a reader/writer account 1215b. Here, the non-user may need to have a reader account but need not be an actual user of the cloud computing service 112. The reader account may allow non-user 1230 to view the data but not to do anything else with it. Finally, the fourth way to share data is via a file drop to cloud storage 1214. Here, data provider 1210 may copy data set 1216 and allow non-user 1230 to have data set 1216. This way of sharing data may not allow data provider 1210 to retain control over the data set. Thus, with the fourth way, non-user 1230 can view, manipulate, and re-share the data.

Global data sharing

As described above, a private data exchange is used within one region of a cloud computing service or within a single cloud computing service. A customer may wish to have one or more listings available across multiple regions of a cloud computing platform and/or across multiple cloud computing platforms. For a data provider to share a data set across multiple cloud computing platform regions and/or multiple cloud computing platforms, the data provider needs to set up accounts in the different regions, log in to each account to set up replication, and use tasks to refresh the replicas. The data provider needs to share with the consumers in the target region and needs to replicate the entire database. The whole process adds considerable overhead for the data provider. In addition, when a virtual private cloud (VPC) customer wants to consume shared data through a data exchange, the customer currently has no way to do so from within its VPC account. Alternatives, such as requiring the customer to open a multi-tenant account and keep the shared data there, place a burden on the consumer. In one embodiment, a multi-tenant account is an account in a system that supports isolating compute resources and data between different customers/clients and between different users within the same customer/client. Cross-region and cross-cloud data sharing is therefore difficult, and enabling it is useful both for data providers and for the customers who consume their data.

In one embodiment, for global data sharing, there are two types of data: standard and personalized. Standard data is data that is the same for every consumer; for example, there is no dynamic filtering of rows based on the consumer account. In contrast, personalized data is data that is unique to each consumer (or group of consumers). In one embodiment, personalized data can have a secure view that dynamically filters rows of data based on the consumer's account, so that each consumer sees its own slice of the data. In yet another embodiment, for the data provider, this means that it can create one view for some or all of its consumers rather than a separate view for each consumer.

In one embodiment, a listing of shared data may include metadata associated with zero or more shares, or sets of database (DB) objects. A listing can be for free data or paid data. In addition, the data provider can grant consumers access to the shared data according to whether the shared data is standard (every customer can access the same shared data) or personalized (the data is shared on an individual or group basis). For personalized data, in one embodiment, the data provider adds each customer of the data to an entitlement table. Non-personalized data, on the other hand, may still require some type of approval process before customers can access it. For example, in one embodiment, in a private exchange, obtaining access to a share may require an approval workflow even though the data itself is not personalized. Thus, in one embodiment, for standard data sharing, the data provider need not be in the request fulfillment loop. Any consumer account that discovers the listing can access the data and create a database from it. Alternatively, for data shared By Request, the data provider needs to explicitly add the consumer to the share.
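
The combination of a single secure view and an entitlement table can be sketched in Python as a filter keyed on the consumer's account. The row layout, the customer_key column, the account names, and the secure_view function are hypothetical and are introduced only for illustration; in a database, the same idea would typically be expressed as a view joined against an entitlement table.

```python
# Provider-side data: one shared table holding rows for every consumer.
rows = [
    {"customer_key": "k1", "amount": 10},
    {"customer_key": "k2", "amount": 20},
    {"customer_key": "k1", "amount": 30},
]

# Hypothetical entitlement table: consumer account -> keys it may see.
entitlements = {
    "consumer_acct_1": {"k1"},
    "consumer_acct_2": {"k2"},
}


def secure_view(current_account):
    """Return only the rows the querying account is entitled to.

    One view definition serves every consumer; the filter depends on
    which account is running the query.
    """
    allowed = entitlements.get(current_account, set())
    return [row for row in rows if row["customer_key"] in allowed]
```

Onboarding a new consumer then amounts to adding a row to the entitlement table rather than creating another view.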

In another embodiment, a standard share may exist in the context of a data exchange (whether public or private), because the data exchange defines the membership base to which the share is available and how its listings are discovered. In one embodiment, data shared by a data provider is also referred to as a data share. In one embodiment, the consumer should not really need to care about the underlying share type (e.g., Standard or By Request). In addition, consumers can always interact with standard listings. In this embodiment, data providers must know about shares in order to create them; however, the data exchange handles the creation of these shares.

In one embodiment, data sharing outside of an exchange is, by definition, By Request sharing. In addition, a data exchange may include unlisted, or undiscoverable, standard listings that have no entry in the data exchange. The effective membership base for such a listing is any Snowflake customer. However, consumers cannot discover the listing; instead, the data provider sends the listing URL. In this embodiment, anyone who has the URL and logs in to the data exchange can view the data and, for example, create a DB from the share.

In one embodiment, both standard listings and by-request listings can be free or paid. If the data share is free, the consumer can create a DB from the shared data once the consumer accepts the provider's terms. If the data share is paid, the consumer accepts the terms and arranges payment, and can then create a DB from the share. In one embodiment, for a standard listing, the data may or may not yet be available in the consumer's region. If the data is not yet available in the region, the consumer can still click "Get" and create a database from the share. At that point, the data exchange can let the consumer know that the data will become available within a certain amount of time (for example, with a visual countdown if the wait is a matter of seconds, or with a progress bar in the UI). In one embodiment, a personalized data share may be a type of by-request share or listing.

FIG. 13A is a block diagram of an example system 1300 of multiple cloud computing platforms that share data with a data exchange. In FIG. 13A, system 1300 includes two different cloud computing platforms 1302A-B, where each cloud computing platform 1302A-B may be one of the cloud computing platforms described above in FIG. 1. Each cloud computing platform 1302A-B is coupled to a data exchange 1306. In one embodiment, data exchange 1306 is similar to the data exchanges described above, except that data exchange 1306 can support data sharing across multiple cloud computing platforms, such as cloud computing platforms 1302A-B.

In one embodiment, cloud computing platform 1302A includes a data provider share 1304. In this embodiment, the data provider shares some or all of its data 1304 via data exchange 1306, which is visible to data consumers in both cloud computing platforms 1302A-B. For example, in one embodiment, a data consumer 1308 in a different cloud computing platform (e.g., cloud computing platform 1302B) can view the listing for data provider share 1304 via the data exchange. If data consumer 1308 requests the listing, data exchange 1306 can, in response, create a provider account for data provider share 1304 in cloud computing platform 1302B and replicate the data share to cloud computing platform 1302B. With data share 1310 replicated in cloud computing platform 1302B, data consumer 1308 can access the data. In one embodiment, data exchange 1306 may replicate the entire data provider share 1304 or only part of it. In one embodiment, the data provider indicates which parts of data provider share 1304 to replicate. In another embodiment, data exchange 1306 infers which parts of data provider share 1304 to replicate. In this embodiment, the data exchange can infer which objects of the data provider share need to be replicated, how often those objects need to be replicated, and/or the regions of the consumer accounts and the corresponding provider secondary accounts.
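
The request-driven replication described above can be illustrated with a minimal in-memory sketch. The `Deployment` class and `fulfill_request` function are hypothetical names used only for illustration; the sketch assumes a share is a mapping from object names to rows and ignores authentication, billing, and incremental refresh.

```python
from dataclasses import dataclass, field


@dataclass
class Deployment:
    name: str  # a cloud platform or region label (hypothetical)
    accounts: set = field(default_factory=set)
    shares: dict = field(default_factory=dict)  # share name -> {object: rows}


def fulfill_request(source, target, provider, share_name, objects=None):
    """Replicate a share into the consumer's deployment on request.

    `objects` optionally limits replication to part of the share.
    """
    share = source.shares[share_name]
    selected = share if objects is None else {k: share[k] for k in objects}
    # Create the provider's secondary account in the target deployment.
    target.accounts.add(provider)
    # Copy the selected objects so the replica is independent of the source.
    target.shares[share_name] = {k: list(v) for k, v in selected.items()}
    return target.shares[share_name]
```

Passing `objects=None` corresponds to replicating the entire share, while passing a list of object names corresponds to replicating only the parts the provider indicated or the exchange inferred.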

FIG. 13B is a block diagram of an example system 1320 of a cloud computing platform that shares data with a data exchange across multiple regions of the cloud computing platform. In FIG. 13B, system 1320 includes a cloud computing platform with two different regions 1322A-B, where each cloud computing platform region 1322A-B may be a different geographic region of the cloud computing platform (e.g., US West, US East, Europe, Asia, and/or another type of geographic region). In one embodiment, each cloud computing platform region 1322A-B is coupled to data exchange 1306. In one embodiment, data exchange 1306 is similar to the data exchanges described above, except that data exchange 1306 can support data sharing across multiple regions of a cloud computing platform, such as cloud computing platform regions 1322A-B.

In one embodiment, cloud computing platform region 1322A includes data provider share 1304. In this embodiment, the data provider shares some or all of its data 1304 via data exchange 1306, which is visible to data consumers in both cloud computing platform regions 1322A-B. For example, in one embodiment, a data consumer 1308 in a different cloud computing platform region (e.g., cloud computing platform region 1322B) can view the listing for data provider share 1304 via the data exchange. If data consumer 1308 requests the listing, data exchange 1306 can, in response, create a provider account for data provider share 1304 in cloud computing platform region 1322B and replicate the data share to cloud computing platform region 1322B. With data share 1310 replicated in cloud computing platform region 1322B, data consumer 1308 can access the data. In one embodiment, data exchange 1306 may replicate the entire data provider share 1304 or only part of it. In one embodiment, the data provider indicates which parts of data provider share 1304 to replicate. In another embodiment, data exchange 1306 infers which parts of data provider share 1304 to replicate. In this embodiment, the data exchange can infer which objects of the data provider share need to be replicated, how often those objects need to be replicated, and/or the regions of the consumer accounts and the corresponding provider secondary accounts.

In one embodiment, there are different types of use cases for data sharing across deployments, such as across regions, across clouds, and/or into or out of a VPC. For example, in one embodiment, a data provider may offer a non-personalized, by-request data share, either inside or outside of a data exchange, where the consumers may be in different deployments. In one embodiment, this is the basic building block on which the other types can be built. Another type of use case is a data provider with a standard listing for many different consumers (e.g., weather data on a public or private exchange). A third type of use case is a data provider with personalized shares or listings. This type of use case can occur on a public data exchange, in a private data exchange, or outside of an exchange (as a plain data share). Finally, one type of use case is a consumer on a VPC who wants to access a data share in a cloud computing platform. In one embodiment, an additional nuance is that, in a private data exchange, providers are added by the data exchange administrator. Ideally, the data exchange administrator does not want to have to train a provider to manage the complexity of setting up replication in order to share data from that provider with other private exchange members that may be in different regions or clouds.

In one embodiment, a booking data provider wants to share selected tables and/or views of its data, which is stored in the US-West region of a cloud computing platform, with a marketing data company whose primary account is on the same cloud computing platform but in a different region (e.g., US-East). In this embodiment, the booking data provider is a customer of the marketing data provider. The primary goal of the marketing data provider is to process this information and incorporate it into the marketing data provider's product. In one embodiment, the cost should be charged to the marketing data provider's data exchange account and not to the booking data provider's account. In this embodiment, a secondary goal of the marketing data provider is to share certain views from the marketing data provider back to the booking data provider. In this case, because the booking data provider is a customer of the marketing data provider, the marketing data provider does not want the booking data provider to have to do this work. In addition, the marketing data provider has views that reference objects in another database, so both databases must be replicated. In one embodiment, the booking data provider determines which tables and/or views to list in the data exchange. In response to the marketing data provider requesting the data share via the data exchange, the data exchange creates an account in the US-East region of the cloud computing provider and replicates the booking data provider's relevant data share (e.g., the primary data share indicated by the booking data provider, together with any dependent data). FIG. 14 below further describes sharing data across cloud computing platforms and/or regions.

In yet another embodiment, in a private exchange, a nonprofit research institution wants a listing to which members must request access. In this embodiment, these requests trigger a workflow with the cloud computing platform, and each consumer is added once approval is complete. This is a common scenario in private exchanges, where the data is not personalized but consumers must go through an approval workflow. FIG. 14 below further describes sharing data across cloud computing platforms and/or regions using an approval workflow.

In another embodiment, a data provider wants to replicate a customized set of data shares for consumers across different cloud computing platforms and/or regions. In this embodiment, the data provider wants to replicate only certain tables of its database. For example, a transportation analytics company keeps its data in a cloud computing service, where each table contains the data for a particular data set. The company shares specific data sets with each customer based on the data sets to which that customer subscribes. The transportation analytics company wants to model its data in the cloud computing service independently of how shares are created. Then, when the transportation analytics company creates a share for a particular customer, it wants to replicate only those tables. FIG. 14 below further describes sharing data across cloud computing platforms and/or regions using an approval workflow.

In one embodiment, standard listings on a public exchange may represent a growth opportunity for the data exchange and/or the cloud computing service. Paid standard shares present a new revenue stream through monetization. In one embodiment, the consumer's experience of getting a free or paid listing should be as immediate as possible. Because the provider does not explicitly add consumers to the share, the provider is not in the user flow. In addition, data providers have indicated that incurring replication costs based on demand would be ideal. For example, in one embodiment, a weather data provider has made a free subset of its data available as a free standard listing and wants to have a paid standard listing where customers can pay and immediately get a predefined package. Finally, for more customized packages, the provider wants consumers to contact it, and it will set up a direct share with each such consumer. In this case, the most likely scenario is that the free listing is available in (almost) all regions. For the paid listing, the data provider will want it to be available on any deployment as long as there is at least one paying consumer in that deployment, or if the cloud computing service covers the replication cost. Note that although free versus paid sharing can be set up either as a row-level filter (a secure view within one share) or as entirely separate shares, one scenario is that these are two separate shares. For example, a retail analytics provider creates different shares for the different sets of objects that are free and paid. In the paid share, this data provider would have the ability to let consumers choose which rows they want, producing a dynamically created package. For example, a consumer would select, through the data exchange user interface, only the location data for "Type=McDonald's" and "State=CA", and would therefore pay only for the rows received.
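
The row-selection idea in the paid-share example can be sketched as a simple filter followed by per-row pricing. The column names `Type` and `State` come from the example above; the helper names, the `City` column, the sample rows, and the per-row price are assumptions made purely for illustration.

```python
def select_rows(rows, **filters):
    """Keep only rows whose columns equal every requested filter value."""
    return [row for row in rows
            if all(row.get(col) == val for col, val in filters.items())]


def package_price_cents(rows, cents_per_row):
    """Charge only for the rows the consumer actually receives."""
    return len(rows) * cents_per_row


# Hypothetical location data held in the provider's paid share.
locations = [
    {"Type": "McDonald's", "State": "CA", "City": "Fresno"},
    {"Type": "McDonald's", "State": "NV", "City": "Reno"},
    {"Type": "Starbucks", "State": "CA", "City": "Oakland"},
]

# The consumer's dynamically created package.
package = select_rows(locations, Type="McDonald's", State="CA")
```

In a deployed system the filter would more plausibly be expressed as a secure view over the shared table rather than as an in-memory list comprehension.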

In one embodiment, the data replication described above can be used to handle a variety of scenarios. For example, in one embodiment, a brand data provider is building its data-driven marketing solution on a cloud computing service. The provider will offer standard and personalized shares on a public exchange. It has hundreds of terabytes of data that may be shared with hundreds of clients, most of whom are new to the cloud computing service. Based on the data sets a client purchases on the provider's platform, the provider automatically inserts the client ID into an entitlement table in the cloud computing service and adds the client's consumer account to the share. The provider wants this automated sharing pipeline to work across regions and clouds with little or no manual effort. In addition, the brand data provider may also want to share data into a VPC.

A customer relationship company builds mobile marketing campaigns and shares the results with its customers via the data exchange. In one example, campaign data arrives in the company's 50 TB event table at least every 15 minutes, and consumer-specific data is shared with each consumer. In addition, the company wants to offer its consumers a latency of 15 minutes or less.

An equipment manufacturer wants to share machine health data with 200 dealers so that the dealers can take corrective action. The data volume is not large, but data arrives throughout the day, and the manufacturer wants to give its dealers latency and/or freshness guarantees. "I expect remote data sharing to be a continuous stream of data, not a batch model." The company will set up a secure view based on an entitlement table that maps dealers to equipment, so that each dealer sees only the rows relevant to it.
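
The entitlement-based secure view in the dealer example can be sketched as a row filter keyed on the requesting dealer. The function name `secure_view`, the `device_id` column, and the sample identifiers are hypothetical; in a database, the same effect would typically be achieved by a view that joins the data table to the entitlement table on the current account.

```python
def secure_view(rows, entitlements, dealer_id):
    """Return only the rows for devices the given dealer is entitled to see."""
    allowed = {device for dealer, device in entitlements if dealer == dealer_id}
    return [row for row in rows if row["device_id"] in allowed]


# Hypothetical entitlement table: (dealer, device) pairs.
entitlements = [("dealer-1", "excavator-7"), ("dealer-2", "loader-3")]

# Hypothetical machine health rows shared by the manufacturer.
health = [
    {"device_id": "excavator-7", "status": "overheating"},
    {"device_id": "loader-3", "status": "ok"},
]
```

A dealer with no rows in the entitlement table sees an empty result, which matches the intent that each dealer sees only its own rows.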

In another example and embodiment, a VPC customer wants to consume shares from data providers in a multi-tenant deployment. In one embodiment, a VPC is an on-demand, configurable pool of shared computing resources allocated within a public cloud environment, providing a level of isolation between the different organizations that use the resources. For example, a financial company wants to consume marketing data from companies such as a marketing company. However, consuming data directly from such data sources requires considerable effort to ingest the data, to manage and maintain the ingestion process, and to manage changes to the underlying schema. Solving what is often called the "first mile problem of data ingestion" can provide enormous benefit. The financial company can have a third party process the data before ingesting it from the exchange. In addition, the financial company has multiple business units (BUs) with different needs and visibility requirements for the shared data. Therefore, when the financial company shares data provided by a third-party company with its internal customers (the BUs), it needs the ability to apply fine-grained security controls. Furthermore, any solution under consideration must be easy to use (no additional coding and/or engineering time) and robust (no fragile ongoing or maintenance processes). FIG. 16 below further describes handling this type of workflow.

FIG. 14 is a process flow diagram of a method 1400 for sharing data across multiple cloud computing services and/or across multiple regions of a cloud computing service. In general, method 1400 may be performed by processing logic that may include hardware (e.g., a processing device, circuitry, dedicated logic, programmable logic, microcode, hardware of a device, an integrated circuit, etc.), software (e.g., instructions running or executing on a processing device), or a combination thereof. For example, the processing logic may be implemented as exchange manager 124. Method 1400 may begin at step 1402, where processing logic receives an indication for a data share across different cloud computing platforms or across a cloud computing platform with multiple regions. In one embodiment, the data provider may create a standard listing available to all customers, a data share available by request, and/or other types of data shares as described above. Creating a listing is further described in FIG. 15 below. At block 1404, processing logic receives a request for the listing of the data share. In one embodiment, the request is associated with a customer account that is part of a cloud computing platform or a cloud computing platform region. For example, in one embodiment, the request may be associated with a customer account of a cloud computing platform and/or cloud computing platform region that differs from the cloud computing platform and/or cloud computing platform region associated with the listing. Processing logic determines whether a provider customer account is allowed in the cloud computing platform and/or cloud computing platform region associated with the listing request. For example, in one embodiment, a provider may have information (e.g., secure data) that is not allowed to leave a certain region. If the account is not allowed, execution proceeds to block 1406, where an error is returned. If the provider customer account is allowed, processing logic creates the customer account listing request at block 1410. In one embodiment, if the request is for a consumer in a different cloud computing platform and/or cloud computing platform region, processing logic creates a provider account in that cloud computing platform and/or cloud computing platform region so that the data provider can share the data with the requesting customer.

At block 1412, processing logic shares the data. In one embodiment, processing logic shares the data by replicating it to the cloud computing platform and/or cloud computing platform region associated with the consumer that made the original listing request. In one embodiment, processing logic may replicate the entire data share or part of the data share. In this embodiment, processing logic may infer which parts of the data are to be shared based on characteristics of the requesting consumer (geography, time, etc.). In yet another embodiment, processing logic may customize the data based on the consumer (e.g., paid data presents a different view than unpaid data, or the shared data depends on the consumer's region, the consumer's affiliation, and/or other types of characteristics). At block 1414, processing logic sets up tasks for periodic replication. In one embodiment, by setting up these tasks, the data shared with the different cloud computing platforms and/or cloud computing platform regions can be refreshed periodically, so that neither the data provider nor the consumer needs to refresh the data manually.
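
The flow of method 1400 can be summarized in a short sketch. The block numbers in the comments refer to FIG. 14 as described above; the `Exchange` class, the `SharingNotAllowed` exception, the region labels, and the default refresh interval are all hypothetical and are not part of the disclosure.

```python
from dataclasses import dataclass, field


class SharingNotAllowed(Exception):
    """Raised when provider data may not be placed in the requested region."""


@dataclass
class Exchange:
    restricted: dict  # listing -> regions the provider's data may not enter
    provider_accounts: set = field(default_factory=set)  # (provider, region)
    replicas: set = field(default_factory=set)           # (listing, region)
    refresh_tasks: dict = field(default_factory=dict)    # replica -> minutes


def handle_listing_request(exchange, listing, provider, home_region,
                           consumer_region, refresh_minutes=60):
    # Block 1404: a consumer in some region requests the listing.
    if consumer_region in exchange.restricted.get(listing, set()):
        # Block 1406: the provider account is not allowed there.
        raise SharingNotAllowed(f"{listing} may not be shared to {consumer_region}")
    if consumer_region != home_region:
        # Block 1410: create a provider account in the consumer's region.
        exchange.provider_accounts.add((provider, consumer_region))
        # Block 1412: share by replicating to the consumer's region.
        exchange.replicas.add((listing, consumer_region))
        # Block 1414: set up a periodic replication task.
        exchange.refresh_tasks[(listing, consumer_region)] = refresh_minutes
    return (listing, consumer_region)
```

When the consumer is already in the listing's home region, this sketch skips account creation and replication, on the assumption that the share can be accessed in place.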

In FIG. 14, a consumer requests shared data across different cloud computing platforms and/or cloud computing platform regions. In one embodiment, a data provider can create a listing that allows data to be replicated across different cloud computing platforms and/or cloud computing platform regions. FIG. 15 is a process flow diagram of a method 1500 for creating a listing in a data exchange, where the listing is available in different cloud computing services and/or in multiple regions of a cloud computing service. For example, the processing logic may be implemented as exchange manager 124. Method 1500 may begin at step 1502, where processing logic receives a request from a data provider to create a data listing. In one embodiment, the data listing may include a listing type (e.g., standard or by request), whether the listing is free or paid, what data to share (e.g., which tables, rows, etc. of a database to share), any sharing restrictions, data provider information, and/or other information for the listing. At block 1504, processing logic creates the listing in the data exchange. Processing logic may replicate the shared data in advance, based on the data to be shared and on characteristics of the cloud computing entities (e.g., cloud computing platforms and/or cloud computing platform regions) used for the listing. For example, in one embodiment, processing logic may share standard data to the different cloud computing platforms where there are existing consumers, or may share data in advance based on geographic region (e.g., share weather data according to cloud computing platform region) and/or other characteristics. At block 1508, processing logic sets up tasks for periodic replication. In one embodiment, by setting up these tasks, the data shared with the different cloud computing platforms and/or cloud computing platform regions can be refreshed periodically, so that neither the data provider nor the consumer needs to refresh the data manually.
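
One way to picture the advance replication described for method 1500 is to plan a replica, with a refresh interval, for every region that already has consumers. The function name `plan_prereplication`, the region labels, and the default interval are assumptions for illustration only; the disclosure does not specify how target regions are chosen beyond the examples given.

```python
def plan_prereplication(home_region, consumer_regions, refresh_minutes=60):
    """Map each remote region with existing consumers to a refresh interval.

    The listing's home region is skipped because the data already lives there.
    """
    targets = sorted(set(consumer_regions) - {home_region})
    return {region: refresh_minutes for region in targets}
```

For instance, `plan_prereplication("us-west", ["us-west", "us-east", "eu"], 30)` yields a plan covering only the two remote regions, each refreshed every 30 minutes.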

In one embodiment, another type of sharing model is one in which the data provider provides the shared data to one or more third-party entities before the data is shared with a consumer. This can be used to personalize the shared data for the consumer. FIG. 16 is a process flow diagram of a method 1600 for creating a listing for a personalized share in a data exchange, where the listing is available in different cloud computing services and/or in multiple regions of a cloud computing service. For example, the processing logic may be implemented as exchange manager 124. Method 1600 may begin at step 1602, where processing logic receives a request from a data provider to create a listing. In one embodiment, the data provider may create a standard listing available to all customers, a data share available by request, and/or other types of data shares as described above. In yet another embodiment, the listing may include the set of cloud computing platforms and/or cloud computing platform regions in which the listing can be visible. The listing may be visible in all possible cloud computing regions and/or cloud computing platforms, or in a subset of them. In addition, the listing may include other information (e.g., the allowed consumers of the data). At block 1604, processing logic creates the listing in the data exchange. In one embodiment, processing logic creates the listing by creating an entitlement mapping that maps consumer identifiers to the data provider. In addition, the listing may include a secure view of the data, which is added to the shared data.

At block 1606, processing logic determines which third-party accounts will receive the data. In one embodiment, the shared data may be replicated to another party for processing before the data is shared with the consumer. At block 1608, processing logic determines which objects will be replicated to the potential third parties. At block 1610, processing logic replicates the data to the third-party accounts. At block 1612, processing logic determines secure views of the shared data for the potential consumers. In one embodiment, a secure view is used to create a secure way for a potential consumer to access the shared data. In this embodiment, there may be different secure views for different potential consumers or groups of consumers, or the same secure view for all potential consumers.

At block 1614, processing logic may replicate the shared data and the entitlement table in advance, based on the data to be shared and on characteristics of the cloud computing entities (e.g., cloud computing platforms and/or cloud computing platform regions) used for the listing. For example, in one embodiment, processing logic may share standard data to the different cloud computing platforms where there are existing consumers, or may share data in advance based on geographic region (e.g., share weather data according to cloud computing platform region) and/or other characteristics.

At block 1616, processing logic receives a listing request. In one embodiment, the request is associated with a consumer account that is part of a cloud computing platform and/or cloud computing region. For example, in one embodiment, the request may be associated with a consumer account on a cloud computing platform and/or cloud computing platform region that differs from the cloud computing platform and/or cloud computing platform region associated with the listing. Processing logic determines whether a provider customer account is allowed in the cloud computing platform, cloud computing platform region, and/or VPC associated with the listing request. For example, in one embodiment, a provider may have information that is not allowed to leave a certain region (e.g., because of data security, personally identifiable information, government restrictions, and/or other types of restrictions). If the account is not allowed, processing logic returns an error. If the provider customer account is allowed, processing logic creates the customer account listing request at block 1618. In one embodiment, if the request is for a consumer in a different cloud computing platform, cloud computing platform region, and/or VPC, processing logic creates a provider account in that cloud computing platform and/or cloud computing platform region so that the data provider can share the data with the requesting customer.

At block 1620, processing logic shares the data using the secure view associated with the consumer. In one embodiment, processing logic shares the data by using the secure view to replicate the data to the cloud computing platform and/or cloud computing platform region associated with the consumer that made the original listing request. In one embodiment, processing logic may replicate the entire data share or part of the data share. In this embodiment, processing logic may infer which parts of the data are to be shared based on characteristics of the requesting consumer (geography, time, etc.). At block 1622, processing logic sets up tasks for periodic replication. In one embodiment, by setting up these tasks, the data shared with the cloud computing platform and/or cloud computing platform region can be refreshed periodically, so that neither the data provider nor the consumer needs to refresh the data manually.

FIG. 17 is a process flow diagram of a method 1700 for sharing data with a VPC. For example, the processing logic may be implemented as exchange manager 124. Method 1700 may begin at step 1702, where processing logic receives a listing request from a consumer using a VPC. In one embodiment, processing logic uses a previously created account for the VPC to share the data. At block 1704, processing logic uses that account to replicate the data to the consumer's VPC. In one embodiment, processing logic may replicate the entire data share or portions of the data share. In this embodiment, processing logic may infer which portions of the data are to be shared based on characteristics of the requesting consumer (geography, time, etc.). In this embodiment, the consumer sees the data share through a user interface associated with the VPC account. The consumer can create a database from the shared data. At block 1706, processing logic sets up tasks for periodic replication. In one embodiment, by setting up these tasks, the data shared with the VPC can be refreshed periodically, so that neither the data provider nor the consumer needs to refresh the data manually.
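The periodic refresh tasks set up at block 1706 can be sketched as a simple due-time check. `RefreshTask`, `run_due_tasks`, and the fixed interval in seconds are hypothetical; the disclosure says only that tasks are set up so that shared data is refreshed periodically, and a real system would likely rely on its own task scheduler.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RefreshTask:
    share_name: str
    interval_seconds: float
    refresh: Callable[[], None]  # re-replicates the share to the consumer's VPC
    next_run: float = 0.0        # time at which the task next becomes due


def run_due_tasks(tasks: List[RefreshTask], now: float) -> List[str]:
    """Run every task that is due at `now` and reschedule it one interval later."""
    ran = []
    for task in tasks:
        if now >= task.next_run:
            task.refresh()
            task.next_run = now + task.interval_seconds
            ran.append(task.share_name)
    return ran
```

Passing the current time in as `now` keeps the sketch deterministic; a caller could supply `time.time()` from a polling loop.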

FIG. 18 is a block diagram of an example computing device 1800 that may perform one or more of the operations described herein, according to some embodiments. Computing device 1800 may be connected to other computing devices in a LAN, an intranet, an extranet, and/or the Internet. The computing device may operate in the capacity of a server machine in a client-server network environment or in the capacity of a client in a peer-to-peer network environment. The computing device may be provided by a personal computer (PC), a set-top box (STB), a server, a network router, switch, or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single computing device is illustrated, the term "computing device" shall also be taken to include any collection of computing devices that individually or jointly execute a set (or multiple sets) of instructions to perform the methods discussed herein.

The example computing device 1800 may include a processing device 1802 (e.g., a general-purpose processor, a PLD, etc.), a main memory 1804 (e.g., synchronous dynamic random access memory (DRAM), read-only memory (ROM)), a static memory 1806 (e.g., flash memory), and a data storage device 1818, which may communicate with each other via a bus 1830.

Processing device 1802 may be provided by one or more general-purpose processing devices, such as a microprocessor, a central processing unit, or the like. In an illustrative example, processing device 1802 may include a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a processor implementing other instruction sets, or processors implementing a combination of instruction sets. Processing device 1802 may also include one or more special-purpose processing devices, such as an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), a network processor, or the like. Processing device 1802 may be configured to execute the operations described herein, in accordance with one or more aspects of the present disclosure, for performing the operations and steps discussed herein. In one embodiment, processing device 1802 represents cloud computing platform 110 of FIG. 1. In another embodiment, processing device 1802 represents a processing device of a client device (e.g., client devices 101-104).

Computing device 1800 may further include a network interface device 1808 that may communicate with a network 1820. Computing device 1800 may also include a video display unit 1810 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)), an alphanumeric input device 1812 (e.g., a keyboard), a cursor control device 1814 (e.g., a mouse), and an acoustic signal generation device 1816 (e.g., a speaker). In one embodiment, video display unit 1810, alphanumeric input device 1812, and cursor control device 1814 may be combined into a single component or device (e.g., an LCD touch screen).

Data storage device 1818 may include a computer-readable storage medium 1828 on which may be stored one or more sets of instructions, e.g., instructions for carrying out the operations described herein, in accordance with one or more aspects of the present disclosure. Private data exchange instructions 1826 may also reside, completely or at least partially, within main memory 1804 and/or within processing device 1802 during their execution by computing device 1800, with main memory 1804 and processing device 1802 also constituting computer-readable media. The instructions may further be transmitted or received over network 1820 via network interface device 1808.

While computer-readable storage medium 1828 is shown in an illustrative example to be a single medium, the term "computer-readable storage medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database and/or associated caches and servers) that store the one or more sets of instructions. The term "computer-readable storage medium" shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the machine and that causes the machine to perform the methods described herein. The term "computer-readable storage medium" shall accordingly be taken to include, but not be limited to, solid-state memories, optical media, and magnetic media.

Unless specifically stated otherwise, terms such as "receiving," "creating," "determining," "sharing," "providing," "designating," or the like refer to actions and processes performed or implemented by computing devices that manipulate and transform data represented as physical (electronic) quantities within the computing device's registers and memories into other data similarly represented as physical quantities within the computing device's memories or registers or other such information storage, transmission, or display devices. Also, the terms "first," "second," "third," "fourth," etc., as used herein, are meant as labels to distinguish among different elements and do not necessarily have an ordinal meaning according to their numerical designation.

Examples described herein also relate to an apparatus for performing the operations described herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computing device selectively programmed by a computer program stored in the computing device. Such a computer program may be stored in a computer-readable non-transitory storage medium.

The methods and illustrative examples described herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used in accordance with the teachings described herein, or it may prove convenient to construct a more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear as set forth in the description above.

The above description is intended to be illustrative, and not restrictive. Although the present disclosure has been described with reference to specific illustrative examples, it will be recognized that the present disclosure is not limited to the examples described. The scope of the disclosure should be determined with reference to the following claims, along with the full scope of equivalents to which the claims are entitled.

As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes," and/or "including," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Therefore, the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting.

It should also be noted that in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may in fact be executed substantially concurrently or may sometimes be executed in the reverse order, depending upon the functionality/acts involved.

Although the method operations were described in a specific order, it should be understood that other operations may be performed between the described operations, that the described operations may be adjusted so that they occur at slightly different times, or that the described operations may be distributed in a system that allows the processing operations to occur at various intervals associated with the processing.

Various units, circuits, or other components may be described or claimed as "configured to" or "configurable to" perform a task or tasks. In such contexts, the phrase "configured to" or "configurable to" is used to connote structure by indicating that the units/circuits/components include structure (e.g., circuitry) that performs the task or tasks during operation. As such, a unit/circuit/component can be said to be configured to perform the task, or configurable to perform the task, even when the specified unit/circuit/component is not currently operational (e.g., is not on). The units/circuits/components used with the "configured to" or "configurable to" language include hardware, for example, circuits, memory storing program instructions executable to implement the operation, etc. Reciting that a unit/circuit/component is "configured to" perform one or more tasks, or is "configurable to" perform one or more tasks, is expressly intended not to invoke 35 U.S.C. 112, sixth paragraph, for that unit/circuit/component. Additionally, "configured to" or "configurable to" can include generic structure (e.g., generic circuitry) that is manipulated by software and/or firmware (e.g., an FPGA or a general-purpose processor executing software) to operate in a manner that is capable of performing the task(s) at issue. "Configured to" may also include adapting a manufacturing process (e.g., a semiconductor fabrication facility) to fabricate devices (e.g., integrated circuits) that are adapted to implement or perform one or more tasks. "Configurable to" is expressly intended not to apply to blank media, an unprogrammed processor or unprogrammed generic computer, or an unprogrammed programmable logic device, programmable gate array, or other unprogrammed device, unless accompanied by programmed media that confers on the unprogrammed device the ability to be configured to perform the disclosed function(s).

Any combination of one or more computer-usable or computer-readable media may be utilized. For example, a computer-readable medium may include one or more of a portable computer diskette, a hard disk, a random access memory (RAM) device, a read-only memory (ROM) device, an erasable programmable read-only memory (EPROM or flash memory) device, a portable compact disc read-only memory (CDROM), an optical storage device, and a magnetic storage device. Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages. Such code may be compiled from source code to computer-readable assembly language or machine code suitable for the device or computer on which the code will be executed.

Embodiments may also be implemented in cloud computing environments. In this description and the following claims, "cloud computing" may be defined as a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned (including via virtualization) and released with minimal management effort or service provider interaction, and then scaled accordingly. A cloud model can be composed of various characteristics (e.g., on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service), service models (e.g., Software as a Service ("SaaS"), Platform as a Service ("PaaS"), and Infrastructure as a Service ("IaaS")), and deployment models (e.g., private cloud, community cloud, public cloud, and hybrid cloud).

The flow diagrams and block diagrams in the attached figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flow diagrams or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It will also be noted that each block of the block diagrams and/or flow diagrams, and combinations of blocks in the block diagrams and/or flow diagrams, may be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions. These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means that implement the function/act specified in the flow diagram and/or block diagram block or blocks.

The foregoing description, for the purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the embodiments and their practical applications, to thereby enable others skilled in the art to best utilize the embodiments and various modifications as may be suited to the particular use contemplated. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.

Claims

1. A method comprising:

receiving data sharing information from a data provider for sharing a data set in a data exchange from a first cloud computing entity to a set of second cloud computing entities;

creating, by the processing device, an account with each of the set of second cloud computing entities in response to receiving the data sharing information; and

sharing the data set from the first cloud computing entity with the set of second cloud computing entities using at least the corresponding account of the second cloud computing entity.

2. The method of claim 1, wherein the data exchange includes a plurality of data listings provided by a plurality of data providers, the plurality of data listings referencing a plurality of data sets stored in a data storage platform, the data set is one of the plurality of data sets, and the data provider is one of the plurality of data providers.

3. The method of claim 1, wherein the first cloud computing entity is a first cloud computing platform and at least one of the set of second cloud computing entities is a second cloud computing platform different from the first cloud computing platform.

4. The method of claim 3, wherein the account with the second cloud computing entity is a provider account of the second cloud computing platform.

5. The method of claim 1, wherein the first cloud computing entity is a first region of a cloud computing platform having multiple regions and at least one of the set of second cloud computing entities is a second region of the cloud computing platform different from the first region.

6. The method of claim 5, wherein the account with the second cloud computing entity is a provider account of the second region of the cloud computing platform.

7. The method of claim 1, further comprising:

determining a frequency of updating the dataset shared with the set of second cloud computing entities.

8. The method of claim 1, wherein the sharing of the dataset comprises:

determining a set of one or more objects of the shared dataset to be replicated by the set of second cloud computing entities; and

replicating the set of one or more objects to the set of second cloud computing entities.

9. The method of claim 1, wherein the account is associated with a customer and the shared dataset is customized for the customer.

10. The method of claim 1, wherein the shared dataset is a subset of a database managed by the data provider.

11. The method of claim 1, wherein the shared dataset references at least one object in a second database different from a first database used to store the shared dataset.

12. The method of claim 11, further comprising replicating the at least one object to the set of second cloud computing entities.

13. A non-transitory machine-readable medium storing instructions that, when executed by one or more processors of a computing device, cause one or more second processors to:

14. The machine-readable medium of claim 13, wherein the data exchange includes a plurality of data listings provided by a plurality of data providers, the plurality of data listings referencing a plurality of data sets stored in a data storage platform, the data set is one of the plurality of data sets, and the data provider is one of the plurality of data providers.

15. The machine-readable medium of claim 13, wherein the first cloud computing entity is a first cloud computing platform and at least one of the set of second cloud computing entities is a second cloud computing platform different from the first cloud computing platform.

16. The machine-readable medium of claim 13, wherein the account with the second cloud computing entity is a provider account of the second cloud computing platform.

17. The machine-readable medium of claim 13, wherein the first cloud computing entity is a first region of a cloud computing platform having a plurality of regions, and wherein at least one of the set of second cloud computing entities is a second region of the cloud computing platform different from the first region.

18. The machine-readable medium of claim 17, wherein the account with the second cloud computing entity is a provider account for the second region of the cloud computing platform.

19. The machine-readable medium of claim 13, wherein the instructions further cause the computing device to:

20. The machine-readable medium of claim 13, wherein the instructions further cause the computing device to share the dataset by:

21. The machine-readable medium of claim 13, wherein the account is associated with a customer and the shared dataset is customized for the customer.

22. The machine-readable medium of claim 13, wherein the shared dataset is a subset of a database managed by the data provider.

23. The machine-readable medium of claim 13, wherein the shared dataset references at least one object in a second database different from a first database used to store the shared dataset.

24. The machine-readable medium of claim 23, wherein the instructions further cause the computing device to:

replicate the at least one object to the set of second cloud computing entities.

25. A system comprising:

a first cloud computing entity;

a set of second cloud computing entities; and

a data exchange to:

receive data sharing information from a data provider for sharing a data set in the data exchange from the first cloud computing entity to the set of second cloud computing entities;

in response to receiving the data sharing information, creating an account with each of the set of second cloud computing entities, and

26. The system of claim 25, wherein the data exchange includes a plurality of data listings provided by a plurality of data providers, the plurality of data listings referencing a plurality of data sets stored in a data storage platform, the data set is one of the plurality of data sets, and the data provider is one of the plurality of data providers.

27. The system of claim 25, wherein the first cloud computing entity is a first cloud computing platform and at least one of the set of second cloud computing entities is a second cloud computing platform different from the first cloud computing platform.

28. The system of claim 25, wherein the account with the second cloud computing entity is a provider account of the second cloud computing platform.

29. The system of claim 25, wherein the first cloud computing entity is a first region of a cloud computing platform having multiple regions and at least one of the set of second cloud computing entities is a second region of the cloud computing platform different from the first region.

30. The system of claim 29, wherein the account with the second cloud computing entity is a provider account of the second region of the cloud computing platform.