WO2022206089A1 - Data Internet method and system - Google Patents

Data Internet method and system

Info

Publication number
WO2022206089A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
domain
virtual
physical
proxy node
Prior art date
Application number
PCT/CN2022/070183
Other languages
English (en)
French (fr)
Inventor
刘麒赟
赵乃岩
Original Assignee
即云天下(北京)数据科技有限公司
Priority date
Filing date
Publication date
Application filed by 即云天下(北京)数据科技有限公司
Publication of WO2022206089A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]

Definitions

  • the present invention relates to the technical field of databases, in particular to a data Internet method and system.
  • Blockchain network: it can only realize the "bookkeeping" function and cannot meet the requirements of ownership protection and high performance in the process of data sharing;
  • Cloud computing/data center: a centralized solution that uses ETL (Extract-Transform-Load) to "merge small data islands into large data islands", and that cannot meet technical requirements such as data security, ownership protection, and low latency.
  • the present invention provides a data Internet method, including:
  • a computing task is performed on the twin dataset, and the calculation result is restored to the original dataset corresponding to the twin dataset.
  • the present invention also provides a data Internet system, comprising:
  • the first module is used to establish a data Internet backbone, and network the data Internet backbone and basic service components to establish a data Internet;
  • the second module is used to establish an information entropy reduction function, extract data from the original data set on the data Internet according to the information entropy reduction function, encrypt the extracted data, and generate a twin data set corresponding to the original data set;
  • the third module is configured to perform a calculation task on the twin dataset, and restore the calculation result to the original dataset corresponding to the twin dataset.
  • the data Internet method and system provided by the present invention use the data Internet, the information entropy reduction function and twin data sets to interconnect data islands in a manner that is close to real business, efficient, real-time and secure, and provide users with convenient, flexible and controllable means of joint analysis.
  • FIG. 1 is a schematic diagram of a virtual domain network architecture provided by this embodiment
  • FIG. 2 is a schematic diagram of a virtual autonomous area network architecture provided by this embodiment
  • FIG. 3 is a schematic diagram of the data Internet network architecture provided by the present embodiment.
  • FIG. 5 is a schematic diagram of the relationship between a twin data set and a physical data island provided by an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a data Internet backbone provided by an embodiment of the present invention.
  • FIG. 7 is a complete schematic diagram of the data Internet provided by an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of a data Internet compatible with a third-party computing framework provided by an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of a data Internet of a typical application example provided by an embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of a data Internet system provided by an embodiment of the present invention.
  • the data Internet provided by the embodiments of the present invention is a network infrastructure with "data islands" as nodes and the Internet as the underlying communication network, and supports joint data analysis across security boundaries.
  • Data islands refer to accessible data storage containers with secure boundaries, such as databases and IoT terminal devices (mobile terminals, sensors, smart home facilities, etc.).
  • several virtual domains constitute a virtual autonomous domain, and several virtual autonomous domains constitute a data Internet. The virtual domain, virtual autonomous domain and data Internet are described in detail below.
  • the virtual domain does not correspond to the physical domain in the real physical world, and is a logically existing domain, that is, "virtual space".
  • a virtual domain may have several virtual domain internal proxy nodes from different physical domains, and further proxy physical proxy nodes and data assets from different physical domains, and these physical proxy nodes and data assets may exist in different physical domains respectively.
  • a virtual domain can have several routers, and these routers can be interconnected to complete data routing.
  • a virtual domain usually has only one external proxy node of the virtual domain, and the external proxy node of the virtual domain is used as the gateway of the virtual domain, and is used for data entry and exit control, routing addressing, providing data resource list and data access services.
  • a virtual domain is the smallest network structure in the data Internet, and its network architecture is shown in Figure 1.
  • the virtual domain internal proxy node must be directly connected to a router, and is then connected through the router to the other virtual domain internal proxy nodes and the virtual domain external proxy node in the virtual domain; all virtual domain internal proxy nodes directly connected to the same router together form a virtual domain subnet.
  • the virtual domain has the IoD DNS (Internet of Data Domain Name System) service to complete the mapping of IoD domain names; the virtual domain also has the IoD NAT (Internet of Data Domain Network Address Translation) service to complete the translation between internal and external network addresses of the virtual domain.
  • the virtual domain also has front-end equipment.
  • the front-end equipment is a set of software systems installed and run directly on the operating system (usually Linux) of the servers owned by the data Internet; through configuration it completes server role division (IoD device roles such as virtual domain proxy node, physical proxy node, router, and DNS), routing settings, etc.
  • a virtual autonomous domain is composed of several virtual domains through routers, and its network architecture is shown in Figure 2.
  • each virtual domain is directly connected to an external router through its own virtual domain gateway (ie, a virtual domain external proxy node), and is connected to the virtual autonomous domain gateway and other virtual domain gateways through the router.
  • the virtual autonomous domain has an autonomous domain asset metadata system.
  • the virtual domain internal proxy nodes in each virtual domain register the metadata of their assets (uniform resource identifier, asset name, asset description, asset owner, etc.) with the asset metadata system, so that all assets in the virtual autonomous domain can be discovered and managed through the asset metadata system, which can also provide asset discovery, asset access and asset information access services to the outside.
  • the virtual autonomous domain has the IoD DNS service to complete the mapping of IoD domain names.
  • the virtual autonomous domain has the IoD NAT service to complete the translation between internal and external network addresses of the virtual autonomous domain.
  • the virtual autonomous domain has a management system to complete the management work in the virtual autonomous domain.
  • Each virtual autonomous domain has a virtual autonomous domain external proxy node, that is, the virtual autonomous domain gateway, which is used for the external connection of the entire virtual autonomous domain.
  • the data Internet consists of several virtual autonomous domains and basic service components, and its network architecture is shown in Figure 3; among them, the basic service components include public asset metadata systems, DNS, and data transaction record systems.
  • each virtual autonomous domain is directly connected to other virtual autonomous domain gateways through its own virtual autonomous domain gateway, thereby realizing the mutual connection between virtual autonomous domains.
  • The data Internet has a data Internet asset metadata system. The virtual autonomous domain gateway of each virtual autonomous domain synchronizes the asset metadata held in its autonomous domain asset metadata system to the data Internet asset metadata system, so that all assets in the entire data Internet can be discovered and managed through this system.
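The two-level registration and synchronization of asset metadata described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class names, the `iod://` URI scheme, and the keyword-based `discover` method are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AssetMetadata:
    uri: str          # uniform resource identifier (the "iod://" scheme is hypothetical)
    name: str
    description: str
    owner: str


@dataclass
class AssetMetadataSystem:
    """Catalogue for one scope: an autonomous domain or the whole data Internet."""
    assets: dict = field(default_factory=dict)

    def register(self, asset: AssetMetadata) -> None:
        # Registering the same URI again overwrites the earlier entry.
        self.assets[asset.uri] = asset

    def discover(self, keyword: str) -> list:
        kw = keyword.lower()
        return [a for a in self.assets.values()
                if kw in a.name.lower() or kw in a.description.lower()]


def sync_to_data_internet(domain_catalogue, internet_catalogue) -> int:
    """Gateway step: copy every autonomous-domain entry to the data Internet catalogue."""
    for asset in domain_catalogue.assets.values():
        internet_catalogue.register(asset)
    return len(domain_catalogue.assets)


# Two autonomous domains register their own assets locally...
domain_a = AssetMetadataSystem()
domain_a.register(AssetMetadata("iod://domain-a/vd1/sales", "sales twin",
                                "monthly sales twin data set", "org-a"))
domain_b = AssetMetadataSystem()
domain_b.register(AssetMetadata("iod://domain-b/vd3/sensors", "sensor readings",
                                "temperature sensor twin data set", "org-b"))

# ...and their gateways synchronize the entries upwards.
internet = AssetMetadataSystem()
sync_to_data_internet(domain_a, internet)
sync_to_data_internet(domain_b, internet)
found = internet.discover("sales")
```

Only metadata moves upwards in this sketch; the data sets themselves stay inside their domains.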
  • the Data Internet has an IoD DNS service to complete the mapping of IoD domain names.
  • the data Internet has a data transaction record system for recording all data transactions, thereby supporting data transaction services across virtual autonomous domains.
  • the data Internet can also have some public services independent of each virtual autonomous domain, such as data transaction systems, business model services, data model services, third-party applications, application stores, etc.; through these public services, users can complete data analysis, data trading, etc. across virtual autonomous domains.
  • the data transaction record system is used to record all data transactions. Since the system needs to serve multiple independent data transaction parties, it must be third-party neutral, tamper-proof, traceable, stable and reliable. In general, the system can be built on a trusted blockchain system.
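The tamper-evidence and traceability requirements can be illustrated with a simple hash chain. This is a deliberately reduced sketch, assuming SHA-256 chaining over JSON-serialized records; it is not a blockchain (there is no consensus or distribution), and the record fields are made up for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


class TransactionLog:
    """Append-only log in which every record commits to the one before it."""

    def __init__(self):
        self.records = []

    @staticmethod
    def _digest(prev_hash: str, payload: dict) -> str:
        body = json.dumps(payload, sort_keys=True)
        return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()

    def append(self, payload: dict) -> str:
        prev_hash = self.records[-1]["hash"] if self.records else GENESIS
        digest = self._digest(prev_hash, payload)
        self.records.append({"prev": prev_hash, "payload": payload, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited payload breaks every later hash."""
        prev_hash = GENESIS
        for record in self.records:
            if record["prev"] != prev_hash:
                return False
            if record["hash"] != self._digest(prev_hash, record["payload"]):
                return False
            prev_hash = record["hash"]
        return True


log = TransactionLog()
log.append({"buyer": "org-a", "seller": "org-b", "asset": "sensor readings", "price": 10})
log.append({"buyer": "org-c", "seller": "org-a", "asset": "sales twin", "price": 25})
chain_is_valid = log.verify()
```

Changing any field of an earlier record afterwards makes `verify()` return `False`, which is the property a neutral record system needs before stronger guarantees are layered on top.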
  • the data Internet method provided by the embodiment of the present invention specifically includes the following steps:
  • Step S101 Establish a physical domain, a virtual domain and a virtual autonomous domain.
  • According to a preset plan, such as a virtual domain construction plan formed from a group's actual physical network and virtual domain construction requirements, a physical domain, a virtual domain, and a virtual autonomous domain are established.
  • the physical domain refers to the actual network domain in the real physical world, that is, a network that is physically independent and has a clear physical isolation boundary from the external network (the "real space"), and it generally contains a unique gateway device that controls the connection between the internal and external networks; a physical domain has several physical proxy nodes and data island nodes.
  • the physical proxy node is used to provide computing assets in the physical domain (CPU, memory, disk, network, etc. contained in the physical proxy node itself) and connect data assets in the physical domain (physical database, data source, etc. of the data island node).
  • the physical proxy node provides external access services for computing assets and data assets through container technology.
  • the computing container provides the computing asset access service; the data container provides the data asset access service and is the carrier that loads the twin data set and provides corresponding access to the outside.
  • a physical proxy node can run multiple computing containers and data containers at the same time. Every computing container and data container provided by the physical proxy node has a corresponding proxy container on the virtual domain internal proxy node to which it belongs, and external access services are ultimately provided through the virtual domain internal proxy node. Physical proxy nodes in the same physical domain can be interconnected. Physical proxy nodes do not provide external access services directly; access is forwarded by proxy through the virtual domain internal proxy node.
  • the proxy node in the virtual domain is used to proxy multiple physical proxy nodes and multiple data island nodes in the same physical domain.
  • the virtual domain internal proxy node proxies the physical proxy nodes that belong to it, and thereby proxies all containers of those physical proxy nodes. Any external access to a container, or to the twin data sets loaded in it, must pass through the virtual domain internal proxy node; that is, all containers are in the virtual domain.
  • the same physical proxy node or data island node can be mounted on only one virtual domain internal proxy node. All computing assets and data assets can be registered in the data Internet through the virtual domain internal proxy node. The information entropy reduction function completes the mapping of information from the data island node in the physical domain to the twin data set of the virtual domain internal proxy node; container technology completes the mapping of computing resources from the physical proxy node in the physical domain to the data containers and computing containers of the virtual domain internal proxy node. The virtual domain internal proxy node can be connected to a router in the same virtual domain to route data within the virtual domain. Within a virtual domain, each virtual domain internal proxy node has a unique IoD IP address.
  • the virtual domain external proxy node is the gateway node of the virtual domain, and only one node in the virtual domain can serve as the virtual domain external proxy node.
  • the virtual domain external proxy node is the gateway node to the outside of the virtual domain. It is the only entrance and exit through which the data assets in the domain connect to the data Internet; that is, any user or application outside the domain that accesses any data container or computing container in the domain must pass through the virtual domain external proxy node. The virtual domain external proxy node has functions such as container access control and the routing of data and instructions. Since the virtual domain external proxy node is the gateway of the virtual domain, it has all the basic functions of an IoD router.
  • the external proxy node of the virtual domain is connected with the external router, and completes the data routing through the router's routing service.
  • Both the virtual domain internal proxy node and the virtual domain external proxy node can include an MQ (Message Queue) service and support the MQTT (Message Queuing Telemetry Transport) protocol, which helps with routing, storing and forwarding messages between nodes.
  • each virtual domain external proxy node has a unique IoD IP.
  • Different virtual domains can use their own independent IoD IP allocation strategies, that is, each forms an independent virtual domain local area network and ensures that every device (proxy node, router, etc.) in the virtual domain has a unique IoD IP.
  • the external proxy node of the virtual domain completes the translation of internal and external addresses through the NAT service.
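A minimal sketch of how a gateway might translate between internal and external IoD addresses is given below. The text does not specify the translation scheme, so the port-based mapping is an assumption borrowed from conventional NAT, and the IPv4-style addresses and the `IoDNat` class are hypothetical.

```python
class IoDNat:
    """Translation table kept by one virtual domain external proxy node (gateway)."""

    def __init__(self, external_ip: str, first_port: int = 40000):
        self.external_ip = external_ip
        self._outbound = {}   # internal IoD IP -> external port
        self._inbound = {}    # external port -> internal IoD IP
        self._next_port = first_port

    def translate_outbound(self, internal_ip: str) -> tuple:
        """Map an internal address to (gateway IP, port); the mapping is stable."""
        if internal_ip not in self._outbound:
            port = self._next_port
            self._next_port += 1
            self._outbound[internal_ip] = port
            self._inbound[port] = internal_ip
        return (self.external_ip, self._outbound[internal_ip])

    def translate_inbound(self, port: int):
        """Map an external port back to the internal address, or None if unknown."""
        return self._inbound.get(port)


# Two virtual domains allocate addresses independently, so the same internal
# IoD IP can appear in both without conflict once it is translated.
gateway_1 = IoDNat("172.16.0.1")
gateway_2 = IoDNat("172.16.0.2")
external_a = gateway_1.translate_outbound("10.0.0.2")
external_b = gateway_2.translate_outbound("10.0.0.2")
```

Because each gateway owns its own table, independent per-domain allocation strategies remain compatible with globally unambiguous external addresses.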
  • Step S102 Establish the mapping relationship between the physical proxy node and the internal proxy node in the virtual domain, and establish the mapping relationship between the external proxy node in the virtual domain and the external proxy node in the virtual autonomous domain.
  • the virtual domain has multiple internal proxy nodes in the virtual domain and one external proxy node in the virtual domain. Both the internal proxy nodes in the virtual domain and the external proxy nodes in the virtual domain are derived from the mapping of the physical domain entity components.
  • the host devices corresponding to the proxy nodes in the virtual domain are configured with IoD IP addresses, IoD MAC addresses, and IoD subnet masks.
  • the virtual autonomous domain is also a logical domain, and the virtual autonomous domain external proxy node is the mapping of the virtual domain external proxy node.
  • Step S103 Establish a routing connection between the virtual domain internal proxy node and the virtual domain external proxy node, establish a routing connection between the virtual domain external proxy node and the virtual autonomous domain external proxy node, and form a data Internet backbone.
  • the internal proxy node of the virtual domain is routed to connect with the external proxy node of the virtual domain through the IoD router.
  • the default IP address of the IoD router is set to the IoD IP address of the external proxy node of the virtual domain.
  • a virtual autonomous domain has a virtual autonomous domain gateway (that is, a virtual autonomous domain external proxy node) and multiple data Internet routers. In practical applications, multiple gateway nodes with the same function can be used, so that high service availability and load balancing can be achieved.
  • the default IoD IP address of each data Internet router in the virtual autonomous domain is set to the IoD IP address of the virtual autonomous domain gateway, and the default IP address of the virtual domain gateway in each virtual domain is set to the IoD IP address of a data Internet router in the virtual autonomous domain.
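The chain of default settings in step S103 can be read as a default-route table in which each device forwards unknown traffic one level outward. The sketch below follows that reading; the device labels are hypothetical, and treating "default IP address" as a default next hop is an interpretation rather than something the text states outright.

```python
# Default next hop for each device role, following the order given in step S103.
DEFAULT_NEXT_HOP = {
    "vd-internal-proxy": "vd-router",    # internal proxy node is attached to an IoD router
    "vd-router": "vd-gateway",           # IoD router defaults to the virtual domain gateway
    "vd-gateway": "vad-router",          # virtual domain gateway defaults to a data Internet router
    "vad-router": "vad-gateway",         # data Internet router defaults to the autonomous domain gateway
}


def default_path(start: str, table: dict) -> list:
    """Follow default next hops until a device without a default route is reached."""
    path = [start]
    visited = {start}
    while path[-1] in table:
        next_hop = table[path[-1]]
        if next_hop in visited:
            raise ValueError(f"routing loop at {next_hop}")
        path.append(next_hop)
        visited.add(next_hop)
    return path


outbound = default_path("vd-internal-proxy", DEFAULT_NEXT_HOP)
```

The walk ends at the virtual autonomous domain gateway, which is the only device in the chain that connects to other autonomous domains.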
  • the virtual autonomous domain also includes public services such as the IoD NAT service, the IoD DNS service, the resource metadata system, and the virtual autonomous domain management system. For the IoD NAT service, it is necessary to install the service and set a dedicated IoD IP for each host in the virtual autonomous domain. For public services in the virtual autonomous domain, their default IoD IP address must be set to the IoD IP address of the virtual autonomous domain gateway.
  • Step S104 Network the backbone of the data Internet and basic service components to establish the data Internet.
  • Basic service components include public asset metadata system, IoD DNS and data transaction record system, etc.
  • a separate virtual autonomous domain can also be established in the data Internet and managed by the data Internet network administrator. Public services such as application stores, data transaction systems, business model services, data model services and third-party applications are deployed in this separate virtual autonomous domain.
  • mapping of the physical domain can be established to form a virtual domain, as shown in Figure 6, which involves three virtual domains, namely virtual domain 1, virtual domain 2 and virtual domain 3:
  • In virtual domain 1, the virtual domain internal proxy node v1 in physical domain 1 is mapped to virtual domain internal proxy node v1', the virtual domain internal proxy node v2 in physical domain 2 is mapped to virtual domain internal proxy node v2', IoD router r1 is mapped to IoD router r1', and virtual domain gateway node g1 is mapped to virtual domain gateway node g1'. Virtual domain internal proxy node v1', virtual domain internal proxy node v2' and IoD router r1' cannot communicate with the outside of virtual domain 1; only the virtual domain gateway node g1' can. Through the connection between the virtual domain internal proxy node v1' and the IoD router r1', physical domain 1 and physical domain 2 together constitute virtual domain 1.
  • In virtual domain 2, the virtual domain internal proxy node v3 in physical domain 3 is mapped to virtual domain internal proxy node v3', IoD router r2 is mapped to IoD router r2', and virtual domain gateway node g2 is mapped to virtual domain gateway node g2'. Virtual domain internal proxy node v3' and IoD router r2' cannot communicate with the outside of virtual domain 2; only the virtual domain gateway node g2' can. Some nodes in physical domain 3 constitute virtual domain 2.
  • In virtual domain 3, the virtual domain internal proxy node v4 in physical domain 4 is mapped to virtual domain internal proxy node v4', IoD router r5 is mapped to IoD router r5', and virtual domain gateway node g4 is mapped to virtual domain gateway node g4'. Virtual domain internal proxy node v4' and IoD router r5' cannot communicate with the outside of virtual domain 3; only the virtual domain gateway node g4' can. Some nodes in physical domain 4 constitute virtual domain 3.
  • mapping of the virtual domain can be established to form a virtual autonomous domain, as shown in Figure 6, which involves two virtual autonomous domains, namely virtual autonomous domain 1 and virtual autonomous domain 2:
  • In virtual autonomous domain 1, IoD router r3 in physical domain 3 is mapped to IoD router r3', and virtual autonomous domain gateway node g3 is mapped to virtual autonomous domain gateway node g3'. The virtual domain gateway node g1' of virtual domain 1 and the virtual domain gateway node g2' of virtual domain 2 are connected to the IoD router r3', thereby constituting virtual autonomous domain 1. The IoD router r3' cannot communicate with the outside of virtual autonomous domain 1; only the virtual autonomous domain gateway node g3' can. As can be seen from Figure 6, physical domain 1, physical domain 2 and physical domain 3 together form virtual autonomous domain 1.
  • In virtual autonomous domain 2, IoD router r4 in physical domain 4 is mapped to IoD router r4', and virtual autonomous domain gateway node g5 is mapped to virtual autonomous domain gateway node g5'. The virtual domain gateway node g4' of virtual domain 3 is connected to the IoD router r4', thereby constituting virtual autonomous domain 2. The IoD router r4' cannot communicate with the outside of virtual autonomous domain 2; only the virtual autonomous domain gateway node g5' can. As can be seen from Figure 6, physical domain 4 alone constitutes virtual autonomous domain 2.
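The Figure 6 example can be written down as a small data structure, which makes the boundary rule (only gateways talk to the outside) easy to check. The node labels come from the text; the dictionary layout and the `egress_path` helper are assumptions made for this sketch.

```python
VIRTUAL_DOMAINS = {
    "virtual domain 1": {"proxies": ["v1'", "v2'"], "router": "r1'", "gateway": "g1'"},
    "virtual domain 2": {"proxies": ["v3'"], "router": "r2'", "gateway": "g2'"},
    "virtual domain 3": {"proxies": ["v4'"], "router": "r5'", "gateway": "g4'"},
}

AUTONOMOUS_DOMAINS = {
    "virtual autonomous domain 1": {
        "virtual_domains": ["virtual domain 1", "virtual domain 2"],
        "router": "r3'", "gateway": "g3'",
    },
    "virtual autonomous domain 2": {
        "virtual_domains": ["virtual domain 3"],
        "router": "r4'", "gateway": "g5'",
    },
}


def egress_path(node: str) -> list:
    """Nodes a message from `node` crosses on its way out of its autonomous domain."""
    for vd_name, vd in VIRTUAL_DOMAINS.items():
        if node in vd["proxies"]:
            path = [node, vd["router"], vd["gateway"]]
        elif node == vd["router"]:
            path = [node, vd["gateway"]]
        elif node == vd["gateway"]:
            path = [node]
        else:
            continue  # node is not part of this virtual domain
        for vad in AUTONOMOUS_DOMAINS.values():
            if vd_name in vad["virtual_domains"]:
                return path + [vad["router"], vad["gateway"]]
    raise KeyError(f"unknown node: {node}")


path_from_v1 = egress_path("v1'")
path_from_v4 = egress_path("v4'")
```

Every path leaves its virtual domain through that domain's gateway node and leaves its autonomous domain through g3' or g5', matching the communication restrictions listed above.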
  • Step S105 Establish an information entropy reduction function, extract data from the original data set on the data Internet according to the information entropy reduction function, encrypt the extracted data, and generate a twin data set corresponding to the original data set.
  • each data silo is a source of information that owns the data.
  • any data query task of any user is a unique information (valid data) mining process, that is, a process of eliminating entropy, eliminating uncertainty, and obtaining target information, and this process includes the filtering of noise (invalid data).
  • the validity and invalidity of data are determined for specific data query and analysis tasks, and are relative. For example, data that is information for one query task may be noise for another query task.
  • the total data volume of a data island source is the same, but the query workload of each query analysis task based on this source is different, so the amount of noise that needs to be filtered is also different.
  • the embodiment of the present invention proposes and uses an information entropy reduction function.
  • the information entropy reduction function is a function for reducing information entropy. It can be designed for a specific query analysis task to help that task reduce the amount of data noise and establish the corresponding target twin data set, thereby improving the overall execution efficiency of the task.
  • the information entropy reduction function is a tool to connect the data islands of the physical domain to the data Internet, and can complete the mapping of data from the physical domain to the virtual domain.
  • the embodiment of the present invention proposes and uses a twin data set.
  • the twin data set is the sub-data set obtained by extracting data from the original physical data set through an information entropy reduction function and then securely encrypting the extracted data through a desensitization function, encryption function, etc., as shown in Figure 5.
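The two stages that produce a twin data set (extraction by an information entropy reduction function, then desensitization or encryption) can be sketched as follows. The records are invented for illustration, the function names are hypothetical, and the salted SHA-256 digest stands in for whatever desensitization or encryption function a deployment would actually choose; it is pseudonymization, not reversible encryption.

```python
import hashlib


def entropy_reduction(rows, predicate, columns):
    """Keep only the rows and columns that are information for this query task."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


def desensitize(rows, sensitive_columns, salt):
    """Replace sensitive values with truncated salted SHA-256 digests."""
    masked_rows = []
    for row in rows:
        masked = dict(row)
        for column in sensitive_columns:
            raw = (salt + str(row[column])).encode("utf-8")
            masked[column] = hashlib.sha256(raw).hexdigest()[:16]
        masked_rows.append(masked)
    return masked_rows


# Illustrative original data set held by a data island.
original = [
    {"patient_id": "p-001", "age": 34, "city": "Beijing", "diagnosis": "flu"},
    {"patient_id": "p-002", "age": 61, "city": "Shanghai", "diagnosis": "diabetes"},
    {"patient_id": "p-003", "age": 58, "city": "Beijing", "diagnosis": "diabetes"},
]

# Task-specific rule: a diabetes study needs only three columns of matching rows.
extracted = entropy_reduction(
    original,
    predicate=lambda row: row["diagnosis"] == "diabetes",
    columns=["patient_id", "age", "diagnosis"],
)
twin = desensitize(extracted, sensitive_columns=["patient_id"], salt="demo-salt")
```

The twin is both smaller than the original (two rows, three columns) and free of raw identifiers, which are the two properties the surrounding text attributes to twin data sets.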
  • the twin data set is essentially a virtual model of its corresponding physical model, and has the basic characteristics of a digital twin virtual model, such as real-time dynamics and bidirectionality.
  • the twin data set of the embodiment of the present invention also has the following unique functions and advantages:
  • the twin data set is not a simple copy of the original physical data set. Instead, the corresponding data extraction algorithms and rules are specified according to the needs and characteristics of the actual business scenario, and data is then extracted from the original physical data set according to those algorithms and rules.
  • the data size of the twin dataset established by the information entropy reduction function is usually smaller than its original dataset, which can significantly improve the efficiency of data collection and analysis based on the twin dataset.
  • the size of the data size of the twin dataset is mainly determined by the specific actual business situation and the information entropy reduction function.
  • the full original dataset can also be mapped to the twin dataset.
  • the twin data set can also be an empty set.
  • the twin data set can be a mapping of all or part of the original data set. If the information entropy reduction function performs no encryption or desensitization, the real content of the twin data set is exactly the same as the corresponding content of the original data set, so a leak of the twin data set may be equivalent to a leak of the original data set, which is a security risk. Therefore, desensitization/encryption functions are usually used to give twin data sets high data security, thereby greatly expanding their practical application scenarios.
  • the twin data set can be dynamically changed in real time with the change of its corresponding physical model, thus ensuring the timeliness of the analysis results based on the twin data set.
  • Bidirectionality: not only can the corresponding twin data set be generated from the physical model, but the calculation and analysis results based on the twin data set can also be fed back to the physical model, forming a two-way, uninterrupted closed loop of information feedback.
  • the physical model can use this feedback to supplement its information in certain dimensions, or to continuously optimize the product.
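A minimal sketch of the closed loop: a twin is derived from the physical model, an analysis runs on the twin only, and the result is written back as a new dimension of the physical records. The device data, the threshold of 100 hours, and all function names are illustrative assumptions.

```python
def build_twin(physical: dict) -> dict:
    """Forward direction: derive the twin from the physical model (one field only)."""
    return {key: {"usage_hours": record["usage_hours"]} for key, record in physical.items()}


def analyse(twin: dict) -> dict:
    """Analysis that sees only the twin, never the physical records."""
    return {key: ("high" if row["usage_hours"] > 100 else "normal")
            for key, row in twin.items()}


def feed_back(physical: dict, results: dict, field_name: str) -> None:
    """Reverse direction: add the analysis result as a new field of the physical model."""
    for key, value in results.items():
        physical[key][field_name] = value


physical_model = {
    "device-1": {"usage_hours": 120, "location": "plant-a"},
    "device-2": {"usage_hours": 40, "location": "plant-b"},
}

twin = build_twin(physical_model)
results = analyse(twin)
feed_back(physical_model, results, "usage_level")
```

After the feedback step the physical records carry a field they did not have before, while the twin never exposed the `location` field to the analysis.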
  • Support for batch or real-time stream computing: the information entropy reduction function mainly focuses on the definition of rules and algorithms, and does not limit its specific implementation method, technology used, computing form, etc. Usually, according to specific business scenarios and characteristics, you can choose timed or interval batch computing, real-time stream computing, or a combination of batch and stream computing.
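Because the function is defined by its rule rather than by its execution mode, the same rule can drive both a batch run over a snapshot and an incremental run over a stream. The sketch below assumes the rule is a simple row predicate; the sensor readings and function names are made up for illustration.

```python
from typing import Callable, Iterable, Iterator

Rule = Callable[[dict], bool]


def batch_extract(snapshot: list, rule: Rule) -> list:
    """Batch form: apply the rule to a full snapshot at a scheduled time."""
    return [row for row in snapshot if rule(row)]


def stream_extract(events: Iterable[dict], rule: Rule) -> Iterator[dict]:
    """Stream form: apply the same rule to each event as it arrives."""
    for event in events:
        if rule(event):
            yield event


def is_overheating(row: dict) -> bool:
    """Example rule: only readings above 30 degrees are information for this task."""
    return row["temperature"] > 30


readings = [
    {"sensor": "s1", "temperature": 25},
    {"sensor": "s2", "temperature": 31},
    {"sensor": "s3", "temperature": 42},
]

batch_result = batch_extract(readings, is_overheating)
stream_result = list(stream_extract(iter(readings), is_overheating))
```

Both forms yield the same twin content for the same input; they differ only in when the rows become available.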
  • the system can provide applications with rich and flexible APIs to make them easier to use and to improve analysis efficiency.
  • an API for data query and analysis can be provided, which makes it easy for users to perform OLTP and OLAP production tasks;
  • an API for data set monitoring can be provided, with which an application can monitor a specified data set, so that when the twin data set changes the system notifies the application, which can then trigger logic that the application has set up.
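The monitoring API can be pictured as a callback subscription on a twin data set. This is a sketch under assumptions: the class and method names are hypothetical, and a real system would deliver notifications over the network (for example through the message queue services mentioned earlier) rather than by in-process calls.

```python
class MonitoredTwinDataset:
    """Twin data set that notifies subscribed applications when a row changes."""

    def __init__(self):
        self._rows = {}
        self._listeners = []

    def subscribe(self, callback) -> None:
        """Register callback(key, old_row, new_row) to be called on every change."""
        self._listeners.append(callback)

    def upsert(self, key, row) -> bool:
        """Insert or update a row; returns True only if the content changed."""
        old_row = self._rows.get(key)
        if old_row == row:
            return False  # no change, so no notification
        self._rows[key] = row
        for callback in self._listeners:
            callback(key, old_row, row)
        return True


received = []
dataset = MonitoredTwinDataset()
dataset.subscribe(lambda key, old, new: received.append((key, old, new)))

dataset.upsert("order-1", {"status": "created"})
dataset.upsert("order-1", {"status": "created"})   # identical content: ignored
dataset.upsert("order-1", {"status": "paid"})
```

The application-side callback is where "set logic" would be triggered, such as recomputing a report when an order status changes.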
  • This service is generally provided through the data container provided by the physical proxy node of the data Internet.
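The monitoring API described above can be pictured as a simple observer pattern. The sketch below is illustrative only: `TwinDataset`, `subscribe` and `apply_change` are hypothetical names chosen for this example, not interfaces defined in the patent.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical change-listener sketch; all names are illustrative assumptions.
ChangeHandler = Callable[[str, dict], None]


class TwinDataset:
    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, dict] = {}
        self._listeners: List[ChangeHandler] = []

    def subscribe(self, handler: ChangeHandler) -> None:
        """Register application logic to run whenever the dataset changes."""
        self._listeners.append(handler)

    def apply_change(self, op: str, key: str, row: Optional[dict] = None) -> None:
        """Apply an insert/update/delete, then notify every listener."""
        if op == "delete":
            self.rows.pop(key, None)
        else:
            self.rows[key] = row or {}
        for handler in self._listeners:
            handler(op, {"key": key, "row": row})


events = []
ds = TwinDataset("d3-1")
ds.subscribe(lambda op, payload: events.append((op, payload["key"])))
ds.apply_change("insert", "A12345", {"HP": "A12345"})
ds.apply_change("delete", "A12345")
```

Here the application's "preset logic" is just the lambda that records each change; a real application would react to the notification instead.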
  • Computing capability: A twin dataset can be not only a data set that serves queries but also a provider of computing power.
  • To use this computing power, a corresponding computing container usually needs to be provided on a physical proxy node of the data Internet, with the necessary, configurable hardware support (such as CPU, memory and hard disk).
  • The computing tasks of the computing container are based on, but not limited to, the data of its corresponding twin dataset, and different, customizable types of computing task can be executed.
  • A computing container can also be established for a twin dataset whose data set is empty.
  • This computing power lets the data Internet accept computation push-down commands and complete tasks such as edge computing while also providing external query services.
  • Compatibility with third-party access interfaces: Twin datasets can not only provide access interfaces to the data sets directly, but are also compatible with third-party access interfaces.
  • Third-party access interfaces include, but are not limited to, data query interfaces and data computing interfaces, such as JDBC (Java Database Connectivity) interfaces, MapReduce computing interfaces, Presto query interfaces, edge computing framework interfaces (such as EdgeX), TensorFlow computing interfaces, FATE (Federated AI Technology Enabler) computing interfaces, and Secure Multi-party Computation frameworks.
  • An access command sent by a third-party application or platform to a twin dataset is routed by the data Internet to the computing container corresponding to that twin dataset. The computing container reads and computes on the twin dataset's data according to the protocol and the specific task requirements, and finally returns the results to the third-party application or platform through the data Internet.
  • The whole calling process is transparent to the third-party application or platform, which treats the computing operations involved in accessing the twin dataset as if they were completed locally.
  • This service is usually provided through the computing containers offered by the physical proxy nodes of the data Internet.
  • As shown in FIG. 7, the data sets (such as d1, d2, d3 and d4) on the independent databases (such as data island 1, data island 3, data island 5 and data island 7) in each domain can all generate twin datasets (such as d1-1, d1d2-1, d3-1 and d4-1) through information entropy reduction functions and desensitization/encryption functions. One original data set can generate one twin dataset (for example, twin dataset d3-1 is generated from data set d3), or several original data sets can generate one twin dataset (for example, twin dataset d1d2-1 is generated from data sets d1 and d2).
  • The advantage of twin datasets is that they narrow the scope of access to the original data sets according to business needs and provide secure data sets.
  • A twin dataset is computed and generated by a computing container provided by a physical proxy node of the data Internet.
  • The computing container contains program logic such as the information entropy reduction function and the desensitization/encryption functions.
  • The generated twin dataset is loaded into the corresponding data container, which provides data access services to the outside.
  • As shown in FIG. 7, physical proxy node p2 uses its computing container c1 to compute twin dataset d1d2-1 from original data sets d1 and d2, and twin dataset d1d2-1 is loaded into data container d1d2-1.
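The generation step above (entropy reduction followed by desensitization, possibly over several original data sets) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the sample rows and the key are invented for the example, and a keyed one-way hash stands in for the patent's unspecified desensitization/encryption function. A deployment that must restore results to original values would need a reversible scheme instead.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key"  # hypothetical key; real systems need proper key management


def entropy_reduction(rows, wanted_fields, predicate):
    """Keep only the rows and fields relevant to the target query task."""
    return [{f: r[f] for f in wanted_fields} for r in rows if predicate(r)]


def desensitize(row, sensitive_fields):
    """Replace sensitive values with keyed hashes (one-way pseudonyms)."""
    out = dict(row)
    for f in sensitive_fields:
        digest = hmac.new(SECRET_KEY, str(row[f]).encode(), hashlib.sha256)
        out[f] = digest.hexdigest()[:16]
    return out


def build_twin(originals, wanted_fields, predicate, sensitive_fields):
    """Generate one twin dataset from one or more original data sets."""
    twin = []
    for rows in originals:
        for r in entropy_reduction(rows, wanted_fields, predicate):
            twin.append(desensitize(r, sensitive_fields))
    return twin


# Made-up sample rows standing in for original data sets d1 and d2.
d1 = [{"HP": "A111", "KAKOU": "k1", "NOTE": "x"}]
d2 = [
    {"HP": "B222", "KAKOU": "k2", "NOTE": "y"},
    {"HP": "", "KAKOU": "k2", "NOTE": "unreadable plate"},
]
d1d2_1 = build_twin([d1, d2], ["HP", "KAKOU"], lambda r: r["HP"] != "", ["HP"])
```

The resulting `d1d2_1` holds two rows with only the `HP` and `KAKOU` fields, and the plate values are replaced by pseudonyms.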
  • The physical proxy nodes of the data Internet carry out all the actual data service and computing service tasks, while external access to data containers and computing containers is managed through the virtual domain internal proxy nodes.
  • As shown in FIG. 7, data container d1d2-1 mounted on physical proxy node p2 is actually proxied in the data Internet by virtual domain internal proxy node v1' of virtual domain 1 (that is, as d1d2-1').
  • Once the twin datasets have been built and the corresponding data container and computing container services are in place, applications can access data resources and computing resources across physical domains through the data Internet.
  • As shown in FIG. 7, application A accesses the data resources and computing resources in virtual autonomous domain 1 by connecting to virtual autonomous domain gateway node g3' of virtual autonomous domain 1, while application B accesses the data resources and computing resources in virtual autonomous domains 1 and 2 by connecting to virtual autonomous domain gateway nodes g3' and g5' of virtual autonomous domains 1 and 2.
  • As shown in FIG. 8, both the MapReduce computing framework and the TensorFlow computing framework can access the data resources and computing resources in virtual autonomous domains 1 and 2 through virtual autonomous domain gateway nodes g3' and g5' of the data Internet.
  • The MapReduce computing framework can take the twin dataset carried by data container d1-1' in virtual domain 1 as the input of a computing task (by specifying the twin dataset's domain name in the data Internet as the data file path in the program), run the map tasks in computing container c1' and the reduce tasks in computing container c2', and thus complete the whole computing task with the data and computing resources of the data Internet.
  • At the same time, the task management program of the MapReduce computing framework (such as JobTracker or Application Master) can run in a computing container of the data Internet.
  • As can be seen from FIG. 8, the MapReduce computing framework can access and use data sources and computing resources in different, remote domains through the data Internet without code changes, with an effect equivalent to local calls.
  • The specific network routing, security authentication and other work is done by the data Internet itself and is transparent to the MapReduce computing framework, so the data Internet can in theory seamlessly integrate third-party interfaces, applications or frameworks.
  • Compatibility with other third-party interfaces, applications or frameworks can follow, but is not limited to, the approach described above for the MapReduce computing framework.
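To make the division of work concrete, the following plain-Python sketch mimics a map phase and a reduce phase running against a twin dataset. It does not use the Hadoop MapReduce API; the functions, the sample records and the idea of counting sightings per pseudonymized plate are assumptions made only for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Map step (would run in computing container c1'): emit (plate, 1) pairs."""
    return [(r["HP"], 1) for r in records]


def reduce_phase(pairs):
    """Reduce step (would run in computing container c2'): sum counts per plate."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Made-up pseudonymized records standing in for the twin dataset in d1-1'.
twin_dataset = [{"HP": "t-01"}, {"HP": "t-02"}, {"HP": "t-01"}]
counts = reduce_phase(map_phase(twin_dataset))
```

In the scheme described in the text, the framework would address the input by the twin dataset's domain name, and the data Internet would route each phase to its container.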
  • Step S106 Execute the calculation task on the twin dataset, and restore the calculation result to the original dataset corresponding to the twin dataset.
  • The platform provides applications with the necessary APIs and corresponding functions, so that users can read data and develop tasks more efficiently. For example, by providing a monitoring interface for twin datasets, an application can easily monitor a target data set and automatically execute preset program logic.
  • The calculation result on the twin dataset is equivalent to the calculation result on the original data set, so the calculation result in the twin dataset can be sent back to its corresponding data island through the root node of the virtual domain. If the result data is encrypted, the original result data is recovered with the corresponding decryption algorithm (symmetric encryption, homomorphic encryption, etc.) and stored in the data island, which completes the restoration of the calculation result. In this process all computing tasks are completed in computing containers, including the task of synchronously updating the twin dataset and the original data set with the calculation results.
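The restore step can be illustrated with a reversible mapping held inside the physical domain. The patent names symmetric and homomorphic encryption; the sketch below deliberately swaps in a simpler technique, tokenization with a lookup table, because it needs only the standard library. `TokenVault` and its methods are hypothetical names.

```python
import secrets


class TokenVault:
    """Reversible pseudonymization kept inside the physical domain (illustrative)."""

    def __init__(self) -> None:
        self._to_token = {}
        self._to_value = {}

    def protect(self, value: str) -> str:
        """Return a stable random token for a value, creating it on first use."""
        if value not in self._to_token:
            token = secrets.token_hex(8)
            self._to_token[value] = token
            self._to_value[token] = value
        return self._to_token[value]

    def restore(self, token: str) -> str:
        """Map a token from a calculation result back to the original value."""
        return self._to_value[token]


vault = TokenVault()
twin_rows = [vault.protect(hp) for hp in ["A111", "B222", "A111"]]

# A computation on the twin dataset: which protected plate appears more than once?
suspicious = {t for t in twin_rows if twin_rows.count(t) > 1}

# Back in the physical domain, the result is restored to the original values.
restored = {vault.restore(t) for t in suspicious}
```

The computation never sees a real plate number, yet the restored result is the same one a computation on the original data would give.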
  • This typical example uses the data Internet to find cloned-plate vehicles that have passed electronic traffic checkpoints, and to assist the business A authorities in cracking down on such offences.
  • A cloned-plate vehicle is one for which offenders forge or illegally obtain the license plate, model and color of a real vehicle, so that smuggled, assembled, scrapped or stolen vehicles appear "legal" on the surface.
  • Put simply, plate cloning means copying a real license plate and fitting a fake plate with the same number to another car.
  • FIG. 9 shows the actual data distribution of this typical example.
  • The comprehensive data of business A in province X is distributed by prefecture and city and stored in different physical domains, each of which contains several databases.
  • The physical domains holding the comprehensive data of business A are "province X business A physical domain 1", "province X business A physical domain 2" and "province X business A physical domain 3".
  • "Province X business A physical domain 1" contains physical nodes 1, 2 and 3 and data islands d1 and d2; "province X business A physical domain 2" contains physical nodes 4 and 5 and data island d3; "province X business A physical domain 3" contains physical nodes 8 and 9 and data island d5.
  • The business B department of province X has a physical domain named "province X business B physical domain 1".
  • This domain contains physical nodes 6 and 7 and data island d4.
  • This data island stores the transportation department data for the whole province.
  • Personnel information table RYXX, with the fields personnel number RYBH, name NAME, ID number SFZH, address ADDRESS and telephone TEL;
  • Personnel photo table RXZP, with the fields personnel number RYBH and photo PHOTO;
  • Vehicle information table VEHICLE, with the fields license plate HP and owner ID number SFZH;
  • Checkpoint vehicle passage record table EVENT, with the fields license plate HP, time TIME and checkpoint name KAKOU.
  • By jointly querying the comprehensive data of business A and the transportation department data, the owner's details can be looked up from a license plate number.
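The joint query described above amounts to a join on the owner ID number SFZH. The sketch below uses an in-memory SQLite database purely for illustration: the table and column names come from the text, while the sample rows and the helper `owner_by_plate` are made up, and a real deployment would query twin datasets spread across domains rather than one local database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RYXX (RYBH TEXT, NAME TEXT, SFZH TEXT, ADDRESS TEXT, TEL TEXT);
CREATE TABLE VEHICLE (HP TEXT, SFZH TEXT);
INSERT INTO RYXX VALUES ('p001', 'Zhang San', 'id-0001', 'Example Road 1', '555-0100');
INSERT INTO VEHICLE VALUES ('X-A12345', 'id-0001');
""")


def owner_by_plate(plate: str):
    """Join vehicle and personnel records on the owner ID number SFZH."""
    return conn.execute(
        "SELECT r.NAME, r.ADDRESS, r.TEL FROM VEHICLE v "
        "JOIN RYXX r ON r.SFZH = v.SFZH WHERE v.HP = ?",
        (plate,),
    ).fetchone()


owner = owner_by_plate("X-A12345")
```

A plate with no matching vehicle record returns `None`.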
  • This is difficult to achieve with current technology. Because of management boundaries, data scale and real-time requirements, the checkpoint databases of the various provinces (cities) exist as "data islands" whose data cannot be integrated, so the set of vehicles with cross-domain records cannot be found directly in any single data island.
  • Current analysis techniques usually extract the data of all data islands, transfer it to a centralized system by ETL, and then perform joint analysis.
  • This approach has poor real-time performance, typically with "T+1" feedback: for example, analysis results are only available a day later, which makes cloned-plate cases harder for the business A authorities to handle and reduces the value of the data.
  • The data Internet provided by the embodiment of the present invention can solve the problem of catching cloned-plate vehicles across data islands. The process is as follows:
  • The physical domains are established.
  • All physical proxy nodes, internal proxy nodes, gateways, routers and other hosts that belong to each physical domain are set up.
  • Four physical domains are established: province X business A physical domain 1, province X business A physical domain 2, province X business A physical domain 3 and province X business B physical domain 1.
  • The virtual domain includes virtual domain internal proxy node v1', virtual domain internal proxy node v2', router r1' and virtual domain gateway node g1', mapped from the physical domains of province X business A.
  • Province X business A virtual domain 1 contains the virtual domain nodes corresponding to province X business A physical domain 1 and province X business A physical domain 2, so it is a virtual domain that spans physical domains.
  • The virtual domain includes virtual domain internal proxy node v3', router r2' and virtual domain gateway node g2', mapped from province X business B physical domain 1. As FIG. 9 shows, this virtual domain corresponds one-to-one to the actual physical domain.
  • The virtual domain includes virtual domain internal proxy node v4', router r5' and virtual domain gateway node g4', mapped from province X business A physical domain 3. As FIG. 9 shows, this virtual domain corresponds one-to-one to the actual physical domain.
  • The tables contained in data set d4 are the personnel information table RYXX and the personnel photo table RXZP.
  • The information entropy reduction function sends every change (insert/delete/update) to any record in these two tables to the corresponding twin datasets (personnel information table RYXX' and personnel photo table RXZP'), where the same operation is applied to replicate the change. Because this data is relatively small and changes relatively rarely, the information entropy reduction function needs no additional complex operations to build the mapped twin dataset;
  • The information entropy reduction function for the vehicle information table VEHICLE in the database sends every change (insert/delete/update) to any record in this table to the corresponding twin dataset (vehicle information table VEHICLE'), where the same operation is applied to replicate the change. Because this data is relatively small and changes relatively rarely, the information entropy reduction function needs no additional complex operations to build the mapped twin dataset;
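For small, slowly changing tables, the entropy reduction function described above is essentially change replication. A minimal sketch, assuming each change arrives as a dictionary with an operation, a key and (for inserts and updates) a row; this event format and the function name `replicate` are assumptions, not part of the patent.

```python
def replicate(change: dict, twin_table: dict) -> None:
    """Apply one insert/update/delete from the source table to its twin table."""
    op, key = change["op"], change["key"]
    if op in ("insert", "update"):
        twin_table[key] = dict(change["row"])
    elif op == "delete":
        twin_table.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")


vehicle_twin = {}  # stands in for the twin table VEHICLE'
changes = [
    {"op": "insert", "key": "A111", "row": {"HP": "A111", "SFZH": "id-1"}},
    {"op": "update", "key": "A111", "row": {"HP": "A111", "SFZH": "id-2"}},
    {"op": "insert", "key": "B222", "row": {"HP": "B222", "SFZH": "id-3"}},
    {"op": "delete", "key": "B222"},
]
for c in changes:
    replicate(c, vehicle_twin)
```

After the four changes, the twin table holds only the updated record for plate `A111`.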
  • For each physical domain, a corresponding data container (such as d1-1, d3-1 or d5-1) is established on its physical proxy node, and a checkpoint license plate set is then created in that data container.
  • This data structure stores every license plate HP that has appeared in the checkpoint tables of all data islands in its domain. The checkpoint license plate set supports testing whether a license plate is already present and adding a license plate.
  • The checkpoint license plate set can be named "domain Y_HP_KAKOU_SET"; for example, the checkpoint license plate sets of the business A domains in FIG. 9 are named "domain 1_HP_KAKOU_SET", "domain 2_HP_KAKOU_SET" and "domain 3_HP_KAKOU_SET";
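The per-domain plate set needs only two operations, membership test and add. The sketch below follows the naming scheme from the text; the `PlateSet` class and the cross-domain lookup at the end are illustrative assumptions.

```python
class PlateSet:
    """Per-domain set of license plates seen at checkpoints (illustrative)."""

    def __init__(self, domain: str) -> None:
        self.name = f"{domain}_HP_KAKOU_SET"  # naming scheme from the text
        self._plates = set()

    def add(self, plate: str) -> bool:
        """Add a plate; return True if it was not already present."""
        is_new = plate not in self._plates
        self._plates.add(plate)
        return is_new

    def __contains__(self, plate: str) -> bool:
        return plate in self._plates


sets = {d: PlateSet(d) for d in ("domain 1", "domain 2", "domain 3")}
sets["domain 1"].add("A111")
sets["domain 2"].add("A111")

# Which domains other than domain 2 have also seen this plate?
seen_elsewhere = [s.name for d, s in sets.items() if d != "domain 2" and "A111" in s]
```

A plate that shows up in the sets of more than one domain is a candidate for the cross-domain checks described in the following steps.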
  • step 2) Build a twin dataset.
  • an empty twin data set can be established first.
  • a twin data set can be established first.
  • a data container such as d1-1, d3-1, d5-1, etc.
  • the information entropy reduction function defined in step 1) is run as a resident application in the computing container of the data Internet, so that these information entropy reduction functions can continuously update the corresponding twin datasets.
  • desensitization algorithms/encryption algorithms symmetric encryption, asymmetric encryption, etc.
  • the application specific logic is:
  • For example, with a threshold of 120 km/h, no alarm is raised at a speed of 100 km/h, whereas a speed of 130 km/h is reported immediately to the business A authorities of the domains to which the two checkpoints belong.
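The threshold check can be sketched as follows, assuming the speed in question is the average speed implied by two sightings of the same plate at two checkpoints a known distance apart. The distance input, the function names and the timestamps are assumptions; only the 120 km/h threshold and the 100/130 km/h cases come from the text.

```python
from datetime import datetime

SPEED_THRESHOLD_KMH = 120.0  # threshold used in the example in the text


def implied_speed_kmh(distance_km: float, t1: datetime, t2: datetime) -> float:
    """Average speed needed to travel between two checkpoints."""
    hours = abs((t2 - t1).total_seconds()) / 3600.0
    if hours == 0:
        return float("inf")  # same plate seen in two places at the same instant
    return distance_km / hours


def is_suspicious(distance_km: float, t1: datetime, t2: datetime) -> bool:
    """Flag a pair of sightings whose implied speed exceeds the threshold."""
    return implied_speed_kmh(distance_km, t1, t2) > SPEED_THRESHOLD_KMH


t1 = datetime(2021, 1, 1, 8, 0)
t2 = datetime(2021, 1, 1, 9, 0)
```

With sightings one hour apart, checkpoints 100 km apart imply 100 km/h (no alarm), while checkpoints 130 km apart imply 130 km/h (alarm).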
  • ii Restore the calculation results to the data island corresponding to the twin dataset.
  • The calculation result obtained by executing the computing task on the twin dataset is encrypted data, namely the encrypted license plate number, checkpoint number and physical domain number, but the original license plate number, checkpoint number and physical domain number can easily be recovered with the decryption program.
  • Because computation on the twin dataset has the same effect as computation on the original data set, the calculation result can be sent back from the virtual space to the corresponding physical domain through the data Internet, where decryption is completed.
  • The databases and related applications in the physical domain can then obtain the original data of the suspected cloned-plate vehicle, in order to alert the business A authorities and help them track it further.
  • the data Internet method provided by the embodiment of the present invention supports data “interconnection” across multiple security domains, is compatible with various data security boundaries, and can meet many technical requirements arising in the process of data sharing and transaction.
  • the specific advantages are as follows:
  • Data islands have their own "domains", that is, data island boundaries, which usually take the form of enterprise, department or network boundaries.
  • The security constraints on data within a domain are the same.
  • Unlike centralized data sets, which usually have unified data security constraints, correlation analysis over multiple data islands usually has to be done across domains, and it must be completed without breaking the security constraint boundary of any data island.
  • Centralized datasets usually do not support the dynamic expansion of data models (including heterogeneous and homogeneous data models) when the computing model is running, and can only support the expansion of datasets based on existing data models.
  • data models including heterogeneous and homogeneous data models
  • it needs to face heterogeneous data islands that are dynamically connected or disconnected, that is, the dynamic expansion and scalability of heterogeneous data models.
  • Centralized data sets usually do not support real-time data analysis tasks; analysis results are generally delayed to some degree, such as the common "T+1" (where the unit of T is 1 day or 1 hour). The main reason is that centralized data analysis requires periodic batch extraction, transformation and loading (ETL) of data from different domains, so it cannot pick up the latest data changes in each data domain in real time.
  • ETL periodic batch extraction, transformation, and loading
  • an embodiment of the present invention also provides a data Internet system, including:
  • the first module is used to establish the backbone of the data Internet, network the backbone of the data Internet and basic service components, and establish the data Internet;
  • the second module is used to establish an information entropy reduction function, extract data from the original data set on the data Internet according to the information entropy reduction function, encrypt the extracted data, and generate a twin data set corresponding to the original data set;
  • the third module is used to perform computing tasks on the twin datasets and restore the calculation results to the original datasets corresponding to the twin datasets.
  • the first module includes:
  • the first unit is used to establish the physical domains, virtual domains and virtual autonomous domains;
  • a physical domain has several physical proxy nodes; a virtual domain has several virtual domain internal proxy nodes, routers and one virtual domain external proxy node; a virtual autonomous domain has one virtual autonomous domain external proxy node;
  • the second unit is used to establish the mapping between physical proxy nodes and virtual domain internal proxy nodes, and the mapping between virtual domain external proxy nodes and virtual autonomous domain external proxy nodes;
  • the third unit is used to establish routing connections between virtual domain internal proxy nodes and virtual domain external proxy nodes, and between virtual domain external proxy nodes and virtual autonomous domain external proxy nodes, to form the data Internet backbone;
  • the fourth unit is used to network the backbone of the data Internet and basic service components to establish the data Internet.
  • By establishing physical domains, virtual domains, virtual autonomous domains, information entropy reduction functions and twin datasets, the data Internet method and system provided by the embodiments of the present invention interconnect data islands in a way that is close to real business, efficient, real-time and secure, and provide users with a convenient, flexible and controllable means of joint analysis.
  • The data Internet method provided by the embodiments of the present invention helps connect data islands and can help enterprises or business departments build a large-scale, dynamic, remote, heterogeneous, multi-owner network for secure data sharing and complex analysis. This effectively promotes the construction of data sharing and trading platforms between data islands and the sharing and joint analysis of data resources, releasing more hidden data value and helping to establish and improve data resource trading mechanisms and market development.
  • Each functional module and unit involved in this embodiment can be implemented by a computer program running on computer hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the flows of the method embodiments described above.
  • The hardware refers to a server, desktop computer, notebook computer or the like that includes one or more processors and storage media; the storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM) or the like; the computer program may be implemented in computer languages including, but not limited to, C and C++.

Abstract

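A data Internet method and system are provided, belonging to the field of database technology. The method includes: establishing a data Internet backbone; networking the data Internet backbone with basic service components to establish the data Internet (S104); establishing an information entropy reduction function, extracting data from the original data sets on the data Internet according to the information entropy reduction function, and encrypting the extracted data to generate twin datasets corresponding to the original data sets (S105); and executing computing tasks on the twin datasets and restoring the calculation results to the original data sets corresponding to the twin datasets (S106). The method effectively promotes data sharing between data islands and the circulation of data resources.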

Description

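A Data Internet Method and System
Technical Field
The present invention relates to the field of database technology, and in particular to a data Internet method and system.
Background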
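With the development of informatization and big data technology, data has become an increasingly important special asset. National policy, local governments and industry all attach growing importance to the value of data, demand for data sharing and joint queries increases daily, and all parties are actively promoting the construction of data sharing and trading platforms. However, despite the push from both policy and demand, the actual development of data sharing and trading platforms is not encouraging: no truly active, large-scale platform that creates significant value and forms an ecosystem has been established on the market. Instead, ever more data islands have formed amid ever louder calls for data sharing, leaving a large gap between the actual state of development and market expectations.
The main reason for this gap is that traditional "network infrastructure" cannot adequately meet the many practical requirements of data sharing and joint analysis. Real data sharing and joint analysis must take into account the security constraints of each party's data; not all data can be unconditionally centralized into one region or container. For example: ownership boundaries must be considered, so that participants do not lose ownership of their data because they share it; management rights must be considered, so that each participant can flexibly and independently decide the scope and validity period of its data's participation; interest boundaries must be considered, so that data participants can obtain relatively fair metrics and returns based on actual data usage; real-time performance must be considered, to serve a growing number of real-time data analysis scenarios; and the dynamic connection and disconnection of data sources must be considered, so that participants can freely join or leave the sharing arrangement. However, none of the current mainstream "network infrastructures" fully meets these requirements, for the following reasons:
1) Internet/mobile Internet: cannot meet the requirements for handling data heterogeneity and unified data quality management, cannot meet data privacy protection requirements, and cannot meet the need to compare and cross-match massive (multi-owner) data;
2) Blockchain networks: only provide a "ledger" function and cannot meet the ownership protection and high-performance requirements of data sharing;
3) Cloud computing/data centers: a centralized solution that uses ETL (Extract-Transform-Load) to "merge small data islands into large data islands", and cannot meet technical requirements such as data security, ownership protection and latency.
In short, current "network infrastructure" and data sharing and trading technologies cannot truly guarantee constraints such as data security during data sharing, nor can they support large-scale, real-time and dynamic data sharing and analysis.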
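Summary of the Invention
To solve the problem that current network infrastructure cannot truly guarantee data security constraints during data sharing or support large-scale, real-time and dynamic data sharing and analysis, the present invention provides a data Internet method, including:
establishing a data Internet backbone, and networking the data Internet backbone with basic service components to establish the data Internet;
establishing an information entropy reduction function, extracting data from the original data sets on the data Internet according to the information entropy reduction function, and encrypting the extracted data to generate twin datasets corresponding to the original data sets;
executing computing tasks on the twin datasets, and restoring the calculation results to the original data sets corresponding to the twin datasets.
The present invention also provides a data Internet system, including:
a first module, used to establish a data Internet backbone, and to network the data Internet backbone with basic service components to establish the data Internet;
a second module, used to establish an information entropy reduction function, extract data from the original data sets on the data Internet according to the information entropy reduction function, and encrypt the extracted data to generate twin datasets corresponding to the original data sets;
a third module, used to execute computing tasks on the twin datasets and restore the calculation results to the original data sets corresponding to the twin datasets.
Through the data Internet, information entropy reduction functions and twin datasets, the data Internet method and system provided by the present invention interconnect data islands in a way that is close to real business, efficient, real-time and secure, and provide users with convenient, flexible and controllable means of joint analysis.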
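Brief Description of the Drawings
FIG. 1 is a schematic diagram of the virtual domain network architecture provided by this embodiment;
FIG. 2 is a schematic diagram of the virtual autonomous domain network architecture provided by this embodiment;
FIG. 3 is a schematic diagram of the data Internet network architecture provided by this embodiment;
FIG. 4 is a flowchart of the data Internet method provided by this embodiment;
FIG. 5 is a schematic diagram of the relationship between twin datasets and physical data islands provided by the embodiment of the present invention;
FIG. 6 is a schematic diagram of the data Internet backbone provided by the embodiment of the present invention;
FIG. 7 is a complete schematic diagram of the data Internet provided by the embodiment of the present invention;
FIG. 8 is a schematic diagram of the data Internet compatible with third-party computing frameworks provided by the embodiment of the present invention;
FIG. 9 is a schematic diagram of the data Internet for the typical application example provided by the embodiment of the present invention;
FIG. 10 is a schematic structural diagram of the data Internet system provided by the embodiment of the present invention.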
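Detailed Description
The technical solution of the present invention is further described below with reference to the drawings and embodiments.
The data Internet provided by the embodiments of the present invention is a network infrastructure that takes "data islands" as nodes and the Internet as the underlying communication network, and supports joint data analysis across security boundaries. A data island is an accessible data storage container with a security boundary, such as a database or an Internet of Things terminal device (mobile terminal, sensor, smart home device, etc.). In general, several virtual domains form a virtual autonomous domain, and several virtual autonomous domains form a data Internet. Virtual domains, virtual autonomous domains and the data Internet are described in detail below.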
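A virtual domain does not correspond to a physical domain in the real physical world; it is a domain that exists logically, that is, a "virtual space". A virtual domain can have several virtual domain internal proxy nodes from different physical domains, which in turn proxy physical proxy nodes and data assets from different physical domains; these physical proxy nodes and data assets can reside in different physical domains. A virtual domain can have several routers, which can be interconnected to route data. A virtual domain usually has only one virtual domain external proxy node, which serves as the gateway of the virtual domain and is used for data ingress and egress control, routing and addressing, and providing the data resource list and data access services. The virtual domain is the smallest network structure in the data Internet; its network architecture is shown in FIG. 1. Within a virtual domain, each virtual domain internal proxy node must be directly connected to one router, through which it connects to the other internal proxy nodes and the external proxy node of the virtual domain; all virtual domain internal proxy nodes directly connected to the same router together form a virtual domain subnet. The virtual domain has an IoD DNS (Internet of Data Domain Name System) service to map IoD domain names, and an IoD NAT (Internet of Data Domain Network Address Translation) service to translate between internal and external network addresses in the virtual domain. The virtual domain also has a front-end device: a software system installed and run directly on the operating system (usually Linux) of a server owned by the data Internet. By configuring the front-end device, data Internet administrators can assign server roles (virtual domain proxy node, physical proxy node, router, DNS and all other IoD device roles), set up routing, and so on.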
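A virtual autonomous domain consists of several virtual domains connected through routers; its network architecture is shown in FIG. 2. Within a virtual autonomous domain, each virtual domain is directly connected to an external router through its own virtual domain gateway (that is, its virtual domain external proxy node), and through that router to the virtual autonomous domain gateway and the other virtual domain gateways. The virtual autonomous domain has an autonomous domain asset metadata system. The virtual domain internal proxy nodes in each virtual domain register the metadata (uniform resource identifier, asset name, asset description, asset owner, etc.) of the assets they proxy (data containers, computing containers, twin datasets, etc.) in this asset metadata system, so that all assets in the virtual autonomous domain can be discovered and managed through it, can be discovered and accessed from outside, and asset information retrieval services can be provided. The virtual autonomous domain has an IoD DNS service to map IoD domain names, an IoD NAT service to translate between internal and external network addresses in the virtual autonomous domain, and a management system to carry out management work within the virtual autonomous domain. Each virtual autonomous domain has one virtual autonomous domain external proxy node, that is, the virtual autonomous domain gateway, which handles the external connections of the whole virtual autonomous domain.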
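The asset metadata registration described above can be sketched as a small registry keyed by URI. The four fields come from the text; the `iod://` URI scheme, the class names and the sample assets are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetMetadata:
    uri: str          # uniform resource identifier
    name: str         # asset name
    description: str  # asset description
    owner: str        # asset owner


class AssetRegistry:
    """Illustrative stand-in for an autonomous-domain asset metadata system."""

    def __init__(self) -> None:
        self._assets = {}

    def register(self, asset: AssetMetadata) -> None:
        """Called by a virtual domain internal proxy node for each proxied asset."""
        self._assets[asset.uri] = asset

    def find_by_owner(self, owner: str):
        """Discover the URIs of all assets registered by one owner."""
        return sorted(a.uri for a in self._assets.values() if a.owner == owner)


registry = AssetRegistry()
registry.register(AssetMetadata("iod://vad1/vd1/d1d2-1", "d1d2-1", "twin dataset", "org-a"))
registry.register(AssetMetadata("iod://vad1/vd1/c1", "c1", "computing container", "org-a"))
```

Synchronizing such a registry upward to a data-Internet-wide registry would follow the same pattern, with the virtual autonomous domain gateway as the caller.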
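The data Internet consists of several virtual autonomous domains and basic service components; its network architecture is shown in FIG. 3. The basic service components include a public asset metadata system, DNS, a data transaction recording system, and so on. In a data Internet, each virtual autonomous domain is directly connected to the gateways of other virtual autonomous domains through its own virtual autonomous domain gateway, which interconnects the virtual autonomous domains. The data Internet has a data Internet asset metadata system: the gateway of each virtual autonomous domain synchronizes the asset metadata from its autonomous domain asset metadata system into the data Internet asset metadata system, so that all assets in the whole data Internet can be discovered and managed through it. The data Internet has an IoD DNS service to map IoD domain names. The data Internet has a data transaction recording system that records all data transactions and thus supports data trading across virtual autonomous domains. The data Internet may also have public services that are independent of the individual virtual autonomous domains, such as a data trading system, business model services, data model services, third-party applications and an application store, through which users can carry out data analysis and data trading across virtual autonomous domains. Because the data transaction recording system must serve multiple independent trading parties, it must be neutral as a third party, tamper-proof, traceable, and stable and reliable. It can generally be built on a trusted blockchain system.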
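Referring to FIG. 1 to FIG. 4, the data Internet method provided by the embodiment of the present invention specifically includes the following steps:
Step S101: Establish the physical domains, virtual domains and virtual autonomous domains.
The physical domains, virtual domains and virtual autonomous domains are established according to a preset plan, for example a virtual domain construction plan based on a group's actual physical network and its virtual domain requirements.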
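A physical domain is an actual network domain in the real physical world, that is, a network that is physically independent and has a clear physical isolation boundary from external networks (the "real space"), and generally contains a single gateway device that controls connections between the internal and external networks. A physical domain has several physical proxy nodes and data island nodes. A physical proxy node provides the computing assets of the physical domain (the CPU, memory, disk, network, etc. of the physical proxy node itself) and connects to the data assets of the physical domain (the physical databases, data sources, etc. of the data island nodes). Physical proxy nodes expose computing assets and data assets through container technology: computing containers provide access to computing assets, and data containers provide access to data assets. A data container is the carrier into which a twin dataset is loaded, and provides external access to it. A physical proxy node can run multiple computing containers and data containers at the same time. Every computing container and data container provided by a physical proxy node has a corresponding proxy container on the virtual domain internal proxy node to which it belongs, and is ultimately exposed through that node. Physical proxy nodes in the same physical domain can be interconnected. Physical proxy nodes do not provide access services to the outside directly; instead, access is proxied and forwarded by the virtual domain internal proxy node.
A virtual domain internal proxy node proxies multiple physical proxy nodes and multiple data island nodes in the same physical domain. It proxies the physical proxy nodes that belong to it and therefore all of their containers: any external access to a container, or to the twin dataset loaded in it, must pass through the virtual domain internal proxy node, that is, every container has a corresponding proxy container on that node. It also proxies the data island nodes that belong to it and therefore all twin datasets generated from them. All physical proxy nodes under the same virtual domain internal proxy node can access all data island nodes that it proxies. A given physical proxy node or data island node can be mounted on only one virtual domain internal proxy node. Through virtual domain internal proxy nodes, all computing assets and data assets can be registered in the data Internet. The information entropy reduction function maps information from the data island nodes of the physical domain to the twin datasets of the virtual domain internal proxy node; container technology maps computing resources from the physical proxy nodes of the physical domain to the data containers and computing containers of the virtual domain internal proxy node. A virtual domain internal proxy node can connect to a router in the same virtual domain to route data within the virtual domain. Within a virtual domain, each virtual domain internal proxy node has an IoD IP address that is unique in that domain.
The virtual domain external proxy node is the gateway node of the virtual domain; only one node in a virtual domain can act as the virtual domain external proxy node. It is the only entry and exit point connecting all data assets of the domain to the data Internet: any user or application outside the domain that accesses any data container or computing container inside the domain must go through it. It provides container access control, routing of data and instructions, and related functions. Because it is the gateway of the virtual domain, the virtual domain external proxy node has all the basic functions of an IoD router. It connects to external routers and relies on their routing services to route data. Both virtual domain internal proxy nodes and virtual domain external proxy nodes can include an MQ (Message Queue) service and support the MQTT (Message Queuing Telemetry Transport) protocol to help implement message routing, storage and delivery between nodes. In the data Internet, each virtual domain external proxy node has a unique IoD IP. Different virtual domains can each use their own independent IoD IP allocation policy internally, that is, form an independent virtual domain local network; provided that every device inside the virtual domain (virtual domain internal proxy nodes, routers, etc.) has a unique IoD IP, the virtual domain external proxy node translates between internal and external addresses through the NAT service.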
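Step S102: Establish the mapping between physical proxy nodes and virtual domain internal proxy nodes, and the mapping between virtual domain external proxy nodes and virtual autonomous domain external proxy nodes.
A virtual domain has multiple virtual domain internal proxy nodes and one virtual domain external proxy node, all of which are mapped from physical components in the physical domains. The host of each virtual domain internal proxy node is configured with an IoD IP address, an IoD MAC address, an IoD subnet mask, and so on. A virtual autonomous domain is also a logical domain, and the virtual autonomous domain external proxy node is a mapping of a virtual domain external proxy node.
Step S103: Establish routing connections between virtual domain internal proxy nodes and virtual domain external proxy nodes, and between virtual domain external proxy nodes and virtual autonomous domain external proxy nodes, to form the data Internet backbone.
Virtual domain internal proxy nodes are routed to the virtual domain external proxy node through IoD routers, and the default IP address of each IoD router is set to the IoD IP address of the virtual domain external proxy node. A virtual autonomous domain has one virtual autonomous domain gateway (that is, the virtual autonomous domain external proxy node) and multiple data Internet routers. In practice, multiple gateway nodes with the same function can be used to achieve high availability and load balancing. The default IoD IP address of each data Internet router in the virtual autonomous domain is set to the IoD IP address of the virtual autonomous domain gateway, and the default IP address of the virtual domain gateway of each virtual domain is set to a data Internet router in the virtual autonomous domain. The virtual autonomous domain also includes public services such as the IoD NAT service, the IoD DNS service, the resource metadata system and the virtual autonomous domain management system. For the IoD NAT service, the service must be installed and a dedicated IoD IP assigned to each host in the virtual autonomous domain. The default IoD IP address of every public service in the virtual autonomous domain must be set to the IoD IP address of the virtual autonomous domain gateway.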
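Step S104: Network the data Internet backbone with the basic service components to establish the data Internet.
The basic service components include the public asset metadata system, IoD DNS, the data transaction recording system, and so on. In addition, a separate virtual autonomous domain, managed by the data Internet network administrator, can be set up in the data Internet to host other public services such as the application store, data trading system, business model services, data model services and third-party applications. Whenever a public host is set up in the data Internet, it must be given an IoD IP address, its domain name must be configured in IoD DNS, and the IoD MAC address corresponding to each host's IoD IP must be configured in the static routing table of every virtual autonomous domain gateway.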
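The establishment of the data Internet is illustrated below with an example involving four physical domains, namely physical domain 1, physical domain 2, physical domain 3 and physical domain 4, as shown in FIG. 6:
1) In physical domain 1: establish physical proxy node p1, physical proxy node p2, data island node 1, data island node 2, data island node 3 and virtual domain internal proxy node v1. None of the physical proxy nodes or data island nodes can communicate outside physical domain 1, although they can communicate with one another; only virtual domain internal proxy node v1 can communicate outside physical domain 1.
2) In physical domain 2: establish physical proxy node p3, data island node 4, data island node 5, virtual domain internal proxy node v2, IoD router r1 and virtual domain gateway node g1. None of the physical proxy nodes, data island nodes or virtual domain internal proxy nodes can communicate outside physical domain 2, although they can communicate with one another; only IoD router r1 and virtual domain gateway node g1 can communicate outside physical domain 2. Virtual domain internal proxy node v1 of physical domain 1 can communicate with IoD router r1 of physical domain 2.
3) In physical domain 3: establish physical proxy node p4, data island node 6, data island node 7, virtual domain internal proxy node v3, IoD router r2, virtual domain gateway node g2, IoD router r3 and virtual autonomous domain gateway node g3. None of the physical proxy nodes, data island nodes, virtual domain internal proxy nodes or virtual domain gateway nodes can communicate outside physical domain 3, although they can communicate with one another; only IoD router r3 and virtual autonomous domain gateway node g3 can communicate outside physical domain 3. Virtual domain gateway node g1 of physical domain 2 can communicate with IoD router r3 of physical domain 3.
4) In physical domain 4: establish physical proxy node p5, data island node 8, data island node 9, virtual domain internal proxy node v4, IoD router r5, virtual domain gateway node g4, IoD router r4 and virtual autonomous domain gateway node g5. None of the physical proxy nodes, data island nodes, virtual domain internal proxy nodes or virtual domain gateway nodes can communicate outside physical domain 4, although they can communicate with one another; only virtual autonomous domain gateway node g5 can communicate outside physical domain 4. Virtual autonomous domain gateway node g3 of physical domain 3 can communicate with virtual autonomous domain gateway node g5 of physical domain 4.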
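Once the network topology of the physical domains is established, the physical domains can be mapped to form virtual domains. As shown in FIG. 6, three virtual domains are involved, namely virtual domain 1, virtual domain 2 and virtual domain 3:
1) In virtual domain 1: virtual domain internal proxy node v1 in physical domain 1 is mapped to virtual domain internal proxy node v1'; in physical domain 2, virtual domain internal proxy node v2 is mapped to virtual domain internal proxy node v2', IoD router r1 to IoD router r1', and virtual domain gateway node g1 to virtual domain gateway node g1'. Nodes v1', v2' and r1' cannot communicate outside virtual domain 1; only virtual domain gateway node g1' can. Through the connection between v1' and r1', physical domain 1 and physical domain 2 together form virtual domain 1.
2) In virtual domain 2: in physical domain 3, virtual domain internal proxy node v3 is mapped to virtual domain internal proxy node v3', IoD router r2 to IoD router r2', and virtual domain gateway node g2 to virtual domain gateway node g2'. Nodes v3' and r2' cannot communicate outside virtual domain 2; only virtual domain gateway node g2' can. Part of the nodes of physical domain 3 form virtual domain 2.
3) In virtual domain 3: in physical domain 4, virtual domain internal proxy node v4 is mapped to virtual domain internal proxy node v4', IoD router r5 to IoD router r5', and virtual domain gateway node g4 to virtual domain gateway node g4'. Nodes v4' and r5' cannot communicate outside virtual domain 3; only virtual domain gateway node g4' can. Part of the nodes of physical domain 4 form virtual domain 3.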
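Once the network topology of the virtual domains is established, the virtual domains can be mapped to form virtual autonomous domains. As shown in FIG. 6, two virtual autonomous domains are involved, namely virtual autonomous domain 1 and virtual autonomous domain 2:
1) In virtual autonomous domain 1: in physical domain 3, IoD router r3 is mapped to IoD router r3' and virtual autonomous domain gateway node g3 to virtual autonomous domain gateway node g3'. Virtual domain gateway node g1' of virtual domain 1 and virtual domain gateway node g2' of virtual domain 2 are both connected to IoD router r3', forming virtual autonomous domain 1. IoD router r3' cannot communicate outside virtual autonomous domain 1; only virtual autonomous domain gateway node g3' can. As FIG. 6 shows, physical domains 1, 2 and 3 together make up virtual autonomous domain 1.
2) In virtual autonomous domain 2: in physical domain 4, IoD router r4 is mapped to IoD router r4' and virtual autonomous domain gateway node g5 to virtual autonomous domain gateway node g5'. Virtual domain gateway node g4' of virtual domain 3 is connected to IoD router r4', forming virtual autonomous domain 2. IoD router r4' cannot communicate outside virtual autonomous domain 2; only virtual autonomous domain gateway node g5' can. As FIG. 6 shows, physical domain 4 alone makes up virtual autonomous domain 2.
3) Connections between virtual autonomous domains: virtual autonomous domains are connected directly through their virtual autonomous domain gateway nodes. As FIG. 6 shows, virtual autonomous domain gateway nodes g5' and g3' are directly connected, which connects the two virtual autonomous domains.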
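The example topology can be checked with a small graph model. The node names come from the text; several links (for instance r1' to g1', r3' to g3' and r4' to g5') are not stated explicitly and are inferred here from the rule that internal nodes reach the outside only through their gateway, so the edge list is an assumption. Breadth-first search then yields the hop path between two internal proxy nodes in different virtual autonomous domains.

```python
from collections import deque

# Edge list for the virtual topology; partly inferred from the text.
EDGES = [
    ("v1'", "r1'"), ("v2'", "r1'"), ("r1'", "g1'"),   # virtual domain 1
    ("v3'", "r2'"), ("r2'", "g2'"),                   # virtual domain 2
    ("v4'", "r5'"), ("r5'", "g4'"),                   # virtual domain 3
    ("g1'", "r3'"), ("g2'", "r3'"), ("r3'", "g3'"),   # virtual autonomous domain 1
    ("g4'", "r4'"), ("r4'", "g5'"),                   # virtual autonomous domain 2
    ("g3'", "g5'"),                                   # gateway-to-gateway link
]


def build_graph(edges):
    """Build an undirected adjacency map from an edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def route(graph, src, dst):
    """Breadth-first search for the shortest hop path from src to dst."""
    queue, parent = deque([src]), {src: None}
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(graph[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


path = route(build_graph(EDGES), "v1'", "v4'")
```

Under these assumed links, traffic from v1' to v4' passes through both virtual domain gateways and both virtual autonomous domain gateways, which matches the single-entry-point rule in the text.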
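Once the network topology of the virtual autonomous domains is established, public services such as IoD DNS, the resource metadata system and the data transaction recording system can be added, so that the virtual autonomous domains and public services together form the complete data Internet, as shown in FIG. 7.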
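Step S105: Establish an information entropy reduction function, extract data from the original data sets on the data Internet according to the information entropy reduction function, and encrypt the extracted data to generate twin datasets corresponding to the original data sets.
According to information theory, every data island is an information source that holds data. For a data island source, any data query task by any user is a mining process for uniquely required information (valid data), that is, a process of eliminating entropy and uncertainty to obtain the target information, which includes filtering out noise (invalid data). Whether data is valid or invalid is determined relative to a specific query and analysis task: data that is information for one query task may be noise for another. By default, the total data volume of a data island source is the same for all queries, but the query workload of each query and analysis task on that source differs, so the amount of noise to be filtered also differs; the more noise a task must process, the greater its workload. Overall, the execution efficiency of a query and analysis task is affected by the efficiency of noise filtering, which in turn is affected by the amount of noise, so reducing the amount of noise improves the overall execution efficiency of the task. Based on this analysis, the embodiment of the present invention proposes and uses an information entropy reduction function, a function that reduces information entropy. It can be designed for a specific query and analysis task to reduce the amount of data noise the task faces and to build the corresponding target twin dataset, thereby improving the task's overall execution efficiency. The information entropy reduction function is the tool that connects the data islands of the physical domains to the data Internet and maps data from the physical domain to the virtual domain.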
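The text gives no formula for information entropy. As a hedged illustration only, the standard Shannon definition shows why filtering out noise reduces uncertainty; the symbols $N$ and $M$ and the uniform-distribution assumption are introduced here and are not part of the original.

```latex
% Shannon entropy of a source X with outcome probabilities p(x)
H(X) = -\sum_{x} p(x) \log_2 p(x)

% Assumption: the target record is equally likely to be any of N records,
% and an entropy reduction function narrows the candidates to M <= N records.
H_{\text{before}} = \log_2 N, \qquad
H_{\text{after}} = \log_2 M, \qquad
H_{\text{before}} - H_{\text{after}} = \log_2 \frac{N}{M} \ge 0
```

Under this assumption, narrowing 1,000,000 candidate records to 1,000 removes $\log_2 1000 \approx 9.97$ bits of uncertainty before the query task even starts.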
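Based on digital twin theory, the embodiment of the present invention proposes and uses twin datasets. A twin dataset is a sub-dataset extracted from the original physical data set by the information entropy reduction function and then secured by desensitization/encryption functions, as shown in FIG. 5. A twin dataset is essentially a virtual model of its corresponding physical model and has the basic properties of a digital twin's virtual model, such as real-time dynamics and bidirectionality. Beyond these digital twin characteristics, the twin datasets of the embodiment of the present invention also have the following specific functions and advantages:
1) Based on actual business scenarios: in the data Internet, a twin dataset is not a simple copy of the original physical data set. Instead, a data extraction algorithm and rules are specified according to the needs and characteristics of the actual business scenario, and data is extracted from the original physical data set accordingly.
2) Data subset: in practice, a twin dataset built by an information entropy reduction function is usually smaller than its original data set, which can significantly improve the efficiency of data collection and analysis based on it. However, the size of a twin dataset is determined mainly by the actual business situation and the information entropy reduction function; in theory the full original data set can also be mapped to a twin dataset. A twin dataset can also be empty.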
3)安全性:孪生数据集可以是原始数据集的全集或部分子集的映射,它的真实内容完全等同于原始数据集的内容,降熵函数没有进行加密或脱敏等操作,一旦孪生数据集的数据泄漏可能会等同造成原始数据集的数据泄漏,有安全方面的隐患。因此,通常会使用脱敏/加密函数使孪生数据集获得很高的数据安全 性,从而大大拓展它的实际应用场景。
4)等效性:针对目标问题域中核心计算逻辑,在孪生数据集上的计算效果与在“真实空间”原始数据集上的计算效果一致。
5)动态实时性:孪生数据集可随着其对应的物理模型的变化而进行动态实时地变化,从而保证了基于孪生数据集的分析结果的时效性。
6)双向性:不仅可以从物理模型生成对应的孪生数据集,而且还可以将基于孪生数据集的计算分析结果反馈给物理模型,从而形成双向的、不间断的闭环信息反馈。物理模型可以根据反馈补充其某些纬度的信息,或者对产品进行持续的优化。
7)支持批量或实时流计算:信息降熵函数主要侧重规则和算法的定义,不限制其具体实现方法、使用的技术、计算的形式等。通常可以根据具体业务场景和特点,选择使用定时或间隔性的批处理计算方式,或是选择实时的流计算方式,或是批处理和流计算结合的计算方式。
8)提供访问API(Application Programming Interface,应用程序接口):对于孪生数据集,系统可以向应用程序提供功能丰富和灵活的API,从而方便应用的使用和提高分析效率。例如,可以提供数据查询分析的API,这样方便用户进行OLTP和OLAP产讯任务;可以提供数据集监听的API,应用程序可以对指定的数据集进行监听,这样在孪生数据集发生变化的时候系统可以提醒应用程序,从而可以触发应用程序的某些设定逻辑。一般通过数据互联网的物理代理节点所提供的数据容器来提供本服务。
9)提供计算能力:孪生数据集不仅可以是提供查询的数据集,而且还可以提供计算能力。为了使用孪生数据集的计算能力,通常需要在数据互联网的物理代理节点中提供相应的计算容器,该计算容器具有必要且大小可配置的硬件 支持(如CPU、内存、硬盘等)。该计算容器的计算任务基于但不限于其对应的孪生数据集的数据,并且可以执行不同的、可定制化的计算任务类型。对于数据集为空的孪生数据集,同样可以为其建立计算容器。计算能力使得数据互联网在提供对外查询服务的同时,也可以接受计算下推命令、完成边缘计算等任务。
10)可兼容第三方访问接口:孪生数据集不仅可以直接提供数据集的访问接口,而且还可以兼容第三方访问接口。第三方访问接口包括但不限于数据查询接口和数据计算接口,例如JDBC(Java Database Connection)接口、MapReduce计算接口、Presto计算查询接口、边缘计算框架接口(如EdgeX等)、Tensorflow计算接口、FATE(Federated AI Technology Enabler)计算接口、安全多方计算(Secure Muti-party Computation)框架等。第三方应用或平台发送给孪生数据集的访问命令会被数据互联网路由传递到孪生数据集所对应的计算容器上,该计算容器会根据协议和具体任务要求完成相应孪生数据集数据的读取和计算,并最终将计算结果通过数据互联网返回给第三方应用或平台。整个调用过程对第三方应用或平台来说是透明的,第三方应用或平台将对孪生数据集的访问的计算操作均当作在本地完成。通常通过数据互联网的物理代理节点所提供的计算容器来提供本服务。
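The query and listening APIs of item 8) can be pictured with the following minimal Python sketch. The class `DataContainer` and its methods `listen`, `upsert` and `query` are hypothetical names chosen for illustration; the source does not define a concrete API.

```python
class DataContainer:
    """Toy in-memory data container holding one twin dataset (illustrative)."""

    def __init__(self, name):
        self.name = name
        self._rows = {}        # key -> row dict
        self._listeners = []   # callbacks registered by applications

    def listen(self, callback):
        """Register callback(container_name, key, row) for change events."""
        self._listeners.append(callback)

    def upsert(self, key, row):
        """Insert or update a row; notify listeners only on a real change."""
        changed = self._rows.get(key) != row
        self._rows[key] = dict(row)
        if changed:
            for callback in self._listeners:
                callback(self.name, key, dict(row))

    def query(self, predicate):
        """Return copies of all rows for which predicate(row) is true."""
        return [dict(row) for row in self._rows.values() if predicate(row)]
```

An application would register a callback once and then react to each change notification, for example by starting a preset piece of analysis logic.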
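After the information entropy reduction function has been established, data is extracted from the original datasets on the data internet according to that function, the extracted data is encrypted, and a twin dataset corresponding to the original datasets is generated. As shown in Fig. 7, the datasets (for example d1, d2, d3, d4) held in the independent databases of the various domains (for example data island 1, data island 3, data island 5, data island 7) can all be turned into twin datasets (for example d1-1, d1d2-1, d3-1, d4-1) through an information entropy reduction function and a desensitization/encryption function. One original dataset can produce one twin dataset (for example, dataset d3 produces twin dataset d3-1), and several original datasets can also produce a single twin dataset (for example, datasets d1 and d2 produce twin dataset d1d2-1). The advantage of a twin dataset is that it narrows access to the original datasets according to business needs and provides a secure dataset.
A twin dataset is generated by a compute container provided by a physical proxy node of the data internet. The compute container holds program logic such as the information entropy reduction function and the desensitization/encryption function, and the generated twin dataset is loaded into the corresponding data container, which then serves external data access. As shown in Fig. 7, physical proxy node p2 uses its compute container c1 to generate twin dataset d1d2-1 from original datasets d1 and d2, and twin dataset d1d2-1 is loaded into data container d1d2-1.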
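As a rough illustration of how a compute container might derive one twin dataset from several original datasets, the Python sketch below first applies a toy information entropy reduction function (a row filter plus a column projection) and then a desensitization step. Everything here is an assumption made for illustration: the function names, the record layout and the key are hypothetical, and keyed hashing (HMAC-SHA256) is swapped in as the masking technique; the source does not prescribe a particular algorithm. HMAC is one-way, so a deployment that needs to restore plaintext results would use a reversible scheme instead.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key"  # hypothetical key, for illustration only


def entropy_reduction(records, wanted_fields, predicate):
    """Keep only task-relevant rows and columns (drops the 'noise')."""
    return [{f: r[f] for f in wanted_fields} for r in records if predicate(r)]


def desensitize(records, sensitive_fields, key=SECRET_KEY):
    """Replace sensitive values by keyed hashes; equal inputs stay equal."""
    masked = []
    for record in records:
        out = dict(record)
        for field in sensitive_fields:
            digest = hmac.new(key, str(record[field]).encode("utf-8"),
                              hashlib.sha256)
            out[field] = digest.hexdigest()
        masked.append(out)
    return masked


def build_twin_dataset(sources, wanted_fields, predicate, sensitive_fields):
    """Merge several original datasets into one masked twin dataset."""
    extracted = []
    for dataset in sources:
        extracted.extend(entropy_reduction(dataset, wanted_fields, predicate))
    return desensitize(extracted, sensitive_fields)
```

Because equal plate numbers map to equal tokens, equality-based logic (such as matching the same plate across datasets) gives the same result on the masked twin dataset as on the original data, which is the sense of "equivalence" used in the text.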
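The physical proxy nodes of the data internet carry out all actual data service and computing service tasks, while external access to the data containers and compute containers is managed by the virtual domain internal proxy nodes. As shown in Fig. 7, data container d1d2-1 mounted on physical proxy node p2 is actually proxied in the data internet by virtual domain internal proxy node v1’ of virtual domain 1 (as d1d2-1’).
Once the twin datasets have been built and the corresponding data container and compute container services are available, applications can access data resources and computing resources across physical domains through the data internet. As shown in Fig. 7, application A accesses the data and computing resources of virtual autonomous domain 1 by connecting to its virtual autonomous domain gateway node g3’, while application B accesses the data and computing resources of virtual autonomous domains 1 and 2 by connecting to their virtual autonomous domain gateway nodes g3’ and g5’.
Besides applications, the data internet is also compatible with third-party access interfaces. As shown in Fig. 8, both the MapReduce computing framework and the TensorFlow computing framework can access the data and computing resources of virtual autonomous domains 1 and 2 through the virtual autonomous domain gateway nodes g3’ and g5’. The MapReduce framework can take the twin dataset carried by data container d1-1’ in virtual domain 1 as the input of a computing task (the data file path specified in the program is the domain name of the twin dataset in the data internet), run the map tasks in compute container c1’ and the reduce tasks in compute container c2’, and thus complete the whole job using the data and computing resources of the data internet. The task management programs of the MapReduce framework (for example JobTracker or Application Master) can also run in compute containers of the data internet. As Fig. 8 shows, the MapReduce framework can access and use remote data sources and computing resources in different domains through the data internet without any code change, with the same effect as a local call. The data network routing, security authentication and similar work are done by the data internet itself and are transparent to the MapReduce framework, so in theory the data internet can integrate third-party interfaces, applications or frameworks seamlessly. Compatibility with other third-party interfaces, applications or frameworks can follow, without being limited to, the scheme described above for the MapReduce framework.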
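Step S106: execute a computing task on the twin dataset and restore the computation result to the original dataset corresponding to the twin dataset.
Users can design computing tasks for a twin dataset and execute them after applying for the corresponding permissions. The platform provides applications with the necessary APIs and functions so that users can read data and develop tasks efficiently. For example, with the listening interface for twin datasets, an application can easily listen to a target dataset and automatically execute preset program logic.
Because the twin dataset is equivalent to the original dataset, a computation result obtained on the twin dataset is equivalent to the result on the original dataset. The result computed on the twin dataset can therefore be passed down level by level through the root node of the virtual domain back to the corresponding data island. If the result data is encrypted, the original result data is recovered with the corresponding decryption algorithm (for symmetric encryption, homomorphic encryption and so on). The result data is then stored in the data island, which completes the restoration of the computation result. Throughout this process all computing tasks run in compute containers, including the task of synchronously updating the twin dataset and the original dataset with the computed result.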
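To explain the technical solution of the embodiment more clearly, the establishment and use of a data internet in a typical application example is described below, which illustrates and verifies the feasibility of the data interconnection method of the embodiment.
This example uses the data internet to find cloned-plate vehicles passing electronic checkpoints and to help the business A authority crack down on this offence. A cloned-plate vehicle is one for which offenders forge or illegally copy the plate number, model and colour of a legitimately registered vehicle, so that smuggled, assembled, scrapped or stolen vehicles appear "legal". Cloning a plate amounts to labelling: a fake plate bearing the same number as a real one is fitted to another vehicle. Fig. 9 shows the actual data distribution for this example. The business A comprehensive data of province X is distributed by city and stored in different physical domains, each of which contains several databases. For example, the business A comprehensive data covers the physical domains “X省业务A物理域1” (Province X business A physical domain 1), “X省业务A物理域2” (Province X business A physical domain 2) and “X省业务A物理域3” (Province X business A physical domain 3). Province X business A physical domain 1 contains physical nodes 1, 2 and 3 and data islands d1 and d2; Province X business A physical domain 2 contains physical nodes 4 and 5 and data island d3; Province X business A physical domain 3 contains physical nodes 8 and 9 and data island d5. The business B department of province X has one physical domain, “X省业务B物理域1” (Province X business B physical domain 1), which contains physical nodes 6 and 7 and data island d4; this data island stores the data of the province's traffic department. Assume that the relevant data structures in the business A comprehensive database of every city are identical and include the following tables: the person information table RYXX, with the fields person number RYBH, name NAME, ID card number SFZH, address ADDRESS and telephone TEL; and the person photo table RXZP, with the fields person number RYBH and photo PHOTO. Assume that the relevant structure of the traffic information database is: the vehicle information table VEHICLE, with plate number HP and owner ID card number SFZH; and the checkpoint vehicle travel record table EVENT, with plate number HP, time TIME and checkpoint name KAKOU. Clearly, a joint query over the business A comprehensive data and the traffic department data can return the owner's details for a given plate number.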
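The joint query mentioned at the end of the paragraph above can be sketched with Python's standard `sqlite3` module. This is only an illustration under simplifying assumptions: both tables are placed in one in-memory database (in the example they live in different data islands), the column types are guessed, and the sample row is invented. Table and column names follow the text.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with the two tables and one sample row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE RYXX (RYBH TEXT PRIMARY KEY, NAME TEXT, SFZH TEXT,
                           ADDRESS TEXT, TEL TEXT);
        CREATE TABLE VEHICLE (HP TEXT PRIMARY KEY, SFZH TEXT);
    """)
    # Invented sample data, for illustration only.
    conn.execute("INSERT INTO RYXX VALUES (?, ?, ?, ?, ?)",
                 ("R001", "Owner One", "ID-0001", "Example Road 1", "555-0100"))
    conn.execute("INSERT INTO VEHICLE VALUES (?, ?)", ("PLATE-001", "ID-0001"))
    conn.commit()
    return conn


def find_owner(conn, plate):
    """Look up the owner's details for a plate by joining on SFZH."""
    return conn.execute(
        "SELECT r.NAME, r.SFZH, r.ADDRESS, r.TEL "
        "FROM VEHICLE AS v JOIN RYXX AS r ON r.SFZH = v.SFZH "
        "WHERE v.HP = ?",
        (plate,),
    ).fetchone()
```

`find_owner` returns a tuple of owner details, or `None` when the plate is unknown.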
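The general method for finding cloned-plate vehicles has the following steps:
1) Recognize the plate numbers of all vehicles passing the electronic checkpoints;
2) If two electronic checkpoints in the city's checkpoint network recognize the same plate at adjacent points in time, and the average speed obtained by dividing the shortest distance between the two checkpoints by that time difference exceeds the acceptable speed range (for example 120 km/h), the plate is flagged as a suspected cloned plate.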
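The speed test in step 2) is simple arithmetic: average speed = shortest distance / time difference. A minimal Python sketch follows; the function names are hypothetical, and the 120 km/h default is the example threshold from the text.

```python
from datetime import datetime


def average_speed_kmh(distance_km, seen_at_a, seen_at_b):
    """Average speed needed to cover distance_km between two sightings."""
    hours = abs((seen_at_b - seen_at_a).total_seconds()) / 3600.0
    if hours == 0:
        # Two sightings at the same instant: impossible unless distance is 0.
        return float("inf") if distance_km > 0 else 0.0
    return distance_km / hours


def is_suspected_clone(distance_km, seen_at_a, seen_at_b, limit_kmh=120.0):
    """Flag the plate when the required average speed exceeds the limit."""
    return average_speed_kmh(distance_km, seen_at_a, seen_at_b) > limit_kmh
```

For example, two sightings 30 km apart within 10 minutes would require an average of about 180 km/h and are flagged, whereas the same distance in 30 minutes (60 km/h) is not.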
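With current technology, however, this method is hard to implement. Because of administrative boundaries, data volume and real-time requirements, the checkpoint databases of the provinces (cities) remain "data islands" whose data cannot be merged. None of them knows which plates passed the checkpoints recorded in the others, so no single data island can directly identify a cloned-plate vehicle whose evidence spans domains. Current analysis techniques usually extract the data of all data islands by ETL into one centralized system and then analyse it jointly, but this approach has poor timeliness, typically "T+1" feedback (for example, results arrive one day late), which makes it harder for the business A authority to handle cloned-plate cases and lowers the value of the data.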
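The data internet provided by the embodiment of the present invention can solve the problem of catching cloned-plate vehicles across data islands. The process is as follows:
I. Establish the data internet backbone, which includes the following steps:
1) Establish the basic services of the data internet.
2) Establish the physical domains according to the actual situation. First, in the real network environment of each physical domain, set up as required all hosts the domain should contain: physical proxy nodes, virtual domain internal proxy nodes, gateways, routers and so on. As Fig. 9 shows, there are four physical network domains in practice, so four corresponding physical domains are established, namely Province X business A physical domain 1, Province X business A physical domain 2, Province X business A physical domain 3 and Province X business B physical domain 1:
a) In Province X business A physical domain 1: establish physical proxy node p1, physical proxy node p2, data island nodes 1, 2 and 3, and virtual domain internal proxy node v1.
b) In Province X business A physical domain 2: establish physical proxy node p3, data island nodes 4 and 5, virtual domain internal proxy node v2, router r1 and virtual domain gateway node g1.
c) In Province X business A physical domain 3: establish physical proxy node p5, data island nodes 8 and 9, virtual domain internal proxy node v4, router r5, virtual domain gateway node g4, router r4 and virtual autonomous domain gateway node g5.
d) In Province X business B physical domain 1: establish physical proxy node p4, data island nodes 6 and 7, virtual domain internal proxy node v3, router r2, virtual domain gateway node g2, router r3 and virtual autonomous domain gateway node g3.
3) Establish the virtual domains according to business needs:
a) Establish Province X business A virtual domain 1. This virtual domain contains virtual domain internal proxy node v1', virtual domain internal proxy node v2', router r1' and virtual domain gateway node g1', all mapped from the Province X business A physical domains. As Fig. 9 shows, Province X business A virtual domain 1 contains the virtual domain nodes corresponding to two physical domains, Province X business A physical domain 1 and Province X business A physical domain 2, and is therefore a virtual domain that spans physical domains.
b) Establish Province X business B virtual domain 1. This virtual domain contains virtual domain internal proxy node v3', router r2' and virtual domain gateway node g2', mapped from Province X business B physical domain 1. As Fig. 9 shows, this virtual domain corresponds one-to-one to the actual physical domain.
c) Establish Province X business A virtual domain 2. This virtual domain contains virtual domain internal proxy node v4', router r5' and virtual domain gateway node g4', mapped from Province X business A physical domain 3. As Fig. 9 shows, this virtual domain corresponds one-to-one to the actual physical domain.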
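II. Build the twin datasets on the data internet backbone.
1) Establish the information entropy reduction functions. For the Province X business A virtual domains, all data sources contain business A data with the same structure, so a single information entropy reduction function can be established for them. Because business A data and business B data differ in structure, a separate information entropy reduction function is needed for the Province X business B virtual domain.
● Business B data:
■ There is only one physical domain with one database, holding dataset d4. d4 contains the person information table RYXX and the person photo table RXZP. Its information entropy reduction function sends every change (insert/delete/update) to any record of these two tables to the corresponding twin datasets (person information table RYXX', person photo table RXZP') and replays the operation there to replicate the change. Because this data is small and changes relatively rarely, the function needs no further complex processing to build the mapped twin datasets;
● Business A data:
■ For the relevant datasets in any data island of any physical domain, for example d1, d2, d3 and d5, the information entropy reduction function for the vehicle information table VEHICLE in the database sends every change (insert/delete/update) to any record of that table to the corresponding twin dataset (vehicle information table VEHICLE') and replays the operation there to replicate the change. Because this data is small and changes relatively rarely, the function needs no further complex processing to build the mapped twin dataset;
■ For each physical domain, create the corresponding data container on its physical proxy node (such as d1-1, d3-1, d5-1), and in that container create a data structure called the checkpoint plate set, which stores every plate number HP that has appeared in the checkpoint tables of all data islands in the domain. The checkpoint plate set is used to test whether a plate is already present and to add a new plate. It can be named “域Y_HP_KAKOU_SET” (domain Y); for example, the checkpoint plate sets of the business A domains in Fig. 9 are named “域1_HP_KAKOU_SET”, “域2_HP_KAKOU_SET” and “域3_HP_KAKOU_SET”;
■ For each physical domain, in the corresponding data container on its physical proxy node (such as d1-1, d3-1, d5-1), create one single twin dataset for the checkpoint vehicle travel record tables EVENT of all data islands in the domain. It can be named “域Y卡口车辆行驶记录表EVENT'” (domain Y checkpoint vehicle travel record table EVENT'); for example, the corresponding twin datasets of the business A domains in Fig. 9 are “域1卡口车辆行驶记录表EVENT'”, “域2卡口车辆行驶记录表EVENT'” and “域3卡口车辆行驶记录表EVENT'”. Each plate has exactly one record in this twin table, which mainly holds the plate number HP, the name KAKOU of the checkpoint most recently passed and the time TIME of that most recent pass;
■ For each physical domain, whenever the checkpoint vehicle travel record table EVENT of any data island in the domain receives new data, obtain the plate number contained in the new data, referred to below as plate N1, and use “域Y_HP_KAKOU_SET” to check whether that plate is already present. If it is, nothing needs to be done; if it is not, proceed as follows:
i. In this physical domain, store the plate in the corresponding “域Y_HP_KAKOU_SET”;
ii. In this physical domain, store the checkpoint plate information in the twin dataset “域Y卡口车辆行驶记录表EVENT'”, recording the plate number HP, the name KAKOU of the checkpoint most recently passed and the time TIME of that most recent pass;
iii. Across all physical domains, visit each domain's checkpoint plate set “域Y_HP_KAKOU_SET” in turn and check whether it contains plate N1. If it does not, nothing needs to be done; if it does:
a) For that physical domain, access the checkpoint vehicle travel record tables EVENT of all its data islands and find the most recent record for plate N1 in each data island;
b) For that physical domain, compare the times of the most recent records for plate N1 from each data island, select the newest one, send it to the domain's twin dataset “域Y卡口车辆行驶记录表EVENT'” and update that plate's record there with the name KAKOU of the checkpoint most recently passed and the time TIME of that most recent pass.
2) Establish the twin datasets. First, following the logic of step 1), empty twin datasets can be created in the corresponding data containers on the physical proxy nodes (such as d1-1, d3-1, d5-1), for example: “人员信息表RYXX'” (person information table RYXX') and “人员照片表RXZP'” (person photo table RXZP') for the traffic data; and “车辆信息表VEHICLE'” (vehicle information table VEHICLE'), “域1卡口车辆行驶记录表EVENT'”, “域2卡口车辆行驶记录表EVENT'” and “域3卡口车辆行驶记录表EVENT'” for the business A data. Then the information entropy reduction functions defined in step 1) are started as resident applications in compute containers of the data internet, so that they keep the corresponding twin datasets up to date. When the twin datasets are built, desensitization or encryption algorithms (symmetric encryption, asymmetric encryption and so on) can be applied for security reasons to raise their security level, so that secure datasets can be provided externally.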
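3) Execute the computing task on the twin datasets and restore the computation result to the data islands corresponding to the twin data.
i. The specific logic of the cloned-plate detection application can be implemented in this step, for example as application 1 in Fig. 9.
The application logic is:
● Listen to every “域Y卡口车辆行驶记录表EVENT''” (domain Y checkpoint vehicle travel record table EVENT'') in the Province X business A virtual domains, namely “域1卡口车辆行驶记录表EVENT''”, “域2卡口车辆行驶记录表EVENT''” and “域3卡口车辆行驶记录表EVENT''”;
● When the data of any “域Y卡口车辆行驶记录表EVENT''” changes, the system notifies the application, which then executes the following logic:
a) From the “域Y卡口车辆行驶记录表EVENT''” that changed, obtain the plate number of the record that changed, referred to as the changed plate number;
b) Ask the system to query every “域Y卡口车辆行驶记录表EVENT''” for a record of the changed plate number. If there is none, do nothing. If there is one, call the system to query the physical data source behind that “域Y卡口车辆行驶记录表EVENT''” for the latest record of the changed plate number, and use it to update the record for that plate in the “域Y卡口车辆行驶记录表EVENT''”, for example the plate number HP, the name KAKOU of the checkpoint most recently passed and the time TIME of that most recent pass;
c) Among all “域Y卡口车辆行驶记录表EVENT''” tables that have been updated for the changed plate number, select the two records whose "time TIME of the most recent pass" is closest to the current time and compute on them: from the distance between the two checkpoints and the difference between the two times, calculate the minimum speed the vehicle would need to cover that distance (speed = distance / time difference). Then compare this speed with the threshold preset by the application. If the speed is below the threshold, the vehicle is very likely driving normally and is unlikely to carry a cloned plate, so nothing is done. If the speed is above the threshold, the vehicles seen at the two checkpoints are unlikely to be the same vehicle, so there is a high probability that at least one of them carries a cloned plate, and the application should raise an alarm. For example, with a threshold of 120 km/h, a speed of 100 km/h raises no alarm, whereas a speed of 130 km/h is reported immediately to the business A authorities of the domains to which the two checkpoints belong.
ii. Restore the computation result to the data islands corresponding to the twin datasets. Step i can discover suspected cloned-plate vehicles and report them to the business A authorities of the domains to which the relevant checkpoints belong. However, the result of the computing task on the twin datasets is encrypted data, that is, the encrypted plate number, checkpoint number and physical domain number; the original plate number, checkpoint number and physical domain number can be readily recovered with the decryption program. Because a computation on the twin dataset has the same effect as on the original dataset, the result can be sent back from the virtual space to the corresponding physical domain through the data internet and decrypted there. The databases and related applications in the physical domain then obtain the original data of the suspected cloned-plate vehicle, which alerts the business A authority and helps it track the vehicle further.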
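The information entropy reduction function for the EVENT tables, described in step II. 1) above (sub-steps i to iii), can be sketched in Python as follows. This is an illustrative reading of the text, not the patented implementation: the class and function names are hypothetical, data is held in plain in-memory structures, and TIME is assumed to be a comparable number. The sketch follows the text literally, so a plate that is already in a domain's plate set triggers no action in that domain.

```python
class PhysicalDomain:
    """State kept for one physical domain (all names are illustrative)."""

    def __init__(self, name):
        self.name = name
        self.islands = {}        # data island name -> list of EVENT rows
        self.plate_set = set()   # plays the role of "域Y_HP_KAKOU_SET"
        self.event_twin = {}     # plays the role of EVENT': plate -> row


def latest_row(domain, plate):
    """Newest EVENT row for `plate` across all data islands of one domain."""
    rows = [row for island in domain.islands.values()
            for row in island if row["HP"] == plate]
    return max(rows, key=lambda row: row["TIME"]) if rows else None


def on_new_event(domains, domain_name, island_name, row):
    """Apply the entropy reduction steps to one newly inserted EVENT row."""
    domain = domains[domain_name]
    domain.islands.setdefault(island_name, []).append(row)
    plate = row["HP"]  # "plate N1" in the text
    if plate in domain.plate_set:
        return  # already known in this domain: the text prescribes no action
    domain.plate_set.add(plate)             # step i
    domain.event_twin[plate] = dict(row)    # step ii
    for other in domains.values():          # step iii
        if plate not in other.plate_set:
            continue
        newest = latest_row(other, plate)   # step a)
        if newest is not None:
            other.event_twin[plate] = dict(newest)  # step b)
```

In this reading, the first sighting of a plate in a new domain is what refreshes the twin records of every other domain that already knows the plate, which is exactly the cross-domain situation the detection application is interested in.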
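The above verifies that the data internet can solve the problem of catching cloned-plate vehicles across data islands, and shows that it can carry out joint analysis and computation across data islands and data domains securely and efficiently.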
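For reference, the detection step of application 1 (step 3) i. c) of the example) can be sketched in Python. The sketch assumes, purely for illustration, that each EVENT'' table is a dictionary from plate to its latest record, that TIME is a Unix timestamp in seconds, and that checkpoint distances come from a hypothetical lookup table; none of these representations is specified in the source. A speed exactly equal to the threshold is treated here as no alarm, which the text leaves open.

```python
def check_plate(event_tables, plate, distance_km, threshold_kmh=120.0):
    """Return an alert dict if `plate` looks cloned, otherwise None.

    event_tables maps a domain name to its EVENT'' table (plate -> row);
    distance_km maps frozenset({checkpoint_a, checkpoint_b}) to kilometres.
    """
    rows = [table[plate] for table in event_tables.values() if plate in table]
    if len(rows) < 2:
        return None  # seen in at most one domain: nothing to compare
    # The two records closest to the current time are the two newest ones.
    newest, previous = sorted(rows, key=lambda r: r["TIME"], reverse=True)[:2]
    if newest["KAKOU"] == previous["KAKOU"]:
        return None  # same checkpoint: no distance to cover
    km = distance_km[frozenset({newest["KAKOU"], previous["KAKOU"]})]
    hours = (newest["TIME"] - previous["TIME"]) / 3600.0
    speed = float("inf") if hours == 0 else km / hours
    if speed <= threshold_kmh:
        return None  # plausible for a single vehicle
    return {"HP": plate, "speed_kmh": speed,
            "checkpoints": (previous["KAKOU"], newest["KAKOU"])}
```

A listener registered on each EVENT'' table would call `check_plate` with the changed plate number and forward any returned alert to the authorities of the two domains concerned.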
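The data internet method provided by the embodiment of the present invention supports the "interconnection" of data across multiple security domains, is compatible with all kinds of data security boundaries, and can meet the many technical requirements that arise in data sharing and trading. Its specific advantages are:
1) Compatibility with the security constraint boundaries of data islands
A data island has its own "domain", that is, the boundary of the data island, which usually takes the form of an enterprise boundary, a department boundary or a network boundary; within one domain the security constraints on data are the same. Unlike analysis of a centralized dataset, which usually has uniform data security constraints, correlation analysis over multiple data islands usually has to be done across domains, and it must be done without breaking the security constraint boundary of any data island.
2) Large-scale connection of geographically dispersed data islands
A notable difference between joint analysis of data islands and centralized data analysis is that data islands are usually located in many places, that is, they are geographically distributed. This inevitably raises technical problems in joint analysis such as network heterogeneity, QoS of computing tasks over narrow bandwidth, and transmission reliability and security. Real data analysis needs call for these difficulties to be overcome so that data islands can be connected on a large scale and joint analysis tasks with richer scenarios and wider data coverage can be completed.
3) Dynamically extensible data network
A centralized dataset usually does not support dynamic extension of the data model (including heterogeneous and homogeneous data models) while the computing model is running; it can only extend the dataset on the basis of the existing data model. Joint analysis of data islands, in contrast, has to cope with heterogeneous data islands being attached or detached dynamically during analysis, that is, with dynamic extension and elasticity of heterogeneous data models.
4) Real-time analysis
A centralized dataset usually does not support real-time data analysis tasks; analysis results generally arrive with some delay, such as the common "T+1" (where T is one day or one hour). This is mainly because centralized data analysis has to extract, transform and load (ETL) data from the different domains in periodic batches and cannot obtain the latest changes in each data domain immediately. More and more business scenarios, however, place higher demands on data timeliness, and traditional centralized data analysis can no longer meet them.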
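Referring to Fig. 10, the embodiment of the present invention further provides a data internet system, comprising:
a first module, configured to establish a data internet backbone and to network the data internet backbone with basic service components to establish a data internet;
a second module, configured to establish an information entropy reduction function, extract data from the original datasets on the data internet according to the information entropy reduction function, encrypt the extracted data, and generate a twin dataset corresponding to the original datasets;
a third module, configured to execute a computing task on the twin dataset and restore the computation result to the original dataset corresponding to the twin dataset.
In a specific application, the first module comprises:
a first unit, configured to establish physical domains, virtual domains and virtual autonomous domains; a physical domain has several physical proxy nodes; a virtual domain has several virtual domain internal proxy nodes, routers and one virtual domain external proxy node; a virtual autonomous domain has one virtual autonomous domain external proxy node;
a second unit, configured to establish the mapping between the physical proxy nodes and the virtual domain internal proxy nodes, and the mapping between the virtual domain external proxy nodes and the virtual autonomous domain external proxy nodes;
a third unit, configured to establish the routing connections between the virtual domain internal proxy nodes and the virtual domain external proxy nodes, and between the virtual domain external proxy nodes and the virtual autonomous domain external proxy nodes, to form the data internet backbone;
a fourth unit, configured to network the data internet backbone with the basic service components to establish the data internet.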
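By establishing physical domains, virtual domains, virtual autonomous domains, information entropy reduction functions and twin datasets, the data internet method and system provided by the embodiment of the present invention interconnect data islands in a way that is close to real business, efficient, real-time and secure, and give users convenient, flexible and controllable means of joint analysis. The data internet method helps connect data islands and helps enterprises or business departments build large-scale, dynamic, geographically distributed, heterogeneous, multi-owner networks for secure data sharing and complex analysis. It can thereby effectively promote the construction of data sharing and trading platforms between data islands, encourage the sharing and joint analysis of data resources, release more hidden data value, and help establish sound data resource trading mechanisms and market development.
In practice, the functional modules and units involved in this embodiment can all be implemented by computer programs running on computer hardware. The programs can be stored in a computer-readable storage medium and, when executed, can include the flows of the method embodiments described above. The hardware refers to a server, desktop computer, notebook computer or the like that contains one or more processors and a storage medium; the storage medium can be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM) or the like; the computer programs can be implemented in computer languages including, but not limited to, C and C++.
The specific embodiments described above further explain the purpose, technical solution and beneficial effects of the present invention in detail. It should be understood that they are only specific embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.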

Claims (12)

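  1. A data internet method, characterized by comprising:
    establishing a data internet backbone, and networking the data internet backbone with basic service components to establish a data internet;
    establishing an information entropy reduction function, extracting data from the original datasets on the data internet according to the information entropy reduction function, encrypting the extracted data, and generating a twin dataset corresponding to the original datasets;
    executing a computing task on the twin dataset, and restoring the computation result to the original dataset corresponding to the twin dataset.
  2. The data internet method according to claim 1, characterized in that the step of establishing a data internet backbone specifically comprises:
    establishing physical domains, virtual domains and virtual autonomous domains; the physical domain has several physical proxy nodes; the virtual domain has several virtual domain internal proxy nodes, routers and one virtual domain external proxy node; the virtual autonomous domain has one virtual autonomous domain external proxy node;
    establishing the mapping between the physical proxy nodes and the virtual domain internal proxy nodes, and the mapping between the virtual domain external proxy nodes and the virtual autonomous domain external proxy nodes;
    establishing the routing connections between the virtual domain internal proxy nodes and the virtual domain external proxy nodes, and between the virtual domain external proxy nodes and the virtual autonomous domain external proxy nodes, to form the data internet backbone.
  3. The data internet method according to claim 2, characterized in that the virtual domain further has IoD DNS, IoD NAT and front-end devices.
  4. The data internet method according to claim 2, characterized in that the virtual autonomous domain further has an asset metadata system, IoD DNS, IoD NAT and a management system.
  5. The data internet method according to claim 2, characterized in that the basic service components comprise a public asset metadata system, DNS and a data transaction recording system.
  6. The data internet method according to claim 2, characterized in that the physical proxy node is used to provide the computing assets in the physical domain and to connect the data assets in the physical domain; the physical proxy node provides external access to computing assets and data assets through container technology, where a compute container provides access to computing assets and a data container provides access to data assets; the data container is the carrier that holds a twin dataset and provides the corresponding external access.
  7. The data internet method according to claim 2, characterized in that the information entropy reduction function is designed for a specific query and analysis task and is used to help the query and analysis task reduce the amount of data noise and build the corresponding target twin dataset, thereby improving the overall execution efficiency of the task; the information entropy reduction function is the tool that connects the data islands of the physical domains to the data internet and can complete the mapping of data from the physical domain to the virtual domain.
  8. The data internet method according to claim 2, characterized in that the twin dataset is a mapping of the whole original dataset or of a subset of it, or is an empty set.
  9. The data internet method according to claim 2, characterized in that the step of encrypting the extracted data is specifically encrypting the extracted data with a desensitization/encryption function.
  10. The data internet method according to claim 2, characterized in that the twin dataset is a dataset that can be queried, has computing capability, and is compatible with third-party access interfaces.
  11. A data internet system, characterized by comprising:
    a first module, configured to establish a data internet backbone and to network the data internet backbone with basic service components to establish a data internet;
    a second module, configured to establish an information entropy reduction function, extract data from the original datasets on the data internet according to the information entropy reduction function, encrypt the extracted data, and generate a twin dataset corresponding to the original datasets;
    a third module, configured to execute a computing task on the twin dataset and restore the computation result to the original dataset corresponding to the twin dataset.
  12. The data internet system according to claim 11, characterized in that the first module comprises:
    a first unit, configured to establish physical domains, virtual domains and virtual autonomous domains; the physical domain has several physical proxy nodes; the virtual domain has several virtual domain internal proxy nodes, routers and one virtual domain external proxy node; the virtual autonomous domain has one virtual autonomous domain external proxy node;
    a second unit, configured to establish the mapping between the physical proxy nodes and the virtual domain internal proxy nodes, and the mapping between the virtual domain external proxy nodes and the virtual autonomous domain external proxy nodes;
    a third unit, configured to establish the routing connections between the virtual domain internal proxy nodes and the virtual domain external proxy nodes, and between the virtual domain external proxy nodes and the virtual autonomous domain external proxy nodes, to form the data internet backbone;
    a fourth unit, configured to network the data internet backbone with the basic service components to establish the data internet.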
PCT/CN2022/070183 2021-04-01 2022-01-04 一种数据互联网方法及系统 WO2022206089A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110356423.7 2021-04-01
CN202110356423.7A CN115168315A (zh) 2021-04-01 2021-04-01 一种数据互联网方法及系统

Publications (1)

Publication Number Publication Date
WO2022206089A1 true WO2022206089A1 (zh) 2022-10-06

Family

ID=83457883

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/070183 WO2022206089A1 (zh) 2021-04-01 2022-01-04 一种数据互联网方法及系统

Country Status (2)

Country Link
CN (1) CN115168315A (zh)
WO (1) WO2022206089A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101866442A (zh) * 2009-04-15 2010-10-20 廊坊市信息资源管理办公室 一种公务网络平台系统
EP2849091A1 (en) * 2013-09-16 2015-03-18 Siemens Aktiengesellschaft Method and system for merging data into a database table
CN105893599A (zh) * 2016-04-20 2016-08-24 北京云宏信达信息科技有限公司 时序数据的比对方法及系统
CN105957349A (zh) * 2016-04-20 2016-09-21 北京云宏信达信息科技有限公司 一种跨地区套牌车识别的方法及系统
US20170041192A1 (en) * 2015-08-07 2017-02-09 Hewlett-Packard Development Company, L.P. Cloud models based on logical network interface data
CN111949830A (zh) * 2019-05-17 2020-11-17 即云天下(北京)数据科技有限公司 离散式索引方法与系统

Also Published As

Publication number Publication date
CN115168315A (zh) 2022-10-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22778278

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22778278

Country of ref document: EP

Kind code of ref document: A1