CN117033487A - System and method for flexibly arranging interfaces based on data sharing - Google Patents


Info

Publication number
CN117033487A
Authority
CN
China
Prior art keywords
data
unit
module
metadata
subunit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311009226.3A
Other languages
Chinese (zh)
Other versions
CN117033487B (en)
Inventor
张煇
王瑾锋
杨勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changhe Information Co ltd
Beijing Changhe Digital Intelligence Technology Co ltd
Original Assignee
Changhe Information Co ltd
Beijing Changhe Digital Intelligence Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changhe Information Co ltd and Beijing Changhe Digital Intelligence Technology Co ltd
Priority to CN202311009226.3A
Publication of CN117033487A
Application granted
Publication of CN117033487B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a system and a method for flexibly arranging interfaces based on data sharing, relating to the technical field of data transmission. The system comprises: a scheduling module for acquiring metadata; a metadata management module for acquiring metadata from a centralized metadata base; a data distribution module for acquiring actual data and sending the actual data to a user in P2P mode; a data storage module for storing and returning actual data; and a user access module for sending data read requests and receiving actual data. The scheduling module, the metadata management module, the data distribution module and the data storage module adopt a containerized micro-service architecture. To address the low data transmission efficiency of the prior art, the application realizes metadata management and incremental data processing through flexible interface arrangement, and improves data access and transmission efficiency through asynchronous transmission and cache access, thereby improving the performance of the whole system.

Description

System and method for flexibly arranging interfaces based on data sharing
Technical Field
The invention relates to the technical field of data transmission, in particular to a system and a method for flexibly arranging interfaces based on data sharing.
Background
Society is in a stage of rapid digital and networked development. Industries and systems of all kinds generate large amounts of data with significant application value. However, systems and their data exist in isolation and cannot be effectively shared or reused, which leads to low data utilization.
Existing systems for flexibly arranging interfaces based on data sharing have the following defects: the network connections between systems are inefficient, real-time data transmission is not possible, and data access efficiency is low; data processing is inefficient, the volume of transmitted data is large, and system overhead is high; and arrangement between interfaces is inflexible, dynamic assembly is not possible, and it is difficult to adapt to different service requirements.
In the related art, for example, Chinese patent document CN111897878A discloses a primary and secondary data synchronization method. The method includes: acquiring a data file from the centralized database of the main data source through first-level message middleware, and parsing the data file to obtain the corresponding routing field information; according to the system route, providing the data file to the message queue corresponding to a distributed database in second-level message middleware for pulling; and synchronizing the pulled data file to the corresponding distributed database through the message queue. However, this solution has at least the following technical problems:
A large number of invalid and duplicate data transmissions occur, so system data transmission efficiency is low; every access must read from storage, so blocking and access delays occur easily and access efficiency drops; and context information is transmitted synchronously, which easily causes transmission blocking and reduces data transmission efficiency.
Disclosure of Invention
1. Technical problem to be solved
Aiming at the problem of low data transmission efficiency in the prior art, the invention provides a system and a method for flexibly arranging interfaces based on data sharing, which improve the data access and transmission efficiency of the system through metadata management, incremental data processing, asynchronous transmission, cache access and the like.
2. Technical solution
The aim of the invention is achieved by the following technical scheme.
An aspect of the embodiments of the present specification provides a system for flexibly arranging interfaces based on data sharing, including:
the scheduling module is used for acquiring metadata; the metadata management module is connected with the scheduling module and is used for acquiring metadata from the centralized metadata base; the data distribution module is connected with the scheduling module and is used for acquiring actual data and sending the actual data to a user in a P2P mode; the data storage module is used for storing and returning actual data; the user access module is used for sending a data reading request and receiving actual data; the scheduling module, the metadata management module, the data distribution module and the data storage module adopt a containerized micro-service architecture.
Still further, the metadata management module includes: a receiving unit for receiving a metadata acquisition request; a query unit, connected with the receiving unit, for querying and returning metadata; an updating unit, connected with the receiving unit and the query unit respectively, for incrementally updating the metadata; an update synchronization unit, connected with the updating unit, for detecting and synchronizing metadata changes; and a caching unit, connected with the query unit, for caching metadata elements whose access frequency is greater than a threshold.
Still further, the data distribution module comprises: a data acquisition unit for acquiring actual data; an access control unit, connected with the data acquisition unit, for access control; an asynchronous transmission unit, connected with the data acquisition unit, for asynchronously transmitting actual data; an incremental update unit, connected with the data acquisition unit, for transmitting only the changed portion of the actual data; a data encryption unit, connected with the asynchronous transmission unit, for encrypting the actual data for transmission; and a data transmission unit, connected with the asynchronous transmission unit and the data encryption unit respectively, for transmitting actual data by P2P.
Still further, the asynchronous transmission unit includes: a data receiving subunit for receiving actual data; a priority buffer queue subunit, connected with the data receiving subunit, for buffering actual data of different priorities; a network state detection subunit for detecting the network state; a priority setting subunit, connected with the priority buffer queue subunit, for setting the priority of the actual data according to priority parameters; and a dynamic scheduling subunit, connected with the network state detection subunit and the priority setting subunit respectively, for dynamically adjusting the data transmission order according to the network state and the priority of the actual data. The priority parameters comprise a data type, a service attribute and a timestamp. The asynchronous transmission unit realizes dynamic priority scheduling through priority setting and dynamic scheduling.
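The priority buffering and dynamic scheduling described above can be illustrated with a minimal Python sketch. It is only one possible reading under stated assumptions: the class name `PriorityBufferQueue`, the ranking tables, the lexicographic combination of the three priority parameters, and the batch sizes chosen per network state are all hypothetical, since the specification names the parameters but does not say how they are weighted.

```python
import heapq
import itertools

# Hypothetical rankings (lower value = more urgent). The text names the priority
# parameters but does not say how they are combined.
DATA_TYPE_RANK = {"alert": 0, "transaction": 1, "log": 2}
SERVICE_RANK = {"realtime": 0, "batch": 1}


class PriorityBufferQueue:
    """Buffers payloads and releases them in priority order."""

    def __init__(self):
        self._heap = []
        self._sequence = itertools.count()  # tie-breaker so payloads are never compared

    def put(self, payload, data_type, service, timestamp):
        # Priority setting: data type first, then service attribute, then age.
        key = (DATA_TYPE_RANK[data_type], SERVICE_RANK[service], timestamp)
        heapq.heappush(self._heap, (key, next(self._sequence), payload))

    def next_batch(self, network_good):
        # Dynamic scheduling (assumed policy): a congested network gets one
        # item per round, a healthy network gets up to four.
        limit = 4 if network_good else 1
        batch = []
        while self._heap and len(batch) < limit:
            _, _, payload = heapq.heappop(self._heap)
            batch.append(payload)
        return batch


queue = PriorityBufferQueue()
queue.put(b"log-1", "log", "batch", 100.0)
queue.put(b"alert-1", "alert", "realtime", 101.0)
queue.put(b"txn-1", "transaction", "realtime", 99.0)

first = queue.next_batch(network_good=False)  # only the most urgent item
rest = queue.next_batch(network_good=True)    # remaining items, still in priority order
```

In this sketch the alert is released first even though it arrived later, and the lower-priority items wait until the network is reported healthy.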
Still further, the incremental update unit includes: a data comparison subunit for comparing the current data with the historical data to obtain the changed portion; a change identification subunit for identifying the changed portion; a delta compression subunit for compressing the changed portion; a difference extraction subunit for extracting the compressed changed portion; and a transmission control subunit configured to transmit only the extracted changed portion. The current data is the actual data obtained from the data storage module at the current moment; the historical data is the actual data cached at the moment before the current moment.
Still further, the access control unit includes: a policy subunit for defining an access control policy; a user subunit for maintaining user information; a permission subunit, connected with the policy subunit, for granting access permissions to users; and an auditing subunit for recording access control logs.
Still further, the data storage module comprises: a receiving unit for receiving actual data; the de-duplication unit is connected with the receiving unit and is used for de-duplication of the actual data; the compression unit is connected with the de-duplication unit and used for compressing actual data; the encryption unit is connected with the compression unit and used for encrypting actual data; the storage unit is connected with the encryption unit and used for storing actual data; and the data reading unit is connected with the storage unit and used for reading data.
Still further, the system further comprises: a scheduling module container for packaging the scheduling module; a metadata management module container for packaging the metadata management module; a data distribution module container for packaging the data distribution module; and a data storage module container for packaging the data storage module. The scheduling module container, the metadata management module container, the data distribution module container and the data storage module container are connected through a service registration and discovery mechanism.
Another aspect of the embodiments of the present disclosure further provides a method for flexibly arranging interfaces based on data sharing, including: receiving a data read request from a user; in response to the data read request, the scheduling module requesting metadata from the metadata management module; the metadata management module acquiring the metadata from the centralized metadata base according to the metadata acquisition request and returning the metadata to the scheduling module; the data storage module acquiring the actual data corresponding to the metadata and sending the actual data to the data distribution module; and the data distribution module sending the actual data to the user in P2P mode.
Still further, the method further comprises: the scheduling module, the metadata management module, the data distribution module and the data storage module adopt a containerized micro-service architecture; the metadata management module only carries out increment update on the metadata change; the data distribution module performs asynchronous transmission, incremental updating, fine granularity access control and data encryption processing.
3. Advantageous effects
Compared with the prior art, the invention has the advantages that:
(1) Metadata management and incremental data synchronization reduce invalid data transmission; a caching mechanism caches frequently accessed data and reduces actual data access time; context data is transmitted asynchronously, which avoids synchronous blocking; and data is compressed and de-duplicated, which reduces the volume of transmitted data. Together these improve the transmission efficiency of multi-source data;
(2) Through the containerized micro-service architecture and metadata management, flexible connection and assembly among different systems are realized, different service requirements are met, and flexible interface arrangement capability is improved;
(3) Data encryption and access control improve the security of the system and reduce system interruptions caused by security risks.
Drawings
The present specification will be further described by way of exemplary embodiments, which are described in detail with reference to the accompanying drawings. These embodiments are non-limiting; in them, like numerals represent like structures, wherein:
FIG. 1 is an exemplary block diagram of a system for flexible orchestration of interfaces based on data sharing, shown according to some embodiments of the present description;
FIG. 2 is an exemplary block diagram of a metadata management module shown in accordance with some embodiments of the present description;
FIG. 3 is an exemplary block diagram of a data distribution module shown in accordance with some embodiments of the present description;
FIG. 4 is an exemplary block diagram of a data storage module shown in accordance with some embodiments of the present description;
FIG. 5 is an exemplary flow chart of a method for flexible orchestration of interfaces based on data sharing, according to some embodiments of the present description.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present specification, the drawings that are required to be used in the description of the embodiments will be briefly described below. It is apparent that the drawings in the following description are only some examples or embodiments of the present specification, and it is possible for those of ordinary skill in the art to apply the present specification to other similar situations according to the drawings without inventive effort. Unless otherwise apparent from the context of the language or otherwise specified, like reference numerals in the figures refer to like structures or operations.
It should be appreciated that as used in this specification, a "system," "apparatus," "unit" and/or "module" is one method for distinguishing between different components, elements, parts, portions or assemblies at different levels. However, if other words can achieve the same purpose, the words can be replaced by other expressions.
As used in the specification and the claims, the terms "a," "an," and/or "the" do not refer specifically to the singular and may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; they do not constitute an exclusive list, as other steps or elements may be included in a method or apparatus.
A flowchart is used in this specification to describe the operations performed by the system according to embodiments of the present specification. It should be appreciated that the preceding or following operations are not necessarily performed in the exact order shown. Rather, the steps may be processed in reverse order or simultaneously. Also, other operations may be added to or removed from these processes.
Based on the technical problems, the specification provides a system and a method for flexibly arranging interfaces based on data sharing.
The method and system provided in the embodiments of the present specification are described in detail below with reference to the accompanying drawings.
FIG. 1 is an exemplary block diagram of a system for flexible orchestration of interfaces based on data sharing according to some embodiments of the present description. As shown in FIG. 1, the system comprises: a scheduling module 110, a metadata management module 120, a data distribution module 130, a data storage module 140, and a user access module 150. The scheduling module 110 is configured to obtain metadata; the metadata management module 120 is connected to the scheduling module 110 and is configured to obtain metadata from the centralized metadata base; the data distribution module 130 is connected to the scheduling module 110 and is configured to acquire actual data and send the actual data to a user in P2P mode; the data storage module 140 is configured to store and return actual data; and the user access module 150 is configured to send data read requests and receive actual data. The scheduling module 110, the metadata management module 120, the data distribution module 130, and the data storage module 140 are all constructed using a containerized micro-service architecture. That is, each module is packaged as a micro-service in its own container, and the containers are connected through a service registration and discovery mechanism. In this way, the containerized micro-service architecture, together with the scheduling module coordinating the cooperation among the modules, enables flexible connection and assembly between different systems to meet different service demands. Meanwhile, metadata management reduces the volume of actual data transmitted and improves data access efficiency. The system supports efficient sharing and utilization of multi-source heterogeneous data.
Specifically, the present scheme may be clarified from different data streams, specifically as follows:
1. User access request flow: (1) The user sends a data read request through the user access module. The request contains information such as the identifier of the data to be accessed. (2) The user access module sends the request to the scheduling module. (3) The scheduling module calls an interface of the metadata management module and queries the corresponding data source information by identifier. (4) The metadata management module looks up metadata such as the location and data format of the data source in the centralized metadata base and returns it to the scheduling module.
2. Metadata flow: (1) The metadata management module acquires the metadata of each data source by crawling, push and other means. (2) The metadata management module stores the metadata in the centralized metadata base and supports incremental updates. (3) The metadata management module provides the scheduling module with a query interface for accessing the metadata base.
3. Actual data flow: (1) The scheduling module generates a data access task and sends it to the data distribution module. The task contains information such as the location and format of the actual data. (2) The data distribution module calls the interface provided by the data storage module to acquire the actual data. (3) The data storage module reads the data from the storage system and returns it. (4) The data distribution module sends the data to the user access module in P2P mode. (5) Optionally, the user access module feeds the access log back to the scheduling module.
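The request and data flows above can be traced end to end in a small in-memory Python sketch. All class and method names (`MetadataManager.lookup`, `DataStorage.read`, `DataDistributor.deliver`, `Scheduler.handle_read_request`) are hypothetical, dictionaries stand in for the centralized metadata base and the storage system, and appending to a list stands in for the P2P transfer, which the sketch does not implement.

```python
class MetadataManager:
    def __init__(self, metadata_base):
        # A dict stands in for the centralized metadata base.
        self._metadata_base = metadata_base

    def lookup(self, data_id):
        return self._metadata_base[data_id]


class DataStorage:
    def __init__(self, blobs):
        self._blobs = blobs

    def read(self, location):
        return self._blobs[location]


class DataDistributor:
    def __init__(self, storage):
        self._storage = storage

    def deliver(self, task, user_inbox):
        data = self._storage.read(task["location"])
        # Appending to the inbox stands in for the P2P send to the user access module.
        user_inbox.append(data)


class Scheduler:
    def __init__(self, metadata_manager, distributor):
        self._metadata_manager = metadata_manager
        self._distributor = distributor

    def handle_read_request(self, data_id, user_inbox):
        # Flow 1: resolve the identifier to data source metadata.
        metadata = self._metadata_manager.lookup(data_id)
        # Flow 3: build a data access task and hand it to the distributor.
        task = {"location": metadata["location"], "format": metadata["format"]}
        self._distributor.deliver(task, user_inbox)
        return task


metadata_base = {"orders-2023": {"location": "store-a/orders", "format": "csv"}}
storage = DataStorage({"store-a/orders": b"id,total\n1,9.50\n"})
scheduler = Scheduler(MetadataManager(metadata_base), DataDistributor(storage))

inbox = []
task = scheduler.handle_read_request("orders-2023", inbox)
```

The point of the sketch is the separation of concerns: the scheduler only handles identifiers and metadata, and the actual bytes travel from storage through the distributor to the user.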
More specifically, the scheduling module adopts a service registration and discovery mechanism, abstracting data sources and exposing them externally through standard interfaces so as to realize unified access to multi-source heterogeneous data. Each functional module is built as a micro-service and packaged in a container, which realizes loose coupling of functions. The metadata management module centrally manages the metadata of the data sources and supports efficient data source discovery. The data distribution module distributes data in P2P mode, which improves data access efficiency. The scheme thus achieves effective integration of multi-source heterogeneous data, flexible expansion of the system, discoverability of data sources and improved data access performance, and supports the efficient sharing and utilization of data.
The container is preferably a Docker container. Docker technology packages an application into a standardized unit so that the application can be deployed and run quickly and portably. Each functional module is packaged into a standardized image using Docker, which makes service delivery and deployment consistent. Services can be deployed quickly and consistently from Docker images, which shortens the time needed to bring a deployment online. A Docker container provides an independent operating environment for each service instance, so applications do not interfere with one another. Docker allows the same image to run unchanged in different environments, which facilitates service migration. Docker supports rapid horizontal scaling of service instances to meet high-concurrency demands. Full lifecycle management of applications can be achieved through orchestration with Docker Compose.
Wherein, the centralized metadata base refers to a metadata warehouse built in the system, and metadata information of each data source is stored in the warehouse in a centralized way, such as data format, data content description, data unique identifier and the like. The metadata management module can access the centralized metadata base through a standard interface to acquire the required metadata. By adopting the centralized metadata base, metadata can be prevented from being scattered in each data source, unified management is facilitated, and the metadata management module can acquire global metadata information in a simple and centralized mode, so that subsequent operations such as data access and transmission are guided.
The P2P mode is a peer-to-peer data transmission mode. It allows data to be transferred directly between participating nodes (peers) in the network without the need for a central server. In the system, the data distribution module sends actual data to the user in P2P mode. That is, the data distribution module can establish a connection directly with the user access module and send data over that connection without forwarding the data through an intermediate central server. Adopting the P2P mode can improve the scalability and transmission efficiency of the system. As the number of users increases, the system can also be expanded easily by adding data distribution nodes.
The containerized micro-service architecture is an architectural approach that divides a single application into a series of loosely coupled smaller services. These fine-grained micro-services are packaged in separate containers that communicate over a network. In the system, the scheduling module, the metadata management module, the data distribution module and the data storage module all adopt a containerized micro-service architecture. That is, each module is packaged as a separate micro-service and runs in its own container instance. The isolation and security provided by container technology enable loose coupling between the modules. With the containerized micro-service architecture, the instances of each module can be scaled elastically on demand, which makes the system scalable. At the same time, the reusability and maintainability of the services are improved.
The service registration and discovery mechanism refers to a mechanism for realizing mutual discovery and access among micro service instances in a containerized micro service architecture. By this mechanism, each micro-service instance will actively register its own meta-information such as network address with the registry. The registry maintains a service name to service instance address mapping table. When one micro service needs to call another micro service, the registry is firstly queried for the instance address of the service needing to be called, and then the target service instance is accessed through the returned address. Thus, registration and discovery between services are realized without manually maintaining access addresses of service instances. In the system, the mechanism is adopted to realize the access and collaboration among different containerized modules.
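As an illustration of the register-then-look-up pattern described above, the following Python sketch keeps the mapping from service name to instance addresses in memory. The `ServiceRegistry` class, its method names, the example addresses and the round-robin selection among instances are assumptions made for the example; the specification does not name a particular registry product or selection policy.

```python
class ServiceRegistry:
    """In-memory map from service name to the addresses of its instances."""

    def __init__(self):
        self._instances = {}
        self._cursors = {}

    def register(self, name, address):
        # Each micro-service instance registers its own network address.
        self._instances.setdefault(name, []).append(address)

    def deregister(self, name, address):
        addresses = self._instances.get(name, [])
        if address in addresses:
            addresses.remove(address)

    def discover(self, name):
        # Return one instance address; round-robin is an assumed policy.
        addresses = self._instances.get(name)
        if not addresses:
            raise LookupError(f"no registered instance for service {name!r}")
        index = self._cursors.get(name, 0) % len(addresses)
        self._cursors[name] = index + 1
        return addresses[index]


registry = ServiceRegistry()
registry.register("metadata-management", "10.0.0.11:8080")
registry.register("metadata-management", "10.0.0.12:8080")

# The scheduling module asks the registry instead of hard-coding an address.
first_target = registry.discover("metadata-management")
second_target = registry.discover("metadata-management")
```

Because callers resolve addresses at call time, instances can be added or removed without reconfiguring the modules that depend on them.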
In summary, the modules cooperate to improve data access efficiency: the metadata management module reduces repeated data transmission, the data distribution module improves transmission efficiency through P2P, asynchronous transmission and similar means, and the data storage module applies optimizations such as compression and encryption to the data, with compression reducing the data volume. Working as an integrated whole, the modules jointly improve the data access and transmission performance of the system.
FIG. 2 is an exemplary block diagram of a metadata management module according to some embodiments of the present description. As shown in FIG. 2, the metadata management module 120 includes: a receiving unit 121, a query unit 122, an updating unit 123, an update synchronization unit 124, and a caching unit 125.
The receiving unit 121 is configured to receive a request for acquiring metadata; the query unit 122 is connected to the receiving unit 121 and is configured to query and return the corresponding metadata; the updating unit 123 is connected to the receiving unit 121 and the query unit 122 respectively and is configured to incrementally update metadata; the update synchronization unit 124 is connected to the updating unit 123 and is configured to detect metadata changes and synchronize them to the centralized metadata base; and the caching unit 125 is connected to the query unit 122 and is configured to cache metadata elements whose access frequency is greater than a preset threshold.
The receiving unit 121 receives the metadata acquisition request, the query unit 122 queries and returns metadata, the updating unit 123 continuously updates metadata incrementally, the update synchronization unit 124 synchronizes metadata changes to the centralized metadata base, and the caching unit 125 caches frequently accessed metadata. Through the cooperation of these units, efficient and accurate metadata management is realized, providing support for subsequent data access.
Wherein the metadata is data describing the data, which interprets characteristics and context information of the data. Metadata gives information about the basic properties, meaning, relationships, etc. of data, but does not include specific content of the data itself. In the present system, metadata may include information such as data format, data source, unique data identifier, association relationship between data, and the like. Such information may be used to direct specific data access, identification, processing, etc. Metadata is adopted to manage a large number of distributed data sets, rather than directly operating the data sets, unnecessary data transmission and processing can be effectively reduced, and the data access efficiency of the system is improved.
The system counts and analyzes how often each metadata element is accessed; when the access frequency of a metadata element exceeds a preset threshold, the element is cached in the cache memory. The purpose of the metadata cache is to speed up reads of frequently accessed metadata elements. Because metadata management must handle metadata for a large number of data sources, reading metadata directly from the metadata base may be slow. Caching the hot-spot metadata avoids querying and reading the metadata base on every access, which reduces access time and improves the efficiency of metadata management.
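A minimal Python sketch of this frequency-threshold caching follows. The class name `HotMetadataCache`, the use of a plain access counter as the "frequency", and the `invalidate` helper are assumptions for illustration; the specification does not define the counting window or the eviction policy, and a dictionary stands in for the centralized metadata base.

```python
from collections import Counter


class HotMetadataCache:
    """Caches a metadata element once its access count exceeds a threshold."""

    def __init__(self, metadata_base, threshold):
        self._metadata_base = metadata_base  # stands in for the centralized metadata base
        self._threshold = threshold
        self._access_counts = Counter()
        self._cache = {}
        self.base_reads = 0  # how many times the slow path was taken

    def get(self, key):
        self._access_counts[key] += 1
        if key in self._cache:
            return self._cache[key]
        value = self._metadata_base[key]
        self.base_reads += 1
        # Promote the element only after it has proven to be hot.
        if self._access_counts[key] > self._threshold:
            self._cache[key] = value
        return value

    def invalidate(self, key):
        # Assumed hook: call this when an incremental update changes the element.
        self._cache.pop(key, None)


metadata_base = {"orders-2023": {"format": "csv", "location": "store-a/orders"}}
cache = HotMetadataCache(metadata_base, threshold=2)

for _ in range(5):
    cache.get("orders-2023")

reads_from_base = cache.base_reads
```

With a threshold of 2, the first three lookups read the metadata base, the third lookup promotes the element, and the last two are served from the cache.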
In summary, multi-source heterogeneous data sharing and utilization scenarios suffer from problems such as complex data formats and unclear semantics. Operating directly on raw data to achieve data sharing requires a large amount of data integration work and cannot support data interoperability and applications quickly and efficiently. The metadata management module addresses these technical problems by collecting and managing the metadata of each data source: 1. The metadata captures information such as data format and data meaning, which reduces the complexity of processing the raw data directly. 2. Data can be better understood on the basis of metadata, which enables data interoperation across systems. 3. Centralized management of metadata makes it possible to obtain information about all kinds of data from one place, which reduces the difficulty of acquiring data. 4. Metadata can guide the data processing flow, which reduces invalid data transmission and improves the efficiency of data sharing and utilization. 5. Incremental updating of metadata allows quick adaptation to changes in multi-source data. Through these technical effects, the metadata management module supports sharing, interoperation and efficient utilization of massive, heterogeneous data.
Fig. 3 is an exemplary block diagram of a data distribution module according to some embodiments of the present description. As shown in Fig. 3, the data distribution module 130 includes: a data acquisition unit 131, an access control unit 132, an asynchronous transmission unit 133, an incremental update unit 134, a data encryption unit 135, and a data transmission unit 136.
The data acquisition unit 131 is configured to invoke the interface of the data storage module to obtain actual data; the access control unit 132 is connected to the data acquisition unit 131 and is configured to control data access according to the access rights allocated by the scheduling module; the asynchronous transmission unit 133 is connected to the data acquisition unit 131 and is configured to transmit data asynchronously to improve transmission efficiency; the incremental update unit 134 is connected to the data acquisition unit 131 and is configured to transmit only the changed portion of the data, reducing the data volume; the data encryption unit 135 is connected to the asynchronous transmission unit 133 and is configured to secure data transmission using encryption; the data transmission unit 136 is connected to the asynchronous transmission unit 133 and the data encryption unit 135, respectively, and transmits data to the user access module using a P2P transmission protocol.
Specifically, the data acquisition unit 131 calls the open, standardized data access interface of the data storage module to acquire data from the various data sources, thereby masking the heterogeneity of the data sources. The access control unit 132 verifies and controls the data acquisition unit's access rights to different data according to the access tokens issued by the scheduling module, thereby ensuring data security. The asynchronous transmission unit 133 transmits data asynchronously: the sender and the receiver execute concurrently, which avoids the waiting time of synchronous transmission and improves the transmission rate. The incremental update unit 134 transmits data incrementally, sending only the changed portion and reducing the volume of redundant data transmitted. The data encryption unit 135 applies encryption and decryption on the transmission path to prevent data leakage during transmission. The data transmission unit 136 distributes data among nodes in P2P mode, achieving high-speed data distribution.
More specifically, masking the heterogeneity of the data sources means that the data acquisition unit calls the standardized data access interface exposed by the data storage module without regard to the specific differences among the underlying data sources. The key point is that the data storage module exposes a unified interface and encapsulates all such differences inside it. The externally provided access interface remains the same whether the underlying store is a relational database or unstructured storage. The data acquisition unit can therefore always obtain data through the same set of standard interfaces and does not need to handle or adapt to differences in access mode, data format, interaction protocol and the like between data sources. This is what masks the heterogeneity. Masking heterogeneity reduces the development difficulty of the system, reduces the customization workload for different data sources, and improves the extensibility of the system.
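One way to picture the unified interface described above is an abstract base class with one adapter per kind of data source. The sketch below is an assumption-laden illustration: the names `DataSource`, `RelationalSource`, `DocumentSource` and `acquire`, and the in-memory lists and dictionaries standing in for real stores, are hypothetical and do not come from the embodiment.

```python
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Unified access interface; differences between stores stay inside adapters."""

    @abstractmethod
    def read(self, identifier: str) -> dict:
        """Return the record for `identifier` in one common shape."""


class RelationalSource(DataSource):
    """Adapter over row-oriented storage (a list of tuples stands in for a table)."""

    def __init__(self, rows):
        self._rows = rows  # list of (id, payload) tuples

    def read(self, identifier):
        for row_id, payload in self._rows:
            if row_id == identifier:
                return {"id": row_id, "payload": payload}
        raise KeyError(identifier)


class DocumentSource(DataSource):
    """Adapter over unstructured storage (a dict stands in for an object store)."""

    def __init__(self, documents):
        self._documents = documents

    def read(self, identifier):
        return {"id": identifier, "payload": self._documents[identifier]}


def acquire(source: DataSource, identifier: str) -> dict:
    # The caller uses the same call regardless of the underlying store.
    return source.read(identifier)
```

Adding a new kind of data source then only requires a new adapter; the caller of `acquire` does not change.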
In summary, the data distribution module ensures data security while achieving efficient distribution and transmission when transferring large amounts of sensitive data. By acquiring the actual data and applying access control, asynchronous transmission, incremental updating, data encryption and P2P transmission, the data distribution module distributes data to the user access module efficiently and securely, providing users with a low-latency, reliable data access service.
Asynchronous transmission means that, during data transmission, the sender and the receiver can execute their respective program logic in parallel rather than in strict lock-step. In the data distribution module of the system, with asynchronous transmission the data acquisition unit can keep reading data while the data transmission unit transmits the data already acquired to the user access module in parallel. The two do not require strict synchronization and can process their respective work concurrently, which improves the overall efficiency of data transmission. In synchronous transmission, by contrast, the sender must wait for the receiver to finish processing before it can send the next piece of data. The limitation of synchronous transmission is that the sender is prone to waiting, which reduces the transmission speed.
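The decoupling of acquisition and transmission described above can be sketched with a bounded queue between two coroutines. This is only an illustrative sketch using Python's `asyncio`; the function names, the queue size and the `asyncio.sleep(0)` call standing in for a network send are assumptions, not details of the embodiment.

```python
import asyncio


async def acquire_data(queue, items):
    """Producer: keeps reading data without waiting for each send to finish."""
    for item in items:
        await queue.put(item)
    await queue.put(None)  # sentinel marking the end of the stream


async def transmit_data(queue, delivered):
    """Consumer: sends whatever has been acquired so far."""
    while True:
        item = await queue.get()
        if item is None:
            break
        await asyncio.sleep(0)  # stand-in for the actual network send
        delivered.append(item)


async def distribute(items):
    queue = asyncio.Queue(maxsize=2)  # small buffer between the two units
    delivered = []
    # Both coroutines run concurrently; neither blocks the other per item.
    await asyncio.gather(acquire_data(queue, items), transmit_data(queue, delivered))
    return delivered


def run_distribution(items):
    return asyncio.run(distribute(items))
```

Calling `run_distribution(["a", "b", "c"])` delivers the items in the order they were acquired, while the producer is free to run ahead by up to the queue size.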
The access control unit 132 includes: a policy subunit, responsible for defining access control policies, which can specify rules such as the scope of access rights and constraints on access subjects for different types of data, the policies being stored in the form of an access control matrix; a user subunit, which maintains the identity information and attributes of all users of the system and provides the basic data for authorization; a permission subunit, which, according to the access control policy and with the support of the user subunit, grants each user access rights to specified data categories; and an audit subunit, which records all important access-control-related logs for subsequent auditing and analysis.
Specifically, the policy subunit first defines the access control policy, and the user subunit manages the user information. When an access request arrives, the permission subunit decides whether to grant access based on the policy and the user information, while the audit subunit records the request in the log.
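The interaction of the four subunits can be sketched as a single check that consults the policy matrix and the user table and always writes an audit entry. The sketch below is hypothetical: modelling user attributes as a single role, and the matrix as a dictionary keyed by (role, data category), are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AccessControlUnit:
    # Policy subunit: access control matrix, (role, data_category) -> allowed actions.
    policy: dict
    # User subunit: user_id -> role (a simplification of "identity and attributes").
    users: dict
    # Audit subunit: one entry per access decision.
    audit_log: list = field(default_factory=list)

    def check(self, user_id, data_category, action):
        """Permission subunit: decide from policy plus user information."""
        role = self.users.get(user_id)  # None for unknown users
        allowed = action in self.policy.get((role, data_category), set())
        self.audit_log.append((user_id, data_category, action, allowed))
        return allowed
```

Unknown users and unlisted (role, category) pairs are denied by default, and denied requests are logged as well as granted ones.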
In summary, the unified access control policy explicitly defines the scope of access rights; thorough user management provides the basic data for access control; precise permission control grants specific access rights only to authorized users; and detailed access logs support auditing and review. By separating the policy, user, permission and audit functions, the access control unit achieves fine-grained, secure access control in an open data environment and opens data access only to authorized users, thereby ensuring data security.
The asynchronous transmission unit 133 includes the following subunits. The data receiving subunit receives the various data to be transmitted in real time. The priority buffer queue subunit comprises multiple queues for buffering data of different priorities, with high-priority data placed in a high-priority queue. The network state detection subunit monitors changes in network state, such as bandwidth and packet loss rate. The priority setting subunit dynamically sets the transmission priority of each piece of data according to priority parameters such as data type, service attribute and timestamp. The dynamic scheduling subunit combines the network state and the data priority to dynamically adjust the transmission order of the queues, achieving intelligently optimized transmission scheduling. With this scheme, the asynchronous transmission unit dynamically optimizes the data transmission order according to the network environment and the importance of the data. Compared with a fixed-priority transmission strategy, this improves the transmission quality of important data and the overall transmission efficiency.
Specifically, the priority is set according to the priority parameters. The priority level can be determined from the actual data type and service attribute according to the following principles:
1. Data whose type is video has the highest priority and is set to level 1.
2. Data whose type is voice has the next highest priority and is set to level 2.
3. Data whose type is file or text typically has low priority and is set to level 3.
4. For data of the same type, data whose service attribute is monitoring alarm has higher priority than data of an ordinary service, and its priority is raised by one level.
5. Data with the most recent timestamp has its priority raised by one level.
6. The priority is adjusted according to data size: data of larger volume has its priority lowered by one level.
The priority level is set by jointly considering factors such as data type, service attribute, timestamp and data size, and data with a higher priority level is then transmitted first.
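The six principles above can be sketched as a small scoring function feeding a priority queue. Several details are assumptions made only for this illustration: a smaller level number means higher priority, "raised by one level" is modelled as subtracting 1 (so a level may fall below 1), "most recent" means within `RECENT_SECONDS` of the current time, "larger volume" means at least `LARGE_BYTES`, and ties are broken by arrival order.

```python
import heapq

BASE_LEVEL = {"video": 1, "voice": 2, "file": 3, "text": 3}  # principles 1-3
LARGE_BYTES = 10 * 1024 * 1024  # hypothetical "larger volume" threshold
RECENT_SECONDS = 5.0            # hypothetical "most recent" window


def priority_level(data_type, service, timestamp, size_bytes, now):
    """Return the priority level; a smaller number is transmitted earlier."""
    level = BASE_LEVEL[data_type]
    if service == "monitoring_alarm":    # principle 4: raise by one level
        level -= 1
    if now - timestamp <= RECENT_SECONDS:  # principle 5: raise by one level
        level -= 1
    if size_bytes >= LARGE_BYTES:        # principle 6: lower by one level
        level += 1
    return level


def transmission_order(items, now):
    """Order item names by priority level, then by arrival order."""
    heap = [
        (priority_level(i["type"], i["service"], i["timestamp"], i["size"], now), n, i["name"])
        for n, i in enumerate(items)
    ]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

A recent video from a monitoring alarm therefore goes out before an older voice stream, and a very large ordinary video drops to the same level as voice.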
In conclusion, asynchronous transmission lets the sending end and the receiving end execute concurrently, avoiding waiting and improving transmission efficiency; multiple priority queues distinguish important data from ordinary data so that important data is transmitted first; network state changes are detected and the queue order is scheduled dynamically, improving overall throughput when bandwidth is plentiful and safeguarding the transmission quality of important data when bandwidth is tight; the transmission order is optimized according to service requirements rather than simple first-come-first-served; and, compared with a fixed-priority strategy, the transmission order is more intelligent and dynamic, improving both the transmission quality of important data and overall efficiency. Through asynchronous transmission and dynamic priority scheduling, the asynchronous transmission unit improves transmission efficiency and transmission quality in massive data sharing scenarios and addresses the difficulties that data transmission faces in such scenarios.
The incremental update unit 134 includes: a data comparison subunit, which compares the currently acquired data with the historically cached data item by item to find the portion of the data that has changed; a change identification subunit, which marks the changed data, for example by setting a change flag bit; a delta compression subunit, which compresses the identified changed portion using a compression algorithm to reduce the transmission load; a difference extraction subunit, which extracts the compressed changed portion; and a transmission control subunit, which transmits only the extracted changed portion rather than the complete data. The current data is the latest real-time data acquired from the data storage module, and the historical data is the older data cached before the current moment. Through data comparison, change identification, delta compression and extraction, the incremental update unit transmits only the necessary changed portions of the data. Compared with transmitting the complete data, this reduces the data volume and transmission load and improves the efficiency of the transmission system.
Specifically, the changed portion of the data can be found as follows. The data comparison subunit compares data by hash check: the current data and the historical data are each divided into data blocks of a given length, the hash value of each data block is calculated, and the hash values of corresponding current and historical data blocks are compared to determine whether the block has changed. The change identification subunit uses a change flag field one bit long: when the comparison shows that a data block has changed, the change flag bit of that block is set to 1; otherwise it remains 0, indicating that the block is unchanged. Comparing by block division and hash check avoids comparing the complete data byte by byte and so improves comparison efficiency. The single-bit change flag is also very compact and adds only a small storage overhead.
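The block-wise hash comparison and the one-bit change flag can be sketched as below. The choice of SHA-256, the tiny block size and the function names are assumptions for illustration; the text only specifies "a hash value" and "a certain length". Blocks that exist only in the current data are flagged as changed, and removal of trailing blocks is not handled in this sketch.

```python
import hashlib


def block_hashes(data: bytes, block_size: int):
    """Split data into fixed-length blocks and hash each block."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def change_flags(current: bytes, history: bytes, block_size: int = 4):
    """Return one flag per current block: 1 if changed or new, 0 if unchanged."""
    cur = block_hashes(current, block_size)
    old = block_hashes(history, block_size)
    return [1 if i >= len(old) or cur[i] != old[i] else 0 for i in range(len(cur))]


def changed_blocks(current: bytes, history: bytes, block_size: int = 4):
    """Map block index -> block content for the blocks whose flag is 1."""
    flags = change_flags(current, history, block_size)
    return {
        i: current[i * block_size:(i + 1) * block_size]
        for i, flag in enumerate(flags)
        if flag
    }
```

Only the entries returned by `changed_blocks` would need to be compressed and transmitted; the receiver can patch its cached copy by block index.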
Specifically, a compression algorithm is an algorithm that reduces the amount of data: it takes the original data as input and outputs compressed data produced by a specific compression method. Common compression algorithms include:
- LZW algorithm: builds a dictionary of strings encountered in the input and replaces repeated strings with shorter dictionary codes, reducing the data volume.
- Huffman coding: uses variable-length codes, assigning shorter codes to high-frequency characters and longer codes to low-frequency characters, reducing the total data volume.
- Run-length encoding: replaces runs of repeated values with a value and a count, reducing redundancy.
- Differential encoding: computes the differences between successive data values and encodes and transmits only the differences.
In the delta compression subunit of the incremental update unit, Huffman coding, the LZW algorithm or the like can be selected to compress the differential data, so that the change information is stored in a smaller amount of data and the transmission load is reduced.
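Of the algorithms listed above, run-length encoding is the simplest to show in full. The following is a minimal sketch with hypothetical function names; it represents the output as (byte value, run length) pairs rather than a packed byte format, which a real encoder would need.

```python
from itertools import groupby


def rle_encode(data: bytes):
    """Replace each run of identical bytes with a (byte value, run length) pair."""
    return [(byte, len(list(run))) for byte, run in groupby(data)]


def rle_decode(pairs):
    """Expand (byte value, run length) pairs back into the original bytes."""
    return b"".join(bytes([byte]) * count for byte, count in pairs)
```

Run-length encoding only pays off when the input contains long runs; on data without repetition the pair list is larger than the input, which is one reason the text suggests Huffman coding or LZW for the differential data.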
In summary, in massive data transmission scenarios, transmitting large amounts of repeated data in full produces a high transmission load. The incremental update unit finds the changed portion by data comparison and transmits only the necessary changed data, reducing the data volume; a compression algorithm further reduces the size of the changed data and hence the transmission load; transmitting only the identified changed data filters out large amounts of unchanged static data and reduces network traffic; avoiding frequent full transmissions relieves the load on storage and the network; and the source and receiving ends are synchronized without transmitting complete data, improving synchronization efficiency. By identifying changes and transmitting them in compressed form, the incremental update unit reduces the data transmission volume, lightens the network load and improves the data transmission efficiency of the system while preserving the synchronization result.
FIG. 4 is an exemplary block diagram of the data storage module 140 according to some embodiments of the present description. As shown in FIG. 4, the data storage module 140 includes: a receiving unit 141 for receiving actual data; a deduplication unit 142 connected to the receiving unit 141 for deduplicating the actual data; a compression unit 143 connected to the deduplication unit 142 for compressing the actual data; an encryption unit 144 connected to the compression unit 143 for encrypting the actual data; a storage unit 145 connected to the encryption unit 144 for storing the processed actual data; and a data reading unit 146 connected to the storage unit 145 for reading the stored data on request.
Specifically, the receiving unit 141 is responsible for receiving the various real-time data streams; the deduplication unit 142 detects and filters duplicate data in the received data, improving storage efficiency; the compression unit 143 reduces the data volume using a compression algorithm, reducing the storage space occupied; the encryption unit 144 encrypts the data to ensure security; the storage unit 145 organizes the data and stores it in a database or the like; and the data reading unit 146 manages query and read access to the data. Through the cooperation of these key units, the data storage module achieves efficient and secure storage of massive data.
Deduplication is a data compression technique used to eliminate duplicate copies in data and reduce storage. It works as follows: 1. divide the data into fixed-size data blocks; 2. calculate a hash signature for each data block; 3. compare the hash signatures of different data blocks to find duplicate data blocks; 4. keep only the unique data blocks and delete the duplicates. Deduplication is commonly applied in backup storage and archive storage scenarios: multiple copies of the same file or content contain a large amount of duplicated content, so deduplication can effectively reduce storage. In the technical scheme of the data storage module, a deduplication unit based on locality-sensitive hashing can be added after the receiving unit to deduplicate the received data, reducing the storage of redundant data and improving storage efficiency.
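The four deduplication steps can be sketched as a small block store. Note the substitution: this sketch uses an exact cryptographic hash (SHA-256) as the block signature, not the locality-sensitive hash mentioned for the embodiment, and the class name, block size and "recipe" list are assumptions for illustration.

```python
import hashlib


class DedupStore:
    """Stores each distinct fixed-size block once, keyed by its hash signature."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.blocks = {}  # signature -> block content (unique blocks only)

    def put(self, data: bytes):
        """Store data and return its recipe: the ordered list of block signatures."""
        recipe = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]           # step 1: split
            signature = hashlib.sha256(block).hexdigest()  # step 2: sign
            self.blocks.setdefault(signature, block)       # steps 3-4: keep unique
            recipe.append(signature)
        return recipe

    def get(self, recipe):
        """Rebuild the original data from its recipe."""
        return b"".join(self.blocks[signature] for signature in recipe)
```

Storing `b"ABCDABCDEFGHABCD"` with a block size of 4 keeps two unique blocks for four logical ones, and the recipe still reconstructs the data exactly.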
Data compression is the process of exploiting the statistical redundancy of data to represent information with fewer bits, thereby reducing the size of the original data. Its aim is to shorten the representation of the data in order to reduce storage costs. Data compression works as follows: 1. Analyze the characteristics of the data and find the repeated patterns and statistical regularities in it. 2. Use a data compression algorithm (e.g., Huffman coding or run-length encoding) to eliminate the repetition, assigning shorter codes to patterns that occur frequently and longer codes to patterns that occur infrequently. 3. Compression-encode the data to obtain the compressed data. 4. On decompression, restore the original data. In the technical scheme of the data storage module, a lossless compression unit based on the LZW algorithm can be added after the deduplication unit to compress the data and reduce the storage space occupied.
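Step 2 above, shorter codes for frequent patterns, is the idea behind Huffman coding, which can be sketched as follows. This illustrates the principle only and is not the LZW-based unit named for the embodiment; the function name and the representation of codes as strings of "0" and "1" are assumptions.

```python
import heapq
from collections import Counter


def huffman_codes(data: bytes) -> dict:
    """Map each byte value to a prefix-free bit string; frequent bytes get shorter codes."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}  # single symbol still needs one bit
    # Heap entries: (total count, unique tie-breaker, {symbol: code so far}).
    heap = [(count, idx, {sym: ""}) for idx, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        count1, _, codes1 = heapq.heappop(heap)  # two least frequent subtrees
        count2, _, codes2 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in codes1.items()}
        merged.update({sym: "1" + code for sym, code in codes2.items()})
        heapq.heappush(heap, (count1 + count2, next_id, merged))
        next_id += 1
    return heap[0][2]
```

For the input `b"aaaaaaabbbc"`, the byte `a` receives a 1-bit code and `b` and `c` receive 2-bit codes, so the 11 bytes need 15 bits instead of 88.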
In summary, deduplication reduces duplicate data, improving storage efficiency and lowering cost; compression coding reduces the data size and saves storage space; encryption secures the data in transmission and in storage; an optimized data organization structure supports efficient queries; and data sharding and load balancing safeguard the read/write performance of the system. Through these technical means, the data storage module achieves high-performance, scalable storage of massive heterogeneous data while maintaining storage efficiency and data security.
FIG. 5 is an exemplary flow chart of a method for flexibly orchestrating interfaces based on data sharing according to some embodiments of the present description. As shown in FIG. 5, the method comprises the following steps:
S210, the user access module provides a standard data reading interface and receives data reading requests initiated by various users, each request containing identification information of the required data.
S220, acquiring metadata according to the request: after receiving the reading request, the scheduling module calls the query interface of the metadata management module and looks up the corresponding metadata information of the data source according to the data identification information.
S230, acquiring actual data according to the metadata: the scheduling module uses the metadata to find the storage location of the actual data and calls the query interface of the data storage module to acquire the actual data.
S240, sending the actual data in P2P mode: the scheduling module sends the actual data to the data distribution module, which distributes the actual data to the requesting user at high speed using P2P network technology.
After the user receives the actual data, the user access module completes the whole data reading flow. The system modules are built on a containerized micro-service architecture; the metadata is updated only incrementally; and asynchronous transmission, incremental updating, access control and data encryption are applied to the actual data.
In some embodiments, step S210 may be performed by the user access module 150, where the user access module is responsible for receiving various data reading requests initiated by a user; step S220 may be executed by the scheduling module 110, and after receiving the read request, the scheduling module may call the metadata management module to obtain corresponding metadata; step S230 may be performed by the scheduling module 110, where the scheduling module 110 queries and obtains actual data from the data storage module 140 using the obtained metadata; step S240 may be performed by the data distribution module 130, and the data distribution module 130 rapidly distributes actual data based on the P2P technology.
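The division of work across steps S220 to S240 can be sketched as one function on the scheduling side. Everything concrete here is assumed for illustration: the dictionaries standing in for the metadata management module and the data storage module, the `location` field in the metadata, and the `distribute` callable standing in for P2P distribution.

```python
def handle_read_request(data_id, metadata_repo, storage, distribute):
    """Serve one read request received by the user access module (S210)."""
    # S220: look up the metadata for the requested data identifier.
    metadata = metadata_repo[data_id]
    # S230: use the metadata to locate and fetch the actual data.
    actual_data = storage[metadata["location"]]
    # S240: hand the actual data to the distribution step (P2P in the text).
    return distribute(actual_data)
```

For example, with `metadata_repo = {"sales-2023": {"location": "bucket/1"}}` and `storage = {"bucket/1": b"rows"}`, the call passes `b"rows"` to whatever `distribute` function is supplied.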
Specifically, a containerized micro-service architecture is constructed: using container technologies such as Docker, the scheduling, metadata, data storage, data access and other modules are deployed as micro-services according to the functional requirements of the system, and a registration and discovery mechanism is built between the micro-services. Seamless integration of heterogeneous data sources is achieved through the unified external interfaces of the micro-services. To achieve loose coupling between modules, once the micro-services are built a service registry is used to record meta-information such as the addresses of all micro-service instances: each micro-service registers itself on startup, and a caller discovers and accesses a provider instance through the registry when invoking a service. The metadata management module updates metadata only incrementally: it synchronizes metadata in incremental mode, periodically polls the data sources for metadata changes, and writes only the changed portion to the metadata database, reducing synchronization cost. To ensure data security and access efficiency, the actual data is transmitted asynchronously to avoid transmission blocking; it is updated incrementally, with only the changed portion transmitted; a fine-grained access control policy is enforced; and encryption is used to ensure data security. For data distribution with P2P technology, a P2P component is integrated into the data distribution module, the actual data is split into pieces and sent to multiple nodes, and the nodes exchange pieces with one another to achieve high-speed data distribution.
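The registration and discovery mechanism described above can be sketched as an in-memory registry. This is a hypothetical illustration, not a real registry product: the class and method names are invented for the example, and the round-robin choice among instances is an assumption, since the text does not say how a provider instance is selected.

```python
class ServiceRegistry:
    """In-memory stand-in for a service registry with round-robin discovery."""

    def __init__(self):
        self._instances = {}  # service name -> list of instance addresses
        self._cursor = {}     # service name -> next round-robin position

    def register(self, name, address):
        """Called by a micro-service instance when it starts."""
        self._instances.setdefault(name, []).append(address)

    def deregister(self, name, address):
        """Called when an instance shuts down."""
        self._instances[name].remove(address)

    def discover(self, name):
        """Called by a consumer to obtain one provider address."""
        instances = self._instances.get(name)
        if not instances:
            raise LookupError(f"no instance registered for {name!r}")
        index = self._cursor.get(name, 0) % len(instances)
        self._cursor[name] = index + 1
        return instances[index]
```

Because callers ask the registry for an address at call time, instances can be added or removed without changing the callers, which is the loose coupling the paragraph describes.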
More specifically, the micro-service architecture achieves loose coupling between applications by defining service contracts, thereby supporting seamless integration of heterogeneous systems; the service registration and discovery mechanism applies the principle of network location transparency to bind service addresses dynamically; incremental metadata updating uses an incremental computation model and avoids a large amount of redundant data processing; asynchronous transmission is based on the principle of pipeline parallelism and P2P technology on the principle of distributed parallel computing, and together they improve the efficiency of data interaction; the cryptographic algorithm ensures the confidentiality of data, and the access control mechanism provides rights management and authentication.
In summary, different systems can flexibly call data services through standard interfaces; metadata management decouples data access; and P2P distribution improves the efficiency of data interaction. The scheme achieves efficient sharing and utilization of multi-source heterogeneous data.
The invention and its embodiments have been described above schematically, and the description is not limiting; the invention may be implemented in other specific forms without departing from its spirit or essential characteristics. The drawings depict only one embodiment of the invention, and the actual construction is not limited thereto; any reference sign in the claims shall not be construed as limiting the claims. Therefore, if a person of ordinary skill in the art, informed by this disclosure, devises structural manners and embodiments similar to this technical solution without inventive effort and without departing from the gist of the invention, they all fall within the protection scope of this patent. In addition, the word "comprising" does not exclude other elements or steps, and the word "a" or "an" preceding an element does not exclude a plurality of such elements. The various elements recited in the product claims may be embodied in software or hardware. The terms first, second, etc. are used to denote names and not any particular order.

Claims (10)

1. A system for flexible orchestration of interfaces based on data sharing, comprising:
the scheduling module is used for acquiring metadata;
the metadata management module is connected with the scheduling module and is used for acquiring metadata from the centralized metadata base;
the data distribution module is connected with the scheduling module and is used for acquiring actual data and sending the actual data to a user in a P2P mode;
the data storage module is used for storing and returning actual data;
the user access module is used for sending a data reading request and receiving actual data;
the scheduling module, the metadata management module, the data distribution module and the data storage module adopt a containerized micro-service architecture.
2. The system according to claim 1, wherein:
the metadata management module comprises:
a receiving unit configured to receive an acquisition request;
the query unit is connected with the receiving unit and is used for querying and returning metadata;
the updating unit is respectively connected with the receiving unit and the inquiring unit and is used for incrementally updating the metadata;
the updating synchronization unit is connected with the updating unit and is used for detecting and synchronizing metadata changes;
and the caching unit is connected with the query unit and is used for caching elements with access frequency larger than a threshold value.
3. The system according to claim 1, wherein:
the data distribution module comprises:
the data acquisition unit is used for acquiring actual data;
the access control unit is connected with the data acquisition unit and used for access control;
the asynchronous transmission unit is connected with the data acquisition unit and is used for asynchronously transmitting actual data;
an increment updating unit connected with the data acquisition unit and used for transmitting only the actual data changing part;
the data encryption unit is connected with the asynchronous transmission unit and used for encrypting and transmitting actual data;
and the data transmission unit is respectively connected with the asynchronous transmission unit and the data encryption unit and is used for transmitting actual data by P2P.
4. A system according to claim 3, characterized in that:
the asynchronous transmission unit includes:
a data receiving subunit, configured to receive actual data;
a priority buffer queue subunit, connected with the data receiving subunit and used for buffering actual data of different priorities;
a network state detection subunit, configured to detect a network state;
the priority setting subunit is connected with the priority buffer queue subunit and is used for setting the priority of the actual data according to the priority parameter;
the dynamic scheduling subunit is respectively connected with the network state detection subunit and the priority setting subunit and is used for dynamically adjusting the data transmission sequence according to the network state and the priority of the actual data;
The priority parameter comprises a data type, a service attribute and a time stamp;
the asynchronous transmission unit realizes dynamic priority scheduling through priority setting and dynamic scheduling.
5. A system according to claim 3, characterized in that:
the delta update unit includes:
a data comparison subunit for comparing the current data with the history data to obtain a change part;
a change identification subunit for identifying the change portion;
a delta compression subunit for compressing the changing portion;
a difference extraction subunit for extracting the compressed change portion;
a transmission control subunit configured to transmit only the extracted altered portion;
the current data are actual data obtained from the data storage module at the current moment; the historical data is the actual data cached at the moment before the current moment.
6. A system according to claim 3, characterized in that:
the access control unit includes:
a policy subunit for defining an access control policy;
a user subunit for maintaining user information;
the permission subunit is connected with the strategy subunit and is used for granting the user access permission;
and an audit subunit for recording access control logs.
7. The system according to claim 1, wherein:
the data storage module comprises:
a receiving unit for receiving actual data;
the de-duplication unit is connected with the receiving unit and is used for de-duplication of the actual data;
the compression unit is connected with the de-duplication unit and used for compressing actual data;
the encryption unit is connected with the compression unit and used for encrypting actual data;
the storage unit is connected with the encryption unit and used for storing actual data;
and the data reading unit is connected with the storage unit and used for reading data.
8. The system according to claim 1, wherein:
the system further comprises:
the scheduling module container is used for packaging the scheduling module;
the metadata management module container is used for packaging the metadata management module;
the data distribution module container is used for packaging the data distribution module;
the data storage module container is used for packaging the data storage module;
the dispatch module container, the metadata management module container, the data distribution module container and the data storage module container are connected by adopting a service registration and discovery mechanism.
9. A method for flexibly arranging interfaces based on data sharing, comprising:
receiving a data reading request of a user;
in response to the data reading request, the scheduling module acquires metadata from the metadata management module;
the metadata management module acquires metadata from a centralized metadata base according to the request for acquiring metadata, and returns the metadata to the scheduling module;
the data storage module acquires actual data corresponding to the metadata and sends the actual data to the data distribution module;
and the data distribution module transmits the actual data to the user in a P2P mode.
10. The method according to claim 9, wherein:
the method further comprises:
the scheduling module, the metadata management module, the data distribution module and the data storage module adopt a containerized micro-service architecture;
the metadata management module only carries out increment update on the metadata change;
the data distribution module performs asynchronous transmission, incremental updating, fine granularity access control and data encryption processing.
CN202311009226.3A 2023-08-11 2023-08-11 System and method for flexibly arranging interfaces based on data sharing Active CN117033487B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311009226.3A CN117033487B (en) 2023-08-11 2023-08-11 System and method for flexibly arranging interfaces based on data sharing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311009226.3A CN117033487B (en) 2023-08-11 2023-08-11 System and method for flexibly arranging interfaces based on data sharing

Publications (2)

Publication Number Publication Date
CN117033487A (en) 2023-11-10
CN117033487B (en) 2024-05-07

Family

ID=88644289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311009226.3A Active CN117033487B (en) 2023-08-11 2023-08-11 System and method for flexibly arranging interfaces based on data sharing

Country Status (1)

Country Link
CN (1) CN117033487B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101252506A (en) * 2007-12-29 2008-08-27 中国建设银行股份有限公司 Data transmission system
CN109857570A (en) * 2018-12-29 2019-06-07 航天信息股份有限公司 The standardized data class micromodule of operation system
WO2020162680A1 (en) * 2019-02-08 2020-08-13 아콘소프트 주식회사 Microservice system and method
CN113094385A (en) * 2021-03-10 2021-07-09 广州中国科学院软件应用技术研究所 Data sharing fusion platform and method based on software definition open toolset
CN114138877A (en) * 2021-10-25 2022-03-04 杭萧钢构股份有限公司 Method, device and equipment for realizing theme data service based on micro-service architecture

Also Published As

Publication number Publication date
CN117033487B (en) 2024-05-07

Similar Documents

Publication Publication Date Title
US10223506B2 (en) Self-destructing files in an object storage system
US8510275B2 (en) File aware block level deduplication
US9491104B2 (en) System and method for storing/caching, searching for, and accessing data
KR100592647B1 (en) System and method for a caching mechanism for a central synchronization server
US7546284B1 (en) Virtual message persistence service
US10853242B2 (en) Deduplication and garbage collection across logical databases
KR101570892B1 (en) Method and system of using a local hosted cache and cryptographic hash functions to reduce network traffic
US7409397B2 (en) Supporting replication among a plurality of file operation servers
CN110019240A (en) A kind of service data interaction method, apparatus and system
KR101150146B1 (en) System and method for managing cached objects using notification bonds
CN108536778B (en) Data application sharing platform and method
US20130191523A1 (en) Real-time analytics for large data sets
US20200128094A1 (en) Fast ingestion of records in a database using data locality and queuing
CN112445626B (en) Data processing method and device based on message middleware
CN104573064B (en) A kind of data processing method under big data environment
CN111327446B (en) Configuration data processing method, software defined network device, system and storage medium
CN113407600B (en) Enhanced real-time calculation method for dynamically synchronizing multi-source large table data in real time
US10503737B1 (en) Bloom filter partitioning
JP2009151560A (en) Resource management method, information processing system, information processor and program
CN117033487B (en) System and method for flexibly arranging interfaces based on data sharing
US20230401197A1 (en) Auto refresh of directory tables for stages
CN113810231B (en) Log analysis method, system, electronic equipment and storage medium
US20240143371A1 (en) Data Deduplication for Replication-Based Migration of Virtual Machines
WO2019120629A1 (en) On-demand snapshots from distributed data storage systems
US11989166B2 (en) Systems and methods for improved servicing of queries for blockchain data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant