CN114356873A - Data sharing system and method - Google Patents

Publication number
CN114356873A
Authority
CN
China
Prior art keywords
data
cache
written
layer
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210004214.0A
Other languages
Chinese (zh)
Inventor
范东来
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinabank Payments Beijing Technology Co Ltd
Original Assignee
Chinabank Payments Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinabank Payments Beijing Technology Co Ltd
Priority to CN202210004214.0A
Publication of CN114356873A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data sharing system and method, and relates to the technical field of big data. One embodiment of the system comprises: a user interaction layer for receiving a data processing request, wherein the data processing request comprises a data write request and the data write request comprises data to be written; a data storage layer for persistently storing the data to be written according to the data write request; and a multi-level distributed cache layer having a plurality of cache nodes, each cache node having at least one tag, wherein the multi-level distributed cache layer performs tag adaptation on the data to be written so as to determine a first target cache node from the plurality of cache nodes, and caches the data to be written to the first target cache node. This embodiment enables efficient data sharing with customized, personalized tags without changing the existing storage layout. Management is performed through a unified interface, so the details are hidden from users and development and maintenance are easier.

Description

Data sharing system and method

Technical field

The present invention relates to the technical field of big data, and in particular to a data sharing system and method.

Background

With the arrival of the big data era, data is generated ever faster and in ever more varieties. It is spread widely inside and outside organizations and flows continuously, and the places where it is stored may be spread across different geographical locations. This situation creates many difficulties for data users. Data users are themselves usually spread across different geographical locations, and the need to exchange, share, and then use data is increasingly frequent, which makes efficient distribution and use of data very difficult.

Summary of the invention

In view of this, embodiments of the present invention provide a data sharing system and method that can share data efficiently using customized, personalized tags without changing the existing storage layout. Dynamically combining tags yields a flexible caching strategy, so that data is cached on better-suited cache nodes, which facilitates fast data reading and analysis and improves data sharing efficiency and the value obtained from the data. Management is performed through a unified interface. In addition, users do not need to write code for data sharing: ordinary read and write commands suffice, and the details are hidden from users, which eases development and maintenance.

To achieve the above object, according to one aspect of the embodiments of the present invention, a data sharing system is provided, including:

a user interaction layer, configured to receive a data processing request, where the data processing request includes a data write request, and the data write request includes data to be written;

a data storage layer, configured to persistently store the data to be written according to the data write request;

a multi-level distributed cache layer having a plurality of cache nodes, each of the cache nodes having at least one tag; the multi-level distributed cache layer is configured to perform tag adaptation on the data to be written, so as to determine a first target cache node from the plurality of cache nodes, and to cache the data to be written to the first target cache node.

Optionally, the multi-level distributed cache layer further includes a tag configuration module, configured to receive tag configuration information and to determine at least one tag of the cache node according to the tag configuration information.

Optionally, the tag configuration information includes one or more of the following: the physical location of the cache node, server room location, read/write speed, hardware configuration information, data source type, data usage, and remaining cache space.

Optionally, the multi-level distributed cache layer further includes a level management unit, configured to divide the cache resources of the cache node into levels.

Optionally, every cache node has the same number of levels.

Optionally, the data storage layer further includes a first file system and at least one second file system, and the at least one second file system is mounted under a global directory of the first file system.

Optionally, the data processing request further includes a data read request;

the multi-level distributed cache layer is further configured to determine a second target cache node according to the data read request, and to read target data from the second target cache node.

Optionally, the system further includes a metadata service layer, configured to record a storage path and a cache path of the data to be written.

To achieve the above object, according to another aspect of the embodiments of the present invention, a data sharing method is provided. The data sharing method is applied to the data sharing system of the embodiments of the present invention and includes:

receiving a data processing request, where the data processing request includes a data write request, and the data write request includes data to be written;

persistently storing the data to be written according to the data write request;

performing tag adaptation on the data to be written, so as to determine a first target cache node from a plurality of cache nodes, and caching the data to be written to the first target cache node, where each of the plurality of cache nodes has at least one tag.

Optionally, before receiving the data processing request, the method further includes:

receiving tag configuration information, and determining at least one tag of the cache node according to the tag configuration information.

Optionally, the tag configuration information includes one or more of the following: the physical location of the cache node, server room location, read/write speed, hardware configuration information, data source type, data usage, and remaining cache space.

Optionally, the data processing request further includes a data read request;

the method further includes: determining a second target cache node according to the data read request, and reading target data from the second target cache node.

To achieve the above object, according to yet another aspect of the embodiments of the present invention, an electronic device is provided, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the data sharing method of the embodiments of the present invention.

To achieve the above object, according to a further aspect of the embodiments of the present invention, a computer-readable medium is provided, on which a computer program is stored; when executed by a processor, the program implements the data sharing method of the embodiments of the present invention.

An embodiment of the above invention has the following advantages or beneficial effects. User data processing requests are received through a unified user interaction layer, so users do not interact directly with the underlying data storage, do not need to understand its logic, and only need to issue data write or read commands. The cache nodes in the multi-level distributed cache layer are tagged along multiple dimensions so that each cache node has at least one tag, and dynamically combining tags yields a flexible caching strategy that caches data on better-suited cache nodes, which facilitates fast data reading and analysis and improves data sharing efficiency and the value obtained from the data. An existing file system serves as the persistent storage layer, and other file systems are mounted under it, so efficient data sharing with customized, personalized tags is possible without changing the existing storage layout. Management is performed through a unified interface. In addition, users do not need to write code for data sharing: ordinary read and write commands suffice, and the details are hidden from users, which eases development and maintenance.

Further effects of the above non-conventional optional features are described below in conjunction with specific embodiments.

Description of the drawings

The accompanying drawings are provided for a better understanding of the present invention and do not constitute an improper limitation of it. In the drawings:

FIG. 1 is a schematic structural diagram of a centralized data storage approach in the prior art;

FIG. 2 is a schematic structural diagram of a distributed data storage approach in the prior art;

FIG. 3 is a schematic structural diagram of a data sharing system according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of the data storage layer of a data sharing system according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a data sharing system according to another embodiment of the present invention;

FIG. 6 is a schematic diagram of the main flow of a data sharing method according to an embodiment of the present invention;

FIG. 7 is a diagram of an exemplary system architecture to which an embodiment of the present invention may be applied;

FIG. 8 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.

Detailed description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the invention. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.

In a big data environment, data scale and complexity tend to grow very quickly, and the demands on data storage systems grow with them. At present, if data needs to be shared widely and across regions, there are usually two approaches: centralized storage and distributed storage. As shown in FIG. 1, centralized storage aggregates the data to be shared level by level; the data storage at each level may be in a different geographical location, and a data sharing area finally provides the service. However, with this approach the data link is too long, data timeliness is poor, and the data consumes a large amount of storage and computation. In distributed storage, as shown in FIG. 2, each data user requests the data it needs according to its own requirements, and the other party then transfers the data to the requester for local use. Compared with centralized storage, distributed storage is more flexible and can respond to requirements quickly, but the data sources depend heavily on one another. Over time, as the data becomes more closely interlinked, the dependencies between data sources become harder to manage and management costs rise sharply.

To solve at least one of the above technical problems, an embodiment of the present invention constructs a data sharing system. The system has a unified access method: it provides a unified service interface through which users read and write data. The system also contains multiple kinds of data storage media, mostly distributed file systems. These distributed file systems may be physically located in different places, and the data to be shared is scattered across the different data storage media. Each of these data storage media has at least one tag, which can be set according to the needs of different scenarios, for example according to physical location or read/write speed. During a data write operation, the source, purpose, read/write requirements, and so on of the data to be written are matched against the tags of the storage media, so as to determine a target storage medium from among the multiple storage media. By flexibly combining tags, the data sharing system of this embodiment can obtain a precise caching strategy that fits the scenario and can meet complex data sharing requirements.

FIG. 3 is a schematic structural diagram of a data sharing system according to an embodiment of the present invention. As shown in FIG. 3, the data sharing system 300 includes a user interaction layer 301, a data storage layer 302, and a multi-level distributed cache layer 303.

The user interaction layer 301 is configured to receive a data processing request, where the data processing request includes a data write request, and the data write request includes data to be written. The data processing request may also include a data read request. In an optional embodiment, the user interaction layer may include a user interface through which the user's requests to write or read data are received. Users do not need to adapt to, or even understand, the specific data access protocols and paths; they only need to know the data storage path to read and write data, which greatly simplifies data maintenance.

The data storage layer 302 is configured to persistently store the data to be written according to the data write request. In this embodiment, the data storage layer 302 persistently stores the data to be written, and the multi-level distributed cache layer 303 caches the data held in the data storage layer 302, which increases data read/write speed and thereby improves data analysis efficiency and the value of the data.

In an optional embodiment, any standard file system may serve as the data storage layer, and the file system may be distributed. Examples include HDFS (Hadoop Distributed File System), NFS (Network File System), Ceph, AWS S3 (Amazon Web Services Simple Storage Service), and OSS (Object Storage Service).

In an optional embodiment, the data storage layer 302 further includes a first file system and at least one second file system, and the at least one second file system is mounted under a global directory of the first file system. In this embodiment, as shown in FIG. 4, after any standard file system (the first file system) has been chosen as the data storage layer, other file systems (second file systems) can be mounted under it. The data storage layer can thus be scaled out horizontally without major changes to the existing architecture, which eases maintenance and management. Through the user interaction layer 301, users can access either the first file system or a second file system. Users do not need to adapt to, or even understand, the specific data access protocols and paths; they only need to know the data storage path to read and write data, which greatly simplifies data maintenance.
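The mounting arrangement can be pictured as a mount table with longest-prefix resolution. The following is a minimal Python sketch under stated assumptions, not the patented implementation: the class `GlobalNamespace`, its methods, and the root URL `hdfs://namenode:8020` are hypothetical, and the mount point and target URL are borrowed from the Alluxio example given later in the description.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GlobalNamespace:
    """Hypothetical mount table: maps mount points in the global directory
    of the first file system to the URLs of mounted second file systems."""
    root_url: str
    mounts: Dict[str, str] = field(default_factory=dict)

    def mount(self, mount_point: str, target_url: str) -> None:
        # Store both sides without a trailing slash to simplify joining.
        self.mounts[mount_point.rstrip("/")] = target_url.rstrip("/")

    def resolve(self, path: str) -> str:
        # Pick the longest mount point that is a whole-component prefix.
        best = ""
        for mp in self.mounts:
            if (path == mp or path.startswith(mp + "/")) and len(mp) > len(best):
                best = mp
        if not best:
            # Not under any mount: the path lives in the first file system.
            return self.root_url.rstrip("/") + path
        return self.mounts[best] + path[len(best):]


ns = GlobalNamespace(root_url="hdfs://namenode:8020")  # assumed first file system
ns.mount("/test", "s3a://aaa/aa")                      # a second file system
print(ns.resolve("/test/2022/part-0"))
print(ns.resolve("/warehouse/orders"))
```

The caller only ever supplies a path in the global directory; which underlying file system serves it is decided by the table, which is the property the description relies on.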

The multi-level distributed cache layer 303 is used to cache data and has multiple cache nodes. It performs tag adaptation on the data to be written, so as to determine a first target cache node from the multiple cache nodes, and caches the data to be written to the first target cache node. Among the storage resources under the data storage layer 302, additional storage resources commonly exist, such as memory and high-speed solid-state drive areas on the data storage nodes (that is, on the machines or storage media where the file systems reside). These storage resources are independent of the file system and are managed by the host operating system on which the file system runs. In this embodiment, these high-speed storage resources outside the file system can be used as the multi-level distributed cache layer of the data sharing system. In an optional embodiment, each cache node of the multi-level distributed cache layer 303 can be further divided into levels, with different storage resources in the cache node treated as different levels, for example according to read/write speed. As an example, the first level can be the fastest storage medium, such as memory; the second level can be a somewhat slower storage medium, such as the directory backed by an SSD; and so on. Note that every cache node has the same number of levels, but the directory assigned to a given level may or may not be the same across nodes. For example, all cache nodes are configured with the same number of levels, so it cannot happen that one cache node has 3 levels and another has 4. The directories, however, may differ between nodes: node A's first-level directory may be memory and its second-level directory SSD, while node B's first-level directory is SSD and its second-level directory is memory.
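The constraint that every cache node has the same number of levels, while the medium behind each level may differ, can be checked in a few lines. This is an illustrative sketch only: the node names, the `MEM:`/`SSD:` prefixes, the paths, and the function `common_level_count` are assumptions made for the example and do not come from the source.

```python
from typing import Dict, List

# Hypothetical level layouts: each cache node lists its storage media from
# level 1 downward. Node names and paths are illustrative only.
NODE_LEVELS: Dict[str, List[str]] = {
    "node-a": ["MEM:/mnt/ramdisk", "SSD:/mnt/ssd/cache"],
    "node-b": ["SSD:/mnt/ssd/cache", "MEM:/mnt/ramdisk"],
}


def common_level_count(layouts: Dict[str, List[str]]) -> int:
    """Return the shared number of levels, or raise if nodes disagree."""
    counts = {len(levels) for levels in layouts.values()}
    if len(counts) != 1:
        raise ValueError(f"cache nodes disagree on level count: {sorted(counts)}")
    return counts.pop()


print(common_level_count(NODE_LEVELS))
```

Here node-a and node-b order their media differently, which is allowed; adding a third level to only one of them would make the check fail.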

Further, each cache node in this embodiment has at least one tag. In this embodiment, every cache node is tagged along multiple dimensions so that different data can be cached on different cache nodes. Data is thereby shared in a simple, easily managed way, which improves data read speed and the value obtained from the data.

In an optional embodiment, the multi-level distributed cache layer 303 further includes a tag configuration module, configured to receive tag configuration information and to determine at least one tag of the cache node according to the tag configuration information. The tag configuration information includes, but is not limited to, one or more of the following: the physical location of the cache node, server room location, read/write speed, hardware configuration information, data source type, data usage, and remaining cache space.

As specific examples: if a cache node's tag is determined by physical location, the geographic location of the cache node, such as Beijing or Shanghai, can be used as the tag. If the tag is determined by server room location, the location or identifier of the server room housing the cache node can be used, such as "Beijing No. 1 server room" or "Beijing server room 0001". If the tag is determined by read/write speed, read/write speeds can be grouped into grades such as high, medium, and low, giving the tags high-speed read/write, medium-speed read/write, and low-speed read/write. If the tag is determined by hardware configuration information, the cache node's system type, total storage space, and the like can be used as tags. If the tag is determined by data source type, the data source, or the name of the business or project the data relates to, is used as the tag. If the tag is determined by data usage, the purpose of the data can be used as the tag, such as real-time analysis, offline computation, item recommendation, or user profiling. If the tag is determined by remaining cache space, the node's current remaining cache space can be used directly as the tag, or remaining space can be bucketed in advance: for example, more than 5 TB remaining counts as sufficient, between 2 TB and 5 TB as moderate, and less than 2 TB as tight. Note that remaining cache space is updated automatically: whenever data is written to a cache node, that node's remaining cache space decreases and the corresponding tag is updated.
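The remaining-space tag is the one tag in this list that changes on its own, so it is worth sketching. The following Python sketch assumes the example thresholds from the text (more than 5 TB, 2 TB to 5 TB, less than 2 TB); the tag strings, the `CacheNode` class, the node name, and the choice of binary terabytes are hypothetical.

```python
from typing import Set

TB = 1024 ** 4  # assumption: binary terabytes; the source only says "T"


def space_tag(free_bytes: int) -> str:
    """Bucket remaining cache space using the example thresholds."""
    if free_bytes > 5 * TB:
        return "space:sufficient"
    if free_bytes >= 2 * TB:
        return "space:moderate"
    return "space:tight"


class CacheNode:
    def __init__(self, name: str, free_bytes: int, static_tags: Set[str]):
        self.name = name
        self.free_bytes = free_bytes
        self.static_tags = set(static_tags)

    @property
    def tags(self) -> Set[str]:
        # The space tag is derived on demand, so it is always current.
        return self.static_tags | {space_tag(self.free_bytes)}

    def cache_write(self, size_bytes: int) -> None:
        if size_bytes > self.free_bytes:
            raise ValueError("not enough cache space")
        self.free_bytes -= size_bytes


node = CacheNode("cache-bj-01", 6 * TB, {"location:Beijing", "speed:high"})
node.cache_write(2 * TB)  # 4 TB remain, so the node drops to "moderate"
print(sorted(node.tags))
```

Deriving the tag from the free-space counter, rather than storing it, is one simple way to get the automatic update the description calls for.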

In this embodiment, tags can be combined flexibly and dynamically, and tags are used to direct data to a better-suited cache node for better performance. For example, suppose data from city A is stored in HDFS and a data analyst in city B needs to use it for data mining. The data sharing system of this embodiment can then cache the data on a cache node in city B, or on the cache node nearest to city B, to increase data read speed.
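Tag adaptation for the city A / city B example can be sketched as a scoring function over node tags. The source does not specify the matching rule, so the rule used here (pick the node sharing the most tags with the request, breaking ties by name) is an assumption, as are the function name, node names, and tag strings.

```python
from typing import Dict, Optional, Set


def pick_target_node(node_tags: Dict[str, Set[str]],
                     wanted: Set[str]) -> Optional[str]:
    """Return the node whose tags overlap most with the wanted tags,
    or None if no node shares any tag with the request."""
    best_name: Optional[str] = None
    best_score = 0
    for name in sorted(node_tags):  # sorted for a deterministic tie-break
        score = len(node_tags[name] & wanted)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


NODE_TAGS = {
    "cache-a-01": {"location:A", "speed:high"},
    "cache-b-01": {"location:B", "speed:high", "use:data-mining"},
    "cache-b-02": {"location:B", "speed:low"},
}

# Data stored in city A, consumed by a data-mining analyst in city B.
print(pick_target_node(NODE_TAGS, {"location:B", "use:data-mining"}))
```

A production rule would likely weight tags (location before speed, for instance) or treat some as hard requirements; overlap counting is just the simplest rule that shows how combining tags narrows the choice.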

In an optional embodiment, when a data write request is received, the data to be written is stored in the data storage layer 302 for persistence. After the data has been saved to the data storage layer 302, it can be cached to the matching cache node either synchronously or asynchronously. When a data read request is received, the multi-level distributed cache layer 303 determines a second target cache node according to the data read request and reads the target data from the second target cache node. If the target data is not found on the second target cache node, it is read from the data storage layer 302. In other words, a read request is served from the multi-level distributed cache layer 303 first, and falls back to the data storage layer 302 only when the cache layer does not hold the requested data.
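The write and read paths described above can be sketched with in-memory dictionaries standing in for the two layers. This is a simplified illustration, not the system's actual code: the class and method names are hypothetical, caching is done synchronously (the text also allows asynchronous caching), and target-node selection is assumed to have happened already.

```python
from typing import Dict, Tuple


class SharingSystem:
    """Toy model: dictionaries stand in for the storage and cache layers."""

    def __init__(self) -> None:
        self.storage: Dict[str, bytes] = {}           # persistent layer
        self.cache: Dict[str, Dict[str, bytes]] = {}  # node -> path -> data

    def write(self, path: str, data: bytes, target_node: str) -> None:
        # Persist first, then cache on the node chosen by tag adaptation.
        self.storage[path] = data
        self.cache.setdefault(target_node, {})[path] = data

    def read(self, path: str, target_node: str) -> Tuple[bytes, str]:
        # Serve from the cache node if possible, else fall back to storage.
        cached = self.cache.get(target_node, {}).get(path)
        if cached is not None:
            return cached, "cache"
        return self.storage[path], "storage"


system = SharingSystem()
system.write("/sales/2022.csv", b"id,amount\n1,10\n", target_node="cache-b-01")
print(system.read("/sales/2022.csv", "cache-b-01")[1])  # served by the cache
print(system.read("/sales/2022.csv", "cache-a-01")[1])  # falls back to storage
```

Because the persistent copy is always written first, a cache miss on any node can still be answered from the data storage layer.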

In the data sharing system of this embodiment, user data processing requests are received uniformly through the user interaction layer. Users do not interact directly with the underlying data storage, do not need to understand its logic, and only need to issue data write or read commands. The cache nodes in the multi-level distributed cache layer are tagged along multiple dimensions so that each cache node has at least one tag, and dynamically combining tags yields a flexible caching strategy that caches data on better-suited cache nodes, which facilitates fast data reading and analysis and improves data sharing efficiency and the value obtained from the data. The data sharing system of this embodiment can also mount other file systems on the data storage layer, so that customized tags can be used for efficient data exchange without changing the existing storage layout, with management performed through a unified interface. In addition, users do not need to write code for data sharing: ordinary read and write commands suffice, and the details are hidden from users, which gives a better experience.

In an optional embodiment, the data sharing system of this embodiment can be implemented on top of the open-source software Alluxio, an open-source data orchestration and storage system for hybrid cloud environments. Mounting at least one second file system under the global directory of the first file system can be done with Alluxio's mount command, for example alluxio fs mount /test s3a://aaa/aa. Here alluxio is the command name (it is alluxio if the data sharing system is built on Alluxio; if the system is built on other open-source software or developed in-house, the command name can be chosen arbitrarily and declared in the program), fs and mount are command parameters, /test is the mount point, that is, a directory, and s3a://aaa/aa is the URL (Uniform Resource Locator) of the file system to be mounted.
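To make the parts of the mount command explicit, the short Python sketch below splits the example command into the components the text names. It only parses the string with the standard library and does not invoke Alluxio; the variable names are chosen for this illustration.

```python
import shlex

# The example command from the text, written with its arguments separated.
command = "alluxio fs mount /test s3a://aaa/aa"

# shlex.split tokenizes the string the way a POSIX shell would.
program, subcommand, action, mount_point, target_url = shlex.split(command)

print(f"command name : {program}")
print(f"parameters   : {subcommand} {action}")
print(f"mount point  : {mount_point}")
print(f"mounted URL  : {target_url}")
```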

FIG. 5 is a schematic structural diagram of a data sharing system 500 according to another embodiment of the present invention. As shown in FIG. 5, the data sharing system 500 includes:

a user interaction layer 501, configured to receive a data processing request, where the data processing request includes a data write request, and the data write request includes data to be written;

a data storage layer 502, configured to persistently store the data to be written according to the data write request;

a multi-level distributed cache layer having a plurality of cache nodes, each cache node having at least one label, where the multi-level distributed cache layer is configured to perform label adaptation on the data to be written, so as to determine a first target cache node from the plurality of cache nodes and cache the data to be written on the first target cache node; and

a metadata service layer 503, configured to record the storage path and the cache path of the data to be written.

In this embodiment, the locations of the data (the storage path and the cache path) are managed centrally by the metadata service layer, which simplifies subsequent data maintenance.
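The role of the metadata service layer can be sketched as a small registry that keeps the storage path and the cache path of each piece of data in one place. The Python below is a minimal illustration only; the names DataLocation and MetadataService, the use of a data identifier as the key, and the example paths are all assumptions that do not appear in the original text.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DataLocation:
    storage_path: str                  # where the data is persistently stored
    cache_path: Optional[str] = None   # where the data is cached, if anywhere


class MetadataService:
    """Hypothetical registry of storage and cache paths, keyed by a data identifier."""

    def __init__(self) -> None:
        self._locations: Dict[str, DataLocation] = {}

    def record(self, data_id: str, storage_path: str,
               cache_path: Optional[str] = None) -> None:
        # Called after a write: remember both locations together.
        self._locations[data_id] = DataLocation(storage_path, cache_path)

    def lookup(self, data_id: str) -> Optional[DataLocation]:
        return self._locations.get(data_id)

    def drop_cache(self, data_id: str) -> None:
        # The cached copy was evicted; the persistent copy remains valid.
        location = self._locations.get(data_id)
        if location is not None:
            location.cache_path = None


meta = MetadataService()
meta.record("orders-1", "s3a://aaa/aa/orders/1", "node-b:/cache/orders/1")
before = meta.lookup("orders-1")
cached_before = before.cache_path
meta.drop_cache("orders-1")
after = meta.lookup("orders-1")
```

Keeping both paths in one record means that maintenance tasks such as cache eviction only have to update a single place.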

FIG. 6 is a schematic diagram of the main flow of a data sharing method according to an embodiment of the present invention. As shown in FIG. 6, the method includes:

Step S601: receiving a data processing request, where the data processing request includes a data write request, which contains the data to be written, and a data read request;

Step S602: persistently storing the data to be written according to the data write request;

Step S603: performing label adaptation on the data to be written, so as to determine a first target cache node from a plurality of cache nodes, and caching the data to be written on the first target cache node, where each of the plurality of cache nodes has at least one label;

Step S604: determining a second target cache node according to the data read request, and reading target data from the second target cache node.

The data sharing method of this embodiment is applied to the data sharing system described above. When data is written, the cache location can be specified individually through combinations of different labels, for example geographic location = City A and cache speed = high speed, and custom labels allow almost arbitrary caching strategies. When data is read, it is fetched from the cache preferentially.
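The write and read flow of steps S602 to S604 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the patented implementation: the class names CacheNode and DataSharingSketch are hypothetical, label adaptation is assumed to mean "pick the first node whose labels contain every requested key/value pair", and in-memory dictionaries stand in for the data storage layer and the cache nodes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CacheNode:
    name: str
    labels: Dict[str, str]
    data: Dict[str, bytes] = field(default_factory=dict)


class DataSharingSketch:
    def __init__(self, nodes: List[CacheNode]) -> None:
        self.nodes = nodes
        self.storage: Dict[str, bytes] = {}         # stands in for the data storage layer
        self.cached_on: Dict[str, CacheNode] = {}   # path -> node holding the cached copy

    def adapt(self, wanted: Dict[str, str]) -> Optional[CacheNode]:
        """Assumed matching rule: first node carrying every requested label."""
        for node in self.nodes:
            if all(node.labels.get(k) == v for k, v in wanted.items()):
                return node
        return None

    def write(self, path: str, payload: bytes, wanted: Dict[str, str]) -> Optional[str]:
        self.storage[path] = payload                # S602: persist the data
        previous = self.cached_on.pop(path, None)   # drop any stale cached copy
        if previous is not None:
            previous.data.pop(path, None)
        node = self.adapt(wanted)                   # S603: label adaptation
        if node is None:
            return None                             # no suitable node: stored but not cached
        node.data[path] = payload
        self.cached_on[path] = node
        return node.name

    def read(self, path: str) -> bytes:
        node = self.cached_on.get(path)             # S604: prefer the cache
        if node is not None and path in node.data:
            return node.data[path]
        return self.storage[path]                   # fall back to persistent storage


nodes = [
    CacheNode("node-a", {"location": "Shanghai", "speed": "high"}),
    CacheNode("node-b", {"location": "Beijing", "speed": "high"}),
]
system = DataSharingSketch(nodes)
target = system.write("/orders/1", b"payload", {"location": "Beijing", "speed": "high"})
uncached = system.write("/orders/2", b"other", {"location": "Chengdu"})
```

Because the requested labels are an ordinary dictionary, a caller can combine any number of labels in one request; a request that no node satisfies still persists the data and is served from storage on read.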

In an optional embodiment, before receiving the data processing request, the method further includes: receiving label configuration information, and determining at least one label of the cache node according to the label configuration information. The label configuration information includes one or more of the following: the physical location of the cache node, the machine room location, the read/write speed, hardware configuration information, the data source type, the data usage, and the remaining cache space.

As specific examples: if the label of a cache node is determined by physical location, the geographic location of the cache node can be used as the label, such as Beijing or Shanghai. If the label is determined by machine room location, the location or identifier of the machine room housing the cache node can be used as the label, such as Beijing No. 1 machine room or Beijing machine room 0001. If the label is determined by read/write speed, different read/write speeds can be grouped into levels such as high, medium, and low, and the label is then high-speed read/write, medium-speed read/write, or low-speed read/write. If the label is determined by hardware configuration information, the system type, total storage space, and similar properties of the cache node can be used as labels. If the label is determined by data source type, the data source, or the name of the business or project related to the data, is used as the label. If the label is determined by data usage, the purpose of the data can be used as the label, such as real-time analysis, offline computation, item recommendation, or user profiling. If the label is determined by remaining cache space, the current remaining cache space of the cache node can be used directly as the label, or the remaining cache space can be divided into ranges in advance; for example, more than 5T of remaining cache space is labelled as sufficient, between 2T and 5T as moderate, and less than 2T as tight. Note that the remaining cache space is updated automatically: each time data is written to a cache node, its remaining cache space decreases and the corresponding label is updated.
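The remaining-cache-space label and its automatic update can be illustrated as follows. This Python sketch uses the thresholds from the example above (more than 5T, 2T to 5T, less than 2T); treating "T" as 1024^4 bytes, assigning the exact boundary values 2T and 5T to the moderate range, and the names space_label and SpaceTrackedNode are assumptions made for illustration.

```python
TB = 1024 ** 4  # assumption: "T" in the text is read as tebibytes


def space_label(remaining_bytes: int) -> str:
    """Map remaining cache space to one of the three example ranges."""
    if remaining_bytes > 5 * TB:
        return "sufficient"
    if remaining_bytes >= 2 * TB:
        return "moderate"
    return "tight"


class SpaceTrackedNode:
    """Hypothetical cache node whose remaining-space label follows its writes."""

    def __init__(self, name: str, remaining_bytes: int) -> None:
        self.name = name
        self.remaining_bytes = remaining_bytes
        self.labels = {"remaining_space": space_label(remaining_bytes)}

    def write(self, size_bytes: int) -> None:
        if size_bytes > self.remaining_bytes:
            raise ValueError("not enough cache space on " + self.name)
        # Writing data shrinks the remaining space, so the label is refreshed.
        self.remaining_bytes -= size_bytes
        self.labels["remaining_space"] = space_label(self.remaining_bytes)


node = SpaceTrackedNode("node-a", 6 * TB)
first = node.labels["remaining_space"]
node.write(2 * TB)   # 4T left
second = node.labels["remaining_space"]
node.write(3 * TB)   # 1T left
third = node.labels["remaining_space"]
```

Refreshing the label inside the write path keeps label adaptation consistent with the node's actual capacity without any separate bookkeeping step.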

In this embodiment, labels can be combined flexibly and dynamically, and the labels direct data to be cached on better-suited cache nodes to obtain better performance.

The data sharing method of this embodiment uses customized, personalized labels for efficient data sharing and implements flexible caching strategies by combining labels dynamically. Data is thereby cached on better-suited cache nodes, which speeds up reading and analysis and improves both the efficiency of data sharing and the value obtained from the data.

FIG. 7 shows an exemplary system architecture 700 to which the data sharing method or data sharing system of the embodiments of the present invention may be applied.

As shown in FIG. 7, the system architecture 700 may include terminal devices 701, 702, and 703, a network 704, and a server 705. The network 704 is the medium that provides communication links between the terminal devices 701, 702, 703 and the server 705. The network 704 may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.

A user can use the terminal devices 701, 702, 703 to interact with the server 705 through the network 704, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 701, 702, 703, such as shopping applications, web browsers, search applications, instant messaging tools, email clients, and social platform software.

The terminal devices 701, 702, 703 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.

The server 705 may be a server that provides various services, for example a back-end management server that supports shopping websites browsed by users on the terminal devices 701, 702, 703. The back-end management server can analyze and otherwise process received data such as product information query requests, and feed the processing results (for example, targeted push information or product information) back to the terminal devices.

It should be noted that the data sharing method provided by the embodiments of the present invention is generally executed by the server 705, and accordingly the data sharing system is generally deployed in the server 705.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 7 are merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation.

Referring now to FIG. 8, a schematic structural diagram is shown of a computer system 800 suitable for implementing a terminal device of an embodiment of the present invention. The terminal device shown in FIG. 8 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.

As shown in FIG. 8, the computer system 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 808 into a random access memory (RAM) 803. The RAM 803 also stores the various programs and data required for the operation of the system 800. The CPU 801, the ROM 802, and the RAM 803 are connected to one another through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read from it can be installed into the storage section 808 as needed.

In particular, according to the disclosed embodiments of the present invention, the process described above with reference to the flowchart may be implemented as a computer software program. For example, the disclosed embodiments include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 809, and/or installed from the removable medium 811. When the computer program is executed by the central processing unit (CPU) 801, the functions defined in the system of the present invention are performed.

It should be noted that the computer-readable medium of the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in combination with, an instruction execution system, apparatus, or device. A computer-readable signal medium, in contrast, may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.

The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The modules described in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, a processor may be described as including a sending module, an obtaining module, a determining module, and a first processing module. The names of these modules do not, in some cases, limit the modules themselves; for example, the sending module may also be described as "a module that sends a picture acquisition request to the connected server".

In another aspect, the present invention also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to perform the following:

receiving a data processing request, where the data processing request includes a data write request, which contains the data to be written, and a data read request;

persistently storing the data to be written according to the data write request;

performing label adaptation on the data to be written, so as to determine a first target cache node from a plurality of cache nodes, and caching the data to be written on the first target cache node, where each of the plurality of cache nodes has at least one label.

With the technical solution of the embodiments of the present invention, when data is written, the cache location can be specified individually through combinations of different labels, for example geographic location = Beijing and cache speed = high speed, and custom labels allow almost arbitrary caching strategies. When data is read, it is fetched from the cache preferentially.

The specific embodiments above do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may occur depending on design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (14)

1. A data sharing system, characterized by comprising:
a user interaction layer, configured to receive a data processing request, where the data processing request includes a data write request, and the data write request includes data to be written;
a data storage layer, configured to persistently store the data to be written according to the data write request;
a multi-level distributed cache layer having a plurality of cache nodes, each cache node having at least one label, where the multi-level distributed cache layer is configured to perform label adaptation on the data to be written, so as to determine a first target cache node from the plurality of cache nodes and cache the data to be written on the first target cache node.

2. The system according to claim 1, characterized in that the multi-level distributed cache layer further comprises a label configuration module, configured to:
receive label configuration information;
determine at least one label of the cache node according to the label configuration information.

3. The system according to claim 2, characterized in that the label configuration information includes one or more of the following: the physical location of the cache node, the machine room location, the read/write speed, hardware configuration information, the data source type, the data usage, and the remaining cache space.

4. The system according to claim 1, characterized in that the multi-level distributed cache layer further comprises a level management unit, configured to divide the cache resources of the cache nodes into levels.

5. The system according to claim 4, characterized in that every cache node has the same number of levels.

6. The system according to claim 1, characterized in that the data storage layer further comprises a first file system and at least one second file system, the at least one second file system being mounted under the global directory of the first file system.

7. The system according to any one of claims 1-6, characterized in that the data processing request further includes a data read request;
the multi-level distributed cache layer is further configured to determine a second target cache node according to the data read request, and to read target data from the second target cache node.

8. The system according to claim 7, characterized in that the system further comprises a metadata service layer, configured to record the storage path and the cache path of the data to be written.

9. A data sharing method, characterized in that the data sharing method is applied to the data sharing system according to any one of claims 1-8, and the data sharing method comprises:
receiving a data processing request, where the data processing request includes a data write request, and the data write request includes data to be written;
persistently storing the data to be written according to the data write request;
performing label adaptation on the data to be written, so as to determine a first target cache node from a plurality of cache nodes, and caching the data to be written on the first target cache node, where each of the plurality of cache nodes has at least one label.

10. The method according to claim 9, characterized in that, before receiving the data processing request, the method further comprises:
receiving label configuration information, and determining at least one label of the cache node according to the label configuration information.

11. The method according to claim 10, characterized in that the label configuration information includes one or more of the following: the physical location of the cache node, the machine room location, the read/write speed, hardware configuration information, the file system type, the data source type, the data usage, and the remaining cache space.

12. The method according to claim 9, characterized in that the data processing request further includes a data read request;
the method further comprises: determining a second target cache node according to the data read request, and reading target data from the second target cache node.

13. An electronic device, characterized by comprising:
one or more processors;
a storage device, configured to store one or more programs,
where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 9-12.

14. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 9-12.
CN202210004214.0A 2022-01-04 2022-01-04 Data sharing system and method Pending CN114356873A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210004214.0A CN114356873A (en) 2022-01-04 2022-01-04 Data sharing system and method

Publications (1)

Publication Number Publication Date
CN114356873A true CN114356873A (en) 2022-04-15

Family

ID=81107008

Country Status (1)

Country Link
CN (1) CN114356873A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024260231A1 (en) * 2023-06-20 2024-12-26 海光信息技术股份有限公司 Cache structure and electronic device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination