TWI782306B - An access docking device and system, and a method and device applied to the access docking device

Info

Publication number: TWI782306B
Application number: TW109127138A
Authority: TW
Inventors: 祖立軍; 袁航; 王穎卓; 李樹楠; 章超; 呂智慧; 王濤
Original assignee: 大陸商中國銀聯股份有限公司
Priority date: 2019-09-23
Filing date: 2020-08-11
Publication date: 2022-11-01
Also published as: JP2022547691A; CN110688674B; KR20220051224A; WO2021057317A1; CN110688674A; TW202113622A; JP7369860B2

Abstract

The present invention discloses an access docking device and system, and a method and device applied to the access docking device. The access docking device is deployed on a Hadoop computing server and includes: a compatible interface layer, which implements the Hadoop file system interface to achieve access docking with a Hadoop computing service component; an operation implementation layer, which provides a first interface function to the compatible interface layer to implement, under the file system interface, the file operations required by the Hadoop computing service component; and a storage access layer, which provides a second interface function to the operation implementation layer to convert the file operations into access operations on object storage in a distributed storage. The access docking device decouples the Hadoop computing service from the storage service and allows the object storage in the distributed storage to be accessed directly.

Description

Access docking device, system and method and device for applying the access docking device

The invention belongs to the technical field of distributed storage, and specifically relates to an access docking device, a system, and a method and device for applying the access docking device.

This section is intended to provide background or context for the embodiments of the invention recited in the claims. The description here is not admitted to be prior art merely by its inclusion in this section. With the continuous development of big data technology, decoupling Hadoop computing services from storage services has gradually become a new trend because of the following advantages: first, it keeps the technical architecture of the storage resources relatively stable, shielding it from frequent upgrades or expansions of the computing components; second, it makes storage resources easier to share. However, the prior art offers no solution with good performance and high availability for achieving this decoupling of Hadoop computing services and storage services.

To address the difficulty in the prior art of decoupling Hadoop computing services from storage services, an access docking device, a system, and a method and device for applying the access docking device are proposed, with which the above problem can be solved.

The present invention provides the following solutions.

In a first aspect, an access docking device is provided, which is deployed on a Hadoop computing server and includes: a compatible interface layer, configured to implement the Hadoop file system interface compatibly, thereby achieving access docking with Hadoop computing service components; an operation implementation layer, which provides a first interface function to the compatible interface layer so as to implement, under the file system interface, the file operations required by the Hadoop computing service components; and a storage access layer, which provides a second interface function to the operation implementation layer so as to convert the file operations into access operations on object storage in a distributed storage.

In some possible implementations, the distributed storage is a Ceph cluster. In some possible implementations, the access operations on the object storage are access operations on the RADOS cluster in the Ceph cluster.

In some possible implementations, the storage access layer includes: a Crush computing unit, configured to establish communication with the Mon node of the Ceph cluster to obtain the CrushMap of the Ceph cluster, and to compute the location of an object storage device (OSD) in the Ceph cluster using the Crush algorithm; and a file read-write unit, configured to establish Socket communication with the OSD in the Ceph cluster so as to perform access operations on the Ceph cluster.

In some possible implementations, the file operations include one or more of the following: listing files and folders, creating folders, deleting folders, obtaining file status information, renaming files and folders, returning a pointer to an opened file, writing a data stream into an opened file, reading data from an opened file, and performing user authentication.

In some possible implementations, the storage access layer is implemented by a dynamic link library file (Libcephrgw.so) deployed in a specified Hadoop directory, and the second interface function is a C++ interface function encapsulated in Libcephrgw.so for accessing the RADOS cluster in the Ceph cluster.

In some possible implementations, the operation implementation layer is implemented by a second Java package (cephlibrgw.jar) deployed in a specified Hadoop directory. The second Java package (cephlibrgw.jar) converts the C++ interface functions encapsulated in the dynamic link library file (Libcephrgw.so) into Java interface functions, and the first interface function is a Java interface function.

In some possible implementations, the second Java package (cephlibrgw.jar) uses JNI to convert between Java interface functions and C++ interface functions.

In some possible implementations, the compatible interface layer is implemented by a first Java package (CephRgwFileSystem.jar) deployed in a specified Hadoop directory. In some possible implementations, the operations of the file system interface reuse the HDFS implementation.

In some possible implementations, the compatible interface layer is further configured to enable the YARN component of Hadoop to call functions of the first Java package (CephRgwFileSystem.jar) at runtime.

In some possible implementations, the access docking device is deployed on each computing server node in a Hadoop computing server cluster.

In some possible implementations, the Hadoop configuration file core-site.xml contains the main class information of the access docking device.

In some possible implementations, the distributed storage uses idle storage interfaces to provide storage services to computing platforms other than the Hadoop computing server cluster.

In some possible implementations, the distributed storage is a Ceph cluster, and the idle storage interfaces include a block device storage interface and a file system storage interface.

In a third aspect, a method for applying an access docking device is provided, including: receiving an access request from a Hadoop computing service component; and using the access docking device of the first aspect to convert the access request into an access operation on object storage in a distributed storage.

In some possible implementations, before receiving the access request from the Hadoop computing service component, the method further includes: obtaining the main class information of the access docking device from the Hadoop configuration file core-site.xml.

In a fourth aspect, a device for applying an access docking device is provided, including: a receiving module, configured to receive an access request from a Hadoop computing service component; and an access module, configured to use the access docking device of the first aspect to convert the access request into an access operation on object storage in a distributed storage.

In some possible implementations, the device further includes: a loading module, configured to obtain the main class information of the access docking device from the Hadoop configuration file core-site.xml.

In a fifth aspect, a device for applying an access docking device is provided, including: one or more multi-core processors; and a memory for storing one or more programs. When the one or more programs are executed by the one or more multi-core processors, the one or more multi-core processors implement the method of the third aspect.

In a sixth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores a program, and when the program is executed by a multi-core processor, the multi-core processor executes the method of the third aspect.

At least one of the technical solutions adopted in the embodiments of the present application can achieve the following beneficial effects: through the cooperation of the compatible interface layer, the operation implementation layer, and the storage access layer in the access docking device, heterogeneous decoupling of Hadoop's computing services and storage services is supported without changing any interface or software implementation above the Hadoop storage service and management layer. Hadoop's computing service components can access the heterogeneous distributed storage directly through object storage access operations, which improves performance and availability. The distributed storage is, for example, a Ceph cluster.

It should be understood that the above description is only an overview of the technical solution of the present invention, provided so that the technical means of the present invention can be understood more clearly and implemented according to the contents of the specification. To make the above and other objects, features, and advantages of the present invention more readily understood, specific embodiments of the present invention are described below.

1: Hadoop computing server

11: Hadoop computing service component

100: access docking device

101: compatible interface layer

102: operation implementation layer

103: storage access layer

1031: file read-write unit

1032: Crush computing unit

2: Ceph cluster

300: method

400: device

401: receiving module

402: access module

200: external device

500: device

10: processor

20: memory

21: RAM

22: cache

23: ROM

24: program module

30: display unit

40: I/O interface

50: network adapter

60: bus

600: computer-readable storage medium

By reading the following detailed description of the exemplary embodiments, those of ordinary skill in the art will understand the advantages and benefits described herein, as well as other advantages and benefits. The drawings are only for the purpose of illustrating exemplary embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings: [Fig. 1] is a schematic structural diagram of the access docking device according to an embodiment of the access docking device, system, and method and device for applying the access docking device of the present invention.

[Fig. 2] is a schematic diagram of the FileSystem interface according to an embodiment of the access docking device, system, and method and device for applying the access docking device of the present invention.

[Fig. 3] is a schematic flowchart of the method for applying the access docking device according to an embodiment of the access docking device, system, and method and device for applying the access docking device of the present invention.

[Fig. 4] is a schematic structural diagram of a device for applying the access docking device according to an embodiment of the access docking device, system, and method and device for applying the access docking device of the present invention.

[Fig. 5] is a schematic structural diagram of a device for applying the access docking device according to an embodiment of the access docking device, system, and method and device for applying the access docking device of the present invention.

[Fig. 6] is a schematic diagram of a computer-readable storage medium according to an embodiment of the access docking device, system, and method and device for applying the access docking device of the present invention.

In the drawings, the same or corresponding reference numerals denote the same or corresponding parts.

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.

In the present invention, it should be understood that terms such as "comprising" or "having" are intended to indicate the presence of the features, numbers, steps, acts, components, parts, or combinations thereof disclosed in this specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, acts, components, parts, or combinations thereof are present.

Before describing the present invention, several technical terms used in it are briefly explained.

Hadoop: a distributed system infrastructure developed by the Apache Software Foundation. Users can develop distributed programs without understanding the underlying details of distribution, making full use of the power of a cluster for high-speed computation and storage. Hadoop is currently the most widely used distributed computing platform. It adopts the MapReduce distributed computing model and provides a series of interfaces and frameworks that help users use the computing resources of a distributed cluster efficiently and increase the parallelism of computation.

Ceph: a unified, distributed file system designed for reliability and scalability, with excellent performance.

Object Storage: also called object-based storage, a general term for methods of addressing and handling discrete units, which are called objects. Like files, objects contain data, but unlike files, objects are not organized into a hierarchy. Every object sits at the same level of a flat address space called a storage pool, and no object is nested under another object.

It should also be noted that, provided there is no conflict, the embodiments of the present invention and the features of the embodiments may be combined with each other. The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments.

As shown in Fig. 1, this embodiment provides an access docking device 100. The access docking device 100 is deployed on the Hadoop computing server 1 and includes a compatible interface layer 101, an operation implementation layer 102, and a storage access layer 103. The compatible interface layer 101 compatibly implements the Hadoop file system interface, thereby achieving access docking with the Hadoop computing service component 11. The operation implementation layer 102 provides a first interface function to the compatible interface layer 101, so as to implement, under the file system interface, the file operations required by the Hadoop computing service component. The storage access layer 103 provides a second interface function to the operation implementation layer 102, so as to convert the file operations into access operations on object storage in the distributed storage and thereby achieve access docking with the distributed storage.
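The layering described above can be pictured with a minimal Java sketch. It is only an illustration under stated assumptions: every class, interface, and method name in it (DockingLayersSketch, StorageAccess, OperationLayer, CompatibleFileSystem, InMemoryStore) is hypothetical, an in-memory map stands in for the distributed storage, and the path-to-key mapping is an assumed one. It is not the code of the access docking device 100.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical three-layer sketch; every name below is illustrative, not from the patent.
final class DockingLayersSketch {

    /** Storage access layer (103): its methods play the role of the second interface function. */
    interface StorageAccess {
        void putObject(String key, byte[] data);

        byte[] getObject(String key);
    }

    /** In-memory stand-in for the object storage in the distributed storage. */
    static final class InMemoryStore implements StorageAccess {
        final Map<String, byte[]> objects = new HashMap<>();

        @Override
        public void putObject(String key, byte[] data) {
            objects.put(key, data.clone());
        }

        @Override
        public byte[] getObject(String key) {
            byte[] data = objects.get(key);
            return data == null ? null : data.clone();
        }
    }

    /** Operation implementation layer (102): its methods play the role of the first interface function. */
    static final class OperationLayer {
        private final StorageAccess storage;

        OperationLayer(StorageAccess storage) {
            this.storage = storage;
        }

        void writeFile(String path, byte[] data) {
            storage.putObject(toObjectKey(path), data);
        }

        byte[] readFile(String path) {
            return storage.getObject(toObjectKey(path));
        }

        // Assumed mapping: drop the leading slash so the path becomes a flat object key.
        private static String toObjectKey(String path) {
            return path.startsWith("/") ? path.substring(1) : path;
        }
    }

    /** Compatible interface layer (101): the file-system-style facade that compute components call. */
    static final class CompatibleFileSystem {
        private final OperationLayer operations;

        CompatibleFileSystem(OperationLayer operations) {
            this.operations = operations;
        }

        void create(String path, byte[] content) {
            operations.writeFile(path, content);
        }

        byte[] open(String path) {
            return operations.readFile(path);
        }
    }
}
```

In this sketch a caller only sees CompatibleFileSystem, so the store behind StorageAccess could be swapped without touching the caller, which is the decoupling the embodiment aims at.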

In some possible implementations, the distributed storage is preferably a Ceph cluster 2. It can be understood that this embodiment can also be applied to docking Hadoop with distributed storage other than a Ceph cluster; this embodiment is described using a Ceph cluster as an example, but is not limited to it. With a Ceph cluster as the distributed storage docked to Hadoop, Hadoop gains better file read and write performance and more efficient file access, and the data in Hadoop can be mounted into user space through Ceph, which allows the data to be managed in more diverse ways. For the Ceph cluster, access from the Hadoop platform provides a Java access interface to the cluster, which broadens the application scenarios and scope of Ceph.

In some possible implementations, the Hadoop configuration file core-site.xml contains the main class information of the access docking device. For example, the configuration items listed in Table 1 are added to the Hadoop configuration file core-site.xml.

Among these, the configuration item fs.cephRgw.impl specifies the implementation class of cephRgw. The configuration items ceph.auth.id, ceph.conf.file, ceph.auth.access (the user's access key), ceph.auth.secret, mon host, and so on set the parameters of the Ceph cluster. The configuration item fs.AbstractFileSystem.cephRgw.impl specifies the implementation class of the abstract file system of cephRgw.

Further, AbstractFileSystem plays a role in Hadoop similar to that of a virtual file system (VFS). Hadoop uses it when the file system format is not known in advance, and it provides virtual methods such as creating a file (create), creating a directory (mkdir), and creating a file stream (open), through which the various file operations required by Hadoop are implemented on Ceph.

The CephRgw function includes: CephRgw(URI thisUri, Configuration conf) throws IOException, URISyntaxException; it causes Hadoop-layer components to call the functions in the CephRgwFileSystem class at runtime.
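How a configuration item such as fs.cephRgw.impl leads to the main class can be sketched as follows. Hadoop resolves a file system implementation from the key fs.<scheme>.impl for the scheme of the requested URI; the sketch imitates only that lookup, using java.util.Properties in place of Hadoop's own configuration class. The interface SimpleFileSystem, the class DemoCephRgwFileSystem, and the URI cephRgw://bucket/data are hypothetical stand-ins, and the values of Table 1 are not reproduced or assumed here.

```java
import java.net.URI;
import java.util.Properties;

// Hypothetical sketch of scheme-to-class resolution; not Hadoop's actual loader code.
final class SchemeLookupSketch {

    /** Minimal stand-in for a file system implementation. */
    interface SimpleFileSystem {
        String scheme();
    }

    /** Stand-in for the class that a key such as fs.cephRgw.impl would name. */
    static final class DemoCephRgwFileSystem implements SimpleFileSystem {
        @Override
        public String scheme() {
            return "cephRgw";
        }
    }

    /** Builds the key fs.<scheme>.impl from the URI and instantiates the class it names. */
    static SimpleFileSystem resolve(URI uri, Properties conf) {
        String key = "fs." + uri.getScheme() + ".impl";
        String className = conf.getProperty(key);
        if (className == null) {
            throw new IllegalArgumentException("No implementation configured for " + key);
        }
        try {
            Class<?> impl = Class.forName(className);
            return (SimpleFileSystem) impl.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```

Because the class is named in configuration rather than in code, a compute component that asks for a cephRgw URI needs no source change to reach the docking implementation.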

The functions and internal implementation structures of the compatible interface layer 101, the operation implementation layer 102, and the storage access layer 103 are described below by way of example.

(1) Compatible interface layer 101

The compatible interface layer 101 compatibly implements the Hadoop file system interface (FileSystem), thereby achieving access docking with Hadoop computing service components. Specifically, the compatible interface layer 101 uses the CephRgwFileSystem class to implement the FileSystem interface, and can thereby dock with Hadoop computing service components via the FileSystem interface. It can be called by Hadoop computing service components to execute file-related methods or operations, implementing the functions of the Hadoop file system interface and hiding the differences in how Hadoop computing service components invoke file I/O.

Fig. 2 shows the abstract methods contained in the FileSystem interface. The FileSystem interface allows Hadoop computing service components to perform file-related operations as needed. Its functions include, but are not limited to: initializing the file system from a configuration file, creating files or folders, obtaining information about files or folders, setting file or folder permissions, creating file read and write data streams, reading and writing files, and renaming or deleting folders.

In some possible implementations, the compatible interface layer 101, that is, the CephRgwFileSystem layer, can be implemented by the first Java package CephRgwFileSystem.jar deployed in a specified Hadoop directory. For example, CephRgwFileSystem.jar can be placed under share/Hadoop/common/lib of Hadoop. In addition, CephRgwFileSystem.jar can also be used to serve the special file storage needs of Hadoop's scheduling service components (such as Yarn), for example the location where caches are stored.

In some possible implementations, the CephRgwFileSystem layer is further configured to make Yarn call the functions in the CephRgwFileSystem class while Hadoop components are running. For example, this can be achieved through the CephRgw function deployed in Hadoop.

For example, the CephRgw function may be: CephRgw(URI thisUri, Configuration conf) throws IOException, URISyntaxException. In some possible implementations, the operations of the CephRgwFileSystem class reuse the implementation of the HDFS class, thereby preserving the logic and compatibility requirements of the HDFS client's file read operations. Therefore, when the functions in the CephRgwFileSystem class are called, Hadoop components can access the Ceph cluster in a local distributed manner without rewriting business code, which simplifies the use of client code.

For example, Table 2 lists the functions included in the CephRgwFileSystem class.

As the compatible interface layer 101 described above shows, introducing CephRgwFileSystem as a new implementation class of FileSystem achieves compatibility with HDFS-style access and docking with Hadoop computing service components.

(2) Operation implementation layer 102

The operation implementation layer 102 provides the first interface function to the compatible interface layer 101 above it, so as to implement, under the FileSystem interface, the file operations required by Hadoop computing service components. Specifically, the operation implementation layer 102, that is, the cephlibrgw layer, can be implemented by the second Java package cephlibrgw.jar deployed in a specified Hadoop directory. For example, cephlibrgw.jar can be placed under share/Hadoop/common/lib of Hadoop to implement the cephlibrgw layer.

In some possible implementations, the file operations include one or more of the following: listing files and folders, creating folders, deleting folders, obtaining file status information, renaming files and folders, returning a pointer to an opened file, writing a data stream into an opened file, reading data from an opened file, and performing user authentication.

For example, the first interface functions provided by the cephlibrgw layer are Java interface functions, and may include those listed in Table 3.

(3) Storage access layer 103

The storage access layer 103 provides the second interface function to the operation implementation layer 102, thereby converting the file operations into access operations on object storage in the distributed storage. Here, the access operations on the object storage are specifically access operations on the RADOS cluster in the Ceph cluster.
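To illustrate what converting file operations into object storage access operations can involve, the following minimal sketch maps a few file-style operations onto a flat, sorted key space. It is an assumption-laden illustration only: FlatNamespaceSketch and its methods are hypothetical, a TreeMap stands in for a storage pool, and directories are emulated purely by key prefixes, which is one common convention and not necessarily the one used by the storage access layer 103.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: file-style operations emulated on a flat object key space.
final class FlatNamespaceSketch {
    // Sorted map standing in for an object storage pool (flat, no real directories).
    private final TreeMap<String, byte[]> pool = new TreeMap<>();

    void writeFile(String path, byte[] data) {
        pool.put(key(path), data.clone());
    }

    byte[] readFile(String path) {
        byte[] data = pool.get(key(path));
        return data == null ? null : data.clone();
    }

    /** Lists the immediate children of a "directory" by scanning keys that share its prefix. */
    List<String> list(String dir) {
        String prefix = key(dir);
        if (!prefix.isEmpty() && !prefix.endsWith("/")) {
            prefix += "/";
        }
        List<String> children = new ArrayList<>();
        for (String k : pool.tailMap(prefix, true).keySet()) {
            if (!k.startsWith(prefix)) {
                break; // keys are sorted, so the prefix range has ended
            }
            String rest = k.substring(prefix.length());
            if (rest.isEmpty()) {
                continue;
            }
            int slash = rest.indexOf('/');
            String child = slash < 0 ? rest : rest.substring(0, slash);
            if (!children.contains(child)) {
                children.add(child);
            }
        }
        return children;
    }

    /** Renames a file by moving its object to a new key; returns false if the source is absent. */
    boolean rename(String from, String to) {
        byte[] data = pool.remove(key(from));
        if (data == null) {
            return false;
        }
        pool.put(key(to), data);
        return true;
    }

    // Assumed mapping from a file path to an object key: strip the leading slash.
    private static String key(String path) {
        return path.startsWith("/") ? path.substring(1) : path;
    }
}
```

The sketch shows why listing and renaming need explicit translation: the object store itself only knows keys, so folder semantics have to be reconstructed by the layer that sits in front of it.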

In some possible implementations, the storage access layer 103 is a C-language layer that can be implemented by the dynamic link library file Libcephrgw.so deployed in a specified Hadoop directory. For example, libcephrgw.so can be placed in Hadoop's /usr/lib64/ folder to implement the storage access layer 103.

The second interface function provided by the storage access layer 103 to the operation implementation layer 102 may specifically be a C++ interface function encapsulated in Libcephrgw.so for accessing the RADOS cluster in the Ceph cluster. These C++ interface functions provide function interfaces for basic operations such as file creation, file access, file reading, file writing, file update, directory listing, file name query, file status query, and system status query. They also re-encapsulate functions such as initializing the system handle and obtaining an operation handle, so that a user only needs to request the corresponding handle and then call the operation function directly, without manually managing Ceph's internal intermediate variables and parameters.

For example, the second interface functions provided by Libcephrgw.so are C++ interface functions, and may include those listed in Table 4.

In some possible implementations, the operation implementation layer 102 can also call the C++ interface functions encapsulated in Libcephrgw.so and convert them into the Java interface functions provided to the compatible interface layer above it, that is, the first interface functions. Specifically, the operation implementation layer 102 uses JNI to convert between Java interface functions and C++ interface functions. JNI provides a number of calling interfaces that enable communication between Java and C++. It can be understood that Hadoop is written in Java while the Ceph cluster is written in C++, and Java cannot operate hardware directly; C++ libraries or functions can therefore be called through JNI to operate the hardware, which avoids duplicate development.
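The Java side of such a JNI bridge can be sketched as below. This is a hedged illustration, not the content of cephlibrgw.jar: the class name NativeBridgeSketch and the methods openFile, read, and close are hypothetical and do not correspond to the functions of Table 3 or Table 4. What is standard is the mechanism itself: a method declared native has no Java body, System.loadLibrary("cephrgw") would look for a file named libcephrgw.so on Linux, and the native side exports a function named after the Java class and method.

```java
// Hypothetical JNI bridge sketch; the method names are illustrative only.
final class NativeBridgeSketch {

    // Declared in Java, implemented in native code. Calling these without a loaded
    // library that defines them throws UnsatisfiedLinkError.
    static native long openFile(String path);

    static native int read(long handle, byte[] buffer, int length);

    static native void close(long handle);

    /**
     * Tries to load a native library by its short name, for example "cephrgw"
     * for a file named libcephrgw.so on Linux. Returns false instead of failing
     * when the library is not installed.
     */
    static boolean tryLoad(String libraryName) {
        try {
            System.loadLibrary(libraryName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }
}
```

Guarding the load in tryLoad, rather than loading in a static initializer, is a design choice of this sketch so that the class can still be loaded on a machine where the shared library is absent.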

In some possible implementations, as shown in Fig. 1, the storage access layer 103 specifically includes a Crush computing unit 1032 and a file read-write unit 1031. The Crush computing unit is configured to establish communication with the Mon node of the Ceph cluster to obtain the CrushMap of the Ceph cluster, and to compute the location of an object storage device (OSD) in the Ceph cluster using the Crush algorithm. The file read-write unit is configured to establish Socket communication with the OSD in the Ceph cluster 2 so as to perform access operations on the Ceph cluster, that is, to dock with the Ceph cluster.
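The idea behind client-side placement can be illustrated with a deliberately simplified sketch. It does not implement the Crush algorithm, which works on a weighted, hierarchical cluster map; it substitutes plain rendezvous (highest-random-weight) hashing over a flat list of OSD names, purely to show that a client holding the same map can compute an object's location deterministically without asking a central lookup service. The class PlacementSketch, the OSD names, and the use of CRC32 as the scoring hash are all assumptions of this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

// Simplified stand-in for placement computation: rendezvous hashing, NOT the Crush algorithm.
final class PlacementSketch {

    /** Returns the OSD whose combined hash with the object name is highest. */
    static String locate(String objectName, List<String> osds) {
        if (osds.isEmpty()) {
            throw new IllegalArgumentException("no OSDs in the map");
        }
        String best = null;
        long bestScore = -1L;
        for (String osd : osds) {
            long score = score(objectName, osd);
            if (score > bestScore) {
                bestScore = score;
                best = osd;
            }
        }
        return best;
    }

    // CRC32 yields a non-negative 32-bit value; it is used here only as a cheap deterministic hash.
    private static long score(String objectName, String osd) {
        CRC32 crc = new CRC32();
        crc.update((objectName + "@" + osd).getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```

Once a location has been computed this way, a read-write component would open a connection to that OSD directly, which corresponds to the division of work between units 1032 and 1031 described above.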

In some possible implementations, the access docking device 100 is deployed on each Hadoop computing server node in the Hadoop computing server cluster. Hadoop's big data computing services can thus access Ceph storage directly and in a distributed manner, without passing through an additional gateway; the access path is shorter, and performance and availability are improved.

Through the cooperation of the compatible interface layer, the operation implementation layer, and the object access layer in the above access docking device, heterogeneous decoupling of Hadoop computing services and storage services can be supported without changing any interface or software implementation above the Hadoop storage service and management layer, so that Hadoop computing service components directly access heterogeneous distributed storage by means of object storage access operations, improving performance and availability.

Based on the above access docking device, an embodiment of the present application further provides an access docking system, which includes a Hadoop computing server cluster and distributed storage, wherein the above access docking device is deployed on each computing server of the Hadoop computing server cluster to dock each computing server to the distributed storage.

In some possible implementations, the distributed storage uses idle storage interfaces to provide storage services to computing platforms other than the Hadoop computing server cluster. For example, the storage resources of the Ceph cluster can be shared simultaneously among different applications such as big data, virtual machines, and containers, thereby realizing storage resource sharing.

It should be noted that the access docking system in the embodiments of the present application can implement every aspect of the access docking device embodiments and achieve the same effects and functions, which are not repeated here.

Based on the above access docking device, an embodiment of the present application further provides a method for applying the access docking device. FIG. 3 is a schematic flowchart of a method for applying the access docking device according to an embodiment of the present application. As shown in FIG. 3, the method 300 includes: step S301: receiving an access request from a Hadoop computing service component; and step S302: using the above access docking device to convert the access request into an access operation on object storage in the distributed storage.

In some possible implementations, the method 300 may further include: obtaining the main class information of the access docking device from the Hadoop configuration file core-site.xml.
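
As a rough illustration of reading main class information from core-site.xml, the sketch below parses a configuration document and looks up the class registered for a URI scheme. Hadoop conventionally maps a scheme to a FileSystem class through a property named `fs.<scheme>.impl`; whether the access docking device uses exactly that key is an assumption, and the scheme `cephrgw` and the class name `org.example.fs.CephRgwFileSystem` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical core-site.xml content; the scheme and class name are
# illustrative assumptions, not values taken from the patent.
CORE_SITE_XML = """<configuration>
  <property>
    <name>fs.cephrgw.impl</name>
    <value>org.example.fs.CephRgwFileSystem</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
</configuration>"""


def read_properties(xml_text: str) -> dict:
    """Collect <property><name>/<value> pairs into a dictionary."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name] = prop.findtext("value")
    return props


def main_class_for_scheme(props: dict, scheme: str):
    """Return the class registered for a URI scheme, or None if absent."""
    return props.get(f"fs.{scheme}.impl")
```

With the sample document, `main_class_for_scheme(read_properties(CORE_SITE_XML), "cephrgw")` returns the hypothetical class name, and a scheme with no entry returns `None`.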

Next, the above method for applying the access docking device is described in detail, taking the data access flow for putting a file as an example.

First, the compatible interface layer performs step S41: splitting the put file into chunks.

Step S42: passing the put file to the operation implementation layer as a data stream through the create function.

Here, Hadoop splits the file into chunks according to the io.file.buffer.size configuration item in core-site.xml (4096 bytes by default); a CephRgwOutputStream file output stream is constructed through the create function defined in the Filesystem interface and passed to the operation implementation layer below; in CephRgwOutputStream, the buffer size is set according to the ceph.io.buffer.size configuration item in core-site.xml (4M by default); and Hadoop calls the Write function of CephRgwOutputStream to pass the file content to cephlibrgw.jar.
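
The interplay between the small write chunks (4096 bytes by default) and the larger stream buffer (4M by default) can be sketched as follows. This is a minimal illustration of the buffering pattern only, not the actual CephRgwOutputStream; the class `BufferedChunkStream`, the `sink` callable, and the helper `put` are hypothetical names.

```python
class BufferedChunkStream:
    """Accumulates small writes and flushes them downstream in large blocks."""

    def __init__(self, sink, buffer_size=4 * 1024 * 1024):
        self._sink = sink                # callable receiving each flushed block
        self._buffer_size = buffer_size  # plays the role of ceph.io.buffer.size
        self._buffer = bytearray()

    def write(self, chunk: bytes) -> None:
        self._buffer.extend(chunk)
        # Flush every full block; keep the remainder for the next write.
        while len(self._buffer) >= self._buffer_size:
            self._sink(bytes(self._buffer[: self._buffer_size]))
            del self._buffer[: self._buffer_size]

    def close(self) -> None:
        # Flush whatever is left, which may be shorter than a full block.
        if self._buffer:
            self._sink(bytes(self._buffer))
            self._buffer.clear()


def put(data: bytes, stream: BufferedChunkStream, chunk_size: int = 4096) -> None:
    """Feed data to the stream in chunk_size pieces, then close the stream."""
    for offset in range(0, len(data), chunk_size):
        stream.write(data[offset: offset + chunk_size])
    stream.close()
```

Buffering in this way means the lower layer sees a few large blocks rather than many 4096-byte writes, which reduces the number of calls that cross into the layer below.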

Second, the operation implementation layer performs step S43: bridging the Java interface function and the C++ interface function, and passing the data stream further down to the storage access layer.

Here, the operation implementation layer passes the file data stream to the storage access layer by calling the C++ interface function provided by the storage access layer below.

Third, the storage access layer performs: S44: obtaining Ceph cluster information and splitting the data stream again; S45: calculating the OSD location corresponding to each chunk; and S46: communicating directly with the OSDs to upload the file.

Here, the storage access layer communicates with the ceph mon through its Crush computing unit to obtain ceph cluster information, and splits the file data stream again according to the underlying Objects size of the ceph cluster (4M by default); the storage access layer communicates with the ceph mon through its Crush computing unit to obtain the Crush Map, and, based on the chunk information, the Crush computing unit calculates the IP address and port number of the primary OSD corresponding to each chunk; and the storage access layer establishes communication with the OSDs through its file read/write unit to transmit the data asynchronously, and after the transmission is complete, the OSD side returns a message.

By applying the above access docking device, and through the cooperation of its compatible interface layer, operation implementation layer, and object access layer, heterogeneous decoupling of Hadoop computing services and storage services can be supported without changing any interface or software implementation above the Hadoop storage service and management layer, so that Hadoop computing service components directly access heterogeneous distributed storage by means of object storage access operations, improving performance and availability.
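
The re-splitting in step S44 can be sketched as cutting the data stream into fixed-size pieces, each of which is then placed on an OSD independently. The sketch below assumes a 4 MiB object size to match the default mentioned in the text; the `<name>.<index>` object naming scheme and the function name are hypothetical.

```python
from typing import Iterator, Tuple

OBJECT_SIZE = 4 * 1024 * 1024  # assumed to match the 4M default in the text


def split_into_objects(
    name: str, data: bytes, object_size: int = OBJECT_SIZE
) -> Iterator[Tuple[str, int, bytes]]:
    """Yield (object_name, offset, payload) for each fixed-size piece."""
    for index, offset in enumerate(range(0, len(data), object_size)):
        # The "<name>.<index>" naming scheme is a hypothetical illustration.
        yield f"{name}.{index:08d}", offset, data[offset: offset + object_size]
```

Because each piece carries its own name and offset, the pieces can be sent to their OSDs asynchronously and in any order.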

Based on the above access docking device, the present application further provides an apparatus for applying the access docking device. FIG. 4 is a schematic structural diagram of an apparatus for applying the access docking device according to an embodiment of the present application. As shown in FIG. 4, the apparatus 400 includes: a receiving module 401, configured to receive an access request from a Hadoop computing service component; and an access module 402, configured to use the above access docking device to convert the access request into an access operation on object storage in the distributed storage.

In some embodiments, the apparatus 400 further includes a loading module, configured to load the access docking device using the Hadoop configuration file core-site.xml.

Those skilled in the art will understand that aspects of the present invention may be implemented as a device, a method, or a computer-readable storage medium. Accordingly, aspects of the present invention may be embodied as an entirely hardware implementation, an entirely software implementation (including firmware, microcode, and the like), or an implementation combining hardware and software aspects, which may be collectively referred to herein as a "circuit", "module", or "device".

In some possible implementations, an apparatus for applying the access docking device according to the present invention may include at least one or more processors and at least one memory. The memory stores a program which, when executed by the processor, causes the processor to perform the steps shown in FIG. 3: step S301: receiving an access request from a Hadoop computing service component; and step S302: using the above access docking device to convert the access request into an access operation on object storage in the distributed storage.

An apparatus 500 for applying the access docking device according to this embodiment of the present invention is described below with reference to FIG. 5.

The apparatus 500 shown in FIG. 5 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.

As shown in FIG. 5, the apparatus 500 may take the form of a general-purpose computing device, including but not limited to: at least one processor 10, at least one memory 20, and a bus 60 connecting the different device components.

The bus 60 includes a data bus, an address bus, and a control bus.

The memory 20 may include volatile memory, such as a random access memory (RAM) 21 and/or a cache memory 22, and may further include a read-only memory (ROM) 23.

The memory 20 may also include program modules 24. Such program modules 24 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.

The apparatus 500 may also communicate with one or more external devices 200 (for example, a keyboard, a pointing device, or a Bluetooth device), and may also communicate with one or more other devices. Such communication may take place through an input/output (I/O) interface 40 and be displayed on a display unit 30. The apparatus 500 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 50. As shown in the figure, the network adapter 50 communicates with the other modules in the apparatus 500 through the bus 60. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the apparatus 500, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID devices, tape drives, and data backup storage devices.

FIG. 6 shows a computer-readable storage medium for performing the method described above. In some possible implementations, aspects of the present invention may also be implemented in the form of a computer-readable storage medium that includes program code which, when executed by a processor, causes the processor to perform the method described above.

The method described above includes a number of operations and steps, both shown and not shown in the above figures, which are not repeated here.

The computer-readable storage medium may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

As shown in FIG. 6, a computer-readable storage medium 600 according to an embodiment of the present invention is described, which may take the form of a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. However, the computer-readable storage medium of the present invention is not limited thereto. In this document, a readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in connection with an instruction execution system, apparatus, or device.

Program code for carrying out the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user computing device, partly on the user device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).

In addition, although the operations of the method of the present invention are described in a particular order in the figures, this does not require or imply that the operations must be performed in that particular order, or that all of the operations shown must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be broken down into multiple steps.

Although the spirit and principles of the present invention have been described with reference to several specific embodiments, it should be understood that the invention is not limited to the specific embodiments disclosed, and the division into aspects does not mean that features in those aspects cannot be combined to advantage; that division is made only for convenience of presentation. The present invention is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

1: Hadoop computing server

11: Hadoop computing service component

100: Access docking device

101: Compatible interface layer

102: Operation implementation layer

103: Storage access layer

1031: File read/write unit

1032: Crush computing unit

2: Ceph cluster

Claims

An access docking device, characterized in that it is deployed on a Hadoop computing server and includes: a compatible interface layer, configured to implement a Hadoop-compatible file system interface, thereby realizing access docking with a Hadoop computing service component; an operation implementation layer, configured to provide a first interface function to the compatible interface layer, thereby implementing the file operations required by the Hadoop computing service component under the file system interface; and a storage access layer, configured to provide a second interface function to the operation implementation layer, thereby converting the file operations into access operations on object storage in distributed storage; wherein the distributed storage is a Ceph cluster, and the storage access layer includes a Crush computing unit, configured to establish communication with the Mon node of the Ceph cluster to obtain the CrushMap of the Ceph cluster and to calculate the location of an object storage device (OSD) in the Ceph cluster using the Crush algorithm; wherein the storage access layer is implemented by a dynamic link library file deployed in a Hadoop specified directory, and the second interface function is a C++ interface function, encapsulated in the dynamic link library file, for accessing the object storage in the distributed storage; and wherein the operation implementation layer is implemented by a second Java package deployed in the Hadoop specified directory, the second Java package being used to convert the C++ interface function encapsulated in the dynamic link library file into a Java interface function, the Java interface function being the first interface function.

The access docking device according to claim 1, wherein the access operation of the object storage is an access operation to the rados cluster in the Ceph cluster.

The access docking device according to claim 1, wherein the storage access layer further includes: a file read/write unit, configured to establish Socket communication with the object storage devices (OSDs) in the Ceph cluster, so as to implement access operations on the Ceph cluster.

The access docking device according to claim 1, wherein the file operations include one or more of the following: list files and folders, create folders, delete folders, get the status information of a file, rename a file, return the folder, open a pointer to a file, write a data stream into an opened file, read the data of an opened file, and perform user authentication.

The access docking device according to claim 1, wherein the second Java package utilizes JNI to implement conversion between the Java interface function and the C++ interface function.

The access docking device according to claim 1, wherein the compatible interface layer is implemented by a first Java package deployed in a Hadoop specified directory.

The access docking device according to claim 1, wherein the operation of the file system interface reuses the implementation of the Hadoop distributed file system.

The access docking device according to claim 6, wherein the compatible interface layer is further configured to: enable the yarn component of Hadoop to call the function in the first Java package during operation.

The access docking device according to claim 1, wherein the access docking device is deployed on each computing server node in the Hadoop computing server cluster.

The access docking device according to claim 1, wherein the content of the Hadoop configuration file core-site.xml includes the main class information of the access docking device.

An access docking system, comprising: a Hadoop computing server cluster and distributed storage, characterized in that the access docking device according to any one of claims 1-10 is deployed on each computing server node of the Hadoop computing server cluster, for docking each computing server node to the distributed storage.

The access docking system according to claim 11, wherein the distributed storage uses an idle storage interface to provide storage services to computing platforms other than the Hadoop computing server cluster.

The access docking system according to claim 12, wherein the distributed storage is a Ceph cluster, and the idle storage interface includes a block device storage interface and a file system storage interface.

A method for applying an access docking device, comprising: receiving an access request from a Hadoop computing service component; and using the access docking device according to any one of claims 1-10 to convert the access request into an access operation on the object storage in the distributed storage.

The method according to claim 14, further comprising, before receiving the access request from the Hadoop computing service component: obtaining the main class information of the access docking device from the Hadoop configuration file core-site.xml.

An apparatus for applying an access docking device, characterized in that it includes: a receiving module, configured to receive an access request from a Hadoop computing service component; and an access module, configured to use the access docking device according to any one of claims 1-10 to convert the access request into an access operation on the object storage in the distributed storage.

The apparatus according to claim 16, further comprising: a loading module, configured to obtain the main class information of the access docking device from the Hadoop configuration file core-site.xml.

An apparatus for applying an access docking device, characterized in that it includes: one or more multi-core processors; and a memory for storing one or more programs; wherein, when the one or more programs are executed by the one or more multi-core processors, the one or more multi-core processors implement the method according to claim 14.

A computer-readable storage medium, the computer-readable storage medium stores a program, and when the program is executed by a multi-core processor, the multi-core processor executes the method described in claim 14.