WO2013170504A1 - Large data storage system - Google Patents

Large data storage system

Info

Publication number
WO2013170504A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage
physical server
data
server
storage disk
Prior art date
Application number
PCT/CN2012/076516
Other languages
French (fr)
Chinese (zh)
Inventor
王东临
金友兵
Original Assignee
天津书生投资有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 天津书生投资有限公司
Priority to US13/858,489 (US20140181116A1)
Publication of WO2013170504A1
Priority to US14/943,909 (US20160112413A1)
Priority to US15/055,373 (US20160182638A1)
Priority to US15/594,374 (US20170249093A1)
Priority to US16/378,076 (US20190235777A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45579 I/O management, e.g. providing access to device drivers or storage
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Definitions

  • The present invention relates to the field of data storage, and in particular to a big data storage system.
  • Figure 1 shows a big data storage system commonly used in the prior art.
  • the big data storage in the prior art is usually in the form of a SAN and a fiber switch, which is very expensive.
  • the cloud storage technology represented by Hadoop uses a large number of inexpensive servers to form a large amount of storage capacity, which greatly reduces the cost compared with the SAN.
  • However, each storage device still needs its own storage server, and the demand on network bandwidth is high, which often calls for expensive network equipment. The Name Node also remains a single point of failure, so cost, performance, and reliability are still not ideal.
  • the embodiment of the invention provides a big data storage system to provide a high performance, low input, high reliability big data storage architecture.
  • A big data storage system according to an embodiment of the present invention includes a plurality of virtual machines running on a first physical server, and a first direct-attached storage disk, wherein the first physical server is directly connected to the first direct-attached storage disk;
  • the first directly connected storage disk is configured to provide data storage
  • The other virtual machines among the plurality of virtual machines are connected through an internal bus to the virtual machine supporting the storage sharing function, and are configured to receive a user request, read the data of the first direct-attached storage disk through the virtual machine supporting the storage sharing function according to the user request, and present the data on the first direct-attached storage disk to the user.
  • The direct-attached storage disk is connected directly to the physical server, which gives higher access efficiency than a network connection. Running multiple virtual machines on one physical server lets that single server take over the functions of multiple physical servers in the prior art, so the architecture is flexible and inexpensive.
  • In addition, access is fast because the virtual machines are connected through an internal bus. Therefore, the data storage system provided by the embodiments of the present invention combines high performance with low cost.
  • FIG. 1 is a structural block diagram of a large data storage system commonly used in the prior art.
  • FIG. 2 is a structural block diagram of a big data storage system according to an embodiment of the present invention.
  • FIG. 3 is a structural block diagram of a big data storage system according to an embodiment of the present invention.
  • FIG. 4 is a structural block diagram of a big data storage system according to another embodiment of the present invention.
  • FIG. 5 is a structural block diagram of a big data storage system according to another embodiment of the present invention.
  • FIG. 2 is a structural block diagram of a big data storage system according to an embodiment of the present invention.
  • The physical server 100 is directly connected to the direct-attached storage 200. A plurality of virtual machines 101 to 104 run on the physical server 100, of which the virtual machine 104 has a storage sharing function; the virtual machines 101 to 103 are connected to the virtual machine 104 through an internal bus.
  • the virtual machines 101 to 103 are configured to receive a request from the user, and read the data of the direct-attached storage 200 through the virtual machine 104 according to the user request, and present the data on the direct-attached storage 200 to the user.
  • The direct-attached storage 200 is used to provide data storage.
  • Each direct-attached storage may consist of one disk array.
  • The disk array may use RAID to improve reliability. Capacity can be increased by adding disks to the disk array.
  • The direct-attached storage 200 may also be formed by cascading a plurality of disk arrays, for example over SAS cables.
  • The multiple virtual machines in the embodiment of the present invention are equivalent to the server cluster in the prior art.
  • The scalable DAS in the embodiment of the present invention corresponds to the SAN in the prior art.
  • With the technical solution provided by the embodiment of the present invention, however, the storage servers and expensive fiber-optic network systems of the prior art are no longer needed, and the cost is greatly reduced.
  • In the prior art, when data is read, it must first be read into the storage server, then pass through the network switch, and only then reach the application server.
  • With the technical solution of the embodiment of the present invention, the data is read directly into the sharing virtual machine and then transferred to the application virtual machine over the internal bus. The data access efficiency of the technical solution provided by the embodiment of the present invention is therefore better.
  • FIG. 3 is a structural block diagram of a specific big data storage system according to an embodiment of the present invention.
  • Two application service groups are established in one physical server, and each application service group includes three application servers with different functions: a back-end web server vm1 or vm4 (corresponding to the web server in the front-end server; for security reasons, the front-end server is usually located in a separate physical server, as shown in FIG. 4), an application server vm2 or vm5 (used to provide users with different applications, such as a mail server or a file server), and an upload server vm3 or vm6 (used to receive and process user upload requests and data). The physical server further includes a virtual machine vm7 with storage sharing capability; through vm7, multiple virtual machines can access one DAS device at the same time.
  • The virtual machines vm1 to vm6 are connected to the virtual machine vm7 through the internal bus of the physical server, and are directly connected to the DAS through the virtual machine vm7.
  • The virtual machines vm1 to vm6 are connected to the virtual machine vm7 through the NFS protocol.
  • The application service group may further include a database server. Each application service group may also include different types and numbers of virtual servers; for example, the first application service group may include two application servers, while the second application service group may include no application server, or only one application server, but include a database server.
  • The number of virtual machines included in the two groups is not limited to the number shown in FIG. 2.
  • A person skilled in the art will understand that the type and number of application service groups of virtual machines on a single physical server are not limited to those illustrated; the number of application service groups may be increased or decreased according to the performance of the physical server and the needs of the actual application.
  • FIG. 4 is a structural diagram of a big data storage system according to another embodiment of the present invention.
  • the big data storage system is based on the big data storage system shown in Figures 2 and 3, and is further extended.
  • If the physical server 100 and the direct-attached storage disk 200 shown in FIG. 2 are referred to as a storage subsystem, the big data storage system shown in FIG. 4 includes at least N subsystems (N is an integer greater than or equal to 1; for big data storage, N is usually a very large number).
  • Each subsystem processes and stores data for different users, that is, stores different user data in different subsystems according to the user ID.
  • In one example, each subsystem may store data for 10,000 users: data for users with IDs 0 to 9999 is stored in DAS1 of the first subsystem, data for users with IDs 10000 to 19999 is stored in DAS2 of the second subsystem, and so on.
  • The system shown in FIG. 4 further includes: a front-end server, which receives a user's request and, according to the correspondence between each user and subsystem recorded in the index database, directs the request to the corresponding subsystem, so that requests are processed and stored by different subsystems; and an index database, which records the correspondence between user IDs and subsystems (the correspondence need not be the sequential relationship described above; for example, user ID1000 may be in subsystem one, user ID1001 in subsystem two, and user ID1002 in subsystem one again).
  • The front-end database and the index database may be in the same physical server.
  • The front-end server serves as a unified user entry point and directs each user request to the corresponding subsystem.
  • The processing flow is as follows: the front-end server directs user B's request to the physical server in the second subsystem; the physical server of the second subsystem finds that the requested document is located in the first subsystem, and then requests the physical server of the first subsystem to provide the shared document.
  • After receiving the request from the second subsystem, the physical server of the first subsystem first verifies the validity of the request (that is, verifies whether user B has permission), then obtains the shared document from DAS1 of the first subsystem and returns it to the physical server of the second subsystem.
  • The system further includes a NAS system as a backup for each DAS. Once a DAS is damaged, the virtual servers in the subsystem can read backup data directly from the NAS to keep serving users. Because the NAS is used only for backup, its performance requirements are modest, so its cost can be greatly reduced.
  • The figure illustrates only one NAS disk, but in an embodiment any number of NAS units may be used as the backup system.
  • The system further includes an offline backup server for backing up data on the NAS.
  • The dual backup of NAS backup and offline backup further ensures the security of the system.
  • The storage-sharing virtual machine of each physical server is omitted from FIG. 4.
  • FIG. 5 is a structural diagram of a big data storage system according to another embodiment of the present invention. As shown in FIG. 5, physical servers 100 and 300 are directly connected to direct-attached storages 200 and 400, respectively, and further include a monitoring server 500.
  • The virtual servers 101 to 103 read the data of the direct-attached storage 200 through the virtual machine 104 and present it to the user; the virtual servers 301 to 303 read the data of the direct-attached storage 400 and present it to the user.
  • When the monitoring server 500 detects that the physical server 300 has stopped working, user requests originally answered by the physical server 300 are directed to the physical server 100, and a virtual machine on the physical server 100 (which may be one of the virtual machines 101 to 103 or one of the newly added virtual machines 105 to 107) presents the data on the direct-attached storage 400 to the user.
  • Likewise, when the monitoring server 500 detects that the physical server 100 has stopped working, user requests originally answered by the physical server 100 are directed to the physical server 300.
  • A virtual machine on the physical server 300 presents the data on the direct-attached storage 200 to the user.
  • The information is returned to the front-end server and the index database; the index database updates the correspondence between user IDs and subsystems, and the front-end server subsequently directs user requests that were originally directed to the physical server 300 to the physical server 100.
  • Images of the virtual machines 101 to 104 on the physical server 100 are stored in the direct-attached storage 200.
  • The physical server 300 can load the images of the virtual machines 101 to 104 from the direct-attached storage 200 and run new virtual machines to access the data on the direct-attached storage 200.
  • An SSD and memory can be built into the server 100 and/or 300 as a cache to further improve performance.
  • A big data storage system may contain 4000 storage subsystems, and each physical server may be directly connected to some or all of the direct-attached storage, so that once the monitoring system detects that the physical server of a subsystem has stopped working, user requests originally handled by that physical server are redirected to other physical servers directly connected to that subsystem's storage, and the subsystem's direct-attached storage is accessed through those servers.
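The failover behaviour described in the last few items can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class names `PhysicalServer` and `MonitoringServer`, the `alive` flag, and the simplification of the index to a mapping from storage unit to serving server are all hypothetical.

```python
class PhysicalServer:
    """A physical server and the direct-attached storage units wired to it."""

    def __init__(self, name, attached):
        self.name = name
        self.attached = set(attached)  # names of directly connected DAS units
        self.alive = True              # assumption: set to False when it stops


class MonitoringServer:
    """Redirects service of a DAS unit when its current server stops working."""

    def __init__(self, servers, index):
        self.servers = servers
        self.index = index  # DAS name -> server currently answering for it

    def check(self):
        for das, server in list(self.index.items()):
            if server.alive:
                continue
            # Pick any live server that is also directly connected to this DAS.
            for candidate in self.servers:
                if candidate.alive and das in candidate.attached:
                    self.index[das] = candidate
                    break


# Both servers are wired to both storage units (hypothetical topology).
server_100 = PhysicalServer("server_100", ["das_200", "das_400"])
server_300 = PhysicalServer("server_300", ["das_200", "das_400"])
index = {"das_200": server_100, "das_400": server_300}
monitor = MonitoringServer([server_100, server_300], index)

server_300.alive = False  # physical server 300 stops working
monitor.check()           # requests for das_400 now go to server_100
```

If no live server is connected to the affected storage, the sketch leaves the index entry unchanged; how a real system should react in that case is outside what the text describes.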

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Provided is a large data storage system having a high-performance and low-investment large data storage architecture. The system comprises a plurality of virtual machines running on a first physical server, and a first storage disk. The first physical server is directly connected to the first storage disk. The first storage disk is used for providing data storage. One of the virtual machines is used for supporting storage sharing functions. The other virtual machines are connected through an internal bus to the virtual machine supporting the storage sharing functions, and are used for receiving user requests and reading, according to such user requests, data in the first storage disk by means of the virtual machine supporting the storage sharing function, and presenting the data from the first storage disk to the user.

Description

A Big Data Storage System

Technical Field

The present invention relates to the field of data storage, and in particular to a big data storage system.

Background
Various big data storage systems exist in the prior art; FIG. 1 shows one that is commonly used. As shown in FIG. 1, big data storage in the prior art usually relies on a SAN and fiber switches, which is very expensive. Cloud storage technology, represented by Hadoop, builds massive storage capacity from a large number of inexpensive servers, which greatly reduces cost compared with a SAN. However, each storage device still needs its own storage server, and the demand on network bandwidth is high, which often calls for expensive network equipment. The Name Node also remains a single point of failure, so cost, performance, and reliability are still not ideal.
There is therefore a need for a high-performance, low-cost storage architecture capable of storing big data.

Summary of the Invention
Embodiments of the present invention provide a big data storage system that offers a high-performance, low-investment, high-reliability big data storage architecture.
A big data storage system according to an embodiment of the present invention includes a plurality of virtual machines running on a first physical server, and a first direct-attached storage disk, wherein the first physical server is directly connected to the first direct-attached storage disk; wherein
the first direct-attached storage disk is configured to provide data storage;
one of the plurality of virtual machines is configured to support a storage sharing function; and
the other virtual machines among the plurality of virtual machines are connected through an internal bus to the virtual machine supporting the storage sharing function, and are configured to receive a user request, read the data of the first direct-attached storage disk through the virtual machine supporting the storage sharing function according to the user request, and present the data on the first direct-attached storage disk to the user.
With the big data storage system provided by the embodiments of the present invention, the direct-attached storage disk is connected directly to the physical server, which gives higher access efficiency than a network connection. Running multiple virtual machines on one physical server lets that single server take over the functions of multiple physical servers in the prior art, so the architecture is flexible and inexpensive. In addition, access is fast because the virtual machines are connected through an internal bus. The data storage system provided by the embodiments of the present invention therefore combines high performance with low cost.

Brief Description of the Drawings
FIG. 1 is a structural block diagram of a big data storage system commonly used in the prior art.
FIG. 2 is a structural block diagram of a big data storage system according to an embodiment of the present invention.
FIG. 3 is a structural block diagram of a big data storage system according to one embodiment of the present invention.
FIG. 4 is a structural block diagram of a big data storage system according to another embodiment of the present invention.
FIG. 5 is a structural block diagram of a big data storage system according to another embodiment of the present invention.

Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
FIG. 2 is a structural block diagram of a big data storage system according to an embodiment of the present invention. As shown in FIG. 2, the physical server 100 is directly connected to the direct-attached storage 200. A plurality of virtual machines 101 to 104 run on the physical server 100, of which the virtual machine 104 has a storage sharing function; the virtual machines 101 to 103 are connected to the virtual machine 104 through an internal bus.
The virtual machines 101 to 103 are configured to receive a user request, read the data of the direct-attached storage 200 through the virtual machine 104 according to the user request, and present the data on the direct-attached storage 200 to the user.
The direct-attached storage 200 is used to provide data storage.
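The read path just described can be sketched as follows. This is a minimal illustration only: the class names `DirectAttachedStorage`, `StorageSharingVM`, and `ApplicationVM`, and the in-memory dictionary standing in for the disk, are assumptions made for the example and do not come from the patent.

```python
class DirectAttachedStorage:
    """Stands in for direct-attached storage 200 (an in-memory dict here)."""

    def __init__(self):
        self._blocks = {}

    def write(self, key, data):
        self._blocks[key] = data

    def read(self, key):
        return self._blocks[key]


class StorageSharingVM:
    """Plays the role of virtual machine 104: the VM that accesses the DAS."""

    def __init__(self, das):
        self._das = das

    def read(self, key):
        return self._das.read(key)


class ApplicationVM:
    """Plays the role of virtual machines 101 to 103."""

    def __init__(self, name, sharing_vm):
        self.name = name
        self._sharing_vm = sharing_vm  # reached over the server's internal bus

    def handle_request(self, key):
        # Read through the sharing VM, then hand the data back to the user.
        return self._sharing_vm.read(key)


das_200 = DirectAttachedStorage()
das_200.write("report.txt", b"hello")
vm_104 = StorageSharingVM(das_200)
app_vms = [ApplicationVM(f"vm{n}", vm_104) for n in (101, 102, 103)]
results = [vm.handle_request("report.txt") for vm in app_vms]
```

All three application VMs obtain the same data through the single sharing VM, which is the point of routing storage access through one virtual machine.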
Those skilled in the art will understand that the number of virtual machines on the physical server is not limited to the number illustrated; the type and number of virtual machines may be increased or decreased according to the performance of the physical server and the needs of the actual application. Presenting data to the user is only one use of the present invention; in practice, other applications that process the data also fall within the scope of the present solution.
In an embodiment of the present invention, each direct-attached storage may consist of one disk array. In an embodiment of the present invention, the disk array may use RAID to improve reliability, and its capacity can be increased by adding disks to the array. In an embodiment of the present invention, the direct-attached storage 200 may also be formed by cascading multiple disk arrays, for example over SAS cables.
The multiple virtual machines in the embodiments of the present invention are equivalent to the server cluster in the prior art, and the scalable DAS in the embodiments corresponds to the SAN in the prior art. With the technical solution provided by the embodiments, however, the storage servers and expensive fiber-optic network systems of the prior art are no longer needed, and the cost is greatly reduced. In addition, in the prior art, reading data requires that the data first be read into the storage server, then pass through the network switch, and only then reach the application server. With the technical solution of the embodiments, the data is read directly into the sharing virtual machine and then transferred to the application virtual machine over the internal bus. The data access efficiency of the technical solution provided by the embodiments is therefore better.
In an embodiment of the present invention, multiple application service groups can be deployed in a single physical application server to improve system service performance. FIG. 3 is a structural block diagram of a specific big data storage system according to an embodiment of the present invention. As shown in FIG. 3, two application service groups are established in one physical server, and each application service group includes three application servers with different functions: a back-end web server vm1 or vm4 (corresponding to the web server in the front-end server; for security reasons, the front-end server is usually located in a separate physical server, as shown in FIG. 4), an application server vm2 or vm5 (used to provide users with different applications, such as a mail server or a file server), and an upload server vm3 or vm6 (used to receive and process user upload requests and data). The physical server further includes a virtual machine vm7 with storage sharing capability; through vm7, multiple virtual machines can access one DAS device at the same time.
The virtual machines vm1 to vm6 are connected to the virtual machine vm7 through the internal bus of the physical server, and are directly connected to the DAS through the virtual machine vm7. In an embodiment of the present invention, the virtual machines vm1 to vm6 are connected to the virtual machine vm7 through the NFS protocol. In an embodiment of the present invention, the application service group may further include a database server. Each application service group may also include different types and numbers of virtual servers; for example, the first application service group may include two application servers, while the second may include no application server, or only one application server, but include a database server. The number of virtual machines included in the two groups is likewise not limited to the number shown in FIG. 2.
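The layout of FIG. 3 can be written down as a small data structure. The sketch below is illustrative only: the dataclass names `ApplicationServiceGroup` and `PhysicalServerLayout` and the method `storage_clients` are hypothetical, and the sketch records only which virtual machines reach the DAS through the sharing VM, not how NFS itself is configured.

```python
from dataclasses import dataclass


@dataclass
class ApplicationServiceGroup:
    web_server: str     # back-end web server VM
    app_server: str     # application server VM (mail, file, and so on)
    upload_server: str  # upload server VM

    def members(self):
        return [self.web_server, self.app_server, self.upload_server]


@dataclass
class PhysicalServerLayout:
    groups: list
    sharing_vm: str     # the VM with storage sharing capability

    def storage_clients(self):
        """VMs that reach the DAS through the sharing VM (e.g. as NFS clients)."""
        return [vm for group in self.groups for vm in group.members()]


layout = PhysicalServerLayout(
    groups=[
        ApplicationServiceGroup("vm1", "vm2", "vm3"),
        ApplicationServiceGroup("vm4", "vm5", "vm6"),
    ],
    sharing_vm="vm7",
)
clients = layout.storage_clients()
```

Adding or removing an application service group only changes the `groups` list; the sharing VM stays the single point through which the DAS is accessed.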
Those skilled in the art will understand that the type and number of application service groups of virtual machines on a single physical server are not limited to those illustrated; the number of application service groups may be increased or decreased according to the performance of the physical server and the needs of the actual application.
FIG. 4 is an organizational structure diagram of a big data storage system according to another embodiment of the present invention. As shown in FIG. 4, this system builds on the big data storage systems shown in FIG. 2 and FIG. 3 and extends them further. If the physical server 100 and the direct-attached storage disk 200 shown in FIG. 2 are referred to as a storage subsystem, the big data storage system shown in FIG. 4 includes at least N subsystems (N is an integer greater than or equal to 1; for big data storage, N is usually a very large number). Each subsystem processes and stores the data of different users; that is, the data of different users is stored in different subsystems according to user ID. In one example, each subsystem may store data for 10,000 users: data for users with IDs 0 to 9999 is stored in DAS1 of the first subsystem, data for users with IDs 10000 to 19999 is stored in DAS2 of the second subsystem, and so on.
如图 4所示的系统中, 进一步包含有: 前置服务器, 用于接收用户的请 求, 根据索引数据库中记载的每个用户与子系统的对应关系, 将该用户的请 求导向相应的子系统, 由不同的子系统处理和存储; 索引数据库, 用于记载 存储有用户 ID与子系统之间的对应关系 (该对应关系不一定是前述顺序关 系, 有可能 ID1000的用户在子系统一, ID1001的用户在子系统二, ID1002 的用户又在子系统一)。 在本发明一实施例中, 前置数据库与索引数据库可 以在同一个物理服务器中。  The system shown in FIG. 4 further includes: a pre-server for receiving a request of the user, and directing the request of the user to the corresponding subsystem according to the correspondence between each user and the subsystem recorded in the index database. , is processed and stored by different subsystems; an index database is used to record the correspondence between the user ID and the subsystem (the correspondence is not necessarily the foregoing sequential relationship, and it is possible that the user of ID1000 is in subsystem one, ID1001 The user in subsystem two, ID1002 user is in subsystem one). In an embodiment of the invention, the pre-database and the index database may be in the same physical server.
When the system rapidly adds subsystems, only the correspondence between user IDs and the new subsystems needs to be added to the index database. On subsequent user access, the front-end server, acting as the unified user entry point, directs each user request to the corresponding subsystem.
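The routing role of the front-end server and the index database can be sketched as follows. This is a minimal in-memory model under stated assumptions: the class names `IndexDatabase` and `FrontEndServer`, the helper `make_subsystem`, and the representation of a subsystem as a plain callable are all hypothetical; the patent does not specify any of these interfaces.

```python
from typing import Callable, Dict

# Assumption: a subsystem is modelled as a callable (user_id, request) -> reply.
Subsystem = Callable[[int, str], str]


class IndexDatabase:
    """Records which subsystem holds each user's data (hypothetical interface)."""

    def __init__(self) -> None:
        self._subsystem_of: Dict[int, str] = {}

    def assign(self, user_id: int, subsystem: str) -> None:
        self._subsystem_of[user_id] = subsystem

    def lookup(self, user_id: int) -> str:
        return self._subsystem_of[user_id]  # raises KeyError for an unknown user


class FrontEndServer:
    """Unified entry point that forwards each request to the user's subsystem."""

    def __init__(self, index: IndexDatabase) -> None:
        self.index = index
        self.subsystems: Dict[str, Subsystem] = {}

    def add_subsystem(self, name: str, handler: Subsystem) -> None:
        self.subsystems[name] = handler

    def handle(self, user_id: int, request: str) -> str:
        name = self.index.lookup(user_id)
        return self.subsystems[name](user_id, request)


def make_subsystem(name: str) -> Subsystem:
    """Build a stand-in subsystem that only reports what it was asked to do."""
    def handler(user_id: int, request: str) -> str:
        return f"{name} handled {request!r} for user {user_id}"
    return handler
```

In this sketch, adding a subsystem is one `add_subsystem` call plus new `assign` entries; requests from existing users are routed exactly as before, which mirrors the expansion behaviour described above.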
In an embodiment of the present invention, suppose user A shares a document with another user B, user A's data is located in the first subsystem, and user B's requests are handled by the second subsystem. When user B wants to access the shared document, the processing flow is as follows. The front-end server directs user B's request to the physical server in the second subsystem. When the physical server of the second subsystem finds that the requested document is located in the first subsystem, it asks the physical server of the first subsystem to provide the shared document. On receiving the request from the second subsystem, the physical server of the first subsystem first verifies the validity of the request (that is, verifies whether user B has permission), then obtains the shared document from DAS1 of the first subsystem and returns it to the physical server of the second subsystem.
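The shared-document flow above can be sketched with two cooperating server objects. Everything named here is an assumption made for illustration: the class `SubsystemServer`, its methods, and the per-document share list are hypothetical, and the sketch takes for granted that the second subsystem already knows which subsystem owns the document, since the text does not say how that is discovered.

```python
from typing import Dict, Set


class SubsystemServer:
    """Hypothetical model of one subsystem: a physical server plus its DAS."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.das: Dict[str, str] = {}          # doc_id -> content on this DAS
        self.shares: Dict[str, Set[str]] = {}  # doc_id -> users allowed to read

    def store(self, doc_id: str, content: str) -> None:
        self.das[doc_id] = content

    def share(self, doc_id: str, user: str) -> None:
        self.shares.setdefault(doc_id, set()).add(user)

    def serve_peer_request(self, doc_id: str, requesting_user: str) -> str:
        """Owner side: verify permission first, then read from the local DAS."""
        if requesting_user not in self.shares.get(doc_id, set()):
            raise PermissionError(f"{requesting_user} may not read {doc_id}")
        return self.das[doc_id]

    def read_shared(self, doc_id: str, user: str, owner: "SubsystemServer") -> str:
        """Requester side: use the local DAS if possible, else ask the owner."""
        if doc_id in self.das:
            return self.das[doc_id]
        return owner.serve_peer_request(doc_id, user)
```

The permission check sits on the owning subsystem, so the document only leaves DAS1 after the first subsystem itself has validated the request.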
The system further includes a NAS system that serves as a backup for each DAS. If a DAS is damaged, the virtual servers in the subsystem can read the backup data directly from the NAS to serve users. Because the NAS is used only for backup, its performance requirements are modest, which greatly reduces its cost. The figure shows only one NAS disk, but in some embodiments any number of NAS devices may serve as the backup system.
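The fallback from a damaged DAS to the NAS backup can be sketched as a read that retries against the backup. This is an assumption-laden illustration: the `Storage` class and both helper functions are hypothetical, and the sketch copies every write to the NAS immediately, whereas the text does not say how or when the backup is taken.

```python
from typing import Dict


class Storage:
    """Hypothetical stand-in for a DAS or NAS device."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[str, bytes] = {}
        self.damaged = False

    def write(self, key: str, value: bytes) -> None:
        if self.damaged:
            raise OSError(f"{self.name} is damaged")
        self.blocks[key] = value

    def read(self, key: str) -> bytes:
        if self.damaged:
            raise OSError(f"{self.name} is damaged")
        return self.blocks[key]


def write_with_backup(das: Storage, nas: Storage, key: str, value: bytes) -> None:
    """Write to the DAS and copy to the NAS (assumed synchronous for simplicity)."""
    das.write(key, value)
    nas.write(key, value)


def read_with_fallback(das: Storage, nas: Storage, key: str) -> bytes:
    """Serve from the DAS in normal operation; fall back to the NAS if it fails."""
    try:
        return das.read(key)
    except OSError:
        return nas.read(key)
```

Because the NAS is only read when the DAS fails, it sits off the normal request path, which is consistent with the modest performance requirement stated above.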
In an embodiment of the present invention, the system further includes an offline backup server for backing up the data on the NAS. The double backup, NAS backup plus offline backup, further safeguards the security of the system.
Those skilled in the art will understand that the shared-server virtual machine of each physical server is omitted from FIG. 4.
FIG. 5 is an organizational structure diagram of a big data storage system according to another embodiment of the present invention. As shown in FIG. 5, physical servers 100 and 300 are directly connected to direct-attached storage 200 and 400, respectively; the system also includes a monitoring server 500.
Under normal conditions, virtual servers 101 to 103 read the data of direct-attached storage 200 through virtual machine 104 and present that data to users; virtual servers 301 to 303 read the data of direct-attached storage 400 and present that data to users. However, once monitoring server 500 detects that physical server 300 has stopped working, the user requests previously answered by physical server 300 are directed to physical server 100, and virtual machines on physical server 100 (either virtual machines 101 to 103 or newly added virtual machines 105 to 107) present the data on direct-attached storage 400 to users. Conversely, once monitoring server 500 detects that physical server 100 has stopped working, the user requests previously answered by physical server 100 are directed to physical server 300, and virtual machines on physical server 300 present the data on direct-attached storage 200 to users.
Specifically, when monitoring server 500 detects that physical server 300 has stopped working, it reports this to the front-end server and the index database; the index database updates the correspondence between user IDs and subsystems, and the front-end server thereafter directs user requests that would have gone to physical server 300 to physical server 100.
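The failover step, in which the index is updated so that later requests go to the surviving server, can be sketched as below. The function names, the dictionary-based index, and the `health` and `peers` inputs are all hypothetical; in particular, how monitoring server 500 actually detects a stopped server is not described in the text and is represented here by a precomputed boolean.

```python
from typing import Dict


def fail_over(index: Dict[int, str], failed: str, standby: str) -> int:
    """Repoint every user mapped to `failed` at `standby`; return how many moved."""
    moved = 0
    for user_id, server in index.items():
        if server == failed:
            index[user_id] = standby  # updating existing keys is safe while iterating
            moved += 1
    return moved


def monitor_once(index: Dict[int, str],
                 health: Dict[str, bool],
                 peers: Dict[str, str]) -> None:
    """One monitoring pass: for each server reported down, redirect its users.

    `health` maps server name -> True if working (assumed to come from a probe);
    `peers` maps each server to the partner that shares access to its DAS.
    """
    for server, alive in health.items():
        if not alive:
            fail_over(index, server, peers[server])
```

This sketch only changes the routing table. It assumes the standby server can already reach the failed server's direct-attached storage, and it does not handle the case where both servers of a pair are down at once.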
In another embodiment of the present invention, direct-attached storage 200 stores images of virtual machines 101-104 of physical server 100. When physical server 100 stops working, physical server 300 can use the images of virtual machines 101-104 on direct-attached storage 200 to run new virtual machines that access the data on direct-attached storage 200.
In another embodiment of the present invention, servers 100 and/or 300 may have built-in SSDs and memory serving as a cache, further improving performance.
Those skilled in the art will understand that the whole big data storage system can be scaled by increasing the number of storage subsystems. For example, a big data storage system may contain 4000 storage subsystems, and each physical server may be connected to some or all of the direct-attached storage. Then, once the monitoring system detects that the physical server of a subsystem has stopped working, the user requests that previously went to that physical server are directed to another physical server connected to that subsystem's direct-attached storage, and the subsystem's direct-attached storage is accessed through that other physical server.
Those skilled in the art will also understand that the technical solutions described in the embodiments of the present invention may be combined in various ways, and the resulting big data storage systems also fall within the scope disclosed by this application. For example, each physical server in FIG. 4 currently shows only one application service group, but the internal structure of each physical server may clearly be as shown in FIG. 2 or FIG. 3. As another example, the subsystems of FIG. 4 may be grouped in pairs, with the technical solution of FIG. 5 adopted within each pair to provide redundancy.
With the embodiments of the present invention, the system has no single point of failure and is therefore more secure.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A big data storage system, characterized by comprising multiple virtual machines running on a first physical server, and a first storage disk, wherein the first physical server is directly connected to the first storage disk; wherein
the first storage disk is configured to provide data storage;
one of the multiple virtual machines is configured to support a storage sharing function; and
the other virtual machines among the multiple virtual machines are connected through an internal bus to the virtual machine supporting the storage sharing function, and are configured to receive user requests and, according to a user request, read data on the first storage disk through the virtual machine supporting the storage sharing function and present the data on the first storage disk to the user.
2. The system according to claim 1, characterized in that the multiple virtual machines running on the first physical server are divided into at least two service groups, and each service group reads data on the first storage disk through the virtual machine supporting the storage sharing function.
3. The system according to claim 1, characterized in that, where the first physical server and the first storage disk are referred to as a subsystem, the system further comprises:
at least one subsystem, configured to process and store the data of different users; and
a front-end server, configured to receive user requests and, according to the correspondence between each user and a subsystem, direct each user's request to the corresponding subsystem, to be processed and stored by the different subsystems.
4. The system according to claim 3, characterized by further comprising:
an index database, configured to record the correspondence between user IDs and subsystems, for use by the front-end server.
5. The system according to claim 3, characterized in that the at least one subsystem comprises multiple virtual machines running on a second physical server, and a second storage disk;
the first physical server and the second physical server are directly connected to the first storage disk and the second storage disk, respectively; and
the multiple virtual machines on the second physical server are further configured to access the data on the first storage disk when the first physical server cannot work normally.
6. The system according to claim 5, characterized in that the multiple virtual machines on the second physical server that access the data on the first storage disk are an existing service group on the second physical server, or a newly created service group.
7. The system according to claim 5, characterized in that the first storage disk is further configured to store images of the multiple virtual machines of the first physical server; and
the second physical server is further configured to, when the first physical server cannot work normally, invoke the images of the multiple virtual machines of the first physical server stored on the first storage disk, and access the data on the first direct-attached storage disk through those images.
8. The system according to claim 5, 6 or 7, characterized by further comprising:
a monitoring server, configured to monitor the working status of the first physical server and the second physical server.
9. The system according to any one of claims 1 to 5, characterized by further comprising:
a NAS, configured to back up the data on the first storage disk and, when the first storage disk is damaged, provide user data directly to the multiple virtual machines.
10. The system according to any one of claims 1 to 5, characterized in that the direct-attached storage consists of one disk array or a group of cascaded disk arrays.
PCT/CN2012/076516 2011-10-11 2012-06-06 Large data storage system WO2013170504A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US13/858,489 US20140181116A1 (en) 2011-10-11 2013-04-08 Method and device of cloud storage
US14/943,909 US20160112413A1 (en) 2011-10-11 2015-11-17 Method for controlling security of cloud storage
US15/055,373 US20160182638A1 (en) 2011-10-11 2016-02-26 Cloud serving system and cloud serving method
US15/594,374 US20170249093A1 (en) 2011-10-11 2017-05-12 Storage method and distributed storage system
US16/378,076 US20190235777A1 (en) 2011-10-11 2019-04-08 Redundant storage system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210151984.4A CN103428232B (en) 2012-05-16 2012-05-16 A kind of big data storage system
CN201210151984.4 2012-05-16

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13/271,165 Continuation-In-Part US9176953B2 (en) 2008-06-04 2011-10-11 Method and system of web-based document service

Related Child Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/075841 Continuation WO2013163832A1 (en) 2011-10-11 2012-05-22 Cloud storage method and device

Publications (1)

Publication Number Publication Date
WO2013170504A1 true WO2013170504A1 (en) 2013-11-21

Family

ID=49583034

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/076516 WO2013170504A1 (en) 2011-10-11 2012-06-06 Large data storage system

Country Status (2)

Country Link
CN (1) CN103428232B (en)
WO (1) WO2013170504A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111142777A (en) * 2018-11-03 2020-05-12 广州市明领信息科技有限公司 Big data storage system
US11507622B2 (en) 2020-03-25 2022-11-22 The Toronto-Dominion Bank System and method for automatically managing storage resources of a big data platform

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101243413A (en) * 2005-06-24 2008-08-13 信科索尔特公司 System and method for virtualizing backup images
CN101377745A (en) * 2007-08-28 2009-03-04 张玉昆 Virtual computer system and method for implementing data sharing between each field
CN101652749A (en) * 2007-04-05 2010-02-17 微软公司 Network group name for virtual machines
CN101859317A (en) * 2010-05-10 2010-10-13 浪潮电子信息产业股份有限公司 Method for establishing database cluster by utilizing virtualization

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101370027A (en) * 2008-07-09 2009-02-18 中国网通集团宽带业务应用国家工程实验室有限公司 Network storage system, method and application server

Also Published As

Publication number Publication date
CN103428232B (en) 2018-07-24
CN103428232A (en) 2013-12-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12876854

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12876854

Country of ref document: EP

Kind code of ref document: A1